Score:1

Pattern Matching using Grep


I'm trying to find rows with a specific value in my file. Here's a quick snippet of it:

PRODUCT_TYPE_NAME,PRODUCT_CLASS_NAME,PRODUCT_SUB_CLASS_NAME,PRODUCT_MINOR_CLASS_NAME,PRODUCT_COUNTRY_ORIGIN_NAME,PRODUCT_SKU_NO,PRODUCT_LONG_NAME,PRODUCT_BASE_UPC_NO,PRODUCT_LITRES_PER_CONTAINER,PRD_CONTAINER_PER_SELL_UNIT,PRODUCT_ALCOHOL_PERCENT,CURRENT_DISPLAY_PRICE,SWEETNESS_CODE
LIQUOR,WINE,TABLE WINE,TABLE WINE RED,CANADA,198267,COPPER MOON - MALBEC,48162013513,3,1,14,30.99,0
LIQUOR,WINE,TABLE WINE,TABLE WINE WHITE,CANADA,305375,DOMAINE D'OR - DRY,48162001886,4,1,11.5,32.99,0
LIQUOR,WINE,TABLE WINE,TABLE WINE RED,CANADA,53017,SOMMET ROUGE,58976055050,4,1,12,29.99,0
LIQUOR,WINE,TABLE WINE,TABLE WINE WHITE,CANADA,215525,MISSION RIDGE - PREMIUM DRY WHITE,779646155251,4,1,11,33.99,1
LIQUOR,WINE,TABLE WINE,TABLE WINE RED,UNITED STATES OF AMERICA,168971,ZINFANDEL - BIG HOUSE CARDINAL ZIN,81308001456,3,1,13.5,36.99,0
LIQUOR,WINE,TABLE WINE,TABLE WINE RED,FRANCE,234559,LE VILLAGEOIS RED - CELLIERS LA SALLE,63657001448,4,1,11,34.99,0
LIQUOR,WINE,TABLE WINE,TABLE WINE RED,CANADA,492314,SAWMILL CREEK - MERLOT,63657004074,16,1,12.5,119,0
LIQUOR,WINE,TABLE WINE,TABLE WINE WHITE,CANADA,587584,SOLA,63657006566,4,1,12,32.99,0
LIQUOR,WINE,TABLE WINE,TABLE WINE WHITE,CANADA,100925,GANTON & LARSEN PROSPECT - PINOT BLANC BIRCH CANOE 2011,776545400000,0.75,1,11.5,13.99,0
LIQUOR,SPIRITS,IRISH WHISKY,IRISH WHISKY,IRELAND,10157,JAMESON - IRISH,80432500170,0.75,1,40,34.99,NA
LIQUOR,WINE,TABLE WINE,TABLE WINE WHITE,ITALY,102764,PINOT GRIGIO DELLE VENEZIE - RUFFINO LUMINA,8001660197156,0.75,1,12.5,15.99,0
LIQUOR,SPIRITS,AMERICAN WHISKY,AMERICAN WHISKY,UNITED STATES OF AMERICA,103747,MAKER'S MARK - KENTUCKY BOURBON,85246139431,0.75,1,45,44.95,NA
LIQUOR,SPIRITS,GIN,DRY GIN,CANADA,1040,GORDONS - LONDON DRY,622153139040,0.75,1,40,24.49,NA
LIQUOR,WINE,TABLE WINE,TABLE WINE WHITE,CANADA,104679,CALONA - ARTIST SERIES RESERVE PINOT GRIS 2011/13,58976501656,0.75,1,13.5,12.99,0
LIQUOR,WINE,TABLE WINE,TABLE WINE RED,UNITED STATES OF AMERICA,106476,PINOT NOIR - SIDURI RUSSIAN RIVER 11/12,626990184140,0.75,1,14.5,49.99,0
LIQUOR,SPIRITS,CACHACA,CACHACA,BRAZIL,107029,CACHACA 61,7896547500676,0.7,1,40,28.95,2
LIQUOR,WINE,TABLE WINE,TABLE WINE RED,FRANCE,109082,CHATEAU PAVIE DECESSE 2008,,0.75,1,13,239,0
LIQUOR,SPIRITS,SCOTCH WHISKY,SCOTCH - BLEND,UNITED KINGDOM,1099,JOHNNIE WALKER - RED LABEL,622153631049,0.75,1,40,29.99,NA
LIQUOR,WINE,TABLE WINE,TABLE WINE RED,ITALY,110460,LE CONTRADE - CO.PRO.VI,8004753004010,1,1,12,9.9,0
LIQUOR,SPIRITS,RUM,DARK,CANADA,112433,BACARDI - BLACK,620213055408,0.75,1,40,23.75,NA
LIQUOR,WINE,APERITIF  DESSERT AND FORTIFIED WINE,MONTILLA,SPAIN,112789,ALVEAR - MEDIUM DRY,766238303374,0.75,1,17,17.99,3
LIQUOR,SPIRITS,SCOTCH WHISKY,SCOTCH - BLEND,UNITED KINGDOM,112896,JOHNNIE WALKER - RED LABEL,622153631070,1.75,1,40,68.99,NA

I need to use grep, and I would prefer a solution that does not involve sed, perl, awk, or loops. I tried:

grep -E "^.*(,.*){9}[^0]+" BC_Liquor_Store_Product_Price_List.csv

But that obviously matches everything. I need all the rows that have PRODUCT_LITRES_PER_CONTAINER >= 1, but I can't quite figure out how. The .* matches everything, but there are words before the first comma, so I can't just do:

grep -E "^(,.*){9}[^0]+" BC_Liquor_Store_Product_Price_List.csv

Wouldn't that only match lines starting with a comma?

hr flag
*"I would prefer if the solution does not involve sed, perl, awk, or loops"* Why? Why make life harder by using the wrong tool for the job?
Yunfei Chen avatar
bd flag
@steeldriver My system has certain restrictions, so the solution needs to be cross-platform, and it's embedded, so I don't want to run into any issues later on. Also, there are already lots of solutions with awk and perl online, but none that use only grep.
Yunfei Chen avatar
bd flag
Surely such a thing is possible using grep??
terdon avatar
cn flag
`grep` is less portable than sed, awk or perl. Why would you want the least portable of available solutions?
gy flag
@steeldriver Most likely because it's the same assignment as https://unix.stackexchange.com/questions/653643/grep-and-cut-command-in-linux ?
Score:4

See What is meant by “Now you have two problems”?

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

What you seem to be grasping for is

grep -E "^([^,]*,){8}[^0]" BC_Liquor_Store_Product_Price_List.csv

That is

  • anchored to the start of the line ^
  • match any number of non-comma characters followed by a comma, 8 times
  • then match a non-0 character at the start of the 9th field
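
As a rough sketch of how that pattern behaves, the snippet below runs it against three rows copied from the question's excerpt. The variable names `sample` and `matched` are only illustrative, and the header row is left out for brevity.

```shell
#!/bin/sh
# Three rows copied from the question's CSV excerpt (header omitted).
sample='LIQUOR,WINE,TABLE WINE,TABLE WINE RED,CANADA,198267,COPPER MOON - MALBEC,48162013513,3,1,14,30.99,0
LIQUOR,SPIRITS,IRISH WHISKY,IRISH WHISKY,IRELAND,10157,JAMESON - IRISH,80432500170,0.75,1,40,34.99,NA
LIQUOR,SPIRITS,SCOTCH WHISKY,SCOTCH - BLEND,UNITED KINGDOM,112896,JOHNNIE WALKER - RED LABEL,622153631070,1.75,1,40,68.99,NA'

# Skip eight comma-terminated fields, then require that the ninth field
# starts with any character other than 0.
matched=$(printf '%s\n' "$sample" | grep -E '^([^,]*,){8}[^0]')

# Show only the product name (field 7) of each matching row.
printf '%s\n' "$matched" | cut -d, -f7
```

This prints the 3-litre and 1.75-litre products and drops the 0.75-litre one. On the full file the header row is also printed, because its ninth field (PRODUCT_LITRES_PER_CONTAINER) starts with a non-0 character.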

However IMHO this is fragile and should not be used in any serious application. It's particularly hard to reliably match numerical values - see for example

especially the section "A Note about Matching Numbers (Hint: It's harder than you think)". Please consider instead using something like

awk -F, 'NR==1 || $9+0 >= 1.0' BC_Liquor_Store_Product_Price_List.csv

or

perl -F, -lne 'print if $. == 1 || $F[8] >= 1.0' BC_Liquor_Store_Product_Price_List.csv

or (better, since it will handle complex CSV features such as quoting and embedded commas)

mlr --csv filter '$PRODUCT_LITRES_PER_CONTAINER >= 1.0' BC_Liquor_Store_Product_Price_List.csv
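
To make the fragility concrete, here is a small sketch comparing the regex with the numeric awk test. The three rows are hypothetical (they are not from the real price list) and were chosen so that only the name in field 7 and the litres value in field 9 matter: one row has an empty litres field, one has a leading zero, and one is a plain 1.

```shell
#!/bin/sh
# Hypothetical rows, NOT from the real price list.
rows='A,B,C,D,E,1,EMPTY LITRES,111,,1,40,9.99,NA
A,B,C,D,E,2,LEADING ZERO,222,01.5,1,40,9.99,NA
A,B,C,D,E,3,ONE LITRE,333,1,1,40,9.99,NA'

# Regex approach: looks only at the first character after the eighth comma.
grep_names=$(printf '%s\n' "$rows" | grep -E '^([^,]*,){8}[^0]' | cut -d, -f7 | paste -sd, -)

# Numeric approach: awk converts field 9 to a number before comparing.
awk_names=$(printf '%s\n' "$rows" | awk -F, '$9+0 >= 1.0 { print $7 }' | paste -sd, -)

printf 'grep: %s\n' "$grep_names"
printf 'awk:  %s\n' "$awk_names"
```

With these rows, grep reports EMPTY LITRES and ONE LITRE, while awk reports LEADING ZERO and ONE LITRE. The regex accepts the empty field because `[^0]` matches the comma that follows it, and it rejects 01.5 because the value starts with 0.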
Yunfei Chen avatar
bd flag
What do you mean by fragile??
hr flag
Well, just off the top of my head - if the field is empty, `[^0]` will match the following `,` - the Awk and Miller versions would both (correctly) coerce the empty string to numeric 0 and hence exclude the result. There are likely other edge cases - hence the quote ;)