Hadoop - Pig script: COUNT returns 0 on null field
I have a Pig script that loads a file of "company" JSON records. When I count, the count is 0 for records where the domain field is missing (or null). How can I group those records under an empty string and still count them?
File example:
{"company": {"domain": "test1.com", "name": "test1 company"}}
{"company": {"domain": "test1.com", "name": "test1 company"}}
{"company": {"domain": "test1.com", "name": "test2 company"}}
{"company": {"domain": "test2.com", "name": "test2 company"}}
{"company": {"domain": "test2.com", "name": "test3 company"}}
{"company": {"domain": "test3.com", "name": "test3 company"}}
{"company": {"domain": "test3.com", "name": "test3 company"}}
{"company": {"name": "test4 company"}}
{"company": {"name": "test4 company"}}
Expected results:
"test1.com", "test1 company", 2
"test1.com", "test2 company", 1
"test2.com", "test2 company", 1
"test2.com", "test3 company", 1
"test3.com", "test3 company", 2
"", "test4 company", 2
Actual results:
"test1.com", "test1 company", 2
"test1.com", "test2 company", 1
"test2.com", "test2 company", 1
"test2.com", "test3 company", 1
"test3.com", "test3 company", 2
, "test4 company", 0
Current Pig script:
data = LOAD 'myfile' USING org.apache.pig.piggybank.storage.JsonLoader('company: (domain: chararray, name: chararray)');
-- drop records that have no "company" object at all
filtered = FILTER data BY company IS NOT NULL;
-- flatten the company tuple into (domain, name); domain is null when the key is missing
events = FOREACH filtered GENERATE FLATTEN(company) AS (domain, name);
grouped = GROUP events BY (domain, name);
counts = FOREACH grouped GENERATE FLATTEN(group) AS (domain, name), COUNT(events) AS cnt;
ordered = ORDER counts BY cnt DESC;
Thanks for the help!
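Try COUNT_STAR instead of COUNT. COUNT ignores nulls: a tuple in the bag is not counted if its first field is null. The first field of every tuple in events is domain, so the two "test4 company" records, which have no domain, are skipped and their group counts 0. COUNT_STAR counts every tuple, nulls included:
counts = FOREACH grouped GENERATE FLATTEN(group) AS (domain, name), COUNT_STAR(events) AS cnt;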
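To see why the null-domain group comes back as 0, here is a small Python model of the two built-ins. It is not Pig itself: it assumes Pig's documented behaviour that COUNT skips any tuple whose first field is null while COUNT_STAR counts every tuple, and that GROUP puts null keys together. The sample data is the nine records from the question, and the helper names (`load`, `group_by_key`, `pig_count`, `pig_count_star`) are hypothetical.

```python
import json
from collections import defaultdict

# The sample records from the question, one JSON object per line.
RECORDS = """\
{"company": {"domain": "test1.com", "name": "test1 company"}}
{"company": {"domain": "test1.com", "name": "test1 company"}}
{"company": {"domain": "test1.com", "name": "test2 company"}}
{"company": {"domain": "test2.com", "name": "test2 company"}}
{"company": {"domain": "test2.com", "name": "test3 company"}}
{"company": {"domain": "test3.com", "name": "test3 company"}}
{"company": {"domain": "test3.com", "name": "test3 company"}}
{"company": {"name": "test4 company"}}
{"company": {"name": "test4 company"}}
"""


def load(text):
    """Parse JSON lines into (domain, name) tuples; a missing key becomes None (Pig null)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        company = json.loads(line).get("company")
        if company is None:  # mirrors FILTER data BY company IS NOT NULL
            continue
        rows.append((company.get("domain"), company.get("name")))
    return rows


def group_by_key(rows):
    """Mirror GROUP events BY (domain, name): each key maps to a bag of tuples; None keys group together."""
    groups = defaultdict(list)
    for row in rows:
        groups[row].append(row)
    return groups


def pig_count(bag):
    """Model of Pig COUNT: a tuple is skipped when its FIRST field is null."""
    return sum(1 for t in bag if t[0] is not None)


def pig_count_star(bag):
    """Model of Pig COUNT_STAR: every tuple is counted, nulls included."""
    return len(bag)


groups = group_by_key(load(RECORDS))
summary = {key: (pig_count(bag), pig_count_star(bag)) for key, bag in groups.items()}

for (domain, name), (c, cs) in summary.items():
    # Print a null domain as "" to match the question's expected output.
    print(f'"{domain or ""}", "{name}", COUNT={c}, COUNT_STAR={cs}')
```

Only the `(None, "test4 company")` group differs between the two functions. If you also want the key itself to be an empty string rather than null, one option (an assumption beyond the original answer) is to substitute it before grouping with Pig's bincond operator, e.g. `(domain IS NULL ? '' : domain)`; once the domain is `''` instead of null, plain COUNT counts those rows too.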