hadoop - Pig script: COUNT returns 0 on null field


I have a Pig script that loads a file of "company" JSON records. When I count the records, any group whose domain is missing from the file (or null) gets a count of 0. How can I group those records under an empty string and still count them?

File example:

  {"company": {"domain": "test1.com", "name": "test1 company"}}
  {"company": {"domain": "test1.com", "name": "test1 company"}}
  {"company": {"domain": "test1.com", "name": "test2 company"}}
  {"company": {"domain": "test2.com", "name": "test2 company"}}
  {"company": {"domain": "test2.com", "name": "test3 company"}}
  {"company": {"domain": "test3.com", "name": "test3 company"}}
  {"company": {"domain": "test3.com", "name": "test3 company"}}
  {"company": {"name": "test4 company"}}
  {"company": {"name": "test4 company"}}

Expected result:

  "test1.com", "test1 company", 2
  "test1.com", "test2 company", 1
  "test2.com", "test2 company", 1
  "test2.com", "test3 company", 1
  "test3.com", "test3 company", 2
  "", "test4 company", 2

Actual results:

  "test1.com", "test1 company", 2
  "test1.com", "test2 company", 1
  "test2.com", "test2 company", 1
  "test2.com", "test3 company", 1
  "test3.com", "test3 company", 2
  , "test4 company", 0

Current Pig script:

  data = LOAD 'myfile' USING org.apache.pig.piggybank.storage.JsonLoader('company: (domain:chararray, name:chararray)');
  filtered = FILTER data BY (company IS NOT NULL);
  events = FOREACH filtered GENERATE FLATTEN(company);
  grouped = GROUP events BY (domain, name);
  counts = FOREACH grouped GENERATE group AS domain, COUNT(events) AS count;
  ordered = ORDER counts BY count DESC;

Thanks for the help!

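Try COUNT_STAR instead of COUNT. COUNT skips tuples whose first field is null, so the group with the missing domain comes out as 0; COUNT_STAR counts every tuple in the bag, nulls included:

  counts = FOREACH grouped GENERATE group AS domain, COUNT_STAR(events) AS count;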
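
To also get the empty string in the domain column, as in the expected result, one option is to replace the null domain before grouping. This is a minimal sketch rather than a tested solution: it reuses the loader and schema from the question, the aliases `cleaned` and `cnt` are made up for illustration, and it assumes the flattened fields can be named `domain` and `name` with an explicit AS clause.

```pig
-- Loader and schema copied from the question.
data = LOAD 'myfile' USING org.apache.pig.piggybank.storage.JsonLoader('company: (domain:chararray, name:chararray)');
filtered = FILTER data BY company IS NOT NULL;

-- Name the flattened fields explicitly so they can be referenced below.
events = FOREACH filtered GENERATE FLATTEN(company) AS (domain:chararray, name:chararray);

-- Replace a missing domain with an empty string (bincond operator).
-- "cleaned" is a hypothetical alias, not part of the original script.
cleaned = FOREACH events GENERATE (domain IS NULL ? '' : domain) AS domain, name;

grouped = GROUP cleaned BY (domain, name);

-- COUNT_STAR counts every tuple in each group's bag.
counts = FOREACH grouped GENERATE FLATTEN(group) AS (domain, name), COUNT_STAR(cleaned) AS cnt;
ordered = ORDER counts BY cnt DESC;
```

Once the null is replaced, the first field of every tuple is non-null, so plain COUNT should give the same numbers here; COUNT_STAR is kept only to match the answer above.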

