STORE output to a single CSV?

  • To set the number of reducers for all Pig operations, you can use the default_parallel property, but this means every reduce step in the script will use a single reducer, decreasing throughput: set default_parallel 1;
  • If the last operation executed before calling STORE is one of COGROUP, CROSS, DISTINCT, GROUP, JOIN (inner), JOIN (outer), or ORDER BY, you can add the PARALLEL 1 clause to that operation so that only that command runs with a single reducer: GROUP a BY grp PARALLEL 1;
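
The second option can be sketched as a short script. This is a minimal illustration, not a definitive recipe: the input path, output path, and the schema (grp, val) are hypothetical, and it assumes comma-separated output written with PigStorage.

```pig
-- Hypothetical input path and schema, for illustration only.
a = LOAD 'input/data.tsv' USING PigStorage('\t') AS (grp:chararray, val:int);

-- Only this reduce step is limited to one reducer; earlier steps keep their parallelism.
b = GROUP a BY grp PARALLEL 1;

-- The FOREACH runs in the same reduce phase as the GROUP, so it adds no extra reducers.
c = FOREACH b GENERATE group AS grp, SUM(a.val) AS total;

-- Write comma-separated output; the output path is hypothetical.
STORE c INTO 'output/single_csv' USING PigStorage(',');
```

Note that STORE writes a directory rather than a bare file. With a single reducer, that directory should contain one part file holding all of the rows.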
