Tag Archives: mapreduce

STORE output to a single CSV?

To set the number of reducers for all Pig operations, you can use the default_parallel property, but this means every single step will use a single reducer, decreasing throughput: set default_parallel 1; Prior to calling STORE, if one of the...

MapReduce: The Shuffle Process Explained in Detail

http://langyu.iteye.com/blog/992916 The Shuffle process is the core of MapReduce, and is also called the place where the magic happens. To understand MapReduce,...

Secondary Sort in MapReduce (SecondarySort)

Secondary sort mainly involves the following pieces. Before 0.20.0 you use setPartitionerClass, setOutputKeyComparatorClass, and setOutputValueGroupingComparator; from 0.20.0 onward you use job.setPartitionerClass(Partitioner...
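As a rough illustration of how a partitioner, a sort comparator, and a grouping comparator cooperate in a secondary sort, here is a plain-Java simulation with no Hadoop dependency. Everything in it (the class SecondarySortSketch, the CompositeKey type, and the method names) is hypothetical and invented for this sketch; it is not the code from the post. In the newer API the corresponding Job setters are setPartitionerClass, setSortComparatorClass, and setGroupingComparatorClass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free simulation of the secondary sort pattern.
public class SecondarySortSketch {

    // Composite key: the natural key plus the field we want sorted within each group.
    public static final class CompositeKey {
        public final String natural;
        public final int secondary;

        public CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }
    }

    // Plays the role of the Partitioner: route by natural key only,
    // so all records for one natural key reach the same reducer.
    public static int partition(CompositeKey key, int numReducers) {
        return (key.natural.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Plays the role of the sort comparator: natural key first, then secondary field.
    public static final Comparator<CompositeKey> SORT =
            Comparator.comparing((CompositeKey k) -> k.natural)
                      .thenComparingInt(k -> k.secondary);

    // Plays the role of the grouping comparator: compare the natural key only.
    public static final Comparator<CompositeKey> GROUP =
            Comparator.comparing((CompositeKey k) -> k.natural);

    // Simulates what a single reducer sees: its partition, sorted, then grouped.
    public static Map<String, List<Integer>> reduceInput(
            List<CompositeKey> mapOutput, int numReducers, int reducer) {
        List<CompositeKey> mine = new ArrayList<>();
        for (CompositeKey k : mapOutput) {
            if (partition(k, numReducers) == reducer) {
                mine.add(k);
            }
        }
        mine.sort(SORT);

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey previous = null;
        List<Integer> current = null;
        for (CompositeKey k : mine) {
            // A new group starts whenever the grouping comparator sees a different key.
            if (previous == null || GROUP.compare(previous, k) != 0) {
                current = new ArrayList<>();
                groups.put(k.natural, current);
            }
            current.add(k.secondary);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = List.of(
                new CompositeKey("b", 3), new CompositeKey("a", 2),
                new CompositeKey("b", 1), new CompositeKey("a", 5));
        // With one reducer, prints {a=[2, 5], b=[1, 3]}
        System.out.println(reduceInput(mapOutput, 1, 0));
    }
}
```

The point of the split is that sorting uses the full composite key while partitioning and grouping look at the natural key only, so each reduce call receives its values already ordered by the secondary field.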

MapReduce jobs get stuck in ACCEPTED state and never run

The settings I have (on my very small experimental boxes) in my yarn-site.xml: <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>2200</value> <description>Amount of physical memory, in MB,...

Hive: Running Simple Queries as a Fetch Task Instead of a MapReduce Job

If you want to query a single column of a table, Hive by default launches a MapReduce job to do it, as shown below: hive> SELECT id,...