hadoop - Spark: Increase the number of tasks/partitions


The number of tasks in Spark is decided by the total number of RDD partitions at the beginning of a stage. For example, when a Spark application reads data from HDFS, the partitioning method of the Hadoop RDD is inherited from FileInputFormat in MapReduce, and is affected by the size of the HDFS blocks, the value of mapred.min.split.size, the compression method, etc.
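As a sketch of how those inputs can be influenced (the HDFS path and sizes here are placeholders, and a running Spark context is assumed), the minimum split size can be set on the Hadoop configuration before reading, or a minimum number of partitions can be requested directly:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup; in a real application the context usually already exists.
val sc = new SparkContext(new SparkConf().setAppName("partition-demo"))

// Option 1: influence the Hadoop input splits (and hence partitions) via split size.
sc.hadoopConfiguration.setLong("mapred.min.split.size", 64L * 1024 * 1024) // 64 MB

// Option 2: ask textFile for a minimum number of partitions directly.
val rdd = sc.textFile("hdfs:///path/to/input", minPartitions = 12)
println(rdd.getNumPartitions)
```

Note that minPartitions is only a hint, and compression matters: a gzip file is not splittable, so a single .gz input still produces one partition regardless of these settings.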

[screenshot of the tasks]

The tasks in the screenshot took 7, 7, and 4 seconds, and I want to make them balanced. Also, the stage was split into only 3 tasks. Are there ways to tell Spark the number of partitions/tasks?

The number of tasks depends on the number of partitions. You can set a partitioner on the RDD, and in the partitioner you can set the number of partitions.
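For example (a minimal sketch; the sample data and partition count are invented, and an existing SparkContext `sc` is assumed), a HashPartitioner with an explicit partition count can be applied to a pair RDD, or repartition can be used on any RDD:

```scala
import org.apache.spark.HashPartitioner

// Pair RDD: choose the number of partitions through a partitioner.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
val byKey = pairs.partitionBy(new HashPartitioner(8))
println(byKey.getNumPartitions) // 8

// Any RDD: repartition does a full shuffle, which also rebalances skewed data;
// coalesce reduces the partition count without a shuffle.
val balanced = sc.textFile("hdfs:///path/to/input").repartition(8)
```

Because repartition shuffles all the data, it both changes the task count and tends to even out the per-task workload, at the cost of the shuffle itself.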

