hadoop - Spark: Increase the number of tasks/partitions
The number of tasks in Spark is decided by the total number of RDD partitions at the beginning of each stage. For example, when a Spark application reads data from HDFS, the partitioning of the Hadoop RDD is inherited from FileInputFormat in MapReduce, and it is affected by the size of the HDFS blocks, the value of mapred.min.split.size, the compression method, etc.
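For context, here is a minimal Scala sketch of how these input settings can be adjusted when reading from HDFS; the path, split size, and partition counts are illustrative assumptions, not values from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object InputPartitionsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("input-partitions"))

    // Raising mapred.min.split.size yields larger splits and therefore
    // fewer input partitions; by default the HDFS block size drives the splits.
    sc.hadoopConfiguration.set("mapred.min.split.size", (64 * 1024 * 1024).toString) // 64 MB, hypothetical

    // textFile also accepts a minPartitions hint as its second argument,
    // which asks FileInputFormat for at least that many splits.
    val lines = sc.textFile("hdfs:///data/input", minPartitions = 12) // hypothetical path

    println(s"input partitions: ${lines.getNumPartitions}")
    sc.stop()
  }
}
```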

The tasks in the screenshot took 7, 7, and 4 seconds, and I want to make them balanced. Also, the stage was split into only 3 tasks; are there ways to tell Spark how many partitions/tasks to use?
The number of tasks depends on the number of partitions. You can set a partitioner on the RDD, and in the partitioner you can specify the number of partitions.
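A minimal Scala sketch of both approaches follows: repartitioning an arbitrary RDD, and supplying a partitioner with an explicit partition count on a key-value RDD. The HDFS path and the target partition count of 12 are assumptions for illustration.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("repartition-sketch"))
    val lines = sc.textFile("hdfs:///data/input") // hypothetical path

    // Option 1: repartition any RDD to a chosen number of partitions.
    // This triggers a full shuffle; coalesce(n) shrinks the count without one.
    val rebalanced = lines.repartition(12)
    println(s"after repartition: ${rebalanced.getNumPartitions} partitions")

    // Option 2: for key-value RDDs, pass a Partitioner that carries the
    // desired partition count, as suggested above.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .partitionBy(new HashPartitioner(12))
      .reduceByKey(_ + _)
    println(s"after partitionBy: ${counts.getNumPartitions} partitions")

    sc.stop()
  }
}
```

Either way, the partition count chosen here determines how many tasks the resulting stage runs, so spreading the data over more partitions is also how you even out task durations.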