hadoop - Spark: Increase the number of tasks/partitions
The number of tasks in Spark is decided by the total number of RDD partitions at the beginning of each stage. For example, when a Spark application reads data from HDFS, the partitioning of the Hadoop RDD is inherited from FileInputFormat in MapReduce, so it is affected by the size of the HDFS blocks, the value of mapred.min.split.size, the compression method, etc.
The tasks in the screenshot took 7, 7, and 4 seconds, and I want to make them balanced. Also, the stage was split into only 3 tasks. Are there ways to tell Spark the number of partitions/tasks to use?
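As a minimal sketch of the read-time behavior described above (the HDFS path, the application name, and the partition count are illustrative assumptions, not taken from the question):

    import org.apache.spark.{SparkConf, SparkContext}

    object ReadPartitionsExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("read-partitions"))

        // textFile takes a minimum-partitions hint that is forwarded to
        // FileInputFormat's split computation; the resulting count may still
        // differ, depending on block size and mapred.min.split.size.
        val lines = sc.textFile("hdfs:///data/input", minPartitions = 16)
        println(s"partitions after read: ${lines.getNumPartitions}")

        sc.stop()
      }
    }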
The number of tasks depends on the number of partitions. You can apply a partitioner to the RDD, and in the partitioner you can set the number of partitions, as in the sketch below.
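A minimal sketch of this answer, reusing the `lines` RDD from the sketch above; the target of 12 partitions is an illustrative assumption. Both calls shuffle the data, which also helps rebalance skewed tasks:

    import org.apache.spark.HashPartitioner

    // repartition works on any RDD and sets an explicit partition count:
    val rebalanced = lines.repartition(12)

    // For key-value RDDs, partitionBy lets you choose the partitioner and
    // its partition count; the next stage runs one task per partition:
    val pairs = lines.map(line => (line.length, line))
    val byKey = pairs.partitionBy(new HashPartitioner(12))
    println(s"tasks in the next stage: ${byKey.getNumPartitions}")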