hadoop - Spark: Increase the number of tasks/partitions


The number of tasks in Spark is decided by the total number of RDD partitions at the beginning of a stage. For example, when a Spark application reads data from HDFS, the partitioning method of the Hadoop RDD is inherited from FileInputFormat in MapReduce, and is affected by the size of the HDFS blocks, the value of mapred.min.split.size, the compression method, etc.
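As a sketch of how this plays out in practice (the path and the numbers below are illustrative, not from the question): the second argument of textFile is a minimum-partition hint, and the Hadoop split-size settings can also be adjusted so FileInputFormat produces more splits.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("partitions-demo"))

// Hint: ask for at least 12 input partitions. Spark may create more
// (one per HDFS block at minimum), but not fewer than the hint allows.
val rdd = sc.textFile("hdfs:///data/input.txt", minPartitions = 12)

// Alternatively, lower the maximum split size so FileInputFormat
// generates more, smaller splits. 32 MB here is just an example value.
sc.hadoopConfiguration.setLong("mapred.max.split.size", 32L * 1024 * 1024)
```

Note that mapred.min.split.size sets a floor on split size, so raising it reduces the number of partitions; to increase the count it is the maximum split size that needs lowering.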

[Screenshot of the tasks]

The tasks in the screenshot took 7, 7, and 4 seconds, and I want to make them balanced. Also, the stage was split into only 3 tasks. Is there a way to tell Spark the number of partitions/tasks to use?

The number of tasks depends on the number of partitions. You can set a partitioner for the RDD, and in the partitioner you can set the number of partitions.
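A minimal sketch of both options the answer alludes to (the partition count of 8 and the key function are assumptions for illustration): repartition works on any RDD, while partitionBy with an explicit partitioner applies to key-value RDDs.

```scala
import org.apache.spark.HashPartitioner

// Option 1: for any RDD, reshuffle the data into a chosen number
// of evenly sized partitions (one task per partition downstream).
val balanced = rdd.repartition(8)

// Option 2: for a key-value RDD, supply a partitioner; it controls
// both the partition count and which keys are grouped together.
val byKey = rdd
  .map(line => (line.length, line)) // illustrative key
  .partitionBy(new HashPartitioner(8))

println(balanced.getNumPartitions) // 8
```

repartition always incurs a full shuffle; if you only need fewer partitions rather than more, coalesce(n) can avoid that cost.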

