Spark Dataset: Reduce, Agg, Group or GroupByKey for a Dataset<Tuple2> in Java
I have a Dataset<Tuple2<String, Double>> as follows:
<a,1> <b,2> <c,2> <a,2> <b,3> <b,4>
and I need to reduce it by the String key, summing the values, using the Spark Java API. The final result should be as below:
<a,3> <b,9> <c,2>
Shall I use reduce, agg, group, or groupByKey? And how?
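Suppose you have the Dataset
Dataset<Tuple2<String, Double>> ds = ..;
Then you can call the groupBy function followed by sum, as below (col is statically imported from org.apache.spark.sql.functions; a Tuple2 Dataset exposes its fields as the columns _1 and _2):
ds.groupBy(col("_1")).sum("_2").show();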
Or you can convert it to a Dataset<Row> with named columns and then call the groupBy function:
Dataset<Row> ds1 = ds.toDF("key","value");
ds1.groupBy(col("key")).sum("value").show();
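To check what either groupBy call should return, the same group-and-sum can be reproduced on a plain Java list. This is only a sketch and is not Spark code: the class name GroupSumSketch and the method sumByKey are hypothetical, and Map.Entry stands in for Tuple2. It may be handy as a reference result in a unit test.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark: mirrors groupBy(key).sum(value) in memory.
class GroupSumSketch {

    // Groups (key, value) pairs by key and sums the values of each group.
    // A TreeMap keeps the keys sorted so the output order is deterministic.
    static Map<String, Double> sumByKey(List<Map.Entry<String, Double>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new,
                Collectors.summingDouble(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        // Same sample data as in the question.
        List<Map.Entry<String, Double>> pairs = List.of(
                Map.entry("a", 1.0), Map.entry("b", 2.0), Map.entry("c", 2.0),
                Map.entry("a", 2.0), Map.entry("b", 3.0), Map.entry("b", 4.0));
        System.out.println(sumByKey(pairs)); // prints {a=3.0, b=9.0, c=2.0}
    }
}
```

Unlike this in-memory version, Spark does not guarantee the row order of a grouped result, so sort by key before comparing if order matters.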