Spark Dataset: Reduce, Agg, Group or GroupByKey for a Dataset<Tuple2> in Java
I have a Dataset<Tuple2<String, Double>> as follows:
<a,1> <b,2> <c,2> <a,2> <b,3> <b,4>
and I need to reduce it by the String key, summing the values, using the Spark Java API. The final result should look like this:
<a,3> <b,9> <c,2>
Should I use reduce, agg, group or groupByKey? And how?
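Suppose you have a Dataset:
Dataset<Tuple2<String, Double>> ds = ..;
Then you can call the groupBy function and sum the values, as below (col here is the static import org.apache.spark.sql.functions.col, and _1 and _2 are the column names Spark gives to the tuple fields):
ds.groupBy(col("_1")).sum("_2").show();
Or you can convert it to a Dataset<Row> with named columns and call the groupBy function on that:
Dataset<Row> ds1 = ds.toDF("key", "value");
ds1.groupBy(col("key")).sum("value").show();
Both calls return an untyped Dataset<Row> whose aggregate column is named sum(_2) or sum(value) respectively.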
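Since the question also asks about groupByKey: if the result should stay a typed Dataset<Tuple2<String, Double>> rather than a Dataset<Row>, one possible approach is groupByKey followed by mapValues and reduceGroups. The following is only a sketch, assuming Spark 2.1 or later on the classpath and a local master; the class name SumByKey, the method sumByKey and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.ReduceFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

// Hypothetical example class, not part of the original answer.
public class SumByKey {

    // Groups by the tuple's String key and sums the Double values,
    // keeping the result typed as Dataset<Tuple2<String, Double>>.
    public static Dataset<Tuple2<String, Double>> sumByKey(Dataset<Tuple2<String, Double>> ds) {
        return ds
            // The explicit casts disambiguate the Java overloads from the Scala ones.
            .groupByKey((MapFunction<Tuple2<String, Double>, String>) t -> t._1(), Encoders.STRING())
            // Keep only the value in each group so the reduce works on plain Doubles.
            .mapValues((MapFunction<Tuple2<String, Double>, Double>) t -> t._2(), Encoders.DOUBLE())
            // reduceGroups yields one (key, reducedValue) pair per key.
            .reduceGroups((ReduceFunction<Double>) (a, b) -> a + b);
    }

    public static void main(String[] args) {
        // Assumption: a local session is good enough for trying this out.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("SumByKey")
            .getOrCreate();

        // Sample data taken from the question.
        List<Tuple2<String, Double>> data = Arrays.asList(
            new Tuple2<>("a", 1.0), new Tuple2<>("b", 2.0), new Tuple2<>("c", 2.0),
            new Tuple2<>("a", 2.0), new Tuple2<>("b", 3.0), new Tuple2<>("b", 4.0));

        Dataset<Tuple2<String, Double>> ds =
            spark.createDataset(data, Encoders.tuple(Encoders.STRING(), Encoders.DOUBLE()));

        // Should show a -> 3.0, b -> 9.0, c -> 2.0; row order is not guaranteed.
        sumByKey(ds).show();

        spark.stop();
    }
}
```

For a plain sum, the groupBy(...).sum(...) form above is usually the simpler choice. The typed variant is mainly of interest when later stages need Tuple2 objects or the reduction is not one of the built-in aggregates.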