Hello community
For this reduceByKey case in pyspark:
>>> rdd.collect()
[('a', 3), ('b', 1), ('c', 2), ('a', 9)]
>>> rdd.reduceByKey(lambda x, y: x + y).collect()
[('b', 1), ('c', 2), ('a', 12)]
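To make the semantics concrete without a Spark cluster, here is a minimal plain-Python sketch of what reduceByKey does: values sharing a key are combined pairwise with the given function. The helper name reduce_by_key is hypothetical, not part of any Spark API, and unlike Spark the output order here follows insertion order rather than partition order.

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, op):
    """Hypothetical stand-in for Spark's reduceByKey on a plain
    list of (key, value) pairs: group values by key, then fold
    each group pairwise with op."""
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return [(k, reduce(op, vs)) for k, vs in grouped.items()]

rdd_data = [('a', 3), ('b', 1), ('c', 2), ('a', 9)]
print(reduce_by_key(rdd_data, lambda x, y: x + y))
# [('a', 12), ('b', 1), ('c', 2)]
```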
How can I write this with Scala's groupMapReduce?
Thanks
From the definition:
List[A].groupMapReduce[K, B](key: (A) => K)(f: (A) => B)(reduce: (B, B) => B): Map[K, B]
Here A = (Char, Int), K = Char, B = Int, so:
key: given a tuple, extract its first element (the key)
f: given a tuple, extract its second element (the value)
reduce: the same function you passed to reduceByKey
List(('a', 3), ('b', 1), ('c', 2), ('a', 9)).groupMapReduce(_._1)(_._2)(_ + _).toList
(groupMapReduce is available from Scala 2.13 onward. Note that the resulting Map makes no ordering guarantee, so the list order may differ from Spark's output.)
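For comparison, the same three-stage decomposition (key extractor, value mapper, reducer) can be mirrored in plain Python. The helper group_map_reduce below is hypothetical, written only to show how groupMapReduce's three parameter lists line up with the reduceByKey example:

```python
from collections import defaultdict
from functools import reduce

def group_map_reduce(xs, key, f, op):
    """Hypothetical Python mirror of Scala's groupMapReduce:
    group elements by key(x), map each to f(x), fold each
    group's values pairwise with op."""
    grouped = defaultdict(list)
    for x in xs:
        grouped[key(x)].append(f(x))
    return {k: reduce(op, vs) for k, vs in grouped.items()}

data = [('a', 3), ('b', 1), ('c', 2), ('a', 9)]
result = group_map_reduce(
    data,
    key=lambda t: t[0],   # _._1 in the Scala version
    f=lambda t: t[1],     # _._2 in the Scala version
    op=lambda x, y: x + y # _ + _ in the Scala version
)
# result == {'a': 12, 'b': 1, 'c': 2}
```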