Hello community
For this reduceByKey case in pyspark:
>>> rdd.collect()
[('a', 3), ('b', 1), ('c', 2), ('a', 9)]
>>> rdd.reduceByKey(lambda x, y: x + y).collect()
[('b', 1), ('c', 2), ('a', 12)]
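To make the semantics concrete without a Spark cluster, here is a minimal plain-Python sketch of what reduceByKey does: values sharing a key are combined pairwise with the given function. The helper name reduce_by_key is hypothetical, not part of any Spark API, and unlike Spark the output order here follows insertion order rather than partition order.

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, op):
    """Hypothetical stand-in for Spark's reduceByKey on a plain
    list of (key, value) pairs: group values by key, then fold
    each group pairwise with op."""
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return [(k, reduce(op, vs)) for k, vs in grouped.items()]

rdd_data = [('a', 3), ('b', 1), ('c', 2), ('a', 9)]
print(reduce_by_key(rdd_data, lambda x, y: x + y))
# [('a', 12), ('b', 1), ('c', 2)]
```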
How can I write this with Scala's groupMapReduce?
Thanks
From the definition:
List[A].groupMapReduce[K, B](key: (A) => K)(f: (A) => B)(reduce: (B, B) => B): Map[K, B]
Here A = (Char, Int), K = Char, B = Int, so:
key: given a tuple, extract its first element (the key)
f: given a tuple, extract its second element (the value)
reduce: the same function you passed to reduceByKey
List(('a', 3), ('b', 1), ('c', 2), ('a', 9)).groupMapReduce(_._1)(_._2)(_ + _).toList
(groupMapReduce is available from Scala 2.13 onward. Note that the resulting Map makes no ordering guarantee, so the list order may differ from Spark's output.)
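For comparison, the same three-stage decomposition (key extractor, value mapper, reducer) can be mirrored in plain Python. The helper group_map_reduce below is hypothetical, written only to show how groupMapReduce's three parameter lists line up with the reduceByKey example:

```python
from collections import defaultdict
from functools import reduce

def group_map_reduce(xs, key, f, op):
    """Hypothetical Python mirror of Scala's groupMapReduce:
    group elements by key(x), map each to f(x), fold each
    group's values pairwise with op."""
    grouped = defaultdict(list)
    for x in xs:
        grouped[key(x)].append(f(x))
    return {k: reduce(op, vs) for k, vs in grouped.items()}

data = [('a', 3), ('b', 1), ('c', 2), ('a', 9)]
result = group_map_reduce(
    data,
    key=lambda t: t[0],   # _._1 in the Scala version
    f=lambda t: t[1],     # _._2 in the Scala version
    op=lambda x, y: x + y # _ + _ in the Scala version
)
# result == {'a': 12, 'b': 1, 'c': 2}
```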