How to implement group with MR

pengyh · January 20, 2022, 9:10am

Hello

Sorry I am typing on mobile.
Simply put, I can implement reduceByKey, countByGroup, etc. using the map, reduce and group functions. My job:

But how can I implement the group operation itself with map and reduce? Thanks.
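For context, here is a minimal sketch of the kind of thing described above, assuming plain Scala collections (a List of key/value pairs) rather than any distributed framework. reduceByKey and countByGroup here are hypothetical helper names modelled on the question, not a library API; both lean on the standard groupBy, which is exactly the piece the question asks how to build.

```scala
object ReduceByKeyDemo {
  // Hypothetical helper: combine all values that share a key using `op`.
  // group (groupBy) -> map each group to its values -> reduce them with `op`.
  def reduceByKey[K, V](pairs: List[(K, V)])(op: (V, V) => V): Map[K, V] =
    pairs
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2).reduce(op) } // groups are never empty, so reduce is safe

  // Hypothetical helper: how many pairs fall into each key's group.
  def countByGroup[K, V](pairs: List[(K, V)]): Map[K, Int] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.size }

  def main(args: Array[String]): Unit = {
    val sales = List("apples" -> 3, "pears" -> 2, "apples" -> 5)
    println(reduceByKey(sales)(_ + _))
    println(countByGroup(sales))
  }
}
```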

BalmungSan · January 20, 2022, 1:06pm

groupBy can be implemented using a foldLeft

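```scala
def groupBy[A, K](data: List[A])(key: A => K): Map[K, List[A]] =
  data.foldLeft(Map.empty[K, List[A]]) {
    case (acc, a) =>
      // Add `a` to the group for its key, creating the group if it is missing.
      // Prepending means each group ends up in reverse input order.
      acc.updatedWith(key(a)) {
        case Some(group) => Some(a :: group)
        case None        => Some(a :: Nil)
      }
  }
```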

(requires Scala 2.13 or later, which added Map.updatedWith)
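Since the question asks about map and reduce specifically, here is a hedged sketch of a different route (not the foldLeft code above): map every element to a one-entry Map, then merge all those Maps together. The names groupByMR and merge are made up for illustration, and it assumes plain immutable Scala collections. A bare reduce would throw on an empty list, so the sketch folds from Map.empty, which plays the role of the reduce's identity element.

```scala
object GroupWithMapReduce {
  // Combine two partial groupings; lists under the same key are concatenated.
  // Concatenation is associative, so partial results can be merged in any grouping.
  def merge[K, A](left: Map[K, List[A]], right: Map[K, List[A]]): Map[K, List[A]] =
    right.foldLeft(left) { case (acc, (k, as)) =>
      acc.updated(k, acc.getOrElse(k, Nil) ++ as)
    }

  def groupByMR[A, K](data: List[A])(key: A => K): Map[K, List[A]] =
    data
      .map(a => Map(key(a) -> List(a)))             // "map": one single-entry grouping per element
      .foldLeft(Map.empty[K, List[A]])(merge(_, _)) // "reduce": merge them, starting from the empty grouping

  def main(args: Array[String]): Unit = {
    val words = List("apple", "avocado", "banana", "blueberry", "cherry")
    println(groupByMR(words)(_.head)) // groups the words by their first letter
  }
}
```

Unlike the foldLeft version, which prepends and so leaves each group in reverse order, this variant appends, so elements keep their original order within each group.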

pengyh · January 21, 2022, 2:05am

def groupBy[A, K](data: List[A])(key: A => K): Map[K, List[A]] =

What does this part of the function definition mean?
I know this kind of definition:

def xxx(str: String)(f: Int => Int)

but:

def xxx(str:String)(f: Int => Int): Map(…)

Where does the Map come from here? I am confused.

Thanks for explaining.

Regards.

SethTisue · January 21, 2022, 5:12am

Map[K, List[A]] is the return type of the method.
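Breaking that signature into its parts may make it easier to read. The second method reuses the xxx shape from the question with a made-up body, purely for illustration; the groupBy body here simply delegates to the standard library.

```scala
object SignatureAnatomy {
  // def groupBy   [A, K]    (data: List[A])    (key: A => K)    : Map[K, List[A]]
  //     name      type      1st parameter      2nd parameter      return type
  //               params    list               list
  def groupBy[A, K](data: List[A])(key: A => K): Map[K, List[A]] =
    data.groupBy(key) // body delegates to the standard library, for brevity

  // Hypothetical body for the `xxx` shape from the question: the same two
  // parameter lists, with the return type (Int) written after the final colon.
  def xxx(str: String)(f: Int => Int): Int = f(str.length)
}
```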

pengyh · January 21, 2022, 7:23am

Thank you Seth.

In Scala I declared a function like this:

scala> def strOps(s:String)(f:String=>String) = {
     |       if(s==null) s else f(s)
     |  }
def strOps(s: String)(f: String => String): String

scala> strOps("hello"){x => x.reverse}
val res0: String = olleh

I didn’t declare the return type. I am guessing the return type is inferred automatically by the language?

So when should we declare the return type when defining a function, and when can we leave it out?

Thanks again.

ndas1971 · January 21, 2022, 7:41am

Mostly no; Scala can infer the return type. But for recursive methods, and for methods that use an explicit “return …” inside the body, an explicit return type is required.
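A small sketch of both cases, plus one where inference is enough. The method names are made up for illustration.

```scala
object ExplicitResultTypes {
  // Recursive: the compiler cannot infer the result type of a method whose
  // body calls itself, so `: BigInt` must be written out.
  def factorial(n: Int): BigInt =
    if (n <= 1) BigInt(1) else BigInt(n) * factorial(n - 1)

  // Uses an explicit `return`, which also requires a declared result type.
  def safeDiv(a: Int, b: Int): Option[Int] = {
    if (b == 0) return None
    Some(a / b)
  }

  // Neither rule applies here, so the result type (String) is left to inference.
  def shout(s: String) = s.toUpperCase + "!"
}
```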

siddhartha-gadgil · January 21, 2022, 7:42am

The compiler does try to determine the return type. A few situations where one may want/need to specify the return type:

Just for safety/documentation: to ensure that your function returns the type you intend, and to tell a reader of the code what the return type is (tooling can fill in the inferred type for you here).
A recursive (or mutually recursive) definition - here type inference is generally impossible, and it is necessary to fill in the type.
If you want an implicit conversion to be applied to the value you return to get the type you want, e.g. def n: Double = 2.
Sometimes it is clearer to specify the type of the result and let the other types be inferred, e.g. def f: Int => Int = n => 2 * n may be more readable than def f = (n : Int) => 2 * n, while def f = n => 2 * n does not compile because the type of n cannot be inferred.
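A sketch of the last two points in code. The lambda parameter is renamed x here so it does not clash with the method n in the same object; the object name is made up.

```scala
object WhenToAnnotate {
  def m = 2          // no annotation: inferred as Int
  def n: Double = 2  // declared Double: the Int literal 2 becomes 2.0

  // The declared function type tells the compiler that x is an Int.
  def f: Int => Int = x => 2 * x
  // Same function, with the parameter type written on the lambda instead.
  def g = (x: Int) => 2 * x
  // def h = x => 2 * x   // rejected by the compiler: the type of x is unknown
}
```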

Siddhartha

BalmungSan · January 21, 2022, 1:08pm

My personal opinion: always.

It serves as documentation for your code.
It also helps the compiler: you are already relying on inference for most of the code inside your method body, so adding type annotations for the inputs and the output makes its job easier, which means faster compile times.
It also adds safety: if you make a mistake while changing or refactoring a function so that it no longer returns what it used to, you get one localized, clear error message at the definition instead of several at the call sites.
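A small illustration of that last point, with made-up method names. The code compiles as shown; the comment describes what would happen after a faulty edit.

```scala
object PinnedResultType {
  // The declared `: Int` is the contract callers rely on. If a later edit made
  // the body return something else (e.g. an Option[Int]), the compiler would
  // report a single mismatch here, at the definition, instead of errors at
  // every call site such as `nextPort` below.
  def parsePort(s: String): Int = s.trim.toInt

  def nextPort(s: String): Int = parsePort(s) + 1
}
```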