The file content:
$ cat latest |head -5
[email protected],20
[email protected],4
[email protected],1
[email protected],2
[email protected],1
I want to get the domain counts from this file.
So I wrote this:
import scala.io.Source
// `file` refers to the file shown above
val li = Source.fromFile(file).getLines().toList
li.map( _.split(",")(0).split("@")(1) ).groupBy(x=>x).map{ case(x,y) => (x,y.size) }.toList.sortBy(-_._2)
It works. The output:
val res17: List[(String, Int)] = List((gmail.com,5076), (redhat.com,172), (apache.org,166), (163.com,114), (hotmail.com,92), (gnu.org,88), (googlegroups.com,78), (freebsd.org,77), (qq.com,68), (google.com,62), (yahoo.com,61), (outlook.com,61), (intel.com,56),...
My questions are:
- Is there a better way to write this?
- Here groupBy(x=>x) works, but if I replace it with groupBy(_), it doesn't work. Why?
- Is there a native reduceByKey function in Scala (as in Spark)?
Thanks in advance.
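
For reference, here is a minimal sketch of the alternatives the questions above are asking about. It assumes Scala 2.13 or later, where `groupMapReduce` is available on standard collections and is probably the closest built-in counterpart to Spark's reduceByKey. The addresses in `sample` are hypothetical placeholders in the same "address,count" shape as the file, not real data. On the second question: as far as I understand, a lone `_` passed as the whole argument expands to `x => li.groupBy(x)` rather than to an identity function, which is why `groupBy(_)` fails; `groupBy(identity)` is the usual spelling of `groupBy(x => x)`.

```scala
// Hypothetical sample lines, shaped like the "address,count" rows in the file.
val sample = List(
  "alice@gmail.com,20",
  "bob@apache.org,4",
  "carol@gmail.com,1",
  "dave@gnu.org,2",
  "erin@gmail.com,1"
)

// Extract the domain part of each line, as in the original statement.
val domains = sample.map(_.split(",")(0).split("@")(1))

// Same approach as the original, with identity in place of x => x.
val viaGroupBy = domains.groupBy(identity).map { case (d, ds) => (d, ds.size) }

// Scala 2.13+: group by key, map each element to 1, and sum per key in one pass.
val viaGroupMapReduce = domains.groupMapReduce(identity)(_ => 1)(_ + _)

// Sort by descending count, as in the original.
val sorted = viaGroupMapReduce.toList.sortBy(-_._2)
```

Both versions count how many lines each domain appears on, like the original statement; neither sums the second column.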