Spark-Scala, RDD, counting the elements of an array by applying conditions

4Ben4 · May 18, 2022, 8:20am

val rows = sc.textFile("Data/info.csv")
val dataX = rows.map(_.split(";"))
val data = dataX.map(array => array(2).toInt)
    
val low = data.count(_ < 100)
val medium = data.count(x => x >= 101 && x <= 200) 
val high = data.count(_ > 200)

print(low, medium, high)

The variable data contains Array(20, 102, 50, 80, 140, 2036, 568); its elements are of type Int.
I got the code with the conditions and count from a discussion on Discord.

My expected output is (3,2,2), i.e. the count of numbers that fall in each range.

When I run the line val low = data.count(_ < 100)

I get an error:

 scala> val low = array.count(_ < 100)
<console>:23: error: ambiguous reference to overloaded definition,
both method array in object functions of type (colName: String, colNames: String*)org.apache.spark.sql.Column
and  method array in object functions of type (cols: org.apache.spark.sql.Column*)org.apache.spark.sql.Column
match expected type ?
       val low = array.count(_ < 100)
                 ^

SethTisue · May 18, 2022, 10:10am

Let’s continue on "Spark-Scala RDD, group by count from array of array" rather than using a new topic.
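
For readers landing on this topic: the error message names array, not data. The line that was actually run is array.count(_ < 100), and no value called array is in scope (it appears only as the lambda parameter in the earlier map line), so the compiler reports the overloaded array methods in org.apache.spark.sql.functions, as the message itself shows. Below is a minimal sketch of the range counting on a plain Scala collection, so it runs without Spark. The names RangeCounts, rangeCounts and sample are made up for illustration; the bounds are copied from the question.

```scala
object RangeCounts {
  // Counts the values that fall in the three ranges used in the question.
  // Note: with these bounds as written, the value 100 itself matches no range.
  def rangeCounts(xs: Seq[Int]): (Int, Int, Int) = {
    val low    = xs.count(_ < 100)
    val medium = xs.count(x => x >= 101 && x <= 200)
    val high   = xs.count(_ > 200)
    (low, medium, high)
  }

  def main(args: Array[String]): Unit = {
    // Sample values taken from the question.
    val sample = Seq(20, 102, 50, 80, 140, 2036, 568)
    println(rangeCounts(sample)) // prints (3,2,2)
  }
}
```

On an RDD[Int] such as data, count() takes no predicate, so the equivalent there would be data.filter(_ < 100).count(), which returns a Long.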