Hi, I am trying to find the hashtag whose tweet count increased the most between two months, where the months are read as input strings.
Problem description: Given two months x and y, where y > x, find the hashtag name that has increased the number of tweets the most from month x to month y. We have already written code in your code template that reads the x and y values from the keyboard. Ignore the tweets in the months between x and y, so just compare the number of tweets at month x and at month y. Report the hashtag name, the number of tweets in months x and y. Ignore any hashtag names that had no tweets in either month x or y. You can assume that the combination of hashtag and month is unique. Print the result to the terminal output using println. For the above small example data set the output should be the following:
Input x = 200910, y = 200912
Output hashtagName: mycoolwife, countX: 1, countY: 500
Data format (tab-separated):
TokenType Month Count HashtagName
hashtag 200910 2 Babylove
hashtag 200911 2 babylove
hashtag 200912 90 babylove
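To make the column layout concrete, here is a minimal sketch in plain Scala (no Spark needed) that splits one of the rows above on tabs. It assumes the fields really are tab-separated, as the `.tsv` extension suggests; the object and value names are made up for illustration.

```scala
object RowLayout {
  // One row from the sample above, with the columns joined by tabs (assumed separator).
  val line: String = "hashtag\t200912\t90\tbabylove"

  // split returns an Array[String]; arrays are indexed with parentheses, starting at 0.
  val fields: Array[String] = line.split("\t")

  val tokenType: String = fields(0)       // "hashtag"
  val month: String     = fields(1)       // "200912", formatted as YYYYMM
  val count: Int        = fields(2).toInt // 90
  val name: String      = fields(3)       // "babylove"
}
```

Under that assumption the month sits at index 1, the count at index 2, and the hashtag name at index 3.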
My attempt:
// Load the input data and split each line into an array of strings
val twitterLines = sc.textFile("hdfs:///user/ashhall1616/bdc_data/twitter-small.tsv")
val twitterdata = twitterLines.map(_.split("\t"))
// Each month is a string formatted as YYYYMM
val x = scala.io.StdIn.readLine("x month: ")
val y = scala.io.StdIn.readLine("y month: ")
val matchmonth= twitterdata.map(r => (r(0)== x ,r(0)==y, r(2), r(3))).sortBy(_._3, false)
if(matchmonth.(r => (r(0))) < matchmonth.(r => (r(1)))
{
val ht1 = matchmonth.map(r => (r(2), r(3))).take(1)
val ht2 = matchmonth.map(r => (r(2), r(3))).take(1,2)
println("[" + ht1 + "," + ht2 + "]")
}
Errors I am getting:
val matchmonth= twitterdata.map(r => (r(0)== x ,r(0)==y, r(2), r(3))).sortBy(_._3, false)
matchmonth: org.apache.spark.rdd.RDD[(Boolean, Boolean, String, String)] = MapPartitionsRDD[20] at sortBy at <console>:32
scala> if(matchmonth.(r => (r(0))) < matchmonth.(r => (r(1)))
<console>:1: error: identifier expected but '(' found.
if(matchmonth.(r => (r(0))) < matchmonth.(r => (r(1)))
^
scala> {
| val ht1 = matchmonth.map(r => (r(2), r(3))).take(1)
| val ht2 = matchmonth.map(r => (r(2), r(3))).take(1,2)
| println("[" + ht1 + "," + ht2 + "]")
| }
<console>:36: error: (Boolean, Boolean, String, String) does not take parameters
val ht1 = matchmonth.map(r => (r(2), r(3))).take(1)
^
<console>:37: error: (Boolean, Boolean, String, String) does not take parameters
val ht2 = matchmonth.map(r => (r(2), r(3))).take(1,2)
^
Can someone take a look at what is wrong here?
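A few things stand out in the errors: `matchmonth.(r => ...)` has no method name after the dot, the elements of `matchmonth` are tuples (whose fields are read as `_1`, `_2`, ... rather than `r(0)`), and `take` accepts a single count. Also, if the columns are token type, month, count and hashtag name in that order, the month would be in `r(1)`, not `r(0)`. One possible way to structure the computation is sketched below in plain Scala collections so it can run without a cluster. The object name, the function name and the `(month, count, hashtagName)` tuple shape are assumptions for illustration, not part of the assignment template.

```scala
object HashtagIncrease {
  // Each row is (month, count, hashtagName); this shape is an assumption.
  type Row = (String, Int, String)

  // Returns Some((hashtagName, countX, countY)) for the hashtag whose count grew
  // the most from month x to month y, or None if no hashtag has rows in both months.
  def mostIncreased(rows: Seq[Row], x: String, y: String): Option[(String, Int, Int)] = {
    // hashtagName -> count, for each of the two months of interest
    val countsX: Map[String, Int] = rows.collect { case (m, c, n) if m == x => n -> c }.toMap
    val countsY: Map[String, Int] = rows.collect { case (m, c, n) if m == y => n -> c }.toMap

    // Keep only hashtags that appear in both months.
    val inBoth = countsX.toSeq.collect {
      case (n, cx) if countsY.contains(n) => (n, cx, countsY(n))
    }

    // Pick the hashtag with the largest difference countY - countX.
    if (inBoth.isEmpty) None
    else Some(inBoth.maxBy { case (_, cx, cy) => cy - cx })
  }
}
```

Calling `HashtagIncrease.mostIncreased(rows, x, y)` gives the name and both counts, which can then be formatted with `println`. The name comparison is case-sensitive, so `Babylove` and `babylove` would count as different hashtags. On Spark, the same idea could probably be expressed by filtering the RDD once per month, keying each result by hashtag name, and combining the two with `join`.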