How to pass a variable as an expression to selectExpr

Hello, I am new to Scala and not sure whether this question belongs in this group.

I am using the code below to build a val/var that holds the expression for selectExpr.

For example, the val/var should pass this to selectExpr: "NPI", "stack(2, 'License Number 1', LicenseNumber1, 'License Number 2', LicenseNumber2) as (code,number)"

If I hard-code the expression that the val/var produces, it works. However, when I pass the val/var instead, it fails.

When I run the code below, I get the following error (I shortened the SQL because it was very long):

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input ','

== SQL ==
"NPI", s"stack(15,
-----^^^

Here is the code.

import org.apache.spark.sql.DataFrame

def getAllLicense(npiDF: DataFrame): DataFrame =
    {
        var srchExp = ""
        for (i <- 1 to 15)
        {
            srchExp = srchExp + s"'License Number $i',`Provider License Number_$i`,'Provider License Number State Code_$i ',`Provider License Number State Code_$i`,"
            if (i == 15) {
                val srchLen = srchExp.length
                srchExp = srchExp.substring(0, srchLen - 1)
            }
        }
        val v_npi = "\"" + "NPI" + "\"" + ", " + "s\"" + "stack(15, " + srchExp + " ) as (License,LicNumber,Code,State)" + "\""
        println(v_npi)
        val retDF = npiDF.selectExpr(v_npi)
            .where("LicNumber is not null")
        retDF
    }
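The stack(...) call is what turns the wide license columns into rows. As a rough illustration only, here is a pure-Scala model of how Spark SQL's stack(n, e1, ..., ek) lays out its arguments: the k values become n rows of ceil(k / n) columns, and a short last row is padded with null. StackModel and stack are hypothetical names; nothing here calls Spark. With 60 expressions and n = 15, the model gives the 4 columns named in "as (License,LicNumber,Code,State)".

```scala
// Hypothetical pure-Scala model of Spark SQL's stack(n, e1, ..., ek).
// It is NOT Spark code; it only shows how the arguments are split into rows.
object StackModel {
  def stack[A](n: Int, values: Seq[A]): List[Seq[Option[A]]] = {
    require(n > 0 && values.nonEmpty, "need n > 0 and at least one value")
    val width = (values.length + n - 1) / n // ceil(k / n) columns per row
    val cells: Seq[Option[A]] = values.map(Some(_))
    cells
      .padTo(n * width, None) // missing cells become None, like Spark's null
      .grouped(width)         // cut the flat list into n rows of `width` cells
      .toList
  }
}
```

For example, StackModel.stack(2, Seq("License Number 1", "L1", "License Number 2", "L2")) yields two rows of two cells each, mirroring the 2-row expression in the question.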

From a quick search for “dataframe selectexpr scala”, it seems that selectExpr needs to be called like this:

val retDF = npiDF.selectExpr("NPI", s"stack(15, $srchExp) as (License,LicNumber,Code,State)")

or, if you want to use a variable:

val v_npi = Seq("NPI", s"stack(15, $srchExp) as (License,LicNumber,Code,State)")

val retDF = npiDF.selectExpr(v_npi: _*)

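Notice that the method accepts multiple strings as separate parameters (it is declared with varargs, selectExpr(exprs: String*)), and each string is parsed as a single SQL expression. You cannot join the arguments into one comma-separated string: the parser stops at the comma after "NPI", which is exactly the mismatched input ',' error. Likewise, the s prefix is Scala string-interpolation syntax and must not appear inside the string handed to Spark.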
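To see the difference without Spark, here is a minimal pure-Scala sketch of varargs. It assumes only that selectExpr takes String* as described above; countArgs and VarargsDemo are hypothetical names, and countArgs just reports how many strings it received.

```scala
// Illustrates Scala varargs, the mechanism behind selectExpr(exprs: String*).
// countArgs is a hypothetical stand-in that only counts its arguments.
object VarargsDemo {
  def countArgs(exprs: String*): Int = exprs.length

  val exprs: Seq[String] = Seq("NPI", "stack(1, 'a', 1) as (label, value)")

  // `: _*` expands the Seq into separate arguments: the method sees 2 strings.
  val expanded: Int = countArgs(exprs: _*)

  // Joining first produces ONE string that merely contains a comma;
  // Spark would try to parse all of it as a single expression.
  val joined: Int = countArgs(exprs.mkString(", "))
}
```

Here VarargsDemo.expanded is 2 while VarargsDemo.joined is 1, which is the same distinction between npiDF.selectExpr(v_npi: _*) and passing one joined string.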

For the syntax, see here: Scala varargs syntax (and examples) | alvinalexander.com

The selectExpr documentation I found when searching is: Spark select() vs selectExpr() with Examples — SparkByExamples
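As a further sketch, assuming the column names match the question's (Provider License Number_i and Provider License Number State Code_i), the var-plus-substring loop can be replaced by map and mkString, which put commas only between rows, so there is no trailing comma to trim. LicenseExprs, row and selectArgs are hypothetical names, and this code only builds the strings; it does not call Spark.

```scala
// Hypothetical helper: builds the arguments for selectExpr as a Seq[String],
// one SQL expression per element, instead of one comma-joined string.
object LicenseExprs {
  // One stack "row" per license slot i: a label, the number column,
  // a state label and the state column (backticks quote names with spaces).
  private def row(i: Int): String =
    s"'License Number $i', `Provider License Number_$i`, " +
      s"'Provider License Number State Code_$i', `Provider License Number State Code_$i`"

  def selectArgs(n: Int): Seq[String] = {
    // mkString inserts ", " only between rows, so no trailing comma appears.
    val srchExp = (1 to n).map(row).mkString(", ")
    Seq("NPI", s"stack($n, $srchExp) as (License,LicNumber,Code,State)")
  }
}
```

The result would be passed the same way as in the answer: npiDF.selectExpr(LicenseExprs.selectArgs(15): _*).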

Thanks, that worked like a charm. This is what I get for playing darts with a list of popular programming languages: I ended up hitting Scala. Yes, I know it sounds idiotic to learn a programming language via darts, but it's been fun so far.


Trust me, the Spark DataFrame API is not very representative of the fun that can be had in Scala. Quite the opposite.


Agreed. Spark in general is a very specialized use case of Scala that is pretty unique. (Like, I’ve been doing Scala full-time for the past nine years, and I’ve barely touched Spark.) It has a lot of important uses, but it’s not the way I’d recommend learning the language unless you’re specifically trying to do Big Data…


My background is SQL (Oracle, SQL Server, etc.), and I have been doing that for years. I have also done reporting with different tools, plus ETL with SSIS and some Informatica. I am thinking Data Engineering is my next progression, and the programming languages for that are Scala, Java, Python and some others. I basically threw a dart and said that wherever it landed, that's what I would learn. I know that's not the most efficient way to pick a language, but I wanted to start somewhere.