How to pass a variable as an expression for selectExpr

Hello, I am new to Scala and not sure whether this question belongs in this group.

I am using the following code to create a val/var that holds the expression for selectExpr.

For example, the val/var will pass this to selectExpr: "NPI", "stack(2,'License Number 1', LicenseNumber1, 'License Number 2', LicenseNumber2) as (code,number)"

If I hard-code the expression that the val/var provides, it works fine. However, when I pass the val/var itself, it fails.

When I run the code below I get the following error. I shortened the SQL in the message because it was very long.

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input ','

== SQL ==
"NPI", s"stack(15,

Here is the code.

def getAllLicense(npiDF: DataFrame): DataFrame = {
        var srchExp = ""
        for (i <- 1 to 15) {
            srchExp = srchExp +
                s"'License Number $i',`Provider License Number_$i`," +
                s"'Provider License Number State Code_$i ',`Provider License Number State Code_$i`,"
            if (i == 15) {
                // drop the trailing comma after the last entry
                val srchLen = srchExp.length
                srchExp = srchExp.substring(0, srchLen - 1)
            }
        }

        val v_npi = "\"" + "NPI" + "\"" + ", " + "s\"" + "stack(15, " + srchExp +
                    " ) as (License,LicNumber,Code,State)" + "\""
        val retDF = npiDF.selectExpr(v_npi)
            .where("LicNumber is not null")
        retDF
}

From a quick search for “dataframe selectexpr scala”, it seems that selectExpr needs to be called like this:

val retDF = npiDF.selectExpr("NPI", s"stack(15, $srchExp) as (License,LicNumber,Code,State)")

or, if you want to use a variable:

val v_npi = Seq("NPI", s"stack(15, $srchExp) as (License,LicNumber,Code,State)")

val retDF = npiDF.selectExpr(v_npi: _*)

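Notice that the method accepts multiple strings as separate parameters; you cannot join the arguments into a single string separated by commas. When you do, Spark tries to parse the whole string (quotes, comma, and the s prefix included) as one SQL expression, which is why the parser reports a mismatched ','.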
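To make the difference concrete, here is a minimal sketch in plain Scala with no Spark dependency. Both helpers are my own assumptions for illustration, not part of the original code or of Spark: receivedArgs is a hypothetical stand-in for a varargs method shaped like selectExpr(exprs: String*), and stackExpr shows one possible way to build the expression with mkString, so that no trailing comma has to be trimmed. The column names are copied from the question.

```scala
object SelectExprArgsSketch {
  // Hypothetical stand-in for a varargs method such as selectExpr(exprs: String*).
  // It simply returns the arguments it received so they can be counted.
  def receivedArgs(exprs: String*): Seq[String] = exprs

  // Illustrative helper: builds the stack(...) expression for n license columns.
  // mkString places commas only between entries, so nothing needs trimming.
  def stackExpr(n: Int): String = {
    val entries = (1 to n).map { i =>
      s"'License Number $i', `Provider License Number_$i`, " +
        s"'Provider License Number State Code_$i', `Provider License Number State Code_$i`"
    }
    val body = entries.mkString(", ")
    s"stack($n, $body) as (License, LicNumber, Code, State)"
  }

  def main(args: Array[String]): Unit = {
    val exprs = Seq("NPI", stackExpr(2))

    // Expanded with `: _*`, the method receives two separate expressions.
    println(receivedArgs(exprs: _*).length) // prints 2

    // Joined into one string, the method receives a single argument that
    // contains a comma, which is the shape that failed in the question.
    println(receivedArgs(exprs.mkString(", ")).length) // prints 1
  }
}
```

With a real DataFrame, the same Seq would be passed as npiDF.selectExpr(exprs: _*), as in the snippet above.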

For the syntax, see here: Scala varargs syntax (and examples)

The selectExpr documentation I found when searching is: Spark select() vs selectExpr() with Examples — SparkByExamples

Thanks, that worked like a charm. This is what I get for playing darts with a list of popular programming languages. I ended up hitting Scala. Yes, I know it sounds idiotic to learn a programming language via darts, but it's been fun so far.

Trust me that the Spark Dataframe API is not very representative of the fun that can be had in Scala. Quite the opposite.


Agreed. Spark in general is a very specialized use case of Scala that is pretty unique. (Like, I’ve been doing Scala full-time for the past nine years, and I’ve barely touched Spark.) It has a lot of important uses, but it’s not the way I’d recommend learning the language unless you’re specifically trying to do Big Data…

My background is SQL (Oracle, SQL Server, etc.), and I have been doing that for years. I have also done reporting with different tools, as well as ETL with SSIS and some Informatica. I am thinking Data Engineering is my next progression, and the programming side of that means Scala, Java, Python and some others. I basically threw a dart and said that wherever it lands, that's what I will learn. I know that's not the most efficient way to pick a language, but I wanted to start somewhere.