Hello, I have a question regarding DataFrames and functions in Zeppelin (writing Scala on Spark; I hope I’m using the terms correctly). Let’s say I have a DataFrame on which I want to repeat the following process:
- Select a pair of columns
- Do some calculations using them (union, join, etc…).
- Return some statistics
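A minimal sketch of the three steps above as one Scala function, assuming Spark 3.x. The names `pairStats`, `allPairStats`, `PairStats`, the sample data, and the choice of "statistics" (a row count and a distinct-value count of the two stacked columns) are all hypothetical. In a Zeppelin `%spark` paragraph a `spark` session is typically already defined, so only the function definitions would be needed there.

```scala
// Sketch assuming Spark 3.x; object, function and case-class names are hypothetical.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, countDistinct}

object PairStatsDemo {
  // Hypothetical result type holding the statistics for one column pair.
  case class PairStats(col1: String, col2: String, rows: Long, distinctValues: Long)

  // Column names are plain String parameters; `select` looks them up
  // each time the function is called.
  def pairStats(df: DataFrame, col1name: String, col2name: String): PairStats = {
    val pair = df.select(col1name, col2name)
    // Stack both columns into a single "value" column (union matches by
    // position, so both sides need compatible types).
    val stacked = pair.select(col(col1name).as("value"))
      .union(pair.select(col(col2name).as("value")))
    // count(...) and countDistinct(...) both produce Long results.
    val row = stacked.agg(count(col("value")), countDistinct(col("value"))).head()
    PairStats(col1name, col2name, row.getLong(0), row.getLong(1))
  }

  // Repeat the process for every pair of columns in the DataFrame.
  def allPairStats(df: DataFrame): Seq[PairStats] =
    df.columns.combinations(2).map(p => pairStats(df, p(0), p(1))).toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("pair-stats").getOrCreate()
    import spark.implicits._
    val DF = Seq((1, 2, 3), (1, 4, 3), (2, 2, 5)).toDF("a", "b", "c")
    allPairStats(DF).foreach(println)
    spark.stop()
  }
}
```

Returning a small case class rather than a DataFrame keeps each pair's result easy to collect into a plain Scala collection; returning the aggregated DataFrame instead would keep everything lazy until an action runs.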
Obviously, the logical thing to do is to write a function and pass it the names of the columns
as strings (or the columns themselves, but I assume the former is easier). However, as I’m new to Scala (and Zeppelin), I wasn’t sure whether this is possible. I saw that Zeppelin has an option for user-defined functions; however, if I write something like this (assuming the DataFrame is named DF):
def myfunc(col1name: String, col2name: String) = {
  val myDf = DF.select(col1name, col2name)
  // rest of the code…
}
I assume this would only work if the calculation is done at run time; otherwise it will
give some error, since col1name and col2name are not known when I define the function. I’m sure
there is a way to do such calculations (maybe a symbolic function or something?), and I would appreciate your help and any recommendations on how to do it correctly. Thanks!
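On the run-time concern, a minimal sketch assuming Spark 3.x in local mode: a function with String parameters compiles without knowing any column names, because `select` resolves the names only when the function body runs, and an unknown name is rejected with an `AnalysisException` at that call. It also shows the Column-argument alternative (passing the columns themselves), which lets the caller pass expressions such as `col("a") * 10`. The names `RuntimeResolutionDemo`, `distinctByName` and `distinctByColumn` are hypothetical.

```scala
// Sketch assuming Spark 3.x; object and function names are hypothetical.
import org.apache.spark.sql.{AnalysisException, Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, countDistinct}

object RuntimeResolutionDemo {
  // Compiles even though no column names exist yet: the strings are only
  // turned into columns and looked up when this body executes.
  def distinctByName(df: DataFrame, col1name: String, col2name: String): Long =
    distinctByColumn(df, col(col1name), col(col2name))

  // Variant taking Column objects, so callers may pass expressions too.
  def distinctByColumn(df: DataFrame, c1: Column, c2: Column): Long = {
    val stacked = df.select(c1.as("value")).union(df.select(c2.as("value")))
    stacked.agg(countDistinct(col("value"))).head().getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("runtime-resolution").getOrCreate()
    import spark.implicits._
    val DF = Seq((1, 2), (1, 4), (2, 2)).toDF("a", "b")
    println(distinctByName(DF, "a", "b"))
    println(distinctByColumn(DF, col("a") * 10, col("b")))
    // A wrong name only fails when the function is called with it.
    try distinctByName(DF, "a", "nope")
    catch { case _: AnalysisException => println("unknown column rejected at call time") }
    spark.stop()
  }
}
```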