Hi all, I hope this is correct place to ask.
I’m very absolutely new to scala. I use existing code written by another developer to fetch data from postgres database into dataframe.
%scala
import org.postgresql.Driver
import org.apache.spark.sql.DataFrame
val url1= [secret credentials]
val driver = "org.postgresql.Driver"
var sql_query = """ Select 1 as t1, 2 as t2 """
val df = spark.read
.format("jdbc")
.option("url", url1)
.option("query", sql_query)
.load()
df.createOrReplaceTempView("df")
It works fine as is, now I need to create user-defined function where input parameter would be sql query text (String) and output would be dataframe. I know how to do this in Python with pandas, but pandas doesn’t support processing with multiple instances, so I need to do this in scala instead.
How do I start?
Maybe someone could generously show how to implement function mentioned above or at least help in this direction?..
Please advise.
upd: found answer
%scala
import org.postgresql.Driver
import org.apache.spark.sql.DataFrame
def fn_monolith_to_scala_df(query: String) = {
val url1= dbutils.secrets.get("monolith", "dw_user_pass_scala_url")
val driver = "org.postgresql.Driver"
spark.read
.format("jdbc")
.option("url", url1)
.option("query", query)
.load()
}