Hi everyone,
I am trying to compile a Scala script to submit a job through spark-submit. I am using sbt from the Windows command line to compile, and the directory structure is as sbt expects.
Here is my build file:
build.sbt
name := "TestQuery"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= {
  val sparkVer = "2.1.0"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources(),
    "org.apache.spark" %% "spark-hive" % sparkVer % "provided"
  )
}
My TestQuery.scala file is at ./test/src/main/scala/TestQuery.scala.
From the Windows cmd prompt, I change directory to ./test and run sbt. When I run the compile
command, sbt gives the following error:
[error] ./test/src/main/scala/TestQuery.scala:2:29:
object hive is not a member of package org.apache.spark.sql
sbt uses the Maven2 repository, and spark-hive exists there:
https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.11/1.2.0/
Also, the import works in spark-shell (spark-shell runs Spark 2.1.1 and Scala 2.11.8).
Why can't sbt find it?
That's normal.
You're adding a "provided" tag, which tells sbt not to package this dependency and to expect it on the classpath at runtime. Removing the "provided" tag should solve your issue.
Provided dependencies are on the compile classpath.
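To expand on that: in sbt, a dependency marked % "provided" is still resolved and placed on the Compile and Test classpaths, so `sbt compile` can see it; it is only excluded from the runtime/packaged classpath (which is why spark-submit expects the cluster to supply the Spark jars). A minimal sketch, using the versions from the question (the `dependencyClasspath` commands below are a way to inspect this from the sbt 1.x shell):

```scala
// build.sbt sketch: "provided" does NOT hide a dependency from compilation.
// spark-core is downloaded and available to `sbt compile` and `sbt test`,
// but omitted from the Runtime classpath and from assembly jars.
name := "TestQuery"
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"

// From the sbt shell, compare the classpaths:
//   show Compile / dependencyClasspath   // includes spark-core
//   show Runtime / dependencyClasspath   // excludes spark-core
```

So a "provided" Spark dependency cannot by itself cause a compile-time "object hive is not a member" error; the dependency supplying that package must be missing or misresolved.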
Hi, Spark dependencies should always be "provided", because they will already be on the Spark cluster when you run your job.
Your tests should work perfectly with "provided" libraries; see this example:
ThisBuild / name := "almaren-framework"
ThisBuild / organization := "com.github.music-of-the-ainur"
lazy val scala212 = "2.12.10"
lazy val scala211 = "2.11.12"
crossScalaVersions := Seq(scala211, scala212)
ThisBuild / scalaVersion := scala212
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
  "com.typesafe.scala-logging" %% "scala-logging" % "3.9.2",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion % "provided" excludeAll(ExclusionRule(organization = "net.jpountz.lz4")),
  "org.apache.spark" %% "spark-avro" % sparkVersion,
  "com.databricks" %% "spark-xml" % "0.6.0",
  "com.github.music-of-the-ainur" %% "quenya-dsl" % s"1.0.2-$sparkVersion",
(This file has been truncated.)
I think you are missing the "spark-sql" dependency in your build.
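For completeness, here is a sketch of the questioner's build.sbt with spark-sql added (versions are the ones from the question; whether this alone resolves the error depends on what the missing piece actually is in that environment):

```scala
// build.sbt sketch: same as the original, plus spark-sql,
// which supplies the org.apache.spark.sql package.
name := "TestQuery"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= {
  val sparkVer = "2.1.0"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources(),
    "org.apache.spark" %% "spark-sql"  % sparkVer % "provided", // added
    "org.apache.spark" %% "spark-hive" % sparkVer % "provided"
  )
}
```

If the error persists after this, running `sbt clean update compile` may help, since a stale Ivy cache can leave old resolutions in place.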