Hi everyone,
I am trying to compile a Scala script to submit a job through spark-submit. I am using sbt from the Windows command line to compile, and the directory structure is as sbt expects.
Here is my build file:
build.sbt
name := "TestQuery"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= {
  val sparkVer = "2.1.0"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources(),
    "org.apache.spark" %% "spark-hive" % sparkVer % "provided"
  )
}
My TestQuery.scala file is at ./test/src/main/scala/TestQuery.scala.
From the Windows cmd prompt, I change directory to ./test and run sbt. When I run the compile
command, sbt gives the following error:
[error] ./test/src/main/scala/TestQuery.scala:2:29:
object hive is not a member of package org.apache.spark.sql
sbt uses the Maven2 repository, and spark-hive exists there:
https://repo1.maven.org/maven2/org/apache/spark/spark-hive_2.11/1.2.0/
Also, the import works in spark-shell (spark-shell runs Spark 2.1.1 and Scala 2.11.8).
Why can't sbt find it?
That's normal.
You're adding a "provided" tag, which tells sbt not to package this dependency and to expect it on the classpath at runtime. Removing the "provided" tag should solve your issue.
Provided dependencies are on the compile classpath.
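To expand on that: in sbt, a dependency marked % "provided" is still resolved and placed on the Compile and Test classpaths, so `sbt compile` can see it; it is only excluded from the runtime/packaged classpath (which is why spark-submit expects the cluster to supply the Spark jars). A minimal sketch, using the versions from the question (the `dependencyClasspath` commands below are a way to inspect this from the sbt 1.x shell):

```scala
// build.sbt sketch: "provided" does NOT hide a dependency from compilation.
// spark-core is downloaded and available to `sbt compile` and `sbt test`,
// but omitted from the Runtime classpath and from assembly jars.
name := "TestQuery"
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"

// From the sbt shell, compare the classpaths:
//   show Compile / dependencyClasspath   // includes spark-core
//   show Runtime / dependencyClasspath   // excludes spark-core
```

So a "provided" Spark dependency cannot by itself cause a compile-time "object hive is not a member" error; the dependency supplying that package must be missing or misresolved.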
Hi, Spark dependencies should always be "provided", because they will already be on the Spark cluster when you run your job.
Your tests should work perfectly with "provided" libraries; see this example:
ThisBuild / name := "almaren-framework"
ThisBuild / organization := "com.github.music-of-the-ainur"
lazy val scala212 = "2.12.10"
lazy val scala211 = "2.11.12"
crossScalaVersions := Seq(scala211, scala212)
ThisBuild / scalaVersion := scala212
val sparkVersion = "2.4.0"
libraryDependencies ++= Seq(
  "com.typesafe.scala-logging" %% "scala-logging" % "3.9.2",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion % "provided" excludeAll(ExclusionRule(organization = "net.jpountz.lz4")),
  "org.apache.spark" %% "spark-avro" % sparkVersion,
  "com.databricks" %% "spark-xml" % "0.6.0",
  "com.github.music-of-the-ainur" %% "quenya-dsl" % s"1.0.2-$sparkVersion",
(This file has been truncated.)
I think you are missing the "spark-sql" dependency in your build.
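For completeness, here is a sketch of the questioner's build.sbt with spark-sql added (versions are the ones from the question; whether this alone resolves the error depends on what the missing piece actually is in that environment):

```scala
// build.sbt sketch: same as the original, plus spark-sql,
// which supplies the org.apache.spark.sql package.
name := "TestQuery"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= {
  val sparkVer = "2.1.0"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVer % "provided" withSources(),
    "org.apache.spark" %% "spark-sql"  % sparkVer % "provided", // added
    "org.apache.spark" %% "spark-hive" % sparkVer % "provided"
  )
}
```

If the error persists after this, running `sbt clean update compile` may help, since a stale Ivy cache can leave old resolutions in place.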