I need help again with sbt package :)

In the test environment we are using the latest Hadoop, 3.3.1, and I took the build.sbt dependency line from the Maven repository.

My build.sbt:

name := "hadoop writer"

version := "1.0"

scalaVersion := "2.13.8"

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "3.3.1" % Test

And the demo program:

$ cat src/main/scala/Myjob.scala 
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI
import scala.collection.immutable.Stream

object Myjob extends App {
// nothing here yet
}

Then when I run “sbt package” I get this error:

[error] /home/xxx/ops/scala/job1/src/main/scala/Myjob.scala:1:12: object apache is not a member of package org
[error] import org.apache.hadoop.conf.Configuration
[error]            ^
[error] /home/xxx/ops/scala/job1/src/main/scala/Myjob.scala:2:12: object apache is not a member of package org
[error] import org.apache.hadoop.fs.{FileSystem, Path}
[error]            ^
[error] two errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 2 s, completed Feb 18, 2022, 4:48:28 PM

I need your kind help again. Thanks in advance.

Update: I have now resolved the compile error by adding the dependencies below (the original hadoop-hdfs entry was scoped to Test, so nothing was on the compile classpath):

scalaVersion := "2.13.8"


libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.1"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.3.1"

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "3.3.1" % Test

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs-client" % "3.3.1"
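(For reference, if I read the Maven metadata right, hadoop-client 3.3.1 already depends on hadoop-common and hadoop-hdfs-client, so a minimal build.sbt sketch might need only one compile-scoped entry. Untested:)

name := "hadoop writer"

version := "1.0"

scalaVersion := "2.13.8"

// assumption: hadoop-client pulls in hadoop-common and hadoop-hdfs-client transitively
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.3.1"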

But now I get an error at run time:

java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at Myjob$.main(Myjob.scala:8)
	at Myjob.main(Myjob.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at scala.reflect.internal.util.RichClassLoader$.$anonfun$run$extension$1(ScalaClassLoader.scala:101)
	at scala.reflect.internal.util.RichClassLoader$.run$extension(ScalaClassLoader.scala:36)
	at scala.tools.nsc.CommonRunner.run(ObjectRunner.scala:30)
	at scala.tools.nsc.CommonRunner.run$(ObjectRunner.scala:28)
	at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
	at scala.tools.nsc.CommonRunner.runAndCatch(ObjectRunner.scala:37)
	at scala.tools.nsc.CommonRunner.runAndCatch$(ObjectRunner.scala:36)
	at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:70)
	at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:91)
	at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:103)
	at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:108)
	at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

Can you help? Thanks.

Update: I have added these environment variables:

export HADOOP_HOME=/opt/hadoop
export HADOOP_HDFS_HOME=/opt/hadoop/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=/opt/hadoop/share/hadoop/mapreduce
export HADOOP_COMMON_HOME=/opt/hadoop/share/hadoop/common
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

But I still get the runtime error above.
The compilation itself has no problem.

Any further idea?

I tried it with sbt 1.6.2 and Java OpenJDK 11 and it’s working for me. After making changes to build.sbt, you have to run “reload” and then “compile” again. Or maybe there is an issue with your Java version?
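That is, from the sbt shell, something like:

sbt:hadoop writer> reload
sbt:hadoop writer> compile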

I have tried sbt reload but still no luck. Please see the session below:

$ sbt reload
[info] welcome to sbt 1.6.1 (Ubuntu Java 11.0.13)
[info] loading project definition from /home/pyh/ops/scala/job1/project
[info] loading settings for project job1 from build.sbt ...
[info] set current project to hadoop writer (in build file:/home/pyh/ops/scala/job1/)

$ sbt package
[info] welcome to sbt 1.6.1 (Ubuntu Java 11.0.13)
[info] loading project definition from /home/pyh/ops/scala/job1/project
[info] loading settings for project job1 from build.sbt ...
[info] set current project to hadoop writer (in build file:/home/pyh/ops/scala/job1/)
[success] Total time: 1 s, completed Feb 18, 2022, 6:02:07 PM

$ cd target/scala-2.13/classes/
$ scala Myjob
java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at Myjob$.main(Myjob.scala:8)
	at Myjob.main(Myjob.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at scala.reflect.internal.util.RichClassLoader$.$anonfun$run$extension$1(ScalaClassLoader.scala:101)
	at scala.reflect.internal.util.RichClassLoader$.run$extension(ScalaClassLoader.scala:36)
	at scala.tools.nsc.CommonRunner.run(ObjectRunner.scala:30)
	at scala.tools.nsc.CommonRunner.run$(ObjectRunner.scala:28)
	at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
	at scala.tools.nsc.CommonRunner.runAndCatch(ObjectRunner.scala:37)
	at scala.tools.nsc.CommonRunner.runAndCatch$(ObjectRunner.scala:36)
	at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:70)
	at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:91)
	at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:103)
	at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:108)
	at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

$ scala -version
Scala code runner version 2.13.8 -- Copyright 2002-2021, LAMP/EPFL and Lightbend, Inc.

$ java -version
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.18.04)
OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.18.04, mixed mode, sharing)

So how do I fix it? :slight_smile:

That’s interesting. I’ve never used “sbt package” before, and I’ve never tried to run the class file with scala like that either. With the sbt:hadoop writer> prompt active, I can compile and run without problems.
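I suspect the classpath is the difference: inside the sbt shell, run puts all of your libraryDependencies on the classpath, while invoking the bare class file with scala does not. Roughly what I did:

sbt:hadoop writer> compile
sbt:hadoop writer> run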

I have tried the sequence of commands you used, and they work (I added hello world):

 ➜ sbt package
[info] welcome to sbt 1.6.2 (Ubuntu Java 11.0.13)
[info] loading global plugins from /home/spamegg/.sbt/1.0/plugins
[info] loading project definition from /home/spamegg/Public/scratch/myjob/project
[info] loading settings for project myjob from build.sbt ...
[info] set current project to hadoop writer (in build file:/home/spamegg/Public/scratch/myjob/)
[success] Total time: 1 s, completed Feb 18, 2022, 1:10:25 PM

➜ cd target/scala-2.13/classes/

➜ scala MyJob
hello world!

What is your

java --version

and your

echo $JAVA_HOME

What do you get from sbt compile?

$ java -version
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.18.04)
OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.18.04, mixed mode, sharing)

$ echo $JAVA_HOME
/usr

Both compile and package succeed.

Not sure if this changes anything, but your JAVA_HOME should probably be /usr/lib/jvm/java-11-openjdk-amd64. Also, $JAVA_HOME/bin should be on your PATH.
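On Ubuntu that would be something like this (the exact path is an assumption; check what your distribution installed):

# assumption: default Ubuntu install location for OpenJDK 11
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH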

Maybe that’s not the issue; my Hadoop setup requires this JAVA_HOME setting.

OK, I was able to run with the versions and settings I mentioned. I won’t be able to help more; good luck.


Update: I changed to Scala 2.12.15 with the build.sbt below:

name := "my hadoop client"

version := "1.2"

scalaVersion := "2.12.15"


libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.3.1"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.3.1"

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "3.3.1" % Test

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs-client" % "3.3.1"

And this source code:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.io.PrintWriter

/**
* @author ${user.name}
*/
object Myjob {

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://127.0.0.1:8020")
    val fs = FileSystem.get(conf)
    val output = fs.create(new Path("/tmp/test/mySample.txt"))
    val writer = new PrintWriter(output)
    try {
      writer.write("this is a test") 
      writer.write("\n")
    }
    finally {
      writer.close()
      println("Closed!")
    }
    println("Done!")
  }
}

The compilation is ok:

[success] Total time: 1 s, completed Feb 18, 2022, 8:01:38 PM

But when I run the class, I get another error:

$ scala Myjob
java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataOutputStream
	at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at Myjob.main(Myjob.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at scala.reflect.internal.util.RichClassLoader$.$anonfun$run$extension$1(ScalaClassLoader.scala:101)
	at scala.reflect.internal.util.RichClassLoader$.run$extension(ScalaClassLoader.scala:36)
	at scala.tools.nsc.CommonRunner.run(ObjectRunner.scala:29)
	at scala.tools.nsc.CommonRunner.run$(ObjectRunner.scala:27)
	at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:46)
	at scala.tools.nsc.CommonRunner.runAndCatch(ObjectRunner.scala:36)
	at scala.tools.nsc.CommonRunner.runAndCatch$(ObjectRunner.scala:35)
	at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:46)
	at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:73)
	at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:92)
	at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:103)
	at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:108)
	at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

I am really confused by this. Does anybody recognize this issue?
Thanks.

package only includes your own code; none of your dependencies are added to your JAR, so you need to add them to the classpath manually.
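For example, something along these lines (a sketch, not tested; the JAR name is a guess from your name/version settings, and hadoop classpath only works if the hadoop launcher script is on your PATH):

$ sbt package
# assumption: sbt normalizes the name "my hadoop client" to my-hadoop-client
$ scala -classpath "target/scala-2.12/my-hadoop-client_2.12-1.2.jar:$(hadoop classpath)" Myjob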

You may look into sbt-assembly and sbt-native-packager for alternative ways of producing an uber JAR or other kinds of self-contained executables.
But I have no idea whether that is a good idea with HDFS; AFAIK your cluster should already provide those JARs. My only experience with Hadoop-related projects is that the classpath is a nightmare and you will always get something wrong.
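If you try sbt-assembly, a minimal sketch could look like this (the plugin version is an assumption; use whatever is current, and expect to need a merge strategy because the Hadoop JARs ship lots of duplicate META-INF files):

// project/plugins.sbt  (version is an assumption, check the latest release)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

// build.sbt: crude merge strategy; discard duplicate metadata, keep the first copy of everything else
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}

Then sbt assembly should produce a self-contained JAR under target/scala-2.12/.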
