Writing Java in Scala

bblfish · July 17, 2020, 11:47am

I have to write some code that only uses Java libraries for testing purposes. I can’t remember the Java syntax that well anymore, so I would like to write it using Scala syntax. Is there a way to make sure nothing from the Scala standard library gets used? An sbt option perhaps? I can make sure not to use any in the imports, but it would be nice if the compiler did some extra verification for me.

There are also some constructs that I have to avoid. I think case classes (as they use tuples), enums, … (what have I forgotten?)

Jasper-M · July 17, 2020, 2:05pm

I think if you add -Yimports:java.lang to scalacOptions then nothing from the scala namespace will be imported by default. That should help considerably.
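For reference, a minimal build.sbt sketch of that setting. It assumes Scala 2.13, where -Yimports is available; the version number is only illustrative.

```scala
// build.sbt
scalaVersion := "2.13.3" // assumption: any 2.13.x release

// Replace the default root imports (java.lang, scala, scala.Predef) with java.lang only.
scalacOptions += "-Yimports:java.lang"
```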

Varargs use scala.Seq[String]. Lambdas and eta-expanded methods will compile to instances of scala.FunctionN, unless the expected type is a SAM type. By-name parameters are scala.Function0.
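A small sketch of those cases, in case it helps to see them side by side. The object and method names are made up for illustration, and the example itself happily uses the Scala library.

```scala
object FunctionShapes {
  // Expected type is a Java SAM interface: the lambda becomes a Runnable.
  val asRunnable: Runnable = () => System.out.println("run")

  // No SAM expected type: the lambda is a scala.Function1.
  val asFunction = (x: Int) => x + 1

  // Varargs parameter: inside the method, `names` is a Scala Seq[String].
  def join(names: String*): String = names.mkString(",")

  // By-name parameter: the argument is passed as a scala.Function0.
  def twice(body: => Int): Int = body + body
}
```

So to stay clear of the scala namespace you would give every lambda a Java functional interface as its expected type, and avoid declaring varargs or by-name parameters yourself.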

TheElectronWill · July 17, 2020, 2:33pm

You can prevent sbt from adding the scala library as a dependency by setting autoScalaLibrary := false (cf. sbt reference). Unfortunately the reference also reads:

In order to compile Scala sources, the Scala library needs to be on the classpath

There is a discussion about making a “scala kernel” independent of the standard library, but it’s not a priority (for now).
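A build.sbt sketch of one way this might be combined with a compile-only dependency. The Provided line is an untested assumption on my part, not something taken from the sbt reference: the idea is that the compiler still sees the library while it stays out of the runtime dependencies.

```scala
// build.sbt
// Do not add scala-library as an automatic dependency of the project.
autoScalaLibrary := false

// Assumption: keep the library visible to the compiler only, via the Provided scope.
libraryDependencies += "org.scala-lang" % "scala-library" % scalaVersion.value % Provided
```

This only controls what ends up on the classpath. It does not stop the compiled classes from referencing scala.* types.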

bblfish · July 17, 2020, 3:12pm

If I do that, then I no longer seem to have access to Char or Int, and I get messages like the following.

/Volumes/Dev/Programming/Scala3/cql-play/java/src/main/java/com/conexus/rdf/NTripleNodeParser.scala:56:21: not found: type Char
[error] private def hex(c: Char) = digit( c ) || ('A' <= c && c <= 'F') || ( 'a' <= c && c <= 'f')

BalmungSan · July 17, 2020, 3:15pm

Yeah you need to use fully qualified names like scala.Char or import those explicitly at the beginning.
The idea of using that or -Yno-imports is that you have to be explicit about everything you use and that way you do not end up using something from Scala by mistake.

bblfish · July 17, 2020, 3:17pm

Yes, if I add that flag then I can no longer compile.
Urgh. That looks like one of those things where one needs to be an sbt guru to work out how to get the compiler to work again. Is there a trick to get the code to compile? Do they mean that one has to add the classpath on running sbt?

Mhh, and there I was thinking it would be simple.
I would actually not mind if there were a very small jar dependency for things like Char, Int, and a few of the most useful constructs.

If this is possible in dotty, I can also switch to that.

bblfish · July 17, 2020, 3:32pm

Getting there. I added

import scala.{Char,Int,Array,Boolean}

and it compiles.
But now running the code in jjs (don’t ask me why I have to do that! Sigh!), I get

jjs> var slo = Java.type('com.conexus.rdf.NTripleNodeParser')
jjs> slo.parseLiteral('"Hello World"@en')
Exception in thread "main" java.lang.NoClassDefFoundError: scala/runtime/BoxesRunTime
	at com.conexus.rdf.NodeParser.parseLiteral(NTripleNodeParser.scala:142)
	at com.conexus.rdf.NodeParser.parseLit(NTripleNodeParser.scala:128)

Jasper-M · July 17, 2020, 7:01pm

Right. I’m afraid that avoiding calls to scala.runtime will be very hard, maybe even impossible. For starters you will have to avoid all automatic boxing, e.g. by using List[Integer] instead of List[Int] and explicitly creating the Integer objects where necessary (e.g. Integer.valueOf(42); new Integer(42) also works but is deprecated since Java 9).

Some array related stuff will probably also call into scala.runtime.
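A rough sketch of what explicit boxing could look like. I am using java.util.List here rather than a Scala List, and the object and method names are invented for the example.

```scala
import java.util.{ArrayList, List => JList}

object ExplicitBoxing {
  // The element type is java.lang.Integer, so values are boxed and unboxed by hand
  // instead of through compiler-inserted conversions.
  def squares(n: Int): JList[Integer] = {
    val out = new ArrayList[Integer]()
    var i = 1
    while (i <= n) {
      out.add(Integer.valueOf(i * i)) // explicit boxing
      i += 1
    }
    out
  }

  def sum(xs: JList[Integer]): Int = {
    var total = 0
    var i = 0
    while (i < xs.size()) {
      total += xs.get(i).intValue() // explicit unboxing
      i += 1
    }
    total
  }
}
```

With JList[Int] instead, the same add and get calls would compile, but the boxing would be inserted by the compiler for you.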

bblfish · July 18, 2020, 1:22pm

I think I got it in the end. It seems to work without exceptions when I run the resulting jar purely on the jvm. It’s just not easy to tell if I have dependencies on scala-lib or not.
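One crude way to check might be to scan the compiled class files for names in the scala/ namespace. The sketch below is a heuristic only: ScalaRefScanner is a hypothetical helper, it searches the raw bytes rather than parsing the constant pool, it needs Java 9+ for readAllBytes, and it uses the Scala library itself, so it should live outside the library-free code.

```scala
import java.nio.charset.StandardCharsets
import java.util.regex.Pattern
import java.util.zip.ZipFile
import scala.collection.mutable

object ScalaRefScanner {
  private val scalaName = Pattern.compile("scala/[A-Za-z0-9_$/]+")

  // Annotation types scalac attaches to its class files; assumed to be
  // metadata that is not needed at run time.
  private val ignored =
    Set("scala/reflect/ScalaSignature", "scala/reflect/ScalaLongSignature")

  /** Distinct scala/... names that occur in the raw bytes of one class file. */
  def scalaRefs(classBytes: Array[Byte]): Set[String] = {
    val text = new String(classBytes, StandardCharsets.ISO_8859_1)
    val m = scalaName.matcher(text)
    val found = mutable.Set.empty[String]
    while (m.find()) found += m.group()
    found.toSet -- ignored
  }

  /** Maps each .class entry of a jar to the scala/... names found in it. */
  def scanJar(path: String): Map[String, Set[String]] = {
    val zip = new ZipFile(path)
    try {
      val result = mutable.Map.empty[String, Set[String]]
      val entries = zip.entries()
      while (entries.hasMoreElements) {
        val e = entries.nextElement()
        if (e.getName.endsWith(".class")) {
          val in = zip.getInputStream(e)
          val bytes = try in.readAllBytes() finally in.close()
          val refs = scalaRefs(bytes)
          if (refs.nonEmpty) result(e.getName) = refs
        }
      }
      result.toMap
    } finally zip.close()
  }
}
```

An empty map from scanJar would suggest there are no references left, but since this is a plain text search it can produce both false positives and false negatives.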

bblfish · July 20, 2020, 7:15am

Here is the code for those interested.

package com.conexus.rdf

import java.io.{Reader, StringReader}
import java.text.ParseException
import java.util
import java.lang.StringBuilder
import java.net.URI
import java.util.Objects
import java.lang.String

import scala.{Any, Array, Boolean, Char, Int, Unit}
import com.conexus.rdf.NTripleNodeParser.Lang


/**
 * parse NTriple Nodes with no reliance on Scala libraries
 */
object NTripleNodeParser {
	type Lang = String

	def parseLiteral(nodeStr: String): Literal =
		NodeParser(nodeStr.trim()).parseLit()


	def main(args: Array[String]): Unit = {
		System.out.println(parseLiteral(args(0)))
	}

}

sealed trait Literal {
	def literal: String
}

class TypedLiteral(val lit: String, val tpe: URI) extends Literal {
	override def literal: String = lit

	override def hashCode(): Int = Objects.hash(lit,tpe)

	override def equals(obj: Any): Boolean =
		obj match {
			case tl: TypedLiteral => tl.lit == this.lit && tl.tpe.equals(this.tpe)
			case _ => false
		}
	// this can be improved by a case on each of the types
	override def toString: String = {
		import TypedLiteral._
		if (tpe == xsdString)  '"' + literal + '"'
		else if (tpe == xsdInt) literal
		else '"' + literal + '"' + "^^<" + tpe.toString + ">"
	}
}

object TypedLiteral {
	def xsd(tp: String) = new URI("http://www.w3.org/2001/XMLSchema#"+tp)
	lazy val xsdString = xsd("string")
	lazy val xsdInt = xsd("int")

	def apply(lit: String) = new TypedLiteral(lit,xsdString)
	def apply(lit: String, tpe: URI) =
		if (tpe == xsdString) new TypedLiteral(lit,xsdString)
		else new TypedLiteral(lit,tpe)
}

class LangLiteral(val lit: String, val lang: Lang) extends Literal {
	override def literal: String = lit
	override def equals(obj: Any): Boolean =
		obj match {
			case tl: LangLiteral => tl.lit == this.lit && tl.lang.equals(this.lang)
			case _ => false
		}

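	// Pass the objects themselves: Int arguments to the Object... varargs would be
	// boxed through scala.runtime.BoxesRunTime. The resulting hash is the same.
	override def hashCode(): Int = Objects.hash(lit, lang)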

	override def toString: String = '"'+lit+'"'+ '@'+lang
}

object LangLiteral{
	def apply(lit: String, lang: Lang ) = new LangLiteral(lit, lang)
}

object NodeParser {
	def apply(node: String): NodeParser = new NodeParser(new StringReader(node))

	private def digit(c: Char): Boolean = '0' <= c && c <= '9'
	private def whitespace(c: Char): Boolean = c == ' ' || c == '\t'
	private def alpha(c: Char): Boolean = ('A' <= c && c <= 'Z') ||  ('a' <= c && c <= 'z')
	private def hex(c: Char): Boolean = digit(c) || ('A' <= c && c <= 'F') ||  ( 'a' <= c && c <= 'f')
	private def alphaNum(c: Char): Boolean = alpha(c) ||  digit(c)
	private def pn_chars_base(c: Char): Boolean = alpha(c) ||
		('\u00C0' <= c && c <= '\u00D6') || ('\u00D8' <= c && c <='\u00F6') || ('\u00F8' <= c && c <= '\u02FF')  ||
		('\u0370' <= c && c <= '\u037D') || ('\u037F' <= c && c <= '\u1FFF') || ('\u200C' <= c && c <= '\u200D') ||
		('\u2070' <= c && c <= '\u218F') || ('\u2C00' <= c && c <= '\u2FEF') || ('\u3001' <= c && c <= '\uD7FF') ||
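		('\uF900' <= c && c <= '\uFDCF') || ('\uFDF0' <= c && c <= '\uFFFD') ||
		// [#x10000-#xEFFFF] lies outside the BMP and arrives as surrogate pairs,
		// so a single Char can only be checked approximately.
		Character.isSurrogate(c)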

	private def pn_chars_ranges(c: Char) = digit(c) || ('\u0300' <= c && c <= '\u036F') || ('\u203F' <= c && c <= '\u2040')
	private def not_IRI_char_range(c: Char) = ('\u0000' <= c && c <= '\u0020')

	def IRI_char(ci: Int) = {
		val c = ci.toChar
		"""<>"{}|^`\""".indexOf(c) == -1 && !not_IRI_char_range(c)
	}

	def pn_chars(ci: Int) = {
		val c = ci.toChar
		c == '-' || c == '\u00B7' || pn_chars_base(c) || pn_chars_ranges(c)
	}

	def pn_chars_dot(ci: Int) = {
		val c = ci.toChar
		c == '.' || pn_chars(c)
	}

	def pn_chars_u(ci: Int) = {
		val c = ci.toChar
		c == '_' || c == ':' || pn_chars_base(c)
	}

	def blank_node_label_first_char(ci: Int): Boolean = {
		val c = ci.toChar
		digit(c) || pn_chars_u(c)
	}

	def whitespace(ci: Int) = {
		val c = ci.toChar
		c == ' ' || c == '\t'
	}

	def whitespaceEOL(ci: Int) = {
		val c = ci.toChar
		c == ' ' || c == '\t' || c == '\n' || c == '\r'
	}
	

}

class NodeParser(private val rd: Reader) {

	import NodeParser._
	def PE(msg: String) = new ParseException(msg,0)
	def PE(c: Int, msg: String) = new ParseException(msg + ": '" + c.toChar +"'", 0)

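	// Pushback buffer for characters that were read but not consumed.
	// It stores java.lang.Integer and converts explicitly, so that the compiler
	// does not insert calls to scala.runtime.BoxesRunTime (as Queue[Int] would).
	protected
	final class Rewind {
		private val queue: util.Queue[Integer] = new util.LinkedList[Integer]()
		def isEmpty: Boolean = queue.isEmpty
		def add(c: Int): Boolean = queue.add(Integer.valueOf(c))
		def remove(): Int = queue.remove().intValue()
	}

	protected
	val rewind: Rewind = new Rewind

	protected
	def read(): Int = {
		if (rewind.isEmpty) rd.read()
		else rewind.remove()
	}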

	protected
	def newBuilder = new java.lang.StringBuilder()


	protected
	def appendChar(c: Int,buf: java.lang.StringBuilder) = buf.append(c.toChar)


	def parseLit(): Literal = {
		read() match {
			case '"' => parseLiteral()
				//todo add uris and other starting chars
			case _ => throw PE("literal does not start with a quote")
		}

	}


	// we enter this function after having consumed the first quotation character 
	protected
	def parseLiteral(): Literal = {

		val lexicalForm = parsePlainLiteral()

		read() match {
			case -1 => TypedLiteral(lexicalForm) //node matches can end early
			case '^' => TypedLiteral(lexicalForm, parseDataType())
			case '@' => LangLiteral(lexicalForm, parseLang())
			case x => {
				rewind.add(x) // this character can be used for later parsing
				TypedLiteral(lexicalForm)
			}
		}
	}

	protected final
	def parsePlainLiteral(litBuf: java.lang.StringBuilder = newBuilder): String =
		read() match {
			case -1   => throw PE("end of string Literal before end of quotation")
			case '"'  => litBuf.toString() //closing quote
			case '\\' => parsePlainLiteral(appendChar(parseQuotedChar(), litBuf))
			case illegal if ( illegal == 0x22 || illegal == 0x5c
				|| illegal == 0xA || illegal == 0xD) => {
				throw PE(illegal, "illegal character in literal")
			}
			case c    => {
				parsePlainLiteral(appendChar(c, litBuf))
			}
		}

	protected final
	def parseDataType(): java.net.URI = {
		read() match {
			case '^' => {
				val c = read()
				if ( c == '<')
					parseIRI()
				else throw PE(c, "data type literal must be followed by ^^<$uri>. ")
			}
			case -1 => throw PE("unexpected end of stream while waiting for dataType for URI")
			case c => throw PE(c, "expected ^^ followed by URI, found ^ ")
		}
	}

	private def parseLang(): Lang = {
		val buf = newBuilder

		def lang(): Lang = {
			read() match {
				case -1 =>  buf.toString
				case '-' => { appendChar('-',buf); subsequentParts() }
				case c if alpha(c.toChar) => { appendChar(c,buf); lang()}
				case other => { rewind.add(other); buf.toString() }
			}
		}

		def subsequentParts(): String = {
			read() match {
				case -1 => buf.toString
				case '-' => { appendChar('-',buf); subsequentParts() }
				case c if alphaNum(c.toChar) =>  { appendChar(c,buf); subsequentParts() }
				case other => { rewind.add(other); buf.toString() }
			}
		}
		lang()
	}

	protected
	def parseQuotedChar(): Char =
		read() match {
			case 't' => '\t'
			case 'b' => '\b'
			case 'n' => '\n'
			case 'r' => '\r'
			case 'f' => '\f'
			case '"' => '"'
			case '\'' => '\''
			case '\\' => '\\'
			case 'u' => parseShortHex()
			case 'U' => parseLongHex()
			case other => throw PE(other, "Illegal quoted char")
		}

	private def parseShortHex(): Char = hexVal(readN(4).toCharArray)

	// Note: hexVal returns a Char, so code points above U+FFFF are truncated to 16 bits.
	private def parseLongHex(): Char = hexVal(readN(8).toCharArray)

	private def parseIRIQuotedChar(): Char =
		read() match {
			case 'u' => parseShortHex()
			case 'U' => parseLongHex()
			case other => throw PE(other, "Illegal character after escape '\\' char")
		}

	/**
	 * The initial '<' has already been read
	 */
	private final
	def parseIRI(iribuf: java.lang.StringBuilder = newBuilder): URI = {
		read() match {
			case -1 => throw PE("unexpected end of stream reading URI starting with '" + iribuf.toString() + "'")
			case '>' => new URI(iribuf.toString())
			case '\\' => parseIRI(appendChar(parseIRIQuotedChar(),iribuf))
			case c if IRI_char(c) => parseIRI(appendChar(c,iribuf))
			case err => throw PE(err,"illegal character "+
				"in IRI starting with >" + iribuf.toString() + "< ")
		}
	}

	protected
	def hexVal(chars: Array[Char]): Char = {
		var position: Int = chars.length
		var result: Int = 0
		while (position > 0) {
			val importance = chars.length - position
			position = position-1
			result = result | (Character.digit(chars(position), 16) << 4 * importance)
		}
		result.toChar
	}

	protected final
	def readN(i: Int, buf: java.lang.StringBuilder = newBuilder): String = {
		if (i <= 0) buf.toString
		else read() match {
			case -1 => throw PE("reached end of stream while trying to readN chars")
			case c => readN(i - 1, appendChar(c,buf))
		}
	}

}