I found a usage of =:=, but I can't understand it

Aclliceon · December 16, 2024, 2:59pm

/**
   * Transforms each vertex attribute in the graph using the map function.
   *
   * @note The new graph has the same structure.  As a consequence the underlying index structures
   * can be reused.
   *
   * @param map the function from a vertex object to a new vertex value
   *
   * @tparam VD2 the new vertex data type
   *
   * @example We might use this operation to change the vertex values
   * from one type to another to initialize an algorithm.
   * {{{
   * val rawGraph: Graph[(), ()] = Graph.textFile("hdfs://file")
   * val root = 42
   * var bfsGraph = rawGraph.mapVertices[Int]((vid, data) => if (vid == root) 0 else Math.MaxValue)
   * }}}
   *
   */
  def mapVertices[VD2: ClassTag](map: (VertexId, VD) => VD2)
    (implicit eq: VD =:= VD2 = null): Graph[VD2, ED]

I’m new to Scala. The above code is from spark-graphx.

My question: since the default value of eq is set to null, why keep the parameter (implicit eq: VD =:= VD2 = null) at all?

Does this mean there is some way to enable the type check even when the default value of eq is null?

jducoeur · December 16, 2024, 3:19pm

That’s a pretty weird signature, honestly – I’ve never seen anything quite like it. (And using null is usually an anti-pattern in Scala, although it might be the least-bad approach here, depending on precisely how they’re using it.)

But it appears to be saying (based purely on the signature, knowing nothing about the semantics), “If and only if VD and VD2 are actually the same type, summon an equality comparison operator eq so I can check whether they are the same value; if they aren’t the same type, don’t bother.”

So basically, I would expect this code to behave differently depending on whether VD and VD2 are the same type.

sangamon · December 16, 2024, 4:20pm

eq == null is used as a runtime predicate for type preservation. Semantics (hopefully) are the same either way; this seems to be about runtime behavior optimization.

github.com/apache/spark/blob/576caec1da85c4451fe63e2a5923f2dbf136e278/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L126

          override def reverse: Graph[VD, ED] = {
            new GraphImpl(vertices.reverseRoutingTables(), replicatedVertexView.reverse())
          }
          
          override def mapVertices[VD2: ClassTag]
            (f: (VertexId, VD) => VD2)(implicit eq: VD =:= VD2 = null): Graph[VD2, ED] = {
            // The implicit parameter eq will be populated by the compiler if VD and VD2 are equal, and left
            // null if not
            if (eq != null) {
              vertices.cache()
              // The map preserves type, so we can use incremental replication
              val newVerts = vertices.mapVertexPartitions(_.map(f)).cache()
              val changedVerts = vertices.asInstanceOf[VertexRDD[VD2]].diff(newVerts)
              val newReplicatedVertexView = replicatedVertexView.asInstanceOf[ReplicatedVertexView[VD2, ED]]
                .updateVertices(changedVerts)
              new GraphImpl(newVerts, newReplicatedVertexView)
            } else {
              // The map does not preserve type, so we must re-replicate all vertices
              GraphImpl(vertices.mapVertexPartitions(_.map(f)), replicatedVertexView.edges)

Looks somewhat fishy, indeed, but off the top of my head I can’t think of a “cleaner” (and still concise) way to accomplish this check, so “least-bad” may just be it.

Jasper-M · December 16, 2024, 4:21pm

Indeed I suspect they use this pattern to be able to use an optimized path when the type of the graph stays the same (if eq != null) and fall back to an unoptimized version when the graph needs to be mapped to another type.
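
The pattern can be reproduced outside of Spark. What follows is a minimal sketch, not GraphX code: the EvidenceDemo object and the describe helper are hypothetical names invented for illustration, and only the =:= evidence with a null default is taken from the signature above. The compiler supplies the evidence when it can prove both types are equal and falls back to the default otherwise.

```scala
object EvidenceDemo {
  // Hypothetical helper (not part of GraphX): `ev` is filled in by the
  // compiler when A and B are the same type, and stays null otherwise.
  def describe[A, B](a: A, b: B)(implicit ev: A =:= B = null): String =
    if (ev != null) "same type" else "different types"

  def main(args: Array[String]): Unit = {
    println(describe(1, 2))     // prints "same type"
    println(describe(1, "two")) // prints "different types"
  }
}
```

Note that the check is resolved at compile time from the static types; the null test at runtime only reads the result of that decision.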

Aclliceon · December 17, 2024, 3:20am

Thanks for answering. I have another question: what should I pass to “eq”?
At ChatGPT’s suggestion, I tried the following:

import scala.reflect.ClassTag

object Test {
  def main(args: Array[String]): Unit = {
    val x = 1
    val y = "test"
    testFunc(x, y)(implicitly[=:=[Int, Int]]) // error
  }

  def testFunc[VD: ClassTag, VD2: ClassTag](x:VD, y:VD2)(implicit eq: (VD =:= VD2) = null): Unit = {
    println(x)
    println(y)
  }
}

The error is:

Found: Int =:= Int
Required: scala.reflect.ClassTag[Int]

jducoeur · December 17, 2024, 3:53am

I wouldn’t expect you to ever pass anything there. In general, you usually don’t specify implicit arguments explicitly – the compiler figures them out from context. In this case, it’s figuring out whether Int is the same as Int.

(And by specifying it explicitly, you’re confusing the compiler, which is also trying to synthesize a ClassTag there – the : ClassTag basically tells it to silently add another implicit parameter.)

So I would expect you to just call testFunc(x,y) there, without the extra parameter list.

Also – this is very advanced Scala you’re playing around with here, the sort of stuff I don’t teach my engineers until they’ve been working in the language for months, if ever; it’s rare for real application code to need to do this sort of stuff. It’s probably not a great way to start out.

sangamon · December 17, 2024, 10:05am

Just elaborating on @jducoeur’s answer: The #testFunc() signature is syntactic sugar for

def testFunc[VD, VD2](
  x: VD, y: VD2)(
  implicit vd: ClassTag[VD], vd2: ClassTag[VD2], eq: (VD =:= VD2) = null
): Unit

So you’d have to explicitly pass all implicit parameters, including the class tags:

testFunc(x, x)(implicitly[ClassTag[Int]], implicitly[ClassTag[Int]], implicitly[Int =:= Int])
testFunc(x, y)(implicitly[ClassTag[Int]], implicitly[ClassTag[String]], null)

…but the whole purpose of this construct is to have the compiler figure things out, so it should just be:

testFunc(x, x)
testFunc(x, y)
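
For reference, here is a small self-contained sketch of the compiler-driven call style. It reuses the testFunc signature from the question, but the CallSiteDemo object and the tuple return value are assumptions added only so the resolved implicits can be inspected; they are not part of the original code.

```scala
import scala.reflect.ClassTag

object CallSiteDemo {
  // Same parameter lists as the testFunc in the question; the return value
  // (both class tags plus a "types are equal" flag) is added for illustration.
  def testFunc[VD: ClassTag, VD2: ClassTag](x: VD, y: VD2)(
      implicit eq: (VD =:= VD2) = null
  ): (ClassTag[VD], ClassTag[VD2], Boolean) =
    (implicitly[ClassTag[VD]], implicitly[ClassTag[VD2]], eq != null)

  def main(args: Array[String]): Unit = {
    val x = 1
    val y = "test"
    // No second argument list: the compiler supplies the class tags and,
    // where it can prove the types equal, the =:= evidence.
    println(testFunc(x, x)._3) // true: Int =:= Int is available
    println(testFunc(x, y)._3) // false: no Int =:= String, so eq stays null
  }
}
```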

sangamon · December 17, 2024, 10:10am

This is the best I could quickly come up with. It’s Scala 3 (I’m not sure about a Scala 2 equivalent), and it probably breaks on some edge cases:

import scala.util.NotGiven

enum ParamCmp[X, Y]:
  case Equal[A, B]() extends ParamCmp[A, B]
  case NotEqual[A, B]() extends ParamCmp[A, B]

object ParamCmp:
  given[X, Y](using X =:= Y): ParamCmp[X, Y] = Equal[X, Y]()
  given[X, Y](using NotGiven[X =:= Y]): ParamCmp[X, Y] = NotEqual[X, Y]()

def condEq[X, Y, Z](fEq: => Z)(fNEq: => Z)(using cmp: ParamCmp[X, Y]): Z =
  import ParamCmp.*
  cmp match
    case Equal() => fEq
    case NotEqual() => fNEq

def dumpEq[X, Y](using cmp: ParamCmp[X, Y]): String =
  condEq[X, Y, String]("eq")("neq")

println(dumpEq[Int, Int])
println(dumpEq[String, Int])

sangamon · December 17, 2024, 10:17am

In this case the construct is exposed on the framework API. Then I guess you need to understand it at least sufficiently to know how to invoke it - and to understand that you don’t need to understand it.

Aclliceon · December 17, 2024, 10:43am

I see. That’s really cool.

jducoeur · December 17, 2024, 2:20pm

This raises an important point: there are, sort of, two “halves” to the Scala language – features that everybody needs all the time, and features that are primarily for writing libraries, less for application programming.

One of the very first things I teach folks is “Don’t try to learn all of Scala, at least not soon.” The language is chock-full of features that are really cool, and really powerful, but often a bit brain-breaking and often mostly irrelevant for application developers. They exist primarily to enable Scala’s exceptionally-powerful library ecosystem, where a lot of features that get hard-coded into many languages instead are reduced to their underlying abstractions here, so that libraries can implement them.

(We used to have an unofficial but helpful breakdown of which language features tended to be useful in which contexts, the so-called “Odersky levels”. Those are long since obsolete, but I sometimes think an updated version would be helpful.)

As @sangamon points out, though, this all gets fuzzy when you’re talking about API signatures. Many of these features are visible in library APIs, so it can be helpful to know what they mean, even if you generally won’t use them yourself.

To that end:

implicit ev: T1 =:= T2 means essentially “only compile if these two types are the same”. It’s typically used as a way to define a method that is only available under those circumstances. (I’d never seen the null trick before, after 16 years of working in Scala – it’s super-rare AFAIK.) The parameter itself is generally not used – it’s just evidence that the two types are the same, which is why it is often named ev.
T: ClassTag tells the compiler to take some of the information that it knows about at compile time, and reify it into runtime data. This is mainly needed as a way to work around “erasure”, which is the fact that at runtime we don’t know what type you actually have in a generic. (Intentionally: it’s a tradeoff, but this is one of the big differences between the JVM and .NET.) You basically never need to do anything about ClassTag at the call site – it’s just an instruction to the compiler.
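
To make the two points above concrete, here is a rough sketch of the more common uses. Box, shout, and emptyArray are hypothetical names made up for this example and do not come from any library. The first shows =:= restricting a method to one specific type parameter (here the evidence is also applied as a conversion, which is possible because =:= is a function from one type to the other); the second shows a ClassTag carrying enough runtime information to create an array of a generic element type.

```scala
import scala.reflect.ClassTag

object EvidenceAndTags {
  // Hypothetical wrapper: shout is only callable when A is String.
  final case class Box[A](value: A) {
    def shout(implicit ev: A =:= String): String =
      ev(value).toUpperCase // ev converts the A to a String
  }

  // Creating an Array[T] needs the element class at runtime,
  // which the ClassTag context bound provides.
  def emptyArray[T: ClassTag](n: Int): Array[T] = new Array[T](n)

  def main(args: Array[String]): Unit = {
    println(Box("hello").shout)      // prints "HELLO"
    // Box(42).shout                 // does not compile: no Int =:= String
    println(emptyArray[Int](3).length) // prints 3
  }
}
```

In both cases the call site passes nothing extra; the compiler either finds the evidence and the tag or rejects the call.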