Introducing Gallia: a library for data manipulation

Quick update:

  • Scala 2.13: I’ve migrated the codebase to Scala 2.13
    • I added a few comments on the GitHub commit to describe the experience
    • There is one hack left due to source incompatibilities with ArrayDeque[T] and SeqView[T, Seq[_]]
    • I think for the time being I’ll create a 2.12 branch to accommodate said hack (open to suggestions!)
    • Interestingly, the dbNSFP example now runs ~5 times faster; I’m not sure why yet
  • Biostars announcement: I made another announcement for Gallia on Biostars, tailored to bioinformatics concerns
    • In the process I added a new example: re-processing ClinVar’s VCF file
    • I also added an example input row and output object for the dbNSFP example, with the permission of the data owner
  • Upcoming example: The next example will showcase wide transformations to highlight how Spark RDDs can be leveraged in Gallia, when necessary
  • License: I’m leaning towards using MariaDB’s Business Source License (BSL), with Additional Use Grant terms along the lines of “free unless you can largely afford it”; also see CockroachDB’s interesting take on BSL
  • Contact: Some people have reached out to me directly, which is great, but don’t hesitate to provide input for others to see!