Announcing Compute.scala for scientific computing with N-dimensional arrays in parallel on GPU, CPU and other devices

Happy April Fool’s Day everyone,

We just released Compute.scala v0.3.1, a Scala library for scientific computing with N-dimensional arrays in parallel on GPU, CPU and other devices. It will be the primary back end of the upcoming DeepLearning.scala 3.0, addressing performance problems we encountered with ND4J in DeepLearning.scala 2.0.

  • Compute.scala can dynamically merge multiple operators into one kernel program, which runs significantly faster when performing complex computations.
  • Compute.scala manages data buffers and other native resources deterministically, consuming less memory.
  • All dimensional transformation operators (permute, broadcast, reshape, etc.) in Compute.scala are views, with no additional data buffer allocation.
  • N-dimensional arrays in Compute.scala can be converted from / to JVM collections, which support higher-order functions like map / reduce, and still run on the GPU.
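To illustrate what "views with no additional data buffer allocation" means, here is a plain-Scala analogy (not the Compute.scala API itself) using lazy collection views: a "reshape" of a flat buffer into 2 rows by 3 columns that copies nothing until the result is actually consumed.

```scala
// Plain-Scala analogy, not Compute.scala's API: a lazy "reshape" of a flat
// buffer into a 2x3 matrix. Like Compute.scala's view operators, it allocates
// no new data buffer; each element is looked up in the original buffer only
// when the view is forced.
val buffer = Vector.tabulate(6)(_.toDouble) // flat "data buffer": 0.0 .. 5.0

val reshaped = (0 until 2).view.map { row =>
  (0 until 3).view.map(col => buffer(row * 3 + col))
}

// Forcing the view materializes values on demand:
println(reshaped.map(_.toVector).toVector)
// Vector(Vector(0.0, 1.0, 2.0), Vector(3.0, 4.0, 5.0))
```

In Compute.scala, the same idea applies on the GPU: a dimensional transformation only changes how indices map into the underlying buffer.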

We ran benchmarks comparing Compute.scala and ND4J; for complex expressions or large arrays, Compute.scala is faster than ND4J’s cuBLAS backend (really not an April Fools’ Day prank).

Check the GitHub page of Compute.scala for details:

Note that the current version of this project is just a minimum viable product. Many important features are still under development.

Contributions are welcome.
Thank you!


This is so cool! It might be a good backend for my experimental type-safe deep learning library (which currently uses the C++ core of PyTorch as its backend).

The life cycle of data buffers might be a problem when using a C++ backend, as the JVM has neither native RAII nor a reference-counting garbage collector like Python’s.

The Torch C++ core has reference counting built in. When writing JVM bindings, following its retain and free calls would do.
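For example, here is a minimal sketch (not Torch’s actual binding API) of mirroring a native object’s retain/free protocol on the JVM; `NativeHandle` and `releaseNative` are hypothetical names for illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical wrapper: mirrors a native object's retain/free protocol.
// The native resource is released exactly when the count reaches zero.
final class NativeHandle(releaseNative: () => Unit) {
  private val refCount = new AtomicInteger(1) // starts owned by the creator

  def retain(): Unit = refCount.incrementAndGet()

  def release(): Unit =
    if (refCount.decrementAndGet() == 0) releaseNative()
}

// Usage: `freed` stands in for the native free call.
var freed = false
val handle = new NativeHandle(() => freed = true)
handle.retain()  // a second owner appears
handle.release() // first owner done; count is 1, nothing freed yet
handle.release() // last owner done; the native resource is freed
println(freed)   // true
```

The hard part, as noted below, is making sure every code path that stops using the handle actually calls release.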

I mean it might be hard to determine when to call free, if you don’t want to depend on finalizers or cleaners.
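One deterministic alternative to finalizers and cleaners is a loan pattern that scopes the native resource’s lifetime to a block, so free is called exactly once even when an exception is thrown. This is a sketch under that assumption, not any particular library’s API:

```scala
// Loan pattern: acquire a resource, lend it to `use`, and always free it,
// even if `use` throws. Lifetime is tied to the block, not to the GC.
def withResource[R, A](acquire: => R)(free: R => Unit)(use: R => A): A = {
  val resource = acquire
  try use(resource)
  finally free(resource)
}

// Usage: the Array stands in for a native buffer; `released` for its free call.
var released = false
val result = withResource(new Array[Float](4))(_ => released = true) { buf =>
  buf.indices.foreach(i => buf(i) = i.toFloat)
  buf.sum // 0 + 1 + 2 + 3
}
println(s"result=$result released=$released")
```

The trade-off is that the resource cannot escape the block, which is exactly what makes its lifetime determinate.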