Happy April Fools’ Day, everyone,
We just released Compute.scala v0.3.1, a Scala library for scientific computing with N-dimensional arrays, running in parallel on GPU, CPU and other devices. It will be the primary backend of the upcoming DeepLearning.scala 3.0, and is intended to address the performance problems we encountered in DeepLearning.scala 2.0 with ND4J.
- Compute.scala can dynamically merge multiple operators into one kernel program, which runs significantly faster when performing complex computation.
- Compute.scala manages data buffers and other native resources deterministically, so it consumes less memory.
- All dimensional transformation operators (`permute`, `broadcast`, `reshape`, etc.) in Compute.scala are views, with no additional data buffer allocation.
- N-dimensional arrays in Compute.scala can be converted to and from JVM collections, which support higher-order functions like `map` / `reduce`, and can still run on GPU.
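
To make the first and third points more concrete, here is a small conceptual sketch in plain Scala. It is not Compute.scala's actual API or implementation; the names `FusionSketch`, `Expr`, `Buffer`, `evaluate` and `View2D` are hypothetical and exist only for illustration. It shows the general idea of building an expression lazily and evaluating it in a single pass (instead of one pass and one temporary array per operator), and of a transpose that only swaps strides over the same buffer.

```scala
// Conceptual sketch only: plain Scala on the CPU, not Compute.scala's real API.
object FusionSketch {

  // A lazy element-wise expression. Building it allocates no result buffers.
  sealed trait Expr {
    def length: Int
    def +(that: Expr): Expr = Plus(this, that)
    def *(that: Expr): Expr = Times(this, that)
  }

  final case class Buffer(data: Array[Float]) extends Expr {
    def length: Int = data.length
  }

  final case class Plus(left: Expr, right: Expr) extends Expr {
    require(left.length == right.length, "operand lengths must match")
    def length: Int = left.length
  }

  final case class Times(left: Expr, right: Expr) extends Expr {
    require(left.length == right.length, "operand lengths must match")
    def length: Int = left.length
  }

  // Merge the whole expression tree into one index => value function,
  // a stand-in for generating a single kernel program.
  private def compile(e: Expr): Int => Float = e match {
    case Buffer(data) =>
      (i: Int) => data(i)
    case Plus(l, r) =>
      val f = compile(l)
      val g = compile(r)
      (i: Int) => f(i) + g(i)
    case Times(l, r) =>
      val f = compile(l)
      val g = compile(r)
      (i: Int) => f(i) * g(i)
  }

  // One loop and one output allocation, however many operators were chained.
  def evaluate(e: Expr): Array[Float] = {
    val kernel = compile(e)
    Array.tabulate(e.length)(kernel)
  }

  // A 2-D view over a flat buffer. Transposing swaps the shape and the
  // strides; the underlying array is shared, not copied.
  final case class View2D(data: Array[Float], rows: Int, cols: Int,
                          rowStride: Int, colStride: Int) {
    def apply(r: Int, c: Int): Float = data(r * rowStride + c * colStride)
    def transpose: View2D = View2D(data, cols, rows, colStride, rowStride)
  }
}
```

With this sketch, `evaluate((a + b) * a)` for two `Buffer` values walks the data once and allocates only the result array, and `View2D(data, 2, 3, 3, 1).transpose` reads the same `data` array with swapped strides. A real GPU library would generate device code rather than JVM closures, but the shape of the idea is the same.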
We benchmarked Compute.scala against ND4J. The results show that for complex expressions or large arrays, Compute.scala is faster than ND4J’s cuBLAS backend (really, this is not an April Fools’ Day prank).
Check the GitHub page of Compute.scala for details:
Note that the current version of this project is just a minimum viable product. Many important features are still under development.
Contributions are welcome.
Thank you!