First release of jsm4s - an obscure but powerful ML algorithm


#1

Part of my Ph.D. thesis involves an interesting but not well-known ML algorithm: the JSM method.
The main problem with JSM has been that there is not a single implementation with a good out-of-the-box experience. That is, until now.

This is my first serious project in Scala, so any comments are warmly welcome:

How well does it perform?
I did a test run on the Mushroom dataset with an 80% training and 20% validation split, and got 100% accuracy quite consistently.
It would be interesting to see how it works on larger datasets, but keep in mind that it’s quite slow at the moment.
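
For anyone who wants to repeat that kind of run, here is a minimal sketch of an 80/20 split plus an accuracy check. It is not taken from jsm4s: `SplitSketch`, `split`, `accuracy` and the `predict` function are hypothetical names used only for illustration, and the classifier itself is left as a plain function argument.

```scala
import scala.util.Random

object SplitSketch {
  // Shuffle with a fixed seed, then cut at the requested training fraction.
  def split[A](rows: Vector[A], trainFraction: Double, seed: Long): (Vector[A], Vector[A]) = {
    val shuffled = new Random(seed).shuffle(rows)
    val cut = math.round(rows.size * trainFraction).toInt
    shuffled.splitAt(cut)
  }

  // Fraction of validation rows whose predicted label equals the true label.
  // `predict` stands in for whatever classifier is being evaluated.
  def accuracy[A, L](validation: Vector[(A, L)], predict: A => L): Double =
    if (validation.isEmpty) 0.0
    else validation.count { case (x, y) => predict(x) == y }.toDouble / validation.size
}
```

Changing the seed between runs is one way to check that a result holds up across different random splits.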

There are also plenty of tuning knobs that I haven’t exposed yet.


#2

Wow, that is an obscure one! I really hope it ends up being practical. Though I’m not terribly familiar with JSM, that style of learning (inductive/falsified) seems in principle well suited to a variety of problems that aren’t well addressed by other techniques (some Bayesian methods come close, I guess).


#3

As far as practicality goes, it has 3 problems that I’m working on with varying degrees of success.

The 1st is speed: most implementations I’ve seen are horribly slow, and the algorithm itself is prone to exponential explosion. My idea here is to use a clustered approach where the dataset is split by Hamming distance and each cluster is processed in parallel.
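
A rough sketch of what splitting by Hamming distance and processing clusters in parallel could look like. The post does not say how the split is done, so the greedy "leader" clustering below is purely an assumption, and `HammingClusters`, `cluster`, `radius` and `processInParallel` are hypothetical names, not part of jsm4s. Rows are assumed to be equal-length boolean vectors.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object HammingClusters {
  type Row = Vector[Boolean]

  // Number of positions at which two equal-length rows differ.
  def hamming(a: Row, b: Row): Int = {
    require(a.length == b.length, "rows must have the same length")
    a.zip(b).count { case (x, y) => x != y }
  }

  // Greedy "leader" clustering (an assumption, not necessarily what jsm4s does):
  // a row joins the first cluster whose leader is within `radius`,
  // otherwise it starts a new cluster and becomes its leader.
  def cluster(rows: Seq[Row], radius: Int): Vector[Vector[Row]] =
    rows.foldLeft(Vector.empty[Vector[Row]]) { (clusters, row) =>
      clusters.indexWhere(c => hamming(c.head, row) <= radius) match {
        case -1 => clusters :+ Vector(row)
        case i  => clusters.updated(i, clusters(i) :+ row)
      }
    }

  // Run `process` on every cluster concurrently; results keep cluster order.
  def processInParallel[R](clusters: Vector[Vector[Row]])(process: Vector[Row] => R): Vector[R] =
    Await.result(Future.traverse(clusters)(c => Future(process(c))), Duration.Inf)
}
```

In this sketch `process` only ever sees the rows of one cluster, so anything that depends on rows from two different clusters is not considered. That is the trade-off being made for speed.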

The 2nd is general obscurity: despite the fact that my institute has used it for decades across numerous research projects, there are 0 reusable libraries, even internally. jsm4s aims to fill that void.

The 3rd is adapting datasets to the strict boolean logic of JSM. I plan a short blog post that will touch on this topic; in short, it’s rather involved but quite doable.
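
To give a flavour of the simplest case, here is a sketch that turns purely categorical rows into boolean attributes, one per (column, value) pair. This is ordinary one-hot encoding, shown as an assumption about what such an adaptation might involve rather than as the approach jsm4s takes; `Binarize`, `vocabulary` and `encode` are hypothetical names.

```scala
object Binarize {
  // For each column, list its distinct values in sorted order; every
  // (column, value) pair becomes one boolean attribute.
  def vocabulary(rows: Seq[Vector[String]]): Vector[Vector[String]] =
    if (rows.isEmpty) Vector.empty
    else rows.head.indices.map(i => rows.map(_(i)).distinct.sorted.toVector).toVector

  // Encode one categorical row as a flat boolean vector (one-hot per column).
  def encode(vocab: Vector[Vector[String]], row: Vector[String]): Vector[Boolean] =
    vocab.zip(row).flatMap { case (values, v) => values.map(_ == v) }
}
```

With columns "colour" and "shape" whose vocabularies are `(blue, red)` and `(flat, round)`, the row `(red, flat)` becomes `(false, true, true, false)`. Numeric attributes are not covered here; they would first have to be bucketed into ranges.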