I have converted most of our ETL scripts to Scala

I counted them today: 22 scripts have been converted to Scala.
Some still can't be converted because of:

  1. missing libraries, such as the data handler for Ruby on Rails.
  2. dependency issues, such as with the Hadoop packages.
  3. poor regex performance compared to Perl. For one of the ETL jobs, Perl finishes in under 10 minutes, but Scala needs more than 30 minutes. See this write-up for reference.
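Without seeing the job it's hard to say where the gap comes from, but one common cause in regex-heavy JVM code (an assumption here, not a diagnosis of the original script) is recompiling the pattern on every line, e.g. via `String.matches` or `String.replaceAll`, both of which compile a fresh `java.util.regex.Pattern` per call. A minimal sketch of hoisting a compiled pattern out of the per-line path; the log-line pattern and the `RegexHoisting` name are made up for illustration:

```scala
import scala.util.matching.Regex

object RegexHoisting {
  // Hypothetical log format: "<date> <level> <message>".
  // Compiled once when the object is initialised, then reused for every line.
  private val LogLine: Regex = """^(\d{4}-\d{2}-\d{2}) (\w+) (.*)$""".r

  // Slower style: String.matches compiles a new Pattern on every call.
  def isLogLineSlow(line: String): Boolean =
    line.matches("""^(\d{4}-\d{2}-\d{2}) (\w+) (.*)$""")

  // Faster style: reuse the precompiled Regex and extract the groups directly.
  def parse(line: String): Option[(String, String, String)] = line match {
    case LogLine(date, level, msg) => Some((date, level, msg))
    case _                         => None
  }
}
```

In a tight loop over millions of lines, the second style avoids one `Pattern.compile` per line; whether that closes the gap for this particular job would need profiling.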

Scala's collections are very powerful, and the higher-order functions on them are really great. I like them. Thanks, Scala world!
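For anyone reading along, here is a small example of the kind of higher-order collection pipeline that makes ETL code pleasant in Scala. The `Row` type and its fields are hypothetical data, not taken from the converted scripts:

```scala
object CollectionsDemo {
  // Hypothetical ETL record; field names are illustrative only.
  final case class Row(user: String, bytes: Long, status: Int)

  // Total bytes per user for successful (HTTP 200) rows, largest total first.
  def bytesPerUser(rows: Seq[Row]): Seq[(String, Long)] =
    rows
      .filter(_.status == 200)                                   // keep successful rows
      .groupBy(_.user)                                           // Map[String, Seq[Row]]
      .map { case (user, rs) => user -> rs.map(_.bytes).sum }    // sum bytes per user
      .toSeq
      .sortBy { case (_, total) => -total }                      // descending by total
}
```

Each step is a plain function over an immutable collection, which is what makes these pipelines easy to read and test compared to hand-written loops.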


I have an idea: Scala's streaming IO APIs are well suited to cloud file systems such as object storage, and the Akka framework is well suited to distributed apps. So maybe we can combine them to build an open-source object storage system. Some well-known ones are listed here.
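A minimal sketch of the streaming half of that idea, assuming the store accepts uploads as fixed-size parts (as multipart uploads in many object stores do). It reads an `InputStream` lazily so only one part is held in memory at a time. `Chunking`, `chunks` and `partSize` are hypothetical names, and `readNBytes` requires Java 11+. A real system might build this on Akka Streams instead to get back-pressure:

```scala
import java.io.InputStream

object Chunking {
  // Lazily splits a stream into parts of at most `partSize` bytes.
  // readNBytes blocks until partSize bytes are read or EOF is reached,
  // and returns an empty array at EOF, which ends the iterator.
  def chunks(in: InputStream, partSize: Int): Iterator[Array[Byte]] = {
    require(partSize > 0, "partSize must be positive")
    Iterator
      .continually(in.readNBytes(partSize))
      .takeWhile(_.nonEmpty)
  }
}
```

Each yielded array could then be handed to an upload step (hypothetical here) before the next part is read.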

I know the Raft consensus protocol well. If anyone is interested in making this a project, I can contribute my part of the work. :slight_smile:
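Since Raft came up, the core of its leader election is the RequestVote rule, sketched below as a pure function. This is an illustrative model only, not a complete Raft node (no persistence, timers, RPC transport, or leader/follower role changes), and the type and field names are assumptions:

```scala
object RaftVote {
  // Hypothetical, minimal view of a node's persistent state.
  final case class NodeState(currentTerm: Long, votedFor: Option[String], lastLogTerm: Long, lastLogIndex: Long)
  final case class RequestVote(term: Long, candidateId: String, lastLogTerm: Long, lastLogIndex: Long)

  // Returns (voteGranted, updatedState).
  def handle(s: NodeState, req: RequestVote): (Boolean, NodeState) =
    if (req.term < s.currentTerm) {
      (false, s) // candidate is from a stale term
    } else {
      // Seeing a newer term: adopt it and forget any vote cast in the old term.
      val st =
        if (req.term > s.currentTerm) s.copy(currentTerm = req.term, votedFor = None)
        else s
      // Candidate's log must be at least as up-to-date as ours.
      val logOk =
        req.lastLogTerm > st.lastLogTerm ||
          (req.lastLogTerm == st.lastLogTerm && req.lastLogIndex >= st.lastLogIndex)
      // At most one vote per term (re-granting to the same candidate is fine).
      val canVote = st.votedFor.forall(_ == req.candidateId)
      if (logOk && canVote) (true, st.copy(votedFor = Some(req.candidateId)))
      else (false, st)
    }
}
```

"Up-to-date" follows the usual Raft rule: a higher last log term wins; with equal terms, the longer log wins.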


Cool! Glad that (mostly) worked out!

Note that Scala doesn’t have its own regex implementation; it uses the one provided by the JVM. So if you’re looking to see whether anyone else has done regex engine performance comparisons, you don’t need to limit your search to Scala-specific sources.
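You can see this directly: `scala.util.matching.Regex` is a wrapper whose `pattern` member is the compiled `java.util.regex.Pattern`. A small sketch (the date pattern and `JvmRegex` name are just examples):

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object JvmRegex {
  // `.r` builds a scala.util.matching.Regex from the string...
  val dateRegex: Regex = """\d{4}-\d{2}-\d{2}""".r

  // ...which wraps an ordinary compiled java.util.regex.Pattern.
  val underlying: Pattern = dateRegex.pattern

  // Matching methods delegate to the JVM's regex engine.
  def firstDate(s: String): Option[String] = dateRegex.findFirstIn(s)
}
```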


Nice! Out of curiosity: are any of them publicly viewable?

They're business-specific, so it makes no sense to publish them, IMO.

BTW, these days I'm looking into writing Logstash filters in Scala.
Logstash is written in JRuby, and JRuby and Scala both run on the JVM.
So maybe I can migrate the Scala scripts to integrate with Logstash as well.
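One way to keep that migration simple, assuming the same logic is shared between the ETL scripts and a Logstash plugin, is to expose it through a function that uses only `java.util` types, so JRuby or a Java-based plugin can call it without touching Scala collections. Logstash's own event API is deliberately not used here; `EventEnrich` and the `status`/`status_class` fields are hypothetical:

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

object EventEnrich {
  // Hypothetical enrichment step: derive "status_class" (e.g. "4xx") from a
  // numeric "status" field. Returns a new map; the input is not modified.
  def enrich(event: JMap[String, AnyRef]): JMap[String, AnyRef] = {
    val out = new JHashMap[String, AnyRef](event)
    event.get("status") match {
      case n: java.lang.Number => out.put("status_class", s"${n.intValue() / 100}xx")
      case _                   => // no numeric status (or null): leave the event unchanged
    }
    out
  }
}
```

Keeping the Scala side free of Scala-specific types in its public signature means the glue code on the Logstash side can stay thin.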