Trembita - library for modeling complex data transformation pipelines


#1

Happy to announce version 0.7.2-SNAPSHOT of Trembita: https://github.com/vitaliihonta/trembita

Trembita is a project that lets you model complex data transformation pipelines that run locally, in parallel, on Akka Streams, or on a Spark cluster, in both streaming and non-streaming modes. It also provides a typesafe DSL for stateful transformations (FSM style, as in Akka FSM actors) and a typesafe QL (like frameless does, but not only for Spark).

It is implemented in a purely functional way (in most cases) using Cats (Effect), Shapeless, and Spire.
I’ve already integrated it with Akka Streams and Apache Spark (RDD, SQL, Streaming), and implemented caching (locally and using Infinispan).
The goal of this project is to be the glue for the technology zoo you see in business projects.
It should also provide an easy-to-use, typesafe API.

The project contains many examples for different cases, scripts for running pipelines on a Spark cluster using Docker, etc.
I’ve already integrated Trembita into a project for Israel Aerospace Industries (real-time plane data analysis) as a test drive =)
Contributions are welcome!


#2

Why not announce this on the Akka Streams discussion as well?


#3

Why not =)
Could you please share the link to the Akka Streams discussion?


#4

Here’s a link to a post that shows both the discussion URL and the typical keyword tagging that posters use to mark a topic as Akka Streams related:

https://discuss.lightbend.com/t/transform-a-csv-file-into-multiple-csv-files-using-akka-stream/3142