Use case of immutable List

As a Java developer with 14 years of experience, I have not found any reason why I should use an immutable list/data structure instead of a mutable one. Please give some examples where the immutable one is preferred from a design or performance point of view.
Immutable list access and append are done in O(n), but List in Java has O(1) append and access.
I see some expressions like:

var l = List(1, 2, 3)
l = l.someMethod

So why should we create a new list and assign it to the previous one? What's the matter with the mutable one?
I know there are mutable counterparts for every immutable one, but I don't know the philosophy of immutability.

Before going into the reasons to favor immutability, it’s worth pointing out that Scala’s List is a concrete type, implemented as a linked list, while Java’s List is an abstract type with various implementations, so you can’t really compare them directly. For instance, a Java list may very well be a java.util.LinkedList, which is implemented as a doubly-linked list and has O(n) random access, not O(1).

The equivalent of a Java List in Scala is a Seq; Vector is the usual array-based (rather than linked-list based) implementation of an immutable Seq, useful if you need, say, fast random access. You can check out the performance characteristics of the standard Scala collections here: https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html.
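A minimal sketch of that distinction (the values are illustrative): List is a linked list with O(1) prepend but O(n) indexing, while Vector supports effectively constant-time random access.

```scala
// List is a linked list: fast head operations, slow indexing.
// Vector is tree-backed: effectively constant-time random access.
val linked  = List(1, 2, 3)
val indexed = Vector(1, 2, 3)

val prepended = 0 :: linked // O(1): reuses `linked` as the tail
val at2       = indexed(2)  // effectively O(1) random access
val at2Slow   = linked(2)   // O(n): walks the links one by one

assert(prepended == List(0, 1, 2, 3))
assert(at2 == 3 && at2Slow == 3)
```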

3 Likes

There is basically an entire course’s worth of subject matter in your question. (I used to do Scala training, and this topic takes at least an hour or two to even begin to do justice.)

But just to start off with the simplest reason for most serious use cases: immutable is vastly better for any sort of multi-threaded application. If you have multiple threads, then those threads can be trying to access a data structure at the same time. If that data structure is mutable, you can easily have one thread trying to access the data structure at the same time that another one is changing it, or worse, two threads trying to change it simultaneously, frequently leading to hard-to-debug inconsistencies, non-reproducibility, and data corruption. This is one of the most common sources of critical real-world bugs in mutable-centric programs, and costs enormous amounts of time, money, and reputational damage.

This can be worked around to a limited degree by extremely careful use of locks, yes. But getting locks right is much harder than most programmers realize, and the result is that most locking code is at least a little bit broken. And locking scales poorly: while you can build locks for a single data structure, it is easy to wind up with deadlocks when you have multiple lock-protected data structures, as inevitably happens when the program grows. Again, this causes huge problems.

None of which has much to do with Lists per se, mind – the point is that mutability, in general, is quite dangerous in any sort of serious multi-threaded code. (And in my experience, that’s most enterprise-grade code: I haven’t worked on a server-side single-threaded project in decades.)

By contrast, immutable data structures are inherently much more thread-safe – you can share them around, and each thread will see a single version at a time. The threads might wind up working with different “snapshot” versions of that data structure, but you generally won’t experience the sort of corruption that often arises when contending for a mutable structure. There are simpler rules that you can follow in order to make your threads play nicely together, and it is much easier to build seriously reliable large-scale systems – the sorts of bugs that plague mutable-centric code just don’t happen.
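As a rough sketch of that snapshot behaviour, using scala.concurrent.Future to stand in for "multiple threads": the immutable list can be handed to several concurrent tasks with no locks at all, because no task can change it.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// An immutable List can be shared across threads with no locks:
// nothing can mutate it, so every task sees a consistent value.
val shared = List(1, 2, 3, 4, 5)

val tasks   = List.fill(4)(Future(shared.sum))
val results = Await.result(Future.sequence(tasks), 5.seconds)

// Every task saw the same snapshot.
assert(results.forall(_ == 15))
```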

One additional point:

Correct, but mostly irrelevant – you should never append to a proper linked list. (As Scala’s implementation is.) You should always, always prepend instead – that’s what that data structure is for, and that’s O(1). If you really need to append, you should use a different data structure. In practice, that’s actually not terribly common – prepending usually works fine if you design your algorithm around it.
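A minimal sketch of that idiom, with a hypothetical squares function: build the list in reverse by prepending (O(1) per step), then reverse once at the end, instead of appending (O(n) per step).

```scala
// Idiomatic list building: prepend in a loop (O(1) each time),
// then reverse once (O(n)) at the end.
def squares(n: Int): List[Int] = {
  var acc: List[Int] = Nil
  for (i <- 1 to n) acc = (i * i) :: acc
  acc.reverse
}

assert(squares(4) == List(1, 4, 9, 16))
```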

Generally speaking, if you’re getting O(n) behaviour without good reason, you’ve chosen the wrong data structure. There are many data structures available for exactly this reason: you pick what suits your needs.

3 Likes

If a data structure should stay intact (immutable), then what's the use case? I know an immutable list cannot be changed in a multi-threaded environment, but what is the advantage? All threads should have the same copy of our data structure and work on it to reach some result. If the data structure is immutable and each thread takes its own copy of it, so what? We still have to aggregate the results, and we would need an immutable data structure for that too. If threads should not change the ds (data structure), who should change it, and when?
I hope I have described my concern clearly.

My intention was ArrayList.
I'm reading Martin's book for the second time and see that his main focus is on the immutable list. Maybe I'm wrong.
What is the use case where an immutable list outperforms and shines over other lists or sequences?

Another reason to use ListBuffer instead of List is to prevent the potential for stack overflow. If you can build a list in the desired order by prepending, but the recursive algorithm that would be required is not tail recursive, you can use a for expression or while loop and a ListBuffer instead. You’ll see ListBuffer being used in this way in Section 22.2.

Can't we use another data structure instead of an immutable list in the mentioned paragraph?

Read from it to produce new values.
You can see in the Scaladoc that the immutable List provides a very rich API of operations like map, filter, folds, exists, find, collect, etc. All of those leave the original data intact and produce new values, and you can express almost any algorithm very easily by chaining those operations.
And if you are worried that you are producing a lot of garbage, you shouldn't be: the runtime takes care of that. But if you find that it is a bottleneck, you can use an Iterator, a LazyList, or any other lazy collection to perform the chain of operations in one go.
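A rough sketch of both points (the pipeline itself is illustrative): the chained operations leave the original list intact, and running the same pipeline over an Iterator fuses it into a single pass with no intermediate collections.

```scala
val xs = List(1, 2, 3, 4, 5)

// Each step leaves `xs` untouched and produces a new value.
val result = xs.map(_ * 2).filter(_ > 4).foldLeft(0)(_ + _) // 6 + 8 + 10

// Same pipeline over an Iterator: fused into one pass,
// so no intermediate collections are materialized.
val fused = xs.iterator.map(_ * 2).filter(_ > 4).sum

assert(result == 24 && fused == 24)
assert(xs == List(1, 2, 3, 4, 5)) // original is intact
```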

You have three options:

  1. Redesign the solution so you do not need to modify data shared by multiple threads. Each thread has its own data and processes it locally; global data is used as read-only, and if you need to, you pass modified copies of the data to other threads.
  2. You move mutability outside of your system, e.g. to a database, so you do not need to worry about locks, synchronization, etc.
  3. When you absolutely need to mutate global data for any reason, then you have to use some concurrency tool like: old-school locks, concurrent mutable data structures (e.g. java.util.concurrent.ConcurrentMap), actors, cats.effect.Ref, etc.
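Option 1 might be sketched like this (the data and chunking are illustrative): the global data stays read-only, each task computes a purely local result, and the results are aggregated at the end.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Global data is shared read-only; nobody ever mutates it.
val global = Vector.tabulate(100)(identity) // 0 to 99

// Each task works on its own slice and produces a local result.
val chunks   = global.grouped(25).toList
val partials = Future.traverse(chunks)(chunk => Future(chunk.sum))

// The aggregation step combines the per-task results.
val total = Await.result(partials, 5.seconds).sum
assert(total == 4950) // sum of 0..99
```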

List is used a lot in the book probably for two reasons. First, it is a pretty simple data structure, and it is common to implement it yourself as an exercise, or to explain other concepts like pattern matching and recursion, or even FP concepts like map and fold, in terms of List, because it is easy to understand. Second, List is pretty common as the default data structure for Scala programmers, probably again because it is simple to understand and easy to use.

What is the use case where an immutable list outperforms and shines over other lists or sequences?

Prepending, pattern matching and tail-recursive algorithms.
Also, since you will usually be processing your data with higher-order functions like the ones I described above, you usually do not care which sequence you have, since all of them perform similarly for simple iterations.
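A minimal sketch of the first point, with an illustrative sum: pattern matching decomposes a List into head and tail in O(1), which is exactly what a tail-recursive loop needs.

```scala
import scala.annotation.tailrec

// List shines here: O(1) head/tail decomposition via pattern
// matching, driving a stack-safe tail-recursive loop.
@tailrec
def sumAll(xs: List[Int], acc: Int = 0): Int = xs match {
  case Nil          => acc
  case head :: tail => sumAll(tail, acc + head)
}

assert(sumAll(List(1, 2, 3, 4)) == 10)
```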

Can't we use another data structure instead of an immutable list in the mentioned paragraph?

The mentioned paragraph uses a mutable ListBuffer, not an immutable List.
Anyway, yes, you may use any other data structure; the thing is that such a change may impact complexity/performance or even semantics.

2 Likes

Adding a little to @BalmungSan’s excellent answer: immutability often isn’t something you do for specific use cases, it’s the way you architect your whole systems. It is not unusual for me to build large-scale enterprise services using no mutable data structures at all, at least at the business-logic level. It requires designing your system differently, but it works efficiently and reliably.

And I should note: it is fairly common for there to be a small amount of mutability concealed deeply inside the guts. But that mutability should be limited in scope, tested extremely rigorously, and should not “leak” outside of specific APIs. That sometimes gains you performance where you actually need it (the very hot paths), without risking larger-scale problems.

The key point is that immutability should be the default for robust programming, with mutability used only in tactical, tightly-controlled ways. This is a huge mindset shift from traditional Java programming, and takes most folks some serious effort (it’s one of the major things we spend time teaching at work), but the result is usually much more reliable code.

1 Like

I got something, but I feel like I come from another island.
I want to understand the real benefit of FP and its concepts. Is there any project, code, or something like that which compares the two approaches and shows the benefits of FP?
You know, until I understand it deeply, I cannot use it with my heart and convince others to use it.
Thanks very much @jducoeur @guilgaly @BalmungSan

As has been pointed out by others, it really (IMO) boils down to being sure “others” don’t mess with your data, which is important in distributed systems.

Functional programming is programming with functions. A function is a mapping A => B. Given the same A, you will always get the same B.

This property is useful, because it allows you to refactor with confidence. Particularly, it allows you to always make the following replacement:

If you have some val a = foo(), you can always replace a with foo(), or foo() with a, and the program will be equivalent. Of course, that also goes for non-trivial right-hand sides.

Now imagine foo() modifies some mutable list. That would mean replacing foo() with a is no longer possible, because the meaning of the program would change.
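A small sketch of both cases, with hypothetical foo and bar: the pure foo can always be swapped with its value, while the mutating bar cannot, because each call changes the state it depends on.

```scala
import scala.collection.mutable.ListBuffer

// Pure: foo() can always be replaced by its value `a`, and vice versa.
def foo(): Int = 21 * 2
val a = foo()
assert(a + a == foo() + foo()) // the program means the same either way

// Impure: bar() mutates a list, so replacing bar() with `b`
// changes the meaning of the program.
val log = ListBuffer.empty[Int]
def bar(): Int = { log += 1; log.size }
val b = bar()
assert(b + b != bar() + bar()) // b + b == 2, but bar() + bar() == 2 + 3
```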

It turns out equational reasoning is a very potent tool in working out complex algorithms and scenarios. It reduces the number of things you need to keep track of, and everything that you need to keep track of is explicit.

Such advantages may sound far fetched and theoretical to some people, but people who tried it out and worked with it find that it’s a tool they don’t ever want to miss out on again.

1 Like

Excellent point!

All of the things mentioned are just abstract descriptions; I need some concrete, in-production code snippet that shows the power and benefit of FP and Scala.
Guys, I'm a fan of Scala because of its conciseness and its ecosystem, such as Akka, Play, and reactive systems. My questions are just for learning the basic concepts in depth.

Functional code is much easier to maintain because you can inline any definition and factor out any sub-expression without worrying that it will change what your program does. A corollary is that unless there is a data dependency between computations the order of evaluation doesn’t matter, so you can reorder things without worry. These things (plus lawful abstractions that you get from FP libraries like Cats) allow for very aggressive refactoring, which is where the real benefit of FP lies. It makes your programs more likely to start out correct, and more likely to stay correct as they evolve.

I think it probably takes some experience to really appreciate. Just looking at code is unlikely to convince you.

2 Likes

The power and benefit is not in the finished code, it’s in the process of writing it. As such, there is no finished code that can show you the benefit.

1 Like

Yeah, this.

Let me put it this way – many programmers (and worse, many managers) misunderstand what makes software hard. They think it’s all about knowing the syntax, knowing how to write an individual function, knowing how to write little things in isolation. That’s basically completely wrong: that’s the easy stuff.

The hard part is reasoning about your code: being able to look at it and say, “Yes, I understand what is going on here”. That’s very easy on the small scale, and because the small scale is what folks mistakenly focus on, they think it’s easy. But reasoning like that scales very poorly: as your code gets bigger and bigger (and as it ages), the chances that you actually understand it, and actually understand what’s going on, go way, way down.

That’s why most code is so ferociously buggy, and why it gets much buggier with age: the programmers working on it don’t actually understand their code. That’s not because they’re bad programmers – it’s because you really can’t truly understand most large-scale old-fashioned code, because the moving parts are interacting in deeply unpredictable ways. In many cases, it is literally (and I suspect provably) impossible to truly understand how the code is working in a complex environment.

The “killer app” of serious functional programming is that it makes the code more predictable, more comprehensible, and more maintainable. It’s considerably more rigorous, and reduces the complexity space a great deal. It turns your functions into true building blocks, where each one means precisely what it says on the label, without weird side-effecty stuff that winds up changing other parts of the code. And thus, you can say with a good deal more confidence that this higher-level function, which is made of those lower-level functions, does what you think it does. It’s not perfect, mind – but it’s a lot better.

Yes, that’s kind of abstract, because the important benefits aren’t about running faster or anything like that. (It sometimes does run faster under some circumstances, and runs slower under others – depends on the situation.) What really matters is that it makes code easier to modify without unpredictably breaking things – which makes it safer and faster to maintain, and reduces bugs. Which can reduce overall costs (which is the big benefit from a management POV) and result in fewer instances of being woken up in the middle of the night due to things crashing (which is perhaps the biggest benefit from an engineer’s POV).

(And personally, I think it’s way more fun. I’ve been in the field for 40+ years, in many languages, and the more I work in the FP style, the more I find it just plain more satisfying than anything I’ve found in ages. But that’s subjective…)

5 Likes

Complementing what Rob, Martijn and Justin said.

I once decided to write with my very own words Why FP, hope it helps in giving you an idea of its benefits.

1 Like

Even in a single-threaded system, being able to safely share data among different parts of a system is invaluable. It avoids so many defensive copies. In a class with, say, private var data: List[...] = ...:

  • I can call any foreign method m(data) and not worry what it might do to my list.
  • I can return from a method of the class a reference to data and again not worry what the caller might do with it.

Using mutable structures, both scenarios would require (possibly expensive) copies of the list (or immutable wrappers, like Java’s Collections.unmodifiableList).
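A minimal sketch of the second bullet, with a hypothetical Scores class: because the internal List is immutable, handing it out is safe and costs nothing; a later internal update cannot disturb what the caller already holds.

```scala
// The field itself is a `var`, but the List it points at is immutable,
// so exposing it requires no defensive copy.
final class Scores(private var data: List[Int] = Nil) {
  def add(s: Int): Unit = data = s :: data
  def all: List[Int]    = data // safe: callers cannot modify it
}

val scores = new Scores
scores.add(10)
scores.add(20)
val snapshot = scores.all
scores.add(30)

// The caller's snapshot is unaffected by the later internal update.
assert(snapshot == List(20, 10))
assert(scores.all == List(30, 20, 10))
```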

2 Likes

Since you’re asking about List specifically, there are two settings.

One is small lists. There, List has somehow been converged upon as the default. Other structures could do the same job.

Second, large lists. The primary use-case of immutable linked lists is memory-sharing of common tails. That is, the big advantage is that you have O(1) copy/snapshot (provided by immutability) and O(1) prepend and tail.
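That tail-sharing can be observed directly (a minimal sketch; `eq` checks reference identity): prepending makes no copy at all, it just points a new cons cell at the existing list.

```scala
// Structural sharing: prepending reuses the existing list as the tail.
val base     = List(2, 3, 4)
val extended = 1 :: base // O(1); no copy of `base` is made

// `extended.tail` is the very same object as `base`:
assert(extended.tail eq base)
assert(base == List(2, 3, 4)) // base is untouched
```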

If you don’t need O(1) copy/snapshot, then you should use a mutable.ArrayBuffer. If you don’t need O(1) prepend/tail, then you should use ArraySeq. If you need more operations than List provides (fast indexed access, fast tail, fast iteration if you cannot guarantee that the List is in L1) then you should consider Vector.

List gives an OK compromise between small-collection and medium-collection behavior (you should not use List for large collections, outside of very specific specialist applications – firstly, JVM object overhead will kill you, compared to ArraySeq; secondly, iteration speed is miserable because of long dependency chains over memory loads; thirdly, it is too easy to accidentally write O(N^2) algorithms, because e.g. list.size is O(N)).

As some others noted, O(1) copy/snapshot is really, really useful for multithreaded applications. First, it helps to avoid bugs; second, it is damn expensive to communicate between cores (thread 1 writing and thread 2 reading from the same cache line, or even the locks/synchronization that you might otherwise need).