Global data and functional programming

Is global data ever appropriate in functional programming? I’d be interested in what Scala FP experts think about that. I am not an FP expert, but I do try to apply the basics. Virtually all of my classes are immutable case classes, for example.

I am developing prototype software in Scala for an air traffic control automation concept that I proposed several years ago called Trajectory Specification. I recently had a paper published on it in the AIAA Journal of Air Transportation for anyone who might be interested:

https://arc.aiaa.org/toc/jat/27/2

Although my software is a prototype, I try to apply the same basic standards that would apply to actual operational software in the field. No, I can’t develop operational ATC software by myself, but my goal is to develop software that can be used as a starting point for operational software (i.e., not just a “throw-away” prototype).

In any case, here is the reason for my question. I have come to the conclusion that global data is sometimes appropriate in FP, but I’d like to get feedback to make sure I am not missing something important.

I have two examples of where I think global data is justified in my application: airspace data and wind data.

Airspace data is static data about the airspace, including airspace boundaries, named waypoints, and runway parameters. It seems to me that this data should be conveniently available without the tedious boilerplate of references passed down the stack (which can be several levels deep).

Wind data is slightly more dynamic. It is currently updated once per hour (and possibly at a higher rate in the future), but it seems to me that it too should be available without having to pass it down the stack. Care must be taken when it is updated, but that is not particularly difficult with basic locks.

I realize that immutable case classes cannot contain references to global data, but that should not be a problem. References to global data cannot be included as fields in immutable classes, but they can be accessed in methods of those classes when necessary. For example, if I need to know airspeed, I can pass groundspeed to a method that subtracts the along-track component of wind speed to obtain airspeed.
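To make the airspeed example concrete, here is a minimal sketch. All names (`Wind`, `WindData`, `TrackState`) are hypothetical, and the wind is assumed to be a single uniform vector rather than real gridded data. It ignores the crosswind component, as in the description above.

```scala
// Hypothetical names; wind is simplified to one uniform vector (knots),
// given as the direction the air is moving toward.
final case class Wind(eastKt: Double, northKt: Double)

object WindData {
  // Placeholder value; a real system would load forecast data here.
  val current: Wind = Wind(eastKt = 20.0, northKt = 0.0)
}

final case class TrackState(groundspeedKt: Double, courseDeg: Double) {
  // Wind vector projected onto the course direction (course measured from north).
  def alongTrackWind(wind: Wind): Double = {
    val c = math.toRadians(courseDeg)
    wind.eastKt * math.sin(c) + wind.northKt * math.cos(c)
  }

  // Reads the global inside the method; no reference to it is stored as a field.
  def airspeedKt: Double = groundspeedKt - alongTrackWind(WindData.current)
}
```

With the placeholder wind, a flight heading due east at 450 knots groundspeed has a 20-knot tailwind, so `TrackState(450.0, 90.0).airspeedKt` comes out at about 430.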

What do you think?

Hmm. What do you mean by “functional programming” here? I mean that seriously – the term is used both loosely (various forms of “programs that involve higher-order functions”) and strictly (pure FP).

Also, what do you mean by “global data”? Are you thinking about global variables in the old-fashioned programming sense, or simply data outside of function parameters?

My response differs a bit depending on the answers to these – broadly speaking, global variables and pure FP are a flat contradiction – but I’d generally suggest that, if you’re thinking about global variables, don’t. They can be an easy shortcut for small prototypes, but are a recipe for all sorts of nightmares in real code – threading problems, difficulties reasoning about the code, and other problems. They’re just a lose.

(The fact that you are talking so casually about “basic locks” suggests that you’re being a bit too casual about this problem. Locks always look easy in small programs – it’s when you try to scale, and those locks begin to interact in more complex ways, that everything tends to fall apart.)

Instead, I recommend thinking about your “global data” as an external data source – I/O, essentially. That’s much more conventional, and a solved problem with all sorts of programming. There’s no contradiction between that and FP. Basically, treat your airspace data and your wind data as databases conceptually – not necessarily implemented as literal DBs, but treated as such – instead of trying to stick them into old-fashioned global variables. I suspect you’ll have better luck building something that can grow to real code with an approach like that…


Thanks for the reply. Your questions perplex me a bit because I thought they were essentially answered in my post, but let me try to answer them more explicitly.

No, I am not referring to “pure” FP in the Haskell sense. As I suggested in my post, I mean basic FP principles such as immutable case classes, returning values rather than modifying function arguments in place, etc. (And no, I don’t mean using an IO monad, but I do mean keeping IO separate from algorithms as common sense suggests.)

I am NOT asking if global VARIABLES should be used extensively. I really don’t need a lecture on that any more than I need a lecture on the pitfalls of vars. Do you honestly think I would use all immutable classes if I did not understand the risks of global VARIABLES?

My question is akin to the question of where and when vars are appropriate. Pure FP forbids variables, but experienced Scala programmers know when and where they are harmless and appropriate. Variables are never absolutely necessary, but in some cases local variables can actually be used for better and clearer code design. Similarly, global data is never absolutely necessary, but until someone can provide a good reason to believe otherwise, I think it can sometimes be used for better code design.

The airspace data that I mentioned is static and never changes at runtime. It may change once every few weeks or months, but the program would be restarted perhaps daily if not more often. Yes, it could be in a database or read from a file, but the actual source of that data is irrelevant to my question. My question relates to how the data is accessed by the classes that need it. Unless I am missing something, it can be accessed directly from anywhere in the program through global data, or it can be passed around through the stack as constructor and function arguments. My contention is that IN THIS PARTICULAR CASE there is no benefit to sticking with the “pure” FP approach of avoiding global data. I am willing to consider arguments to the contrary, but I am not looking for “pure” FP dogma.

As for the wind data, as I said before it is slightly more dynamic than the airspace data because it gets updated once per hour or so. So yes, a global variable is ultimately required here. I also said that the update is relatively easy to handle with locks, and you chided me for that statement. Again, I understand why locks can quickly get complicated, but I fail to see how that could happen in this case. The wind data is updated at one place in the code and is otherwise read-only. All that is needed is a lock to prevent any other thread from trying to read it while it is being updated once per hour. Moreover, the actual update can be done in the background, and the actual switchover can be near instantaneous, so the lock would be active for perhaps a fraction of a second once per hour. I fail to see how that can become unwieldy, unmanageable, or risky, but please let me know if I am missing something. Thanks.

I might suggest that using shared immutable global variables (i.e. defined inside an object and publicly accessible) is considered absolutely acceptable.
The problem arises when such global data is mutable: sharing and mutating such data (with or without locks) has many times been shown to cause all sorts of unexpected problems.
Local mutable variables can be OK, precisely because they’re locally defined and used, and never escape the abstract data type or the function using them, thus avoiding the risk of data races and data corruption.
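A minimal sketch of what shared immutable global data in an object might look like. The names (`Waypoint`, `Airspace`) and the sample entries are made up for illustration; real data would presumably be loaded once at startup.

```scala
final case class Waypoint(name: String, latDeg: Double, lonDeg: Double)

object Airspace {
  // Hypothetical sample entries. The map is an immutable val,
  // so any thread can read it without synchronization.
  val waypoints: Map[String, Waypoint] = List(
    Waypoint("ALPHA", 37.0, -122.0),
    Waypoint("BRAVO", 38.5, -121.5)
  ).map(w => w.name -> w).toMap

  // Lookup is a pure function of the (fixed) data.
  def waypoint(name: String): Option[Waypoint] = waypoints.get(name)
}
```

Any code can then call `Airspace.waypoint("ALPHA")` directly, with no reference passed down the stack.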

In your particular case, you seem to already have a clear idea of how harmless sharing an “almost-read-only” variable is, but if you’re open to discussion, I’d be the first to argue that assumptions of this kind have proven deceptive over time, at least in my experience. The reasons are generally that:

  • the requirements for the system almost surely will change with time, as it gets more sophisticated, usually breaking the assumptions that initially led to consider mutability “harmless”
  • even if a single global mutable var is updated only once a month, the code still needs to ensure the correctness of every read operation with some sort of concurrency fence, be it a mutex, a double-read, defensive copying, whatever.

The solutions are there, but they have proven over time to be more error-prone than expected, as you already seem to accept. Most FP practitioners agree that it is generally much easier to use some sort of “impure” container (whatever that may be) that gives you an immutable value on demand, one that cannot be concurrently modified without explicit effects.

Mentally you can imagine that you’re looking at a temporal “snapshot” of the value, and explicitly know that, so you can take for granted that, from the pov of your local code, the value will never change once you got a local handle on it. That’s the greatest advantage, to be free from the need to consider “what might happen” to that value in your code, behind your back, so to speak.
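One possible shape for such a container, as a sketch only: an immutable value held in a `java.util.concurrent.atomic.AtomicReference`. The names `WindField`, `WindStore`, and `SnapshotDemo` are hypothetical, and the fields are placeholders for real wind data.

```scala
import java.util.concurrent.atomic.AtomicReference

// Immutable wind data; a placeholder for a real gridded forecast.
final case class WindField(validHour: Int, eastKt: Double, northKt: Double)

object WindStore {
  private val ref = new AtomicReference[WindField](WindField(0, 0.0, 0.0))

  // Readers take a snapshot; the value they hold can never change.
  def snapshot(): WindField = ref.get()

  // The updater builds the new value in the background, then swaps it in atomically.
  def publish(next: WindField): Unit = ref.set(next)
}

object SnapshotDemo {
  def describe(): String = {
    val wind = WindStore.snapshot()                            // single read
    WindStore.publish(WindField(wind.validHour + 1, 5.0, 5.0)) // simulated hourly update
    // `wind` still refers to the old snapshot, unaffected by the publish above.
    s"hour=${wind.validHour} east=${wind.eastKt}"
  }
}
```

The point of the design is that a computation reads the reference once and works on that snapshot throughout, so an update arriving mid-computation cannot change the values it sees.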

That said, you’re free to think otherwise, and maybe your experience will prove me wrong.
Obviously you’re allowed to make the call for yourself, though I would personally choose differently from what you suggested, and I hope I have explained my reasons.

“I might suggest that using shared immutable global variables (i.e. defined inside an object and publicly accessible) is considered absolutely acceptable.”

OK, thanks. That’s good to know. It takes care of my airspace data, which is static and immutable.

As for the wind data, which updates at a very low rate, I don’t yet see any good way around a global variable, but I’ll give it more thought. Fortunately, it is not an issue at this point since my simulations don’t actually change the wind data yet. I am just planning for the future and trying to understand the best practices.

An advantage of threading the data through your functions is that it’s a lot easier for testing. Other than that I don’t think there’s anything wrong with having global immutable data.
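To illustrate the testing point, a small sketch with made-up names (`Winds`, `Speeds`). The second method uses an implicit parameter, which is one way to cut down on the boilerplate of passing the data at every call.

```scala
final case class Winds(alongTrackKt: Double)

object Speeds {
  // Explicit parameter: the caller (or a test) decides which wind data applies.
  def airspeed(groundspeedKt: Double, winds: Winds): Double =
    groundspeedKt - winds.alongTrackKt

  // Same calculation with an implicit parameter, so intermediate callers
  // need not mention the wind data at each call site.
  def airspeedCtx(groundspeedKt: Double)(implicit winds: Winds): Double =
    airspeed(groundspeedKt, winds)
}
```

A test can then pass in whatever wind it likes, e.g. `Speeds.airspeed(450.0, Winds(-30.0))` for a 30-knot headwind, without touching any global.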


Conceptually, application “state” would include any data available through the reflection API. Reflective data is immutable. So, yes, it is valid to have global immutable state since all Scala apps already have access to global immutable (reflection) state.

Brian Maso

Though some would say that accessing the runtime reflection API is bad.


Global mutable data (no matter how frequently mutated) is generally completely incompatible with functional programming. Functional programming is fundamentally about avoiding mutation. From Wikipedia:

Mutable state is usually kept as limited as is feasible. In many cases mutable state lives only as long as a method invocation, making the method (as a whole) functionally pure for an external observer. Look, e.g., at the scala.List.take method (scala/src/library/scala/collection/immutable/List.scala at 6ac6da8b61a3a427089a166c7802a940eac71064 in scala/scala on GitHub):

  override def take(n: Int): List[A] = if (isEmpty || n <= 0) Nil else {
    val h = new ::(head, Nil)
    var t = h
    var rest = tail
    var i = 1
    while ({if (rest.isEmpty) return this; i < n}) {
      i += 1
      val nx = new ::(rest.head, Nil)
      t.tl = nx
      t = nx
      rest = rest.tail
    }
    h
  }

As you see it uses many variables and even mutates data inside it (in this line: t.tl = nx), but none of the effects are visible outside of that method. The original List isn’t mutated at all in this method, the resulting List becomes immutable as soon as it’s returned. Variables are reinitialized every time based only on input, not on external mutable state.
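The same idea in a smaller, made-up example (`LocalMutation.runningTotals` is not from the standard library): a local var and a builder are mutated inside the method, but the caller only ever sees an immutable List that depends on the input alone.

```scala
object LocalMutation {
  // Cumulative sums of the input list.
  def runningTotals(xs: List[Int]): List[Int] = {
    val buf = List.newBuilder[Int] // mutable builder, never escapes this method
    var acc = 0                    // local var, reinitialized on every call
    for (x <- xs) {
      acc += x
      buf += acc
    }
    buf.result()                   // immutable List handed to the caller
  }
}
```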

I often use Akka actors to manage mutable state. This mutable state can be observed from the outside world (well, an actor is mutable anyway; changing state is its job), but you cannot modify it directly. Instead, you pass immutable messages to an actor and receive immutable messages in reply. This way, mutability management is contained in one place rather than scattered over potentially the entire codebase.
