From Python to Scala

kiuhnm · June 16, 2024, 8:35am

I’ve been using Python for a while and playing with its recent static type system.

When I discovered ways to do type-level programming (e.g. I implemented a list that knows its length, updates it correctly, does bounds checking, etc… all at type-checking time) I was told that I was “abusing” Python’s type system.

Indeed, I discovered some interesting tricks that then went away in later versions of the type checker I was using. The author of that type checker was quite dismissive of my work, so I gave up instead of looking for eventual workarounds.

TypeScript’s type system, although unsound, is quite powerful for type-level programming (e.g. I implemented a simple language in it (i.e. parser + interpreter)), but, unfortunately, I don’t like JS very much.

Before diving into Scala, I’d like to know whether what I want to do is possible or not (but I’ll probably move to Scala anyway).

Instead of avoiding mutability, I’d like to control it. I don’t need anything too fancy. I’d simply like to split classes’ (or types’, in general) interfaces into at least 2 (overlapping) versions: read-only (r) and read-write (w).

(Sorry for using Python terminology.) As an example, let’s say we have a list[int]. After the split, we get list_r[int] and list_w[int]. The former is read-only and the latter read-write. Note that, since list is a pre-existing type, we’ll need a lift function to lift a regular list to a list_*. Now let’s define the following function:

def f(xs: list_r[int], ys: list_w[int]) -> list_w[int]:
    ...

Here’s an example of the behavior I want:

rl: list_r[int] = lift([1, 2, 3])     # also `lift([1, 2, 3], 'r')`
wl: list_w[int] = lift([4])           # also `lift([4], 'w')`

f(rl, rl)        # error: can't convert a list_r into a list_w
f(rl, wl)        # error: write access needs explicit permission
f(rl, w(wl))
f(wl, w(wl))     # note the w -> r coercion
f(rl, w(rl))     # error: can't convert a list_r into a list_w

Also, note that list_r[T] is covariant in T, as it should.

The actual system had 4 access modes:

r: read-only access
w: read-write access
rk: keep in read-only mode
wk: keep in read-write mode

“keep” tells a function that it can keep the value. Without “keep”, a function must completely forget about that value before returning. This holds recursively, of course.

This is also useful to eliminate redundant copies. For instance, a constructor may require a list_wr[T], which means that it may keep and use the list indefinitely, without having to make a copy.

The read-only and read-write qualities are actually enforced:

def f(xs: list_r[int], ys: list_w[int]) -> list_w[int]:
    xs.sort()            # error: no in-place sorting allowed
    xs[0] = 4            # error
    ...
    ys.sort()            # error: permission needed
    w(ys).sort()
    w(ys)[0] = 3
    return xs            # error: can't convert list_r to list_w
    return xs.copy()     # NOTE: `copy` always returns a list_w

Note that, at runtime, list_r and list_w (and list_rk and list_wk) are just regular lists, and lift and w just identity functions. This is all just a static type mechanism.

The “permission part” is the trickiest to implement. I used an ugly hack:

@lock
def f[L](_: L, xs: list_r[int], ys: list_w[int, L]) -> list_w[int, L]:
    ...

I use lock to create an “external” signature for f. The callers will see the following signature:

def f(xs: list_r[int], ys: list_w[int, _Unlocked]) -> list_w[int, None]

L is a type param for a “lock”. When created by lift or directly by list_w, L is just None. To appease the external signature, the caller must use w for ys, which converts ys from a list_w[int, None] to a list_w[int, _Unlocked]. Note that the return type must have L = None or that list_w would be pre-authorized, which is something we don’t want.

OTOH, the implementation of f sees a generic list_w[int, L] and so it must use w to pass ys to other functions. In particular, f can’t call itself by forwarding its args:

@lock
def f[L](_: L, xs: list_r[int], ys: list_w[int, L]) -> list_w[int, L]:
    f(xs, ys)           # type error: needs explicit write permission
    f(xs, w(ys))

Of course, the example above recurses forever, but that’s beside the point

In case you’re wondering, that _: L is a hack I need to get a hold of the type param in lock so that I can specialize it into _Unlocked for the args, and None for the return value. In Python there’s no way to manipulate a signature in a general way.

I can’t get the latest version of the type checker to replace L with the intended types anymore, which broke my code That’s apparently my fault, since I was relying on undefined behavior. In my defense, most of Python type system’s behavior is undefined. Unless I’m doing really basic stuff, I always get different results with different type checkers. I’ve had enough.

BTW, implementing w (+ r, rk and wk) is easy enough, but lift is problematic because it operates on a type we don’t control. The obvious solution is function overloading (list → list_w, set → set_w, etc…), but it’s an inefficient one, as overloading resolution is usually cubic in the number of overloads, and, even worse, one would’ve to import all the supported types at once. I opted for local lifts. For instance, let’s say a file uses list, set, and dict. We just create a local lift as follows:

lift = Lift[list_w | set_w | dict_w]()

Lift’s implementation is full of “abuses”, of course!

Hoping you’re still with me, my question is the following:

Can I implement something like that in Scala?

spamegg1 · June 16, 2024, 10:24am

Welcome to the Scala community @kiuhnm

Yes you can do this fairly easily in Scala, using mostly Scala built-in mechanisms. The Python pseudo-code you wrote can be turned into working Scala code with some work.

Scala has lots and lots of different types of collection, some mutable, some immutable. You can parametrize all of them with type parameters, and combine them and convert them to each other easily with built-in methods. You can also let the compiler check if something you don’t want converted is converted, since mutable and immutable collections are separate.

You might want to start here Scala for Python Developers | Scala 3 — Book | Scala Documentation

Here are some online courses that teach “immutable first” approach and also type parametrization Online Courses (MOOCs) from The Scala Center | Scala Documentation

kiuhnm · June 16, 2024, 11:38am

Thank you!

I’d like to clarify a few things to make sure we’re on the same page.

Everything I wrote is (well, “was”) working Python code No pseudo-code there! Even `…` is part of the language.

In my approach, all types are mutable and I only add access controls.

This is different from having mutable and immutable types. The advantage of my approach is that there’s no runtime at all, i.e. at runtime nothing happens. If I have a vector, I can pass it to you in read-only mode and you can only read it, but the vector is still mutable and there’s no actual conversion or copy at runtime.

If I want, I can define the r and w “parts” of a type so that the r part only references r parts of contained/referenced/reachable types. Imagine a big graph of connected objects. If the only thing I give you is read access to a single node, you will never be able to modify any node accessed through that single node (if that’s what I want). Yet, all this is just a compile-time thing.

In my system, an object you only have read-only access to is virtually immutable.

The r and w parts need to be defined, of course. If you own the type, you can just implement the two parts (in general, type_w <: type_r, and type_r is covariant in its type params). If you don’t own the type, then you only create interfaces and you have to lift the objects.

My system is unusual so I’d need to implement it through some type-level magic. It’s also independent of Scala’s built-in collections.

sageserpent-open · June 16, 2024, 2:37pm

That is similar to how the standard Scala collections work - traits in collection are essentially your read parts and those in collection.mutable extend them to add in-place mutation operations. That fits in with your desire to have the write part <: the read part.

You will also observe that there is a collections.immutable. A type that lives there is not the same as say, a Python frozen set or a Java non-writeable snapshot type that is a logical copy of some mutable instance. The inhabitants of that world make new instances to embody change operations, sharing a lot of internal structure to make things reasonably performant and reasonably sparing of heap.

Non writable collection traits are covariant on their element type, with the notable exception of sets and maps (on their keys). I’ll leave it to others to go over this, it’s been done to death already.

Sure, if it’s in Python it’s independent of the Scala collections, but if I were you, I’d give the standard stuff in Scala a quick whirl first before porting your magnum opus. You might find it quite nice.

spamegg1 · June 16, 2024, 2:59pm

Agreed with @sageserpent-open that’s what I was trying to say.
It seems like your design is trying very hard to overcome Python limitations, which don’t exist in Scala, just built-ins do the same job.

If you still want to pursue your idea, first look into the well-known advanced libraries like Cats / ZIO (first study basics of Scala), they probably have a more general version of it already implemented. For your “no runtime” idea, you might want to look at the “programs as values” paradigm too, which is implemented in ZIO & Cats I believe Programs as Values, Part I : Intro & Compositionality You can write programs and pass them around as values without being evaluated.

jducoeur · June 16, 2024, 4:13pm

Just to underscore a point that @sageserpent-open made: “immutable” generally doesn’t just means “you can’t alter this” – it means “this data structure is highly optimized to create copies with added/removed elements”.

A large fraction of Scala programming is done entirely with immutable data structures: where you’re never actually changing it, you’re making a modified copy. (Essentially all of my programming works this way, and has done for many years now – it’s surprisingly beneficial, especially in a multi-threaded world. I more or less never use mutable data structures.)

Everything you’re describing is totally doable, but in most situations is less useful than conventional immutable data structures. I don’t know what your use case is, so it’s hard to say whether those data structures would suit your needs or not, but it would probably be worthwhile for you to read into them before spending too much effort.

One major difference worth keeping in mind: the actual implementations of mutable vs immutable are often wildly different. So for example, Scala’s standard library has a common Map trait that is neutral, with both mutable and immutable implementations. (Quite a lot of different implementations, actually.) But the optimal implementation of mutable vs immutable are very different, so there can be performance costs of trying to be super-flexible back and forth at runtime.

Ichoran · June 16, 2024, 6:01pm

From what you’ve said, your system is not unusual in goal, only in implementation.

In Scala, it’s more customary to model capability-hiding using inheritance. For instance, the collections define scala.collection.Seq which could be backed by a scala.collection.mutable.ArrayBuffer. (Which you could view as a scala.collection.mutable.Seq.)

If you need your own hierarchy, you can implement it the same way: abstract class Read() plus final class Write() extends Read().

And the locking can be implemented by a lightweight (no-runtime) type-tagging functionality, even if you can’t add type parameter that models locked/unlocked. It’s especially easy in Scala 3 with opaque types. Typically people will try to accomplish the same goal by using some monadic construct (e.g. IO), but that might be overkill for your case.

trait Guarded {}
trait Open extends Guarded {}
trait Closed extends Guarded {}

opaque type Guard[A, G <: Guarded] = A
object Guard:
  extension [A, G <: Guarded](g: Guard[A, G])
    def use[B](f: A => B): B = f(g: A)
  extension [A](g: Guard[A, Open])
    def get: A = (g: A)

I think you have control of your hierarchy so you probably don’t need this; you can just make Thing[A, Guarded].

kiuhnm · June 17, 2024, 2:21am

In other words, the collections in collections.immutable are persistent. They’re certainly useful, but sometimes I want the change to be globally visible without having to update any reference or pointer. That requires a mutable collection. Yet, I still want to enforce read-only access to it most of the time. From what you said, Scala already does what I want in that regard as there’s an implicit conversion from mutable to immutable for all the standard collections.

That’s the easy part. The hard part is the “permission feature” I described in my initial post.

Ichoran · June 17, 2024, 4:08am

No, that’s not correct. Immutable collections use different data structures, many of which allow inexpensive creation of a new immutable collection that reuses almost all of the old collection. Thus you can have your data-is-always-self-consistent cake and eat your can-produce-updated-versions-inexpensively cake too (though there is some performance hit for each). This generally is hard to make work for graphs, but for trees and lists it’s doable. The internal consistency really simplifies things: if you can’t change anything, you don’t need permissions.

However, both mutable and immutable collections are subclasses of collection superclasses that only have accessor methods and no mutation methods. This allows you to use a mutable collection in a read-only mode (because you do not have a method visible that would allow you to alter it). So you also have the mode you’re talking about built in: take collection.Seq (for example) as your read-only flavor, but collection.mutable.Seq as your writable one (both with the same underlying implementation, perhaps collection.mutable.ArrayBuffer).

If you want a permissions system, the easiest way these days is probably to use opaque types with Scala 3 if you need to wrap pre-existing classes, or just add another type argument if you are creating new classes. In order to create permission-specific capabilities, either use implicit evidence on the methods or write an extension method specific to one of the type parameters.

Here’s a full example of the inheritance model on a new class that also has the permission as a second type parameter: Scastie - An interactive playground for Scala.

kiuhnm · June 17, 2024, 4:53am

That’s exactly what I do in Python.

I don’t completely understand the code you posted, but I suspect it doesn’t do what I want.

Here’s the behavior I want:

def g(xs: set_w[int], ys: vector_r[int]) -> set_w[int]:
    ...
    return xs

def f(xs: vector_r[int], ys: vector_w[int], zs: set_w[int]):
    f(xs, ys, zs)            # type error: permission needed
    f(xs, w(ys), w(zs))      # OK
    zs2 = g(zs, ys)          # type error: permission needed
    zs2 = g(w(zs), ys)       # OK
    f(xs, w(ys), zs2)        # type error: permission needed
    f(xs, w(ys), w(zs2)      # OK
    ...

Can your code achieve that?

Note that:

It’s all automatic: there’s no manual wrap/unwrap of any kind.
f and g are just regular functions/methods that can have any form and appear anywhere without any restrictions.

EDIT: I’ve just seen you posted a full example. I’ll have a look at it ASAP.

Ichoran · June 17, 2024, 5:15am

Here’s an opaque type version (only works in Scala 3) that allows you to reveal only part of the full functionality that might be available: Scastie - An interactive playground for Scala.

The read/write is still implemented with an inheritance hierarchy, but if you were willing to do more work you could use an opaque type and reveal only the non-mutating methods on a class with interior mutability and no inheritance hierarchy.

Ichoran · June 17, 2024, 5:20am

Huh? What is w if not a manual wrap/unwrap?

sageserpent-open · June 17, 2024, 5:22am

A quick meta-question, @kiuhnm …

Your initial post stated that w is at runtime an identity function, so you want a change in static typing only.

If that’s right, then what purpose does it serve over and above the having a list_w? Regardless of whether you implement the conceptual split via some kind of runtime coercion, or are simply looking at different views through the inheritance hierarchy, you (in the role of programmer) have to make a conscious choice upfront to say, “I want a read-only view” versus “I want a writeable view”.

So having made that choice, what does wrapping the writeable view in w add to the mix?

I could understand if w were to perform some runtime permission check based on identity management - so list_w is a way for the code to say - “If the user wants to play with this, they need to be cleared”.

Otherwise w just seems to be a syntactic “Are you sure?” note to the programmer.

Am I missing something obvious here?

Are you trying to encode a dynamic aspect into list_w so you can pass one down through multiple calls without any of the levels knowing whether or not said list is cleared for mutation or not? So the outermost caller decides whether or not the mutation performed by the inner calls is permitted at runtime?

Your original example has comments stating that you expect errors when list_w is not given the w-treatment, but I’m unsure whether this is a static type error or a nested runtime error you have in mind….

kiuhnm · June 17, 2024, 8:26am

Manual (easy):

def f(xs: list_w[int]):
    xs_r = r(xs)
    f(xs_r)          # type error

Automatic (hard):

def f(xs: list_w[int]):
    f(xs)            # type error

The hard thing is to force a type error in a situation where there shouldn’t be any, almost by definition. I explained how I got that effect in Python.

kiuhnm · June 17, 2024, 9:06am

I appreciate the meta-question

No, there’s no runtime behavior whatsoever.

It’s like when you are the admin but all the applications you run still run with reduced privileges by default.

Let’s say g wants to modify xs:

def g(xs: list_w[int]):
    ...

The first thing g does is get a sorted copy of xs:

def g(xs: list_w[int]):
   xs_sorted = xs.sort()
   ...

Oops, that’s already a bug since sort sorts in place. That’s still valid code, though, as xs_sorted will be simply assigned the value None.

That may seem an exaggeration, but imagine this situation:

def f(xs: list_r[int]) -> int:
    ...

def g(xs: list_w[int]):
    x = f(xs)
    ...

Then we or someone else refactor the code and the signature of f changes from (list_r) -> int to (list_w) -> int for whatever reason. That might introduce a bug.

To be safe, we should always restrict xs to a read-only list:

def g(xs: list_w[int]):
    x = f(r(xs))
    ...

That’s tedious, noisy, inefficient (a no-op call is still slow in Python), and error-prone (you might forget r).

Basically, what I want is for r to be implicit. In other words, f(xs) always means f(r(xs)).

sageserpent-open · June 17, 2024, 9:35am

Hmm - you didn’t mention w in your last response, so I’m still a bit puzzled as to you’re trying to achieve.

At face value, you have the desired implicit ‘conversion’ via subtyping in Scala - so a collection allowing mutation can be referred to via a read-only supertrait, thus protecting it from accidental mutation. Job done.

I read this to mean that you want a double-check in your code - your concern is that because the mutable type is a sub-type of the read-only view type, someone might change the callee’s signature to allow it to mutate the argument, unbeknown to the call site.

That leads to having a w to confirm at the call site that yes, it’s OK to pass a collection using a type that allows mutation - and there is a corresponding r, but that doesn’t really add anything to the mix as the read-only view is safe anyway, so you’ve left that out as a design decision.

The thing is, if in your example, g is taking a mutable argument (because it uses list_w), why not just relax and accept that anything in the execution path in g might mutate it. That what g’s signature tells me.

You seem to want to use static typing, but you don’t quite trust the programmer to actually use the correct types, so there needs to be this “Are you sure?” step - namely w - when you pass a mutably-typed thing to a call…

If I’ve got that right, then I think you’re making a rod for your own back. It’s your party, but I’d just stick with having the mutable type as a subtype of the read-only view and trust the developer to do the right thing.

Ichoran · June 17, 2024, 7:28pm

Yes, it’s very hard to force a type error where there shouldn’t be any, because it only happens when you aren’t trusting your types to convey what you’re allowed to do (which is what types are for).

This does not seem like a wise design decision to me, not necessarily because a priori it’s a terrible way to do things, but because it’s so unlike all the normal strategies used to improve safety. Basically you’re asking for an interface that has a different outward-facing type than inward-facing one. This seriously violates expectations about how you can rearrange and refactor your code, especially since Scala works heavily with lambdas.

Inside the body of the function, you don’t actually want list_w to pass to other functions as list_w. But what about something like

extension [A](a: A)
  inline def tap(f: A => Unit): A = { f(a); a }

Should f see the list_w as a list_w or only a list_r? On the one hand, it’s just a function. On the other hand, it acts like a built-in operation on A.

I know for exactly your use case it probably works fine, but it doesn’t interact well with the rest of the ecosystem. If your main problem is people randomly changing function signatures to mutate when they’re not allowed to, but almost all operations being read-only so that specifying the type is too great a burden, I recommend requiring an explicit permission token for all write operations. If you don’t have the correct token, you can’t do anything. The token doesn’t have to have any content if you don’t want it to. You can arrange things so that you can only get a permit at object creation time.

The advantage of this strategy, despite being a little bit more work than what you are doing, is that it isn’t so unexpected. People understand permissions tokens, and because it is explicitly separate from the writable object itself, you don’t expect, nor can you, just successfully pass the writable object around instead of the readable version by changing the type signature. It’s all the same object! It’s just that you need a token to unlock certain behaviors (and by your request, that token is explicit).

This is most secure with dependent typing, so that for any mutable variable m it requires a particular instance of a permission p that has type p.type.

It isn’t as painless as the undefined behavior you stumbled into with Python, but it is principled, comprehensible, and not an outrageous amount of work to get at least basic functionality. Here’s an example: Scastie - An interactive playground for Scala.

This isn’t the only possibility; there are various others which have their own tradeoffs.

But, more fundamentally, you should think very carefully about whether your existing solution is actually optimal, or whether it had some advantages and it worked so you adopted it, but might not be the ideal behavior that induces people to write code that isn’t error-prone.

kiuhnm · June 18, 2024, 2:42am

I believe you’re not thinking recursively.

Please read my explanation until the end even if you think I’m not addressing your objection. It should be clear by the end.

g’s signature is a contract between g and its callers. It says that it may mutate its arg.

That doesn’t mean that, inside g, mutations to that arg shouldn’t be controlled.

If I give you full authority, I know you can do whatever you want, but that doesn’t mean that you will give full authority to your subordinates. The same is true for your subordinates if they have subordinates of their own.

Authority is re-modulated at each level. The only rule is that a level can’t have authority higher than its parent level.

As I said in previous posts, when you call a function f that promises not to mutate its arg, you, the caller, should give f the least privileges necessary to do the job. You do this by passing r(xs) instead of just xs.

r(xs) protects you from changes to f’s signature (just like changes in loyalty), whereas xs do not. If you give a subordinate full access and they change their allegiance later on, you’re scr**ed. If you follow the least privilege principle, you’re still safe.

r(xs) also protects you from misremembering signatures or resurrecting old code that it’s not valid anymore.

Here comes the most important part.

Why am I talking about r instead of w? Because w was introduced to avoid r.

Since 99% of the time, at least in my code, functions only require read-only access, instead of assuming write-access and having to call r to restrict it, I prefer the opposite: read-access is assumed, and I have to call w to grant write access.

With this system, xs is as safe as r(xs) since xs also implies read-only access. Since xs is passed in read-only mode 99% of the time, no access modifiers will be needed most of the time. Only when a function requires write access, will I need a modifier, i.e. w.

In my full system, there are 4 modes: R, W, RK, WK. When we receive a WK value, that value will be implicitly restricted to R, so we’ll have to lift the restriction to W, RK or WK as necessary. Obviously, if we receive, say, a W value, that will be restricted to R, but we won’t be able to lift the restriction to higher than W.

sageserpent-open · June 18, 2024, 5:39am

@kiuhnm Thanks for your follow-up; what you’re describing tallies with what I thought you had in mind.

In my view, anything in the call tree of g forms part of g’s abstraction - g’s caller shouldn’t have to worry about that. If the implementation of g really wants to restrict mutation of one of its arguments passed further down the call tree, it can cast that argument to the narrower read-only super type and pass that down. That’s what your hypothetical r would do.

My standpoint is that having to give an argument the r-treatment is cheap and cheerful, but not generally required unless g wants to enforce some strict local reasoning about state changes. You worry about breaking changes occurring as code in the call tree evolves, but these aren’t just about mutation versus non-mutation - your non mutative calls may just yield wrong results.

That what tests are for, after all.

No bother, this is a case of differing approaches, that’s all. Just wait until you come across the whole imperative versus pure-functional debate…

Anyway, welcome, and thanks for taking the time to go through this in detail.

sageserpent-open · June 18, 2024, 6:13am

Just occurred to me, I’ll shut up after this:

It seems that control of accidental mutation is the primary concern here. I’d strongly recommend experimenting with writing some code using the stuff in collections.immutable as per others’ advice.

I suspect one reason that I’m so laissez-faire about mutation control is that for me, most of the time there is are no mutations to worry about in the first place.