[SOLVED] Using `Future` combined with `synchronized`

In my application, I need to access a ZIP file, extract it to a temporary folder and provide the path of this folder. The respective ZIP files can get quite large, the ArchiveManager should implement the following features:

  • there should be only one instance of a manager per ZIP file (singleton)
  • the ZIP archive should be only extracted if the path to the extracted content is requested
  • the ZIP archive should be only extracted once
  • the extraction should be asynchronous

First thing, the implementation of the ArchiveManager:

final class ArchiveManager private(zipFile: Path) {
  private var extractedPath: Option[Path] = None

  def extracted(implicit ex: ExecutionContext = ExecutionContext.global): Future[Path] = synchronized {
    if (extractedPath.isDefined) {
      Future(extractedPath.get)
    } else {
      val f = extract(zipFile)
      f.foreach(path => extractedPath = Some(path))
      f
    }
  }

  private def extract(zipFile: Path): Future[Path] = ???
}

object ArchiveManager {
  private val instances = mutable.Map[Path, ArchiveManager]()

  def apply(path: Path): ArchiveManager = synchronized {
    if (!instances.contains(path)) {
      instances.put(path, new ArchiveManager(path))
    }

    instances(path)
  }
}

I have issues with the extracted function: Somehow it does not look right, that I provide a Future[T] but the whole thing is syncronized. But on the other hand, I don’t really know how I can ensure that a ZIP file is only once extracted.

Any ideas to improve my design?

How about you refactor it to a lazy future?

lazy val extractedPath: Future[Path] = extract(zipFile)

The first time you access it, it will trigger extract, but for all consecutive calls the path will be already extracted. Personally, I wouldn’t worry about a case when two first calls happen so quickly that there’s a risk they will trigger extract twice. If it’s really important, you can wrap the call in synchronized: lazy val extractedPath: Future[Path] = synchronized { extract(zipFile) } but I would first ensure if this is needed.

3 Likes

I did not know that Future[T]s can be consumed multiple times. If so, this could be a solution.

For all later calls, extractedPath will return an already completed future. But that looks like what you want, unless I’m missing something.

1 Like

Yes, that’s exactly what I want. Thank you!

Futures don’t have support for shared state, so you’ll have to pull in some additional construct. I’d prefer something higher level than synchronized, though. The Scala stdlib doesn’t have much much in this area, it’s common to drop to the Java stdlib. akka, cats-effect, etc. provide a variety of abstractions.

Your code contains (at least) one race condition - #extracted() will return an (incomplete) future before extractedPath is updated, so an immediate subsequent call for the same path will trigger a concurrent extraction.

Naive stab, untested, there will be some bug in there - there always is on the first take when I do concurrency, no matter how trivial… :expressionless:

class ArchiveManager(implicit execCtx: ExecutionContext) {

  import java.util.concurrent.atomic._

  private val extracted: AtomicReference[Map[Path, Future[Path]]] =
    new AtomicReference(Map.empty)

  def apply(path: Path): Future[Path] = {
    val paths =
      extracted.updateAndGet { 
        _.updatedWith(path)(_.orElse(Some(extract(path))))
      }
    paths(path)
  }

  private def extract(zipFile: Path): Future[Path] = ???

}
2 Likes

I just tested my implementation, and of course, you are right: Since Future is not blocking and returns immediately, the synchronized has virtually no effect.

But I don’t really understand your proposal: Why would the map of instances need to be an AtomicReference and why would it be in the class rather than in the object?

AtomicReference encapsulates atomic updates to the zipped/extracted map - it “replaces” synchronized.

My class replaces the singleton object in your code, “your” class is gone - I don’t see the need for a dedicated wrapper class for zipped/extracted pairs. It’s a class because I’d consider singletons that maintain mutable state a code smell - how do you get a clean instance for testing, what if at some point you want multiple instances, etc.

I added the singleton because I want that during one application session a managed archive is only extracted once. If within the application, two components within the application need the archive extracted, it should e extracted only once. Therefore, I thought, I’d need a singleton holding the ArchiveManager instances and so that no additional instance is created.

Then you should create one instance and share its reference with the two components. :slight_smile:

2 Likes

I now implemented that skeleton to understand what Future and synchronize do.:

I don’t get why supplyValue is invoked twice. Shouldn’t the synchronize prevent the other thread to enter the block before the first has finished?

With synchronized { Future { ??? } } you are again just creating the Future inside the synchronized block - the actual Future code runs elsewhere.

You can fix this by turning it around to Future { synchronized { ??? } }. However, I’d recommend against mixing in synchronized and plain mutable state altogether - this is subtle and error-prone. Future, AtomicXY, etc. are there exactly to avoid this.

As an example, the “fix” above should work, because value read/write are both synchronized on the Foo instance, so you get a “happens-before” relationship between them. In your main code, there’s no such link between the finished read/write, so the reading main thread is not guaranteed to ever see the write from the Future executor pool thread. (At least I think so - it’s been a while since I’ve been using low level threads, I may be confused…)

Just use higher-level concepts - that is, use AtomicReference for the value, sequence (or flatMap) and then Await the Futures in main.

3 Likes

Thank you for your response! It was late and I did not realize, that I haven’t moved the synchronized block inside the Future block.

I will use e higher-level concepts, I just wanted to understand why my solution did not work, and now I understand :slight_smile:

Doesn’t lazy val take care of that? Its initialization is implemented in a thread-safe matter.

1 Like

Probably? :slight_smile: I’m just not 100% sure. But I think so.

How could anyone use lazy if it wasn’t thread-safe? Maybe someone could clarify this. I’ll start a new thread.

I just never looked so closely into the implementation. Very often I use lazy in places where I don’t care about that (and then, in fact, if a lazy val is thread-safe, that’s an unnecessary overhead for me). As I wrote earlier, I’d use a simple lazy val, no synchronized, and only think about synchronized or another solution if this one was not enough.