Do I really need to close a file when finished?

jimka · July 9, 2019, 8:22am

I’m using several functions based on the following model. IntelliJ warns me that Source is not closed, and suggests that I use try/finally to manage the resources.

Is this really a problem? Do I need to store Source.fromFile(fileName) in a variable so I can close it in a finally block? Or should I just let the garbage collector take care of that?

  def parseFileByName(fileName: String): DimacsFile = {
    parseBufferedContent(Source.fromFile(fileName).buffered, Nil, Nil, Problem("", 0, 0))
  }

sangamon · July 9, 2019, 9:47am

Use scala-arm, Using, or similar.

I don’t see a finalizer declared for Source/BufferedSource, so GC won’t trigger a #close(). But relying on GC to clean up non-memory resources is not a good idea, anyway - given sufficient memory, you could run out of file handles long before a GC is triggered, for example.

tarsa · July 9, 2019, 11:06am

Forgetting to close a file will also (in addition to leaking resources) prevent Windows from deleting that file. Consider the following scenario:

you create a temporary folder, put some files in it
you iterate over it using java.nio.file.Files.newDirectoryStream
you forget to close the directory stream
now you can’t delete that temporary folder as long as your process is alive (unless finalizers close the stream, but you can’t count on that)

I had such a problem when developing tests and it was hard to spot.
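A minimal sketch of the scenario above with the directory stream closed explicitly. The object and method names (DirectoryStreamSketch, listNames) are hypothetical and only serve as illustration; Files.newDirectoryStream is the call mentioned in the post.

```scala
import java.nio.file.{Files, Path}

object DirectoryStreamSketch {
  // Hypothetical helper: collects the file names in `dir` and always
  // closes the directory stream, even if iteration throws.
  def listNames(dir: Path): List[String] = {
    val stream = Files.newDirectoryStream(dir)
    try {
      val it = stream.iterator()
      val names = List.newBuilder[String]
      while (it.hasNext) names += it.next().getFileName.toString
      names.result()
    } finally {
      stream.close() // releases the directory handle so the folder can be deleted later
    }
  }
}
```

With the stream closed in finally, a test can delete the listed files and then the temporary folder itself once listNames has returned.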

jimka · July 9, 2019, 11:10am

I never work with Windows, so clues like that are pretty useful. Thanks @tarsa.

curoli · July 9, 2019, 12:43pm

It is considered best practice to be on the safe side and yes, use try/finally.

Is it necessary? Good question.

Opening a file consumes resources, because the system will provide input and output buffers, which can be substantial. On some systems, I heard, there is a hard limit on the number of files one can open. I suspect the limit is quite large on modern systems. Memory consumption will effectively limit the number of open files, though I am not sure by how much. This, by the way, is operating system memory, not JVM heap space.

If you close the source as soon as you are done, you know the resources have been released again, i.e. the buffers have been flushed and de-allocated.

If you do not close it yourself, the JVM will close it when the file handle is garbage collected, or when the JVM terminates. You may not know when that will happen, and it may not happen soon. AFAIK, garbage collection is triggered by heap space consumption, not by the number of file handles used.
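For reference, a minimal try/finally sketch that also works before Scala 2.13. The helper countLines is hypothetical and stands in for whatever actually consumes the Source, such as the parser in the original question.

```scala
import scala.io.Source

object CloseSourceSketch {
  // Hypothetical helper: counts the lines of a file and closes the
  // Source whether or not reading succeeds.
  def countLines(fileName: String): Int = {
    val source = Source.fromFile(fileName) // a val is enough, no var needed
    try {
      source.getLines().size
    } finally {
      source.close()
    }
  }
}
```

The result has to be fully computed inside the try block: an iterator returned from it would be reading from a Source that is already closed.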

jimka · July 9, 2019, 12:53pm

Why doesn’t the file just close when the object goes out of scope? Shouldn’t it ideally get closed whenever it can no longer be accessed?

sangamon · July 9, 2019, 2:01pm

On the JVM, there simply is no deterministic runtime concept of “going out of scope” like there is in C++. There’s only finalizers, and good reasons to stay away from them.

jducoeur · July 9, 2019, 3:50pm

Yeah – it’s generally accepted that any sort of external resource should be formally closed when you’re done with it; otherwise, you’re just begging for trouble, as leaks kill your system unpredictably. There are plenty of different ways to ensure that you close things properly (a few are alluded to above), but you should be sure to do it somehow; it usually won’t just happen magically, because scope doesn’t work that way in the JVM world unless you deliberately build that into your code.

(The second-most serious bug Querki has ever had was because of this. For months, my entire cluster would occasionally crash. I eventually traced the problem to a missing .close() call.)

curoli · July 9, 2019, 5:40pm

A Source, if connected to a file, contains a FileInputStream, which contains a FileDescriptor. Once the FileDescriptor gets GC’ed, the file will be closed.

Objects are not guaranteed to be GC’ed immediately after their last reference goes out of scope, although they may be. They may be collected soon after, as a GC typically does some partial collections that remove newer objects first, but there is no guarantee.

sangamon · July 9, 2019, 6:24pm

More precisely, the FileInputStream has a #finalize() implementation that calls its #close(), which in turn calls the FileDescriptor’s #closeAll(). This finalizer may be invoked at some point in time after the stream instance (and thus its referring source instance) have become unreachable, but it’s only guaranteed to be invoked before the storage previously occupied by the source instance is reused (JLS §12.6) - or maybe even not at all, if the memory is not reused and/or the JVM exits without running finalizers. That’s a pretty weak guarantee.

Furthermore, this is just the status quo of the current internal implementation of Source, which is nothing I’d rely on. From the public contract of Source, i.e. its interface and Scaladoc, all we know is that it implements Closeable and thus should be (explicitly) closed.

tyohDeveloper · July 9, 2019, 6:55pm

The finalize capability is just there to reduce the number of problems the OS has to deal with. It might seem like a cool thing for the developer, but it isn’t anything a developer can use. There isn’t any way for the VM to determine whether finalizers have run successfully, even if it has a chance to run a full GC.

It’s one of the many reasons we insisted that parsers ran in separate processes, not just in different threads. At least on our bigger systems. Being able to kill the majority of processes that might have I/O hang ups makes us able to keep the core running while under attack or other I/O problems. The large system equivalent of, “Have-you-rebooted-your-system?”

That’s massive overkill for most small systems and individual programs. Having the programmer close files when done with them, or in a try/finally block at worst, was a standard part of design reviews.

cbley · July 10, 2019, 6:30am

In Scala 2.13, there is a Using helper one can use to manage resources and release them automatically.

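import scala.io.Source
import scala.util.Using

def parseFileByName(fileName: String): DimacsFile = {
  // Pass the Source itself to Using: it is Closeable, while the
  // BufferedIterator returned by .buffered is not.
  Using.resource(Source.fromFile(fileName)) { source =>
    parseBufferedContent(source.buffered, Nil, Nil, Problem("", 0, 0))
  }
}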

tyohDeveloper · July 11, 2019, 2:24am

It looks like the JVM people are killing finalize.

https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8212050

curoli · July 11, 2019, 11:14am

From what I heard, the fact that FileInputStream.close() also closes the FileDescriptor was a bug, because someone else might still be using that FileDescriptor.

I also think that the JVM treats FileDescriptor specially in that it will close a file as soon as it discovers that the FileDescriptor is unreachable, but I can’t find any documented guarantee for that, so I agree it’s good not to rely on it.

philipschwarz · July 13, 2019, 6:33am

Hello @jimka.

The latest edition of Programming in Scala (PiS) predates Scala 2.13 (which introduces Using), and so tells us how to roll our own control structure to implement a coding pattern that it calls the loan pattern.

From PiS section 9.4:

Consider now a more widely used coding pattern: open a resource, operate on it, and then close the resource. You can capture this in a control abstraction using a method like the following:

  def withPrintWriter(file: File, op: PrintWriter => Unit) = {
    val writer = new PrintWriter(file)
    try {
      op(writer)
    } finally {
      writer.close()
    }
  }

Given such a method, you can use it like this:

  withPrintWriter(
    new File("date.txt"),
    writer => writer.println(new java.util.Date)
  )

The advantage of using this method is that it’s withPrintWriter, not user code, that assures the file is closed at the end. So it’s impossible to forget to close the file. This technique is called the loan pattern, because a control-abstraction function, such as withPrintWriter, opens a resource and “loans” it to a function. For instance, withPrintWriter in the previous example loans a PrintWriter to the function, op. When the function completes, it signals that it no longer needs the “borrowed” resource. The resource is then closed in a finally block, to ensure it is indeed closed, regardless of whether the function completes by returning normally or throwing an exception.

It then uses currying to provide a version of withPrintWriter that is nicer to use:

    def withPrintWriter(file: File)(op: PrintWriter => Unit) = {
      val writer = new PrintWriter(file)
      try {
        op(writer)
      } finally {
        writer.close()
      }
    }

It then provides a sample usage:

  val file = new File("date.txt")
  
  withPrintWriter(file) { writer =>
    writer.println(new java.util.Date)
  }
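The same idea can be generalized beyond PrintWriter. The following is a sketch only and does not come from the book: withResource is a hypothetical name, and it accepts any AutoCloseable and returns whatever the function produces.

```scala
object LoanSketch {
  // Hypothetical generic loan: lends `resource` to `op`, returns op's
  // result, and closes the resource whether op returns or throws.
  // Limitation: if both op and close() throw, the exception from
  // close() replaces the one from op (no suppression chaining).
  def withResource[R <: AutoCloseable, A](resource: R)(op: R => A): A =
    try op(resource)
    finally resource.close()
}
```

Usage could look like LoanSketch.withResource(scala.io.Source.fromFile("date.txt"))(_.getLines().toList), which returns a List[String] and closes the Source afterwards.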

sangamon · July 14, 2019, 7:30pm

That’s a good start, but this pattern needs to be reimplemented for every usage scenario. In the long run, one probably wants a reusable library that ideally supports these concerns:

Applicable to arbitrary “releaseable” resources
Returns a (properly typed) value from execution
Reasonable non-nesting syntax for multiple resources
Proper exception suppression chaining
Failure mode wrapping to Either or Try

All of these are covered by both scala-arm and Using. I haven’t worked with Using yet, and will give it a try, but I think I personally prefer scala-arm’s “monadic” (#flatMap()/for-expression) based approach to point 3 above.
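As an illustration of how Using (Scala 2.13) addresses several of these points, here is a small sketch with Using.Manager. The object and method names (UsingManagerSketch, copyFirstLine) are hypothetical; Using.Manager itself is part of scala.util.

```scala
import java.io.{BufferedReader, FileReader, PrintWriter}
import scala.util.{Try, Using}

object UsingManagerSketch {
  // Hypothetical helper: copies the first line of `in` to `out`.
  // Both resources are registered with the manager, which closes them
  // in reverse order of acquisition. Any failure is returned as a
  // Failure instead of being thrown.
  def copyFirstLine(in: String, out: String): Try[String] =
    Using.Manager { use =>
      val reader = use(new BufferedReader(new FileReader(in)))
      val writer = use(new PrintWriter(out))
      val line = reader.readLine() // null if the input file is empty
      writer.println(line)
      line
    }
}
```

Multiple resources are acquired without nesting, the block returns a typed value, and the result is wrapped in a Try. Using.resource, shown earlier in the thread, is the variant that rethrows instead of wrapping.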