Why can't the Iterator work correctly?

curoli · September 13, 2018, 8:46pm

Reasons to use Iterator:

(1) There are potentially so many elements, you don’t want to have to keep them all in memory at the same time (e.g. traversing lines of a very large file)

(2) You want to start retrieving the first elements while later ones are not yet available (e.g. results from a database query or webservice)

(3) The elements are expensive to create, and maybe you don’t need them all (e.g. some expensive calculation)

(4) The number of elements is very large or even indefinite, but you only need the first ones (e.g. calculate prime numbers in sequence, keep going until I find one I like)

curoli · September 13, 2018, 9:23pm

Yes, I was wrong, you are right and the book is wrong.

MikuChaser · September 14, 2018, 5:08am

I think Martin Odersky may not want to create a lazy map method. Is it a bug?

martijnhoekstra · September 14, 2018, 7:50am

What Martin Odersky wants to create or not is not really germane to the discussion.

The behaviour I’m showing is exactly the behaviour promised by the Iterator contract. It’s not a bug.

If you want to be able to iterate over an iterator twice, duplicate it.

Welcome to Scala 2.12.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161).
Type in expressions for evaluation. Or try :help.

scala> val src = Iterator("a", "number", "of", "words")
src: Iterator[String] = non-empty iterator

scala> val (i1, i2) = src.duplicate //invalidate src, get two for the price of one
i1: Iterator[String] = non-empty iterator
i2: Iterator[String] = non-empty iterator

scala> val target = i1.map(_.length) //invlidate i1
target: Iterator[Int] = non-empty iterator

scala> target foreach println
1
6
2
5

scala> i2 foreach println
a
number
of
words

MikuChaser · September 14, 2018, 7:57am

OK. I will remember this usage.
Thanks.

hmf · September 14, 2018, 9:07am

No it is not a bug. For an explanation of why, look at:

Its seems this will change in 2.13 and map should retain its laziness.

siddhartha-gadgil · September 14, 2018, 10:26am

One may want to use an iterator only once to save memory, e.g. to iterate over all permutations.

regards,

Siddhartha

MikuChaser · September 14, 2018, 5:15pm

A new problem! If a iterator uses duplicate method, the old one is still non-empty. And if I exhaust the old one, the new two will become empty. Amazing!

SethTisue · September 14, 2018, 6:37pm

That seems to me like the expected behavior for duplicate. I don’t see any other way it could behave.

Jasper-M · September 14, 2018, 7:32pm

After calling duplicate on it you should discard it and only work with iterators that duplicate gives you.

MikuChaser · September 15, 2018, 3:21am

After using the duplicate method, shouldn’t it be exhausted automatically? In other words, if I accidentally use the old one, there will be a serious mistake, is it? But the book says “By contrast the original iterator, it, is advanced to its end by duplicate and is thus rendered unusable”. I expected the interpreter throws an exception if I use the old one rather than I should remember that the old one can’t be used. When the code is long, programmer may make a careless mistake.

MikuChaser · September 15, 2018, 3:23am

Why? I expected the interpreter throws an exception that prohibits me to use the old iterator.

SethTisue · September 17, 2018, 3:04am

You can certainly imagine a design where invalidated iterators are forcibly invalidated by either rendering them empty (but without actually consuming any items), or by making them throw a runtime error if any of their methods are called.

Such a design hadn’t occurred to me. I’m not sure if such a design was ever considered.

martijnhoekstra · September 17, 2018, 10:06am

Expecting anything after you used a method on an Iterator that’s not isEmpty, next or hasNext is wrong – it’s undefined behaviour.

This coincidentally is the third thread about iterator re-use I’ve seen in a short period of time. What’s the underlying cause of this question?

curoli · September 18, 2018, 2:29pm

Not sure about the timing, but Iterators can be confusing to Scalani, because the API looks so much like that of an immutable standard Scala collection (i.e. things we use every day), but it does not behave like one.

Ironically, you do not need Iterators. Something like a lazy linked list could do the job at least as well.

Jasper-M · September 18, 2018, 2:44pm

Isn’t the problem with a lazy list or stream always that it’s way too easy to hold on to the head by accident, preventing all subsequent elements from being garbage collected?
Isn’t a View (in 2.13 at least…) supposed to be more or less an Iterator that can be reused?

martijnhoekstra · September 18, 2018, 4:10pm

Not always, lazy list still memorizes, and prevents collection if you hold on to its head. Iterators don’t do that.

curoli · September 18, 2018, 4:42pm

Right, you’d have to always discard the head and keep the tail to replace an iterator.

curoli · September 19, 2018, 2:15pm

Absolutely right. You would typically want to obtain the lazily linked list and immediately consume and discard it - just like you would with an iterator.

If you start holding onto a lazily linked list and pass it around like candy, then the benefit of the laziness gets lost. On the other hand, do the same with an iterator and likely you will find it in a state different from what you expected.

AKAIK, views keep the entire collection in memory, so they are no substitute.

If the Scala community prefers iterators to lazily linked lists, that would be the first time I am more FP-minded than the community.

charpov · September 20, 2018, 1:06pm

I suppose you could do something similar to Java’s fail-fast iterators: use a flag to detect that a method has been called and check for it everywhere to throw an IllegalStateException explicitly. Small runtime cost but would avoid incorrect code that appears to be working fine until an internal implementation changes.