Fetch sub-collection of List

I have a very big list. I want to do something like this:

list.sample(xx) // returns a sub-list containing xx random elements of the list

How can I implement this? Thank you.

This may not be the fastest implementation, but it would be quick and easy to code.

You can use a combination of Random.shuffle and take to implement this.

See this blog post: How to shuffle (randomize) a list in Scala (List, Vector, Seq, String) | alvinalexander.com

The shuffling algorithm should have a complexity of O(n), since it has to visit every element of the list.
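A minimal sketch of the shuffle-and-take approach described above. The object name `ShuffleSample`, the method name `sample`, and the optional `rng` parameter are made up for illustration; only `Random.shuffle` and `take` come from the standard library.

```scala
import scala.util.Random

object ShuffleSample {
  // Returns up to n elements of xs, chosen at random without replacement.
  // The whole list is shuffled into a new list first, so time and memory
  // both grow with the size of xs, not with n.
  def sample[A](xs: List[A], n: Int, rng: Random = new Random()): List[A] =
    rng.shuffle(xs).take(n)
}
```

For example, `ShuffleSample.sample((1 to 100).toList, 5)` gives back five distinct numbers between 1 and 100. If n is larger than the list, `take` simply returns the whole shuffled list.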

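A more efficient way would be to randomly select indices and stick them in a Set until the Set reaches the desired size, then look up the elements at those indices. Note that indexed access on a List is linear in the index, so this works best on an indexed collection such as a Vector or an Array.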
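A rough sketch of the random-indices idea, assuming the data is available as an `IndexedSeq` (a big `List` could be converted once with `toVector`). The names `IndexSample` and `sample` are hypothetical, and the sketch assumes the requested size is much smaller than the collection; otherwise the loop spends a long time hitting indices it has already drawn.

```scala
import scala.collection.mutable
import scala.util.Random

object IndexSample {
  // Draws random positions until n distinct ones have been collected,
  // then reads the elements at those positions.
  def sample[A](xs: IndexedSeq[A], n: Int, rng: Random = new Random()): Vector[A] = {
    require(n >= 0 && n <= xs.size, "n must be between 0 and xs.size")
    val chosen = mutable.LinkedHashSet.empty[Int]
    // Adding an index that is already present leaves the set unchanged,
    // so the loop only ends once n different indices are stored.
    while (chosen.size < n) chosen += rng.nextInt(xs.size)
    chosen.iterator.map(i => xs(i)).toVector
  }
}
```

Usage would look like `IndexSample.sample(bigList.toVector, 100)`, where `bigList` stands in for the original list.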

I'm afraid shuffle won't be efficient here, since the list is very big (more than 10 million elements). Thank you.

This might work for you.

Whoops, I made a little mistake with it. This should be better: Scastie - An interactive playground for Scala.

Well, you could try searching for known algorithms that do this.
I would guess that a simple approach would be something like this: Scastie - An interactive playground for Scala.

Let me know if you have any questions about the implementation.
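One known algorithm of this kind is reservoir sampling (Algorithm R). The sketch below is not necessarily what the linked Scastie snippet does; it is only an illustration, and the names `ReservoirSample` and `sample` are made up. It walks the list once, never indexes into it, and keeps only k elements in memory, which suits a very large List.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

object ReservoirSample {
  // Single pass over xs; returns k elements chosen uniformly at random
  // without replacement (or all of xs if it has fewer than k elements).
  def sample[A](xs: Iterable[A], k: Int, rng: Random = new Random()): Vector[A] = {
    require(k >= 0, "k must be non-negative")
    val reservoir = ArrayBuffer.empty[A]
    var seen = 0 // number of elements consumed so far
    val it = xs.iterator
    while (it.hasNext) {
      val x = it.next()
      seen += 1
      if (reservoir.size < k) {
        reservoir += x // fill the reservoir with the first k elements
      } else {
        // Keep the new element with probability k / seen by drawing a
        // slot in [0, seen) and replacing only if it falls inside the reservoir.
        val j = rng.nextInt(seen)
        if (j < k) reservoir(j) = x
      }
    }
    reservoir.toVector
  }
}
```

The result comes back in reservoir order rather than in the order the elements appear in the list, so sort or shuffle it afterwards if the order matters.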

I see that there’s a second thread about this over at How do you think of my subset function? — I’ll respond there.

Could Scala officially add some of the functions that are available in Breeze, for instance the sample() function? Currently I have to import Breeze for this kind of data analysis requirement.

It seems normal to me that you would use a separate library for that.
