[SOLVED] Regular Expressions in Scala - Capture group problem

I have trouble getting all capture groups from a regular expression. I believe I am doing it wrong, but I don't quite know what the correct approach would be.

Given regular expression: ^(([A-Za-z0-9]+)(?::((?:[a-z]{2})|\*))?\s*(=)\s*)?("?)([^"\s]+)\5$
Given value to match: property:xy = "value"

In Java I would do something like this:

import java.util.regex.Pattern;

class Scratch {
    public static void main(String[] args) {
        var regex = "^(([A-Za-z0-9]+)(?::((?:[a-z]{2})|\\*))?\\s*(=)\\s*)?(\"?)([^\"\\s]+)\\5$";
        var string = "property:xy = \"value\"";

        var matcher = Pattern.compile(regex).matcher(string);

        if (matcher.matches()) {
            System.out.println(">>> Count: " + matcher.groupCount());
            System.out.println(matcher.group(2));
            System.out.println(matcher.group(3));
            System.out.println(matcher.group(4));
            System.out.println(matcher.group(6));
        }
    }
}

Now, in Scala, I tried this:

val regex = "^(([A-Za-z0-9]+)(?::((?:[a-z]{2})|\\*))?\\s*(=)\\s*)?(\"?)([^\"\\s]+)\\5$".r
val string = "property:xy = \"value\""
val result = regex.findAllIn(string) // iterate over the matches in `string`

The size of the result is 1, but I would expect 6.

How can I accomplish in Scala what I did in Java?

findAllIn returns a MatchIterator, which is "a special scala.collection.Iterator that returns the matched strings but can also be queried for more data about the last match". Its size is the number of full matches, so 1 is the correct result. If you call result.groupCount instead, you get 6, the number of capture groups you expected. Note that, because it is an Iterator, calling a method like size consumes it, so read the groups before doing anything like that.
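A minimal sketch of reading the groups through the MatchIterator itself, assuming you only care about the first match. The object and helper names (MatchIteratorGroups, firstMatchGroups) are made up for illustration. The iterator has to be advanced with next() before group(i) refers to a match:

```scala
import scala.util.matching.Regex

object MatchIteratorGroups {
  // The regex from the question; capture groups 1..6 as in the Java version.
  val regex: Regex =
    "^(([A-Za-z0-9]+)(?::((?:[a-z]{2})|\\*))?\\s*(=)\\s*)?(\"?)([^\"\\s]+)\\5$".r

  // Hypothetical helper: all capture groups of the first match, or Nil if nothing matched.
  def firstMatchGroups(input: String): List[String] = {
    val it = regex.findAllIn(input)
    if (it.hasNext) {
      it.next() // advance to the first match so the group accessors refer to it
      (1 to it.groupCount).map(i => it.group(i)).toList
    } else Nil
  }

  def main(args: Array[String]): Unit = {
    val input = "property:xy = \"value\""
    // groupCount is the number of capture groups in the pattern, not the number of matches
    println(regex.findAllIn(input).groupCount)
    firstMatchGroups(input).zipWithIndex.foreach { case (g, i) =>
      println(s"group ${i + 1}: $g")
    }
  }
}
```

For the input from the question this should list groups 1 to 6, where group 1 is the whole optional "property:xy = " prefix (including the trailing space).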

If your input should always contain at most one match of the whole regex, you can use findFirstMatchIn instead. It returns an Option[Match] (None if nothing matched), and the Match can be queried for the groups like a Java Matcher. If you are looking for several matches but still want the Match objects, there is findAllMatchIn.
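A rough sketch of both methods. The names MatchObjects, firstGroups, pairs and the second pattern pairRegex are hypothetical and only there for illustration; findFirstMatchIn, findAllMatchIn, group and subgroups are the standard library calls. Because the question's regex is anchored with ^ and $, it can match at most once, so findAllMatchIn is shown with a separate unanchored pattern:

```scala
import scala.util.matching.Regex

object MatchObjects {
  // The regex from the question.
  val regex: Regex =
    "^(([A-Za-z0-9]+)(?::((?:[a-z]{2})|\\*))?\\s*(=)\\s*)?(\"?)([^\"\\s]+)\\5$".r

  // findFirstMatchIn returns Option[Regex.Match]. subgroups lists groups 1..groupCount,
  // with null for optional groups that did not take part in the match.
  def firstGroups(input: String): Option[List[String]] =
    regex.findFirstMatchIn(input).map(_.subgroups)

  // Hypothetical unanchored pattern, only to show findAllMatchIn yielding several matches.
  val pairRegex: Regex = """(\w+)=(\w+)""".r

  // findAllMatchIn returns an Iterator[Regex.Match], one element per match.
  def pairs(input: String): List[(String, String)] =
    pairRegex.findAllMatchIn(input).map(m => m.group(1) -> m.group(2)).toList

  def main(args: Array[String]): Unit = {
    println(firstGroups("property:xy = \"value\""))
    println(firstGroups("value")) // the optional prefix is absent, so groups 1 to 4 are null
    println(pairs("a=1 b=2"))
  }
}
```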

Also note that regexes in Scala can be used as extractor objects, i.e. you can use them in pattern matching:

string match {
  case regex(g1, g2, g3, g4, g5, g6) =>
    // g1 ... g6 contain the matched strings here, unmatched optional groups will be null
  case _ =>
    // regex didn't match
}

This is the way recommended by the scala.util.matching.Regex documentation.
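A slightly fuller sketch of the extractor approach, assuming you would rather have Option than null for the optional parts. The Assignment case class, its field names (name, qualifier, value) and the parse helper are made up for illustration; only the Regex extractor comes from the standard library. Keep in mind that the extractor requires the regex to match the entire string:

```scala
import scala.util.matching.Regex

object ExtractorExample {
  // The regex from the question.
  val regex: Regex =
    "^(([A-Za-z0-9]+)(?::((?:[a-z]{2})|\\*))?\\s*(=)\\s*)?(\"?)([^\"\\s]+)\\5$".r

  // Hypothetical result type, for illustration only.
  final case class Assignment(name: Option[String], qualifier: Option[String], value: String)

  def parse(input: String): Option[Assignment] = input match {
    // `_` skips the groups we don't need (whole prefix, "=", and the optional quote).
    case regex(_, name, qualifier, _, _, value) =>
      // Option(null) is None, so unmatched optional groups become None.
      Some(Assignment(Option(name), Option(qualifier), value))
    case _ =>
      None // the regex didn't match the whole input
  }

  def main(args: Array[String]): Unit = {
    println(parse("property:xy = \"value\""))
    println(parse("value"))
    println(parse("a b"))
  }
}
```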


Thank you, I will try that!