Here is the test program:

import com.oroinc.text.regex.*;
import java.io.*;

public class bug_report {
    public static void main(String[] args) throws Exception {
        String regex =
            "\010[(]GAME +GID:([^;]+); +GDATE:([^;]*); +GSTART:([^;]*); +GSITE:([^;]*); +GNEUTRAL:([^;]*); +GSTAT:([^;]*); +GPERIOD:([^;]*);[^\r\n]* [\r\n]+"
            +"("
            +"(\010[(]TEAM +TNAME:([^;]*);( +[^:]+:[^;]*;){3} +THOME: *([Yy][Ee][Ss]); +TSCORE:([^;]*); +TSTAT:([^;]*)[^\r\n]*[\r\n]+)"
            +"|"
            +"(\010[(]TEAM +TNAME:([^;]*);( +[^:]+:[^;]*;){3} +THOME: *([Nn][Oo]); +TSCORE:([^;]*); +TSTAT:([^;]*)[^\r\n]*[\r\n]+)"
            +"){2}";

        String input =
            "(GAME GID:13805; GDATE:11/01/2000; GSTART:19:30; GSITE:Charlotte Coliseum; GNEUTRAL:NO; GSTAT:Final; GPERIOD:4; \n"
            +"(TEAM TNAME:Hornets; TLOCALE:Charlotte; TCONF:Eastern; TDIV:Central; THOME:YES; TSCORE:77; TSTAT:LOST; TID:9;)\n"
            +"(TEAM TNAME:Wizards; TLOCALE:Washington; TCONF:Eastern; TDIV:Atlantic; THOME:NO; TSCORE:95; TSTAT:WON; TID:7;))\n";

        String input2 =
            "(GAME GID:13789; GDATE:10/31/2000; GSTART:19:30; GSITE:TD Waterhouse Centre; GNEUTRAL:NO; GSTAT:Final; GPERIOD:4; \n"
            +"(TEAM TNAME:Magic; TLOCALE:Orlando; TCONF:Eastern; TDIV:Atlantic; THOME:YES; TSCORE:97; TSTAT:WON; TID:5;)\n"
            +"(TEAM TNAME:Wizards; TLOCALE:Washington; TCONF:Eastern; TDIV:Atlantic; THOME:NO; TSCORE:86; TSTAT:LOST; TID:7;))\n";

        Perl5Compiler p5compiler = new Perl5Compiler();
        Perl5Pattern p5pattern = null;
        Perl5Matcher p5matcher = new Perl5Matcher();
        PatternMatcherInput p5input = new PatternMatcherInput(input2);

        try {
            p5pattern = (Perl5Pattern) p5compiler.compile(regex,
                Perl5Compiler.SINGLELINE_MASK | Perl5Compiler.READ_ONLY_MASK );
        } catch(MalformedPatternException e) {
            System.out.println("Error: Bad Perl5 pattern.");
            System.out.println(e.getMessage());
        }

        boolean result = p5matcher.matchesPrefix(p5input, p5pattern);
        if( result ) {
            MatchResult mr = p5matcher.getMatch();
            int groups = mr.groups();
            int start = -1;
            int end = -1;
            String matchStr = null;
            for( int x = 0; x < groups; x++ ) {
                start = mr.beginOffset(x);
                end = mr.endOffset(x);
                //matchStr = mr.group(x);
                //System.out.print ("Pos: "+x+"\tStart: "+start+"\tEnd: "+end+"\tMatch: "+matchStr);
                System.out.print("Pos: "+x+"\tStart: "+start+"\tEnd: "+end);
                if( start > end )
                    System.out.println( " -- ERROR" );
                else
                    System.out.println();
            }
        } else {
            System.out.println("No Match");
        }
        System.out.println("Program terminating");
    }
}

And here is some output:

Pos: 0   Start: 0    End: 338
Pos: 1   Start: 11   End: 16
Pos: 2   Start: 24   End: 34
Pos: 3   Start: 43   End: 48
Pos: 4   Start: 56   End: 76
Pos: 5   Start: 87   End: 89
Pos: 6   Start: 97   End: 102
Pos: 7   Start: 112  End: 113
Pos: 8   Start: 224  End: 338
Pos: 9   Start: 224  End: 224
Pos: 10  Start: 237  End: 237
Pos: 11  Start: 280  End: 295
Pos: 12  Start: 302  End: 192 -- ERROR
Pos: 13  Start: 201  End: 203
Pos: 14  Start: 211  End: 214
Pos: 15  Start: 224  End: 338
Pos: 16  Start: 237  End: 244
Pos: 17  Start: 280  End: 295
Pos: 18  Start: 302  End: 304
Pos: 19  Start: 313  End: 315
Pos: 20  Start: 323  End: 327
Program terminating

If you'll notice, Pos 12 and Pos 18 share the same Start value. In the regex they have the same pattern. Granted, there are many similar subpatterns; as a matter of fact, lines 2 and 3 of the pattern are almost exactly the same except for [Yy][Ee][Ss] and [Nn][Oo]...
Same problem with version 2.0.4.
This behavior is consistent with Perl 5.003_07 and is not a bug. The contents of a group are not guaranteed to be the last successful match when the group is contained within an alternation. In other words, group 18 is the valid match, while group 12 did not match anything on its last attempt.

In Perl5MatchResult, a group that failed to match on its last attempt as part of the NFA is indicated by a start offset greater than its end offset (this may be a documentation bug, since it may not appear in the javadocs), and group(int) returns null for such a group. Subgroups that weren't reached a final time during the NFA execution (perhaps because an earlier subgroup failed) retain their old values.

Later versions of Perl regularized the behavior of subgroups so that they always contain the last value matched, rather than a potentially empty value based on a final failed subgroup match attempt. Perl5Matcher will implement this behavior as part of the Perl 5.6 compatibility work.
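
For callers who want to tell the two cases apart, the start > end convention described above can be checked directly on the offsets. The following is only a minimal sketch: the class GroupOffsets and its methods matched and describe are hypothetical names introduced for illustration and are not part of the library, and the offsets in main are copied from the output in the report.

```java
public class GroupOffsets {
    // Hypothetical helper: per the explanation above, a group whose last
    // attempt failed is reported with a start offset greater than its end
    // offset. An empty match (start == end) still counts as a match.
    public static boolean matched(int start, int end) {
        return start <= end;
    }

    // Hypothetical helper: formats one line in the style of the test
    // program, labelling failed groups instead of calling them an error.
    public static String describe(int group, int start, int end) {
        String line = "Pos: " + group + "\tStart: " + start + "\tEnd: " + end;
        if (!matched(start, end)) {
            line += " -- no match on last attempt";
        }
        return line;
    }

    public static void main(String[] args) {
        // Offsets for groups 12 and 18, copied from the reported output.
        System.out.println(describe(12, 302, 192));
        System.out.println(describe(18, 302, 304));
    }
}
```

In the test program, the loop could pass mr.beginOffset(x) and mr.endOffset(x) to describe, and call mr.group(x) only when matched returns true, since group(int) returns null for a group that failed on its last attempt.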