This code throws an exception when run with jdk1.3.1_17:

    RE r123 = new RE("((a|b){1637})");
    r123.match("a");

This code works properly:

    RE r123 = new RE("((a|b){1638})");
    r123.match("a");

The following code shows that, depending on the repetition count requested, regexp switches between working and not working:

    boolean lastvalue = true;
    for (int i = 1; i < 3650; i += 1) {
        try {
            RE r = new RE("((a|b){" + i + "})");
            r.match("a");
            if (!lastvalue) {
                System.out.println("Switching from NOT to WORKING at " + i + " (" + i + " works) " + lastvalue);
            }
            lastvalue = true;
        } catch (Exception ex) {
            if (lastvalue) {
                System.out.println("Switching from WORKING to NOT at " + i + " (" + i + " doesn't work) " + lastvalue);
            }
            lastvalue = false;
        }
    }

If "i" were allowed past 3650, the behavior would switch back and forth a couple more times before 10000. I have seen it happen above 7000, which is as far as I let the test run.

In RE.java, look in the method with the following signature:

    protected int matchNodes(int firstNode, int lastNode, int idxStart)

Look for this line:

    next = node + (short)instruction[node + offsetNext];

Change it to:

    next = node + (int)instruction[node + offsetNext];

After recompiling and testing, the problem appears to go away, but I cannot confirm that the change doesn't break something else. I'm not sure why "short" was chosen over "int". Maybe there is a hidden reason.
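To make the effect of the two casts concrete, here is a minimal standalone sketch. It is not taken from RE.java: the class name, method names, and sample values are hypothetical, and it assumes only that a relative offset is stored in a single char and may be negative for backward jumps. Under that assumption, the (short) cast misreads forward offsets above Short.MAX_VALUE, while the (int) cast misreads every negative offset, which may be the "hidden reason" for the original choice.

```java
public class OffsetCastDemo {
    // Reads the stored offset as a signed 16-bit value, like the original line.
    static int nextWithShortCast(int node, char stored) {
        return node + (short) stored;
    }

    // Reads the stored offset as an unsigned 16-bit value, like the proposed change.
    static int nextWithIntCast(int node, char stored) {
        return node + (int) stored;
    }

    public static void main(String[] args) {
        int node = 100;                 // hypothetical current node index
        char forward = (char) 40000;    // forward offset larger than Short.MAX_VALUE
        char backward = (char) (-10);   // backward offset, stored as 0xFFF6

        System.out.println(nextWithShortCast(node, forward));   // -25436 (wrapped negative)
        System.out.println(nextWithIntCast(node, forward));     // 40100
        System.out.println(nextWithShortCast(node, backward));  // 90
        System.out.println(nextWithIntCast(node, backward));    // 65626 (backward jump lost)
    }
}
```

With one char per offset, neither cast can represent both large forward jumps and backward jumps at the same time, so changing the cast alone trades one failure for another.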
instruction is an array of chars, so each element holds a two-byte value. The offset from one instruction to another takes one char in the array, so it must lie within [Short.MIN_VALUE, Short.MAX_VALUE]. Some programs (such as a{8192}) are compiled by the current version into code exceeding this size (more than Short.MAX_VALUE instructions), so their offsets cannot be expressed correctly. Added a check for this condition to RECompiler.
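A range check of this kind could look roughly like the sketch below. This is not the code added to RECompiler: the class, the storeOffset helper, and the choice of IllegalArgumentException are all hypothetical, since the comment above does not say how the compiler reports the condition. It only illustrates rejecting an offset that a signed 16-bit read could not recover.

```java
public class OffsetRangeCheck {
    // Hypothetical helper: stores a relative offset in one char slot and
    // rejects values outside [Short.MIN_VALUE, Short.MAX_VALUE].
    static void storeOffset(char[] instruction, int index, int offset) {
        if (offset < Short.MIN_VALUE || offset > Short.MAX_VALUE) {
            throw new IllegalArgumentException(
                "Relative offset " + offset + " does not fit in a signed 16-bit slot");
        }
        instruction[index] = (char) offset;
    }

    // Reads the offset back with the same signed interpretation the matcher uses.
    static int loadOffset(char[] instruction, int index) {
        return (short) instruction[index];
    }

    public static void main(String[] args) {
        char[] instruction = new char[4];

        storeOffset(instruction, 0, -10);               // backward jump, fits
        storeOffset(instruction, 1, Short.MAX_VALUE);   // largest forward jump, fits
        System.out.println(loadOffset(instruction, 0)); // -10
        System.out.println(loadOffset(instruction, 1)); // 32767

        try {
            storeOffset(instruction, 2, Short.MAX_VALUE + 1); // one past the limit
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

Failing at compile time turns a silent wrong jump during matching into an explicit error for patterns that are too large.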