When I put a '-' in a character class definition ('[...]'), there are some cases where a simple char in the definition is ignored. In such cases, the instructions in REProgram objects are not as expected. This may be related to bugs #2121 and #5212.

For example, '[a-zA]' works fine, while for '[Aa-z]' the 'A' is ignored, and for '[abcd\-]' the 'd' is ignored. The point is that the ignored char sits two chars before the '-'.

Near line 710 in RECompiler.java, we can see:

> // If simple character and not start of range, include it
> if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
> {
>     range.include(simpleChar, include);
> }

In my understanding, idx already points at the char after the simpleChar in question. The simpleChar should not be included when its next char (if any) is '-', because in that case the simpleChar becomes the start of a new range. Therefore, the following condition seems correct:

> if (idx >= len || pattern.charAt(idx) != '-')

I tried this fix on the CVS source tree last night, with some new test cases, and it worked fine. I can't be sure it has no side effects, but at least all tests in RETest.txt still pass. The diff output follows. Does this help?
Ikuya

Index: docs/RETest.txt
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/docs/RETest.txt,v
retrieving revision 1.3
diff -c -r1.3 RETest.txt
*** docs/RETest.txt 27 Feb 2001 08:37:05 -0000 1.3
--- docs/RETest.txt 28 Nov 2002 14:22:25 -0000
***************
*** 1011,1014 ****
--- 1011,1030 ----
  YES
  aaabc
+ #168
+ [a-zA]+
+ JakartaAnt
+ YES
+ akartaAnt
+ #169
+ [Aa-z]+
+ JakartaAnt
+ YES
+ akartaAnt
+
+ #170
+ [akrt\-]+
+ Jakarta-Ant
+ YES
+ akarta-
Index: src/java/org/apache/regexp/RECompiler.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RECompiler.java,v
retrieving revision 1.4
diff -c -r1.4 RECompiler.java
*** src/java/org/apache/regexp/RECompiler.java 27 Feb 2001 08:37:05 -0000 1.4
--- src/java/org/apache/regexp/RECompiler.java 28 Nov 2002 14:22:26 -0000
***************
*** 710,716 ****
      else
      {
          // If simple character and not start of range, include it
!         if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
          {
              range.include(simpleChar, include);
          }
--- 710,716 ----
      else
      {
          // If simple character and not start of range, include it
!         if (idx >= len || pattern.charAt(idx) != '-')
          {
              range.include(simpleChar, include);
          }
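The off-by-one described above can be reproduced with a small standalone model of the class-parsing loop. This is only a sketch, not the RECompiler source: the class name CharClassSketch, the parse method, and its lookahead parameter are all hypothetical, and the model assumes a class body made only of simple chars, 'x-y' ranges, and backslash escapes. Passing lookahead = 1 models the check currently in the code, and lookahead = 0 models the proposed one.

```java
import java.util.BitSet;

public class CharClassSketch {
    /**
     * Parses the text between '[' and ']' and returns the set of member chars.
     * Hypothetical model: lookahead = 1 mimics "pattern.charAt(idx + 1)",
     * lookahead = 0 mimics the proposed "pattern.charAt(idx)".
     */
    static BitSet parse(String body, int lookahead) {
        BitSet members = new BitSet(128);
        int len = body.length();
        int idx = 0;
        int last = -1;                 // previous simple char, -1 if none
        boolean definingRange = false;

        while (idx < len) {
            char c = body.charAt(idx);
            char simpleChar;
            if (c == '-') {
                // Only plain "x-y" ranges are modeled in this sketch.
                if (definingRange || last < 0 || idx + 1 >= len) {
                    throw new IllegalArgumentException("unsupported '-' at " + idx);
                }
                definingRange = true;
                idx++;
                continue;
            } else if (c == '\\' && idx + 1 < len) {
                simpleChar = body.charAt(idx + 1);   // escaped literal
                idx += 2;
            } else {
                simpleChar = c;
                idx++;
            }

            // From here on, idx points at the char AFTER simpleChar.
            if (definingRange) {
                int lo = Math.min(last, simpleChar);
                int hi = Math.max(last, simpleChar);
                members.set(lo, hi + 1);             // upper bound is exclusive
                definingRange = false;
                last = -1;
            } else {
                int peek = idx + lookahead;
                // Include simpleChar unless the peeked char says it starts a range.
                if (peek >= len || body.charAt(peek) != '-') {
                    members.set(simpleChar);
                }
                last = simpleChar;
            }
        }
        return members;
    }

    public static void main(String[] args) {
        String[] bodies = { "a-zA", "Aa-z", "abcd\\-" };
        for (String body : bodies) {
            System.out.println("[" + body + "]"
                + " current check: " + parse(body, 1)
                + " proposed check: " + parse(body, 0));
        }
    }
}
```

In this model, the current check drops 'A' from '[Aa-z]' and 'd' from '[abcd\-]', while the proposed check keeps both; '[a-zA]' gives the same set either way.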
*** Bug 15381 has been marked as a duplicate of this bug. ***
*** Bug 15455 has been marked as a duplicate of this bug. ***
*** Bug 16434 has been marked as a duplicate of this bug. ***
*** Bug 16214 has been marked as a duplicate of this bug. ***
Fixed by patch in bug #19329

*** This bug has been marked as a duplicate of 19329 ***
Fixed by Bug #19329