Bug 22804 - ArrayIndexOutOfBoundsException on negated classes
Summary: ArrayIndexOutOfBoundsException on negated classes
Status: CLOSED FIXED
Alias: None
Product: Regexp
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: PC All
Importance: P3 major
Target Milestone: ---
Assignee: Jakarta Notifications Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-08-28 20:08 UTC by Fernando Schapachnik
Modified: 2004-11-16 19:05 UTC
CC List: 0 users



Attachments
Testcase (810 bytes, text/plain)
2003-09-01 12:48 UTC, Fernando Schapachnik

Description Fernando Schapachnik 2003-08-28 20:08:15 UTC
I use this code as a "sanitizer" (i.e., it filters bad input from users) on JDK 1.3.1:

String allowed= "a-zA-Z0-9_@.: ñÑáéíóúÁÉÍÓÚ\r\n\\-";
RE r= new RE("[^"+allowed+"]");
output= r.subst(input, "_", RE.REPLACE_ALL);
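For comparison, here is a minimal sketch of the same sanitizer written against java.util.regex instead of org.apache.regexp. This is a swapped-in library, not the Jakarta Regexp API from the report, and it needs JDK 1.4 or later (the report is on 1.3.1). The class name Sanitizer is hypothetical. The accented letters are written as \uXXXX escapes so the source does not depend on the platform encoding.

```java
import java.util.regex.Pattern;

class Sanitizer {
    // Allowed characters from the report. Accented letters are given as
    // escapes so the file compiles identically under any platform encoding.
    private static final String ALLOWED =
        "a-zA-Z0-9_@.: "
        + "\u00f1\u00d1"                        // n-tilde, N-tilde
        + "\u00e1\u00e9\u00ed\u00f3\u00fa"      // lowercase a, e, i, o, u with acute
        + "\u00c1\u00c9\u00cd\u00d3\u00da"      // uppercase A, E, I, O, U with acute
        + "\r\n\\-";                            // CR, LF, and an escaped hyphen

    // Compile the negated class once; Pattern objects are immutable and thread-safe.
    private static final Pattern DISALLOWED = Pattern.compile("[^" + ALLOWED + "]");

    // Replace every character outside the allowed set with "_".
    static String sanitize(String input) {
        return DISALLOWED.matcher(input).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a\u00e9$.JOla^|-+_"));
    }
}
```

With the report's input, sanitize returns "aé_.JOla__-__": the $, ^, | and + characters are replaced, everything else is kept.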

When running:

sanitize("aé$.JOla^|-+_")

I get:

java.lang.ArrayIndexOutOfBoundsException: 16
        at org.apache.regexp.RECompiler$RERange.delete(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.remove(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.include(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.include(Unknown Source)
        at org.apache.regexp.RECompiler.characterClass(Unknown Source)
        at org.apache.regexp.RECompiler.terminal(Unknown Source)
        at org.apache.regexp.RECompiler.closure(Unknown Source)
        at org.apache.regexp.RECompiler.branch(Unknown Source)
        at org.apache.regexp.RECompiler.expr(Unknown Source)
        at org.apache.regexp.RECompiler.compile(Unknown Source)
        at org.apache.regexp.RE.<init>(Unknown Source)
        at org.apache.regexp.RE.<init>(Unknown Source)

This happens with both 1.2 and 1.3-dev (CVS) as of 28/Aug/2003.

Everything works if I use:

String allowed= "a-zA-Z0-9_@.: ñÑáéíóúÁÉÍÓÚ\r\\-"; (with \n removed)

The same happens with other characters inside the [^].

If any other info is needed, please let me know.
Comment 1 Vadim Gritsenko 2003-08-30 00:37:57 UTC
Fernando, you've got lots of funny characters in your program and I can't write
a program with those to test your bug. Please rewrite your test: use \uXXXX
encoding for non-ASCII characters.
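A minimal sketch of how a test can be converted to \uXXXX form: a hypothetical helper (UnicodeEscaper.escapeNonAscii is not part of any library) that rewrites every character above 0x7F as a Java-style escape, leaving ASCII untouched.

```java
class UnicodeEscaper {
    // Replaces every non-ASCII char (code unit above 0x7F) with a Java-style
    // backslash-u escape using four lowercase hex digits. ASCII passes through
    // unchanged, so quotes, backslashes and control characters are NOT escaped.
    // Supplementary characters come out as two escapes (one per UTF-16
    // surrogate), which is what Java source expects.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x7F) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The lowercase accented letters from the report, already written as escapes here.
        String accented = "\u00f1\u00d1\u00e1\u00e9\u00ed\u00f3\u00fa";
        System.out.println(escapeNonAscii(accented));
    }
}
```

Control characters are deliberately left alone: a \u000a escape inside a Java string literal is translated before lexing and breaks the literal, so newlines should stay as \n.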

Alternatively, you can attach your test and I can try to compile it on my
platform, but that might not work, as we might have different platform encodings.

Vadim
Comment 2 Fernando Schapachnik 2003-09-01 12:48:08 UTC
Created attachment 8020
Testcase
Comment 3 Fernando Schapachnik 2003-09-01 12:50:09 UTC
Hello Vadim,
Please find attached a program with the accented chars \u encoded.
That way the example blows up with or without the \n.

Thanks!

Fernando.
Comment 4 Fernando Schapachnik 2003-09-01 12:51:14 UTC
Sorry, forgot the trace:


java.lang.ArrayIndexOutOfBoundsException: 16
        at org.apache.regexp.RECompiler$RERange.delete(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.remove(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.include(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.include(Unknown Source)
        at org.apache.regexp.RECompiler.characterClass(Unknown Source)
        at org.apache.regexp.RECompiler.terminal(Unknown Source)
        at org.apache.regexp.RECompiler.closure(Unknown Source)
        at org.apache.regexp.RECompiler.branch(Unknown Source)
        at org.apache.regexp.RECompiler.expr(Unknown Source)
        at org.apache.regexp.RECompiler.compile(Unknown Source)
        at org.apache.regexp.RE.<init>(Unknown Source)
        at org.apache.regexp.RE.<init>(Unknown Source)
        at ar.gov.mecon.sso.testing.testre.sanitize(testre.java:25)
        at ar.gov.mecon.sso.testing.testre.main(testre.java:39)
Comment 5 Vadim Gritsenko 2003-09-01 18:20:36 UTC
OK, this exception should happen on any (negated?) character class with more
than 16 ranges (example: [^02468ACEGIKMOQSUWY]).
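A minimal sketch of why a negated class can outgrow a 16-entry range table, assuming (as the include, remove, delete frames in the trace suggest but do not prove) that the compiler starts from the full U+0000..U+FFFF range and removes each listed character, splitting ranges as it goes. CharRangeSet and its methods are hypothetical; this is not the actual RECompiler.RERange code, and the CVS fix may differ. In [^02468ACEGIKMOQSUWY], each of the 18 isolated characters splits a range, leaving 19 ranges, so any fixed 16-slot array must grow.

```java
import java.util.Arrays;

class CharRangeSet {
    // Hypothetical initial capacity, chosen to echo the "16" in the trace.
    private int[] mins = new int[16];
    private int[] maxs = new int[16];
    private int count = 0;

    // Double both arrays when full. Skipping this check is the kind of slip
    // that throws ArrayIndexOutOfBoundsException on the 17th range.
    private void ensureCapacity() {
        if (count == mins.length) {
            mins = Arrays.copyOf(mins, count * 2);
            maxs = Arrays.copyOf(maxs, count * 2);
        }
    }

    // Insert [min, max] at position index, shifting later ranges right.
    private void insertAt(int index, int min, int max) {
        ensureCapacity();
        System.arraycopy(mins, index, mins, index + 1, count - index);
        System.arraycopy(maxs, index, maxs, index + 1, count - index);
        mins[index] = min;
        maxs[index] = max;
        count++;
    }

    // Delete the range at position index, shifting later ranges left.
    private void deleteAt(int index) {
        System.arraycopy(mins, index + 1, mins, index, count - index - 1);
        System.arraycopy(maxs, index + 1, maxs, index, count - index - 1);
        count--;
    }

    // Start of a negated class: every char U+0000..U+FFFF is a member.
    void includeAll() {
        count = 0;
        insertAt(0, Character.MIN_VALUE, Character.MAX_VALUE);
    }

    // Remove [min, max], trimming or splitting any range it overlaps.
    // Ranges are kept sorted and disjoint.
    void remove(int min, int max) {
        for (int i = 0; i < count; i++) {
            int lo = mins[i];
            int hi = maxs[i];
            if (hi < min || lo > max) {
                continue;                      // no overlap
            }
            if (lo >= min && hi <= max) {      // fully covered: drop it
                deleteAt(i);
                i--;                           // re-examine the range shifted into slot i
            } else if (lo < min && hi > max) { // strictly inside: split in two
                maxs[i] = min - 1;
                insertAt(i + 1, max + 1, hi);
                i++;                           // the new upper half cannot overlap [min, max]
            } else if (lo < min) {             // removal covers the top of this range
                maxs[i] = min - 1;
            } else {                           // removal covers the bottom of this range
                mins[i] = max + 1;
            }
        }
    }

    int size() {
        return count;
    }

    boolean contains(char c) {
        for (int i = 0; i < count; i++) {
            if (c >= mins[i] && c <= maxs[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CharRangeSet set = new CharRangeSet();
        set.includeAll();
        for (char c : "02468ACEGIKMOQSUWY".toCharArray()) {
            set.remove(c, c);
        }
        System.out.println(set.size());
    }
}
```

Running main prints 19, which only works because ensureCapacity grows the arrays past their initial 16 slots.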
Comment 6 Vadim Gritsenko 2003-09-01 18:34:56 UTC
Fixed. Please test against the latest CVS.
Comment 7 Fernando Schapachnik 2003-09-01 20:10:46 UTC
Works Ok.

Thanks!