Bug 9153 - RE.match() hangs when using {n,m}
Summary: RE.match() hangs when using {n,m}
Status: CLOSED FIXED
Alias: None
Product: Regexp
Classification: Unclassified
Component: Other (show other bugs)
Version: unspecified
Hardware: PC All
: P3 normal with 5 votes (vote)
Target Milestone: ---
Assignee: Jakarta Notifications Mailing List
URL:
Keywords:
: 5243 10940 15775 23303 23338 27670 28926 31719 (view as bug list)
Depends on:
Blocks:
 
Reported: 2002-05-16 09:12 UTC by Thomas Letsch
Modified: 2007-12-14 05:59 UTC (History)
9 users (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Letsch 2002-05-16 09:12:39 UTC
Hi,

I have found a regular expression and accoording String which hangs the JVM:
RE: "^((\w|\s){1,50})$"
String to match: "Controlling &Cheques"

It will hang when I call
RE re = new RE("^((\w|\s){1,50})$");
re.match("Controlling &Cheques");

Thanks for any help.

Regards,
Thomas
Comment 1 Thomas Letsch 2002-05-16 11:24:03 UTC
Sorry, but it should be 
RE re = new RE("^([\w\s]{1,50})$");
re.match("Controlling &Cheques");


But the problem remains the same..
Comment 2 Thomas Letsch 2002-05-16 11:39:42 UTC
It seems to me that the processing time grows exponentially with the amount of 
characters in the Search String. 
Especially the amount of chars before the special char (e.g. "&").
A String "&asfbd" will get through fast, but a String "asfbd&" will take much 
more time.
Comment 3 Vadim Gritsenko 2003-04-25 18:16:31 UTC
*** Bug 5243 has been marked as a duplicate of this bug. ***
Comment 4 Vadim Gritsenko 2003-05-02 02:02:58 UTC
*** Bug 15775 has been marked as a duplicate of this bug. ***
Comment 5 Vadim Gritsenko 2003-05-02 02:04:09 UTC
*** Bug 10940 has been marked as a duplicate of this bug. ***
Comment 6 Vadim Gritsenko 2003-09-23 00:29:50 UTC
*** Bug 23338 has been marked as a duplicate of this bug. ***
Comment 7 Vadim Gritsenko 2003-09-23 00:35:53 UTC
*** Bug 23303 has been marked as a duplicate of this bug. ***
Comment 8 Sue Andarmani 2004-01-23 00:21:58 UTC
Here is an re and input combo that hang forever:
C.{1,2}C.{5,45}[VMFLWIE].C.{1,4}C.{1,4}[WYFVQHLT]H.{2}C.{5,45}[WFLYI].C.{2}C

MEMKKKINMELKNRAPEEVTELVLDNCLCVNGEIEGLNDTFKELEFLSMANVELSSLARLPSLNKLRKLELSDNIISGGL
EVLAEKCPNLTYLNLSGNKIKDLSTVEALQNLKNLKSLDLFNCEITNLEDYRESIFELLQQITYLDGFDQEDNEAPDSEE
EDDDDEDGDEDEEDEDEDEAGPPEGYEEEEDDDEDEAGSEVGEGEEEVGLSYLMKDEIQDEEDDDDYVDEGEEEEEEEEE
GLRGEKRKRDAEDDGEEDDD

Thanks in advance for a patch!
Comment 9 Sue Andarmani 2004-01-23 17:09:04 UTC
Another re that hangs match:
^.{10,115}[DENF][ST][LIVMF][LIVSTEQ]V.[AGP][STANEQPK]

this was created as re = newRE("the above pattern") and
either re.match or re.getParenEnd(0) has been hanging
for 15 hours already now.
environment is j2sdk1.4.1_02 on Linux.
To be fair, these bog down the C regex somewhat too, but
nothing like this.

smaller {n,m} ranges are matching ok.
Comment 10 Oleg Sukhodolsky 2004-03-16 08:22:29 UTC
I think that the cause of the problem is the way we compile <something>{n,m} 
constuction.  As far as I can see, program for it is equal to
<something>{n}(<something>|)...(<something>|) (this has exponentional 
complexity)
So, during matching we have no way to optimize this.
Possible way to fix the problem would be adding new operation for this 
construction, so we process it faster.
Comment 11 Vadim Gritsenko 2004-03-17 15:47:24 UTC
*** Bug 27670 has been marked as a duplicate of this bug. ***
Comment 12 Vadim Gritsenko 2004-05-12 14:55:35 UTC
*** Bug 28926 has been marked as a duplicate of this bug. ***
Comment 13 Vadim Gritsenko 2005-08-11 05:15:18 UTC
*** Bug 31719 has been marked as a duplicate of this bug. ***
Comment 14 Alexandre 2005-11-17 18:38:46 UTC
Here is another combination that hangs it for quite some time:
pattern: ^[a-zA-Z0-9 ]{1, 35}$
input : This is a sample string (remove it)

I have observed a lot of recursive activity going on in debugger while matching
this pattern. Interestingly enough the pattern ^[a-zA-Z0-9 ]*$ works really fast.
Comment 15 Vadim Gritsenko 2007-03-12 20:01:52 UTC
fixed in trunk
Comment 16 Christian Nolte 2007-05-09 01:48:48 UTC
Using jakarta-regexp 1.5 and 1.6-dev (rev. 536456) I have the same problem with
the following regexp:

Pattern: <([^<]*[^<]+)+>
Input: <a>remove it</a>


Comment 17 Christian Nolte 2007-05-09 01:56:15 UTC
Sorry. The wrong input string. The input which goes into an endless loop is:

<P align="center"><STRONG><FONT >test&nbsp;&nbsp; test</FONT></STRONG></P><P
style=TEXT-ALIGN: left><FONT>
Comment 18 Vadim Gritsenko 2007-05-09 05:09:47 UTC
(In reply to comment #16)
> Using jakarta-regexp 1.5 and 1.6-dev (rev. 536456) I have the same problem

It is not the same problem since you do not have {,} in your regexp.


> with the following regexp:
> 
> Pattern: <([^<]*[^<]+)+>

I don't understand what you meant here. Wouldn't it be exactly the same as:

  <([^<]+)+>

And I bet this one will be faster too.
Comment 19 Tilo K 2007-12-14 05:59:45 UTC
Using jakarta-regexp 1.5
the following regexp (yes, it is not what was intended, but anyway):
^((?:[!_.@?-]+)|(?:\w)){6,60}$

causes an infinite loop in RE.match using this input:
7ftt_.4q?iJoz7I8ky8c5BPwMüTge9D5-kÄtDAHöSLidNMNYbchäcsÖpLPPh

Probably a new Bug?