SA Bugzilla – Bug 416
t/lang_pl_tests.t fails
Last modified: 2002-10-10 02:11:46 UTC
Get the message from the URL http://www.opoka.org.pl/SADoS.html and pipe it through spamassassin -L -D -t < INPUTMSG. On my machines SA takes 100% of the CPU and never finishes checking this message. This is a default installation of SA 2.20 and 2.11. I have no idea what is going on with this amazing software :(
Subject: Re: [SAdev] New: SA DoS? (reply to the report above, which bugzilla-daemon@hughes-family.org sent on Mon, Jun 10, 2002 at 11:04:17AM -0700) On my P200, an unmodified SA 2.20 runs through that test in approximately 28 seconds.
Subject: Re: SA DoS? On Mon, Jun 10, 2002 at 11:02:20PM +0200, Jakub Wasielewski wrote:
> > After changing my locale around to "pl", I found this behavior and traced
> > it down to these two rules. They're not doing anything special, so it's
> > probably something related to the backtracking involved with all of the
> > ".*" statements.
> >
> > lang pl body PL_JEZELI_NIE /je.*li.*nie .ycz.*sobie/i
> > lang pl body PL_ARTYKUL_USTAWY /Art.*25.*ust.*2.*pkt.*2/i
>
> Hmm, these two are very important!!! The first matches a part saying
> "If you do not want to"... receive this anymore. The second matches a part
> usually placed in spam explaining that spam is not illegal under Polish law.
>
> > I know no Polish, but the problem may be cleared up with a stricter
> > version of pattern matching. For instance:

OK, on the way home I tried some stuff out... Indeed, if I make the regexp stricter (but still matching the description text), SA runs much, much faster:

Regexp                            Time for SA run (seconds)  Comments
================================  =========================  ===============================================
/je.*li.*nie .ycz.*sobie/i        21.12                      # original
/je.\S*li.* nie .ycz\S+ sobie/i   5.59
/Art.*25.*ust.*2.*pkt.*2/i        95+                        # original, doesn't match the description text
/Art\S* 25 ust 2 p\S*kt 2/i       5.52

Since I don't know any Polish, I'm not about to go making patches for all the 25_body_tests_pl.cf rules. Basically, as with the English version, all of the regexps should be made stricter. Replace ".*" with ".{0,30}" if you really do need to match arbitrary text in between, use "\S*" to match non-whitespace (i.e., the other characters in a single word), add "\b" at the start of the regexp if it's a word (I didn't do this above), etc.

The thing that has killed SA multiple times so far is backtracking in regular expressions, so by removing the constructs that commonly cause backtracking (mostly ".*" and ".+") we can keep the hangs away. Streamlining the rules also lowers CPU usage, which speeds up SA runs. :)
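To make the backtracking argument concrete, here is a toy model written in Python rather than Perl. It is a sketch under stated assumptions, not how Perl's engine works internally (Perl has optimizations this ignores): a rule is simplified to literal pieces separated by gaps, so PL_JEZELI_NIE becomes `je`, `li`, `nie`, `sobie` (ignoring `/i` and the `.ycz` wildcard). Each gap is either unbounded, like `.*`, or capped, like `.{0,30}`. The function `toy_search` and the sample texts are hypothetical; the function counts how many times a literal piece is compared before the search succeeds or gives up.

```python
def toy_search(text, pieces, max_gap=None):
    """Toy backtracking search for pieces[0] GAP pieces[1] GAP ... (unanchored).

    GAP behaves like '.*' when max_gap is None, else like '.{0,max_gap}'.
    As with a greedy regex gap, the longest gap is tried first and shrunk on
    failure. Unlike '.', a gap here may also span newlines.
    Returns (matched, attempts); attempts counts literal-piece comparisons.
    """
    attempts = 0

    def piece_at(piece, pos):
        nonlocal attempts
        attempts += 1
        return text.startswith(piece, pos)

    def rest(idx, pos):
        # Match pieces[idx:], each preceded by a gap that starts at pos.
        if idx == len(pieces):
            return True
        last = len(text) - len(pieces[idx])  # last start position that fits
        hi = last if max_gap is None else min(last, pos + max_gap)
        for start in range(hi, pos - 1, -1):  # longest gap first
            if piece_at(pieces[idx], start) and rest(idx + 1, start + len(pieces[idx])):
                return True
        return False

    # Unanchored: try the first piece at every position, like a regex search.
    for start in range(len(text) - len(pieces[0]) + 1):
        if piece_at(pieces[0], start) and rest(1, start + len(pieces[0])):
            return True, attempts
    return False, attempts


# Hypothetical simplification of PL_JEZELI_NIE to its literal words.
PIECES = ["je", "li", "nie", "sobie"]

if __name__ == "__main__":
    print(toy_search("je li nie sobie", PIECES)[0])              # True
    print(toy_search("je li nie sobie", PIECES, max_gap=30)[0])  # True
    # Near miss: every piece but the last, repeated. Nothing matches, so the
    # search retries combinations of earlier matches before giving up.
    for n in (10, 20):
        near_miss = "je li nie " * n
        loose = toy_search(near_miss, PIECES)[1]
        tight = toy_search(near_miss, PIECES, max_gap=30)[1]
        print(f"{len(near_miss)} chars: .* -> {loose} attempts, .{{0,30}} -> {tight} attempts")
    # The cost of a cap: words farther apart than the cap no longer match.
    far = "je" + "x" * 50 + " li nie sobie"
    print(toy_search(far, PIECES)[0], toy_search(far, PIECES, max_gap=30)[0])  # True False
```

On a near miss, the unbounded version's attempt count grows much faster than the text (asymptotically around the fourth power of its length for four pieces), while the capped version grows roughly linearly. The last line shows the trade-off: a capped gap stops matching once the words are farther apart than the cap.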
Subject: Re: SA DoS? On Tue, Jun 11, 2002 at 09:26:38AM +0200, Radoslaw Stachowiak wrote:
> 1. Does \S* match high-ASCII (128..255)? Polish national characters
> use these high values.

Yes. \S matches non-whitespace characters, so anything that isn't a space, tab, LF, or CR. (I think the definition is actually anything that doesn't match the isspace() function, but those four are good enough.)

> 2. Can you give me more examples of proper \b use? I don't get it.

This is from the perlre man page:

  A word boundary (`\b') is a spot between two characters that has a `\w' on one side of it and a `\W' on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a `\W'.

So the idea is this: /the full moon/ would also match the (contrived) "bathe full moon", or any other string that contains that sequence of characters. /\bthe full moon\b/ only matches when "the" and "moon" are whole words. \b itself is zero-width: it matches the position next to a non-word character (and high-ASCII characters count as non-word characters too), as well as the beginning and end of the string. Essentially, using \b (at least for English text) makes the matches faster and more accurate at the same time.

Hopefully this helps. :)
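The `\S` and `\b` answers above can be checked with a short Python sketch. It uses bytes patterns, where `\w`, `\s` and `\b` follow ASCII-only rules; that is an assumption meant to approximate Perl 5.8 matching a non-UTF-8 string without `use locale`, not a statement about SpamAssassin's own code. The ISO-8859-2 sample string and the two helper functions are hypothetical.

```python
import re

# Assumption: bytes patterns stand in for Perl 5.8 matching a non-UTF-8 string
# without 'use locale'. In this mode \w is only [A-Za-z0-9_] and \s is ASCII
# whitespace, so every byte from 128 to 255 is both \S and \W.
SAMPLE = "Analiza zawartości".encode("iso-8859-2")  # hypothetical; 'ś' is byte 0xB6


def plain_match(text: bytes) -> bool:
    return re.search(rb"the full moon", text) is not None


def word_match(text: bytes) -> bool:
    # \b is zero-width: it matches a position between \w and \W (or a string edge).
    return re.search(rb"\bthe full moon\b", text) is not None


if __name__ == "__main__":
    print(plain_match(b"bathe full moon"), word_match(b"bathe full moon"))  # True False
    print(word_match(b"see the full moon tonight"))                         # True
    print(re.findall(rb"\S+", SAMPLE))  # [b'Analiza', b'zawarto\xb6ci']
    print(re.findall(rb"\w+", SAMPLE))  # [b'Analiza', b'zawarto', b'ci']
    print(re.search(rb"\bzawarto\b", SAMPLE) is not None)                  # True
```

The last two lines show the catch for Polish text under these rules: because `ś` is a non-word byte, `\b` also matches in the middle of "zawartości", so a `\b` placed next to a Polish letter does not necessarily mean "start or end of word".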
BTW, I've gone through the pl rules and replaced all .*'s with .{0,99} to limit them.
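One way to apply the same mechanical rewrite is sketched below in Python. `bound_dot_star` is a hypothetical helper, not part of SpamAssassin, and this thread doesn't say how the rules were actually edited. It replaces every `.*` whose dot isn't escaped with `.{0,99}`.

```python
import re

# Hypothetical helper (not SpamAssassin code): find '.*' whose dot is not
# escaped. Known gaps: it also rewrites '.*' inside a character class such as
# [.*], and it skips a '.*' that directly follows an escaped backslash ('\\.*').
UNESCAPED_DOT_STAR = re.compile(r"(?<!\\)\.\*")


def bound_dot_star(rule_regex: str, limit: int = 99) -> str:
    """Replace each unescaped '.*' in a rule regex with '.{0,limit}'."""
    return UNESCAPED_DOT_STAR.sub(lambda _m: ".{0,%d}" % limit, rule_regex)


if __name__ == "__main__":
    print(bound_dot_star("/je.*li.*nie .ycz.*sobie/i"))
    # /je.{0,99}li.{0,99}nie .ycz.{0,99}sobie/i
    print(bound_dot_star(r"/a\.*b/"))  # unchanged: /a\.*b/
```

A blind rewrite like this is only a starting point: a bound of 99 still leaves room for some backtracking, so hand-written stricter patterns (`\S*`, explicit spaces, `\b`) remain the better fix wherever the wording is known.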
OK, now verified as fixed in CVS. A test has been added to the test suite too.
The test is broken for me in current b2_4_0 CVS:

[craig@belphegore spamassassin]$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl5.8.0 "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/db_based_whitelist........ok
t/db_based_whitelist_ips....ok
t/forged_rcvd...............ok
t/lang_pl_tests............. Not found: didnt_hang_at_least = Analiza zawartości:
# Failed test 1 in t/SATest.pm at line 241
t/lang_pl_tests.............FAILED test 1
Failed 1/1 tests, 0.00% okay
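The failing output above suggests the test looks for an expected string in SpamAssassin's report, with a hang counting as failure. A hypothetical Python analogue of such a "didn't hang" check is sketched below. The real test is the Perl file t/lang_pl_tests.t built on t/SATest.pm, whose code isn't shown here; `didnt_hang`, the timeout value and the stand-in commands are all assumptions.

```python
import subprocess
import sys


def didnt_hang(cmd, message: bytes, expected: bytes, timeout: float = 60.0) -> bool:
    """Hypothetical check: True if cmd, fed message on stdin, exits within
    timeout seconds and its stdout contains expected (compared as raw bytes)."""
    try:
        result = subprocess.run(cmd, input=message, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # run() has already killed the child; count it as a hang
    return expected in result.stdout


if __name__ == "__main__":
    # Stand-ins for the real scanner: one child echoes stdin, one just sleeps.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    sleeper = [sys.executable, "-c", "import time; time.sleep(10)"]
    msg = "Subject: test\n\nAnaliza zawartości\n".encode("iso-8859-2")
    print(didnt_hang(echo, msg, "Analiza zawartości".encode("iso-8859-2")))  # True
    print(didnt_hang(sleeper, b"", b"Analiza", timeout=1.0))                 # False
```

In a real run, `cmd` would be something like the spamassassin -L -D -t invocation from the original report. Because the comparison is on raw bytes, `expected` has to use the same character encoding as the command's output.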
Weird, I can't reproduce that...
Craig, are you still seeing this?
Same here, with the latest CVS:

t/lang_pl_tests..... Not found: didnt_hang_at_least = Analiza zawartości:
t/lang_pl_tests.....FAILED test 1
Failed 1/1 tests, 0.00% okay
This is fixed.