SA Bugzilla – Bug 1470
This message gets through unscathed, related to HTML::Parser call
Last modified: 2003-02-11 00:01:21 UTC
Just received 3 spams to spamtraps, all of which were basically the same. But the neat thing is that the message gets 0 points with set0 and only gets hits in set1 with net tests.

The main thing is that they use obfuscating comments, multiple per word, and the obfuscation uses the email address, so (for me at least) it gets around the 12 char limit in our OBFUSCATING_COMMENT rule. Perhaps we ought to up the length to 30 chars?

After rendering through lynx and getting text, set0 gets 3 points via: CLICK_BELOW,LIMITED_TIME_ONLY,MONEY_BACK

After some debugging, all of the text is removed by the HTML::Parser call in PerMsgStatus::get_decoded_stripped_body_text_array(). I think it's related to how we call HTML::Parser, not the external module itself (I did a quick run to remove HTML comments that worked fine).

The spamtrap-address-modified message will be attached shortly. :)
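To illustrate why a length cap matters here, a rough sketch follows. The regex is a hypothetical stand-in and is NOT the actual OBFUSCATING_COMMENT rule; the helper name and sample strings are made up for illustration. It only shows that a comment longer than the cap (such as one carrying an email address) slips past a 12 char limit but is caught with a 30 char limit.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a length-limited "comment inside a word" check.
# This is not the real OBFUSCATING_COMMENT regex, just an illustration.
sub has_obfuscating_comment {
    my ($html, $max_len) = @_;
    # word char, an HTML comment of at most $max_len chars, another word char
    return $html =~ /\w<!--[^>]{0,$max_len}-->\w/ ? 1 : 0;
}

my $short = 'mo<!--x1-->ney';                      # 2-char comment body
my $long  = 'mo<!--someone@example.com-->ney';     # 19-char comment body

printf "short, limit 12: %d\n", has_obfuscating_comment($short, 12);
printf "long,  limit 12: %d\n", has_obfuscating_comment($long, 12);
printf "long,  limit 30: %d\n", has_obfuscating_comment($long, 30);
```

Raising the cap trades a wider net for a higher chance of matching legitimate comments, so any new value would need checking against a mass-check run.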
Created attachment 645: Message causing the problems ...
Some more info:

Generic 2.50 run:

. 0 1.0 HTML_90_100,MIME_HTML_ONLY,__CT,__HAS_MSGID,__MIME_HTML_ONLY,__MIME_VERSION,__SANE_MSGID time=1044927766

If I take the following line from PerMsgStatus:

$hp->parse(pack ('C0', $text));

and remove the pack part (so we just pass in $text), I get:

. 2 1.0 CLICK_BELOW,HTML_80_90,HTML_COMMENT_EMAIL,HTML_MESSAGE,HTML_TAG_BALANCE_A,HTML_TAG_BALANCE_BODY,HTML_TAG_BALANCE_HTML,LIMITED_TIME_ONLY,MIME_HTML_ONLY,MONEY_BACK,__CLICK_BELOW,__CT,__HAS_MSGID,__MIME_HTML_ONLY,__MIME_VERSION,__SANE_MSGID time=1044927766

If we need the pack (via bug 1417), we should use 'C0A*'. Using just 'C0' passes no data to the parser ...
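The difference between the two templates can be checked in isolation with a small sketch like the one below. The sample string is made up, and this only exercises pack, not the HTML::Parser call itself.

```perl
use strict;
use warnings;

# Made-up sample body; any ASCII string behaves the same way here.
my $text = '<html><body>Click below</body></html>';

# 'C0' alone: the zero-count item consumes no arguments,
# so $text never makes it into the packed result.
my $broken = pack('C0', $text);

# 'C0A*': the A* item consumes $text in full.
my $fixed = pack('C0A*', $text);

printf "C0   gives %d chars\n", length $broken;
printf "C0A* gives %d chars\n", length $fixed;
```

With the 'C0' template the parser is handed an empty string, which matches the observation that the body rules stop hitting.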
Subject: Re: [SAdev] This message gets through unscathed, related to HTML::Parser call

On Tue, Feb 11, 2003 at 08:50:37AM -0800, bugzilla-daemon@hughes-family.org wrote:
> Some more info:Generic 2.50 run:. 0 1.0 HTML_90_100,MIME_HTML_ONLY,__CT,__HAS_MSGID,__MIME_HTML_ONLY,__MIME_VERSION,__SANE_MSGID time=1044927766If I take the following line from PerMsgStatus: $hp->parse(pack ('C0', $text));and remove the pack part (so we just pass in $text), I get:. 2 1.0 CLICK_BELOW,HTML_80_90,HTML_COMMENT_EMAIL,HTML_MESSAGE,HTML_TAG_BALANCE_A,HTML_TAG_BALANCE_BODY,HTML_TAG_BALANCE_HTML,LIMITED_TIME_ONLY,MIME_HTML_ONLY,MONEY_BACK,__CLICK_BELOW,__CT,__HAS_MSGID,__MIME_HTML_ONLY,__MIME_VERSION,__SANE_MSGID time=1044927766If we need the pack (via bug 1417), we should use 'C0A*'. Using just 'C0' passes no data to the parser ...

Wow, WTF happened there? 1 big long line.

Anyway, the piece of the message that's important: There's a line in PerMsgStatus for bug 1417:

$hp->parse(pack ('C0', $text));

Which is supposed to say "just use chars, not unicode", except that by itself it nullifies the input string (nothing else matches in the pattern...). If we need to keep the pack, we should change it to 'C0A*'.

So I'm going to commit the change. The bigger question is: ok, how many messages does this affect, and does this affect the mass-check runs?
Ok, this bug is now fixed. I committed the change to 'C0A*' ...
Subject: Re: [SAdev] This message gets through unscathed, related to HTML::Parser call

On Tue, Feb 11, 2003 at 08:56:35AM -0800, bugzilla-daemon@hughes-family.org wrote:
> So I'm going to commit the change. The bigger question is: ok, how many
> messages does this affect, and does this affect the mass-check runs?

I was worried this was in 2.50 for all the mass-check runs, but it turns out we're lucky... The broken version went in (rev 1.272) right after the code got tagged for the second set of mass-check runs (rev 1.271). Phew.
Subject: Re: [SAdev] This message gets through unscathed, related to HTML::Parser call

bugzilla-daemon@hughes-family.org said:
> If we need the pack (via bug 1417), we should use 'C0A*'. Using just
> 'C0' passes no data to the parser ...

argh, that was me. Sorry about that. Insufficient looking back through CVS. :(

That went in *after* the GA run tagging, so that's not affected.

--j.