SA Bugzilla – Bug 5965
header field with a value of "0" (zero) invisible
Last modified: 2009-08-25 08:49:10 UTC
The following rule does not get processed... See mailing list thread for further details. header LOCAL_TESTHEADER TESTHEADER =~ /^0$/ score LOCAL_TESTHEADER 1000 message: TESTHEADER: 0 From:blah@example.com <From%3Ablah@example.com> To: blah1@example.com Subject: This is a TESTHEADER: 0 test blahblah blah
Also the following message hits a MISSING_SUBJECT rule: From:blah@example.com <blah@example.com> To: blah1@example.com Subject: 0 blahblah blah The underlying reason is a nonpedantic practice in some parts of code where user-supplied strings (e.g. from messages or rules) are evaluated in places as Perl booleans, instead of being tested for being defined or for being an empty string. In both supplied cases the "Subject: 0" or "TESTHEADER: 0" is indistinguishable from an absent header field. Please try the attached patch - it should apply to 3.2.* as well as to 3.3 (with some fudge). I tried to fix the more prominent of these cases, but I'm sure there are still other more obscure ones, waiting for a next bug report.
Created attachment 4359 [details] proposed patch
Now in trunk (directly or indirectly related to this bug): r689129 | mmartinec | 2008-08-26 18:28:52 +0200 (Tue, 26 Aug 2008) bug 5965: do not treat user data as perl booleans (a string "0" is a false); differentiate between missing and empty header fields; tweak header parsing r689130 | mmartinec | 2008-08-26 18:53:14 +0200 (Tue, 26 Aug 2008) fixed previous patch (hb separator test failing) r689682 | mmartinec | 2008-08-28 02:44:56 +0200 (Thu, 28 Aug 2008) - continue work on avoiding user data to be tested as perl booleans, instead test for defined or for an empty string as appropriate; - pms->get can now distinguish between empty and nonexistent header fields, undef is returned for nonexistent header field unless a default value argument is explicitly set to some defined value like an empty string; - modified calls to pms->get to deal with undef as appropriate; - Conf.pm, Conf/Parser.pm and Plugin/Check.pm now work together and turn a rule 'exists:name_of_header' into a defined(name_of_header) instead of a name_of_header =~ /./ to match the documentation ("Define a header existence test") and make it possible to distinguish empty from nonexistent header fields; in principle the new code could allow operators like 'eq' and 'ne' or function calls in header tests in addition to regexp matching operators '=~' and '!~' (but this is currently not allowed by the parser); (plus some collateral small fixes) The whole change is too extensive for a 3.2 branch, and as the original bug report had a label of minor severity, I suggest this ticket be closed now, changes will only be in 3.3.
broadly good, but -- I'm not fond of the side-effect change to the PMS::get() API. This is a public API for third party plugins and eval rules, so this will create a lot of noise when they call ->get() in the previous manner, then match against the returned value without checking for undef first. Is there some other way we can support our own internal code getting undef returns, without breaking backwards compat on that API? I'm thinking a new, wrapper method around ->get(), which passes a new "magic" value for the optional $defval argument to the ->_get() method, indicating that returning undef is acceptable, and if that's not present then the old style of returning '' on undef is used.
(In reply to comment #4) > broadly good, but -- I'm not fond of the side-effect change to the PMS::get() > API. This is a public API for third party plugins and eval rules, so this will > create a lot of noise when they call ->get() in the previous manner, then match > against the returned value without checking for undef first. > > Is there some other way we can support our own internal code getting undef > returns, without breaking backwards compat on that API? [...] There is a better way, I had it implemented for a while but scrapped it because calls to get() could look weird. I'll make the following change to the tail section of a get() and appropriate changes to calls to retain compatibility: currently: return (defined $found ? $found : $_[2]); new: # if the requested header wasn't found, we should return a default value # as specified by the caller: if defval argument is present it represents # a default value even if undef; if defval argument is absent a default # value is an empty string for upwards compatibility return (defined $found ? $found : @_ > 2 ? $_[2] : ''); so now the: $pms->get('Subject') becomes equivalent to: $pms->get('Subject','') and if one wants to receive undef for nonexistent header, the undef needs to be specified explicitly as a second argument: $pms->get('Subject',undef)
(In reply to comment #5) > so now the: > $pms->get('Subject') > becomes equivalent to: > $pms->get('Subject','') > > and if one wants to receive undef for nonexistent header, the > undef needs to be specified explicitly as a second argument: > $pms->get('Subject',undef) perfect, thanks! I was trying to figure out a way to do that, but didn't see it. +1 from me ;)
Just to chime in a bit, the get code is a real hotspot in our profile, so please be very very very careful what what you are doing there. Run some before and after profiles to make sure you didn't slow it down.
agreed, it's a hotspot. I did a quick check of performance of this morning's SVN code, based on 10000 accesses to a cached header; the new code seemed slightly slower, about 8% (for the get() method alone). imo that's OK, and can be optimized later.
Done (compatibility): r689835 | mmartinec | 2008-08-28 16:28:02 +0200 (Thu, 28 Aug 2008) Changed PMS::get and its calls for compatibility regarding a default value: if the requested header field wasn't found, return a default value as specified by the caller: if defval argument is present it represents a default value even if undef; if defval argument is absent a default value is an empty string for compatibility
> agreed, it's a hotspot. I did a quick check of performance of this morning's > SVN code, based on 10000 accesses to a cached header; the new code seemed > slightly slower, about 8% (for the get() method alone). imo that's OK, and can > be optimized later. One must take into account the actual hit/miss cache ratio. Instrumenting the pms->get with two counters and letting it run for a while on our production mail server gives 45% hits/attempts ratio. While the hit code path is a bit slower, the miss path is much faster (original code evaluates $_[0]->{c}->{$_[1]} multiple times), which in my benchmark yields an overall improvement. With a simulated and quick _get the improvement is 30%, yet in practice the speedup is negligible because the actual _get is very slow compared to the get().
...not to mention that a meager 300 calls of get() per message check are negligible anyway...
The change seems to have settled down by now. Perhaps it's time to close it for 3.3. There are still some cases in the code where user-supplied strings are interpreted as Perl booleans. The more glaring cases have been fixed now, including the original reported case. I'll keep an eye on the issue and perhaps fix some more cases as I happen to come across them - there is no need to keep a PR open just for this purpose. As changes were nontrivial, it probably merits a voting before closing.
(In reply to comment #12) > As changes were nontrivial, it probably merits a voting > before closing. that's ok IMO -- we've had plenty of time to issue -1's if we were so inclined. go ahead and close... (PS: for reference, I've found the best way to benchmark changes to the Mail::SA::* modules is to perform a mass-check of a 50% ham, 50% spam corpus using Devel::DProf (or possibly NYTProf although I haven't yet tried it).)
Ok, closing, fixed in 3.3. If anyone feels the proposed patch should also go into 3.2.6, please reopen.
> There are still some cases in the code where user-supplied > strings are interpreted as Perl booleans. The more glaring > cases have been fixed now, including the original reported case. > I'll keep an eye on the issue and perhaps fix some more cases > as I happen to come across them Bug 5965: rewrite_report_safe - fix further cases of treating user data as booleans (in this case it was the string in "rewrite_header" configuration option) Sending lib/Mail/SpamAssassin/PerMsgStatus.pm Committed revision 807240.
*** Bug 6184 has been marked as a duplicate of this bug. ***