Bug 5965 - header field with a value of "0" (zero) invisible
Summary: header field with a value of "0" (zero) invisible
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: 3.2.5
Hardware: Other All
: P5 normal
Target Milestone: 3.3.0
Assignee: SpamAssassin Developer Mailing List
URL: http://mail-archives.apache.org/mod_m...
Whiteboard:
Keywords:
: 6184 (view as bug list)
Depends on:
Blocks:
 
Reported: 2008-08-26 06:58 UTC by dms
Modified: 2009-08-25 08:49 UTC (History)
1 user (show)



Attachment Type Modified Status Actions Submitter/CLA Status
proposed patch patch None Mark Martinec [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description dms 2008-08-26 06:58:33 UTC
The following rule does not get processed...
See mailing list thread for further details.

header          LOCAL_TESTHEADER        TESTHEADER =~ /^0$/
score           LOCAL_TESTHEADER        1000



message:
TESTHEADER: 0
From:blah@example.com <From%3Ablah@example.com>
To: blah1@example.com
Subject: This is a TESTHEADER: 0 test

blahblah blah
Comment 1 Mark Martinec 2008-08-26 10:29:01 UTC
Also the following message hits a MISSING_SUBJECT rule:

  From:blah@example.com <blah@example.com>
  To: blah1@example.com
  Subject: 0

  blahblah blah

The underlying reason is a nonpedantic practice in some parts of code
where user-supplied strings (e.g. from messages or rules) are evaluated
in places as Perl booleans, instead of being tested for being defined
or for being an empty string. In both supplied cases the "Subject: 0"
or "TESTHEADER: 0" is indistinguishable from an absent header field.

Please try the attached patch - it should apply to 3.2.* as well as
to 3.3 (with some fudge). I tried to fix the more prominent of these
cases, but I'm sure there are still other more obscure ones, waiting
for a next bug report.
Comment 2 Mark Martinec 2008-08-26 10:31:01 UTC
Created attachment 4359 [details]
proposed patch
Comment 3 Mark Martinec 2008-08-27 18:09:36 UTC
Now in trunk
(directly or indirectly related to this bug):

r689129 | mmartinec | 2008-08-26 18:28:52 +0200 (Tue, 26 Aug 2008)
bug 5965: do not treat user data as perl booleans (a string "0" is a false); differentiate between missing and empty header fields; tweak header parsing

r689130 | mmartinec | 2008-08-26 18:53:14 +0200 (Tue, 26 Aug 2008)
fixed previous patch (hb separator test failing)

r689682 | mmartinec | 2008-08-28 02:44:56 +0200 (Thu, 28 Aug 2008)
- continue work on avoiding user data to be tested as perl booleans,
  instead test for defined or for an empty string as appropriate;
- pms->get can now distinguish between empty and nonexistent header
  fields, undef is returned for nonexistent header field unless a
  default value argument is explicitly set to some defined value
  like an empty string;
- modified calls to pms->get to deal with undef as appropriate;
- Conf.pm, Conf/Parser.pm and Plugin/Check.pm now work together and turn
  a rule 'exists:name_of_header' into a defined(name_of_header) instead
  of a  name_of_header =~ /./  to match the documentation ("Define a
  header existence test") and make it possible to distinguish empty
  from nonexistent header fields; in principle the new code could allow
  operators like 'eq' and 'ne' or function calls in header tests
  in addition to regexp matching operators '=~' and '!~' (but this
  is currently not allowed by the parser);
(plus some collateral small fixes)

The whole change is too extensive for a 3.2 branch, and as the original
bug report had a label of minor severity, I suggest this ticket be
closed now, changes will only be in 3.3.
Comment 4 Justin Mason 2008-08-28 02:28:37 UTC
broadly good, but -- I'm not fond of the side-effect change to the PMS::get() API.  This is a public API for third party plugins and eval rules, so this will create a lot of noise when they call ->get() in the previous manner, then match against the returned value without checking for undef first.

Is there some other way we can support our own internal code getting undef returns, without breaking backwards compat on that API?  I'm thinking a new, wrapper method around ->get(), which passes a new "magic" value for the optional $defval argument to the ->_get() method, indicating that returning undef is acceptable, and if that's not present then the old style of returning '' on undef is used.
Comment 5 Mark Martinec 2008-08-28 05:41:13 UTC
(In reply to comment #4)
> broadly good, but -- I'm not fond of the side-effect change to the PMS::get()
> API.  This is a public API for third party plugins and eval rules, so this will
> create a lot of noise when they call ->get() in the previous manner, then match
> against the returned value without checking for undef first.
> 
> Is there some other way we can support our own internal code getting undef
> returns, without breaking backwards compat on that API? [...]

There is a better way, I had it implemented for a while but scrapped it
because calls to get() could look weird. I'll make the following change
to the tail section of a get() and appropriate changes to calls
to retain compatibility:

currently:
  return (defined $found ? $found : $_[2]);

new:
  # if the requested header wasn't found, we should return a default value
  # as specified by the caller: if defval argument is present it represents
  # a default value even if undef; if defval argument is absent a default
  # value is an empty string for upwards compatibility
  return (defined $found ? $found : @_ > 2 ? $_[2] : '');

so now the:
  $pms->get('Subject')
becomes equivalent to:
  $pms->get('Subject','')

and if one wants to receive undef for nonexistent header, the
undef needs to be specified explicitly as a second argument:
  $pms->get('Subject',undef)
Comment 6 Justin Mason 2008-08-28 05:50:21 UTC
(In reply to comment #5)
> so now the:
>   $pms->get('Subject')
> becomes equivalent to:
>   $pms->get('Subject','')
> 
> and if one wants to receive undef for nonexistent header, the
> undef needs to be specified explicitly as a second argument:
>   $pms->get('Subject',undef)

perfect, thanks!  I was trying to figure out a way to do that, but didn't see it.

+1 from me ;)
Comment 7 Michael Parker 2008-08-28 06:55:46 UTC
Just to chime in a bit, the get code is a real hotspot in our profile, so please be very very very careful what what you are doing there.  Run some before and after profiles to make sure you didn't slow it down.
Comment 8 Justin Mason 2008-08-28 07:45:14 UTC
agreed, it's a hotspot.  I did a quick check of performance of this morning's SVN code, based on 10000 accesses to a cached header; the new code seemed slightly slower, about 8% (for the get() method alone).  imo that's OK, and can be optimized later.
Comment 9 Mark Martinec 2008-08-28 08:16:04 UTC
Done (compatibility):

r689835 | mmartinec | 2008-08-28 16:28:02 +0200 (Thu, 28 Aug 2008)
Changed PMS::get and its calls for compatibility regarding a
default value: if the requested header field wasn't found,
return a default value as specified by the caller: if defval
argument is present it represents a default value even if undef;
if defval argument is absent a default value is an empty string
for compatibility
Comment 10 Mark Martinec 2008-08-28 08:58:07 UTC
> agreed, it's a hotspot.  I did a quick check of performance of this morning's
> SVN code, based on 10000 accesses to a cached header; the new code seemed
> slightly slower, about 8% (for the get() method alone).  imo that's OK, and can
> be optimized later.

One must take into account the actual hit/miss cache ratio.
Instrumenting the pms->get with two counters and letting it run for
a while on our production mail server gives 45% hits/attempts ratio.

While the hit code path is a bit slower, the miss path is much faster
(original code evaluates $_[0]->{c}->{$_[1]} multiple times),
which in my benchmark yields an overall improvement. With a
simulated and quick _get the improvement is 30%, yet in practice
the speedup is negligible because the actual _get is very slow
compared to the get().
Comment 11 Mark Martinec 2008-08-28 10:14:07 UTC
...not to mention that a meager 300 calls of get() per message check
are negligible anyway...
Comment 12 Mark Martinec 2009-01-06 16:57:48 UTC
The change seems to have settled down by now.
Perhaps it's time to close it for 3.3.

There are still some cases in the code where user-supplied
strings are interpreted as Perl booleans. The more glaring
cases have been fixed now, including the original reported case.
I'll keep an eye on the issue and perhaps fix some more cases
as I happen to come across them - there is no need to keep
a PR open just for this purpose.

As changes were nontrivial, it probably merits a voting
before closing.
Comment 13 Justin Mason 2009-01-07 02:23:38 UTC
(In reply to comment #12)
> As changes were nontrivial, it probably merits a voting
> before closing.

that's ok IMO -- we've had plenty of time to issue -1's if we were so inclined. go ahead and close...

(PS: for reference, I've found the best way to benchmark changes to the Mail::SA::* modules is to perform a mass-check of a 50% ham, 50% spam corpus using Devel::DProf (or possibly NYTProf although I haven't yet tried it).)
Comment 14 Mark Martinec 2009-01-07 16:39:19 UTC
Ok, closing, fixed in 3.3.
If anyone feels the proposed patch should
also go into 3.2.6, please reopen.
Comment 15 Mark Martinec 2009-08-24 07:34:29 UTC
> There are still some cases in the code where user-supplied
> strings are interpreted as Perl booleans. The more glaring
> cases have been fixed now, including the original reported case.
> I'll keep an eye on the issue and perhaps fix some more cases
> as I happen to come across them

  Bug 5965: rewrite_report_safe - fix further cases of treating
  user data as booleans (in this case it was the string in
  "rewrite_header" configuration option)
Sending        lib/Mail/SpamAssassin/PerMsgStatus.pm
Committed revision 807240.
Comment 16 Karsten Bräckelmann 2009-08-25 08:49:10 UTC
*** Bug 6184 has been marked as a duplicate of this bug. ***