SA Bugzilla – Bug 7729
[review] body rules to match body only, not including the Subject
Last modified: 2019-11-11 15:58:44 UTC
The current body rule does not only match the actual body, but also includes the Subject as pseudo-part of the body. While this behaviour is clearly documented, people do get caught be this (meta rule to match some phrases in the body AND subject). It is non-trivial to write a rule matching the body only, or actually matching both the body and the Subject. M:SA:Conf "The message Subject header is considered part of the body and becomes the first paragraph when running the rules." We should implement a tflags test flag for body rules to exclude the Subject from matching. Along with the next major branch 4.x we may consider making this the default, with Subject prepending a tflags option.
Yes it seems quite pointless shortcut to "maybe" reduce the amount of duplicate rules. It makes no sense to add separate tflag for this. If one wants to match Subject, then it should be a separate header rule! I vote +1 to remove legacy subject matching from body.
Only thing required to fix after that would probably be the schemeless uri parser, which currently grabs uris and emails from the Subject. It's a one line fix to code to add it back to it.
Ok vetoing my vote. Tested which rules hit subject, and there are maaaany. Thinking further, some people will probably stick with 3.4 for years and years, it would be nightmare to just remove Subject and try to maintain compatibility for all.. Instead here is the extremely simple code required to implement a "nosubject" tflag. Just add a small documentation blurt, and it can be committed to 3.4.3 very safely, please vote, +1 me. --- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688) +++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy) @@ -827,6 +827,9 @@ $loopid++; my ($max) = $conf->{tflags}->{$rulename} =~ /\bmaxhits=(\d+)\b/; $max = untaint_var($max); + if ($conf->{tflags}->{$rulename} =~ /\bnosubject\b/) { + $sub .= "\n".'shift @_;'; # remove subject line + } $sub .= ' $hits = 0; body_'.$loopid.': foreach my $l (@_) { @@ -844,6 +847,9 @@ else { # omitting the "pos" call, "body_loopid" label, use of while() # instead of if() etc., shaves off 8 perl OPs. + if (($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/) { + $sub .= "\n".'shift @_;'; # remove subject line + } $sub .= ' foreach my $l (@_) { '.$self->hash_line_for_rule($pms, $rulename).'
Slightly revised patch, should not modify @_ as it's used elsewhere.. Eval string will include two extra nosubj codelines if flagged if ($scoresptr->{q{FOO}}) { my $nosubj = 1; ### IF FLAGGED foreach my $l (@_) { if ($nosubj) { $nosubj = 0; next; } ### IF FLAGGED if ($l =~ /$qrptr->{q{FOO}}/o) { $self->got_hit(q{FOO}, "BODY: ", ruletype => "body"); last; } } } --- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688) +++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy) @@ -821,6 +821,12 @@ dbg("rules-all: running body rule %s", q{'.$rulename.'}); '; } + my $nosubject = ($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/; + if ($nosubject) { + $sub .= ' + my $nosubj = 1; + '; + } if (($conf->{tflags}->{$rulename}||'') =~ /\bmultiple\b/) { # support multiple matches @@ -830,6 +836,13 @@ $sub .= ' $hits = 0; body_'.$loopid.': foreach my $l (@_) { + '; + if ($nosubject) { + $sub .= ' + if ($nosubj) { $nosubj = 0; next; } + '; + } + $sub .= ' pos $l = 0; '.$self->hash_line_for_rule($pms, $rulename).' while ($l =~ /$qrptr->{q{'.$rulename.'}}/go'. ($max? ' && $hits++ < '.$max:'') .') { @@ -846,6 +859,13 @@ # instead of if() etc., shaves off 8 perl OPs. $sub .= ' foreach my $l (@_) { + '; + if ($nosubject) { + $sub .= ' + if ($nosubj) { $nosubj = 0; next; } + '; + } + $sub .= ' '.$self->hash_line_for_rule($pms, $rulename).' if ($l =~ /$qrptr->{q{'.$rulename.'}}/o) { $self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "body");
Created attachment 5666 [details] Complete tflags nosubject patch Here complete patch along with documentation and M::SA::Conf::has_tflags_nosubject.
There is no way this should go into 3.4 to be clear. Changing the milestone and I will -1 vote for a 3.4. TOTALLY +1 for 4.0 though I need to test the patch more and I don't have 4.0 setup at the moment, just 3.4 RC's.
Kevin, are you allowed to decide things with only 2 votes? It's currently +1 -1. The latest patch is 100% working, guaranteed. Please someone else vote.
Created attachment 5667 [details] Nosubject patch for 3.4 Patch for 3.4 attached. It 100% identical, just the sub has_tflags_nosubject { 1 } had to be put in place manually. 3.4 and trunk code are 100% identical the way the eval stuff works. There is nothing this patch does differently between them.
Honestly I don't care if 3.4 is nuked, but this patch is super simple. If we leave 3.4 without the feature, we might as well never use it, since 3.4 will be around forever.
Thanks for the clarification. I rescind my veto if the patch is different in that 3.4 requires a tflag not to eval the subject as part of the body and 4.0 by default requires a tlag to eval the subject, that's ok. How do you predict that each will handle the tflags for the other? Assuming harmlessly ignored?
(In reply to Kevin A. McGrail from comment #10) > 4.0 by default requires a tlag to eval the subject, that's ok. I think that's a bad idea. Few body rules need the subject excluded, most benefit from it, so practically all body rules will end-up needing a tflags line.
(In reply to Kevin A. McGrail from comment #10) > Thanks for the clarification. I rescind my veto if the patch is different > in that 3.4 requires a tflag not to eval the subject as part of the body and > 4.0 by default requires a tlag to eval the subject, that's ok. No, I think I will always vote -1 for 4.0 if you are suggesting to drop subject by default. People are still using even 3.3, this subject stuff has been in so long there that it would be probably silly to change now. And even sillier to commit 3.4 with a "nosubject" flag and 4.0 with a "subject" flag. I think very few rules actually would NOT want to match subject. So it makes sense to have a "nosubject" flag, instead of "subject" flag. But I'm working on so many things right now, it's hard to concentrate, I reserve the right to change my mind later. :-)
-1 for removing body-matches-subject-too behavior entirely from any version +1 for adding "tflags nosubject" to 4.0 Undecided about adding it to 3.4 as well; if it breaks 3.3 and earlier then definitely -1, but if it's just ignored then the concern becomes the effects of a rules getting interpreted (slightly) differently by different versions. If we feel we want it in 3.4 then something like "can(Mail::SpamAssassin::Conf::body_tflag_nosubject)" might be a good idea. Since rules updates are generated from trunk, then that can() might be a good idea regardless... so: +1 for adding it to 3.4 along with can(Mail::SpamAssassin::Conf::body_tflag_nosubject)
Already has the has_tflags_nosubject as mentioned. No SpamAssassin version lints tflags, you can put any crap you want in them and it won't croak.
It seems "nosubject" is accepted for trunk, committing to get it out of the way. Sending trunk/lib/Mail/SpamAssassin/Conf.pm Sending trunk/lib/Mail/SpamAssassin/Plugin/Check.pm Transmitting file data ..done Committing transaction... Committed revision 1864791. For 3.4 it's already +2 I think, I'll let Kevin say his final words, but I think it's accepted regardless?
Forgot docs.. Sending trunk/UPGRADE Transmitting file data .done Committing transaction... Committed revision 1864792.
Henrik, can you recap what the current state is for 3.4 and for 4.0? Are both the same and you have a tflag for nosubject? I just want to be clear on testing so I can get 3.4.3 to continue moving along.
- 3.4 is not touched - trunk has nosubject flag I will commit 3.4 now unless there's any real objections, there's already +1 from me and John.
Yes, I'm +1 on both after discussion. I had hoped to change the behavior for 4.0 but such is life.
All done now. I'm not working on anything on 3.4 anymore, so Kevin, feel free to try to put rc4 out. :-) Sending spamassassin-3.4/UPGRADE Sending spamassassin-3.4/lib/Mail/SpamAssassin/Conf.pm Sending spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/Check.pm Transmitting file data ...done Committing transaction... Committed revision 1864877.
Thanks Henrik. I appreciate all the extra work!
Just a note that with rc4 or greater for 3.4.3, it worked as expected. Here's how I tested: Add this rule and an email with subject example and example not in the body #Testing Rule for NoSubject Rules - See note 58246 if (version >= 3.004003) #SHOULD HIT body NOSUBJECT_TEST_HIT /example/i describe NOSUBJECT_TEST_HIT This should hit on an email with example in the subject but not in the body because subjects are automatically prepending for testing. #SHOULD NOT HIT body NOSUBJECT_TEST_FAIL /example/i describe NOSUBJECT_TEST_FAIL This should NOT hit on an email with example in the subject not not in the body beca use the tflag nosubject will stop the automatic prepending of subjects for testing. tflags NOSUBJECT_TEST_FAIL nosubject endif