Bug 7729 - [review] body rules to match body only, not including the Subject
Summary: [review] body rules to match body only, not including the Subject
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (show other bugs)
Version: unspecified
Hardware: All All
: P2 normal
Target Milestone: 4.0.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-01 11:50 UTC by Karsten Bräckelmann
Modified: 2019-11-11 15:58 UTC (History)
4 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Complete tflags nosubject patch patch None Henrik Krohns [HasCLA]
Nosubject patch for 3.4 patch None Henrik Krohns [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Karsten Bräckelmann 2019-07-01 11:50:25 UTC
The current body rule does not only match the actual body, but also includes the Subject as pseudo-part of the body.

While this behaviour is clearly documented, people do get caught be this (meta rule to match some phrases in the body AND subject). It is non-trivial to write a rule matching the body only, or actually matching both the body and the Subject.

M:SA:Conf
  "The message Subject header is considered part of the body
   and becomes the first paragraph when running the rules."

We should implement a tflags test flag for body rules to exclude the Subject from matching.

Along with the next major branch 4.x we may consider making this the default, with Subject prepending a tflags option.
Comment 1 Henrik Krohns 2019-08-08 08:23:51 UTC
Yes it seems quite pointless shortcut to "maybe" reduce the amount of duplicate rules.

It makes no sense to add separate tflag for this. If one wants to match Subject, then it should be a separate header rule!

I vote +1 to remove legacy subject matching from body.
Comment 2 Henrik Krohns 2019-08-08 08:29:06 UTC
Only thing required to fix after that would probably be the schemeless uri parser, which currently grabs uris and emails from the Subject. It's a one line fix to code to add it back to it.
Comment 3 Henrik Krohns 2019-08-08 09:00:52 UTC
Ok vetoing my vote. Tested which rules hit subject, and there are maaaany. Thinking further, some people will probably stick with 3.4 for years and years, it would be nightmare to just remove Subject and try to maintain compatibility for all..

Instead here is the extremely simple code required to implement a "nosubject" tflag. Just add a small documentation blurt, and it can be committed to 3.4.3 very safely, please vote, +1 me.

--- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688)
+++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy)
@@ -827,6 +827,9 @@
       $loopid++;
       my ($max) = $conf->{tflags}->{$rulename} =~ /\bmaxhits=(\d+)\b/;
       $max = untaint_var($max);
+      if ($conf->{tflags}->{$rulename} =~ /\bnosubject\b/) {
+        $sub .= "\n".'shift @_;'; # remove subject line
+      }
       $sub .= '
       $hits = 0;
       body_'.$loopid.': foreach my $l (@_) {
@@ -844,6 +847,9 @@
     else {
       # omitting the "pos" call, "body_loopid" label, use of while()
       # instead of if() etc., shaves off 8 perl OPs.
+      if (($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/) {
+        $sub .= "\n".'shift @_;'; # remove subject line
+      }
       $sub .= '
       foreach my $l (@_) {
         '.$self->hash_line_for_rule($pms, $rulename).'
Comment 4 Henrik Krohns 2019-08-08 09:34:15 UTC
Slightly revised patch, should not modify @_ as it's used elsewhere..

Eval string will include two extra nosubj codelines if flagged

    if ($scoresptr->{q{FOO}}) {
      my $nosubj = 1; ### IF FLAGGED
      foreach my $l (@_) {
        if ($nosubj) { $nosubj = 0; next; } ### IF FLAGGED
        if ($l =~ /$qrptr->{q{FOO}}/o) {
          $self->got_hit(q{FOO}, "BODY: ", ruletype => "body");
          last;
        }
      }
    }


--- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688)
+++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy)
@@ -821,6 +821,12 @@
       dbg("rules-all: running body rule %s", q{'.$rulename.'});
       ';
     }
+    my $nosubject = ($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/;
+    if ($nosubject) {
+      $sub .= '
+      my $nosubj = 1;
+      ';
+    }
     if (($conf->{tflags}->{$rulename}||'') =~ /\bmultiple\b/)
     {
       # support multiple matches
@@ -830,6 +836,13 @@
       $sub .= '
       $hits = 0;
       body_'.$loopid.': foreach my $l (@_) {
+      ';
+      if ($nosubject) {
+        $sub .= '
+        if ($nosubj) { $nosubj = 0; next; }
+        ';
+      }
+      $sub .= '
         pos $l = 0;
         '.$self->hash_line_for_rule($pms, $rulename).'
         while ($l =~ /$qrptr->{q{'.$rulename.'}}/go'. ($max? ' && $hits++ < '.$max:'') .') {
@@ -846,6 +859,13 @@
       # instead of if() etc., shaves off 8 perl OPs.
       $sub .= '
       foreach my $l (@_) {
+      ';
+      if ($nosubject) {
+        $sub .= '
+        if ($nosubj) { $nosubj = 0; next; }
+        ';
+      }
+      $sub .= '
         '.$self->hash_line_for_rule($pms, $rulename).'
         if ($l =~ /$qrptr->{q{'.$rulename.'}}/o) {
           $self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "body");
Comment 5 Henrik Krohns 2019-08-08 09:46:49 UTC
Created attachment 5666 [details]
Complete tflags nosubject patch

Here complete patch along with documentation and M::SA::Conf::has_tflags_nosubject.
Comment 6 Kevin A. McGrail 2019-08-08 13:45:38 UTC
There is no way this should go into 3.4 to be clear. Changing the milestone and I will -1 vote for a 3.4.  TOTALLY +1 for 4.0 though I need to test the patch more and I don't have 4.0 setup at the moment, just 3.4 RC's.
Comment 7 Henrik Krohns 2019-08-08 13:47:49 UTC
Kevin, are you allowed to decide things with only 2 votes? It's currently +1 -1.

The latest patch is 100% working, guaranteed.

Please someone else vote.
Comment 8 Henrik Krohns 2019-08-08 13:51:30 UTC
Created attachment 5667 [details]
Nosubject patch for 3.4

Patch for 3.4 attached. It 100% identical, just the sub has_tflags_nosubject { 1 } had to be put in place manually.

3.4 and trunk code are 100% identical the way the eval stuff works. There is nothing this patch does differently between them.
Comment 9 Henrik Krohns 2019-08-08 13:56:39 UTC
Honestly I don't care if 3.4 is nuked, but this patch is super simple. If we leave 3.4 without the feature, we might as well never use it, since 3.4 will be around forever.
Comment 10 Kevin A. McGrail 2019-08-08 14:17:11 UTC
Thanks for the clarification.  I rescind my veto if the patch is different in that 3.4 requires a tflag not to eval the subject as part of the body and 4.0 by default requires a tlag to eval the subject, that's ok.

How do you predict that each will handle the tflags for the other?  Assuming harmlessly ignored?
Comment 11 RW 2019-08-08 14:28:46 UTC
(In reply to Kevin A. McGrail from comment #10)

> 4.0 by default requires a tlag to eval the subject, that's ok.

I think that's a bad idea. Few body rules need the subject excluded, most benefit from it, so practically all body rules will end-up needing a tflags line.
Comment 12 Henrik Krohns 2019-08-08 14:29:50 UTC
(In reply to Kevin A. McGrail from comment #10)
> Thanks for the clarification.  I rescind my veto if the patch is different
> in that 3.4 requires a tflag not to eval the subject as part of the body and
> 4.0 by default requires a tlag to eval the subject, that's ok.

No, I think I will always vote -1 for 4.0 if you are suggesting to drop subject by default.

People are still using even 3.3, this subject stuff has been in so long there that it would be probably silly to change now. And even sillier to commit 3.4 with a "nosubject" flag and 4.0 with a "subject" flag.

I think very few rules actually would NOT want to match subject. So it makes sense to have a "nosubject" flag, instead of "subject" flag.

But I'm working on so many things right now, it's hard to concentrate, I reserve the right to change my mind later. :-)
Comment 13 John Hardin 2019-08-08 17:12:11 UTC
-1 for removing body-matches-subject-too behavior entirely from any version

+1 for adding "tflags nosubject" to 4.0

Undecided about adding it to 3.4 as well; if it breaks 3.3 and earlier then definitely -1, but if it's just ignored then the concern becomes the effects of a rules getting interpreted (slightly) differently by different versions.

If we feel we want it in 3.4 then something like "can(Mail::SpamAssassin::Conf::body_tflag_nosubject)" might be a good idea.

Since rules updates are generated from trunk, then that can() might be a good idea regardless...

so: +1 for adding it to 3.4 along with can(Mail::SpamAssassin::Conf::body_tflag_nosubject)
Comment 14 Henrik Krohns 2019-08-08 17:21:52 UTC
Already has the has_tflags_nosubject as mentioned.

No SpamAssassin version lints tflags, you can put any crap you want in them and it won't croak.
Comment 15 Henrik Krohns 2019-08-09 12:22:01 UTC
It seems "nosubject" is accepted for trunk, committing to get it out of the way.

Sending        trunk/lib/Mail/SpamAssassin/Conf.pm
Sending        trunk/lib/Mail/SpamAssassin/Plugin/Check.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1864791.

For 3.4 it's already +2 I think, I'll let Kevin say his final words, but I think it's accepted regardless?
Comment 16 Henrik Krohns 2019-08-09 12:23:56 UTC
Forgot docs..

Sending        trunk/UPGRADE
Transmitting file data .done
Committing transaction...
Committed revision 1864792.
Comment 17 Kevin A. McGrail 2019-08-10 15:04:38 UTC
Henrik, can you recap what the current state is for 3.4 and for 4.0?  Are both the same and you have a tflag for nosubject?

I just want to be clear on testing so I can get 3.4.3 to continue moving along.
Comment 18 Henrik Krohns 2019-08-10 15:09:39 UTC
- 3.4 is not touched
- trunk has nosubject flag

I will commit 3.4 now unless there's any real objections, there's already +1 from me and John.
Comment 19 Kevin A. McGrail 2019-08-10 15:17:58 UTC
Yes, I'm +1 on both after discussion.  I had hoped to change the behavior for 4.0 but such is life.
Comment 20 Henrik Krohns 2019-08-10 15:22:27 UTC
All done now.

I'm not working on anything on 3.4 anymore, so Kevin, feel free to try to put rc4 out. :-)


Sending        spamassassin-3.4/UPGRADE
Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Conf.pm
Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/Check.pm
Transmitting file data ...done
Committing transaction...
Committed revision 1864877.
Comment 21 Kevin A. McGrail 2019-08-10 15:29:23 UTC
Thanks Henrik.  I appreciate all the extra work!
Comment 22 Kevin A. McGrail 2019-11-11 15:58:44 UTC
Just a note that with rc4 or greater for 3.4.3, it worked as expected.

Here's how I tested:
Add this rule and an email with subject example and example not in the body

#Testing Rule for NoSubject Rules - See note 58246
if (version >= 3.004003)
        #SHOULD HIT
        body            NOSUBJECT_TEST_HIT      /example/i
        describe        NOSUBJECT_TEST_HIT      This should hit on an email with example in the subject but not in the body because
subjects are automatically prepending for testing.

        #SHOULD NOT HIT
        body            NOSUBJECT_TEST_FAIL     /example/i
        describe        NOSUBJECT_TEST_FAIL     This should NOT hit on an email with example in the subject not not in the body beca
use the tflag nosubject will stop the automatic prepending of subjects for testing.
        tflags          NOSUBJECT_TEST_FAIL     nosubject
endif