Bug 1544 - Subject header rewritten incorrectly
Summary: Subject header rewritten incorrectly
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin (show other bugs)
Version: 2.50
Hardware: PC Linux
: P5 minor
Target Milestone: 2.50
Assignee: Malte S. Stretz
URL:
Whiteboard:
Keywords: backport
Depends on:
Blocks:
 
Reported: 2003-02-23 16:30 UTC by Simon Byrnand
Modified: 2003-03-13 10:34 UTC (History)
1 user (show)



Attachment Type Modified Status Actions Submitter/CLA Status
suggested patch patch None Theo Van Dinter [HasCLA]
sample message text/plain None Theo Van Dinter [HasCLA]
latest suggested patch to fix this patch None Theo Van Dinter [HasCLA]
ok, new version of the patch with rindex() and position 0 instead of index() patch None Theo Van Dinter [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Byrnand 2003-02-23 16:30:07 UTC
With Subject header rewriting enabled, an incomming spam with the subject 
header in all caps, eg:

SUBJECT: blah blah

When rewritten, results in a new subject header being added, eg:

Subject: ***SPAM*** blah blah

but the original subject header is not removed, so two (different) subject 
headers remain in the email causing undefined display behaviour in email 
clients.

According to Theo Van Dinter this is only occuring with report_safe 0.
Comment 1 Theo Van Dinter 2003-02-23 18:14:34 UTC
Created attachment 673 [details]
suggested patch
Comment 2 Theo Van Dinter 2003-02-23 18:32:33 UTC
Subject: Re: [SAdev]  New: Subject header rewritten incorrectly

On Sun, Feb 23, 2003 at 04:30:07PM -0800, bugzilla-daemon@hughes-family.org wrote:
> According to Theo Van Dinter this is only occuring with report_safe 0.

Yeah, we don't rewrite the subject with 'report_safe 1' ... :)

Anyway, the bug is because NoMailAudit::replace_header() only deletes
the header case sensitive as passed to it (ie: Subject), and not all
headers case insensitive that are found.

I have a suggested patch done up, but due to issues with the bugzilla
machine I can't actually send it in yet... :(

Comment 3 Theo Van Dinter 2003-02-23 18:36:27 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

On Sun, Feb 23, 2003 at 06:32:34PM -0800, bugzilla-daemon@hughes-family.org wrote:
> Yeah, we don't rewrite the subject with 'report_safe 1' ... :)

err...  replace, not rewrite.  (we do rewrite the subject, but report_safe
has a new header section so there's no replacing going on...)

Comment 4 Theo Van Dinter 2003-02-24 14:30:16 UTC
I've applied this to 2.60 already.  sending to malte for 2.50
Comment 5 Daniel Quinlan 2003-02-25 05:43:06 UTC
ISSUE: Do we have an example message that triggers this bug?  I am unable to
reproduce it.

Also, should this:

  if ( defined $header_name_only ) {

be written as:

  if (defined $header_name_only && $header_name_only) {

<grumble>Can you not put spaces after the parens?  It's that way everywhere
else in the tree (well, 98% of the time).</grumble>

You can also use a substr test with index instead of the !/^\Q$dhdr\E:/ thing.
When occuring in common code (more common than here), it can be a real boost
and it's a bit less spooky.
Comment 6 Daniel Quinlan 2003-02-25 05:49:03 UTC
ISSUE: this fix puts the Subject at the bottom of the message which seems
bad to me.  If report_safe is turned off, then I think people are aiming for
minimal modification of the message.  Is it possible to rewrite the header
in place even if the Subject case changes?

Ignore my previous about reproducing it.  I had rewrite_subject set to 0.  Duh.
Comment 7 Theo Van Dinter 2003-02-25 06:25:31 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

On Tue, Feb 25, 2003 at 05:49:04AM -0800, bugzilla-daemon@hughes-family.org wrote:
> ISSUE: this fix puts the Subject at the bottom of the message which seems
> bad to me.  If report_safe is turned off, then I think people are aiming for
> minimal modification of the message.  Is it possible to rewrite the header
> in place even if the Subject case changes?

Well, older versions have the same behavior (new header at the end),
so as far as the bug fix is concerned, there's no change.  We just make
sure to get rid of the old header.

As for modification of the message, I'm not sure what the issue is with
moving the Subject around.  Sure, it means the header is modified, but
1) if this is an issue use report_safe, and 2) there's no specification
that says the subject can't be the last line -- as long as it's valid
the end user shouldn't notice.

For 2.60, we should be able to record the location in header_order before
we delete the original one.  We can then move it around after the header
is actually replaced.  But that's an enhancement request. ;)

Comment 8 Theo Van Dinter 2003-02-25 06:36:44 UTC
Created attachment 678 [details]
sample message
Comment 9 Theo Van Dinter 2003-02-25 06:57:13 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

On Tue, Feb 25, 2003 at 05:43:06AM -0800, bugzilla-daemon@hughes-family.org wrote:
> ISSUE: Do we have an example message that triggers this bug?  I am unable to
> reproduce it.

I just took a generic piece of spam and made the Subject header name
uppercase.  I'll attach the one I've been using.

> Also, should this:
> 
>   if ( defined $header_name_only ) {
> 
> be written as:
> 
>   if (defined $header_name_only && $header_name_only) {

It doesn't really matter for 2.5x (nothing else calls the sub with that
parameter, so just defined is enough), but the change should be made
for completeness.  I've already added it to 2.60.

> <grumble>Can you not put spaces after the parens?  It's that way everywhere
> else in the tree (well, 98% of the time).</grumble>

Do you mean inside the parens?  "after the parens" is in the rest of
the code too.  Either way, why does this matter, as long as we don't
start doing:

  if ( some_crap() ) 
  {

  }

eek!

When you have multiple developers working on the same code, you tend to
see their individual styles come out.  I'd personally like to put all
the code through "perltidy -i=2".  Clean everything up in one shot. :)

> You can also use a substr test with index instead of the !/^\Q$dhdr\E:/ thing.
> When occuring in common code (more common than here), it can be a real boost
> and it's a bit less spooky.

Well, index is inefficient for this, but I don't know, I find a substr
version more spooky than the RE:

    @{$self->{header_order}} = grep($dhdr eq substr($_,0,length $dhdr), @{$self->{header_order}});

Feel free to change the code if you feel it's a big enough issue. :)

Comment 10 Malte S. Stretz 2003-02-25 07:06:13 UTC
Please use my correct mail addres 
Comment 11 Daniel Quinlan 2003-02-25 07:11:03 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

felicity@kluge.net writes:

> Well, older versions have the same behavior (new header at the end),
> so as far as the bug fix is concerned, there's no change.  We just make
> sure to get rid of the old header.

Ah, okay.  Consider that issue resolved.

> As for modification of the message, I'm not sure what the issue is with
> moving the Subject around.  Sure, it means the header is modified, but
> 1) if this is an issue use report_safe, and 2) there's no specification
> that says the subject can't be the last line -- as long as it's valid
> the end user shouldn't notice.

RFC 2822 says headers "SHOULD NOT be reordered" although it's not an
absolute requirement like almost everything else in that RFC.  I don't
remember anything that would prohibit fixing the case of a header name,
though.

   It is important to note that the header fields are not guaranteed to
   be in a particular order.  They may appear in any order, and they
   have been known to be reordered occasionally when transported over
   the Internet.  However, for the purposes of this standard, header
   fields SHOULD NOT be reordered when a message is transported or
   transformed.  More importantly, the trace header fields and resent
   header fields MUST NOT be reordered, and SHOULD be kept in blocks
   prepended to the message.  See sections 3.6.6 and 3.6.7 for more
   information.

We should fix 2.60 to follow the RFC.

Comment 12 Theo Van Dinter 2003-02-25 07:22:38 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

On Tue, Feb 25, 2003 at 07:06:14AM -0800, bugzilla-daemon@hughes-family.org wrote:
>          AssignedTo|mss@msquadrat.de            |spamassassin-
>                    |                            |contrib@msquadrat.de
> 
> 
> 
> ------- Additional Comments From spamassassin-contrib@msquadrat.de  2003-02-25 07:06 -------
> Please use my correct mail addres 

sorry, I can never remember your address, and a grep of the spamassassin
list has a number of different addresses, so I just picked one.

Comment 13 Daniel Quinlan 2003-02-25 10:38:50 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

> Do you mean inside the parens?  "after the parens" is in the rest of
> the code too.  Either way, why does this matter, as long as we don't
> start doing:
>
>   if ( some_crap() ) 
>   {
> 
>   }
> 
> eek!

Yes, I meant inside the parens just your example there that and in your
proposed patch.  Inside the parens is only used about 4% of the time in
our code (it has grown).
 
> When you have multiple developers working on the same code, you tend to
> see their individual styles come out.  I'd personally like to put all
> the code through "perltidy -i=2".  Clean everything up in one shot. :)

Is perltidy really capable of sanely indenting complicated perl code?  I
use Emacs to do my editing and I often have to nudge it to get good
formatting.

> Well, index is inefficient for this, but I don't know, I find a substr
> version more spooky than the RE:

Sorry, rindex:

  @{$self->{header_order}} = grep(!rindex($_,$dhdr,0), @{$self->{header_order}});

Comment 14 Malte S. Stretz 2003-02-25 10:59:49 UTC
Subject: Re:  Subject header rewritten incorrectly

Well, if you're already arguing about style:

On Tuesday 25 February 2003 19:38 CET Dan wrote:
>[...]
> Sorry, rindex:
>
>   @{$self->{header_order}} = grep(!rindex($_,$dhdr,0), @{$self->{header_order}});

Better:

@{$self->{header_order}} = grep { !rindex ($_, $dhdr) } @{$self->{header_order}};

The block expression makes clear that the rindex is applied on the grepped
array. The space after the function name seems to be the style used in SA 
(I don't like it though). And spaces after the commata between the arguments 
increase readability a lot :o)

Oh, and the zero for the last argument doesn't make any sense because rindex 
will look for $dhdr in $_ at or *before* position zero.

Cheers,
Malte

Comment 15 Theo Van Dinter 2003-02-25 11:05:11 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

On Tue, Feb 25, 2003 at 10:38:50AM -0800, bugzilla-daemon@hughes-family.org wrote:
> Yes, I meant inside the parens just your example there that and in your
> proposed patch.  Inside the parens is only used about 4% of the time in
> our code (it has grown).

heh heh.  my sphere of influence is growing!  ;)

but seriously ...  I don't think it really matters if there are spaces
or not.  It's not a large enough difference so people have a problem
with the different code, and I've been coding with spaces within the
parens for so many years I don't think about it anymore.  My fingers
just knock off the "if ( " by impulse.

> Is perltidy really capable of sanely indenting complicated perl code?  I
> use Emacs to do my editing and I often have to nudge it to get good
> formatting.

So far it's worked reliably for me.  I don't actually configure it beyond
"-i=2" (indent 2 spaces) though.

> Sorry, rindex:
> 
>   @{$self->{header_order}} = grep(!rindex($_,$dhdr,0), @{$self->{header_order}});

You know, I just noticed we're both wrong:

   @{$self->{header_order}} = grep($_ ne $dhdr, @{$self->{header_order}});

wow, that was simple. ;)

Comment 16 Theo Van Dinter 2003-02-25 11:49:27 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

On Tue, Feb 25, 2003 at 11:05:11AM -0800, bugzilla-daemon@hughes-family.org wrote:
> You know, I just noticed we're both wrong:
> 
>    @{$self->{header_order}} = grep($_ ne $dhdr, @{$self->{header_order}});

huh.  you know, I just noticed that I was wrong ...  ;)

I thought header_order was just a list of headers in the order they
should come out in, I didn't realize they had a count on them too.
Well, of the suggestions listed then, index is better than rindex,
and I'd search for "$dhdr:".

Comment 17 Daniel Quinlan 2003-02-25 11:58:20 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

Malte writes:

> The block expression makes clear that the rindex is applied on the
> grepped array. The space after the function name seems to be the style
> used in SA (I don't like it though). And spaces after the commata
> between the arguments increase readability a lot :o)

I don't like it either.  I think the consensus is much less clear about
spaces after function names.  I think that might be Justin's style.  I'm
sure I have a quirk or two myself, but nobody has complained yet.  :-)
 
> Oh, and the zero for the last argument doesn't make any sense because
> rindex will look for $dhdr in $_ at or *before* position zero.

Actually, I did it that way on purpose.  I *only* wanted to look at
offset 0.  Nothing is less than 0, so it reproduces exactly the
expression in Theo's patch:

  !/^\Q$dhdr\E:/

but, it's faster.  Your version allows it to look at any offset, so the
"^" part of the RE is lost.

Theo writes:

> You know, I just noticed we're both wrong:
>
>    @{$self->{header_order}} = grep($_ ne $dhdr, @{$self->{header_order}});
>
> wow, that was simple. ;)

That's not quite what your original code did, though.  There was no $ in
your RE.  If there was a "$", then "ne" is clearly the way to go.  :-)

Incidentally, I left out a ":" in the rindex expression, though, so it
should have read (also changed to != 0 since I think that's clearer)
like this:

  @{$self->{header_order}} = grep(rindex($_,"$dhdr:",0) != 0, @{$self->{header_order}});

Daniel

Comment 18 Theo Van Dinter 2003-02-25 12:08:00 UTC
Created attachment 681 [details]
latest suggested patch to fix this
Comment 19 Daniel Quinlan 2003-02-25 13:19:26 UTC
ISSUE: looks good except you want:

  rindex($_,"$dhdr:",0) != 0

instead of

  index($_,"$dhdr:") != 0

Otherwise, a $dhdr of "Subject" will match "Foo-Subject".
Comment 20 Daniel Quinlan 2003-02-25 13:31:56 UTC
Oops, correction, with the last patch, a $dhdr of "Subject" will match
"Foo-Subject", but the result won't be zero so it WILL work, it is just
inefficient as Theo correctly stated earlier (because it will search the
entire string).  The rindex form does, however, solve Theo's concern about
the inefficiency of index, though.

OKAY: will actually work (if theoretically slower)
Comment 21 Daniel Quinlan 2003-02-25 13:34:51 UTC
Okay, now I'm going to seem indecisive.

ISSUE: doesn't work because it will remove "later in the header" matches.

The rindex form is actually necessary.
Comment 22 Theo Van Dinter 2003-02-25 13:50:31 UTC
Subject: Re: [SAdev]  Subject header rewritten incorrectly

On Tue, Feb 25, 2003 at 01:34:51PM -0800, bugzilla-daemon@hughes-family.org wrote:
> Okay, now I'm going to seem indecisive.

Hehe.  You'd think this would be simple being a simple line of code.
How many times have we changed it? ;)

> ISSUE: doesn't work because it will remove "later in the header" matches.

index() should work, but it's a little less efficient than the rindex()
with position 0 (index() is actually better if you don't put the position
in there).  "Foo-Subject:" won't be caught. :)

> The rindex form is actually necessary.

To hopefully stop the discussion, I've put in the rindex() with position
0 into HEAD.  I'll update the patch again, and hopefully that'll be it. :)

Gah!

Comment 23 Theo Van Dinter 2003-02-25 13:52:05 UTC
Created attachment 682 [details]
ok, new version of the patch with rindex() and position 0 instead of index()
Comment 24 Daniel Quinlan 2003-02-25 14:28:03 UTC
OKAY: no comment :-)
Comment 25 Theo Van Dinter 2003-03-13 19:34:37 UTC
commited to 2.5 branch.  closing bug.