Bug 4363 - [review] rfe: windows line ending support
Summary: [review] rfe: windows line ending support
Status: RESOLVED DUPLICATE of bug 5250
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin (show other bugs)
Version: 3.1.7
Hardware: PC Windows 2000
: P5 normal
Target Milestone: 3.1.1
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard: ready to commit
Keywords:
: 4068 4289 4620 (view as bug list)
Depends on:
Blocks: 5250
  Show dependency tree
 
Reported: 2005-05-24 05:24 UTC by Nikolay V. Buslaev
Modified: 2007-01-16 03:00 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Change STDIN and STDOUT to binmode patch None Nikolay V. Buslaev [NoCLA]
Correct message headers generation to RFC822 patch None Nikolay V. Buslaev [NoCLA]
Change INPUT to binmode patch None Nikolay V. Buslaev [NoCLA]
patch as committed for 3.2 patch None Theo Van Dinter [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Nikolay V. Buslaev 2005-05-24 05:24:22 UTC
When you use activeperl, you need to set binmode for STDIN STDOUT for correct 
parsing 8bit messages.
But only for use this mode - you need to correct @pristine_headers adding to 
\r\n end of line.
Comment 1 Nikolay V. Buslaev 2005-05-24 05:26:22 UTC
Created attachment 2896 [details]
Change STDIN and STDOUT to binmode

Change STDIN and STDOUT to binmode for win32, os2 and dos platforms
Comment 2 Nikolay V. Buslaev 2005-05-24 05:28:22 UTC
Created attachment 2897 [details]
Correct message headers generation to RFC822

Correct message headers generation to RFC822 when wou set binmode.
Comment 3 Bob Menschel 2005-05-25 23:34:07 UTC
Triage: Needs testing for validity and reliability. Sounds like it should be
incorporated into 3.1 if the patches are good. 
Comment 4 Nico 2005-05-26 05:05:12 UTC
Patches apply well to 3.03 and show no issues since about 5k mails. Could it be
that it even causes a noticeable speedup under Activeperl if files are accessed
in binmode? However, the guys over at mozilla.org had a similar issue
(https://bugzilla.mozilla.org/show_bug.cgi?id=62000) and came to the conclusion
that (quoting): "While there is some speculation over always using binmode (even
with text files), nobody has reported any complications and many people have
successfully used binmode with all types of files on Win32, so I think it's sane
until otherwise proven"

Regards, Nico
Comment 5 Loren Wilton 2005-05-27 18:58:20 UTC
Suggest, instead of patching the startup script to set the handles to binary, 
move this code into SA processing itself, so it will always be done, regardless 
of the method used to glue SA into the mail processing.  

Since SA can handle both \n and \r\n line endings correctly, I can see no 
reason that it should not pass the message content through undisturbed as far 
as line endings are concerned.  Doing so requires binmode on a PC.
Comment 6 Justin Mason 2005-05-31 21:32:32 UTC
ok, r179354 includes the binmode() fix.  however, the other half, the
PerMsgStatus changes, is trickier.

this is because we have documented the report() APIs on that object as returning
report texts with \n as the newline separator, not \r\n.   we should possibly
change rewrite() to rewrite the message using whatever the correct newline
separator is (based on the message string itself, not on the current platform!)
-- but doing that fully, while avoiding adding \r\n to the text returned via
get_report(), is the hard part.
Comment 7 Nikolay V. Buslaev 2005-06-01 03:11:56 UTC
Created attachment 2914 [details]
Change INPUT to binmode

For correct process message from File (not from stream) in binmode
Comment 8 Nikolay V. Buslaev 2005-06-01 03:36:01 UTC
In RFC822 newline separator '\r\n' defined for mail headers.
When @pristine_headers generated in binmode() stream all message headers 
correct.
And message output print join('', @pristine_headers, "\r\n", ...
work correct.
When You add custom header like
push(@pristine_headers, "X-Spam-$header: $line\n");
in binmode - newline separator not correct for RFC822

For correct work binmode() fixes - You must correct all \n in message headers 
to \r\n before print it.
Comment 9 Theo Van Dinter 2005-06-01 08:11:41 UTC
Subject: Re:  ActivePerl compatibility

On Wed, Jun 01, 2005 at 03:36:01AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> In RFC822 newline separator '\r\n' defined for mail headers.
>
> For correct work binmode() fixes - You must correct all \n in message headers 
> to \r\n before print it.

We can't just assume \r\n, because most of the time it's just \n for SA users...

IMO, SA ought to check what the first EOL chars are and use them
throughout.  Something like:

$self->{eol} = ($message[0] =~ /\r\n/) ? "\r\n" : "\n";

During parsing.  Then, whenever we generate anything, use the value
appropriately.

Comment 10 Justin Mason 2005-06-01 09:28:43 UTC
agreed with felicity.

in particular, even if rfc-2822 mandates \r\n line endings, (a) that's not what
historically has applied on UNIX platforms and (b) we cannot break backwards
compatibility in that respect for callers on UNIX platforms.
Comment 11 Tom Schulz 2005-06-01 10:33:33 UTC
Note that rfc-2822 mandates \r\n line endings only for messages sent from one
message transfer agent to another message transfer agent.  This generally means
mail sent by the smtp protocol, usually from one machine to another machine.
Rfc-2822 specifically states that this does not apply to how messages are stored
on or transferred within a system.  As SpamAssassin is usually used on mail as
it is being processed within a system, it should use the line endings that
are found to be in use on the message being processed.

Note that I think that there is another bug open on this same problem.
Comment 12 Nikolay V. Buslaev 2005-06-01 20:50:10 UTC
I run SA as MS Exchange SMTP plugin.
When SA change Mail headers newline to '\n' - other plugins can work incorrect.
Comment 13 Daryl C. W. O'Shea 2005-06-02 00:52:41 UTC
Subject: Re:  ActivePerl compatibility

Agreed with Theo and Justin regarding auto-detecting the line ending. 
This has been discussed in bug 4068, BTW.

I can also confirm what Tom said about what rfc-2822 requires for line 
endings (\r\n is only required when the message is in transit).

Comment 14 Loren Wilton 2005-06-02 00:57:33 UTC
Subject: Re:  ActivePerl compatibility

As an alternative to detecting the line ending in the curremt message and
duplicating it, how about checking for Windoze platforms and using \r\n in
those cases.  That would probably fix the OPs problem without breaking Unix
systems.

Comment 15 Theo Van Dinter 2005-06-02 08:59:14 UTC
Subject: Re:  ActivePerl compatibility

On Thu, Jun 02, 2005 at 12:57:33AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> As an alternative to detecting the line ending in the curremt message and
> duplicating it, how about checking for Windoze platforms and using \r\n in
> those cases.  That would probably fix the OPs problem without breaking Unix
> systems.

This isn't just a Windows issue IMO.  If I were to pass a message using
CRLF to SA on my *NIX machine, I'd want it to DTRT(tm).

Comment 16 Justin Mason 2005-06-02 09:27:33 UTC
'This isn't just a Windows issue IMO.  If I were to pass a message using
CRLF to SA on my *NIX machine, I'd want it to DTRT(tm).'

and vice-versa, a UNIX scanner using spamc to send mails to a windows server
should get the line endings back as \n.  using the data to determine the correct
behaviour makes more sense.
Comment 17 Justin Mason 2005-06-02 09:27:57 UTC
*** Bug 4068 has been marked as a duplicate of this bug. ***
Comment 18 Justin Mason 2005-06-02 09:28:37 UTC
retitling
Comment 19 Loren Wilton 2005-06-02 15:12:25 UTC
Subject: Re:  ActivePerl compatibility

> using the data to determine the correct
> behaviour makes more sense.

I won't disagree with that.  But that idea has been floating around for at
least 4 months now and hasn't happened.

I was trying to come up with something that could probably be implemented in
a couple lines of code and actually happen in the 3.0.4 / 3.1 timeframe.

Comment 20 Theo Van Dinter 2005-06-02 15:15:01 UTC
Subject: Re:  rfe: windows line ending support

On Thu, Jun 02, 2005 at 03:12:25PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> I was trying to come up with something that could probably be implemented in
> a couple lines of code and actually happen in the 3.0.4 / 3.1 timeframe.

It could happen for 3.1, it's doubtful for 3.0.  It's also basically the same
amount of code for "windows?" or "check the EOL of the message".  The bits
that output reference a variable, the only difference is where the variable is
set and how the code decides what to set the variable to.

Comment 21 Malte S. Stretz 2005-06-09 04:47:25 UTC
*** Bug 4289 has been marked as a duplicate of this bug. ***
Comment 22 Duncan Findlay 2005-07-23 16:23:20 UTC
It's too late for this in 3.1. We should fix this for 3.2.
Comment 23 Theo Van Dinter 2005-10-07 09:24:47 UTC
*** Bug 4620 has been marked as a duplicate of this bug. ***
Comment 24 Christopher G. Lewis 2005-10-14 15:57:25 UTC
This is a crosspost w/ http://bugzilla.spamassassin.org/show_bug.cgi?id=4068 

Hi - 
  I've written a relatively well used Exchange Sink for SpamAssassin in the 
Win32 world, (http://www.christopherlewis.com/ExchangeSpamAssassin.htm) and 
3.1.0 is seriously broken with regards to this CRLF issue.  

All message header parsing is now screwy because the X-Spam headers now look 
like this:

X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on ... \n
X-Spam-Level: \n
X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,AWL... \n
Received: from xplewis01 ([127.0.0.1]) by ... \r\n

and reloading this back into exchange breaks the message header.

Note that this doesn't occur in the 2.6x or 3.0.x branches, so something has 
changed in the 3.1 release.

Thanks
Comment 25 John Gardiner Myers 2005-10-14 16:00:41 UTC
The code calling SpamAssassin needs to convert \r\n into \n before entering
SpamAssassin and convert \n to \r\n after it returns.
Comment 26 Theo Van Dinter 2005-10-14 16:06:27 UTC
Subject: Re:  rfe: windows line ending support

On Fri, Oct 14, 2005 at 03:57:26PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on ... \n
> X-Spam-Level: \n
> X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,AWL... \n
> Received: from xplewis01 ([127.0.0.1]) by ... \r\n
> 
> Note that this doesn't occur in the 2.6x or 3.0.x branches, so something has 
> changed in the 3.1 release.

Besides having the X-Spam-* headers as trace headers (up at the top) instead
of at the bottom, how is the line/line ending any different than before?  If
the line endings were being dealt with before (which they would have to be if
LF and not CRLF breaks Exchange header parsing), why aren't they being dealt
with the same way now?

Comment 27 Theo Van Dinter 2005-10-24 03:09:08 UTC
ok, I committed r327897 which does a very simple check of the first header in the message.  If that header 
uses CRLF, then whenever the rewrite_mail() function is called, it will make sure all of the line endings are 
also CRLF.  Otherwise, it'll keep doing LF as has been the standard.

I only did a small amount of testing, but feel free to try out the 3.2 code.  I'll put up the patch if you want 
to try it with 3.1 instead.
Comment 28 Theo Van Dinter 2005-10-24 03:09:48 UTC
Created attachment 3204 [details]
patch as committed for 3.2
Comment 29 Christopher G. Lewis 2005-11-07 19:48:05 UTC
(In reply to comment #28)
> Created an attachment (id=3204) [edit]
> patch as committed for 3.2

(In reply to comment #27)
> ok, I committed r327897 which does a very simple check of the first header in 
the message.  If that header 
> uses CRLF, then whenever the rewrite_mail() function is called, it will make 
sure all of the line endings are 
> also CRLF.  Otherwise, it'll keep doing LF as has been the standard.
> I only did a small amount of testing, but feel free to try out the 3.2 code.  
I'll put up the patch if you want 
> to try it with 3.1 instead.


patch looks good.  Headers in Exchange seem to be parsing better, and the 
header object (CDO) looks like its no longer corrupt.

Thanks!
Comment 30 Daryl C. W. O'Shea 2006-03-06 23:05:54 UTC
aiming at 3.1.1
Comment 31 Daryl C. W. O'Shea 2006-03-07 05:17:17 UTC
still applies/make tests/works against 3.1

+1 for 3204
Comment 32 Sidney Markowitz 2006-03-07 05:29:15 UTC
+1 for patch 3204 for 3.1

I guess we need one more vote because Theo uploaded the patch for 3.2, right?
Comment 33 Theo Van Dinter 2006-03-07 05:33:32 UTC
(In reply to comment #32)
> +1 for patch 3204 for 3.1
> I guess we need one more vote because Theo uploaded the patch for 3.2, right?

I'm fine with putting it in 3.1.1, so it's all set.

Sending        lib/Mail/SpamAssassin/Message.pm
Sending        lib/Mail/SpamAssassin/PerMsgStatus.pm
Transmitting file data ..
Committed revision 383789.
Comment 34 Ivan Baldo 2006-05-26 14:21:43 UTC
I am receiving emails processed by SpamAssassin 3.1.1 that have \r\n line
endings on them and QMail-Scanner complains that they contain invalid characters
on their headers.
The \r\n are on the SpamAssassin lines only.
Could be a misconfiguration or is SpamAssassin misbehaving?
Any one seen this?
Thank you.


(In reply to comment #33)
> (In reply to comment #32)
> > +1 for patch 3204 for 3.1
> > I guess we need one more vote because Theo uploaded the patch for 3.2, right?
> 
> I'm fine with putting it in 3.1.1, so it's all set.
> 
> Sending        lib/Mail/SpamAssassin/Message.pm
> Sending        lib/Mail/SpamAssassin/PerMsgStatus.pm
> Transmitting file data ..
> Committed revision 383789.
> 

Comment 35 Theo Van Dinter 2006-05-26 16:02:50 UTC
(In reply to comment #34)
> I am receiving emails processed by SpamAssassin 3.1.1 that have \r\n line
> endings on them and QMail-Scanner complains that they contain invalid characters
> on their headers.
> The \r\n are on the SpamAssassin lines only.
> Could be a misconfiguration or is SpamAssassin misbehaving?

It sounds like a problem with qmail-scanner to me.  However, if you can provide a copy of an email 
exhibiting this behavior before SA processes it and after, we could do some debugging.
Comment 36 Nikolay V. Buslaev 2006-12-08 03:13:32 UTC
patch from Theo Van Dinter work incorrect with binary messages in MS-TNEF 
format.
When You change \r?\n in message body - you can replace single \n to \r\n
To correct only headers - you ned to replace 
$msg =~ s/\r?\n/$self->{msg}->{line_ending}/g;
to below code
#----------
$msg =~ m/\r?\n\r?\n/;
my $msghead = $`;
my $msgbody = $';
$msghead =~ s/\r?\n/$self->{msg}->{line_ending}/g;
$msg=$msghead.$self->{msg}->{line_ending}.$self->{msg}->{line_ending}.$msgbody;
Comment 37 Nikolay V. Buslaev 2006-12-08 03:28:22 UTC
For correct work with ActivePerl when open message from file (not from stdin)? 
you need add string in ArchiveIterator.pm in mail_open function
if (Mail::SpamAssassin::Util::am_running_on_windows()) binmode(INPUT);
before
return 1;
Comment 38 Justin Mason 2007-01-16 02:57:09 UTC
Nikolay opened a separate bug for his issue - bug 5250.

*** This bug has been marked as a duplicate of 5250 ***
Comment 39 Justin Mason 2007-01-16 03:00:37 UTC
also, http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4363#c37 was fixed
in bug 5249