SA Bugzilla – Bug 4363
[review] rfe: windows line ending support
Last modified: 2007-01-16 03:00:37 UTC
When you use activeperl, you need to set binmode for STDIN STDOUT for correct parsing 8bit messages. But only for use this mode - you need to correct @pristine_headers adding to \r\n end of line.
Created attachment 2896 [details] Change STDIN and STDOUT to binmode Change STDIN and STDOUT to binmode for win32, os2 and dos platforms
Created attachment 2897 [details] Correct message headers generation to RFC822 Correct message headers generation to RFC822 when wou set binmode.
Triage: Needs testing for validity and reliability. Sounds like it should be incorporated into 3.1 if the patches are good.
Patches apply well to 3.03 and show no issues since about 5k mails. Could it be that it even causes a noticeable speedup under Activeperl if files are accessed in binmode? However, the guys over at mozilla.org had a similar issue (https://bugzilla.mozilla.org/show_bug.cgi?id=62000) and came to the conclusion that (quoting): "While there is some speculation over always using binmode (even with text files), nobody has reported any complications and many people have successfully used binmode with all types of files on Win32, so I think it's sane until otherwise proven" Regards, Nico
Suggest, instead of patching the startup script to set the handles to binary, move this code into SA processing itself, so it will always be done, regardless of the method used to glue SA into the mail processing. Since SA can handle both \n and \r\n line endings correctly, I can see no reason that it should not pass the message content through undisturbed as far as line endings are concerned. Doing so requires binmode on a PC.
ok, r179354 includes the binmode() fix. however, the other half, the PerMsgStatus changes, is trickier. this is because we have documented the report() APIs on that object as returning report texts with \n as the newline separator, not \r\n. we should possibly change rewrite() to rewrite the message using whatever the correct newline separator is (based on the message string itself, not on the current platform!) -- but doing that fully, while avoiding adding \r\n to the text returned via get_report(), is the hard part.
Created attachment 2914 [details] Change INPUT to binmode For correct process message from File (not from stream) in binmode
In RFC822 newline separator '\r\n' defined for mail headers. When @pristine_headers generated in binmode() stream all message headers correct. And message output print join('', @pristine_headers, "\r\n", ... work correct. When You add custom header like push(@pristine_headers, "X-Spam-$header: $line\n"); in binmode - newline separator not correct for RFC822 For correct work binmode() fixes - You must correct all \n in message headers to \r\n before print it.
Subject: Re: ActivePerl compatibility On Wed, Jun 01, 2005 at 03:36:01AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > In RFC822 newline separator '\r\n' defined for mail headers. > > For correct work binmode() fixes - You must correct all \n in message headers > to \r\n before print it. We can't just assume \r\n, because most of the time it's just \n for SA users... IMO, SA ought to check what the first EOL chars are and use them throughout. Something like: $self->{eol} = ($message[0] =~ /\r\n/) ? "\r\n" : "\n"; During parsing. Then, whenever we generate anything, use the value appropriately.
agreed with felicity. in particular, even if rfc-2822 mandates \r\n line endings, (a) that's not what historically has applied on UNIX platforms and (b) we cannot break backwards compatibility in that respect for callers on UNIX platforms.
Note that rfc-2822 mandates \r\n line endings only for messages sent from one message transfer agent to another message transfer agent. This generally means mail sent by the smtp protocol, usually from one machine to another machine. Rfc-2822 specifically states that this does not apply to how messages are stored on or transferred within a system. As SpamAssassin is usually used on mail as it is being processed within a system, it should use the line endings that are found to be in use on the message being processed. Note that I think that there is another bug open on this same problem.
I run SA as MS Exchange SMTP plugin. When SA change Mail headers newline to '\n' - other plugins can work incorrect.
Subject: Re: ActivePerl compatibility Agreed with Theo and Justin regarding auto-detecting the line ending. This has been discussed in bug 4068, BTW. I can also confirm what Tom said about what rfc-2822 requires for line endings (\r\n is only required when the message is in transit).
Subject: Re: ActivePerl compatibility As an alternative to detecting the line ending in the curremt message and duplicating it, how about checking for Windoze platforms and using \r\n in those cases. That would probably fix the OPs problem without breaking Unix systems.
Subject: Re: ActivePerl compatibility On Thu, Jun 02, 2005 at 12:57:33AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > As an alternative to detecting the line ending in the curremt message and > duplicating it, how about checking for Windoze platforms and using \r\n in > those cases. That would probably fix the OPs problem without breaking Unix > systems. This isn't just a Windows issue IMO. If I were to pass a message using CRLF to SA on my *NIX machine, I'd want it to DTRT(tm).
'This isn't just a Windows issue IMO. If I were to pass a message using CRLF to SA on my *NIX machine, I'd want it to DTRT(tm).' and vice-versa, a UNIX scanner using spamc to send mails to a windows server should get the line endings back as \n. using the data to determine the correct behaviour makes more sense.
*** Bug 4068 has been marked as a duplicate of this bug. ***
retitling
Subject: Re: ActivePerl compatibility > using the data to determine the correct > behaviour makes more sense. I won't disagree with that. But that idea has been floating around for at least 4 months now and hasn't happened. I was trying to come up with something that could probably be implemented in a couple lines of code and actually happen in the 3.0.4 / 3.1 timeframe.
Subject: Re: rfe: windows line ending support On Thu, Jun 02, 2005 at 03:12:25PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > I was trying to come up with something that could probably be implemented in > a couple lines of code and actually happen in the 3.0.4 / 3.1 timeframe. It could happen for 3.1, it's doubtful for 3.0. It's also basically the same amount of code for "windows?" or "check the EOL of the message". The bits that output reference a variable, the only difference is where the variable is set and how the code decides what to set the variable to.
*** Bug 4289 has been marked as a duplicate of this bug. ***
It's too late for this in 3.1. We should fix this for 3.2.
*** Bug 4620 has been marked as a duplicate of this bug. ***
This is a crosspost w/ http://bugzilla.spamassassin.org/show_bug.cgi?id=4068 Hi - I've written a relatively well used Exchange Sink for SpamAssassin in the Win32 world, (http://www.christopherlewis.com/ExchangeSpamAssassin.htm) and 3.1.0 is seriously broken with regards to this CRLF issue. All message header parsing is now screwy because the X-Spam headers now look like this: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on ... \n X-Spam-Level: \n X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,AWL... \n Received: from xplewis01 ([127.0.0.1]) by ... \r\n and reloading this back into exchange breaks the message header. Note that this doesn't occur in the 2.6x or 3.0.x branches, so something has changed in the 3.1 release. Thanks
The code calling SpamAssassin needs to convert \r\n into \n before entering SpamAssassin and convert \n to \r\n after it returns.
Subject: Re: rfe: windows line ending support On Fri, Oct 14, 2005 at 03:57:26PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote: > X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on ... \n > X-Spam-Level: \n > X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,AWL... \n > Received: from xplewis01 ([127.0.0.1]) by ... \r\n > > Note that this doesn't occur in the 2.6x or 3.0.x branches, so something has > changed in the 3.1 release. Besides having the X-Spam-* headers as trace headers (up at the top) instead of at the bottom, how is the line/line ending any different than before? If the line endings were being dealt with before (which they would have to be if LF and not CRLF breaks Exchange header parsing), why aren't they being dealt with the same way now?
ok, I committed r327897 which does a very simple check of the first header in the message. If that header uses CRLF, then whenever the rewrite_mail() function is called, it will make sure all of the line endings are also CRLF. Otherwise, it'll keep doing LF as has been the standard. I only did a small amount of testing, but feel free to try out the 3.2 code. I'll put up the patch if you want to try it with 3.1 instead.
Created attachment 3204 [details] patch as committed for 3.2
(In reply to comment #28) > Created an attachment (id=3204) [edit] > patch as committed for 3.2 (In reply to comment #27) > ok, I committed r327897 which does a very simple check of the first header in the message. If that header > uses CRLF, then whenever the rewrite_mail() function is called, it will make sure all of the line endings are > also CRLF. Otherwise, it'll keep doing LF as has been the standard. > I only did a small amount of testing, but feel free to try out the 3.2 code. I'll put up the patch if you want > to try it with 3.1 instead. patch looks good. Headers in Exchange seem to be parsing better, and the header object (CDO) looks like its no longer corrupt. Thanks!
aiming at 3.1.1
still applies/make tests/works against 3.1 +1 for 3204
+1 for patch 3204 for 3.1 I guess we need one more vote because Theo uploaded the patch for 3.2, right?
(In reply to comment #32) > +1 for patch 3204 for 3.1 > I guess we need one more vote because Theo uploaded the patch for 3.2, right? I'm fine with putting it in 3.1.1, so it's all set. Sending lib/Mail/SpamAssassin/Message.pm Sending lib/Mail/SpamAssassin/PerMsgStatus.pm Transmitting file data .. Committed revision 383789.
I am receiving emails processed by SpamAssassin 3.1.1 that have \r\n line endings on them and QMail-Scanner complains that they contain invalid characters on their headers. The \r\n are on the SpamAssassin lines only. Could be a misconfiguration or is SpamAssassin misbehaving? Any one seen this? Thank you. (In reply to comment #33) > (In reply to comment #32) > > +1 for patch 3204 for 3.1 > > I guess we need one more vote because Theo uploaded the patch for 3.2, right? > > I'm fine with putting it in 3.1.1, so it's all set. > > Sending lib/Mail/SpamAssassin/Message.pm > Sending lib/Mail/SpamAssassin/PerMsgStatus.pm > Transmitting file data .. > Committed revision 383789. >
(In reply to comment #34) > I am receiving emails processed by SpamAssassin 3.1.1 that have \r\n line > endings on them and QMail-Scanner complains that they contain invalid characters > on their headers. > The \r\n are on the SpamAssassin lines only. > Could be a misconfiguration or is SpamAssassin misbehaving? It sounds like a problem with qmail-scanner to me. However, if you can provide a copy of an email exhibiting this behavior before SA processes it and after, we could do some debugging.
patch from Theo Van Dinter work incorrect with binary messages in MS-TNEF format. When You change \r?\n in message body - you can replace single \n to \r\n To correct only headers - you ned to replace $msg =~ s/\r?\n/$self->{msg}->{line_ending}/g; to below code #---------- $msg =~ m/\r?\n\r?\n/; my $msghead = $`; my $msgbody = $'; $msghead =~ s/\r?\n/$self->{msg}->{line_ending}/g; $msg=$msghead.$self->{msg}->{line_ending}.$self->{msg}->{line_ending}.$msgbody;
For correct work with ActivePerl when open message from file (not from stdin)? you need add string in ArchiveIterator.pm in mail_open function if (Mail::SpamAssassin::Util::am_running_on_windows()) binmode(INPUT); before return 1;
Nikolay opened a separate bug for his issue - bug 5250. *** This bug has been marked as a duplicate of 5250 ***
also, http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4363#c37 was fixed in bug 5249