Bug 4068 - should use consistent CRLF/LF line endings
Summary: should use consistent CRLF/LF line endings
Status: RESOLVED DUPLICATE of bug 4363
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: 2.64
Hardware: Other Linux
Importance: P5 normal
Target Milestone: 3.1.1
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-01-07 18:51 UTC by cng
Modified: 2005-10-12 19:51 UTC

Description cng 2005-01-07 18:51:49 UTC
According to RFC 822, each field in a message header must be terminated by a
CRLF character sequence (ASCII 13 and 10 characters) and the message header must
be separated from the message body by a null line (two contiguous CRLF sequences).

I've noticed that when SpamAssassin adds fields to a message header, it doesn't
terminate its fields with CRLFs; it only adds a LF character to the end of each
of its fields. Since SpamAssassin adds its fields to the end of an existing
message header, this also means that for a message that SpamAssassin has
processed, instead of the required null line, a LF-CRLF sequence separates the
message header from the message body.

According to RFCs 2045 and 2046, CRLFs serve as line breaks to define MIME part
headers and bodies.

When SpamAssassin embeds the original message with a report and attaches the
original message as a MIME attachment to the message, it doesn't terminate the
lines that it adds with CRLFs.

Many e-mail clients and servers comply only loosely with RFCs 822, 2045, and
2046 when handling CRLFs, but stricter clients and servers will either tag
messages that SpamAssassin has processed as invalid (not RFC 822, 2045, or 2046
compliant) or parse them incorrectly because of the missing CRLFs.

(Although SpamAssassin keeps the original message intact with CRLFs, the lines
that SpamAssassin adds do not have the required CRLFs.)

---
RFC 822 - Standard for the format of ARPA Internet text messages

3.2.  HEADER FIELD DEFINITIONS
field       =  field-name ":" [ field-body ] CRLF

B.2.  SEMANTICS
Headers occur before the message body and are terminated  by a null line (i.e.,
two contiguous CRLFs).
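
A minimal Python sketch of the failure mode described above, assuming a strict
parser that looks for the literal CRLF CRLF null line. The function name
split_strict and both sample messages are hypothetical; they are not taken from
SpamAssassin or from any particular mail client.

```python
def split_strict(raw: bytes):
    """Split header from body at the first null line (CRLF CRLF)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no CRLF CRLF separator found")
    return head, body


# A well-formed message: every line ends in CRLF.
original = b"Subject: hi\r\n\r\nbody\r\n"

# The same message after a field terminated by a bare LF has been appended
# to the end of the header: the separator is now LF CRLF, not CRLF CRLF.
modified = b"Subject: hi\r\nX-Spam-Level: \n\r\nbody\r\n"
```

With these inputs, split_strict(original) returns the header and body, while
split_strict(modified) raises ValueError because the byte sequence CRLF CRLF no
longer occurs anywhere in the message.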
Comment 1 Tony Finch 2005-01-09 12:43:46 UTC
Line ending syntax in Unix email software is a very difficult area, because whether Internet CRLF 
endings or LF Unix endings are used depends on the context and the other software that is being used. 
It is NOT simply a matter of 822 conformance.
Comment 2 Malte S. Stretz 2005-01-10 11:36:26 UTC
If this is really an issue (all places where I used SA till now converted the 
CRLF to Unix LF first), we could look at the newline of the first header (most 
probably a Return-Path or Received line added by some relatively trusted 
server) and use that one for our own headers.  Plus maybe an option a la 
  add_header_terminator { auto | crlf | lf | cr }  (default: auto) 
 
If this is really an issue -- any real-world examples? 
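
A rough Python sketch of that idea, for illustration only. The setting values
mirror the proposed add_header_terminator option, which does not exist in
SpamAssassin; choose_terminator, TERMINATORS, and the LF fallback for input
without any line break are assumptions.

```python
import re

# Hypothetical mapping for the explicit values of the proposed option.
TERMINATORS = {"crlf": b"\r\n", "lf": b"\n", "cr": b"\r"}


def choose_terminator(raw: bytes, setting: str = "auto") -> bytes:
    """Return the line terminator to use for newly added header fields."""
    if setting != "auto":
        return TERMINATORS[setting]
    # "auto": reuse the first line break in the message, which is the
    # terminator of the first header line. CRLF is listed first so that it
    # is not mistaken for a bare CR.
    found = re.search(rb"\r\n|\r|\n", raw)
    return found.group(0) if found else b"\n"
```

For example, choose_terminator(b"Received: a\r\nSubject: b\n\nbody") picks CRLF
because only the first header line is consulted.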
Comment 3 Justin Mason 2005-01-10 12:00:53 UTC
as Tony said -- it's not simply a matter of RFC-2822 compliance, as a lot of
UNIX mail software will barf on input messages that contain CR-LFs instead of
just LFs.

IIRC, we use the platform default -- so LF on UNIX, and CRLF on windows -- and
assume the caller will canonicalise the input message in the first place.
Comment 4 Daniel Quinlan 2005-01-10 13:43:54 UTC
SA should just use whatever the input used.

No way this should be an option.  -1 to a new option.
Comment 5 Malte S. Stretz 2005-01-10 13:51:41 UTC
So something automatic like the following pseudo-code in the input routine? 
 
  if (!defined $self->{crlf}) { 
    # CRLF is \015\012 (CR, then LF); it must be tried before a bare CR 
    ($self->{crlf}) = $line =~ /(\015\012|\015|\012)$/; 
  } 
Comment 6 Daniel Quinlan 2005-01-10 14:04:39 UTC
Subject: Re:  SpamAssassin doesn't follow RFCs 822, 2045, and 2046

> So something automatic like the following pseudo-code in the input routine? 
>  
>   if (!defined $self->{crlf}) { 
>     # CRLF is \015\012 (CR, then LF); it must be tried before a bare CR 
>     ($self->{crlf}) = $line =~ /(\015\012|\015|\012)$/; 
>   } 

Yeah, but I'd only use the first line of the input, or maybe the last
line in the headers, for setting the parameter.

Comment 7 Auto-Mass-Checker 2005-01-10 14:21:30 UTC
Subject: Re:  SpamAssassin doesn't follow RFCs 822, 2045, and 2046 

Daniel Quinlan writes:
> > So something automatic like the following pseudo-code in the input routine? 
> >  
> >   if (!defined $self->{crlf}) { 
> >     # CRLF is \015\012 (CR, then LF); it must be tried before a bare CR 
> >     ($self->{crlf}) = $line =~ /(\015\012|\015|\012)$/; 
> >   } 
> 
> Yeah, but I'd only use the first line of the input, or maybe the last
> line in the headers, for setting the parameter.

first line makes sense to me too.

--j.

Comment 8 Tony Finch 2005-01-10 15:05:59 UTC
Subject: Re:  SpamAssassin doesn't follow RFCs 822, 2045, and 2046

The reason I jumped in on this one is because Exim has gone through a lot
of problems in this area, particularly because of software that uses CRLF
in a situation where you would expect LF only (e.g. sendmail -t) or which
uses LF only where you would expect CRLF (e.g. various kinds of SMTP).

There have also been amusing problems with ClamAV which also has to cope
with line endings being CRLF or LF depending on how the message is fed to
it - some months ago I had to fix one of the new upstream virus signatures
because it assumed CRLF but in my environment the message had been
converted to LF only.

The current Exim line ending DWIM rules are documented at
http://www.exim.org/exim-html-4.40/doc/html/spec_44.html#SECT44.1
It's a perpetual problem area so this might not be the best solution.

Tony.
Comment 9 cng 2005-01-10 16:11:26 UTC
I'm familiar with the different line termination character sequences that UNIX
and Windows platforms use to create files. However, these RFCs only specify how
messages should be formed when being transmitted between platforms and not how
they are stored on any specific platform.

I haven't used or tested against as many e-mail clients/servers as you probably
have but of the stuff that I have tested (Windows - MS Outlook Express and
Thunderbird as MUAs, Debian Linux - UNIX mail to send messages, Thunderbird as
MUA, exim4 as MTA, and UW pop/imap server to manage mailboxes), they always
include CRLFs at the end of lines when they send/receive messages (at the SMTP,
POP3, and IMAP4 protocol levels). (I intercepted messages using the ethereal
packet sniffer to confirm the presence of CRLFs.)

In terms of a real world example, I created a setup where I intercepted messages
between e-mail clients and POP3/IMAP4 servers, passed the messages to
SpamAssassin, and retransmitted the resulting messages that SpamAssassin generated.

I found two problems:
1- I noticed that the modified messages were missing CRLFs on the lines that
SpamAssassin added. The message parser that I wrote to examine messages broke
because the parser is strictly compliant with the RFCs (on the assumption that
"bad" message senders would be sloppy in terms of message syntax).

2- When a message contained spam and SpamAssassin tagged and modified this
message, SpamAssassin added a closing MIME boundary delimiter without a CRLF.
When a POP3 client (MS Outlook Express and Thunderbird) received this modified
message, the client would wait indefinitely (or until its timer expired) for the
missing closing CRLF to specify the end of the RETR reply even if the RETR
reply's end-of-data sequence had already been transmitted after the message.
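
One plausible explanation for the second problem, sketched in Python: a POP3
multi-line response ends with a line holding a single "." and a client
typically recognises it by the byte sequence CRLF "." CRLF. The helper names
below are hypothetical, the relay is assumed to append the terminator without
repairing the last line, and dot-stuffing is left out for brevity.

```python
POP3_END = b"\r\n.\r\n"  # what the client scans for: CRLF, ".", CRLF


def retr_payload(message: bytes) -> bytes:
    """Frame a message as a naive relay would: message bytes, then "." CRLF."""
    return message + b".\r\n"


def client_sees_end(stream: bytes) -> bool:
    """True if the stream ends with the multi-line terminator sequence."""
    return stream.endswith(POP3_END)
```

A message whose last line is b"--boundary--\r\n" yields a stream that ends in
the terminator sequence; one ending in b"--boundary--\n" does not, so a client
written this way keeps waiting.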

---
I think that the "add CRLFs only if a message already contains a null line with
CRLFs between message header and body or a CRLF is already present at the end of
a message (when there is no message body)" suggestion makes sense.

If CRLFs exist, it probably means that a message is in transit between MUA and
MTA or between MTA and MTA (being handled according to the RFCs using SMTP,
POP3, or IMAP4 rules) or being copied as a file on Windows platforms. In this
case, SpamAssassin should terminate the lines that it adds with CRLFs.

If LFs exist, then the message is being copied as a file on UNIX platforms. In
this case, SpamAssassin should terminate the lines that it adds with LFs.
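
The rule above could be sketched in Python roughly as follows. The name
wants_crlf is hypothetical, and treating "no null line found at all" as "no
message body" is an assumption of this sketch.

```python
def wants_crlf(raw: bytes) -> bool:
    """Decide whether added lines should end in CRLF (True) or LF (False)."""
    crlf_null = raw.find(b"\r\n\r\n")  # null line formed by CRLFs
    lf_null = raw.find(b"\n\n")        # null line formed by bare LFs
    if crlf_null != -1 and (lf_null == -1 or crlf_null < lf_null):
        # The header/body separator is CRLF CRLF.
        return True
    if crlf_null == -1 and lf_null == -1:
        # Header-only message: look at how the last line is terminated.
        return raw.endswith(b"\r\n")
    return False
```

Messages in transit (CRLF throughout) give True, while a message stored with
Unix line endings gives False.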
Comment 10 Justin Mason 2005-03-21 17:37:32 UTC
retitling, aiming at 3.1.0 (hopefully)
Comment 11 Daniel Quinlan 2005-04-12 14:44:32 UTC
optimistically moved to 3.1.1
Comment 12 Justin Mason 2005-06-02 09:27:57 UTC
marking dup of 4363, where more discussion has taken place

*** This bug has been marked as a duplicate of 4363 ***
Comment 13 Christopher G. Lewis 2005-10-12 22:12:12 UTC
Hi - 
  I've written a relatively well used Exchange Sink for SpamAssassin in the 
Win32 world, (http://www.christopherlewis.com/ExchangeSpamAssassin.htm) and 
3.1.0 is seriously broken with regards to this CRLF issue.  

All message header parsing is now screwy because the X-Spam headers now look 
like this:

X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on ... \n
X-Spam-Level: \n
X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,AWL... \n
Received: from xplewis01 ([127.0.0.1]) by ... \r\n

and reloading this back into exchange breaks the message header.
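
A small Python sketch for spotting this kind of mixed header block. The helper
names and the sample header values are made up for illustration; only the
header field names come from the example above.

```python
import re
from collections import Counter

_NAMES = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}


def line_ending_counts(header: bytes) -> Counter:
    """Count each kind of line ending; CRLF is matched before bare CR or LF."""
    return Counter(_NAMES[eol] for eol in re.findall(rb"\r\n|\r|\n", header))


def is_mixed(header: bytes) -> bool:
    """True if more than one kind of line ending appears in the header."""
    return len(line_ending_counts(header)) > 1
```

For a header such as b"X-Spam-Level: \nX-Spam-Status: No\nReceived: from
example\r\n", the counts are two LF and one CRLF, so is_mixed reports True.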

Note that this doesn't occur in the 2.6x or 3.0.x branches, so something has 
changed in the 3.1 release.

Thanks
Comment 14 Thomas Eisenbarth 2005-10-13 03:51:11 UTC
I want to speak out against the recurring sentiment that *nix mail clients or 
servers are unable to correctly grok CRLF line endings. CRLF has always been the 
standard, even before Windows ever was invented.

So don't fall into the overengineering trap, like trying to guess which line 
ending to use. ALWAYS use CRLF, and nothing else, ever.

If spamc/spamd do not grok CRLF, they are broken.

Well, I suppose no one here likes Windows, but this dislike shouldn't discredit 
a harmless character sequence such as CRLF. And also, it shouldn't prevent 
anyone from doing the right thing.
Comment 15 Theo Van Dinter 2005-10-13 05:55:02 UTC
Subject: Re:  should use consistent CRLF/LF line endings

On Thu, Oct 13, 2005 at 03:51:11AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> So don't fall into the overengineering trap, like trying to guess which line 
> ending to use. ALWAYS use CRLF, and nothing else, ever.
> 
> If spamc/spamd do not grok CRLF, they are broken.

They read in CRLF, they just don't print out CRLF currently.
Since SpamAssassin isn't designed to run at the MTA level, there's no
specification about what the line endings ought to be, so we've used
"\n" as is standard in most perl apps I've seen.

I think that SA ought to "do the right thing" and figure out "\n" vs
"\r\n" -- just switching from one to the other will likely break a
bunch of apps that just expect "\n"...  For now (and since SA began),
if you're hooking into the MTA via some third party application, it's up
to *that application* to know to deal with the conversion.  This is the
"enhancement request" version of this issue.

For Windows/other CRLF-EOL platforms, SA doesn't put out the correct
line endings for the platform, which is the "bug" version of this issue.

It just so happens that the solution to both is the same.
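
As an illustration of that shared solution, here is a hedged Python sketch
that rewrites the line breaks of generated text to match the input. The name
match_input_eol is hypothetical, the LF fallback is an assumption, and this is
not how SpamAssassin itself is implemented.

```python
import re

_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def match_input_eol(generated: bytes, original: bytes) -> bytes:
    """Rewrite every line break in `generated` to the first one in `original`."""
    found = _ANY_EOL.search(original)
    eol = found.group(0) if found else b"\n"  # assumed default: LF
    # A function is used as the replacement so the bytes are inserted as-is.
    return _ANY_EOL.sub(lambda _match: eol, generated)
```

Header lines generated with "\n" come out with CRLF when the input message
uses CRLF, and stay as they are when the input uses LF.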