Bug 6781

Summary:	multiple emails in From
Product:	Spamassassin	Reporter:	Lemat <lemat>
Component:	Rules	Assignee:	SpamAssassin Developer Mailing List <dev>
Status:	NEW ---
Severity:	normal	CC:	jhardin, kmcgrail, lemat, rwmaillists, software+spamassassin
Priority:	P2
Version:	unspecified
Target Milestone:	---
Hardware:	PC
OS:	Linux
Whiteboard:

Description Lemat 2012-03-31 13:21:59 UTC

Broken spamware sometimes puts multiple emails in the From: header:

Example:

From: something <smth@sth.tld> something2 <smth2@sth.tld> something3 <smth3@sth.tld>

Solution:

header MULTIPLE_FROM         From =~ /([^<]*<[^>]+@[^>]+>){2}/i     
describe MULTIPLE_FROM       multiple emails in From
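
The proposed pattern can be sanity-checked outside SpamAssassin. The Python sketch below is only an approximation: it assumes Python's `re` module treats this particular pattern the same way Perl does, and the helper name `hits_multiple_from` is made up for illustration.

```python
import re

# Lemat's MULTIPLE_FROM pattern, copied from the rule above.
MULTIPLE_FROM = re.compile(r"([^<]*<[^>]+@[^>]+>){2}", re.IGNORECASE)

def hits_multiple_from(from_value: str) -> bool:
    """Return True if the From value contains two consecutive <...@...> groups."""
    return MULTIPLE_FROM.search(from_value) is not None

# The example From value from the description (no commas between mailboxes).
spam_from = ("something <smth@sth.tld> something2 <smth2@sth.tld> "
             "something3 <smth3@sth.tld>")
# An ordinary single-mailbox From value.
single_from = "something <smth@sth.tld>"
# A comma-separated list of two mailboxes.
comma_list = "a <a@sth.tld>, b <b@sth.tld>"

print(hits_multiple_from(spam_from))    # True
print(hits_multiple_from(single_from))  # False
print(hits_multiple_from(comma_list))   # True
```

Note that the pattern does not care whether the mailboxes are separated by commas, so it hits both the comma-less example above and a comma-separated list.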

Comment 1 D. Stussy 2012-03-31 22:52:56 UTC

How are multiple "from" mailboxes considered broken?  The standards and RFCs permit such a construct.  Granted, 99% of all electronic mail has only one mailbox in the header, but that alone does not make multiple entries "broken."

RFC 5322, Section 3.6.2:

  from            =   "From:" mailbox-list CRLF
  mailbox-list    =   (mailbox *("," mailbox)) / obs-mbox-list

So, what's "broken" about it?

In contrast, the "Sender:" header takes a single mailbox.

Comment 2 Lemat 2012-03-31 23:32:51 UTC

You're right, I was wrong. Please read "broken spamware" as "spamware is taking advantage of the RFC and ...".

Maybe MULTIPLE_FROM can be combined with a list of User-Agents that allow setting multiple emails in From? Or with a list of User-Agents that are known not to support such a feature?

Frankly speaking, I hadn't seen multiple emails in From: until the recent spam run.

Comment 3 Kevin A. McGrail 2012-04-02 22:55:45 UTC

(In reply to comment #2)
> You're right I was wrong, please read / replace "broken spamware" as / with
> "spamware is taking advantage of RFC and ...".
> 
> Maybe MULTIPLE_FROM can be combined with a list of User-Agents that allow to
> set up multiple emails in From? Or with a list of User-Agents that are known to
> not support such feature?
> 
> Frankly speaking I haven't seen multiple emails in From: up until recent
> spamrun.

I think this is a question less for RFCs and more for "is it indicative of Spam".

So, looking at the spam that bypassed my filters and using a FAR more simplistic check:

grep ^From: | grep -E "@.*@"

I get nothing except addresses of the form "email@example.com" <email@example.com>.

Checking my hand-sorted ham corpus, I have zero hits.

Checking my automatically filtered spam corpus, I also have zero hits.

In short, I have zero examples of any real-world cases with multiple emails in the From header. I'm not saying they don't exist, but I don't really see this having a high S/O unless you have a corpus where you are seeing this a lot.

regards,
KAM

Comment 4 D. Stussy 2012-04-03 22:11:17 UTC

To clarify:  My "objection" or point was that the construct was called broken when it is not.  I have no problem assigning a score to it indicating spaminess when it occurs (if indeed it is not witnessed on legitimate mail but only on spam).

Comment 5 Mark Martinec 2012-05-17 16:28:53 UTC

I'm seeing these messages too, and some of them are sneaking in.

The RFC 5322 section 3.6.2 states:

  If the originator of the message can be indicated by a single mailbox
  and the author and transmitter are identical, the "Sender:" field
  SHOULD NOT be used.  Otherwise, both fields SHOULD appear.

As these spam messages currently do not have a Sender present,
it should be safe to do:

header   __HAS_SENDER exists:Sender

header   MULTI_FROM_ADDR  From =~ /\@.*,.*\@/
describe MULTI_FROM_ADDR  Multiple addresses in a From header field
score    MULTI_FROM_ADDR  1

meta     MULTI_FROM_BAD  MULTI_FROM_ADDR && !__HAS_SENDER
describe MULTI_FROM_BAD  Multiple addresses in From, but no Sender
score    MULTI_FROM_BAD  6


(btw, we should be adding some of the missing '__HAS_* exists:*'
rules for completeness anyway; they come in handy with other official
or local metarules)
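
The logic of the meta rule can be mimicked in a few lines of Python. This is a rough sketch, not how SpamAssassin evaluates rules: it assumes unfolded headers, uses the standard `email` parser in place of SpamAssassin's own header handling, and the function name `multi_from_bad` is hypothetical.

```python
import re
from email import message_from_string

# Same pattern as the MULTI_FROM_ADDR rule above: '@', then ',', then '@'.
MULTI_FROM_ADDR = re.compile(r"@.*,.*@")

def multi_from_bad(raw_message: str) -> bool:
    """Approximate MULTI_FROM_BAD: MULTI_FROM_ADDR && !__HAS_SENDER."""
    msg = message_from_string(raw_message)
    from_value = msg.get("From", "")
    has_sender = msg.get("Sender") is not None          # __HAS_SENDER
    multi_from = MULTI_FROM_ADDR.search(from_value) is not None
    return multi_from and not has_sender

# Hypothetical test messages, all with unfolded headers.
two_from_no_sender = "From: a <a@sth.tld>, b <b@sth.tld>\nSubject: x\n\nbody\n"
two_from_with_sender = ("From: a <a@sth.tld>, b <b@sth.tld>\n"
                        "Sender: a <a@sth.tld>\nSubject: x\n\nbody\n")
one_from = "From: a <a@sth.tld>\nSubject: x\n\nbody\n"
no_commas = ("From: something <smth@sth.tld> something2 <smth2@sth.tld>\n"
             "Subject: x\n\nbody\n")

print(multi_from_bad(two_from_no_sender))    # True
print(multi_from_bad(two_from_with_sender))  # False
print(multi_from_bad(one_from))              # False
print(multi_from_bad(no_commas))             # False
```

The last case is worth noting: because the pattern requires a comma between the two '@' signs, a From value in the comma-less form shown in the description would not hit it.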

Comment 6 AXB 2012-05-17 16:32:36 UTC

(In reply to comment #5)
> I'm seeing these messages too, and some of them are sneaking in.
> 
> The RFC 5322 section 3.6.2 states:
> 
>   If the originator of the message can be indicated by a single mailbox
>   and the author and transmitter are identical, the "Sender:" field
>   SHOULD NOT be used.  Otherwise, both fields SHOULD appear.
> 
> As these spam messages currently do not have a Sender present,
> it should be safe to do:
> 
> header   __HAS_SENDER exists:Sender
> 
> header   MULTI_FROM_ADDR  From =~ /\@.*,.*\@/
> describe MULTI_FROM_ADDR  Multiple addresses in a From header field
> score    MULTI_FROM_ADDR  1
> 
> meta     MULTI_FROM_BAD  MULTI_FROM_ADDR && !__HAS_SENDER
> describe MULTI_FROM_BAD  Multiple addresses in From, but no Sender
> score    MULTI_FROM_BAD  6
> 
> 
> (btw, we should be adding some of the missing '__HAS_* exists:*'
> rules for completeness anyway, they come handy with other official
> or local metarules)

+1 to the motion for a 20_hasbase.cf

I have a collection of them and would volunteer to start adding them.

Comment 7 Mark Martinec 2012-05-17 16:53:00 UTC

> As these spam messages currently do not have a Sender present,
> it should be safe to do

I meant: good enough for this spam, and safe for valid mail even with
multiple authors.


> > (btw, we should be adding some of the missing '__HAS_* exists:*'
> > rules for completeness anyway, they come handy with other official
> > or local metarules)
> 
> +1 to the motion for a 20_hasbase.cf
> I have a collection of them and would volunteer to start adding

Good idea.

These would be needed for Bug 6780:

header __HAS_FROM   exists:From
header __HAS_TO     exists:To
header __HAS_CC     exists:CC

and this one in this PR:

header __HAS_SENDER exists:Sender

Several others are scattered all over the place; it would be nice
to have them all in one place.

Comment 8 AXB 2012-05-17 16:59:11 UTC

(In reply to comment #7)
> > As these spam messages currently do not have a Sender present,
> > it should be safe to do
> 
> I meant: good enough for this spam, and safe for valid mail even with
> multiple authors.
> 
> 
> > > (btw, we should be adding some of the missing '__HAS_* exists:*'
> > > rules for completeness anyway, they come handy with other official
> > > or local metarules)
> > 
> > +1 to the motion for a 20_hasbase.cf
> > I have a collection of them and would volunteer to start adding
> 
> Good idea.
> 
> These would be needed for Bug 6780:
> 
> header __HAS_FROM   exists:From
> header __HAS_TO     exists:To
> header __HAS_CC     exists:CC
> 
> and this one in this PR:
> 
> header __HAS_SENDER exists:Sender
> 
> Several other are scattered all over the place, would be nice
> to have them all in one place.

Committed 10_hasbase.cf

Comment 9 Mark Martinec 2012-05-21 18:36:11 UTC

Observed 3800 messages which hit MULTI_FROM_BAD during the last four days.

Among these there were three legitimate mail messages with two addresses
in a From, and a missing Sender (a conference registration confirmation
or paper submissions). These were genuine false positives (of which one
was quarantined for exceeding a spam threshold, while the other two
were rescued by other rules).

Besides the above three, there were three additional false positives, where
my version of MULTI_FROM_ADDR misfired. These three were a result of a
B64-encoded display name in the iso-2022-jp character set, which happened
to contain bytes '@' and ',' in the b64-decoded string.

The string that was matched looked like (somewhat obfuscated):
  _$B:#1xxf_(B _$B@5,_(B <xxx@example.com>

It is most unfortunate that the :addr modifier only returns the first
of multiple addresses (in a To, From, Cc, ...), which means it can't
be used in counting the number of e-mail addresses in a From.

It also seems wrong to do the manual (in-the-rule) parsing *after*
the QP or B decoding, so apparently the :raw form must be used,
which means having to deal with folding, comments, display names,
and a group name.
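
The misfire described above can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: the payload bytes are invented (they only imitate the shape of an ISO-2022-JP string that happens to contain '@' and ','), the helper `decode_b_words` is hypothetical, and it performs B-decoding only, with no charset conversion.

```python
import base64
import re

# Byte-string version of the MULTI_FROM_ADDR pattern.
MULTI_FROM_ADDR = re.compile(rb"@.*,.*@")
# Minimal matcher for B-encoded words: =?charset?B?payload?=
ENCODED_WORD_B = re.compile(rb"=\?[^?]+\?[Bb]\?([^?]*)\?=")

def decode_b_words(value: bytes) -> bytes:
    """Replace each B-encoded word with its base64-decoded bytes."""
    return ENCODED_WORD_B.sub(lambda m: base64.b64decode(m.group(1)), value)

# Invented payload: ESC $ B, four bytes including '@' and ',', then ESC ( B.
payload = b"\x1b$B@5,5\x1b(B"
raw_from = (b"=?iso-2022-jp?B?" + base64.b64encode(payload)
            + b"?= <xxx@example.com>")
decoded_from = decode_b_words(raw_from)

# The raw form holds a single '@' and no ',', so the pattern cannot match.
print(MULTI_FROM_ADDR.search(raw_from) is not None)      # False
# After decoding, the display name contributes '@' and ',' bytes.
print(MULTI_FROM_ADDR.search(decoded_from) is not None)  # True
```

Base64 text can never contain '@' or ',', which is why matching against the raw header avoids this particular false positive, at the cost of having to handle folding, comments, and display names by hand.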

Comment 10 D. Stussy 2012-05-21 19:34:37 UTC

RE: Comment #9 - I must disagree:

A message with multiple from entries, no sender header, and non-spammy content is not a "false positive" as it is an RFC violation and therefore not a valid message.

Although spam is generally about content, I cannot accept that a malformed message is a legitimate message.  Such malformations are precisely the target of the rule(set) that we are developing as a result of this bug report.

Now, as for the misfirings on a character-set-encoded string, that could be a problem.  Maybe we need a "decoded" function, which for non-encoded strings will be identical to "raw", but for strings starting with "=charset", it obviously decodes them and performs comparisons thereafter.

Comment 11 Mark Martinec 2012-06-06 14:53:38 UTC

> A message with multiple from entries, no sender header, and non-spammy
> content is not a "false positive" as it is an RFC violation and therefore
> not a valid message.

Seems like the RFC 5322 is inconsistent with itself:

section 3.6.2: 
  If the originator of the message can be indicated
  by a single mailbox and the author and transmitter are identical,
  the "Sender:" field SHOULD NOT be used.  Otherwise, both fields
  SHOULD appear.

section 3.6: sender ... MUST occur with multi-address from - see 3.6.2

So it's either a SHOULD or a MUST.


> Now, as for the misfirings on a character-set-encoded string, that could be
> a problem.  Maybe we need a "decoded" function, which for non-encoded
> strings will be identical to "raw", but for strings starting with
> "=charset", it obviously decodes them and performs comparisons thereafter.

...or a change in the behaviour of an :addr modifier, so that it would
return all addresses, not just the first. Its current behaviour is
probably questionable even when applying to a To or Cc header field.
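
To illustrate what counting with a real address parser (rather than a regex) could look like, here is a sketch using Python's standard `email.utils.getaddresses`. It is not SpamAssassin's `:addr` implementation; the function name `count_from_addresses` and the sample values are assumptions.

```python
import base64
from email.utils import getaddresses

def count_from_addresses(from_value: str) -> int:
    """Count the mailboxes the parser finds in one From header value."""
    return sum(1 for _name, addr in getaddresses([from_value]) if addr)

# Two mailboxes in a comma-separated list.
two_mailboxes = "a <a@sth.tld>, b <b@sth.tld>"
# A quoted display name that itself looks like an address.
quoted_name = '"email@example.com" <email@example.com>'
# A B-encoded display name built from an invented payload; still one mailbox.
encoded_word = base64.b64encode(b"\x1b$B@5,5\x1b(B").decode("ascii")
encoded_name = "=?iso-2022-jp?B?" + encoded_word + "?= <xxx@example.com>"

print(count_from_addresses(two_mailboxes))  # 2
print(count_from_addresses(quoted_name))    # 1
print(count_from_addresses(encoded_name))   # 1
```

Counting parsed mailboxes on the undecoded value sidesteps both the quoted-display-name case and the encoded-display-name case, which is roughly what an `:addr` modifier returning all addresses would make possible inside a rule.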

Comment 12 D. Stussy 2012-06-06 18:23:51 UTC

There is no conflict with the RFC.  The "Sender:" header is clearly OPTIONAL (but discouraged) when there is a single mailbox identified in the "From:" header.  It is clearly REQUIRED when there are multiple mailboxes in "From:".

Comment 13 RW 2012-06-06 19:31:48 UTC

RFC compliance is irrelevant.

Comment 14 D. Stussy 2012-06-06 19:46:55 UTC

I must beg to differ.  RFC-compliant messages generally are valid messages.  Those messages not compliant are more likely to be spam.  Although compliance itself is not a validation, empirical evidence suggests that the two traits (compliance and spaminess) occur together in an inverse relationship.

Within the strict meaning of the word "standard," a compliant message is usually acceptable while a non-compliant message isn't and should be rejected on its face.  Therefore, compliance does have relevance.

Comment 15 RW 2012-06-06 21:27:44 UTC

If you construct rules based on non-compliance, some will be useful and some will be completely useless. The only important thing is whether they work. If you remove rules that can hit RFC-compliant mail, there won't be much left.

Comment 16 D. Stussy 2012-06-06 22:13:13 UTC

Absolutely wrong.  All rules based on rejecting non-compliant messages are useful.  If you can't do things in the agreed-upon manner, I have every right to tell you to go away.

What's the point of having a standard if its rules are not enforced?  The very etymology of the word implies that enforcement is a necessary component of its existence.  [That is a concept that those on the MimeDefang mailing list didn't seem capable of understanding when a related discussion to this topic took place there last month.]

Comment 17 John Hardin 2012-06-06 22:48:40 UTC

Folks, let's not get into a(nother) religious war (as fun as they are) about whether or not SA is an RFC compliance audit tool. RFC compliance checking is only useful to our needs insofar as it helps to accurately detect spam.

(In reply to comment #16)
> If you can't do things in the agreed-upon manner, I have every
> right to tell you to go away.

Remember Postel's Law. http://en.wikipedia.org/wiki/Jon_Postel#Postel.27s_Law

(FWIW, I by nature agree that RFC non-compliance should be scored punitively, but reality intrudes...)

Comment 18 Lemat 2012-06-06 23:11:21 UTC

Let's get back to the rules and give each postmaster a choice between:

1) rules that catch spam
2) rules that catch non-RFC-compliant emails

A postmaster may choose by using something equivalent to "loadplugin".

Comment 19 D. Stussy 2012-06-06 23:22:49 UTC

Don't forget that Jon Postel went away too (before spam was a significant problem).  Today's reality is liberal == spam.

Comment 20 RW 2012-06-06 23:41:08 UTC

(In reply to comment #19)
> Don't forget that Jon Postel went away too (before spam was a significant
> problem).  Today's reality is liberal == spam.

What's disputed is whether we should have punitive rules that are ineffective against spam, so that's completely irrelevant.

Punitive rules only really punish the recipient. If you wish to enforce standards, it has to be done at the SMTP level with clear feedback to the sender.

Comment 21 Kevin A. McGrail 2013-06-21 16:27:21 UTC

This is a discussion about a rule which doesn't require a version milestone.