SA Bugzilla – Full Text Bug Listing

| Summary: | multiple emails in From | | |
|---|---|---|---|
| Product: | Spamassassin | Reporter: | Lemat <lemat> |
| Component: | Rules | Assignee: | SpamAssassin Developer Mailing List <dev> |
| Status: | NEW --- | | |
| Severity: | normal | CC: | jhardin, kmcgrail, lemat, rwmaillists, software+spamassassin |
| Priority: | P2 | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |
Description
Lemat
2012-03-31 13:21:59 UTC
How are multiple "From" mailboxes considered broken? The standards and RFCs permit such a construct. Granted, 99% of all electronic mail has only one mailbox in the header, but that alone does not make multiple entries "broken."

RFC 5322, Section 3.6.2 (with mailbox-list defined in Section 3.4):

    from         = "From:" mailbox-list CRLF
    mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list

So, what's "broken" about it? In contrast, the "Sender:" header takes a single mailbox.

You're right, I was wrong; please read/replace "broken spamware" as/with "spamware is taking advantage of the RFC and ...".

Maybe MULTIPLE_FROM can be combined with a list of User-Agents that allow setting up multiple emails in From? Or with a list of User-Agents that are known not to support such a feature?

Frankly speaking, I hadn't seen multiple emails in From: until the recent spam run.

(In reply to comment #2)
> You're right I was wrong, please read / replace "broken spamware" as / with
> "spamware is taking advantage of RFC and ...".
>
> Maybe MULTIPLE_FROM can be combined with a list of User-Agents that allow to
> set up multiple emails in From? Or with a list of User-Agents that are known to
> not support such feature?
>
> Frankly speaking I haven't seen multiple emails in From: up until recent
> spamrun.

I think this is a question less for RFCs and more for "is it indicative of spam". So, looking at my spam that bypassed my filters and using a FAR more simplistic check:

    grep ^From: | grep -E "@.*@"

the only hits are addresses of the form "email@example.com" <email@example.com>.

Checking my hand-sorted ham corpus, I have zero hits. Checking my automatically filtered spam corpus, I also have zero hits. In short, I have zero examples of any real-world cases with multiple emails in the From header. Not saying it doesn't exist, but I don't really see this having a high S/O unless you have a corpus where you are seeing it a lot.
regards,
KAM

To clarify: my "objection", or point, was to address that the construct was called broken when it is not. I have no problem assigning a score to it indicating spaminess when it occurs (if indeed it is not seen in legitimate mail but only in spam).

I'm seeing these messages too, and some of them are sneaking in.

The RFC 5322 section 3.6.2 states:

  If the originator of the message can be indicated by a single mailbox
  and the author and transmitter are identical, the "Sender:" field
  SHOULD NOT be used. Otherwise, both fields SHOULD appear.

As these spam messages currently do not have a Sender present, it should be safe to do:

    header __HAS_SENDER exists:Sender

    header MULTI_FROM_ADDR From =~ /\@.*,.*\@/
    describe MULTI_FROM_ADDR Multiple addresses in a From header field
    score MULTI_FROM_ADDR 1

    meta MULTI_FROM_BAD MULTI_FROM_ADDR && !__HAS_SENDER
    describe MULTI_FROM_BAD Multiple addresses in From, but no Sender
    score MULTI_FROM_BAD 6

(btw, we should be adding some of the missing '__HAS_* exists:*' rules for completeness anyway; they come in handy with other official or local meta rules)

(In reply to comment #5)
> I'm seeing these messages too, and some of them are sneaking in.
>
> The RFC 5322 section 3.6.2 states:
>
>   If the originator of the message can be indicated by a single mailbox
>   and the author and transmitter are identical, the "Sender:" field
>   SHOULD NOT be used. Otherwise, both fields SHOULD appear.
>
> As these spam messages currently do not have a Sender present,
> it should be safe to do:
>
>     header __HAS_SENDER exists:Sender
>
>     header MULTI_FROM_ADDR From =~ /\@.*,.*\@/
>     describe MULTI_FROM_ADDR Multiple addresses in a From header field
>     score MULTI_FROM_ADDR 1
>
>     meta MULTI_FROM_BAD MULTI_FROM_ADDR && !__HAS_SENDER
>     describe MULTI_FROM_BAD Multiple addresses in From, but no Sender
>     score MULTI_FROM_BAD 6
>
>
> (btw, we should be adding some of the missing '__HAS_* exists:*'
> rules for completeness anyway, they come handy with other official
> or local metarules)

+1 to the motion for a 20_hasbase.cf. I have a collection of them and would volunteer to start adding them.

> As these spam messages currently do not have a Sender present,
> it should be safe to do

I meant: good enough for this spam, and safe for valid mail even with multiple authors.

> > (btw, we should be adding some of the missing '__HAS_* exists:*'
> > rules for completeness anyway, they come handy with other official
> > or local metarules)
>
> +1 to the motion for a 20_hasbase.cf
> I have a collection of them and would volunteer to start adding

Good idea.

These would be needed for Bug 6780:

    header __HAS_FROM exists:From
    header __HAS_TO exists:To
    header __HAS_CC exists:CC

and this one in this PR:

    header __HAS_SENDER exists:Sender

Several others are scattered all over the place; it would be nice to have them all in one place.

(In reply to comment #7)
> > As these spam messages currently do not have a Sender present,
> > it should be safe to do
>
> I meant: good enough for this spam, and safe for valid mail even with
> multiple authors.
>
> > > (btw, we should be adding some of the missing '__HAS_* exists:*'
> > > rules for completeness anyway, they come handy with other official
> > > or local metarules)
> >
> > +1 to the motion for a 20_hasbase.cf
> > I have a collection of them and would volunteer to start adding
>
> Good idea.
>
> These would be needed for Bug 6780:
>
>     header __HAS_FROM exists:From
>     header __HAS_TO exists:To
>     header __HAS_CC exists:CC
>
> and this one in this PR:
>
>     header __HAS_SENDER exists:Sender
>
> Several other are scattered all over the place, would be nice
> to have them all in one place.

Committed 10_hasbase.cf.

Observed 3800 messages that hit MULTI_FROM_BAD during the last four days. Among these there were three legitimate mail messages with two addresses in From and a missing Sender (conference registration confirmations or paper submissions). These were genuine false positives (one was quarantined for exceeding the spam threshold, while the other two were rescued by other rules).

Besides the above three, there were three additional false positives where my version of MULTI_FROM_ADDR misfired. These three were the result of a B64-encoded display name in the iso-2022-jp character set, which happened to contain the bytes '@' and ',' in the Base64-decoded string. The string that was matched looked like this (somewhat obfuscated):

    _$B:#1xxf_(B _$B@5,_(B <xxx@example.com>

It is most unfortunate that the :addr modifier only returns the first of multiple addresses (in a To, From, Cc, ...), which means it can't be used for counting the number of e-mail addresses in a From. It also seems wrong to do the manual (in-the-rule) parsing *after* the QP or B decoding, so apparently the :raw form must be used, which means having to deal with folding, comments, display names, and group names.

RE: Comment #9 - I must disagree: a message with multiple From entries, no Sender header, and non-spammy content is not a "false positive", as it is an RFC violation and therefore not a valid message. Although spam is generally about content, I cannot accept that a malformed message is a legitimate message. Such malformations are precisely the target of the rule(set) that we are developing as a result of this bug report.
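The regex misfire reported in comment #9 can be illustrated outside SpamAssassin. The following is a minimal Python sketch, not SpamAssassin code; it assumes the raw (undecoded) From value is available, and the helper names `regex_multi_from` and `count_from_mailboxes` are hypothetical. It applies the same pattern as MULTI_FROM_ADDR and, for comparison, counts every mailbox with the standard library's `email.utils.getaddresses`, which honours quoted display names and returns all addresses rather than only the first. Because the Base64 alphabet contains neither '@' nor ',', a B-encoded display name cannot contribute either character as long as the check runs on the raw header.

```python
import re
from email.utils import getaddresses

# Same pattern as the proposed MULTI_FROM_ADDR rule (/\@.*,.*\@/ in Perl);
# Python does not need the '@' escapes.
MULTI_FROM_RE = re.compile(r"@.*,.*@")


def regex_multi_from(value: str) -> bool:
    """Naive check: an '@', later a ',', later another '@' anywhere in the value."""
    return MULTI_FROM_RE.search(value) is not None


def count_from_mailboxes(raw_from: str) -> int:
    """Count the addresses in a raw (undecoded) From value.

    getaddresses() follows RFC 5322 address syntax, so a comma inside a
    quoted display name is not treated as a separator. Entries with an
    empty address (newer Python versions report parse failures as
    ('', '')) are not counted.
    """
    return sum(1 for _name, addr in getaddresses([raw_from]) if addr)


if __name__ == "__main__":
    samples = [
        "Alice <alice@example.com>",
        "Alice <alice@example.com>, Bob <bob@example.org>",
        # '@' and ',' appear only inside a quoted display name.
        '"info@shop.example, Support" <info@shop.example>',
        # Placeholder B-encoded word (hypothetical text, not decoded here).
        "=?ISO-2022-JP?B?GyRCOiMbKEI=?= <xxx@example.com>",
    ]
    for s in samples:
        print(f"mailboxes={count_from_mailboxes(s)} regex_hit={regex_multi_from(s)}  {s}")
```

The two-address From yields mailboxes=2 with a regex hit. The quoted display name yields mailboxes=1 but still trips the regex, which is the same kind of misfire as the decoded iso-2022-jp string. The encoded-word yields mailboxes=1 and no hit.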
Now, as for the misfirings on a character-set-encoded string, that could be a problem. Maybe we need a "decoded" function, which for non-encoded strings would be identical to "raw", but for strings starting with "=?charset?" would decode them and perform comparisons thereafter.

> A message with multiple from entries, no sender header, and non-spammy
> content is not a "false positive" as it is an RFC violation and therefore
> not a valid message.

Seems like RFC 5322 is inconsistent with itself:

section 3.6.2:

  If the originator of the message can be indicated by a single mailbox
  and the author and transmitter are identical, the "Sender:" field
  SHOULD NOT be used. Otherwise, both fields SHOULD appear.

section 3.6 (field occurrence table):

  sender ... MUST occur with multi-address from - see 3.6.2

So it's either a SHOULD or a MUST.

> Now, as for the misfirings on a character-set-encoded string, that could be
> a problem. Maybe we need a "decoded" function, which for non-encoded
> strings will be identical to "raw", but for strings starting with
> "=?charset?", it obviously decodes them and performs comparisons thereafter.

...or a change in the behaviour of the :addr modifier, so that it would return all addresses, not just the first. Its current behaviour is probably questionable even when applied to a To or Cc header field.

There is no conflict with the RFC. The "Sender:" header is clearly OPTIONAL (but discouraged) when there is a single mailbox identified in the "From:" header. It is clearly REQUIRED when there are multiple mailboxes in "From:".

RFC compliance is irrelevant.

I must beg to differ. RFC-compliant messages generally are valid messages. Messages that are not compliant are more likely to be spam. Although compliance itself is not validation, empirical evidence suggests that the two traits (compliance and spaminess) occur together in an inverse relationship.
Within the strict meaning of the word "standard," a compliant message is usually acceptable, while a non-compliant message isn't and should be rejected on its face. Therefore, compliance does have relevance.

If you construct rules based on non-compliance, some will be useful and some will be completely useless. The only important thing is whether they work. If you remove rules that can hit RFC-compliant mail, there won't be much left.

Absolutely wrong. All rules based on rejecting non-compliant messages are useful. If you can't do things in the agreed-upon manner, I have every right to tell you to go away. What's the point of having a standard if its rules are not enforced? The very etymology of the word implies that enforcement is a necessary component of its existence. [That is a concept that those on the MimeDefang mailing list didn't seem capable of understanding when a related discussion on this topic took place there last month.]

Folks, let's not get into a(nother) religious war (as fun as they are) about whether or not SA is an RFC compliance audit tool. RFC compliance checking is only useful to our needs insofar as it helps to accurately detect spam.

(In reply to comment #16)
> If you can't do things in the agreed-upon manner, I have every
> right to tell you to go away.

Remember Postel's Law. http://en.wikipedia.org/wiki/Jon_Postel#Postel.27s_Law

(FWIW, I by nature agree that RFC non-compliance should be scored punitively, but reality intrudes...)

Let's get back to the rules and give each postmaster a choice to use:

1) rules that catch spam
2) rules that catch non-RFC-compliant emails

A postmaster may choose by using something equivalent to "loadplugin".

Don't forget that Jon Postel went away too (before spam was a significant problem). Today's reality is liberal == spam.

(In reply to comment #19)
> Don't forget that Jon Postel went away too (before spam was a significant
> problem). Today's reality is liberal == spam.
What's disputed is whether we should have punitive rules that are ineffective against spam, so that's completely irrelevant. Punitive rules only really punish the recipient. If you wish to enforce standards, it has to be done at the SMTP level with clear feedback to the sender.

This is a discussion about a rule which doesn't require a version milestone.
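For anyone who wants to sanity-check the proposed rules against their own mail outside SpamAssassin, here is a minimal Python sketch, assuming messages are available as raw strings. It is not SpamAssassin code and does not reproduce its rule engine: the helper `evaluate_from_rules` and the `SCORES` table are hypothetical, the scores (1 and 6) are simply copied from the MULTI_FROM_ADDR and MULTI_FROM_BAD rules proposed in this bug, and mailboxes are counted with `email.utils.getaddresses` instead of the regex.

```python
from email import message_from_string
from email.utils import getaddresses

# Scores copied from the rules proposed in this bug; illustrative only.
SCORES = {"MULTI_FROM_ADDR": 1.0, "MULTI_FROM_BAD": 6.0}


def evaluate_from_rules(raw_message: str) -> dict:
    """Hypothetical helper mirroring __HAS_SENDER, MULTI_FROM_ADDR and MULTI_FROM_BAD.

    Mailboxes are counted with getaddresses() on the raw From value(s)
    rather than with the regex used by the proposed rule.
    """
    msg = message_from_string(raw_message)

    # All From header fields (normally exactly one), as undecoded strings.
    from_values = msg.get_all("From", [])
    mailboxes = [addr for _name, addr in getaddresses(from_values) if addr]

    has_sender = "Sender" in msg                     # header __HAS_SENDER exists:Sender
    multi_from = len(mailboxes) > 1                  # MULTI_FROM_ADDR
    multi_from_bad = multi_from and not has_sender   # meta MULTI_FROM_ADDR && !__HAS_SENDER

    hits = {
        "__HAS_SENDER": has_sender,
        "MULTI_FROM_ADDR": multi_from,
        "MULTI_FROM_BAD": multi_from_bad,
    }
    # Sum the scores of the scored rules that hit (__HAS_SENDER has no score).
    score = sum((SCORES[name] for name, hit in hits.items() if hit and name in SCORES), 0.0)
    return {"hits": hits, "score": score}


if __name__ == "__main__":
    two_authors_no_sender = (
        "From: a@example.com, b@example.org\n"
        "To: x@example.net\n"
        "Subject: test\n"
        "\n"
        "body\n"
    )
    two_authors_with_sender = (
        "From: a@example.com, b@example.org\n"
        "Sender: a@example.com\n"
        "To: x@example.net\n"
        "\n"
        "body\n"
    )
    for raw in (two_authors_no_sender, two_authors_with_sender):
        print(evaluate_from_rules(raw))
```

The first sample (two From mailboxes, no Sender) hits MULTI_FROM_BAD for a total of 7.0. Adding a Sender header, as RFC 5322 section 3.6.2 asks for multi-author mail, leaves only MULTI_FROM_ADDR at 1.0.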