SA Bugzilla – Bug 6716
SPOOF_COM2OTH and SPOOF_COM2COM misfire on legitimate bounce
Last modified: 2019-06-24 11:59:51 UTC
A customer recently reported a mistagged bounce message that misfired on SPOOF_COM2OTH and SPOOF_COM2COM. The bounce message was in response to a message sent to noreply@youtube.com, and contained:

<noreply@youtube.com>: host google.com.s9b2.psmtp.com[74.125.148.14] said:
    550 5.1.1 <noreply@youtube.com>... User unknown (in reply to RCPT TO command)

Any system relaying to a domain filtered by Postini, and attempting to contact an address that does not exist, may generate bounce messages with a similar remote hostname.

Suggested fixes:

(Note, the \w+ could be made more specific but I don't have a handy list of all possible *.psmtp.com cluster names.)

uri SPOOF_COM2OTH m{^https?://(?:\w+\.)+?com\.(?!(?:[a-z]{2}\.)?(?:s3\.amazonaws|\w+\.psmtp)\.com)(?:\w+\.){2}}i
uri SPOOF_COM2COM m{^https?://(?:\w+\.)+?com\.(?!(?:[a-z]{2}\.)?(?:s3\.amazonaws|\w+\.psmtp)\.com)(?:\w+\.)+?com\b}i
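As a rough way to sanity-check the suggested patterns outside SA, the sketch below ports them to Python's `re` module (Perl's `m{...}i` becomes `re.IGNORECASE`). It assumes SA presents the bare hostname to `uri` rules as `http://` plus the hostname; the `example.*` URIs and the `rule_hits` helper are made up for illustration and are not part of SA.

```python
import re

# Negative lookahead from the suggested fix: exempts *.s3.amazonaws.com
# and *.psmtp.com (optionally preceded by a two-letter label).
EXEMPT = r"(?!(?:[a-z]{2}\.)?(?:s3\.amazonaws|\w+\.psmtp)\.com)"

# Patterns copied from the suggested rules above.
SPOOF_COM2OTH = re.compile(
    r"^https?://(?:\w+\.)+?com\." + EXEMPT + r"(?:\w+\.){2}", re.IGNORECASE
)
SPOOF_COM2COM = re.compile(
    r"^https?://(?:\w+\.)+?com\." + EXEMPT + r"(?:\w+\.)+?com\b", re.IGNORECASE
)


def rule_hits(uri):
    """Return the names of the suggested rules whose pattern matches the URI."""
    hits = []
    if SPOOF_COM2OTH.search(uri):
        hits.append("SPOOF_COM2OTH")
    if SPOOF_COM2COM.search(uri):
        hits.append("SPOOF_COM2COM")
    return hits


if __name__ == "__main__":
    samples = [
        "http://google.com.s9b2.psmtp.com",      # Postini hostname from the bounce
        "http://example.com.s3.amazonaws.com",   # AWS-style hostname
        "http://example.com.login.example.net/", # hypothetical spoof-like URI
        "http://example.com.login.example.com/", # hypothetical spoof-like URI
    ]
    for uri in samples:
        print(uri, rule_hits(uri))
```

With these patterns the Postini and AWS hostnames should hit neither rule, while the two hypothetical spoof-like URIs still do. This only exercises the regular expressions, not SA's URI extraction.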
Based on logs of ~150k scans in recent months on systems handling mail for multiple small to medium businesses, SPOOF_COM2COM and SPOOF_COM2OTH most commonly are FPs in tandem on legitimate bounce notices (e.g. Google/Postini) and a few other messages mentioning matching hostnames and in one case a (dumb but real) reversed dotted domain identifier. Note that in neither of those types of mail is the match even of an actual URI, rather it is SA detecting what looks like a hostname and constructing a putative canonical URI from it. Many of these end up unnoticed because the 4.7 combined score is past our threshold and bounce messages are rarely anticipated mail.

Far less frequently, by 2 orders of magnitude, SPOOF_COM2OTH (very rarely in tandem with SPOOF_COM2COM) hit on spam that only reached SA at all as a result of exemptions from frontline protections (DNSBLs, PTR existence mandate, slow banner, etc.) and these rules were not critical to identifying it as spam; typically those final scores exceed 15.

Obviously my corpus is not entirely representative, mainly in that it consists of mail that gets past blocking that reliably rejects nearly all "bot" spam ahead of the SA scans. But it raises a few issues:

1. Should these have such high scores? Particularly since SPOOF_COM2COM matches are usually also SPOOF_COM2OTH matches, it seems that if SPOOF_COM2COM should exist at all, it may need a negative score (i.e. one prominent .com is already exempted and another clearly merits exemption)

2. SA should not be as ambitious as it is in converting bare hostname-like strings into URIs. To paraphrase Freud: sometimes a dotted domain string is *just* a dotted domain string.

3. Are these tests useful against modern spam in an environment without an outer layer of defenses catching most of the botspam? It would be helpful if someone with a large & recent corpus that isn't pre-cleaned could examine it in regards to these rules to see if there's any value at all in repairing them or if they aren't just as obsolete or redundant against the full firehose as I've found them to be against my less phishy streams.
(Been a while since I looked at this; obviously Not A Problem for many or it would have gotten more attention...)

(In reply to Bill Cole from comment #1)
> 2. SA should not be as ambitious as it is in converting bare hostname-like
> strings into URIs. To paraphrase Freud: sometimes a dotted domain string is
> *just* a dotted domain string.

Tell that to the people writing code for desktop MUAs. SA goes to a fair bit of trouble to emulate their behaviour, so as to treat as a link the same things that desktop MUAs do.

I've modified the rule as in my original report, and my own FP problem went away; after pondering a bit further that change probably isn't "enough" and the concept behind the rule probably needs a larger rethink.

> 3. Are these tests useful against modern spam in an environment without an
> outer layer of defenses catching most of the botspam? It would be helpful if
> someone with a large & recent corpus that isn't pre-cleaned could examine it
> in regards to these rules to see if there's any value at all in repairing
> them or if they aren't just as obsolete or redundant against the full
> firehose as I've found them to be against my less phishy streams.

I'm tempted to just say "drop it"; wearing my ViaNet Spam Filter Admin hat and chewing through the past week's logs with a local stats script I get:

Rule            Hits    %      Useful  Avg. time
...
SPOOF_COM2COM   172     0.02   8       2.23
SPOOF_COM2OTH   156     0.01   7       1.65
...
1143476 messages total

So, they're hitting on ~0.01% of messages passed to the full SA ruleset, and were "useful" for ~5% of *that*. ("Useful" means taking away this hit without altering anything else would drop the score below the threshold - we're using the default threshold of 5. All other messages either scored high enough that these hits could be taken away and the message would still be tagged, or they didn't score high enough to get tagged in the first place.)
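The percentages above can be rechecked from the raw counts. The sketch below is only an illustration: it assumes the "Useful" column is a count of hits (8 and 7), and `is_useful`, `hit_rate_percent` and `useful_percent` are hypothetical helpers, not part of the local stats script or of SA.

```python
TOTAL_MESSAGES = 1143476

# (hits, useful hits) per rule, taken from the log summary above.
STATS = {
    "SPOOF_COM2COM": (172, 8),
    "SPOOF_COM2OTH": (156, 7),
}


def is_useful(total_score, rule_score, threshold=5.0):
    """A hit is 'useful' when the message is tagged with the rule's score
    but would fall below the threshold if only that score were removed."""
    return total_score >= threshold and (total_score - rule_score) < threshold


def hit_rate_percent(hits, total=TOTAL_MESSAGES):
    """Share of all scanned messages that hit the rule, in percent."""
    return 100.0 * hits / total


def useful_percent(useful, hits):
    """Share of a rule's hits that were 'useful', in percent."""
    return 100.0 * useful / hits


if __name__ == "__main__":
    for rule, (hits, useful) in STATS.items():
        print(
            f"{rule}: {hit_rate_percent(hits):.3f}% of messages, "
            f"{useful_percent(useful, hits):.1f}% of hits useful"
        )
```

Both rules come out at roughly 0.01 to 0.02% of messages and a little under 5% of hits being useful, which is consistent with the rounded figures quoted.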
I note, however, that we block connections with Spamhaus (50-90% of overall volume, depending on where and how you measure), and we run a "lean" SA instance with <30 total rules, mostly DNSBLs, to skim off ~50-80% of the spam that gets past the Spamhaus reject. While this isn't a hand-confirmed static corpus, we don't get many reports of FPs (either specific messages we can examine and downscore a rule or remove a DNSBL entry, or general complaints about them), and the only ones we've had for a while have been due to slightly overaggressive entries on our local DNSBL.

Checking my personal server... I see no hits at all on either of these rules since Feb 16 (as far back as my logs go). During that time SA processed 6618 messages (mostly to my own account).
Recently, we have had false positives due to these two rule scores being increased. Our FPs were messages reporting false MX records, with hostnames of the form "domain.com.s5a1.psmtp.com", which automatically triggered both rules, giving 4.8 (along with 1.3 for URI_HEX). Back in September at least, and probably much later as well, these two rules do not appear to have been scored as high, hence no previous issues. For now we've had to override the scores.
(In reply to Bill Cole from comment #1)
> Based on logs of ~150k scans in recent months on systems handling mail for
> multiple small to medium businesses, SPOOF_COM2COM and SPOOF_COM2OTH most
> commonly are FPs in tandem on legitimate bounce notices (e.g.
> Google/Postini)

(In reply to Kyle M from comment #3)
> Recently, we have had false positives due to these two rule scores being
> increased. Our FP were sending reports of false mx records, formed as
> "domain.com.s5a1.psmtp.com"

So perhaps exclusionary patterns should be added for psmtp.com and google.com?

Can some examples of the FP "URI" patterns be posted to the bug?
(In reply to John Hardin from comment #4)
> So perhaps exclusionary patterns should be added for psmtp.com and
> google.com?
>
> Can some examples of the FP "URI" patterns be posted to the bug?

See my original report for a Postini example (psmtp.com).

Looking at the original rule definition it looks to me as if someone came across a bounce from an Amazon AWS rejection, where the Amazon MX host had a similar name format (e.g. example.com.s3.amazonaws.com).
Dropped scores to informational, perhaps someday rescorer will actually work..

Sending 50_scores.cf
Transmitting file data .done
Committing transaction...
Committed revision 1861995.