SA Bugzilla – Bug 6353
[review] rule FH_FROMEML_NOTLD is wrong
Last modified: 2011-05-28 18:55:21 UTC
Rule "FH_FROMEML_NOTLD" is wrong in "72_active.cf". Current is: header FH_FROMEML_NOTLD From:addr !~ /\./ This rule checks the "." in the whole address, not in the part after the "@". This rule can be bypassed easily. We got lot of spam from ab.cd@ru I suggest this version: header FH_FROMEML_NOTLD From:addr !~ /(.*)@(.*)\.(.*)/
Wouldn't this be more appropriate/accurate: header FH_FROMEML_NOTLD From:addr !~ /\.[a-z]+$/
>Wouldn't this be more appropriate/accurate: >header FH_FROMEML_NOTLD From:addr !~ /\.[a-z]+$/ Thanks, it's a good idea! But it has to be case-insentive: header FH_FROMEML_NOTLD From:addr !~ /\.[a-z]+$/i
I mean case-insensitive.
(In reply to comment #2) > >Wouldn't this be more appropriate/accurate: > >header FH_FROMEML_NOTLD From:addr !~ /\.[a-z]+$/ > > Thanks, it's a good idea! > But it has to be case-insentive: > > header FH_FROMEML_NOTLD From:addr !~ /\.[a-z]+$/i this will miss From: john.the.ripper header FH_FROMEML_NOTLD From:addr !~ /\@.*+\./ or something like /\@[-\.0-9a-z]+\./ it would be nice if SA provided someehting like From:domain, which would be empty if there is no '@'. we could then simply use
(In reply to comment #4) > (In reply to comment #2) > > >Wouldn't this be more appropriate/accurate: > > >header FH_FROMEML_NOTLD From:addr !~ /\.[a-z]+$/ > > > > Thanks, it's a good idea! > > But it has to be case-insentive: > > > > header FH_FROMEML_NOTLD From:addr !~ /\.[a-z]+$/i > > this will miss > From: john.the.ripper > > > header FH_FROMEML_NOTLD From:addr !~ /\@.*+\./ > > or something like /\@[-\.0-9a-z]+\./ > > it would be nice if SA provided someehting like From:domain, which would be > empty if there is no '@'. we could then simply use (argh. TAB doesn't work on the web:) ... we could simply use From:domain !~ /\./
I tried Mouss's version: header FH_FROMEML_NOTLD From:addr !~ /\@.*+\./ It seems to be buggy, see result below! testlinux:/# spamassassin --lint Feb 28 23:13:07.491 [3009] warn: config: invalid regexp for rule FH_FROMEML_NOTLD: /\@.*+\./: Nested quantifiers in regex; marked by <-- HERE in m/\@.*+ <-- HERE \./ Feb 28 23:13:07.728 [3009] warn: config: warning: description exists for non-existent rule FH_FROMEML_NOTLD Feb 28 23:13:08.445 [3009] warn: lint: 2 issues detected, please rerun with debug enabled for more information
How about this: header FH_FROMEML_NOTLD From:addr !~ /\@.+\.[a-z]+$/
(In reply to comment #5) > > ... we could simply use > From:domain !~ /\./ Yeah let's bloat the code for a very marginal case, which can easily be handled with re itself. :-) I've committed to test rules to my sandbox. So let's just wait for results and not beat the horse anymore. header T_BUG6353_1 From:addr !~ /\@.+\.[a-z]+$/i [if-unset: foo@bar.com] header T_BUG6353_2 From:addr !~ /\.[a-z]+$/i [if-unset: foo@bar.com]
I tested Per Jessen's code: header FH_FROMEML_NOTLD From:addr !~ /\@.+\.[a-z]+$/i It's working perfectly! From address should have a "@" followed by a part before the dot, and a part after the dot. This code checks it all. It also handles UPPERcase and lowercase chars, because test is case-insensitive.
> header FH_FROMEML_NOTLD From:addr !~ /\@.+\.[a-z]+$/i This regexp is too strict, it does not allow for IDN ccTLD domains like test.xn--p1ai http://en.wikipedia.org/wiki/Internationalized_domain_name http://icann.org/en/topics/idn/fast-track/string-evaluation-completion-en.htm
Good point, let's add a hyphen: header FH_FROMEML_NOTLD From:addr !~ /\@.+\.[a-z-]+$/i I guess one issue might be exactly how critical the RE should be about what makes a TLD or not. There are obviously many strings that fit [a-z-]+ without them being TLDs.
Ok I really doubt the point here is to test a _valid_ TLD. It would require much more code, which someone is free to create. I've added this to the mix: header T_BUG6353_3 From:addr !~ /\@.+?\.[a-z]/i [if-unset: foo@bar.com]
> Ok I really doubt the point here is to test a _valid_ TLD. It would require > much more code, which someone is free to create. Why invite the trouble and ask for another round of a bug report on "the FH_FROMEML_NOTLD is FP-ing on my Russian domain", when it can be done right the first time around. > I've added this to the mix: > header T_BUG6353_3 From:addr !~ /\@.+?\.[a-z]/i [if-unset: foo@bar.com] What's the point of if-unset? - the rule description says "E-mail address doesn't have TLD", and a missing From header field fits the description, it doesn't have a TLD. The proposed rule is also missing anchoring, matching addresses like test@a.b.0. The .+ should be excluding further @ characters, otherwise a <"test@foo"@.com> would not get a hit. So I think either a: From:addr !~ /\@[^@]+\.(?:[a-z]{2,}|xn--[a-z0-9]+(?:-[a-z0-9]*)?)$/i or a simple: From:addr !~ /\@[^@]+\.(?:[a-z0-9-]+)$/i would do, I suppose.
(In reply to comment #13) > > Ok I really doubt the point here is to test a _valid_ TLD. It would require > > much more code, which someone is free to create. > > Why invite the trouble and ask for another round of a bug report > on "the FH_FROMEML_NOTLD is FP-ing on my Russian domain", when > it can be done right the first time around. If you are referring to IDN, my third rule handles it just fine. What I'm meant with valid is checking the TLD exists. That's a completely another matter. > > I've added this to the mix: > > header T_BUG6353_3 From:addr !~ /\@.+?\.[a-z]/i [if-unset: foo@bar.com] > > What's the point of if-unset? - the rule description says "E-mail address > doesn't have TLD", and a missing From header field fits the description, > it doesn't have a TLD. I don't agree. Missing header is again a completely different thing than a "missing TLD". It makes much more sense to have a separate MISSING_FROM rule then, if one doesn't exists. > The proposed rule is also missing anchoring, matching addresses > like test@a.b.0. > > The .+ should be excluding further @ characters, otherwise > a <"test@foo"@.com> would not get a hit. > > So I think either a: > From:addr !~ /\@[^@]+\.(?:[a-z]{2,}|xn--[a-z0-9]+(?:-[a-z0-9]*)?)$/i > > or a simple: > From:addr !~ /\@[^@]+\.(?:[a-z0-9-]+)$/i > > would do, I suppose. I'm not saying you are wrong, but we are getting into unnecessary complex territories. This is turning into a VALID_FROM_ADDRESS. As I somehow dragged myself into this, I've added yet another rules to the mix. header T_BUG6353_4 From:addr !~ /\@[^@]+\.(?:[a-z]{2,}|xn--[a-z0-9]+(?:-[a-z0-9]*)?)$/i header T_BUG6353_5 From:addr !~ /\@[^@]+\.(?:[a-z]{2,}|xn--[a-z0-9]+(?:-[a-z0-9]*)?)$/i [if-unset: foo@bar.com] header T_BUG6353_6 From:addr !~ /\@[^@]+\.(?:[a-z0-9-]+)$/i header T_BUG6353_7 From:addr !~ /\@[^@]+\.(?:[a-z0-9-]+)$/i [if-unset: foo@bar.com] header T_MISSING_FROM From =~ /^UNSET$/ [if-unset: UNSET]
And never mind my rants, I'm really tired.. Marks rules are of course better, and probably T_BUG6353_7 is the most elegant to go with.
> And never mind my rants, I'm really tired.. Marks rules are of course better, > and probably T_BUG6353_7 is the most elegant to go with. Sorry the (?: ... ) is redundant, I was just stripping down the larger rule, but failed to go all the way. It should be: /\@[^@]+\.[a-z0-9-]+$/i
I'm jumping in after just skimming this bug's comments with too much in my inbox to read thoroughly. Please excuse me if this comment is actually irrelevant or redundant... If it makes sense to really test for a valid TLD in this rule, we do have some code that is supposed to check for a valid TLD that can be re-used for this by making it an eval rule. If we are going to do the right thing about IDN ccTLD domains that's where we have to do it correctly anyway. Is it worth making this an eval rule to make it strict about real TLDs?
> we do have some code that is supposed to check for a valid TLD that > can be re-used for this by making it an eval rule. If we are going to do the > right thing about IDN ccTLD domains that's where we have to do it correctly > anyway. > Is it worth making this an eval rule to make it strict about real TLDs? Can't hurt to explore this possibility. Depending on a size of a change required, we could then decide which way to go for 3.3.1 .
(In reply to comment #6) > I tried Mouss's version: > > header FH_FROMEML_NOTLD From:addr !~ /\@.*+\./ > > It seems to be buggy, see result below! > > testlinux:/# spamassassin --lint > Feb 28 23:13:07.491 [3009] warn: config: invalid regexp for rule > FH_FROMEML_NOTLD: /\@.*+\./: Nested quantifiers in regex; marked by <-- HERE in > m/\@.*+ <-- HERE \./ > Feb 28 23:13:07.728 [3009] warn: config: warning: description exists for > non-existent rule FH_FROMEML_NOTLD > Feb 28 23:13:08.445 [3009] warn: lint: 2 issues detected, please rerun with > debug enabled for more information sure it's bogus: it has a ".*+". should be From:addr !~ /\@.+\./ without the '*'.
(In reply to comment #8) > (In reply to comment #5) > > > > ... we could simply use > > From:domain !~ /\./ > > Yeah let's bloat the code for a very marginal case, which can easily be handled > with re itself. :-) > I'd agree not to blow code if you agree not to add bogus rules;-p come on... there is no more bloat in providing From:domain than providing From:addr or From:name.
http://ruleqa.spamassassin.org/?rule=%2F%28BUG6353%7CMISSING_FROM%29 Pretty much fifty-fifty on what to choose. Actually the simplest rule has slightly better ratio.
Seems MISSING_FROM rule looks good, will be committing it. 0.2571 0.0015 0.994 0.65 T_MISSING_FROM Here are all the NOTLD variants which do not overlap with MISSING_FROM. I think Mark's no. 5 is the most "robust" one rule wise and ratios are only marginally different. 0.3620 0.0154 0.959 0.67 T_BUG6353_5 0.3616 0.0154 0.959 0.66 T_BUG6353_1 0.3119 0.0154 0.953 0.65 T_BUG6353_7 0.3616 0.0144 0.962 0.67 T_BUG6353_2 0.3059 0.0124 0.961 0.65 T_BUG6353_3 Giving this a review since there are lots to choose. Feel free to vote on MISSING_FROM score if you feel like it or have different opinion on the NOTLD rule.
> Giving this a review since there are lots to choose. Feel free to vote on > MISSING_FROM score if you feel like it or have different opinion on the NOTLD > rule. If you mean this one: header T_MISSING_FROM From =~ /^UNSET$/ [if-unset: UNSET] we already have rules: header __HAS_RCVD exists:Received header __HAS_MESSAGE_ID exists:Message-Id header __HAS_DATE exists:Date header __HAS_SUBJECT exists:Subject (which really test for existence of a header field, not just comparing it to some magic reserved string) so why not add a: header __HAS_FROM exists:From and than can have a: meta T_MISSING_FROM !__HAS_FROM
(In reply to comment #23) > > Giving this a review since there are lots to choose. Feel free to vote on > > MISSING_FROM score if you feel like it or have different opinion on the NOTLD > > rule. > > If you mean this one: > header T_MISSING_FROM From =~ /^UNSET$/ [if-unset: UNSET] > > we already have rules: > header __HAS_RCVD exists:Received > header __HAS_MESSAGE_ID exists:Message-Id > header __HAS_DATE exists:Date > header __HAS_SUBJECT exists:Subject > (which really test for existence of a header field, not just > comparing it to some magic reserved string) > > so why not add a: > header __HAS_FROM exists:From > > and than can have a: > meta T_MISSING_FROM !__HAS_FROM +1 for the meta
(In reply to comment #23) > > Giving this a review since there are lots to choose. Feel free to vote on > > MISSING_FROM score if you feel like it or have different opinion on the NOTLD > > rule. > > If you mean this one: > header T_MISSING_FROM From =~ /^UNSET$/ [if-unset: UNSET] > > we already have rules: > header __HAS_RCVD exists:Received > header __HAS_MESSAGE_ID exists:Message-Id > header __HAS_DATE exists:Date > header __HAS_SUBJECT exists:Subject > (which really test for existence of a header field, not just > comparing it to some magic reserved string) > > so why not add a: > header __HAS_FROM exists:From Exactly what I was planning to do, already added it my cf to verify if it has any difference on the unset one :)
(In reply to comment #24) > (In reply to comment #23) > > > > so why not add a: > > header __HAS_FROM exists:From > > +1 for the meta Agreed. +1
(In reply to comment #26) > (In reply to comment #24) > > (In reply to comment #23) > > > > > > so why not add a: > > > header __HAS_FROM exists:From > > > > +1 for the meta > > Agreed. +1 +1 as well so I believe you can commit
Henrik, Do we have a need for a patch on 3.3/trunk for this or is this something from your sandbox we are expecting to auto-promote? Just trying to get a clear idea so we can build the next tar for 3.3.2rc.
(In reply to comment #28) > Henrik, > > Do we have a need for a patch on 3.3/trunk for this or is this something from > your sandbox we are expecting to auto-promote? Just trying to get a clear idea > so we can build the next tar for 3.3.2rc. Yeah I see the exists:From rule works fine now, I'll commit it to main rules today.. Also I'll update the NOTLD variant to Mark's no. 5 proposal, since noone commented on that.
Closing this now, but please comment if MISSING_FROM should have a score set?? Will it be autogenerated? In any case even default score of 1 seems fine looking at the ruleqa. trunk: Sending rules/20_head_tests.cf Sending rulesrc/sandbox/emailed/00_FVGT_File001.cf Transmitting file data .. Committed revision 1128753. 3.3: Sending rules/20_head_tests.cf Sending rulesrc/sandbox/emailed/00_FVGT_File001.cf Transmitting file data .. Committed revision 1128754.