SA Bugzilla – Bug 3417
New Rule: Checking sender IP against MX records From: user@domain
Last modified: 2008-11-04 23:17:43 UTC
According to ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt, it is recommended to register all sender MTAs in DNS as MX records with a special priority (65535). The receiving MTA should then check the sender IP address against the list of valid mail servers obtained via DNS. Is it possible to create a new rule for SpamAssassin that performs this check?
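For illustration, the draft's check reduces to: resolve the sender domain's MX records to IP addresses, then test whether the connecting IP is among them. A minimal sketch of that comparison step (in Python, with the DNS results supplied as plain lists; in SpamAssassin the actual lookups would go through Net::DNS, and the example IPs are made up):

```python
def sender_ip_matches_mx(sender_ip, mx_a_records):
    """Return True if the connecting IP is one of the addresses that the
    sender domain's MX hosts resolve to (the draft's proposed check)."""
    return sender_ip in set(mx_a_records)

# Hypothetical data: the MX hosts for some domain resolve to these IPs.
mx_ips = ["192.0.2.10", "192.0.2.11"]
print(sender_ip_matches_mx("192.0.2.10", mx_ips))   # sender is listed
print(sender_ip_matches_mx("203.0.113.5", mx_ips))  # sender is not listed
```

A rule built on this would fire when the comparison returns False for every MX address of the From domain.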
Subject: Re: New: New Rule: Checking sender IP against MX records From: user@domain

>According to
>ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt,
>it is recommended to register all sender MTAs in DNS as MX records
>with a special priority (65535).
>
>The receiving MTA should then check the sender IP address against the
>list of valid mail servers obtained via DNS.
>
>Is it possible to create a new rule for SpamAssassin that performs
>this check?

In terms of a core rule, probably not. This is very similar to what SPF is intended to solve, we already have that support in 3.0.0, and SPF has all the mindshare.

In terms of a third-party plugin, I can't see why not, if someone else implements it ;)

--j.
As Justin says, there's no mindshare behind this idea, so closing as WONTFIX.
>In terms of a third-party plugin, I can't see why not if someone else
>implements that

I have looked at the code and found that everything I need exists in two subs, check_for_from_dns and check_rbl_backend in EvalTests.pm. I combined these two subs and created a new one that implements my rule. I think this rule will be very effective and recommend you include it in the stable (2.6x) version of SpamAssassin. Sorry for my bad English.

Here is the diff:

EvalTests.pm:

< sub check_for_from_mx {
<   my ($self) = @_;
<
<   my $from = $self->get ('Reply-To:addr');
<   if (!defined $from || $from !~ /\@\S+/) {
<     $from = $self->get ('From:addr');
<   }
<   return 0 unless ($from =~ /\@(\S+)/);
<   $from = $1;
<
<   # First check that DNS is available, if not do not perform this check
<   return 0 unless $self->is_dns_available();
<   $self->load_resolver();
<
<   if ($from eq 'compiling.spamassassin.taint.org') {
<     # only used when compiling
<     return 0;
<   }
<
<   if ($self->{conf}->{check_mx_attempts} < 1) {
<     return 0;
<   }
<
<   local ($_);
<
<   # First check that DNS is available, if not do not perform this check
<   return 0 if $self->{conf}->{skip_rbl_checks};
<   return 0 unless $self->is_dns_available();
<   $self->load_resolver();
<
<   # How many IPs max you check in the received lines
<   my $checklast = $self->{conf}->{num_check_received};
<
<   my @fullips = map { $_->{ip} } @{$self->{relays_untrusted}};
<
<   # Make sure a header significantly improves results before adding here
<   # X-Sender-Ip: could be worth using (very low occurrence for me)
<   # X-Sender: has a very low bang-for-buck for me
<   my @originating;
<   for my $header ('X-Originating-IP', 'X-Apparently-From') {
<     my $str = $self->get($header);
<     next unless defined $str;
<     push (@originating, ($str =~ m/($IP_ADDRESS)/g));
<   }
<
<   return 0 unless (scalar @fullips + scalar @originating > 0);
<
<   # Let's go ahead and trim away all Reserved ips (KLC)
<   # also uniq the list and strip dups. (jm)
<   my @ips = ();
<   my %seen = ();
<   foreach my $ip (@fullips) {
<     next if (exists ($seen{$ip})); $seen{$ip} = 1;
<     if (!($ip =~ /${IP_IN_RESERVED_RANGE}/o)) { push(@ips, $ip); }
<   }
<
<   dbg("Got the following IPs: ".join(", ", @ips), "rbl", -3);
<
<   if (scalar @ips + scalar @originating > 0) {
<     # If name is foo-notfirsthop, check all addresses except for
<     # the originating one. Suitable for use with dialup lists, like the PDL.
<     # note that if there's only 1 IP in the untrusted set, do NOT pop the
<     # list, since it'd remove that one, and a legit user is supposed to
<     # use their SMTP server (ie. have at least 1 more hop)!
<     if ($set =~ /-notfirsthop$/) {
<       if (scalar @ips > 1) { pop @ips; }
<     }
<     # If name is foo-firsttrusted, check only the Received header just
<     # after it enters our trusted networks; that's the only one we can
<     # trust the IP address from (since our relay added that header).
<     # And if name is foo-untrusted, check any untrusted IP address.
<     elsif ($set =~ /-(first|un)trusted$/) {
<       push(@ips, @originating);
<       if ($1 eq "first") {
<         @ips = ( $ips[0] );
<       }
<       else {
<         shift @ips;
<       }
<     }
<     else {
<       # create a new list to avoid undef errors
<       my @newips = ();
<       my $i; for ($i = 0; $i < $checklast; $i++) {
<         my $ip = pop @ips; last unless defined($ip);
<         push (@newips, $ip);
<       }
<       # add originating IPs as untrusted IPs
<       for my $ip (@originating) {
<         next if (exists ($seen{$ip})); $seen{$ip} = 1;
<         if (!($ip =~ /${IP_IN_RESERVED_RANGE}/o)) { push(@newips, $ip); }
<       }
<       @ips = @newips;
<     }
<   }
<   dbg("But only inspecting the following IPs: ".join(", ", @ips), "rbl", -3);
<
<   # Try check_mx_attempts times to protect against temporary outages.
<   # sleep between checks to give the DNS a chance to recover.
<   my @mxips = ();
<   for my $i (1..$self->{conf}->{check_mx_attempts}) {
<     my @mx = Net::DNS::mx($self->{res}, $from);
<     dbg ("DNS MX records found: " . scalar(@mx));
<
<     dbg("DNS MX records: ".join(", ", @mx), "mx", -3);
<
<     # return 0 if (scalar @mx > 0);
<     foreach my $mx (@mx) {
<       my $query = $self->{res}->search($mx);
<       if ($query) {
<         my $count = 0;
<         foreach my $rr ($query->answer) {
<           if ($rr->type eq "A") {
<             $count++;
<             push (@mxips, $rr->rdatastr);
<           }
<         }
<         dbg ("DNS A records found: $count");
<       }
<     }
<     if ($i < $self->{conf}->{check_mx_attempts}) { sleep $self->{conf}->{check_mx_delay}; }
<   }
<   foreach my $ip (@ips) {
<     my $flag = 1;
<     foreach my $mxip (@mxips) {
<       $flag = 0 if ($ip eq $mxip);
<     }
<     return 1 if ($flag eq 1);
<   }
<
<   return 0;
< }

20_head_tests.cf:

header NO_MX_FOR_FROM eval:check_for_from_mx()
describe NO_MX_FOR_FROM Sender IP should be in the list of MX records of the domain in the From header
tflags NO_MX_FOR_FROM net

50_scores.cf:

score NO_MX_FOR_FROM 0 1.105 0 1.650

Where can I read something about "SPF"?
New version of the sub:

sub check_for_from_mx {
  my ($self) = @_;

  my $from = $self->get ('Reply-To:addr');
  if (!defined $from || $from !~ /\@\S+/) {
    $from = $self->get ('From:addr');
  }
  return 0 unless ($from =~ /\@(\S+)/);
  $from = $1;

  # First check that DNS is available, if not do not perform this check
  return 0 unless $self->is_dns_available();
  $self->load_resolver();

  if ($from eq 'compiling.spamassassin.taint.org') {
    # only used when compiling
    return 0;
  }

  if ($self->{conf}->{check_mx_attempts} < 1) {
    return 0;
  }

  local ($_);

  # First check that DNS is available, if not do not perform this check
  return 0 if $self->{conf}->{skip_rbl_checks};
  return 0 unless $self->is_dns_available();
  $self->load_resolver();

  # How many IPs max you check in the received lines
  my $checklast = $self->{conf}->{num_check_received};

  my @fullips = map { $_->{ip} } @{$self->{relays_untrusted}};

  # Make sure a header significantly improves results before adding here
  # X-Sender-Ip: could be worth using (very low occurrence for me)
  # X-Sender: has a very low bang-for-buck for me
  my @originating;
  for my $header ('X-Originating-IP', 'X-Apparently-From') {
    my $str = $self->get($header);
    next unless defined $str;
    push (@originating, ($str =~ m/($IP_ADDRESS)/g));
  }

  return 0 unless (scalar @fullips + scalar @originating > 0);

  # Let's go ahead and trim away all Reserved ips (KLC)
  # also uniq the list and strip dups. (jm)
  my @ips = ();
  my %seen = ();
  foreach my $ip (@fullips) {
    next if (exists ($seen{$ip})); $seen{$ip} = 1;
    if (!($ip =~ /${IP_IN_RESERVED_RANGE}/o)) { push(@ips, $ip); }
  }

  dbg("From_MX: Got the following IPs: ".join(", ", @ips));

  if (scalar @ips + scalar @originating > 0) {
    # create a new list to avoid undef errors
    my @newips = ();
    my $i; for ($i = 0; $i < $checklast; $i++) {
      my $ip = pop @ips; last unless defined($ip);
      push (@newips, $ip);
    }
    # add originating IPs as untrusted IPs
    for my $ip (@originating) {
      next if (exists ($seen{$ip})); $seen{$ip} = 1;
      if (!($ip =~ /${IP_IN_RESERVED_RANGE}/o)) { push(@newips, $ip); }
    }
    @ips = @newips;
  }
  dbg("From_MX: But only inspecting the following IPs: " . join(", ", @ips));

  # Try check_mx_attempts times to protect against temporary outages.
  # sleep between checks to give the DNS a chance to recover.
  my @mxips = ();
  for my $i (1..$self->{conf}->{check_mx_attempts}) {
    my @mx = Net::DNS::mx($self->{res}, $from);
    dbg ("From_MX: DNS MX records found: " . scalar(@mx));

    foreach my $mx (@mx) {
      my $query = $self->{res}->search($mx->exchange);
      if ($query) {
        my $count = 0;
        foreach my $rr ($query->answer) {
          if ($rr->type eq "A") {
            $count++;
            push (@mxips, $rr->rdatastr);
          }
        }
        dbg ("From_MX: DNS A records found: $count");
      }
    }
    if ($i < $self->{conf}->{check_mx_attempts}) { sleep $self->{conf}->{check_mx_delay}; }
  }

  dbg("From_MX: DNS MX A records: " . join(", ", @mxips)) if (scalar @mxips > 0);

  foreach my $ip (@ips) {
    my $flag = 1;
    foreach my $mxip (@mxips) {
      $flag = 0 if ($ip eq $mxip);
    }
    return 1 if ($flag eq 1);
  }

  return 0;
}
I'm not sure what that patch is about, but please recreate it in unified format (diff -u) and create an attachment (don't copy and paste).
>I'm not sure what that patch is about, but please recreate it in unified
>format (diff -u) and create an attachment (don't copy and paste).

OK, I will commit a new patch (diff -u) within a few days, after testing.
Created attachment 1970 [details] New evaluation test that checks sender IPs against the MX list of the From domain

According to ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt, it is recommended to register all sender MTAs in DNS as MX records. This patch creates a new evaluation test that performs this check.

This test fires on about 98% of spam.
It does not fire on about 50% of ham.
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

On Mon, May 24, 2004 at 11:45:35PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> According to
> ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt,
> it is recommended to register all sender MTAs in DNS as MX records

Wow. That's such an ugly hack. IMO, just use SPF. It's more "correct" than the above draft, and it doesn't overload an RR (MX, for instance) to mean more than it's supposed to mean.

> About 98% of spam flag on this test
> About 50% of ham flag off this test

So in short, the rule sucks...?
>IMO, just use SPF. It's more "correct" than the above draft, and it
>doesn't overload an RR (MX for instance) to mean more than it's supposed
>to mean.

Hmmm, where can I read about SPF? I think DNS requests will be cached effectively...

>> About 98% of spam flag on this test
>> About 50% of ham flag off this test
>So in short, the rule sucks...?

No! This rule is like AWL, like a good whitelist: it is very effective. Maybe we should invert the flag and assign a score < 0 (-1.6 for example). Can anybody run learning on a big corpus of hams and spams? I think this rule will be among the top 20 best rules.
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

On Tue, May 25, 2004 at 09:18:29AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Hmmm, where can I read about SPF?

http://spf.pobox.com/

> I think DNS requests will be cached effectively...

That's not my issue. "MX" is meant to specify who receives mail for a given domain/host/etc. It's now suggested to overload MX to also mean who can send mail for a given domain/host/etc.

Also, if I have several machines, all of which are allowed to send mail for my domain, I have to add one more record per domain. Not only is that asking for errors, it's also making the DNS response larger, which could mean that it'll have to go TCP instead of UDP, which has a huge impact on DNS performance. With SPF, I only need to add one record which allows them all.

> >> About 98% of spam flag on this test
> >> About 50% of ham flag off this test
>
> No! It rule like AWL, like good whitelist - it very effective, may be we
> shold revert flag and assign score<0 (-1.6 for example)

I must be missing something... If the rule hits on 98% of spam, but also 50% of ham, the rule sucks. It'll have an S/O ratio of somewhere in the 0.6-0.7 range (assuming equal amounts of ham/spam).

> I think this rule will be in top20 best rules.

This isn't an anti-spam rule, it's an anti-joejobbing rule, so it's not going to be in the top 20 anti-spam rules.

For instance, this rule would allow "bugzilla.spamassassin.org" to send mail as "kluge.net" (it's a backup MX for my domain), whereas I would consider that a forgery since it's not an outgoing mail server for my domain. With SPF I can clearly specify who I allow to send mail using my domain. The reverse is also true (kluge.net would be allowed to send as bugzilla.spamassassin.org...)
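The S/O (spam/overall) estimate above can be checked directly: with equal ham and spam volumes, a rule hitting 98% of spam and 50% of ham gets S/O = 0.98 / (0.98 + 0.50), roughly 0.66. A quick illustrative sketch of the arithmetic (not SpamAssassin code):

```python
def s_o_ratio(spam_hit_rate, ham_hit_rate, spam_fraction=0.5):
    """S/O: the fraction of a rule's hits that land on spam, given its
    hit rates and the spam share of the corpus (0.5 = equal ham/spam)."""
    spam_hits = spam_hit_rate * spam_fraction
    ham_hits = ham_hit_rate * (1 - spam_fraction)
    return spam_hits / (spam_hits + ham_hits)

print(round(s_o_ratio(0.98, 0.50), 2))  # 0.66, in the quoted 0.6-0.7 range
```

A perfect anti-spam rule would score 1.0 here; a perfect whitelist rule would score 0.0.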
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

Yes. Please read about SPF, it's a much cleaner system than this (and also uses DNS as transport, FWIW).

Also, there's no way we could use a rule for catching spam if it hits 50% of ham... that's way too false-positive-prone.

--j.
Closing as LATER. We'll implement this if it picks up steam and seems to be what the world is going to gather around as a standard for sender verification. I think SPF or a follow-on to SPF is much more likely to take hold.
I should add, thanks a lot for contributing the bug and the code. I hope this hasn't deterred you from making future contributions. :-)
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

FWIW, I had to add MX records for all of my outbound mail hosts to help assure delivery. I think it's a pretty silly idea, but I have had to support it on my end or risk getting email rejected. (I have asymmetric mail flow, with clusters for inbound, outbound, local, etc.)
>On Tue, May 25, 2004 at 09:18:29AM -0700,
>bugzilla-daemon@bugzilla.spamassassin.org wrote:
>> Hmmm, where can I read abount SPF?
>http://spf.pobox.com/

Thanks. It will take some years before this technology reaches the masses. My rule works now.

>> I think, DNS requests will be cached effectivly...
>That's not my issue. "MX" is meant to specify who receives mail for
>a given domain/host/etc. It's now suggested to overload MX to also mean
>who can send mail for a given domain/host/etc.

Yes. Like many other rules: RCVD_IN_*, DNS_FROM_RFCI_DSN, NO_DNS_FOR_FROM.

>Also, if I have several machines, all of which are allowed to send
>mail for my domain, I have to add one more record per domain. Not only
>is that asking for errors, it's also making the DNS response larger --
>which could mean that it'll have to go TCP instead of UDP, which has a
>huge impact on DNS performance. With SPF, I only need to add 1 record
>which allows them all.

SPF rules are less effective now than my rule NO_MX_FOR_FROM. There should be only one criterion for rejecting or accepting new rules: the effectiveness of the rule.

>> >> About 98% of spam flag on this test
>> >> About 50% of ham flag off this test
>>
>> No! It rule like AWL, like good whitelist - it very effective, may be we
>> shold revert flag and assign score<0 (-1.6 for example)
>I must be missing something... If the rule hits on 98% of spam, but
>also 50% of ham, the rule sucks. It'll have an S/O ratio of somewhere
>in the 0.6-0.7 range (assuming equal amounts of ham/spam).

According to your logic, AWL sucks too. AWL creates scores, for example, between -4 and +2 with the parameters

auto_learn_threshold_nonspam 1
auto_learn_threshold_spam 13.0
required_hits 7

It is equivalent to a rewritten AWL (+4 to all): AWL would create scores between 0 and +6 with the parameters

auto_learn_threshold_nonspam 5
auto_learn_threshold_spam 18.0
required_hits 11

AWL generates +4 for many, many hams. "This rule sucks!" That is your logic. Why do you use AWL? Why is AWL implemented in SA? AWL "sucks" more than my rule.

>> I think this rule will be in top20 best rules.
>This isn't an anti-spam rule, it's an anti-joejobbing rule, so it's not
>going to be in the top 20 anti-spam rules.

Check it!

>For instance, this rule would allow "bugzilla.spamassassin.org" to
>send mail as "kluge.net" (it's a backup MX for my domain), whereas I
>would consider that a forgery since it's not an outgoing mail server
>for my domain. With SPF I can clearly specify who I allow sending mail
>using my domain. The reverse is also true (kluge.net would be allowed
>to send as bugzilla.spamassassin.org...)

My rule NO_MX_FOR_FROM does not work because of the new Internet draft. NO_MX_FOR_FROM works and is effective _now_ because, today, most email senders use only one email account, that account exists on a server with MX records, and they send their email through that server.
>Yes. Please read about SPF, it's a much cleaner system than this,
>(and also uses DNS as transport FWIW).

SPF needs to be implemented on many Internet servers first.

>Also there's no way we could use a rule for catching spam if it hits
>50% of ham... that's way too false-positive-prone.

I think this rule is more effective than AWL (which is false-positive-prone), DNS_FROM_RFCI_DSN, and NO_DNS_FOR_FROM. I think effectiveness (points after learning) should be the main criterion for rejecting rules. I am looking forward to somebody checking the effectiveness of this rule.
>We'll implement this if it picks up steam and seems to be
>what the world is going to gather around as a standard for sender verification.
>I think SPF or a follow-on to SPF is much more likely to take hold.

Are you sure that SPF is going to become the standard for sender verification? NO_MX_FOR_FROM is effective now, and more effective than SPF as currently deployed, because many servers do not use SPF but still check out fine through NO_MX_FOR_FROM. NO_MX_FOR_FROM is a heuristic that works now. If the draft becomes a standard, I will reject emails at the MTA with header checking.
According to your logic, AWL sucks too. AWL creates scores, for example, between -4 and +2 with the parameters

auto_learn_threshold_nonspam 1
auto_learn_threshold_spam 13.0
required_hits 7

It is equivalent to a rewritten AWL (+4 to all): AWL would create scores between 0 and +6 with the parameters

auto_learn_threshold_nonspam 5
auto_learn_threshold_spam 17.0
required_hits 11

AWL generates +4 for many, many hams. "This rule sucks!" That is your logic. Why do you use AWL? Why is AWL implemented in SA? AWL "sucks" more than my rule.

----------------------

I think neither AWL nor my rule sucks. These are whitelist rules, like the 0.001 scores in the Bayes rules. They are different, but they are very effective.
Created attachment 1972 [details] New whitelist evaluation test that checks sender IPs against the MX list of the From domain

This patch creates a new whitelist test that performs this check.

score RCVD_IN_MX_FROM_LIST 0 -2.105 0 -2.650

The effective score needs to be recomputed on a big corpus/servers.
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

On Tue, May 25, 2004 at 11:59:49PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> >That's not my issue. "MX" is meant to specify who receives mail for
> >given domain/host/etc. It's now suggested to overload MX to also mean
> >who can send mail for give domain/host/etc.
>
> Yes. Like many other rules - RCVD_IN_* , DNS_FROM_RFCI_DSN, NO_DNS_FOR_FROM

None of those overloads a DNS RR type. RCVD_IN_* and DNS_FROM_RFCI_DSN all request A records, which are for doing name->IP lookups. Via the rule, for a name request you get back an IP. No overloading (what you do with the IP is up to you). NO_DNS_FOR_FROM looks for either an A or MX record to exist, regardless of the data therein. So no overloading.

The draft/your rule overloads "MX", which is meant to specify "when sending mail to this host/domain, try delivering mail to a specific list of mail servers using a specific set of priorities". The overload is that "MX" is now also supposed to specify which servers can send mail as the host/domain. A "correct" solution would be to make a new RR, say "MS", which you could then use to specify senders, since they are quite commonly different from the hosts that receive mail. This is what SPF does, right now using TXT records, but there is talk of creating a new RR for this purpose.
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

>The draft/your rule overloads "MX", which is meant to specify "when
>sending mail to this host/domain, try delivering mail to a specific list
>of mail servers using a specific set of priorities". The overload is that
>"MX" is now also supposed to specify which servers can send mail as the
>host/domain. A "correct" solution would be to make a new RR, say "MS"
>which you can then use to specify senders since they are quite commonly
>different than the hosts that receive mail. This is what SPF does, right
>now using TXT records, but there is talk about creating a new RR for this
>purpose.

One key problem with this is that most (note: not some, MOST) large ISPs use separate servers for incoming and outgoing SMTP traffic. So in other words this test will fire on everything from:

yahoo.com
gmail.com
msn.com
aol.com
etc. etc.

--j.
>A "correct" solution would be to make a new RR,
>say "MS" which you can then use to specify senders since they are quite
>commonly different than the hosts that receive mail. This is what SPF
>does, right now using TXT records, but there is talk about creating a
>new RR for this purpose.

Why is a special priority of 65535 for these MX records not a "correct" solution? Which rules should be included in SpamAssassin: those that represent a "correct" solution, or those that are effective on current traffic?
>One key problem with this is that most (note: not some, MOST) large ISPs
>use separate servers for incoming and outgoing SMTP traffic. So in
>other words this test will fire on everything from:

No, it is now a _whitelist_ rule (I committed a new patch). The score on your servers will be zero.
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

On Wed, May 26, 2004 at 12:09:53AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> I am looking forward that somebody check effectifly of this rule.

Just so we can clear this up, I started running this on my past two days' worth of mail. The original code 1) doesn't work directly with 3.0, and after fixing that, 2) got 0 hits for me after ~3k mails. So I went through and streamlined it a bit, fixed a few logic issues, etc. It's now getting some hits, though not a lot. I'll send up a patch for the EvalTest shortly.

Thing to note: the IETF draft is meant to be an anti-forging system, but the rule is meant to be a net-based whitelist -- basically: don't forge, get negative points. They're distinctly different ideas. So let's ignore the draft and the MX overload issues.

After 1000 mails, I stopped, since I only had 172 ham in the past 2 days (27.5k spam, btw). I didn't want the results to get too skewed. 92 ham hits, 15 spam hits. The ~50% ham hit rate is nice, but since there are spam hits, there's no way you would want to use this as a whitelist. More on that below.

The rule, as proposed, looks through all the untrusted Received headers and uses the header From domain, so it's trivial for a spammer to put in any matching forged From and Received header and hit the rule. Definitely not a good whitelist. If you limit the rule to the first untrusted Received header only, it becomes harder to forge, but it makes the results even worse as well: 29 ham hits, 15 spam hits.

Compare this to SPF PASS results over the same 1000 mails: 15 ham hits, 0 spam hits. SPF FAIL results: 0 ham, 29 spam. SPF requires far fewer DNS requests and is therefore significantly faster than the MX version, BTW.

In the end, we get back to the fact that anti-forging and anti-spam are two distinctly different ideas. The MX rule (as well as SPF PASS rules) doesn't indicate anything about spamminess. All it indicates is that the sender address isn't forged. This means you still have to scan the message, and spammers can simply go back to not forging the address (which is the goal of such mechanisms). Therefore, having it as a whitelist is just an incentive for the spammers to stop forging (good), which will then help them bypass the spam filters (bad). This is why things like SPF_PASS are forced to a score of -0.001. It's good as an indicator for other rules and for use in meta rules, but not as a whitelist.

What you really want to do is flag when you know forging occurs: give a penalty if an address is forged. You definitely can't do that right now with the MX version or the proposed rule, because the vast majority of MX records indicate the receiver _only_ (which is what MX is meant to do). Therefore, you can't have a rule that says "add 4 points if the sender IP isn't an MX for the domain". It would flag so much ham as to be useless. With SPF, however, the owner of the domain can specify how they want others to treat mail from their domain: there's no overloading, there's no guessing, and it works right now.
Created attachment 1974 [details] version that works in 3.0, streamlined a bit, etc.
I should note that attachment 1974 is the "first untrusted" version. To have it try all the untrusted headers, change: my $checklast = @originating+1; the 1 would become $self->{conf}->{check_mx_attempts} ...
Thank you for the patch, Theo!

>2) got 0 hits for me after ~3k mails.

Strange, because bugzilla@spamassassin fires on this rule :) Didn't you receive messages from us within the first 3k emails?

Let's define the "effectiveness of a tested rule", or "strength of a rule" (I suppose this criterion is what accepts or rejects new rules).

The first and main criterion is the ham/spam ratio for whitelist rules and the spam/ham ratio for blacklist rules. The second criterion is "popularity" or "wideness": ham hits / total hams for whitelist rules and spam hits / total spams for blacklist rules. For good accuracy, all counts must be > 200. There are rules that have the biggest ratio (big scores) but fire seldom. There are rules that have a small ratio (small scores) but fire on almost every message, and there are many rules of this type. The total "rule strength" I define as the product Ratio * Wide.

Let's look at my rule on your corpus:
Ratio: 92/15 = 6, which is very good! But 15 < 200 (low level of accuracy).
Wide: 92/172 = 0.5, very good. But 92 < 200.

SPF Pass:
Ratio: 15/0 = undefined. 15 < 200 and 0 < 200: low level of accuracy.
Wide: 15/172 = 0.09, not bad. But 15 < 200 and 172 < 200.

Total strength of my rule on your corpus = 6 * 0.5 = 3. Total strength of SPF: undefined.

I think SPF is a very good rule, but it is not wide now. Let's check our whitelist rules against BAYES_00 and BAYES_01! On my server in Russia I think I would get 0 hams and 0 spams on SPF Pass. Zero divided by zero = ?

If SPF becomes popular, Wide will rise, virus-spammers will send mail correctly (you are right!), and Ratio will fall. The total Wide*Ratio will stay roughly constant! But if we have many whitelist rules (as many as blacklist rules now), the total effect of all of them will be very strong. Whitelist and blacklist rules with the same strength Wide*Ratio have an equal ability to divide mail into spam and ham.

I think the SpamAssassin developers should use "Wide*Ratio" as the main criterion to accept or reject new rules (and to remove old rules). And here there is no difference between whitelist and blacklist rules.

I remain of the opinion that if we create a "Wide*Ratio" rating for whitelist and blacklist rules, my rule will be in the top 20.
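The commenter's proposed "Strong = Ratio * Wide" metric, applied to the corpus figures above, can be sketched as follows. This is purely an illustration of the commenter's own formula, not an accepted SpamAssassin metric:

```python
def rule_strong(ham_hits, spam_hits, total_ham):
    """'Strong' for a whitelist rule as defined in the comment above:
    Ratio = ham_hits / spam_hits (how clean the hits are),
    Wide  = ham_hits / total_ham (how much ham it covers)."""
    ratio = ham_hits / spam_hits
    wide = ham_hits / total_ham
    return ratio * wide

# Theo's corpus figures: 92 ham hits, 15 spam hits, 172 ham total.
# The comment rounds Ratio (92/15 = 6.13) down to 6 and gets 6 * 0.5 = 3.
print(round(rule_strong(92, 15, 172), 1))
```

Note that the metric is undefined when spam_hits is 0, which is exactly the SPF Pass case the comment runs into.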
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

On Wed, May 26, 2004 at 02:54:15PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> >2) got 0 hits for me after ~3k mails.
>
> It strange, because bugzilla@spamassasin flag on this rule :)
> You did't received messages from us with first 3k emails?

I don't put spam-related mails into my corpus or filter them through SpamAssassin in general.

> I remain in opinion, that if we will create "Wide*Ratio" rating for whitelist
> and blacklist rules, my rule will be in top20.

I think you missed the point of my previous mail, so I'm not going to post any more in this ticket after this mail.

Your proposed rule is trivially forgeable, meaning it will absolutely not be considered for general use, even if it were at 100% ham right now. Those types of rules are targeted by spammers and become useless as time goes on. This is why we have very strict requirements for "nice" rules now. The "first untrusted only" version is harder to forge, but there is still a relatively high percentage of spam hits (~1.8% in my test).

The reason for this is that your rule deals with "forging", not spam detection. It is perfectly valid for spam to be hit by this rule; in fact that's the end goal of such rules, to stop address forgery. Therefore it is completely inappropriate to use it as a whitelist in and of itself.
>I think you missed the point of my previous mail. So I'm not going to
>post any more in the ticket after this mail.
>
>The rule, as proposed, looks through all the untrusted received headers,
>and uses the header From domain, so it's trivial for a spammer to
>put in any matching forged From and Received header and hit the rule.
>Definitely not a good whitelist.

Thanks for your time, and I am sorry. You are right about my first patch. In my second patch I check all IPs in the header: all of them (not only one) must be in the MX lists.

Let's talk about "forging" the From. Spammers use hacked servers, many of which do not send or receive mail at all. There are no domains whose MX lists include these servers, so it is impossible to forge a From address from those servers. Spammers and viruses would have to build a map from "sender IP" to "correct From" to defeat the rule. That is a very hard task, and impossible for many spammer servers. The percentage of spam hits for the rule may rise, but it will remain at a low level. The potential "forgeability" of the rule is low.

If you use AWL, BAYES_00, RCVD_IN_BSP_TRUSTED and plan to use SPF_CHECK, tell me the criteria: why are these rules better than mine? Do they have a low level of "forgeability"? Are they hard to forge? How do you compute "the level of potential forging"?

Let's define a new criterion for accepting new whitelist rules: Forging*Wide*Ratio. If we cannot define "Forging", let's use the current Wide*Ratio.
I think we should first check the SPF TXT records for the From domain. If SPF records exist, we do not run this rule. If no SPF records exist, we check the MX records. The rule fires only if there are no SPF records _and_ the sender IP is not in the list of the From domain's MX records. Over time the number of SPF domains will rise, and this rule will fire less and less often.
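The combination proposed here can be sketched as a small decision function. This is illustrative only: the SPF and MX lookups are stubbed as booleans, since the real checks would go through DNS:

```python
def no_mx_for_from_fires(has_spf_record, sender_ip_in_mx):
    """Proposed combination: fire only when the From domain publishes no
    SPF record AND the sender IP is not among the domain's MX addresses.
    If an SPF record exists, defer to the SPF check instead."""
    return (not has_spf_record) and (not sender_ip_in_mx)

print(no_mx_for_from_fires(True, False))   # SPF exists: never fire
print(no_mx_for_from_fires(False, False))  # no SPF, IP not in MX: fire
print(no_mx_for_from_fires(False, True))   # no SPF, IP in MX: don't fire
```

As the text notes, the rule's firing rate would shrink automatically as SPF adoption grows, since more domains would take the first branch.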
>I should add, thanks a lot for contributing the bug and the code. I hope
>this hasn't deterred you from making future contributions. :-)

Possibly it has. :) The problem is that I don't understand your criteria for accepting new rules. :( It should be a mathematical criterion, a statistical formula, and then contributors like me could check new rules and contribute them after statistical testing. It could be, for example, Wide*Ratio as I defined it. Or Wide*Ratio*PotentialForging. Or Wide*Ratio*PotentialForging*Correlation. Maybe something else... But it must be a formula, not the words "the rule should not suck". Right now I don't understand why you use some rules and reject others...
Subject: Re: New Rule: Checking sender IP against MX records From: user@domain

> NOW I dont undestand why you use some rules and reject other...

Whitelist rules cannot be easily defeated by spammers - yours can be.
Whitelist rules need to have an S/O ratio below 0.1% - yours does not.
Whitelist rules need to hit at least 0.1% of ham - yours does.

1 out of 3 -> very bad
2 out of 3 -> not good
3 out of 3 -> likely we'll use it, unless it's very expensive, violates a policy, or overlaps too much with another rule

Blacklist rules are harder to quantify, since some ham hits are okay and defeatability is not as big of an issue. Roughly speaking...

Blacklist rules should hit > 1% of spam.
Blacklist rules should hit < 0.5% of ham.
Blacklist rules should have an S/O ratio of 0.95 or better.
Blacklist rules should not overlap other rules, especially better rules, too much.
Blacklist rules should not be too expensive to run or use.
Blacklist rules cannot violate a policy (unwritten or written), such as intentionally discriminating against a particular country.

Daniel
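Daniel's three whitelist criteria can be expressed as a simple checklist. A sketch only; the function and parameter names are illustrative, not actual SpamAssassin code, and the MX rule's numbers are taken from Theo's test (92 ham hits and 15 spam hits out of 172 ham):

```python
def whitelist_criteria_met(easily_defeated, s_o_ratio, ham_hit_rate):
    """Count how many of the three whitelist criteria a rule meets:
    not spammer-defeatable, S/O below 0.1%, and at least 0.1% ham coverage."""
    criteria = [
        not easily_defeated,      # cannot be easily defeated by spammers
        s_o_ratio < 0.001,        # S/O ratio below 0.1%
        ham_hit_rate >= 0.001,    # hits at least 0.1% of ham
    ]
    return sum(criteria)

# The proposed MX rule: trivially forgeable, S/O = 15/(92+15), ~53% of ham.
print(whitelist_criteria_met(True, 15 / 107, 92 / 172))  # 1 out of 3
```

This matches the "1 out of 3 -> very bad" verdict above: only the ham-coverage criterion is satisfied.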
Subject: Re: New Rule: Checking sender IP against MX records
From: user@domain

P.S. We reserve the right not to accept any code for any reason, since it is not possible to enumerate every reason why code might not be accepted. This is common sense, but I think it needs to be pointed out given the current line of discussion.
Thank you for the answer!

>Whitelist rules cannot be easily defeated by spammers - yours is.

Many hacked servers have IPs that do not exist in any MX records of any domain. Spammers use networks of thousands of hacked computers to send spam. My rule can be forged about as easily as the SPF rule, which is already included as a plugin.

>Whitelist rules need to have an S/O ratio below 0.1% - yours is not.

On a corpus of 50% ham and 50% spam, my rule has an R/0 of about 5%. I think that is good. If I also check SPF records in my rule, the R/0 will be even better. Could you tell me the R/0 of BAYES_00, BAYES_01, BAYES_05, BAYES_10, BAYES_25, AWL (AWL < -1), RCVD_IN_BSP_TRUSTED, HABEAS_USER and SPF_CHECK (ham rules) on your big corpus? I think they have an R/0 of more than 0.1%.

>Whitelist rules need to hit at least 0.1% of ham - yours does.

Hitting 50% of ham is very effective. 50% is 500 times more than 0.1%. My rule has a very good hit rate, and that can compensate for the R/0. My rule is totally uncorrelated with the other rules - they do not correlate with SPF, for example.

>Blacklist rules should have an S/O ratio of 0.95 or better

OK, but that is asymmetric relative to the ham rules: 1 - 0.1% = 99.9% = 0.999. Are you sure? :)
Subject: Re: New Rule: Checking sender IP against MX records
From: user@domain

> Many hacked servers have IPs that do not exist in any MX records of any
> domain. Spammers use networks of thousands of hacked computers to send
> spam.
>
> My rule can be forged about as easily as the SPF rule, which is already
> included as a plugin.

No. We have some SPF whitelist rules with very very very low scores (and they may be removed), but the only rules with appreciable scores are positive (not whitelist) and are working increasingly well over time.

> Could you tell me the R/0 of BAYES_00, BAYES_01, BAYES_05, BAYES_10,
> BAYES_25, AWL (AWL < -1), RCVD_IN_BSP_TRUSTED, HABEAS_USER and
> SPF_CHECK (ham rules) on your big corpus?

You can look this up yourself. Follow the link to collated results from the hacking on SpamAssassin web page.

I don't mind answering questions, but I need to be clear: your MX rule is not going into SpamAssassin. This is my last comment on this bug.
>You can look this up yourself. Follow the link to collated results from
>the hacking on SpamAssassin web page.

Thank you for the interesting link!

>I don't mind answering questions, but I need to be clear, your MX rule
>is not going into SpamAssassin. This is my last comment on this bug.

Thank you for your time!

First, may I criticize your criteria? I think that if we criticize something, it will get better over time. Your criteria are very strict. All ham rules can be forged by spammers. All of them. AWL could be forged by probing which email accounts are active. BAYES rules can be forged by sending jokes or Japanese haiku. SPF and my rule can be forged by searching for the "right" emails. To satisfy your criteria you would have to delete all ham rules right now. But I think your strategy of rejecting ham rules is wrong. I think ham rules with an R/0 of more than 0.25 are good and can be useful. IMHO, if we had 100 uncorrelated ham rules with an R/0 ratio of 0.25 and a big hit rate, they would create a strong combined effect. This is a mathematical fact. I am not going to prove it. Trust me.

Now I will show that many rules from the stable version do not meet your criteria. For example, look at the ham BAYES rules for user "jm", who has ham = spam in his corpus:

R/0
44.692 1.0003 88.3846 0.011 0.82 0.00 BAYES_00:jm
0.138 0.1875 0.0875 0.682 0.44 0.00 BAYES_05:jm
0.213 0.2626 0.1625 0.618 0.42 0.00 BAYES_10:jm
0.225 0.4126 0.0375 0.917 0.36 0.00 BAYES_25:jm

There are no ham BAYES rules that satisfy your criterion R/0 < 0.001: 0.011 is more than 0.001; 0.682 is more than 0.001; 0.618 is more than 0.001; 0.917 is more than 0.001!

My rule has an R/0 of about 0.05, but you reject it.

SPF_CHECK:
0.700 0.6002 0.8002 0.429 0.38 -0.00 SPF_PASS:jm
0.429 is more than 0.001.

I didn't find AWL rules.

So... why do we use the ham BAYES rules? Are there people who did not say "this is my last comment"? Don't trust intuition! Trust mathematical formulas!
> First, may I criticize your criteria?

You just did.

> I think that if we criticize something, it will get better over time.

Only if you're right. (You aren't.)

> AWL could be forged by probing which email accounts are active.
> BAYES rules can be forged by sending jokes or Japanese haiku.
> SPF and my rule can be forged by searching for the "right" emails.

AWL and BAYES cannot be forged, since they depend on site-specific e-mail. I don't think you understand how the BAYES rules work. They cannot be forged - perhaps they could be for one specific user, but not on a wide-scale basis. I don't think you understand SPF either. (Or I don't understand your comment.)

> To satisfy your criteria you would have to delete all ham rules right now.

We have deleted all ham rules that are forgeable. We had ham rules in previous releases under less stringent criteria, and they were simply abused by spammers.

> But I think your strategy of rejecting ham rules is wrong.
> I think ham rules with an R/0 of more than 0.25 are good and can be useful.
> IMHO, if we had 100 uncorrelated ham rules with an R/0 ratio of 0.25 and a
> big hit rate, they would create a strong combined effect. This is a
> mathematical fact. I am not going to prove it. Trust me.

I don't know what you're talking about with R/O. But we have seen that any ham tests that can be abused by spammers will be abused by spammers. If we had 100 such ham rules, SpamAssassin would suck, as spammers would be able to get every message past it.

> For example, look at the ham BAYES rules for user "jm", who has ham = spam
> in his corpus.

BAYES tests CAN NOT BE FORGED BY SPAMMERS!

> My rule has an R/0 of about 0.05, but you reject it.

It is not as good a solution to the problem as SPF. It is not really a standard (or certainly not one that's gained any support).

> So... why do we use the ham BAYES rules?

BAYES tests CAN NOT BE FORGED BY SPAMMERS!

> Are there people who did not say "this is my last comment"?

Yes.

> Don't trust intuition! Trust mathematical formulas!
We need to balance mathematical formulas with intuition. Mathematical formulas can't model the way spammers react to our creation of easy-to-forge ham rules. (Or at least, if they can, we don't know those formulas...) As a result, such rules get lower-than-optimal scores, and we all get burned. Our scores are based on mathematics (have you looked at the perceptron?).

I think it's very clear from the responses you have received that this rule is not going into SpamAssassin. We encourage you to continue to contribute, but please don't continue to waste our time by arguing this point with us. I'm tempted to mark this bug CLOSED (since I don't think you can add comments to CLOSED bugs), but I'm not going to right now.
> AWL could be forged by probing which email accounts are active.
> BAYES rules can be forged by sending jokes or Japanese haiku.
> SPF and my rule can be forged by searching for the "right" emails.

I think the points about these could be made more explicit.

AWL can no longer be forged since we added the Received header information alongside the From address. A spammer not only has to use a valid From address that you receive ham from (assuming they can customize the spam to you that way), but the mail also has to come from the same server that your ham from that address comes from.

BAYES cannot be forged by sending jokes or haiku unless a large portion of your ham consists of similar jokes or haiku. Bayes looks for words that are common only in your ham or only in your spam. Random, infrequently seen words are not used. A spammer would need access to your personal Bayes database to be able to forge ham.

SPF is not a ham rule. A message can fail the SPF test, which detects spam, or it can pass, which scores -0.001 to allow it to be used in a meta rule, or the result can be unknown. I'm not sure what you mean by forging by 'searching for the "right" emails', but if you think that a spammer can just copy headers from good email, that's wrong. The test looks at "trusted relays" and does not believe the information from untrusted ones. So SPF_PASS cannot be forged. It is possible for a spammer to send spam through a server that is properly configured with SPF, and that mail would still be spam. That's why it is not a ham rule.
>AWL and BAYES can not be forged, since they are dependent on site-specific
>e-mail. I don't think you understand how BAYES rules work. They can not be
>forged, although they could perhaps for a specific user, they can not be forged
>on a widescale basis.

I don't think you understand how spammers work. They CAN forge the Bayes rules for the text body. From a mathematical point of view there is no difference between ham and spam rules. From a spammer's point of view there is no difference between forging ham rules and forging spam rules. Spam rules can be "forged" as easily as ham rules. Therefore we should have the same criteria for ham and spam rules: the same R/0 and hit-rate criteria.

BAYES, AWL and other rules can easily be forged in the following way: spammers hack many users' computers and harvest old emails that were sent correctly, through correct servers, with a correct "From". They change the dates in these emails, add spam text or a spam image, and send them. This forges SPF_CHECK, AWL, BAYES and my rule. Therefore any rule, spam or ham, can be forged this way. And therefore we should not reject rules with an R/0 between 0.01 and 0.24 or between 0.75 and 0.95. If we have many uncorrelated rules of this type, and they have a big hit rate, they can be effective when they work together. The Bayes rules work the same way: Bayes does not reject tokens with a probability between 0.05 and 0.95; all these tokens contribute, and that is why Bayes works. If we rejected tokens with probabilities between 0.05 and 0.95, the Bayes rules would lose their effect.

>We have deleted all ham rules that are forgeable. We had ham rules in
>previous releases under less stringent criteria, and they were simply abused by
>spammers.

Spammers abuse not only ham rules; they abuse many spam rules too. From a mathematical point of view there is no difference between ham and spam rules. If we have many spam rules with big scores, and spammers can abuse them, they will. The forging problem exists for spam rules too.

>It is not as good a solution to the problem as SPF. It is not really a standard
>(or certainly not one that's gained any support)

Yes, not as good, but it still has a measurable effect according to your own mathematical criteria. Maybe you can formalize the term "good solution" in mathematical terms?

>BAYES tests CAN NOT BE FORGED BY SPAMMERS!

According to your own statistics, BAYES_20 is already being forged by spammers! It has a score of -1.428 and a "bad" R/0 of about 0.60.

>Our scores are based on mathematics (have you looked at the perceptron)?

I don't trust the perceptron; I trust clean statistical mathematical methods.

>Mathematical formulas
>can't model the way spammers react to our creation of easy to forge ham rules.

If you are not a professional spammer, how do you know what is easy and what is not? I have my own criterion for "easy": the potential number of spammer servers that could forge the rule. Many, many spammer servers cannot forge my rule. Therefore my rule is not easy to forge and has the biggest hit rate (if we compare it with SPF).

>I think it's very clear from the response you have received that this rule is
>not going in to SpamAssassin. We encourage you to continue to contribute, but
>please don't continue to waste our time by arguing this point with us. I'm
>tempted to mark this bug CLOSED (since I don't think you can add comments to
>CLOSED bugs) but I'm not going to right now.

Maybe I am asking stupid questions that are already answered in a developers' FAQ? You can close this bug at any moment, but are you 100% sure you are right? Do we have a mathematical theory that says ham rules with R/0 < 0.001 must be rejected? Are there any links to mathematical articles in scientific journals? I waste your time, but you waste mine... How do you know that I am not right? Because I am alone with a marginal opinion? You trust your intuition, and intuition makes errors... You can shut me up at any moment; that is your right. Thank you for your time.
If we define Rating as log(HitRate*Ratio), where Ratio = Spam/Ham for spam rules and Ham/Spam for ham rules, then the top 20 is:

Rating Hitrate R/0 Spam Ham Name
7.639 32.854 63.3 65.6746 0.0375 URIBL_WS_SURBL:jm
7.588 44.692 44.185 1.0003 88.3846 BAYES_00:jm
7.358 28.007 56.014 56.0140 0.0000 BAYES_99:jm
7.292 38.700 37.952 76.3911 1.0128 RCVD_IN_XBL:jm
7.254 26.600 53.2 53.2008 0.0000 MIME_HTML_ONLY:jm
7.226 34.117 40.331 67.5628 0.6752 RCVD_IN_DSBL:jm
6.741 23.651 35.801 46.9926 0.3126 RCVD_IN_SORBS_DUL:jm
6.528 33.783 20.257 65.3413 2.2256 HTML_MESSAGE:jm
6.387 23.989 24.776 47.0802 0.9002 URIBL_SBL:jm
6.322 16.685 33.37 33.3708 0.0000 MIME_BOUND_DD_DIGITS:jm
6.274 17.804 29.828 35.4214 0.1875 MIME_HTML_NO_CHARSET:jm
6.128 15.148 30.298 30.2989 0.0000 DCC_CHECK:jm
5.805 12.884 25.768 25.7689 0.0000 MIME_HTML_ONLY_MULTI:jm
5.764 12.791 24.933 25.5564 0.0250 BIZ_TLD:jm
5.65 33.058 8.603 5.9890 60.1275 AWL:jm
5.434 27.500 8.333 5.0000 50.0000 MY_RULE:jm
5.352 12.085 17.473 23.8089 0.3626 RCVD_IN_BL_SPAMCOP_NET:jm
5.347 10.515 19.981 20.9802 0.0500 BAYES_50:jm
5.216 12.035 15.305 23.5338 0.5376 RCVD_IN_NJABL_PROXY:jm
5.111 9.109 18.217 18.2171 0.0000 MSGID_SPAM_CAPS:jm

My rule is in the top 20!!
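The table's numbers can be reproduced from its Spam and Ham columns. A small sketch in Python (the use of the natural log, HitRate as the mean of the two hit percentages, and the +1 in the ratio's denominator are inferred from the table rows themselves, e.g. those with 0.0000 hits; they are not stated explicitly in the thread):

```python
import math

def rule_rating(spam_pct, ham_pct, is_ham_rule):
    """Reproduce Rating = log(HitRate*Ratio) from the table above.

    HitRate is the mean of the spam and ham hit percentages; Ratio is
    favorable/(unfavorable + 1), where the +1 appears to guard against
    division by zero (inferred, since rows with 0.0000 ham hits still
    have a finite ratio).
    """
    hit_rate = (spam_pct + ham_pct) / 2
    if is_ham_rule:
        ratio = ham_pct / (spam_pct + 1)
    else:
        ratio = spam_pct / (ham_pct + 1)
    return math.log(hit_rate * ratio)  # natural log

# Check against two rows of the table:
# URIBL_WS_SURBL (spam rule): table rating 7.639
# BAYES_00 (ham rule):        table rating 7.588
assert abs(rule_rating(65.6746, 0.0375, False) - 7.639) < 0.01
assert abs(rule_rating(1.0003, 88.3846, True) - 7.588) < 0.01
```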
top100 rating:

Rating Hitrate R/0 Spam Ham Name
7.639 32.854 63.3 65.6746 0.0375 URIBL_WS_SURBL:jm
7.588 44.692 44.185 1.0003 88.3846 BAYES_00:jm
7.358 28.007 56.014 56.0140 0.0000 BAYES_99:jm
7.292 38.700 37.952 76.3911 1.0128 RCVD_IN_XBL:jm
7.254 26.600 53.2 53.2008 0.0000 MIME_HTML_ONLY:jm
7.226 34.117 40.331 67.5628 0.6752 RCVD_IN_DSBL:jm
6.741 23.651 35.801 46.9926 0.3126 RCVD_IN_SORBS_DUL:jm
6.528 33.783 20.257 65.3413 2.2256 HTML_MESSAGE:jm
6.387 23.989 24.776 47.0802 0.9002 URIBL_SBL:jm
6.322 16.685 33.37 33.3708 0.0000 MIME_BOUND_DD_DIGITS:jm
6.274 17.804 29.828 35.4214 0.1875 MIME_HTML_NO_CHARSET:jm
6.128 15.148 30.298 30.2989 0.0000 DCC_CHECK:jm
5.805 12.884 25.768 25.7689 0.0000 MIME_HTML_ONLY_MULTI:jm
5.764 12.791 24.933 25.5564 0.0250 BIZ_TLD:jm
5.65 33.058 8.603 5.9890 60.1275 AWL:jm
5.434 27.500 8.333 5.0000 50.0000 MY_RULE:jm
5.352 12.085 17.473 23.8089 0.3626 RCVD_IN_BL_SPAMCOP_NET:jm
5.347 10.515 19.981 20.9802 0.0500 BAYES_50:jm
5.216 12.035 15.305 23.5338 0.5376 RCVD_IN_NJABL_PROXY:jm
5.111 9.109 18.217 18.2171 0.0000 MSGID_SPAM_CAPS:jm
5.101 9.065 18.129 18.1295 0.0000 RCVD_DOUBLE_IP_SPAM:jm
5.089 9.127 17.784 18.2296 0.0250 MISSING_MIMEOLE:jm
5.058 9.602 16.38 19.0423 0.1625 RCVD_NUMERIC_HELO:jm
5.005 8.640 17.279 17.2793 0.0000 HELO_DYNAMIC_IPADDR:jm
4.975 12.447 11.629 23.8435 1.0503 RCVD_BY_IP:jm
4.93 8.321 16.641 16.6417 0.0000 LONGWORDS:jm
4.797 8.434 14.371 16.7063 0.1625 RCVD_IN_SORBS_HTTP:jm
4.689 7.377 14.753 14.7537 0.0000 X_MESSAGE_INFO:jm
4.592 7.027 14.053 14.0535 0.0000 FORGED_MUA_OUTLOOK:jm
4.567 6.939 13.878 13.8785 0.0000 DRUGS_ERECTILE:jm
4.496 9.015 9.952 17.2940 0.7377 RCVD_IN_SORBS_MISC:jm
4.475 6.670 13.163 13.3283 0.0125 RCVD_HELO_IP_MISMATCH:jm
4.257 8.778 8.05 16.5062 1.0503 RCVD_IN_RFCI:jm
4.148 5.626 11.252 11.2528 0.0000 HTML_BACKHAIR_8:jm
4.13 5.727 10.861 11.4043 0.0500 URIBL_SC_SURBL:jm
4.033 5.314 10.627 10.6277 0.0000 MSGID_RANDY:jm
3.947 5.089 10.177 10.1775 0.0000 UNIQUE_WORDS:jm
3.927 5.108 9.941 10.1900 0.0250 MISSING_OUTLOOK_NAME:jm
3.83 8.727 5.281 15.5164 1.9380 MSGID_FROM_MTA_ID:jm
3.822 5.039 9.071 9.9787 0.1000 URIBL_BE_SURBL:jm
3.788 5.464 8.087 10.6152 0.3126 HTML_60_70:jm
3.757 8.021 5.34 14.3536 1.6879 LINES_OF_YELLING:jm
3.548 4.170 8.339 8.3396 0.0000 DRUGS_ANXIETY:jm
3.539 4.151 8.302 8.3021 0.0000 DRUGS_ERECTILE_OBFU:jm
3.535 4.170 8.224 8.3271 0.0125 HTML_IMAGE_ONLY_08:jm
3.528 4.157 8.199 8.3021 0.0125 DRUGS_PAIN:jm
3.503 4.076 8.151 8.1520 0.0000 MSGID_DOLLARS:jm
3.481 22.843 1.423 18.2671 27.4194 FORGED_RCVD_HELO:jm
3.475 4.020 8.04 8.0405 0.0000 RCVD_IN_RSL:jm
3.45 4.551 6.923 8.8283 0.2751 DNS_FROM_RFCI_DSN:jm
3.433 4.614 6.718 8.9022 0.3251 HTML_50_60:jm
3.393 4.251 7.001 8.3146 0.1875 HTML_80_90:jm
3.33 3.738 7.476 7.4769 0.0000 DATE_SPAMWARE_Y2K:jm
3.256 4.689 5.536 8.7897 0.5876 MIME_QP_LONG_LINE:jm
3.252 3.595 7.189 7.1893 0.0000 RATWARE_ZERO_TZ:jm
3.217 3.532 7.065 7.0651 0.0000 RCVD_IN_NJABL_DIALUP:jm
3.195 3.495 6.989 6.9892 0.0000 HELO_DYNAMIC_HCC:jm
3.126 3.520 6.478 6.9642 0.0750 HTML_90_100:jm
3.111 3.351 6.701 6.7017 0.0000 HELO_DYNAMIC_DHCP:jm
3.103 3.338 6.676 6.6767 0.0000 URI_AFFILIATE:jm
3.04 3.395 6.162 6.7017 0.0875 PRIORITY_NO_NAME:jm
3.039 3.301 6.326 6.5641 0.0375 HTML_LINK_CLICK_HERE:jm
3.023 3.207 6.414 6.4141 0.0000 HELO_DYNAMIC_IPADDR2:jm
2.986 3.826 5.179 7.2518 0.4001 CLICK_BELOW:jm
2.98 3.138 6.277 6.2774 0.0000 SPF_HELO_FAIL:jm
2.967 3.570 5.446 6.8767 0.2626 HTML_40_50:jm
2.928 3.057 6.114 6.1140 0.0000 HTML_IMAGE_RATIO_02:jm
2.916 3.213 5.751 6.3266 0.1000 FORGED_YAHOO_RCVD:jm
2.844 3.476 4.944 6.6142 0.3376 HTML_70_80:jm
2.822 3.026 5.56 5.9772 0.0750 RCVD_IN_SBL:jm
2.742 2.907 5.338 5.7389 0.0750 URI_REDIRECTOR:jm
2.717 2.751 5.502 5.5021 0.0000 DNS_FROM_AHBL_RHSBL:jm
2.656 2.669 5.338 5.3388 0.0000 FORGED_OUTLOOK_TAGS:jm
2.604 2.601 5.201 5.2013 0.0000 MIME_BASE64_TEXT:jm
2.598 2.688 5.001 5.3138 0.0625 FORGED_HOTMAIL_RCVD2:jm
2.589 2.732 4.876 5.3638 0.1000 NORMAL_HTTP_TO_IP:jm
2.565 2.588 5.026 5.1519 0.0250 RCVD_IN_SORBS_SMTP:jm
2.561 2.926 4.426 5.5889 0.2626 HTML_30_40:jm
2.477 2.876 4.143 5.4389 0.3126 HTML_20_30:jm
2.462 2.738 4.286 5.2513 0.2251 MIME_BASE64_NO_NAME:jm
2.391 2.494 4.383 4.8762 0.1125 RCVD_ILLEGAL_IP:jm
2.388 2.507 4.345 4.8887 0.1250 HTML_10_20:jm
2.385 2.401 4.524 4.7512 0.0500 HTML_BADTAG_00_10:jm
2.381 2.326 4.651 4.6512 0.0000 MISSING_SUBJECT:jm
2.348 2.288 4.576 4.5761 0.0000 HTML_OBFUSCATE_20_30:jm
2.308 5.589 1.798 7.8270 3.3508 NO_REAL_NAME:jm
2.293 2.226 4.451 4.4511 0.0000 MIME_BASE64_BLANKS:jm
2.289 3.063 3.222 5.4389 0.6877 UNDISC_RECIPS:jm
2.276 2.713 3.589 5.0263 0.4001 INFO_TLD:jm
2.236 2.163 4.326 4.3261 0.0000 HTML_WEB_BUGS:jm
2.224 2.151 4.301 4.3011 0.0000 DOMAIN_RATIO:jm
2.202 2.176 4.157 4.3136 0.0375 MIME_QP_EXCESSIVE:jm
2.159 2.082 4.163 4.1635 0.0000 HTML_MIME_NO_HTML_TAG:jm
2.147 2.069 4.138 4.1385 0.0000 HTML_SHOUTING3:jm
2.129 2.051 4.101 4.1010 0.0000 DATE_IN_FUTURE_12_24:jm
2.051 1.988 3.914 3.9635 0.0125 RCVD_DOUBLE_IP_LOOSE:jm
1.984 1.907 3.813 3.8135 0.0000 BAYES_90:jm
1.97 1.894 3.788 3.7884 0.0000 OBFUSCATING_COMMENT:jm
1.957 1.882 3.763 3.7634 0.0000 MSGID_YAHOO_CAPS:jm
1.95 1.875 3.75 3.7509 0.0000 DRUGS_ANXIETY_EREC:jm
Created attachment 1978 [details]
Script that computes the Quality of Rules rating

We need a mathematical criterion for accepting and rejecting new rules. I propose the formula log(HitRate*Ratio).
Created attachment 1979 [details]
Checking sender IP against MX records of the From domain

A new ham rule that checks the sender IP against the MX records of the From domain. New: the rule now checks SPF records first; if they exist, the rule does not fire. This new rule should have a better R/0 ratio (better, I think, than AWL has), and I highly recommend including it in SA. Network administrators now have two possibilities: send mail only from servers that are listed in the MX records (many servers already behave this way) OR use the SPF mechanism. If SPF is on for the domain, this rule does NOT fire.
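The attachment itself is a SpamAssassin Perl plugin; as a language-neutral illustration, its decision logic could be sketched like this (Python, with hypothetical pre-resolved inputs, and assuming the ham rule fires when the sender IP matches one of the From domain's MX hosts, consistent with its reported hit pattern of ~50% of ham):

```python
def check_for_from_mx(spf_exists, sender_ip, mx_ips):
    """Illustrative sketch of the attached ham rule (not the Perl code).

    spf_exists -- True if the From domain publishes an SPF record
    sender_ip  -- IP of the untrusted relay that handed us the mail
    mx_ips     -- IPs obtained by resolving the From domain's MX records
    """
    if spf_exists:
        return False  # SPF takes precedence; let the SPF plugin decide
    # Ham indication: mail was sent from one of the domain's own MX hosts
    return sender_ip in set(mx_ips)

# No SPF and the sender IP matches an MX host -> the rule fires (ham hint);
# if SPF exists, the rule always stays quiet.
assert check_for_from_mx(False, "192.0.2.10", ["192.0.2.10", "192.0.2.11"])
assert not check_for_from_mx(True, "192.0.2.10", ["192.0.2.10"])
```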
Can anybody run the new patch through a test on a big corpus (where ham ~ spam)? Thank you.
testing out moving bugs off of RESOLVED/LATER status en masse
resolving bugs previously marked RESOLVED/LATER
>According
>ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt
>it is recomended to Register all sender MTAs within DNS as MX records
>with special priority (65535)

According to ftp://ftp.nordu.net/ietf-online-proceedings/05mar/proceedings/IDs/draft-stout-antispam-01.txt:

"
This Internet-Draft has been deleted.

Unrevised documents placed in the Internet-Drafts directories have a
maximum life of six months. After that time, they are deleted. This
Internet-Draft was not published as an RFC.

The name of the internet-draft was draft-stout-antispam-00.txt
...
"

so it has expired. Dead.

On the other hand, checking the MX records may be useful in zombie detection: if the client looks dynamic but is the MX of the sender domain, then reduce the score (if it was increased due to *DYN*, *DHCP*, ... rules). Of course, this too can be abused by spammers...
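The zombie-detection idea at the end could look roughly like this (a sketch with hypothetical inputs and a made-up penalty value; a real implementation would resolve the MX records via DNS and reuse SpamAssassin's own dynamic-IP heuristics):

```python
def adjust_dynamic_penalty(score, client_ip, looks_dynamic, sender_mx_ips,
                           dyn_penalty=2.0):
    """If a dynamic-looking client is actually the sender domain's MX,
    back out the *DYN*/*DHCP*-style penalty that was added earlier.

    dyn_penalty is an illustrative placeholder, not a real SA score.
    """
    if looks_dynamic and client_ip in set(sender_mx_ips):
        return score - dyn_penalty
    return score

# A dynamic-looking host that IS the sender domain's MX gets the
# penalty refunded; everyone else keeps their score unchanged.
assert adjust_dynamic_penalty(5.0, "198.51.100.7", True, ["198.51.100.7"]) == 3.0
assert adjust_dynamic_penalty(5.0, "203.0.113.1", True, ["198.51.100.7"]) == 5.0
```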