Bug 3417 - New Rule: Checking sender IP against MX records From: user@domain
Summary: New Rule: Checking sender IP against MX records From: user@domain
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules (Eval Tests)
Version: 2.63
Hardware: All All
Importance: P5 enhancement
Target Milestone: 3.1.0
Assignee: SpamAssassin Developer Mailing List
URL: ftp://ftp.rfc-editor.org/in-notes/int...
Whiteboard: resolved_later
Keywords:
Depends on:
Blocks:
 
Reported: 2004-05-21 07:03 UTC by Sergey Shmelev
Modified: 2008-11-04 23:17 UTC



Attachment Type Modified Status Actions Submitter/CLA Status
New Evaluation Test that perform sender IPs checking in list of MX From Domain patch None Sergey Shmelev [NoCLA]
New Evaluation Whitelisted Test that perform sender IPs checking in list of MX From Domain patch None Sergey Shmelev [NoCLA]
version that works in 3.0, streamlined a bit, etc. patch None Theo Van Dinter [HasCLA]
Script that compute Quality of Rules Rating text/plain None Sergey Shmelev [NoCLA]
Checking sender IP against MX records From patch None Sergey Shmelev [NoCLA]

Description Sergey Shmelev 2004-05-21 07:03:06 UTC
According to
ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt
it is recommended to register all sender MTAs in DNS as MX records with a
special priority (65535).

After this, the receiving MTA should check the sender IP address against the
list of valid mail servers obtained via DNS.

Is it possible to create a new rule for SpamAssassin that performs this check?
Comment 1 Justin Mason 2004-05-21 09:08:25 UTC
Subject: Re:  New: New Rule: Checking sender IP against MX records From: user@domain 


>According to
>ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt
>it is recommended to register all sender MTAs in DNS as MX records
>with a special priority (65535).
>
>After this, the receiving MTA should check the sender IP address against the
>list of valid mail servers obtained via DNS.
>
>Is it possible to create a new rule for SpamAssassin that performs this
>check?

In terms of a core rule, probably not.  This is very similar to what SPF
is intended to solve; we already have that support in 3.0.0, and SPF
has all the mindshare.

In terms of a third-party plugin, I can't see why not if someone else
implements that ;)

- --j.

Comment 2 Daniel Quinlan 2004-05-22 08:33:20 UTC
As Justin says, there's no mindshare behind this idea, so closing as WONTFIX.
Comment 3 Sergey Shmelev 2004-05-22 12:41:06 UTC
>In terms of a third-party plugin, I can't see why not if someone else
>implements that

I have looked at the code and found that everything I need exists in two subs:
check_for_from_dns and check_rbl_backend in EvalTests.pm.
I combined these two subs to create a new one that implements my rule.

I think this rule will be very effective, and I recommend including it in the
stable (2.6x) version of SpamAssassin.

Sorry for my bad English.

Here is the diff:

EvalTests.pm:
< sub check_for_from_mx {
<   my ($self) = @_;
<
<   my $from = $self->get ('Reply-To:addr');
<   if (!defined $from || $from !~ /\@\S+/) {
<     $from = $self->get ('From:addr');
<   }
<   return 0 unless ($from =~ /\@(\S+)/);
<   $from = $1;
<
<   # First check that DNS is available, if not do not perform this check
<   return 0 unless $self->is_dns_available();
<   $self->load_resolver();
<
<   if ($from eq 'compiling.spamassassin.taint.org') {
<     # only used when compiling
<     return 0;
<   }
<
<   if ($self->{conf}->{check_mx_attempts} < 1) {
<     return 0;
<   }
<
<
<   local ($_);
<
<   # First check that DNS is available, if not do not perform this check
<   return 0 if $self->{conf}->{skip_rbl_checks};
<   return 0 unless $self->is_dns_available();
<   $self->load_resolver();
<
<   # How many IPs max you check in the received lines
<   my $checklast=$self->{conf}->{num_check_received};
<
<   my @fullips = map { $_->{ip} } @{$self->{relays_untrusted}};
<
<   # Make sure a header significantly improves results before adding here
<   # X-Sender-Ip: could be worth using (very low occurance for me)
<   # X-Sender: has a very low bang-for-buck for me
<   my @originating;
<   for my $header ('X-Originating-IP', 'X-Apparently-From') {
<     my $str = $self->get($header);
<     next unless defined $str;
<     push (@originating, ($str =~ m/($IP_ADDRESS)/g));
<   }
<
<   return 0 unless (scalar @fullips + scalar @originating > 0);
<
<   # Let's go ahead and trim away all Reserved ips (KLC)
<   # also uniq the list and strip dups. (jm)
<   my @ips = ();
<   my %seen = ();
<   foreach my $ip (@fullips) {
<     next if (exists ($seen{$ip})); $seen{$ip} = 1;
<     if (!($ip =~ /${IP_IN_RESERVED_RANGE}/o)) { push(@ips, $ip); }
<   }
<
<   dbg("Got the following IPs: ".join(", ", @ips), "rbl", -3);
<
<   if (scalar @ips + scalar @originating > 0) {
<     # If name is foo-notfirsthop, check all addresses except for
<     # the originating one.  Suitable for use with dialup lists, like the PDL.
<     # note that if there's only 1 IP in the untrusted set, do NOT pop the
<     # list, since it'd remove that one, and a legit user is supposed to
<     # use their SMTP server (ie. have at least 1 more hop)!
<     if ($set =~ /-notfirsthop$/) {
<       if (scalar @ips > 1) { pop @ips; }
<     }
<     # If name is foo-firsttrusted, check only the Received header just
<     # after it enters our trusted networks; that's the only one we can
<     # trust the IP address from (since our relay added that header).
<     # And if name is foo-untrusted, check any untrusted IP address.
<     elsif ($set =~ /-(first|un)trusted$/) {
<       push(@ips, @originating);
<       if ($1 eq "first") {
<       @ips = ( $ips[0] );
<       }
<       else {
<       shift @ips;
<       }
<     }
<     else {
<       # create a new list to avoid undef errors
<       my @newips = ();
<       my $i; for ($i = 0; $i < $checklast; $i++) {
<       my $ip = pop @ips; last unless defined($ip);
<       push (@newips, $ip);
<       }
<       # add originating IPs as untrusted IPs
<       for my $ip (@originating) {
<       next if (exists ($seen{$ip})); $seen{$ip} = 1;
<       if (!($ip =~ /${IP_IN_RESERVED_RANGE}/o)) { push(@newips, $ip); }
<       }
<       @ips = @newips;
<     }
<   }
<   dbg("But only inspecting the following IPs: ".join(", ", @ips), "rbl", -3);
<
<
<   # Try check_mx_attempts times to protect against temporary outages.
<   # sleep between checks to give the DNS a chance to recover.
<   my @mxips = ();
<   for my $i (1..$self->{conf}->{check_mx_attempts}) {
<     my @mx = Net::DNS::mx($self->{res}, $from);
<     dbg ("DNS MX records found: " . scalar(@mx));
<
<     dbg("DNS MX records: ".join(", ", @mx), "mx", -3);
<
< #   return 0 if (scalar @mx > 0);
<     foreach my $mx (@mx) {
<       my $query = $self->{res}->search($mx);
<       if ($query) {
<                       my $count = 0;
<                       foreach my $rr ($query->answer) {
<                       if ($rr->type eq "A") {
<                               $count++;
<                               push (@mxips, $rr->rdatastr);
<                       }
<                       }
<                       dbg ("DNS A records found: $count");
<       }
<     }
<     if ($i < $self->{conf}->{check_mx_attempts}) { sleep $self->{conf}->{check_mx_delay}; }
<   }
<   foreach my $ip (@ips) {
<       my $flag = 1;
<       foreach my $mxip (@mxips) {
<               $flag = 0 if ($ip eq $mxip);
<       }
<       return 1 if ($flag eq 1);
<   }
<
<   return 0;
< }

20_head_tests.cf:
header NO_MX_FOR_FROM           eval:check_for_from_mx()
describe NO_MX_FOR_FROM         Sender IP should be in the list of MX records of the domain in the From header
tflags NO_MX_FOR_FROM           net

50_scores.cf:
score NO_MX_FOR_FROM 0 1.105 0 1.650



Where can I read something about "SPF"?
Comment 4 Sergey Shmelev 2004-05-23 01:52:48 UTC
New version of the sub:

sub check_for_from_mx {
  my ($self) = @_;

  my $from = $self->get ('Reply-To:addr');
  if (!defined $from || $from !~ /\@\S+/) {
    $from = $self->get ('From:addr');
  }
  return 0 unless ($from =~ /\@(\S+)/);
  $from = $1;

  # First check that DNS is available, if not do not perform this check
  return 0 unless $self->is_dns_available();
  $self->load_resolver();

  if ($from eq 'compiling.spamassassin.taint.org') {
    # only used when compiling
    return 0;
  }

  if ($self->{conf}->{check_mx_attempts} < 1) {
    return 0;
  }

  local ($_);

  # First check that DNS is available, if not do not perform this check
  return 0 if $self->{conf}->{skip_rbl_checks};
  return 0 unless $self->is_dns_available();
  $self->load_resolver();

  # How many IPs max you check in the received lines
  my $checklast=$self->{conf}->{num_check_received};

  my @fullips = map { $_->{ip} } @{$self->{relays_untrusted}};

  # Make sure a header significantly improves results before adding here
  # X-Sender-Ip: could be worth using (very low occurance for me)
  # X-Sender: has a very low bang-for-buck for me
  my @originating;
  for my $header ('X-Originating-IP', 'X-Apparently-From') {
    my $str = $self->get($header);
    next unless defined $str;
    push (@originating, ($str =~ m/($IP_ADDRESS)/g));
  }
  return 0 unless (scalar @fullips + scalar @originating > 0);

  # Let's go ahead and trim away all Reserved ips (KLC)
  # also uniq the list and strip dups. (jm)
  my @ips = ();
  my %seen = ();
  foreach my $ip (@fullips) {
    next if (exists ($seen{$ip})); $seen{$ip} = 1;
    if (!($ip =~ /${IP_IN_RESERVED_RANGE}/o)) { push(@ips, $ip); }
  }

  dbg("From_MX: Got the following IPs: ".join(", ", @ips));

  if (scalar @ips + scalar @originating > 0) {
      # create a new list to avoid undef errors
      my @newips = ();
      my $i; for ($i = 0; $i < $checklast; $i++) {
        my $ip = pop @ips; last unless defined($ip);
        push (@newips, $ip);
      }
      # add originating IPs as untrusted IPs
      for my $ip (@originating) {
        next if (exists ($seen{$ip})); $seen{$ip} = 1;
        if (!($ip =~ /${IP_IN_RESERVED_RANGE}/o)) { push(@newips, $ip); }
      }
      @ips = @newips;
  }
  dbg("From_MX: But only inspecting the following IPs: " . join(", ", @ips));


  # Try check_mx_attempts times to protect against temporary outages.
  # sleep between checks to give the DNS a chance to recover.
  my @mxips = ();
  for my $i (1..$self->{conf}->{check_mx_attempts}) {
    my @mx = Net::DNS::mx($self->{res}, $from);
    dbg ("From_MX: DNS MX records found: " . scalar(@mx));

    foreach my $mx (@mx) {
        my $query = $self->{res}->search($mx->exchange);
        if ($query) {
                my $count = 0;
                foreach my $rr ($query->answer) {
                        if ($rr->type eq "A") {
                                $count++;
                                push (@mxips, $rr->rdatastr);
                        }
                }
                dbg ("From_MX: DNS A records found: $count");
        }
    }
    if ($i < $self->{conf}->{check_mx_attempts}) { sleep $self->{conf}->{check_mx_delay}; }
  }
  dbg("From_MX: DNS MX A records: " . join(", ", @mxips)) if (scalar @mxips > 0);
  foreach my $ip (@ips) {
        my $flag = 1;
        foreach my $mxip (@mxips) {
                $flag = 0 if ($ip eq $mxip);
        }
        return 1 if ($flag eq 1);
  }

  return 0;
}
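To make the control flow easier to follow, here is a minimal sketch of the sub's core check rewritten in Python, with the DNS lookups stubbed out as dicts (the domain, MX hosts, and addresses below are hypothetical; the real sub uses Net::DNS through SpamAssassin's resolver):

```python
# Sketch of the check_for_from_mx logic: the rule fires (returns 1) when
# some untrusted relay IP is NOT among the A records of the From domain's
# MX hosts.  DNS is stubbed out with dicts for illustration.

MX_RECORDS = {"example.org": ["mx1.example.org", "mx2.example.org"]}
A_RECORDS = {"mx1.example.org": ["192.0.2.10"],
             "mx2.example.org": ["192.0.2.11"]}

def check_for_from_mx(from_domain, relay_ips):
    # Collect every A record of every MX host for the domain.
    mx_ips = set()
    for mx_host in MX_RECORDS.get(from_domain, []):
        mx_ips.update(A_RECORDS.get(mx_host, []))
    # Hit as soon as any relay IP falls outside that set.
    return int(any(ip not in mx_ips for ip in relay_ips))
```

With these tables, a relay at 192.0.2.10 does not hit the rule, while any relay not listed as an MX address (or any domain with no MX data at all) does.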
Comment 5 Malte S. Stretz 2004-05-23 08:29:08 UTC
I'm not sure what that patch is about, but please recreate it in unified 
format (diff -u) and create an attachment (don't copy and paste). 
Comment 6 Sergey Shmelev 2004-05-24 03:00:36 UTC
>I'm not sure what that patch is about, but please recreate it in unified 
>format (diff -u) and create an attachment (don't copy and paste). 

OK, I will submit a new patch (diff -u) within a few days, after testing.
Comment 7 Sergey Shmelev 2004-05-24 23:45:34 UTC
Created attachment 1970 [details]
New Evaluation Test that perform sender IPs checking in list of MX From Domain

According to
ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt
it is recommended to register all sender MTAs in DNS as MX records.

This patch creates a new evaluation test that performs this check.

About 98% of spam is flagged by this test.
About 50% of ham is not flagged by this test.
Comment 8 Theo Van Dinter 2004-05-25 07:30:26 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

On Mon, May 24, 2004 at 11:45:35PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> According to
> ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt
> it is recommended to register all sender MTAs in DNS as MX records

Wow.  That's such an ugly hack.

IMO, just use SPF.  It's more "correct" than the above draft, and it
doesn't overload an RR (MX for instance) to mean more than it's supposed
to mean.

> About 98% of spam is flagged by this test
> About 50% of ham is not flagged by this test

So in short, the rule sucks...?

Comment 9 Sergey Shmelev 2004-05-25 09:18:28 UTC
>IMO, just use SPF.  It's more "correct" than the above draft, and it
>doesn't overload an RR (MX for instance) to mean more than it's supposed
>to mean.


Hmmm, where can I read about SPF?
I think the DNS requests will be cached effectively...

>> About 98% of spam is flagged by this test
>> About 50% of ham is not flagged by this test

>So in short, the rule sucks...?

No! This rule is like the AWL, like a good whitelist: it is very effective. Maybe
we should invert the flag and assign a score < 0 (-1.6, for example).

Can anybody run it over a big corpus of ham and spam?


I think this rule will be among the top 20 best rules.
Comment 10 Theo Van Dinter 2004-05-25 09:44:51 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

On Tue, May 25, 2004 at 09:18:29AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Hmmm, where can I read about SPF?

http://spf.pobox.com/

> I think the DNS requests will be cached effectively...

That's not my issue.  "MX" is meant to specify who receives mail for a
given domain/host/etc.  It's now suggested to overload MX to also mean
who can send mail for a given domain/host/etc.

Also, if I have several machines, all of which are allowed to send
mail for my domain, I have to add one more record per domain.  Not only
is that asking for errors, it's also making the DNS response larger --
which could mean that it'll have to go TCP instead of UDP, which has a
huge impact on DNS performance.  With SPF, I only need to add 1 record
which allows them all.

> >> About 98% of spam is flagged by this test
> >> About 50% of ham is not flagged by this test
> 
> No! This rule is like the AWL, like a good whitelist: it is very effective. Maybe
> we should invert the flag and assign a score < 0 (-1.6, for example).

I must be missing something...   If the rule hits on 98% of spam, but
also 50% of ham, the rule sucks.  It'll have an S/O ratio of somewhere
in the 0.6-0.7 range (assuming equal amounts of ham/spam).
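The arithmetic behind that S/O estimate, assuming equal amounts of ham and spam as stated (just a sanity check, not SpamAssassin code):

```python
# S/O = spam hits / (spam hits + ham hits); with equal-sized corpora the
# claimed per-class hit rates can stand in for the raw counts.
spam_rate = 0.98  # rule fires on ~98% of spam
ham_rate = 0.50   # rule fires on ~50% of ham
print(round(spam_rate / (spam_rate + ham_rate), 2))  # 0.66
```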

> I think this rule will be in top20 best rules.

This isn't an anti-spam rule, it's an anti-joejobbing rule, so it's not
going to be in the top 20 anti-spam rules.

For instance, this rule would allow "bugzilla.spamassassin.org" to
send mail as "kluge.net" (it's a backup MX for my domain), whereas I
would consider that a forgery since it's not an outgoing mail server
for my domain.  With SPF I can clearly specify who I allow sending mail
using my domain.  The reverse is also true (kluge.net would be allowed
to send as bugzilla.spamassassin.org...)

Comment 11 Justin Mason 2004-05-25 10:32:22 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain 


Yes.  Please read about SPF; it's a much cleaner system than this
(and it also uses DNS as transport, FWIW).

Also, there's no way we could use a rule for catching spam if it hits
50% of ham... that's way too false-positive-prone.

- --j.

Comment 12 Daniel Quinlan 2004-05-25 11:07:31 UTC
Closing as LATER.  We'll implement this if it picks up steam and seems to be
what the world is going to gather around as a standard for sender verification.

I think SPF or a follow-on to SPF is much more likely to take hold.
Comment 13 Daniel Quinlan 2004-05-25 11:08:40 UTC
I should add, thanks a lot for contributing the bug and the code.  I hope
this hasn't deterred you from making future contributions.  :-)
Comment 14 Kelsey Cummings 2004-05-25 11:44:35 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

FWIW, I had to add MX records for all of my outbound mail hosts to help
assure delivery.  I think it's a pretty silly idea, but I have had to
support it on my end or risk getting email rejected.  (I have asymmetric
mail flow, with clusters for inbound, outbound, local, etc.)

Comment 15 Sergey Shmelev 2004-05-25 23:59:48 UTC
>On Tue, May 25, 2004 at 09:18:29AM -0700,
>bugzilla-daemon@bugzilla.spamassassin.org wrote:
>> Hmmm, where can I read about SPF?

>http://spf.pobox.com/

Thanks. It will take some years before this technology reaches the masses.
My rule works now.

>> I think the DNS requests will be cached effectively...

>That's not my issue.  "MX" is meant to specify who receives mail for a
>given domain/host/etc.  It's now suggested to overload MX to also mean
>who can send mail for a given domain/host/etc.

Yes. Like many other rules: RCVD_IN_*, DNS_FROM_RFCI_DSN, NO_DNS_FOR_FROM.

>Also, if I have several machines, all of which are allowed to send
>mail for my domain, I have to add one more record per domain.  Not only
>is that asking for errors, it's also making the DNS response larger --
>which could mean that it'll have to go TCP instead of UDP, which has a
>huge impact on DNS performance.  With SPF, I only need to add 1 record
>which allows them all.

The SPF rules are less effective now than my rule NO_MX_FOR_FROM.

There should be only one criterion for rejecting or accepting new rules: the
effectiveness of the rule.

>> >> About 98% of spam is flagged by this test
>> >> About 50% of ham is not flagged by this test
>> 
>> No! This rule is like the AWL, like a good whitelist: it is very effective. Maybe
>> we should invert the flag and assign a score < 0 (-1.6, for example).

>I must be missing something...   If the rule hits on 98% of spam, but
>also 50% of ham, the rule sucks.  It'll have an S/O ratio of somewhere
>in the 0.6-0.7 range (assuming equal amounts of ham/spam).

According to your logic, the AWL sucks too.

The AWL creates scores between, for example, -4 and +2
with parameters:
auto_learn_threshold_nonspam 1
auto_learn_threshold_spam 13.0
required_hits 7


That is equivalent to a rewritten AWL (+4 to all):
the AWL would create scores between 0 and +6
with parameters:
auto_learn_threshold_nonspam 5
auto_learn_threshold_spam 18.0
required_hits 11


The AWL generates +4 for many, many hams. That rule sucks!

That is your logic.
Then why use the AWL? Why is the AWL implemented in SA?
The AWL sucks more than my rule.


>> I think this rule will be in top20 best rules.

>This isn't an anti-spam rule, it's an anti-joejobbing rule, so it's not
>going to be in the top 20 anti-spam rules.

Check it!

>For instance, this rule would allow "bugzilla.spamassassin.org" to
>send mail as "kluge.net" (it's a backup MX for my domain), whereas I
>would consider that a forgery since it's not an outgoing mail server
>for my domain.  With SPF I can clearly specify who I allow sending mail
>using my domain.  The reverse is also true (kluge.net would be allowed
>to send as bugzilla.spamassassin.org...)


My rule NO_MX_FOR_FROM does not depend on the new internet draft for its effect.

NO_MX_FOR_FROM works and is effective _now_ because most email senders
use only one email account, that account exists on a server with MX records, and
they send their emails through that server.
Comment 16 Sergey Shmelev 2004-05-26 00:09:52 UTC
>Yes.  Please read about SPF, it's a much cleaner system than this,
>(and also uses DNS as transport FWIW).

SPF needs to be implemented on many internet servers first.


>Also there's no way we could use a rule for catching spam if it hits
>50% of ham... that's way too false-positive-prone.

I think this rule is more effective than the AWL (which is false-positive-prone),
DNS_FROM_RFCI_DSN, and NO_DNS_FOR_FROM.

I think effectiveness (points after learning) should be the main criterion for
rejecting rules.

I am looking forward to somebody checking the effectiveness of this rule.
Comment 17 Sergey Shmelev 2004-05-26 00:17:25 UTC
>We'll implement this if it picks up steam and seems to be
>what the world is going to gather around as a standard for sender verification.


>I think SPF or a follow-on to SPF is much more likely to take hold.

Are you sure that SPF is going to become the standard for sender verification?

NO_MX_FOR_FROM is effective now, and more effective than SPF as currently
deployed, because many servers do not publish SPF but still check out OK via
NO_MX_FOR_FROM.

NO_MX_FOR_FROM is a heuristic that works now.

If this becomes a standard, I will reject emails with MTA header checks.
Comment 18 Sergey Shmelev 2004-05-26 00:22:33 UTC
According to your logic, the AWL sucks too.

The AWL creates scores between, for example, -4 and +2
with parameters:
auto_learn_threshold_nonspam 1
auto_learn_threshold_spam 13.0
required_hits 7


That is equivalent to a rewritten AWL (+4 to all):
the AWL would create scores between 0 and +6
with parameters:
auto_learn_threshold_nonspam 5
auto_learn_threshold_spam 17.0
required_hits 11


The AWL generates +4 for many, many hams. That rule sucks!

That is your logic.
Then why use the AWL? Why is the AWL implemented in SA?
The AWL sucks more than my rule.
----------------------

I think neither the AWL nor my rule sucks.

These are whitelist rules, like the 0.001 scores in the Bayes rules.
They are different, but they are very effective.


Comment 19 Sergey Shmelev 2004-05-26 01:08:52 UTC
Created attachment 1972 [details]
New Evaluation Whitelisted Test that perform sender IPs checking in list of MX From Domain

New Evaluation Test that perform sender IPs checking in list of MX From Domain

This patch creates a new whitelist test that performs this check.

score RCVD_IN_MX_FROM_LIST 0 -2.105 0 -2.650

The effective score needs to be recomputed on a big corpus / many servers.
Comment 20 Theo Van Dinter 2004-05-26 09:31:04 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

On Tue, May 25, 2004 at 11:59:49PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> >That's not my issue.  "MX" is meant to specify who receives mail for a
> >given domain/host/etc.  It's now suggested to overload MX to also mean
> >who can send mail for a given domain/host/etc.
> 
> Yes. Like many other rules: RCVD_IN_*, DNS_FROM_RFCI_DSN, NO_DNS_FOR_FROM.

None of those overloads a DNS RR type.

RCVD_IN_* and DNS_FROM_RFCI_DSN both request A records, which are for
doing name->IP lookups.  Via the rule, for a name request you get back
an IP.  No overloading (what you do with the IP is up to you).

NO_DNS_FOR_FROM looks for either an A or MX record to exist, regardless
of the data therein.  So no overloading.

The draft/your rule overloads "MX", which is meant to specify "when
sending mail to this host/domain, try delivering mail to a specific list
of mail exchangers using a specific set of priorities".  The overload is
that "MX" is now also supposed to specify which servers can send mail
as the host/domain.  A "correct" solution would be to make a new RR,
say "MS", which you could then use to specify senders, since they are quite
commonly different from the hosts that receive mail.  This is what SPF
does, right now using TXT records, but there is talk of creating a
new RR for this purpose.
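For comparison, the SPF approach described here needs only one TXT record per domain, however many hosts send. The zone snippet below is illustrative only; the domain and address range are made up:

```
example.org.  3600  IN  TXT  "v=spf1 ip4:192.0.2.0/28 mx -all"
```

One ip4: range (plus the mx mechanism) covers every permitted sender, instead of one extra high-priority MX record per outbound host.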

Comment 21 Justin Mason 2004-05-26 09:58:37 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain 


>The draft/your rule overloads "MX", which is meant to specify "when
>sending mail to this host/domain, try delivering mail to a specific list
>of mail exchangers using a specific set of priorities".  The overload is that
>"MX" is now also supposed to specify which servers can send mail as the
>host/domain.  A "correct" solution would be to make a new RR, say "MS",
>which you could then use to specify senders, since they are quite commonly
>different from the hosts that receive mail.  This is what SPF does, right
>now using TXT records, but there is talk of creating a new RR for this
>purpose.

One key problem with this is that most (note: not some, MOST) large ISPs
use separate servers for incoming and outgoing SMTP traffic.   So in
other words this test will fire on everything from:

    yahoo.com
    gmail.com
    msn.com
    aol.com

etc. etc.

- --j.

Comment 22 Sergey Shmelev 2004-05-26 10:04:15 UTC
>A "correct" solution would be to make a new RR,
>say "MS" which you can then use to specify senders since they are quite
>commonly different than the hosts that receive mail.  This is what SPF
>does, right now using TXT records, but there is talk about creating a
>new RR for this purpose.

Why is a special priority of 65535 for these MX records not a "correct" solution?

Which rules should be included in SpamAssassin: those that represent the
"correct" solution, or those that are effective on current traffic?
Comment 23 Sergey Shmelev 2004-05-26 10:09:49 UTC
>One key problem with this is that most (note: not some, MOST) large ISPs
>use separate servers for incoming and output SMTP traffic.   So in
>other words this test will fire on everything from:

No, it is now a _whitelist_ rule. (I have submitted a new patch.)

The score on your servers will be zero.


Comment 24 Theo Van Dinter 2004-05-26 12:32:55 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

On Wed, May 26, 2004 at 12:09:53AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> I am looking forward to somebody checking the effectiveness of this rule.

Just so we can clear this up, I started running this on my past 2 days'
worth of mail.   The original code 1) doesn't work directly with 3.0,
and after fixing that, 2) got 0 hits for me after ~3k mails.  So I
went through and streamlined it a bit, fixed a few logic issues, etc.
It's now getting some hits, though not a lot.  I'll send up a patch for
the EvalTest shortly.

Thing to note: The IETF draft is meant to be an anti-forging system,
but the rule is meant to be a net-based whitelist -- basically: don't
forge, get negative points.  They're distinctly different ideas.
So let's ignore the draft and the MX overload issues.

After 1000 mails, I stopped, since I only had 172 ham in the past 2 days
(27.5k spam btw).  I didn't want the results to get too skewed.

92 ham hits, 15 spam hits.  The ~50% ham hit rate is nice, but since there
are spam hits, there's no way you would want to use this as a whitelist.
More on that below.

The rule, as proposed, looks through all the untrusted received headers,
and uses the header From domain, so it's trivial for a spammer to
put in any matching forged From and Received header and hit the rule.
Definitely not a good whitelist.

If you limit the rule to the first untrusted received header only,
it becomes harder to forge, but makes the results even worse as well:
29 ham hits, 15 spam hits.

Compare this to SPF PASS results over the same 1000 mails:  15 ham hits, 0
spam hits.  SPF FAIL results: 0 ham, 29 spam.  SPF requires much fewer DNS
requests and is therefore significantly faster than the MX version, BTW.
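For readers tallying along, the per-class hit rates implied by those counts (1000 mails, 172 ham, hence 828 spam; simple arithmetic, not part of the patch):

```python
# Hit rates for the "all untrusted headers" variant from the 1000-mail run.
total_ham, total_spam = 172, 1000 - 172
ham_hits, spam_hits = 92, 15
print(round(100 * ham_hits / total_ham, 1))    # ham hit rate, percent (~53.5)
print(round(100 * spam_hits / total_spam, 1))  # spam hit rate, percent (~1.8)
```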

In the end, we get back to the fact that anti-forging and anti-spam are
2 distinctly different ideas.  The MX rule (as well as SPF PASS rules)
don't indicate anything about spaminess.  All they indicate is that the
sender address isn't forged.  This means you still have to scan the
message, and spammers can simply go back to not forging the address
(which is the goal of such mechanisms).  Therefore, having it as a
whitelist is just incentive for the spammers to stop forging (good),
which will then help them bypass the spam filters (bad).  This is why
things like SPF_PASS are forced to a score of -0.001.  It's good as an
indicator for other rules/used in meta rules, but not as a whitelist.

What you really want to do is flag when you know forging occurs -- give
a penalty if an address is forged.  You definitely can't do that right
now with the MX version or the proposed rule -- the vast majority of MX
records indicate the receiver _only_ (which is what MX is meant to do).
Therefore, you can't have a rule that says "add 4 points if the sender IP
isn't a MX for the domain".  It would flag so much ham as to be useless.

With SPF however, the owner of the domain can specify how they want
others to treat mail from their domain: there's no overloading, there's
no guessing, and it works right now.

Comment 25 Theo Van Dinter 2004-05-26 12:36:20 UTC
Created attachment 1974 [details]
version that works in 3.0, streamlined a bit, etc.
Comment 26 Theo Van Dinter 2004-05-26 12:39:14 UTC
I should note that attachment 1974 is the "first untrusted" version.  To have it try all the untrusted headers, 
change:

  my $checklast = @originating+1;

so that the 1 becomes $self->{conf}->{check_mx_attempts} ...
Comment 27 Sergey Shmelev 2004-05-26 14:54:14 UTC
Thank you for the patch, Theo!

>2) got 0 hits for me after ~3k mails. 

That is strange, because bugzilla@spamassassin hits this rule :)
Didn't you receive messages from it among the first 3k emails?

Let's define the "effectiveness of a tested rule", or the "strength of a rule"
(I propose this as the criterion for rejecting or accepting new rules).

The first and main criterion is the ham/spam ratio for whitelist rules
and the spam/ham ratio for blacklist rules.

The second criterion is "popularity" or "width" (Wide): ham hits / total hams
for whitelist rules, and spam hits / total spams for blacklist rules.

For good accuracy, all counts must be > 200.

There are rules that have the biggest ratio (big scores) but fire seldom.
There are rules that have a small ratio (small scores) but fire on almost
every message, and there are many rules of this type.

The total "rule strength" I define as the product Ratio*Wide.

Let's look at my rule in your corpus:
Ratio - 92/15 = 6; very good! But 15 < 200 (low level of accuracy).
Wide  - 92/172 = 0.5; very good, but 92 < 200.

SPF PASS:
Ratio - 15/0 = undefined; 15 < 200 and 0 < 200 (low level of accuracy).
Wide  - 15/172 = 0.09; not bad, but 15 < 200 and 172 < 200.

Total strength of my rule in your corpus = 6*0.5 = 3.
Total strength of SPF: undefined.
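As a quick sanity check, the same figures computed without rounding, using the counts quoted above from Theo's corpus (this is just arithmetic, not SpamAssassin code):

```python
# "Strength" = Ratio * Wide for a whitelist rule: 92 ham hits,
# 15 spam hits, 172 total ham.
ham_hits, spam_hits, total_ham = 92, 15, 172
ratio = ham_hits / spam_hits   # ham/spam ratio, ~6.13 (rounded to 6 above)
wide = ham_hits / total_ham    # fraction of all ham hit, ~0.53
print(round(ratio * wide, 1))  # 3.3 -- close to the 6*0.5 = 3 quoted above
```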

I think SPF is a very good rule, but it is not wide now.
Let's check our whitelist rules against BAYES_00 and BAYES_01!

On my server in Russia I think I will get 0 hams and 0 spams on SPF PASS.
Zero divided by zero = ?

If SPF becomes popular, Wide will rise, virus-spammers will send mail correctly
(you are right!), and Ratio will fall.

And the total Wide*Ratio will stay about constant!

But if we have many whitelist rules (as many as there are blacklist rules now),
the total effect from all of them will be very strong.

Whitelist and blacklist rules with the same strength Wide*Ratio have an equal
ability to divide mail into spam and ham.

I think the SpamAssassin developers should use "Wide*Ratio" as the main
criterion for accepting or rejecting new rules (and for removing old rules).
And here there is no difference between whitelist and blacklist rules.

I remain of the opinion that if we create a "Wide*Ratio" rating for whitelist
and blacklist rules, my rule will be in the top 20.

Comment 28 Theo Van Dinter 2004-05-26 16:18:20 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

On Wed, May 26, 2004 at 02:54:15PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> >2) got 0 hits for me after ~3k mails. 
> 
> That's strange, because mail from bugzilla@spamassassin hits this rule :)
> Didn't you receive any messages from us in the first 3k emails?

I don't put spam-related mails into my corpus or filter them through
SpamAssassin in general.

> I remain in opinion, that if we will create "Wide*Ratio" rating for whitelist
> and blacklist rules, my rule will be in top20.

I think you missed the point of my previous mail.  So I'm not going to
post any more in the ticket after this mail.

Your proposed rule is trivially forgeable, meaning it will absolutely not
be considered for general use, even if it was at 100% ham right now.
Those types of rules are targeted by spammers and become useless as
time goes on.  This is why we have very strict requirements for "nice"
rules now.

The "first untrusted only" version is harder to forge, but there is still a
relatively high percentage of spam hits (~1.8% in my test).

The reason for this is that your rule deals with "forging", not spam
detection.  It is perfectly valid for spam to hit this rule -- in fact
that's the end goal of such rules, to stop address forgery.  Therefore it is
entirely inappropriate as a whitelist in and of itself.

Comment 29 Sergey Shmelev 2004-05-26 22:45:48 UTC
>I think you missed the point of my previous mail.  So I'm not going to
>post any more in the ticket after this mail.

>The rule, as proposed, looks through all the untrusted received headers,
>and uses the header From domain, so it's trivial for a spammer to
>put in any matching forged From and Received header and hit the rule.
>Definitely not a good whitelist.

Thanks for your time, and I am sorry.

You are right about my first patch.
In my second patch I check all the IPs in the header:
every IP (not just one of them) must be in the MX list.

Let's talk about "forging" the From address.

Spammers use hacked servers, many of which do not normally send or receive
mail. No domain lists these servers in its MX records.

So it is impossible to forge a matching From for these servers.

Spammers and viruses would have to build a map of "sender IP" -> "matching
From" in order to defeat the rule.

That is a very hard task, and impossible for many spammer servers.
The rule's spam percentage may rise, but it will stay at a low level.

The potential for forging this rule is low.

If you use AWL, BAYES_00, RCVD_IN_BSP_TRUSTED and plan to use SPF_CHECK,
tell me by what criterion those rules are better than mine?
Do they have a lower level of "forgeability"? Are they harder to forge?

How do you compute "the level of potential forging"?

Let's define a new criterion for accepting new whitelist rules:

Forging*Wide*Ratio

If we cannot define "Forging", let's use the current Wide*Ratio.

   
Comment 30 Sergey Shmelev 2004-05-27 10:50:37 UTC
I think we should first check the SPF TXT records for the From domain.

If SPF exists, we skip this rule.
If SPF does not exist, we check the MX records.

The rule fires only if there is no SPF record _AND_ the sender IP is not in
the list of the From domain's MX records.

Over time the number of SPF-enabled domains will rise, and this rule will
fire less and less often.



Comment 31 Sergey Shmelev 2004-05-27 11:18:05 UTC
>I should add, thanks a lot for contributing the bug and the code.  I hope
>this hasn't deterred you from making future contributions.  :-)

Possibly has. :)

The problem is that I don't understand your criteria for accepting new rules.

:(

There should be a mathematical criterion, a statistical formula; then
contributors like me could check new rules and submit them after
statistical verification.

It could, for example, be Wide*Ratio as I defined it.
Or Wide*Ratio*PotentialForging.
Or Wide*Ratio*PotentialForging*Correlation.

Maybe something else...
But it must be a formula, not words like "the rule should not suck".

Right now I don't understand why you accept some rules and reject others...

Comment 32 Daniel Quinlan 2004-05-27 12:16:48 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

> Right now I don't understand why you accept some rules and reject others...

Whitelist rules cannot be easily defeated by spammers - yours can be.
Whitelist rules need to have an S/O ratio below 0.1% - yours does not.
Whitelist rules need to hit at least 0.1% of ham - yours does.

1 out of 3 -> very bad
2 out of 3 -> not good
3 out of 3 -> likely we'll use unless it's very expensive, violates
              a policy, or overlaps too much with another rule

Blacklist rules are harder to quantify since some ham hits are okay and
defeatability is not as big of an issue.  Roughly speaking...

Blacklist rules should hit > 1% of spam
Blacklist rules should hit < 0.5% of ham
Blacklist rules should have an S/O ratio of 0.95 or better
Blacklist rules should not overlap other rules, especially better rules,
  too much
Blacklist rules should not be too expensive to run or use.
Blacklist rules cannot violate a policy (unwritten or written) such as
  intentionally discriminating against a particular country.
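The numeric parts of these thresholds could be sketched like this (an illustration only; the defeatability, overlap, cost, and policy criteria are human judgments and are not modelled):

```python
def whitelist_ok(ham_hits, spam_hits, total_ham):
    """Numeric whitelist criteria from above: S/O ratio below 0.1%
    and at least 0.1% of ham hit."""
    total = ham_hits + spam_hits
    so = spam_hits / total if total else 1.0  # S/O: spam hits over all hits
    return so < 0.001 and ham_hits / total_ham >= 0.001

def blacklist_ok(spam_hits, ham_hits, total_spam, total_ham):
    """Numeric blacklist criteria from above: > 1% of spam hit,
    < 0.5% of ham hit, S/O ratio of 0.95 or better."""
    total = spam_hits + ham_hits
    so = spam_hits / total if total else 0.0
    return (spam_hits / total_spam > 0.01
            and ham_hits / total_ham < 0.005
            and so >= 0.95)
```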

Daniel

Comment 33 Daniel Quinlan 2004-05-27 12:18:33 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

P.S. We reserve the right to not accept any code for any reason since it
is not possible to enumerate every reason why code is not going to be
accepted.  This is common sense, but I think it needs to be pointed out
considering the current line of discussion.

Comment 34 Sergey Shmelev 2004-05-27 12:56:23 UTC
Thank you for the answer!

>Whitelist rules cannot be easily defeated by spammers - yours is.
Many hacked servers have IPs that do not appear in the MX records of any
domain. Spammers use networks of thousands of hacked computers to send spam.

My rule can be forged about as easily as the SPF rule, which is included as
a plugin.

>Whitelist rules need to have an S/O ratio below 0.1% - yours is not.
On a corpus of 50% ham and 50% spam, my rule has an R/O of about 5%.
I think that is good.
If I also check SPF records in my rule, the R/O will be better.

Could you tell me the R/O of BAYES_00, BAYES_01, BAYES_05, BAYES_10,
BAYES_25, AWL (AWL < -1), RCVD_IN_BSP_TRUSTED, HABEAS_USER and SPF_CHECK
(ham rules) on your big corpus?

I think they all have an R/O above 0.1%.

>Whitelist rules need to hit at least 0.1% of ham - yours does.

Hitting 50% of ham is very effective - 500 times more than 0.1%.
My rule has a very good hit rate, which can compensate for its R/O.
My rule is also totally uncorrelated with the other rules - they do not
correlate with SPF, for example.


>Blacklist rules should have an S/O ratio of 0.95 or better
Ok, but that is asymmetric with the ham rules: 1 - 0.1% = 99.9% = 0.999.
Are you sure? :)


Comment 35 Daniel Quinlan 2004-05-27 13:35:39 UTC
Subject: Re:  New Rule: Checking sender IP against MX records From: user@domain

> Many hacked servers have IPs that do not appear in the MX records of any
> domain.  Spammers use networks of thousands of hacked computers to send
> spam.
> 
> My rule can be forged about as easily as the SPF rule, which is included
> as a plugin.

No.  We have some SPF whitelist rules with very very very low scores
(and they may be removed), but the only rules with appreciable scores
are positive (not whitelist) and are working increasingly well over
time.
 
> Could you tell me the R/O of BAYES_00, BAYES_01, BAYES_05, BAYES_10,
> BAYES_25, AWL (AWL < -1), RCVD_IN_BSP_TRUSTED, HABEAS_USER and
> SPF_CHECK (ham rules) on your big corpus?

You can look this up yourself.  Follow the link to collated results from
the hacking on SpamAssassin web page.
 
I don't mind answering questions, but I need to be clear, your MX rule
is not going into SpamAssassin.  This is my last comment on this bug.

Comment 36 Sergey Shmelev 2004-05-27 14:49:09 UTC
>You can look this up yourself.  Follow the link to collated results from
>the hacking on SpamAssassin web page.

Thank you for the interesting link! 

>I don't mind answering questions, but I need to be clear, your MX rule
>is not going into SpamAssassin.  This is my last comment on this bug.

Thank you for your time!

First, may I criticize your criteria?
I think that if we criticize something, it will get better over time.

Your criteria are very strict.
All ham rules can be forged by spammers. All of them.
AWL could be forged by probing which email accounts are active.
The BAYES rules could be forged by sending jokes or Japanese haiku.
SPF and my rule could be forged by searching for the "right" emails.

To satisfy your criteria you should delete all ham rules right now.

But I think your strategy of rejecting ham rules is wrong.
I think ham rules with an R/O above 0.25 are good and can be useful.
IMHO, if we had 100 uncorrelated ham rules with an R/O of 0.25 and a high
hit rate, together they would create a strong effect. This is a mathematical
fact. I am not going to prove it. Trust me.

Now I am going to show that many rules in the stable version do not meet
your criteria.

For example, look at the ham BAYES rules for user "jm", whose corpus has
ham = spam.
                             R/0
 44.692   1.0003  88.3846    0.011   0.82    0.00  BAYES_00:jm
  0.138   0.1875   0.0875    0.682   0.44    0.00  BAYES_05:jm
  0.213   0.2626   0.1625    0.618   0.42    0.00  BAYES_10:jm
  0.225   0.4126   0.0375    0.917   0.36    0.00  BAYES_25:jm

There is no ham BAYES rule that meets your criterion R/O < 0.001:
0.011 > 0.001
0.682 > 0.001
0.618 > 0.001
0.917 > 0.001!
My rule has an R/O of about 0.05, but you reject it.

SPF_CHECK:
0.700   0.6002   0.8002    0.429   0.38   -0.00  SPF_PASS:jm
0.429 > 0.001

I did not find the AWL rules.

So... why do we use the ham BAYES rules?

Is there anyone here who has not said "this is my last comment"?

Don't trust intuition! Trust mathematical formulas!
Comment 37 Duncan Findlay 2004-05-27 20:28:31 UTC
> First, may I criticize your criteria?

You just did.

> I think that if we criticize something, it will get better over time.

Only if you're right. (You aren't.)

> AWL could be forged by probing which email accounts are active.
> The BAYES rules could be forged by sending jokes or Japanese haiku.
> SPF and my rule could be forged by searching for the "right" emails.

AWL and BAYES cannot be forged, since they depend on site-specific e-mail.
I don't think you understand how the BAYES rules work. Perhaps they could be
gamed for one specific user, but they cannot be forged on a wide scale.

I don't think you understand SPF either. (Or I don't understand your comment.)

> To satisfy your criteria you should delete all ham rules right now.

We have deleted all ham rules that are forgeable. We had ham rules in
previous releases under less stringent criteria, and they were simply abused
by spammers.

> But I think your strategy of rejecting ham rules is wrong.
> I think ham rules with an R/O above 0.25 are good and can be useful.
> IMHO, if we had 100 uncorrelated ham rules with an R/O of 0.25 and a high
> hit rate, together they would create a strong effect.

I don't know what you're talking about with R/O. But we have seen that any
ham tests that can be abused by spammers will be abused by spammers. If we
had 100 forgeable ham rules, SpamAssassin would suck, because spammers would
be able to get every message past it.


> For example, look at the ham BAYES rules for user "jm", whose corpus has
> ham = spam.

BAYES tests CAN NOT BE FORGED BY SPAMMERS!

> My rule has an R/O of about 0.05, but you reject it.

It is not as good a solution to the problem as SPF. It is not really a
standard (or certainly not one that has gained any support).

> So.. why we use ham BAYES rules?

BAYES tests CAN NOT BE FORGED BY SPAMMERS!

> Is there anyone here who has not said "this is my last comment"?

Yes.

> Don't trust intuition! Trust mathematical formulas!

We need to balance mathematical formulas with intuition. Mathematical
formulas can't model the way spammers react to our creation of easy-to-forge
ham rules. (Or at least, if they can, we don't know those formulas...) As a
result, these rules get lower-than-optimal scores, and we all get burned.

Our scores are based on mathematics (have you looked at the perceptron?).

I think it's very clear from the response you have received that this rule
is not going into SpamAssassin. We encourage you to continue to contribute,
but please don't continue to waste our time by arguing this point with us.
I'm tempted to mark this bug CLOSED (since I don't think you can add
comments to CLOSED bugs), but I'm not going to right now.
Comment 38 Sidney Markowitz 2004-05-27 22:10:33 UTC
> AWL could be forged by probing which email accounts are active.
> The BAYES rules could be forged by sending jokes or Japanese haiku.
> SPF and my rule could be forged by searching for the "right" emails.

I think the points about these could be made more explicit.

AWL can no longer be forged since we added the Received header to the
information along with the From address. A spammer not only has to use a valid
From address that you receive ham from (assuming they can customize the spam to
you that way) but it has to come from the same server that your ham from that
email address comes from.
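Roughly, that keying idea can be sketched like this (a hypothetical illustration; the actual AWL code differs in detail):

```python
def awl_key(from_addr, relay_ip):
    """Hypothetical sketch: key the auto-whitelist entry on the sender
    address *plus* the originating network (here just the first two
    octets), so a forged From sent from a different network finds no
    accumulated ham history to benefit from."""
    network = ".".join(relay_ip.split(".")[:2])
    return from_addr.lower() + "|ip=" + network
```

A spammer who copies a valid From address still ends up under a different key, because their relay sits in a different network.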

BAYES cannot be forged by sending jokes or haiku unless a large portion of your
ham consists of similar jokes or haiku. Bayes looks for words that are common
only in your ham or only in your spam. Random infrequently seen words are not
used. A spammer would need to have access to your personal Bayes database to be
able to forge ham.

SPF is not a ham rule. A message can fail the SPF test, which detects spam, or
it can pass, which scores -0.001 to allow it to be used in a meta rule, or the
result can be unknown. I'm not sure what you mean by forging by 'searching
"right" emails' but if you think that a spammer can just copy headers from good
email, that's wrong. The test looks for "trusted relays" and does not believe
the information from untrusted ones. So spf_pass cannot be forged. It is
possible for a spammer to send spam through a server that is properly configured
with SPF, and that mail would still be spam. That's why it is not a ham rule.
Comment 39 Sergey Shmelev 2004-05-28 02:17:49 UTC
>AWL and BAYES can not be forged, since they are dependent on site-specific
>e-mail. I don't think you understand how BAYES rules work. They can not be
>forged, although they could perhaps for a specific user, they can not be forged
>on a widescale basis.

I don't think you understand how spammers work. They CAN forge the BAYES
rules for the text body.

From a mathematical point of view there is no difference between ham and
spam rules. From a spammer's point of view there is no difference between
forging ham rules and forging spam rules.

Spam rules can be "forged" just as easily as ham rules.
Therefore we should have the same criteria for ham and spam rules:
the same R/O and hit-rate criteria.

BAYES, AWL and other rules can easily be forged in the following way:

spammers hack many users' computers and harvest old emails that were sent
correctly, through the correct servers, with the correct "From".

They change the dates in those emails, add spam text or a spam image, and
send them. This forges SPF_CHECK, AWL, BAYES and my rule.

Therefore any rule, spam or ham, can be forged this way.

And therefore we should not reject rules with an R/O from 0.01 to 0.24 or
from 0.75 to 0.95.

If we have many uncorrelated rules of this type with a high hit rate, they
can be effective when they work together.

The BAYES rules work the same way: Bayes does not reject tokens with a
probability between 0.05 and 0.95; all those tokens contribute, and that is
why the BAYES rules work!

If we rejected the tokens with probabilities between 0.05 and 0.95, the
BAYES rules would lose their effect.


>We have deleted all ham rules that are not forgeable. We had ham rules in
>previous releases under less stringent criteria, and they were simply abused by
>spammers.

Spammers abuse not only ham rules; they abuse many spam rules too. From a
mathematical point of view there is no difference between ham and spam
rules. If we have many spam rules with big scores, and spammers can abuse
them, they will do it.

The forging problem exists for spam rules too.

>It is not as good a solution to the problem as SPF. It is not really a standard
>(or certainly not one thats gained any support)

Yes, not as good, but it has a measurable effect according to your own
mathematical criteria. Maybe you can formalize the term "good solution" in
mathematical terms?

>BAYES tests CAN NOT BE FORGED BY SPAMMERS!

According to your own statistics, BAYES_20 is already forged by spammers!
It has a score of -1.428 and a "bad" R/O of about 0.60.

>Our scores are based on mathematics (have you looked at the perceptron)?
I don't trust the perceptron; I trust clear statistical methods.

>Mathematical formulas
>can't model the way spammers react to our creation of easy to forge ham rules.

If you are not a professional spammer, how do you know what is easy and
what is not?
My criterion for "easy" is the potential number of spammer servers that
could forge the rule.

Many, many spammer servers cannot forge my rule.
Therefore my rule is not easy to forge, and it has a much higher hit rate
than SPF.

>I think it's very clear from the response you have received that this rule is
>not going in to SpamAssassin. We encourage you to continue to contribute, but
>please don't continue to waste our time by arguing this point with us. I'm
>tempted to mark this bug CLOSED (since I don't think you can add comments to
>CLOSED bugs) but I'm not going to right now.

Maybe I am asking stupid questions that are answered in a developers' FAQ?
You can close this at any moment, but are you sure you are 100% right?

Do we have a mathematical theory that tells us to reject ham rules with
R/O < 0.001?

Are there any links to mathematical articles in scientific journals?

I waste your time, but you waste mine...

How do you know that I am not right? Because I am alone with a marginal
opinion?

You trust your intuition, and intuition makes errors...

You can shut me up at any moment; that is your right.

Thank you for your time.



Comment 40 Sergey Shmelev 2004-05-28 03:03:50 UTC
If we define the Rating as log(HitRate*Ratio),
where Ratio = Spam/Ham for spam rules and Ham/Spam for ham rules,

then the top 20 is as follows:

Rating  Hitrate R/0     Spam    Ham     Name
7.639   32.854  63.3    65.6746 0.0375  URIBL_WS_SURBL:jm
7.588   44.692  44.185  1.0003  88.3846 BAYES_00:jm
7.358   28.007  56.014  56.0140 0.0000  BAYES_99:jm
7.292   38.700  37.952  76.3911 1.0128  RCVD_IN_XBL:jm
7.254   26.600  53.2    53.2008 0.0000  MIME_HTML_ONLY:jm
7.226   34.117  40.331  67.5628 0.6752  RCVD_IN_DSBL:jm
6.741   23.651  35.801  46.9926 0.3126  RCVD_IN_SORBS_DUL:jm
6.528   33.783  20.257  65.3413 2.2256  HTML_MESSAGE:jm
6.387   23.989  24.776  47.0802 0.9002  URIBL_SBL:jm
6.322   16.685  33.37   33.3708 0.0000  MIME_BOUND_DD_DIGITS:jm
6.274   17.804  29.828  35.4214 0.1875  MIME_HTML_NO_CHARSET:jm
6.128   15.148  30.298  30.2989 0.0000  DCC_CHECK:jm
5.805   12.884  25.768  25.7689 0.0000  MIME_HTML_ONLY_MULTI:jm
5.764   12.791  24.933  25.5564 0.0250  BIZ_TLD:jm
5.65    33.058  8.603   5.9890  60.1275 AWL:jm
5.434   27.500  8.333   5.0000  50.0000 MY_RULE:jm
5.352   12.085  17.473  23.8089 0.3626  RCVD_IN_BL_SPAMCOP_NET:jm
5.347   10.515  19.981  20.9802 0.0500  BAYES_50:jm
5.216   12.035  15.305  23.5338 0.5376  RCVD_IN_NJABL_PROXY:jm
5.111   9.109   18.217  18.2171 0.0000  MSGID_SPAM_CAPS:jm


My rule is in the top 20!!
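The rating formula above can be sketched as follows (an illustration only; the attached script handles the exact HitRate/Ratio definitions and the zero-division cases):

```python
import math

def rating(hit_rate, ratio):
    """Rating = log(HitRate * Ratio) as proposed above; hit_rate is the
    rule's hit percentage and ratio is spam/ham for spam rules or
    ham/spam for ham rules (columns 2 and 3 of the table)."""
    return math.log(hit_rate * ratio)

# E.g. the MIME_HTML_ONLY row above: rating(26.600, 53.2) is about 7.25.
```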
Comment 41 Sergey Shmelev 2004-05-28 03:08:52 UTC
The top 100 rating:

Rating  Hitrate R/0     Spam    Ham     Name
7.639   32.854  63.3    65.6746 0.0375  URIBL_WS_SURBL:jm
7.588   44.692  44.185  1.0003  88.3846 BAYES_00:jm
7.358   28.007  56.014  56.0140 0.0000  BAYES_99:jm
7.292   38.700  37.952  76.3911 1.0128  RCVD_IN_XBL:jm
7.254   26.600  53.2    53.2008 0.0000  MIME_HTML_ONLY:jm
7.226   34.117  40.331  67.5628 0.6752  RCVD_IN_DSBL:jm
6.741   23.651  35.801  46.9926 0.3126  RCVD_IN_SORBS_DUL:jm
6.528   33.783  20.257  65.3413 2.2256  HTML_MESSAGE:jm
6.387   23.989  24.776  47.0802 0.9002  URIBL_SBL:jm
6.322   16.685  33.37   33.3708 0.0000  MIME_BOUND_DD_DIGITS:jm
6.274   17.804  29.828  35.4214 0.1875  MIME_HTML_NO_CHARSET:jm
6.128   15.148  30.298  30.2989 0.0000  DCC_CHECK:jm
5.805   12.884  25.768  25.7689 0.0000  MIME_HTML_ONLY_MULTI:jm
5.764   12.791  24.933  25.5564 0.0250  BIZ_TLD:jm
5.65    33.058  8.603   5.9890  60.1275 AWL:jm
5.434   27.500  8.333   5.0000  50.0000 MY_RULE:jm
5.352   12.085  17.473  23.8089 0.3626  RCVD_IN_BL_SPAMCOP_NET:jm
5.347   10.515  19.981  20.9802 0.0500  BAYES_50:jm
5.216   12.035  15.305  23.5338 0.5376  RCVD_IN_NJABL_PROXY:jm
5.111   9.109   18.217  18.2171 0.0000  MSGID_SPAM_CAPS:jm
5.101   9.065   18.129  18.1295 0.0000  RCVD_DOUBLE_IP_SPAM:jm
5.089   9.127   17.784  18.2296 0.0250  MISSING_MIMEOLE:jm
5.058   9.602   16.38   19.0423 0.1625  RCVD_NUMERIC_HELO:jm
5.005   8.640   17.279  17.2793 0.0000  HELO_DYNAMIC_IPADDR:jm
4.975   12.447  11.629  23.8435 1.0503  RCVD_BY_IP:jm
4.93    8.321   16.641  16.6417 0.0000  LONGWORDS:jm
4.797   8.434   14.371  16.7063 0.1625  RCVD_IN_SORBS_HTTP:jm
4.689   7.377   14.753  14.7537 0.0000  X_MESSAGE_INFO:jm
4.592   7.027   14.053  14.0535 0.0000  FORGED_MUA_OUTLOOK:jm
4.567   6.939   13.878  13.8785 0.0000  DRUGS_ERECTILE:jm
4.496   9.015   9.952   17.2940 0.7377  RCVD_IN_SORBS_MISC:jm
4.475   6.670   13.163  13.3283 0.0125  RCVD_HELO_IP_MISMATCH:jm
4.257   8.778   8.05    16.5062 1.0503  RCVD_IN_RFCI:jm
4.148   5.626   11.252  11.2528 0.0000  HTML_BACKHAIR_8:jm
4.13    5.727   10.861  11.4043 0.0500  URIBL_SC_SURBL:jm
4.033   5.314   10.627  10.6277 0.0000  MSGID_RANDY:jm
3.947   5.089   10.177  10.1775 0.0000  UNIQUE_WORDS:jm
3.927   5.108   9.941   10.1900 0.0250  MISSING_OUTLOOK_NAME:jm
3.83    8.727   5.281   15.5164 1.9380  MSGID_FROM_MTA_ID:jm
3.822   5.039   9.071   9.9787  0.1000  URIBL_BE_SURBL:jm
3.788   5.464   8.087   10.6152 0.3126  HTML_60_70:jm
3.757   8.021   5.34    14.3536 1.6879  LINES_OF_YELLING:jm
3.548   4.170   8.339   8.3396  0.0000  DRUGS_ANXIETY:jm
3.539   4.151   8.302   8.3021  0.0000  DRUGS_ERECTILE_OBFU:jm
3.535   4.170   8.224   8.3271  0.0125  HTML_IMAGE_ONLY_08:jm
3.528   4.157   8.199   8.3021  0.0125  DRUGS_PAIN:jm
3.503   4.076   8.151   8.1520  0.0000  MSGID_DOLLARS:jm
3.481   22.843  1.423   18.2671 27.4194 FORGED_RCVD_HELO:jm
3.475   4.020   8.04    8.0405  0.0000  RCVD_IN_RSL:jm
3.45    4.551   6.923   8.8283  0.2751  DNS_FROM_RFCI_DSN:jm
3.433   4.614   6.718   8.9022  0.3251  HTML_50_60:jm
3.393   4.251   7.001   8.3146  0.1875  HTML_80_90:jm
3.33    3.738   7.476   7.4769  0.0000  DATE_SPAMWARE_Y2K:jm
3.256   4.689   5.536   8.7897  0.5876  MIME_QP_LONG_LINE:jm
3.252   3.595   7.189   7.1893  0.0000  RATWARE_ZERO_TZ:jm
3.217   3.532   7.065   7.0651  0.0000  RCVD_IN_NJABL_DIALUP:jm
3.195   3.495   6.989   6.9892  0.0000  HELO_DYNAMIC_HCC:jm
3.126   3.520   6.478   6.9642  0.0750  HTML_90_100:jm
3.111   3.351   6.701   6.7017  0.0000  HELO_DYNAMIC_DHCP:jm
3.103   3.338   6.676   6.6767  0.0000  URI_AFFILIATE:jm
3.04    3.395   6.162   6.7017  0.0875  PRIORITY_NO_NAME:jm
3.039   3.301   6.326   6.5641  0.0375  HTML_LINK_CLICK_HERE:jm
3.023   3.207   6.414   6.4141  0.0000  HELO_DYNAMIC_IPADDR2:jm
2.986   3.826   5.179   7.2518  0.4001  CLICK_BELOW:jm
2.98    3.138   6.277   6.2774  0.0000  SPF_HELO_FAIL:jm
2.967   3.570   5.446   6.8767  0.2626  HTML_40_50:jm
2.928   3.057   6.114   6.1140  0.0000  HTML_IMAGE_RATIO_02:jm
2.916   3.213   5.751   6.3266  0.1000  FORGED_YAHOO_RCVD:jm
2.844   3.476   4.944   6.6142  0.3376  HTML_70_80:jm
2.822   3.026   5.56    5.9772  0.0750  RCVD_IN_SBL:jm
2.742   2.907   5.338   5.7389  0.0750  URI_REDIRECTOR:jm
2.717   2.751   5.502   5.5021  0.0000  DNS_FROM_AHBL_RHSBL:jm
2.656   2.669   5.338   5.3388  0.0000  FORGED_OUTLOOK_TAGS:jm
2.604   2.601   5.201   5.2013  0.0000  MIME_BASE64_TEXT:jm
2.598   2.688   5.001   5.3138  0.0625  FORGED_HOTMAIL_RCVD2:jm
2.589   2.732   4.876   5.3638  0.1000  NORMAL_HTTP_TO_IP:jm
2.565   2.588   5.026   5.1519  0.0250  RCVD_IN_SORBS_SMTP:jm
2.561   2.926   4.426   5.5889  0.2626  HTML_30_40:jm
2.477   2.876   4.143   5.4389  0.3126  HTML_20_30:jm
2.462   2.738   4.286   5.2513  0.2251  MIME_BASE64_NO_NAME:jm
2.391   2.494   4.383   4.8762  0.1125  RCVD_ILLEGAL_IP:jm
2.388   2.507   4.345   4.8887  0.1250  HTML_10_20:jm
2.385   2.401   4.524   4.7512  0.0500  HTML_BADTAG_00_10:jm
2.381   2.326   4.651   4.6512  0.0000  MISSING_SUBJECT:jm
2.348   2.288   4.576   4.5761  0.0000  HTML_OBFUSCATE_20_30:jm
2.308   5.589   1.798   7.8270  3.3508  NO_REAL_NAME:jm
2.293   2.226   4.451   4.4511  0.0000  MIME_BASE64_BLANKS:jm
2.289   3.063   3.222   5.4389  0.6877  UNDISC_RECIPS:jm
2.276   2.713   3.589   5.0263  0.4001  INFO_TLD:jm
2.236   2.163   4.326   4.3261  0.0000  HTML_WEB_BUGS:jm
2.224   2.151   4.301   4.3011  0.0000  DOMAIN_RATIO:jm
2.202   2.176   4.157   4.3136  0.0375  MIME_QP_EXCESSIVE:jm
2.159   2.082   4.163   4.1635  0.0000  HTML_MIME_NO_HTML_TAG:jm
2.147   2.069   4.138   4.1385  0.0000  HTML_SHOUTING3:jm
2.129   2.051   4.101   4.1010  0.0000  DATE_IN_FUTURE_12_24:jm
2.051   1.988   3.914   3.9635  0.0125  RCVD_DOUBLE_IP_LOOSE:jm
1.984   1.907   3.813   3.8135  0.0000  BAYES_90:jm
1.97    1.894   3.788   3.7884  0.0000  OBFUSCATING_COMMENT:jm
1.957   1.882   3.763   3.7634  0.0000  MSGID_YAHOO_CAPS:jm
1.95    1.875   3.75    3.7509  0.0000  DRUGS_ANXIETY_EREC:jm
Comment 42 Sergey Shmelev 2004-05-28 03:16:10 UTC
Created attachment 1978 [details]
Script that compute Quality of Rules Rating

We need a mathematical criterion for accepting and rejecting new rules.
I propose the formula log(HitRate*Ratio).
Comment 43 Sergey Shmelev 2004-05-28 05:45:52 UTC
Created attachment 1979 [details]
Checking sender IP against MX records From

A new ham rule that checks the sender IP against the MX records of the From
domain.

New: the rule now checks the SPF records; if they exist, the rule does not
fire.

This new rule should have a better R/O ratio (better, I think, than AWL
has), and I highly recommend including it in SA.

Network administrators now have two options: send mail only from the
servers listed in the MX records (many servers already behave this way) OR
use the SPF mechanism. If SPF is enabled for a domain, this rule does NOT
fire.
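The decision logic described above can be sketched like this (the lookup_mx / lookup_spf helpers are hypothetical stand-ins injected instead of real DNS queries):

```python
def mx_from_check(sender_ip, from_domain, lookup_mx, lookup_spf):
    """Sketch of the rule's decision logic.

    If the From domain publishes SPF, this rule never fires (SPF takes
    over).  Otherwise it fires, as a ham indicator, only when the sender
    IP is one of the IPs of the domain's MX hosts."""
    if lookup_spf(from_domain):
        return False
    return sender_ip in lookup_mx(from_domain)

# Toy stand-ins for the DNS lookups (hypothetical data):
MX = {"example.org": {"192.0.2.10", "192.0.2.11"}}
SPF_DOMAINS = {"spf.example"}
lookup_mx = lambda d: MX.get(d, set())
lookup_spf = lambda d: d in SPF_DOMAINS
```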
Comment 44 Sergey Shmelev 2004-06-07 21:26:56 UTC
Can anybody test the new patch against a big corpus (where ham ~ spam)?

Thank you
Comment 45 Justin Mason 2008-10-28 10:10:41 UTC
testing out moving bugs off of RESOLVED/LATER status en masse
Comment 46 Justin Mason 2008-10-28 10:12:40 UTC
testing out moving bugs off of RESOLVED/LATER status en masse
Comment 47 Justin Mason 2008-10-28 10:13:59 UTC
resolving bugs previously marked RESOLVED/LATER
Comment 48 mouss 2008-11-04 23:17:43 UTC
>According
>ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-stout-antispam-00.txt
>it is recomended to Register all sender MTAs within DNS as MX records
>with special priority (65535)

According to
ftp://ftp.nordu.net/ietf-online-proceedings/05mar/proceedings/IDs/draft-stout-antispam-01.txt

"
This Internet-Draft has been deleted. Unrevised documents placed in the
Internet-Drafts directories have a maximum life of six months. After
that time, they are deleted. This Internet-Draft was not published as
an RFC.

The name of the internet-draft was draft-stout-antispam-00.txt

...
"

so it's expired. dead. 

On the other hand, checking the MX may be useful for zombie detection: if the client looks dynamic but is an MX of the sender domain, then reduce the score (if it was increased by the *DYN* and *DHCP* ... rules). Of course, this too can be abused by spammers...