Bug 5817 - Poorly faked MTA Received headers (MUA to MX)
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.2.4
Hardware: Other other
Importance: P5 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2008-02-09 08:23 UTC by Karsten Bräckelmann
Modified: 2019-06-19 16:20 UTC

Attachments:
Plugin: Fake Relay 0.2 (text/plain; status: None; submitter: Karsten Bräckelmann [HasCLA])

Description Karsten Bräckelmann 2008-02-09 08:23:05 UTC
I recently noticed quite a few spams with a poorly faked MTA Received header.
The from IPs of the first and second Received header are identical, and the
second Received is by the MX. I guess this is to sneak around MTA_TO_MX style rules.

I haven't checked a lot of spam for this yet, though. Also, I'm not sure where
to best add such a rule. Maybe I'll have a look next week, when I get some time.
Each and any comment or hint welcome. :)


And now for two examples I found in my <10 scoring spam today:

Received: from dsl.static812142840.ttnet.net.tr
  (dsl.static812142840.ttnet.net.tr [81.214.28.40] (may be forged)) by ...
Received: from [81.214.28.40] by kilowog.blockstackers.com; ...

Received: from 91.pool85-49-188.dynamic.orange.es
  (91.pool85-49-188.dynamic.orange.es [85.49.188.91]) by ...
Received: from [85.49.188.91] by mx2.free.fr; ...
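
For illustration, the pattern in these two examples can be sketched in Python (SpamAssassin itself is Perl, so this is not rule or plugin code). The helper names `bracketed_ip` and `same_ip_twice` are hypothetical, and the elided parts of the headers are left as "..." exactly as quoted above.

```python
import re

# First bracketed dotted quad in a Received header, e.g. "[81.214.28.40]".
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def bracketed_ip(received):
    """Return the first bracketed IPv4 address in a Received header, or None."""
    m = BRACKETED_IP.search(received)
    return m.group(1) if m else None

def same_ip_twice(first, second):
    """True if both Received headers name the same bracketed 'from' IP."""
    a, b = bracketed_ip(first), bracketed_ip(second)
    return a is not None and a == b

# The two examples from the report, elisions kept as "...".
tr_first = ("from dsl.static812142840.ttnet.net.tr "
            "(dsl.static812142840.ttnet.net.tr [81.214.28.40] (may be forged)) by ...")
tr_second = "from [81.214.28.40] by kilowog.blockstackers.com; ..."
es_first = ("from 91.pool85-49-188.dynamic.orange.es "
            "(91.pool85-49-188.dynamic.orange.es [85.49.188.91]) by ...")
es_second = "from [85.49.188.91] by mx2.free.fr; ..."

print(same_ip_twice(tr_first, tr_second))
print(same_ip_twice(es_first, es_second))
```

This only compares the bracketed IPs; it does not check that the second Received header was written by the MX, which the report also mentions.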


Btw, all three spamples I had a look at today are "downloadable software" spams,
and the MUA claims to be The Bat! (yeah, again). However, this time, they did
not screw up the Date header. ;)
Comment 1 Karsten Bräckelmann 2008-02-09 08:37:33 UTC
...and they come with a throwaway [a-z]+.blogspot.com URI.  Whoops, forgot. :)
Comment 2 Karsten Bräckelmann 2008-02-09 08:44:31 UTC
Doh!  Please disregard the last two paragraphs. I was tracking down two issues
at the same time. The mentioned similarities about content, URI and Mailer don't
apply.  Sorry for the noise.
Comment 3 Daryl C. W. O'Shea 2008-02-09 16:56:37 UTC
There's a rule for such forged headers in my sandbox.  I never bothered to
publish it; it doesn't seem to hit too much spam:

http://ruleqa.spamassassin.org/20080209-r620091-n/DOS_FORGED_RCVD_QUADS/detail
Comment 4 Karsten Bräckelmann 2008-02-09 18:04:13 UTC
Daryl, maybe it's just too late for me today, and I missed it. Where's the
actual rule? The name doesn't suggest to me it checks the first two IPs for
equality, does it?

If I get some time the next days, I'll check my recent corpus for this. Maybe I
should set up a mass-check env one of these days...
Comment 5 Karsten Bräckelmann 2008-02-13 15:47:34 UTC
Still don't know where to find your rule, Daryl.  Anyway, here goes...

I had some fun hacking SA, my own little mass-checker, and a plugin. Will attach
that in a few. Results: NO hits on my collected ham corpus of 5199 msgs, and a
whopping 47% hit rate on Feb spam. (Note that I did not check high scoring spam
with (some assorted) RBL hits and high Bayes, and that the average score is
higher than vanilla, due to custom rules.)

Details:  0% ham,  31% 05-10,  63% 10-15,  34% 15+

Anyone interested? :)
Comment 6 Karsten Bräckelmann 2008-02-13 15:48:43 UTC
Created attachment 4262
Plugin: Fake Relay 0.2
Comment 7 Daryl C. W. O'Shea 2008-02-13 16:35:19 UTC
(In reply to comment #4)
> Daryl, maybe it's just too late for me today, and I missed it. Where's the
> actual rule?

The rule is in the 70_other.cf file in my sandbox (available via svn).

http://wiki.apache.org/spamassassin/RuleSandboxes

> The name doesn't suggest to me it checks the first two IPs for
> equality, does it?

The rule checks for suspected forged received headers by checking if external
received headers contain (dot-)quads that are duplicated one after another. 
Hence the name DOS_FORGED_RCVD_QUADS.  The rule is Sendmail-ish specific and
relies on the IP having no rDNS.  In your examples above the IPs do have rDNS so
the rule wouldn't fire.

I've just added a generic version of this rule, DOS_RCVD_IP_TWICE, that will
fire on your examples.

(In reply to comment #5)
> I had some fun hacking SA, my own little mass-checker, and a plugin. Will attach
> that in a few.
> Anyone interested? :)

FWIW, you could implement the plugin via a couple of rules and achieve the same
result.  I'd either do that (and hopefully it'll be self documenting) or at
least document what you're trying to achieve (and why) in your plugin.
Comment 8 Karsten Bräckelmann 2008-02-13 17:26:57 UTC
(In reply to comment #7)
> The rule is in the 70_other.cf file in my sandbox (available via svn).

Ah, there it is. Thanks.

> The rule checks for suspected forged received headers by checking if external
> received headers contain (dot-)quads that are duplicated one after another. 
> Hence the name DOS_FORGED_RCVD_QUADS.  The rule is Sendmail-ish specific and
> relies on the IP having no rDNS.  In your examples above the IPs do have rDNS
> so the rule wouldn't fire.
> 
> I've just added a generic version of this rule, DOS_RCVD_IP_TWICE, that will
> fire on your examples.

> FWIW, you could implement the plugin via a couple of rules and achieve the same
> result.  I'd either do that (and hopefully it'll be self documenting) or at
> least document what you're trying to achieve (and why) in your plugin.

Damn, you are quite efficient in demotivating me... :/

You're probably right that one could accomplish the same with rules only. I
forgot about that meta header entirely, when I started hacking. Doh!


However, there are quite some significant differences between the rule you
mentioned above, and my plugin.

First, I explicitly limit the untrusted relays to be exactly 2. I don't want to
FP on mail processing or generating external entities, like mailing list servers
or automatic notifications of any kind (including bugzilla). I don't want to
catch anything in the chain actually, but direct MUA to MX msgs with a "second"
forged relay.

Also, even that resulted in 0.3% FP hits. While that might not be much, it was
unsatisfying and suboptimal. ;)

After some checking of the remaining info, the HELO seemed to be key. A
non-empty, non-IP HELO was almost exclusively characteristic for ham, eliminating
all FPs and resulting in a mere 1% less accuracy on spam.
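
The constraints described so far can be sketched as follows. This is a Python illustration under stated assumptions, not the attached plugin: the function name is hypothetical, relays are plain dicts with `ip` and `helo` keys ordered newest first, an IP-literal HELO is assumed to be wrapped in exclamation marks, and the HELO inspected is assumed to be that of the oldest hop. The private-IP exoneration is left out.

```python
def looks_like_forged_mua_to_mx(relays_untrusted):
    """Sketch of the three constraints; relays are ordered newest first."""
    # Constraint 1: exactly two untrusted relays.
    if len(relays_untrusted) != 2:
        return False
    newest, oldest = relays_untrusted
    # Constraint 2: both hops claim the same source IP.
    if newest["ip"] != oldest["ip"]:
        return False
    # Constraint 3: a non-empty, non-IP HELO on the oldest hop is typical for ham,
    # so only an empty or IP-literal HELO counts as suspicious here.
    helo = oldest["helo"]
    helo_is_ip = len(helo) > 2 and helo.startswith("!") and helo.endswith("!")
    return helo == "" or helo_is_ip

# Hypothetical relay lists for illustration.
spam_like = [
    {"ip": "85.49.188.91", "helo": "91.pool85-49-188.dynamic.orange.es"},
    {"ip": "85.49.188.91", "helo": ""},
]
ham_like = [
    {"ip": "192.0.2.10", "helo": "mail.example.net"},
    {"ip": "192.0.2.10", "helo": "laptop.example.net"},
]

print(looks_like_forged_mua_to_mx(spam_like))
print(looks_like_forged_mua_to_mx(ham_like))
```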


Btw, my ham corpus consists of directly sent mail only. No mailing list stuff or
something, which sure would not hit with my constraint of exactly 2 untrusted
relays.

I am confident about my plugin/rule, and currently score it at 3.0, given my
test results and that this trick seems to be a rather new way to get around the
MUA to MX style rules. It sure does hit galore, especially with the current wave
of low scoring spam (read: almost no hits but Bayes).


Regarding documentation: Sure!  I just meant to publish it early and get some
discussion and opinions. I already wrote some doc, however, mostly suited for me
as a reminder. If the plugin would be welcome and accepted, I'll invest further
work to document inline and point out the details. For now, I figured the
discussion in here should be sufficient for documenting the reasoning.


Also, still, this was quite some fun. :)  And I am rather positive, that trying
the very same stunt in rules only, would have been harder. At least for me.
Comment 9 Karsten Bräckelmann 2008-02-13 17:52:28 UTC
Daryl, I just ran your DOS_RCVD_IP_TWICE rule against my ham corpus, out of
sheer curiosity.  FWIW, 70 FPs / 5199 ham, 1.3%.

By simply enforcing equality of the IPs and *exactly* 2 untrusted relays (as I
did in my first attempt), this will drop to 16 FP (0.3%).  No FP at all in my
corpus, with the additional constraints as per my plugin.
Comment 10 Daryl C. W. O'Shea 2008-02-14 18:37:45 UTC
(In reply to comment #8)
> Damn, you are quite efficient in demotivating me... :/

That's not my intention at all.  It's just not clear to me why you're mixing
substr with regexes and then testing if $helo has a value after you've already
tried to take a substr of it.  I wasn't sure of what you were trying to do or if
you were sure that you had achieved what you were trying to do.

> However, there are quite some significant differences between the rule you
> mentioned above, and my plugin.
> 
> First, I explicitly limit the untrusted relays to be exactly 2. I don't want to
> FP on mail processing or generating external entities, like mailing list server
> or automatic notifications of any kind (including bugzilla). I don't want to
> catch anything in the chain actually, but direct MUA to MX msgs with a "second"
> forged relay.
> 
> Also, even that resulted in 0.3% FP hits. While that might not be much, it was
> unsatisfying and suboptimal. ;)
> 
> After some checking of the remaining info, the HELO seemed to be key. A
> non-empty, non-IP HELO was almost exclusively characteristic for ham, eliminating
> all FPs and resulting in a mere 1% less accuracy on spam.

Either way works (plugin or header rule) for this criteria.  Header rules are
just more likely to get tested by us than eval rules, especially when it is
possible to do something via a rule.  Eval rules (especially when implemented in
their own plugins) are expensive in both use and life-cycle management.
Comment 11 Karsten Bräckelmann 2008-02-14 19:39:02 UTC
(In reply to comment #10)
> > Damn, you are quite efficient in demotivating me... :/
> 
> That's not my intention at all.

Thanks -- I know.

> It's just not clear to me why you're mixing
> substr with regexes and then testing if $helo has a value after you've already
> tried to take a substr of it.  I wasn't sure of what you were trying to do or
> if you were sure that you had achieved what you were trying to do.

That much is easy -- at least to explain after you hacked it, I guess. ;)

The reason is, that IPs in $pms->{relays_untrusted}->[]->{helo} are identified
by a leading and trailing exclamation mark. And M::SA::Constants::IP_PRIVATE is
bound to the beginning of the string. That's why I had to get rid of the leading
exclamation mark, and by that way got rid of the trailing one, too.

The RE then actually does the intended action of testing for private IPs
(especially 127.0.0.1, which I have found in my ham corpus) and the return 0
exonerates them.

The [^!]+$ ensures there is no other exclamation mark in the string, because I
also have seen helo values of 'hostname!!ip!'. However, that test is supposed to
*only* trigger on private IPs.

In retrospect, I do see an issue with this, as it might in some pathological
cases match, where it should not. 210.example.net would be one such case. I'll
re-think that part and will rewrite it.
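
A Python sketch of how such a pathological match can arise. This is an assumed reconstruction, not the plugin's code: it uses a simplified private-network prefix pattern rather than the real IP_PRIVATE constant, and both function names are hypothetical.

```python
import re

# Simplified stand-in for a private-network prefix test (assumption).
PRIVATE_PREFIX = r"(?:10|127|169\.254|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\."

def naive_is_private(helo):
    # Drops the first and last character without checking that they are "!".
    return re.match(PRIVATE_PREFIX + r"[^!]+$", helo[1:-1]) is not None

def strict_is_private(helo):
    # Requires the "!" markers, so only IP-literal HELOs can match.
    return re.match(r"!" + PRIVATE_PREFIX + r"[\d.]+!$", helo) is not None

for helo in ("!127.0.0.1!", "!81.214.28.40!", "210.example.net"):
    print(helo, naive_is_private(helo), strict_is_private(helo))
```

With the naive variant, "210.example.net" loses its leading "2" and then looks like a 10.x address; anchoring on the exclamation marks avoids that.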


> Either way works (plugin or header rule) for this criteria.  Header rules are
> just more likely to get tested by us than eval rules, especially when it is
> possible to do something via a rule.  Eval rules (especially when implemented
> in their own plugins) are expensive in both use and life-cycle management.

I do understand that.

I do not insist on a separate plugin, which actually would not make any sense at
all, if it eventually and by chance makes it upstream. In that case, RelayEval.pm
probably would be the only sane choice for it to live in.

During testing, however, and for the sake of not harming my SA installation, I
did not feel like messing with that module, and instead settled for an isolated
sandbox to play in. Note that I did not care to provide info on how to use the
eval in bugzilla, just like I even pondered dropping the totally unnecessary
boiler plate stuff. I know all this is trivial for you guys, and at this point
the code is not meant as a patch or standalone addition, but a proof of concept.
I'll be most happy to provide a patch against RelayEval.pm, if desired.


Again, I'm sorry for the lack of documentation. I figured the discussion in here
would have explained the reasoning. I now see it doesn't. ;)  And apart from the
above mentioned private IP issue, the code should do exactly what I intended it
to do. :)
Comment 12 Karsten Bräckelmann 2008-02-14 19:52:25 UTC
(In reply to comment #10)
> It's just not clear to me why you're mixing
> substr with regexes and then testing if $helo has a value after you've already
> tried to take a substr of it.

To clarify further: The substr() does not alter $helo. The first substr() with
the RE test merely gets rid of the leading and trailing '!' for the RE match,
necessary because IP_PRIVATE is anchored at the beginning.

The subsequent test if $helo actually holds a value at all is unrelated. The
order of these two tests is irrelevant. I settled for the order as is, to first
return 0 in the cases where the relay is valid, and then to check for the
remaining conditions that identify what the plugin is supposed to catch.
Comment 13 Karsten Bräckelmann 2008-02-14 21:54:25 UTC
*sigh*  Yes, I believe you're right, Daryl, this can be written in a rule. I'll
have a closer look tomorrow, too late for today...
Comment 14 Daryl C. W. O'Shea 2008-02-14 22:39:55 UTC
This one is close to what you've done with the plugin (mind any wrapping):

- only one received header not added by the internal network (two relays in the
external relays header)

- an empty or dot-quad (well, numbers and dots) helo

- avoids 127.x.x.x, but not all of IP_PRIVATE networks

header DOS_RCVD_IP_TWICE_C      X-Spam-Relays-External =~ /^\s*\[ ip=(?!127)([\d.]+) [^\[]*\bhelo=(?:![\d.]{7,15}!)? [^\[]*\[ ip=\1 [^\]]*\]\s*$/
Comment 15 Karsten Bräckelmann 2008-02-15 04:16:47 UTC
Thanks, Daryl. :)  Although, I hope you didn't believe I'd not hack on that
right away, no matter the time. ;)  Shortly after my last comment, I got a first
rule somewhat working. It was mostly the testing, that had to wait until today...

So, here are two variants of the Forged Relay, MUA to MX as ordinary rules:


header FORGED_RELAY_MUA_TO_MX  X-Spam-Relays-Untrusted =~
 /^\[ ip=(?!127)([\d.]+) [^\[]*\[ ip=\1 [^\[]+ helo=(!(?!127)| )[^\[]+$/


# __RELAYS_IP_MATCH, __RELAYS_THREE_PLUS and __RELAY_MUA_HELO_IP_OR_NONE
# respectively (shortened to avoid wrapping)

header __A X-Spam-Relays-Untrusted =~ /^\[ ip=(?!127)([\d.]+) [^\[]*\[ ip=\1 /
header __B X-Spam-Relays-Untrusted =~ /(\[.+){3}/
header __C X-Spam-Relays-Untrusted =~ / helo=(!(?!127)| )[^\[]+$/

meta FORGED_RELAY_MUA_TO_MX  __A && !__B && __C


Quite embarrassing -- though at least it wouldn't have been that easy to spot
the constraint of the helo IP without my custom dbg printing in the plugin.

Anyway, all three variants result in the exact same hits on my test corpora.
Caveat: Similar to your original rule, this doesn't check all of IP_PRIVATE, but
localhosts only for simplicity.

The long names for the meta rule's sub-rules should be pretty self-explanatory.
They ensure all three constraints are satisfied: The IPs of the two untrusted
relays are identical, there are exactly two untrusted relays, and the first
relay's HELO either is a non-localhost IP or none.  (Btw, you checked the wrong
relay's HELO. ;)
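
The rules in this comment can be exercised outside SpamAssassin with a small Python sketch. The four patterns are copied from the rules above; the sample X-Spam-Relays-Untrusted values are hypothetical and shortened to the fields the patterns look at, and `meta` is just a helper name for the `__A && !__B && __C` combination.

```python
import re

# Single-regex variant and the three sub-rules, copied from the rules above.
SINGLE = re.compile(r"^\[ ip=(?!127)([\d.]+) [^\[]*\[ ip=\1 [^\[]+ helo=(!(?!127)| )[^\[]+$")
A = re.compile(r"^\[ ip=(?!127)([\d.]+) [^\[]*\[ ip=\1 ")
B = re.compile(r"(\[.+){3}")
C = re.compile(r" helo=(!(?!127)| )[^\[]+$")

def meta(relays):
    """__A && !__B && __C"""
    return bool(A.search(relays)) and not B.search(relays) and bool(C.search(relays))

# Hypothetical header values (newest relay first).
forged = ("[ ip=85.49.188.91 rdns=91.pool85-49-188.dynamic.orange.es "
          "helo=91.pool85-49-188.dynamic.orange.es by=mx.example.org ] "
          "[ ip=85.49.188.91 rdns= helo=!85.49.188.91! by=mx2.free.fr ]")
named_helo = ("[ ip=192.0.2.10 rdns=a.example.net helo=a.example.net by=mx.example.org ] "
              "[ ip=192.0.2.10 rdns=a.example.net helo=laptop.example.net by=a.example.net ]")
three_hops = ("[ ip=192.0.2.10 rdns= helo=x.example.net by=mx.example.org ] "
              "[ ip=192.0.2.10 rdns= helo= by=relay.example.net ] "
              "[ ip=198.51.100.7 rdns= helo=!198.51.100.7! by=b.example.net ]")

for name, value in (("forged", forged), ("named_helo", named_helo), ("three_hops", three_hops)):
    print(name, bool(SINGLE.search(value)), meta(value))
```

On these three samples the single regex and the meta combination agree; that is only a spot check, not a proof that they are equivalent on all input.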
Comment 16 Daryl C. W. O'Shea 2008-02-15 23:36:55 UTC
r628245 for testing
Comment 17 AXB 2008-02-16 01:01:47 UTC
Yummy!

#counts   FORGED_RELAY_MUA_TO_MX   342s/0h of 4437 corpus (2520s/1917h AXB-MC1)
02/16/08
FORGED_RELAY_MUA_TO_MX -- suggested score: 1.666 (of 5)

Comment 18 Karsten Bräckelmann 2008-02-16 18:34:24 UTC
Thanks, Daryl!  And thank you Alex, for testing. :)

As mentioned earlier, on my particular spam stream, this hits even harder on
spam than Alex's results, with no FP.

Also, both FORGED_RELAY_MUA_TO_MX_A and FORGED_RELAY_MUA_TO_MX_B in r628245
should result in identical matches.
Comment 19 Karsten Bräckelmann 2008-02-20 11:24:30 UTC
And indeed, according to the ruleqa results, both variants appear to be
identical. The results look quite nice, too.

Btw, where does the T_ prefix come from? It's not in dos/70_bugs.cf

Also, is there any way to get ahold of the single *ham* it hit? I may need to
exonerate more private IPs.
Comment 20 Karsten Bräckelmann 2008-02-21 14:53:21 UTC
Boo! ;)

Chris confirmed the single FP in the total of 120k hams to be a valid, human
composed message using a real MUA. The important part (data changed) of
X-Spam-Relays-Untrusted:
 [ ip=1.1.1.1 rdns=a.b.c.net helo=a.b.c.net by=example.com ... ]
 [ ip=1.1.1.1 rdns=a.b.c.net helo=!1.1.1.1! by=a.b.c.net ... id=k1G0v .... ]

I noticed two differences between that one ham and all spam in my small-ish
ad-hoc testing corpus which matched the previous rule:  (a) The RDNS of both
untrusted hops is identical to the first hop's BY, and  (b) the first hop has an ID.

Going from the meta-rule, I added another constraint of (a), which turned out to
be much sharper. It did not result in any fewer hits, with the notable exception
of the FP. This sub-rule is true, if the second hop's RDNS is equal to the first
hop's BY:

 header __RDNS_EQ_BY  X-Spam-Relays-Untrusted =~
   /^[^\]]+ rdns=([^ ]*) [^\]]+][^\]]+ by=\1 /

Note that this actually checks the most recent untrusted relays. These are the
first and second hop due to the existing sub-rule !__RELAYS_THREE_PLUS

The new meta-rule then is:

 meta FORGED_RELAY_MUA_TO_MX  __A && !__B && __C  && !__RDNS_EQ_BY


For reference and probably discussion: I tested with a rule that checks RDNS and
BY both of the first hop. Turned out to be a subset of the above, missing both 1
ham and 1 spam of the original, naive attempt (see comment 8) which still serves
me as a testing corpus. *Both* these messages do not hit FORGED_RELAY_MUA_TO_MX
anyway, in neither of the discussed rules. The results of the meta-rule for my
small-ish test corpus are identical.

Since I am unsure about this result, here's the variant testing the first hop's
data only, for reference. Maybe someone else can tell better than me, which one
to use.

 header RDNS_EQ_BY  X-S-R-U =~ / rdns=([^ ]+) [^\[]+ by=\1 [^\[]+$/
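
Both RDNS/BY patterns from this comment can be tried against the anonymized FP data above with a short Python sketch. The patterns are copied as written; the FP value is the two lines quoted earlier joined with a space (elisions kept), and the spam-like value is hypothetical.

```python
import re

# Second hop's RDNS equals first hop's BY (the __RDNS_EQ_BY sub-rule).
RDNS_EQ_BY = re.compile(r"^[^\]]+ rdns=([^ ]*) [^\]]+][^\]]+ by=\1 ")
# Variant that looks at the first hop's RDNS and BY only.
FIRST_HOP_ONLY = re.compile(r" rdns=([^ ]+) [^\[]+ by=\1 [^\[]+$")

fp_ham = ("[ ip=1.1.1.1 rdns=a.b.c.net helo=a.b.c.net by=example.com ... ] "
          "[ ip=1.1.1.1 rdns=a.b.c.net helo=!1.1.1.1! by=a.b.c.net ... id=k1G0v .... ]")
# Hypothetical spam-like value: the forged hop has no rDNS and a foreign BY.
spam_like = ("[ ip=85.49.188.91 rdns=91.pool85-49-188.dynamic.orange.es "
             "helo=91.pool85-49-188.dynamic.orange.es by=mx.example.org ] "
             "[ ip=85.49.188.91 rdns= helo=!85.49.188.91! by=mx2.free.fr ]")

for name, value in (("fp_ham", fp_ham), ("spam_like", spam_like)):
    print(name, bool(RDNS_EQ_BY.search(value)), bool(FIRST_HOP_ONLY.search(value)))
```

Since the meta-rule uses the negated sub-rule, a match here means the message is exonerated.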
Comment 21 Karsten Bräckelmann 2008-02-21 15:05:56 UTC
Uh, strange.  The FP in Chris's ham corpus vanished without changing the
meta-rule, but now there is one in Justin's corpus.

Justin: Details, please? :)
Comment 22 Karsten Bräckelmann 2008-02-21 18:22:50 UTC
Chris informed me, that this particular FP has just expired from his ham-recent
corpus, and he pointed out again that the User-Agent is Thunderbird 1.0.

So, even though I don't like warts -- How much of an impact does a single FP in
120k hams have, given a spam hit rate of 10%+? How much would that affect the score?

Is it worth it chasing this? Does the effort of additional constraints like
comment 20 make sense?
Comment 23 Theo Van Dinter 2008-02-21 18:33:03 UTC
(In reply to comment #22)
> So, even though I don't like warts -- How much of an impact does a single FP in
> 120k hams have, given a spam hit rate of 10%+? How much would that affect the
> score?
> 
> Is it worth it chasing this? Does the effort of additional constraints like
> comment 20 make sense?

IMO, the issue is more of "is the corpus clean and it's really a spam", and "is
there a bug which that message tickled which caused the FP".  a single FP is
generally not a huge deal.
Comment 24 Karsten Bräckelmann 2008-02-21 18:43:58 UTC
Yes, almost 15% spam hits in the latest mass-check results. Sixth rule from the
top, rank of 0.98. Now *that* I love. :-)  SCNR

Theo: I see, thanks.  I know about the previous, expired FP, a real graphical
MUA obviously running on a machine with a global IP, helo'ing with its IP (boo)
and using an SMTP on the same machine. Shouldn't happen all too often. ;) 
Still, I'd be interested to see about Justin's FP.
Comment 25 Karsten Bräckelmann 2008-02-21 20:15:12 UTC
Daryl, feel free to remove the happy one-size-fits-all single RE variant, which
according to ruleqa is identical to the split meta version. And please add a new
C variant with the additional constraint for comparison, if you deem it worthwhile.

I'll finally shut up now, time to call it a day. ;)
Comment 26 Justin Mason 2008-02-22 01:45:41 UTC
here's my FP -- /local/cor/recent/ham/priv.20070625/63 :

X-Spam-Relays-External: [ ip=64.34.202.235 rdns=harlan.tinyplanet.ca
        helo=harlan.tinyplanet.ca by=dogma.boxhost.net ident= envfrom= intl=0
        id=4A0BB310016 auth= msa=0 ] [ ip=64.34.202.235 rdns= helo=!192.168.0.203!
        by=harlan.tinyplanet.ca ident= envfrom= intl=0 id=1I2Yra-0002mm-00
        auth=asmtp msa=0 ]

and the Received hdrs:

Received: from harlan.tinyplanet.ca (harlan.tinyplanet.ca [64.34.202.235])
        by dogma.boxhost.net (Postfix) with ESMTP id 4A0BB310016
        for <me@jmason.org>; Sun, 24 Jun 2007 21:38:32 +0100 (IST)
Received: from [64.34.202.235] (helo=[192.168.0.203])
        by harlan.tinyplanet.ca with asmtp (tls_cipher TLSv1:RC4-MD5:128)
        (Exim 3.35 #1 (Debian))
        id 1I2Yra-0002mm-00
        for <me@jmason.org>; Sun, 24 Jun 2007 16:39:14 -0400
Comment 27 Karsten Bräckelmann 2008-02-22 08:48:16 UTC
(In reply to comment #26)
> here's my FP -- /local/cor/recent/ham/priv.20070625/63 :

Nice, helo's with its private IP.  This one would not FP with the original
plugin, which checks for any IP_PRIVATE in the helo. One option to catch these.

!__RDNS_EQ_BY as per comment 20 would do, too.
Comment 28 Daryl C. W. O'Shea 2008-02-22 13:41:47 UTC
(In reply to comment #25)
> Daryl, feel free to remove the happy one-size-fits-all single RE variant, which
> according to ruleqa is identical to the split meta version.

Not that it's going to make a huge difference in this case alone, but if a rule
can be implemented in one regex rather than 3-4, one is better.

> And please add a new
> C variant with the additional constraint for comparison, if you deem it worthwhile.

r630329.  Sorry, I was going to do it last night but I saw something shiny and
got distracted.
Comment 29 Daryl C. W. O'Shea 2008-02-22 13:48:20 UTC
(In reply to comment #28)
> Not that it's going to make a huge difference in this case alone, but if a rule
> can be implemented in one regex rather than 3-4, one is better.

To clarify... usually better.  If it makes the single regex a nasty mess of
backtrack hell, then it's not better.  In this case one is probably faster though.
Comment 30 Karsten Bräckelmann 2008-02-22 13:55:55 UTC
(In reply to comment #28)
> Not that it's going to make a huge difference in this case alone, but if a rule
> can be implemented in one regex rather than 3-4, one is better.

I understand.  I'll see about merging this into the single RE, once I got some
results and know if it flies. :)


> r630329.  Sorry, I was going to do it last night but I saw something shiny and
> got distracted.

Cool, thanks.  Now back to waiting-a-day state...
Comment 31 Karsten Bräckelmann 2008-02-23 14:47:46 UTC
# results of today's network mass-checks
  0.000  15.3159   0.0000    1.000   0.97    0.01  T_FORGED_RELAY_MUA_TO_MX_C
  0.000  15.3387   0.0018    1.000   0.97    0.01  T_FORGED_RELAY_MUA_TO_MX_A

More than 15% spam hit rate, no FPs.  0.02% less accuracy in detecting spam for
eliminating the only two FPs. How's that for a trade-off? :)

(FWIW: That's 87 fewer hits out of a total of ~59k hits. The diff has been as low
as 0.006% before Theo's results came in, but then again, his corpus accounts for
the most hits, too.)
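
The trade-off can be checked with a few lines of arithmetic, here in Python. This assumes the second column of the two result lines is the spam hit percentage, which matches the "more than 15%" reading above; the variable names are made up for the sketch.

```python
# Spam hit percentages from the two result lines (variants A and C).
spam_pct_a = 15.3387
spam_pct_c = 15.3159

# Difference in percentage points of all spam checked.
diff_points = spam_pct_a - spam_pct_c

# Share of variant A's hits that variant C gives up: 87 out of roughly 59k.
relative_loss = 87 / 59_000

print(f"{diff_points:.4f} percentage points of all spam")
print(f"{relative_loss:.2%} of the rule's hits")
```

So the roughly 0.02 figure is a difference in percentage points of all spam, while relative to the rule's own hits the loss is on the order of a tenth of a percent.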
Comment 32 Loren Wilton 2008-02-23 18:55:29 UTC
Zero ham hits is nice, but with S/O ratios like those two rules have, it's
nothing to be particularly concerned about.  Either of those rules looks good to
me.

Which one has the simpler regex?  Use that one.
Comment 33 Karsten Bräckelmann 2008-02-23 19:49:54 UTC
(In reply to comment #32)
> Zero ham hits is nice, but with S/O ratios like those two rules have, it's
> nothing to be particularly concerned about.  Either of those rules looks good to
> me.

Thanks for that comment, Loren -- this pretty much is what I have been wondering
about myself (see comment 22). Is a single FP worth chasing? How does it affect
the score? Of course, I like my proposed rules to be sharp and effective. And of
course, I want it to score as much as possible by default. ;)

> Which one has the simpler regex?  Use that one.

That would be either variant A or B (which are equivalent). And that's actually
exactly what *I* do already. Now I want SA to use it...

And here we reached the point, where I need your opinion. Which one to use?

I know about my spam, and I monitor it for any FP. But getting the rule upstream
will affect users and admins all over the world. The latter is new to me, and I
can't push new rules anyway -- I rely upon your judgement, which one to pick.
There are two and a half options IMHO:

(a) Go with either variant A or B as is. The most simple RE. There's one FP in
    the current mass-check corpus, and a second FP previously recorded by Chris.

    A trivial adjustment is, to modify these to exclude all private IP rather
    than localhost only, as per my original plugin. Would get rid of the current
    single FP. Would not do that for the expired FP by Chris. Needs another
    round of testing to gather results.

(b) Go with variant C, which adds an additional, not-so-trivial constraint.
    Rewriting this into a single RE pending, needs testing.

Neither of these affects the hits on ham and spam significantly. Opinions? Your
call. I'd be most happy to provide whatever variant you prefer or would like to
see results for.
Comment 34 Justin Mason 2008-02-24 12:55:55 UTC
(In reply to comment #33)
> (In reply to comment #32)
> > Zero ham hits is nice, but with S/O ratios like those two rules have, it's
> > nothing to be particularly concerned about.  Either of those rules looks good to
> > me.
> 
> Thanks for that comment, Loren -- this pretty much is what I have been wondering
> about myself (see comment 22). Is a single FP worth chasing? How does it affect
> the score? Of course, I like my proposed rules to be sharp and effective. And of
> course, I want it to score as much as possible by default. ;)

If you look at the "STATISTICS*" files in the rules dir of the distribution,
that gives examples of FP rates and how they affect the assigned scores
in terms of GA output.

> I know about my spam, and I monitor it for any FP. But getting the rule upstream
> will affect users and admins all over the world. The latter is new to me, and I
> can't push new rules anyway -- I rely upon your judgement, which one to pick.
> There are two and a half options IMHO:
> 
> (a) Go with either variant A or B as is. The most simple RE. There's one FP in
>     the current mass-check corpus, and a second FP previously recorded by Chris.
> 
>     A trivial adjustment is, to modify these to exclude all private IP rather
>     than localhost only, as per my original plugin. Would get rid of the current
>     single FP. Would not do that for the expired FP by Chris. Needs another
>     round of testing to gather results.
> 
> (b) Go with variant C, which adds an additional, not-so-trivial constraint.
>     Rewriting this into a single RE pending, needs testing.
> 
> Neither of these affects the hits on ham and spam significantly. Opinions? Your
> call. I'd be most happy to provide whatever variant you prefer or would like to
> see results for.

personally I'd vote for (a) with the private-IP exclusion.


Comment 35 Karsten Bräckelmann 2008-02-26 04:43:30 UTC
(In reply to comment #34)
> If you look at the "STATISTICS*" files in the rules dir of the distribution,
> that gives examples of FP rates and how they affect the assigned scores
> in terms of GA output.

Ah, I see. Really encouraging seeing some rather high scoring rules in the
10-15% spam hit range with a similar FP rate. :)

> personally I'd vote for (a) with the private-IP exclusion.

That was my preference, too. However, turns out the adjustment for option one
and a half indeed is not complicated -- but it isn't exactly trivial either...


Based on the previously mentioned rules (see comment 15), here are the
instructions for modification. The resulting RE will not fit in here by any means.

Basically, this is just exchanging some text in "helo=(!(?!127)| )", the HELO
constraint of the rule. Remove "127", exactly that string only, and replace with
  (?:10|127|169\.254|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\.

without any leading or trailing whitespace. I hope this isn't too confusing. And
yes, this results in something as ghastly as "(!(?!(?:"...
As per M::SA::Constants.pm, this is equivalent to all private IPs. Tested locally.

Daryl, please add variant D based on this and preferably the single RE variant A.
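
The modification instructions can be followed mechanically, as in this Python sketch. The base pattern is the single RE from comment 15, the replacement text is the alternation given above, and the first sample is the X-Spam-Relays-External value from comment 26 joined onto one line. The second sample, with a public IP in the HELO, is hypothetical, as are the variable names.

```python
import re

VARIANT_A = r"^\[ ip=(?!127)([\d.]+) [^\[]*\[ ip=\1 [^\[]+ helo=(!(?!127)| )[^\[]+$"
PRIVATE = r"(?:10|127|169\.254|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\."

# Replace "127" only inside the HELO constraint, as the instructions say.
VARIANT_D = VARIANT_A.replace("helo=(!(?!127)| )", "helo=(!(?!" + PRIVATE + ")| )")

private_helo_ham = (
    "[ ip=64.34.202.235 rdns=harlan.tinyplanet.ca helo=harlan.tinyplanet.ca "
    "by=dogma.boxhost.net ident= envfrom= intl=0 id=4A0BB310016 auth= msa=0 ] "
    "[ ip=64.34.202.235 rdns= helo=!192.168.0.203! by=harlan.tinyplanet.ca "
    "ident= envfrom= intl=0 id=1I2Yra-0002mm-00 auth=asmtp msa=0 ]")
# Hypothetical value with a public IP in the HELO of the forged hop.
public_helo_spam = (
    "[ ip=85.49.188.91 rdns=91.pool85-49-188.dynamic.orange.es "
    "helo=91.pool85-49-188.dynamic.orange.es by=mx.example.org ] "
    "[ ip=85.49.188.91 rdns= helo=!85.49.188.91! by=mx2.free.fr ]")

for name, value in (("private_helo_ham", private_helo_ham),
                    ("public_helo_spam", public_helo_spam)):
    print(name, bool(re.search(VARIANT_A, value)), bool(re.search(VARIANT_D, value)))
```

On these two samples the modified pattern drops the private-IP HELO case and keeps the public-IP one.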
Comment 36 Daryl C. W. O'Shea 2008-02-28 11:42:18 UTC
(In reply to comment #35)
> Daryl, please add variant D based on this and preferably the single RE variant A.

r632099

Comment 37 Daryl C. W. O'Shea 2008-02-28 12:03:27 UTC
...and the correct one in r632107.
Comment 38 Karsten Bräckelmann 2008-02-29 13:01:42 UTC
Ah, I like those results. :-)

Eliminated the FP with a negligible detection trade-off, between variants [AB]
and C. And it's a single rule. Daryl, happy?  (The helo private IP test could be
split off, to enhance readability.)

The ruleqa details seem to be in some broken state currently, not serving
anything but the header. However, even without looking at them -- please note
that the score chart is seriously biased. Variants [ABC] somehow lost the
testing prefix, resulting in an added 3 points to all hits of variant D. All
variants and their spam hits are pretty much identical.
Comment 39 Daryl C. W. O'Shea 2008-02-29 14:25:34 UTC
(In reply to comment #38)
> Eliminated the FP with a negligible detection trade-off, between variants [AB]
> and C. And it's a single rule. Daryl, happy?

Single rule... good.  Happy no.... I wanted to skip work and go skiing today...
didn't happen.

> (The helo private IP test could be split off, to enhance readability.)

Splitting the regex into two would really only be prettier line wrapping so I'll
stick with my "one regex is better than more" theory that jm also preferred above.

> The ruleqa details seem to be in some broken state currently, not serving
> anything but the header.

The Solaris zone is swap thrashing (partly due to what looks like a bug in
ruleqa.cgi).

> However, even without looking at them -- please note
> that the score chart is seriously biased. Variants [ABC] somehow lost the
> testing prefix, resulting in an added 3 points to all hits of variant D. All
> variants and their spam hits are pretty much identical.

Yeah, that's fine (and expected).  I put the rules in without a T_ prefix so
they got auto-promoted the other day after jm fixed the auto-promotion script.

I've renamed FORGED_RELAY_MUA_TO_MX_D to FORGED_RELAY_MUA_TO_MX and commented
out the rest in r632464.
Comment 40 Karsten Bräckelmann 2008-02-29 14:56:20 UTC
(In reply to comment #39)
> Yeah, that's fine (and expected).  I put the rules in without a T_ prefix so
> they got auto-promoted the other day after jm fixed the auto-promotion script.

Right. I was just mystified as to when exactly this happens -- didn't know the
script was broken. And of course, today was bad timing. ;)

> I've renamed FORGED_RELAY_MUA_TO_MX_D to FORGED_RELAY_MUA_TO_MX and commented
> out the rest in r632464.

Great. :)
Comment 41 Henrik Krohns 2019-06-19 16:20:45 UTC
Closing old bug, seems to be resolved.