Bug 1338 - New ranking specifically for figuring out which tests to keep/not
Summary: New ranking specifically for figuring out which tests to keep/not
Status: RESOLVED LATER
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Masses
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
Importance: P2 normal
Target Milestone: 3.0.0
Assignee: Allen Smith
URL: http://cm.bell-labs.com/cm/ms/what/sh...
Whiteboard:
Keywords:
Depends on:
Blocks: 1363 1365
Reported: 2003-01-02 16:09 UTC by Allen Smith
Modified: 2004-03-11 05:57 UTC (History)
1 user



Attachment Type Modified Status Actions Submitter/CLA Status
Patch to provide another ranking method (-N p(spam) or -n) patch None Allen Smith [NoCLA]
Revised patch to hit-frequencies patch None Allen Smith [NoCLA]
Program to compare corpi and combine ranking methods accordingly text/plain None Allen Smith [NoCLA]
Tests that the old ranking would suggest to remove but the combined method (with -N 0.1) would not text/plain None Allen Smith [NoCLA]
Tests that the combined ranking (with -N 0.1) would suggest to remove but the old would not text/plain None Allen Smith [NoCLA]
Full data on which tests the combined ranking (with -N 0.1) would suggest to remove text/plain None Allen Smith [NoCLA]
hit-frequencies-multiple.pl revision to allow for ham-{name}.log text/plain None Allen Smith [NoCLA]
detect-meta.1.pl - first draft of program to find possible new meta tests text/plain None Allen Smith [NoCLA]
create-meta.1.pl - first draft of program to create new meta tests from detect-meta.1.pl results text/plain None Allen Smith [NoCLA]
simulate-meta.1.pl - program to add simple (A && B) meta tests to mass-check logs text/plain None Allen Smith [NoCLA]
Possible new meta tests, semi-automatically created text/plain None Allen Smith [NoCLA]
hit-frequencies (for mass-check logs from Jan 4th) results (old ranking) for possible new meta tests text/plain None Allen Smith [NoCLA]
New hit-frequencies -p -m T_META with Malte's logs added text/plain None Allen Smith [NoCLA]
New version of what hit-frequencies-multiple.pl -N 0.1 would suggest to remove, with Malte's info added text/plain None Allen Smith [NoCLA]
Revised hit-frequencies version text/plain None Allen Smith [NoCLA]
hit-frequencies-multiple.pl (more flexible filenames) text/plain None Allen Smith [NoCLA]
Patch to hit-frequencies for new ranking + compensating for network tests being removed patch None Allen Smith [NoCLA]
Version of hit-frequencies-multiple.pl that allows for compensating for network tests being removed text/plain None Allen Smith [NoCLA]
New ranking results for mass-check logs (network tests older than 6 months removed) text/plain None Allen Smith [NoCLA]
Old ranking (with -A 38887:67613) for mass-check logs with net tests prior to last 6 months removed text/plain None Allen Smith [NoCLA]
Latest version of patch to hit-frequencies - license as per current License file patch None Allen Smith [NoCLA]
hit-frequencies-multiple.pl - latest version text/plain None Allen Smith [NoCLA]

Description Allen Smith 2003-01-02 16:09:42 UTC
Hi. I've been doing some reading on Bayesian statistics, and have come
across an interesting method of figuring out "how much data does this
test normally give us" that also turns out to appear in Shannon information
theory. See _Bayesian Theory_ by Bernardo and Smith, pg. 82 (further
bibliographic details available on request), and Shannon, 1948, "A Mathematical
Theory of Communication", as reprinted in _The Mathematical Theory of
Communication_, 1949, Shannon and Weaver. I put together a mode for this
in hit-frequencies. It's specifically of use for figuring out how much utility
a test has, not so much for figuring out what the score range should be - there
is a difference in this case. I'll attach the patch (which also allows
hit-frequencies to process net and userconf tests with a -a flag).

     -Allen
Comment 1 Allen Smith 2003-01-02 16:11:24 UTC
Created attachment 535 [details]
Patch to provide another ranking method (-N p(spam) or -n)
Comment 2 Allen Smith 2003-01-02 16:14:10 UTC
Oh. -n is the equivalent of -x for this ranking. The ranking does vary by
what the probability of spam is that one assumes; use -N p(spam) to set it
(which will also do -n if that isn't already set). I wouldn't use a p(spam)
over 0.5 (the default); p(spam) of 0.1 should be a cautious setting, although
I'm still experimenting with the effects of different p(spam) settings on this.

     -Allen
Comment 3 Duncan Findlay 2003-01-02 17:17:04 UTC
Subject: Re: [SAdev]  New ranking specifically for figuring out which tests to keep/not

Do we have a translator? :-)

Comment 4 Allen Smith 2003-01-02 17:45:59 UTC
> Do we have a translator? :-)

Heh... you speak math? And know a good way for me to type it in?

     -Allen
Comment 5 Allen Smith 2003-01-02 19:21:21 UTC
Part of this, BTW, is the realization that there can be a distinction between
"what tests do we want to keep" and "what tests should have the
highest-magnitude scores". A test can be trusted with a high-magnitude score if
it has a low false-positive/negative rate. A test should be kept if it tends to
provide a lot of data - which means it has a lot of hits and _most_ of those
fall on one side (spam or nonspam). A test that has an S/O of 1 but only a
couple of hits may not be worth running, particularly for high-load tests (net
tests, for instance) - it doesn't provide much information in the average case.
The formula I've
programmed into hit-frequencies looks at the latter question, in that it is
trying to figure out how much information a test tends to provide.

     -Allen
Comment 6 Daniel Quinlan 2003-01-02 23:12:44 UTC
Subject: Re: [SAdev]  New ranking specifically for figuring out which tests to keep/not

bugzilla-daemon@hughes-family.org writes:

> Part of this, BTW, is the realization that there can be a distinction
> between "what tests do we want to keep" and "what tests should have
> the highest-magnitude scores".

I understand your goal (if not the math), but why not just have one
RANK?  I generally use the current RANK number to determine whether
tests are worth keeping or not, but I don't rely 100% on it -- I also
factor in a lot of factors such as S/O ratio, hit rates, speed of the
test, overlap with other tests, FP/FN reduction, etc.  Basically, if you
can come up with a new RANK that agrees with me more often than the
current RANK, I think we should switch.  :-)

Anyway, can you publish methods for selecting the following:

  - the worst 10 tests
  - the worst 10% tests

(You probably should use all available nightly test results from the
last day or two.)

Then, let's look at the worst 10 and 10% tests using the current RANK
and your new RANK and see which agrees with our gut feeling better.  If
the new one is better, let's just make it the RANK.

Another way is to remove the worst 10% using either method, run the GA,
and then see which one does better on a separate test corpus.

Comment 7 Allen Smith 2003-01-03 09:01:37 UTC
> I understand your goal (if not the math)

I only really understand what the math _means_, not enough to re-derive it! Think
of SA tests as a noisy communications channel. For a noiseless channel, the
maximum information is conveyed per-symbol if the symbols are equally likely. For
binary symbols with placement determining meaning (the closest analogy for SA
tests), the most information is conveyed if there's a 50/50 chance for a 0 or a
1. Otherwise, the more common of the pair doesn't tell you much (it's common, so
you'd expect it anyway) and while the less common of the pair means more, it's
also less common, so you don't _get_ that extra meaning very often. SA tests
have noise - a soratio other than 1 or 0 - so they communicate a bit less
information. (That's _mostly_ factored in by the terms inside the log()
function; I reduced the original formula by canceling out terms, the original
being, essentially, observed/expected - thus a log of 0 if the two are equal.)
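The noiseless-channel point can be shown numerically (a minimal Python sketch,
using natural log as the patch does): the entropy of a binary symbol peaks when
the two outcomes are equally likely.

```python
import math

def binary_entropy(p):
    """Entropy (nats) of a binary symbol with P(1) = p; maximal at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)
```

The farther p drifts from 0.5, the less information each symbol conveys on
average, which is exactly why a lopsided hit pattern weakens a test.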

> why not just have one RANK?

score-ranges-from-freqs uses the existing scoring method, and as I stated,
this method of RANKing is specifically not as useful for figuring out how
trustworthy a test is - how high-magnitude a test's score should be allowed to
be. If a test has a high RANK by this method but an S/O that isn't 1 or 0, it
probably should be used, but with a lower score (so that it can make a
difference in combination with others with lower scores). If the S/O is closer
to 0.5 than is tolerable, but the test has a high RANK by this method, then
it is a likely candidate for use as a sub-component of meta tests.

> I generally use the current RANK number to determine whether
> tests are worth keeping or not, but I don't rely 100% on it

I wouldn't do so either, including with this new one - but I believe it does
better integrate the hit ratios and soratio for determining whether tests
are worth keeping or not.

>Anyway, can you publish methods for selecting the following:

>  - the worst 10 tests
>  - the worst 10% tests

The worst are going to be the same in any case:
     A. Non-nice tests with soratios below 0.5, and nice tests with soratios
        above 0.5.
     B. 0-hitters.
See below for an alternative comparison.

>(You probably should use all available nightly test results from the
>last day or two.)

OK, I'll download them from the RSYNC server. (BTW,
http://www.pathname.com/~corpus/ needs to be adjusted temporarily to reflect
that we're doing a HEAD, non-dated, mass-check run; it isn't including mine,
probably because it wasn't a NIGHTLY one and because it was completed before
9 AM UTC.)

>Then, let's look at the worst 10 and 10% tests using the current RANK
>and your new RANK and see which agrees with our gut feeling better.

Modulo that the best to compare will probably be, say, which ones were below
0.8 or so with the old RANK and which ones are in the equivalent decile with
the new RANK, agreed.

> If the new one is better, let's just make it the RANK.

Again, it isn't suitable for that. I have in mind one possible replacement for
the current RANK in figuring out score ranges (using a variant of the Robinson
f(w) equation on the soratio), but it's going to require more testing (GA runs)
than there's really going to be time for before we want to get 2.50 out the
door.

     -Allen
Comment 8 Allen Smith 2003-01-04 13:14:23 UTC
I've done some examination of the results of applying the new ranking method to
current rsync-server logs. The results were an interesting mixture of "yes, that's
how it should be" and "WTH?". I did some further analysis and determined:
     A. One problem was some people running with BAYES and some not, so the
        BAYES_ results were very irregular.
     B. The old method, due to its heavy usage of soratios, spotted cases where
        a test had inconsistent results between corpora - not simply some people
        having hits and others not, which is tolerable, and not simply a test
        being consistently "just OK", which is likewise tolerable if the test
        has a high enough hit rate and/or can be combined with other tests to
        eliminate some of the FPs/FNs, but cases where applying the test would
        be downright disastrous in some circumstances.
     C. The new method did do very well at spotting tests that simply don't have
        enough data for them for the old ranking method to be reliable, and
        at spotting tests that, if they involve heavy load for one reason or
        another (e.g., network tests), aren't good enough to be worth it under
        most circumstances.
     D. Assuming a p(spam) of 0.1 gives overall better results, as I suspected.
Given the above, I have come up with both a revised hit-frequencies patch and
a program specifically for use with multiple corpora. The latter evaluates
for each corpus (spam/nonspam file pair) whether a test is highly problematic
for that corpus. If this is true for any corpus for a given test, or if the
test has fewer than 10 hits across all corpora (4 would be another logical
number to use), then the worse of the old or new ranks is used for ordering. If
neither of these is the case, then the better of the old and new ranks is used
for ordering. I suggest that this combination is good for general usage. For
cases of tests with heavy load, I suggest using the new ranking method for
deciding which tests to have enabled by default, and the combination for which
ones to make available to people who can tolerate the additional load (e.g.,
have locally cached versions for RBL tests), along with other information
(overlaps et al) for both cases.
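The decision rule above can be sketched as follows (Python for illustration;
the function name and the higher-is-better 0..1 rank convention are
assumptions, not the actual hit-frequencies-multiple.pl interface):

```python
def combined_rank(old_rank, new_rank, total_hits,
                  problem_in_some_corpus, min_hits=10):
    """Combine old and new ranks: be pessimistic (take the worse rank)
    when a test is problematic on some corpus or too rarely hit for the
    rankings to be reliable; otherwise take the better of the two."""
    if problem_in_some_corpus or total_hits < min_hits:
        return min(old_rank, new_rank)
    return max(old_rank, new_rank)
```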

I will attach the new patch, the new program, data on which tests look "bad" on
the old comparison but not on the combined method, data on which tests look
"bad" on the combined method but not on the old method, and the full listing
of the combined method's results for the tests that look "bad".

     -Allen
Comment 9 Allen Smith 2003-01-04 13:15:20 UTC
Created attachment 538 [details]
Revised patch to hit-frequencies
Comment 10 Allen Smith 2003-01-04 13:16:08 UTC
Created attachment 539 [details]
Program to compare corpi and combine ranking methods accordingly
Comment 11 Allen Smith 2003-01-04 13:19:44 UTC
BTW, the way to use the program for comparing corpora and combining ranking
methods (hit-frequencies-multiple.pl is my current name for it) is to run it
with all applicable options for hit-frequencies, except that -x and -n are
unnecessary (-N can be used to supply a p(spam) different from 0.5; 0.1 seems
to work well), followed by the names of each of the corpora. It expects that
corpora will be in the form spam-[name].log and nonspam-[name].log; it is
capable of using corpora with only one or the other, albeit only in the overall
ranking and not in determining tests that have severe problems on a particular
corpus.
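The filename convention could be matched with something like this sketch
(Python; the function name is hypothetical, and the real script is Perl - the
ham-{name}.log spelling comes from the later revision attached below):

```python
import re

def pair_corpora(filenames):
    """Group spam-{name}.log with nonspam-{name}.log (or ham-{name}.log),
    tolerating corpora that have only one side."""
    corpora = {}
    for fname in filenames:
        m = re.match(r'(spam|nonspam|ham)-(.+)\.log$', fname)
        if not m:
            continue
        kind, name = m.groups()
        side = 'spam' if kind == 'spam' else 'ham'
        corpora.setdefault(name, {})[side] = fname
    return corpora
```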

     -Allen
Comment 12 Allen Smith 2003-01-04 13:22:06 UTC
Created attachment 540 [details]
Tests that the old ranking would suggest to remove but the combined method (with -N 0.1) would not
Comment 13 Allen Smith 2003-01-04 13:23:02 UTC
Created attachment 541 [details]
Tests that the combined ranking (with -N 0.1) would suggest to remove but the old would not
Comment 14 Allen Smith 2003-01-04 13:25:37 UTC
Created attachment 542 [details]
Full data on which tests the combined ranking (with -N 0.1) would suggest to remove
Comment 15 Allen Smith 2003-01-09 11:17:26 UTC
Created attachment 556 [details]
hit-frequencies-multiple.pl revision to allow for ham-{name}.log
Comment 16 Allen Smith 2003-01-09 12:12:50 UTC
While waiting on the DNS testing to finish now that Osirusoft is back up, I decided
to work a bit on the idea of taking tests that the old ranking would suggest to
remove, but the new ranking would keep, and using them for meta tests. I took
the mass-check logs from everyone as of Jan 4th, ran my initial version of
a possible-meta detection program on it (detect-meta.1.pl; it's limited to
non-nice tests and positive correlations only at the moment, and doesn't handle
__* tests all that well as yet), did some post-filtering (especially for things
involving __* tests - I rather doubt people using (for some masochistic (sp?)
reason) Outlook as a mailer would be happy with meta tests using Outlook as a
spamsign...), and selected out some possible meta tests. (These are focused on
ones using rules that would be tossed out by the old ranking, but I also put in
a few others I happened to notice.) I then wrote another, again rather
simplistic/first-draft, to create the derived meta rules (create-meta.1.pl).

Now, before everyone screams at me that we're in a rules freeze and are about
to start the GA run, I accounted for that. I also created a program,
simulate-meta.1.pl, that simulates the simple variety (A && B) of meta tests
that the previous two programs create (I decided to stay away, for the moment,
from trying to do a full eval simulation of all meta tests - this approach is
less likely to have eval/quoting/etcetera bugs), taking as input a file of
said tests and a mass-check logfile, and outputting a logfile with the
new meta tests added. Thus, we don't need to rerun mass-check in order to
incorporate these meta tests, except that it might be a good idea to make
sure that we don't add so many that the number of them overwhelms SA... I say
this because I wound up creating a total of 3412 of the silly things!
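The core of the simulation step can be sketched like so (Python; this assumes a
simplified log line whose final whitespace-separated field is the
comma-separated list of tests hit, which glosses over details of the real
mass-check format):

```python
def add_meta_hits(log_line, meta_rules):
    """Append hits for simple (A && B) meta tests to one log line.

    meta_rules maps a meta-test name to the (A, B) pair it requires;
    both the line format and the names here are illustrative.
    """
    fields = log_line.rstrip('\n').split(' ')
    hits = set(fields[-1].split(',')) if fields[-1] else set()
    for meta_name, (a, b) in meta_rules.items():
        if a in hits and b in hits:
            hits.add(meta_name)
    fields[-1] = ','.join(sorted(hits))
    return ' '.join(fields)
```

Run over every line of a mass-check log, this adds the new meta hits without
rerunning mass-check itself, which is the point of simulate-meta.1.pl.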

I really don't suggest adding all of these just yet - only the ones with a
soratio of 1. I will nonetheless attach a listing of all of them plus a
hit-frequencies ranking of them all (using the old ranking method, it being
more suitable for this purpose).

     -Allen
Comment 17 Allen Smith 2003-01-09 12:13:48 UTC
Created attachment 557 [details]
detect-meta.1.pl - first draft of program to find possible new meta tests
Comment 18 Allen Smith 2003-01-09 12:14:28 UTC
Created attachment 558 [details]
create-meta.1.pl - first draft of program to create new meta tests from detect-meta.1.pl results
Comment 19 Allen Smith 2003-01-09 12:15:24 UTC
Created attachment 559 [details]
simulate-meta.1.pl - program to add simple (A && B) meta tests to mass-check logs
Comment 20 Allen Smith 2003-01-09 12:18:19 UTC
Created attachment 560 [details]
Possible new meta tests, semi-automatically created
Comment 21 Allen Smith 2003-01-09 12:20:21 UTC
Created attachment 561 [details]
hit-frequencies (for mass-check logs from Jan 4th) results (old ranking) for possible new meta tests
Comment 22 Allen Smith 2003-01-09 12:30:16 UTC
>I really don't suggest adding all of these just yet - only the ones with a
>soratio of 1. 

Make that "only the ones with a soratio of 1.000 which are in the top 100-200 by
the old ranking system", actually. I hadn't realized exactly how many have a
soratio of 1.000...

     -Allen
Comment 23 Allen Smith 2003-01-09 13:00:26 UTC
Oh:
    A. All of this is licensed as per the current License file; and
    B. Someone (I can try to do it this evening) will need to process Malte's
       mass-check files for the new ranking and for the results with the new
       meta tests. (For both of these, having info from as many people as possible
       is of even greater importance than it was before...)

    -Allen
Comment 24 Allen Smith 2003-01-09 22:35:50 UTC
Created attachment 567 [details]
New hit-frequencies -p -m T_META with Malte's logs added
Comment 25 Allen Smith 2003-01-09 22:37:00 UTC
Created attachment 568 [details]
New version of what hit-frequencies-multiple.pl -N 0.1 would suggest to remove, with Malte's info added
Comment 26 Justin Mason 2003-01-10 08:58:44 UTC
I would suggest that getting the GA runs finished and new
score sets generated, then doing a release, will be enough
work in the short term ;) -- esp. if we add in the new DNS
code as well.  If we add this stuff now, we'll never get
2.50 out ;)

IMO, better to leave this one for the new dev cycle when
that starts.  (the logs will, of course, still be valid
at that stage, so we can just pick it up as-is at that
point.)

opinions?
Comment 27 Allen Smith 2003-01-10 09:15:39 UTC
>I would suggest that getting the GA runs finished and new
>score sets generated, then doing a release, will be enough
>work in the short term ;) -- esp. if we add in the new DNS
>code as well.

I certainly hope it can be added...

>If we add this stuff now, we'll never get 2.50 out ;)

Point. I'd tried to simplify adding this stuff in as much as possible via
simulate-meta.1.pl, but I can see how it might wind up being a bit overwhelming.
I would like to ask that the new ranking method be kept in mind when deciding
which tests to delete and which to keep. (I will be travelling on Monday
and Tuesday and am not likely to be able to be in touch at all, BTW.)

>IMO, better to leave this one for the new dev cycle when
>that starts.  (the logs will, of course, still be valid
>at that stage, so we can just pick it up as-is at that
>point.)

The logs can be reused via simulate-meta.1.pl, yes.

     -Allen
Comment 28 Theo Van Dinter 2003-01-11 21:02:11 UTC
I agree that this is not required for 2.50, so I'm setting to 3.0.
Comment 29 Allen Smith 2003-01-11 21:27:41 UTC
All material regarding new meta tests on this bug should instead be on
1363, which is for 2.6/3.0.

     -Allen
Comment 30 Allen Smith 2003-01-16 12:43:39 UTC
Current network-testing results, new ranking (-N 0.1), using:

ham-nobayes-net-easmith.log
ham-nobayes-net-jm.log
ham-nobayes-net-kramer.log
ham-nobayes-net-mss.log
ham-nobayes-net-quinlan2.log	(Dan, is this the one that should be used?)
ham-nobayes-net-rODbegbie.log
ham-nobayes-net-theo.log

and the equivalent spam-nobayes-net-*.log files. I will attach the new versions
of hit-frequencies-multiple.pl and hit-frequencies.

OVERALL     SPAM  NONSPAM     S/O   RANK1   RANK2  SCORE NAME
 166976    39614   127362    0.237   0.00   0.00   0.00  (all messages)
  28428    18284    10144    0.853   0.66   0.99   0.38  RCVD_IN_OSIRUSOFT_COM
   2678     2669        9    0.999   0.96   0.94   0.30  X_OSIRU_SPAMWARE_SITE
  19188    12798     6390    0.866   0.67   0.98   0.01  RCVD_IN_NJABL
   6838     6269      569    0.973   0.90   0.97   2.00  X_NJABL_OPEN_PROXY
   5274     5106      168    0.990   0.95   0.97   3.18  RCVD_IN_SBL
   1382     1381        1    1.000   0.96   0.88   3.25  RCVD_IN_DSBL
   2145     2136        9    0.999   0.96   0.92   0.01  RAZOR2_CF_RANGE_91_100
   4277     4137      140    0.990   0.94   0.96   3.91  RAZOR2_CHECK
  11555     7678     3877    0.864   0.65   0.95   2.28  RCVD_IN_RFCI
   1000        0     1000    0.000   0.96   0.49  -4.00  HABEAS_SWE
   18486     9646     8840    0.778   0.49   0.93   0.77  RCVD_IN_UNCONFIRMED_DSBL
   1871     1854       17    0.997   0.95   0.90   1.00  RCVD_IN_OPM
     86       86        0    1.000   0.96   0.41   0.01  RAZOR2_CF_RANGE_81_90
   1494     1412       82    0.982   0.91   0.86   0.61  RCVD_IN_RELAYS_ORDB_ORG
    407      398        9    0.993   0.94   0.66   0.01  RAZOR2_CF_RANGE_21_30
    145      138        7    0.984   0.91   0.47   0.01  RAZOR2_CF_RANGE_41_50
     69       65        4    0.981   0.90   0.34   0.01  RAZOR2_CF_RANGE_61_70
   1665     1557      108    0.979   0.90   0.87   2.25  RCVD_IN_ORBS
    160      145       15    0.969   0.87   0.45   0.01  RAZOR2_CF_RANGE_11_20
     61       57        4    0.979   0.90   0.32   0.01  RAZOR2_CF_RANGE_31_40
     45       42        3    0.978   0.90   0.28   0.01  RAZOR2_CF_RANGE_51_60
    155      137       18    0.961   0.85   0.43   0.01  RAZOR2_CF_RANGE_01_10
     86       79        7    0.973   0.88   0.36   1.00  ROUND_THE_WORLD_LOCAL
     40       37        3    0.975   0.89   0.26   0.01  RAZOR2_CF_RANGE_71_80
     41       32        9    0.920   0.74   0.21   2.80  ROUND_THE_WORLD
    413       17      396    0.121   0.65   0.26 -10.00  RCVD_IN_BONDEDSENDER
  35489    15176    20313    0.706   0.39   0.93   3.09  NO_MX_FOR_FROM
   1819      796     1023    0.714   0.35   0.44   0.50  X_NJABL_DIALUP
   1779      667     1112    0.659   0.28   0.33   0.62  X_OSIRU_DUL
      0        0        0    0.500   0.12   0.07   8.00  HABEAS_VIOLATOR
      0        0        0    0.500   0.12   0.07   4.00  HABEAS_HIL
      0        0        0    0.500   0.12   0.06   2.62  RCVD_IN_VISI
      0        0        0    0.500   0.12   0.06   2.73  X_OSIRU_SPAM_SRC
      0        0        0    0.500   0.12   0.06   2.72  X_OSIRU_OPEN_RELAY
      0        0        0    0.500   0.12   0.03   0.00  RCVD_IN_BL_SPAMCOP_NET
      0        0        0    0.500   0.12   0.03   0.00  RAZOR_CHECK
      0        0        0    0.500   0.12   0.03   0.00  RCVD_IN_DUL_FH
      0        0        0    0.500   0.12   0.03   0.00  RCVD_IN_RSS
      0        0        0    0.500   0.12   0.03   0.00  RCVD_IN_DUL
      0        0        0    0.500   0.12   0.03   0.00  RCVD_IN_RBL
   5649     1070     4579    0.429   0.08   0.00   0.36  X_OSIRU_DUL_FH
   8479      881     7598    0.272   0.02   0.00   0.81  RCVD_IN_MULTIHOP_DSBL
Comment 31 Allen Smith 2003-01-16 12:44:21 UTC
Created attachment 586 [details]
Revised hit-frequencies version
Comment 32 Allen Smith 2003-01-16 12:44:48 UTC
Created attachment 587 [details]
hit-frequencies-multiple.pl (more flexible filenames)
Comment 33 Theo Van Dinter 2003-01-16 13:14:39 UTC
Subject: Re: [SAdev]  New ranking specifically for figuring out which tests to keep/not

On Thu, Jan 16, 2003 at 12:43:39PM -0800, bugzilla-daemon@hughes-family.org wrote:
> ham-nobayes-net-quinlan2.log	(Dan, is this the one that should be used?)

Per a message he sent me pre-trip-to-Boston this morning, yes.

>   35489    15176    20313    0.706   0.39   0.93   3.09  NO_MX_FOR_FROM

Yeah, we need to come up with what timeframe we should be using the
netchecks from (I have a lot of ham hits for this rule for instance, all
from hosts that haven't existed in X years...)  I then need to figure
out how to remove net checks from only a certain section of log file
based on timestamp (since we want the results in the whole file for the
rest of set1 testing).

To do so, we'd probably want to split the logs based on the timestamp,
check the net results for that section, then do the above removal based
on the best looking timestamp.  I'm thinking a max of 3 months for net
tests, anyone else?
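One way to sketch the age-based removal (Python; the entry representation and
the set of network-test names are assumptions for illustration, not
mass-check's actual on-disk format):

```python
from datetime import datetime, timedelta

# Illustrative subset of network tests; the real list would come from rules.
NET_TESTS = {'RCVD_IN_SBL', 'RAZOR2_CHECK', 'RCVD_IN_OSIRUSOFT_COM'}

def strip_stale_net_hits(entries, now, max_age_days=90):
    """Drop network-test hits from entries older than the cutoff while
    keeping other hits; entries are (timestamp, set_of_tests) pairs."""
    cutoff = now - timedelta(days=max_age_days)
    return [(ts, tests - NET_TESTS if ts < cutoff else tests)
            for ts, tests in entries]
```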


We also need to agree on what standards we're going to use to nuke rules
before the GA run.

Comment 34 Allen Smith 2003-01-16 13:59:10 UTC
Subject: Re:  New ranking specifically for figuring out which tests to keep/not


>------- Additional Comments From felicity@kluge.net  2003-01-16 13:14
>Subject: Re: [SAdev]  New ranking specifically for figuring out which tests
>to keep/not
>
>On Thu, Jan 16, 2003 at 12:43:39PM -0800, bugzilla-daemon@hughes-family.org
>wrote:
>> ham-nobayes-net-quinlan2.log	(Dan, is this the one that should be used?)
>
>Per a message he sent me pre-trip-to-Boston this morning, yes.

Ah, thank you, I'd forgotten about his Boston trip.

>>   35489    15176    20313    0.706   0.39   0.93   3.09  NO_MX_FOR_FROM
>
>Yeah, we need to come up with what timeframe we should be using the
>netchecks from

No more than 6 months, I'd say - I actually didn't even _do_ net-checks for
stuff before August of 2002. Less than 6 months is possible.

>(I have a lot of ham hits for this rule for instance, all
>from hosts that haven't existed in X years...)

Yes. My sole FP for that particular one is from August of 2002, as it turns
out...

>I then need to figure out how to remove net checks from only a certain
>section of log file based on timestamp (since we want the results in the
>whole file for the rest of set1 testing).
>
>To do so, we'd probably want to split the logs based on the timestamp,
>check the net results for that section,

?

>then do the above removal based on the best looking timestamp.

Ah.

> I'm thinking a max of 3 months for net tests, anyone else?

Not sure as yet - will have to see the results of different levels of
removals.

>We also need to agree on what standards we're going to use to nuke rules
>before the GA run.

Yes. Some combination of old & new ranks, S/O, and considerations of how much
of a load on SA the rule type in question creates - low hits for a high-load
rule is one indicator. (The GA _doesn't_ have info on how high the load for a
rule is, unlike most of the other info; it also doesn't have some of the info
that goes into the new ranking, namely inconsistencies on rules between
corpora.)

   -Allen

Comment 35 Allen Smith 2003-01-17 20:26:25 UTC
Created attachment 591 [details]
Patch to hit-frequencies for new ranking + compensating for network tests being removed
Comment 36 Allen Smith 2003-01-17 20:26:54 UTC
Created attachment 592 [details]
Version of hit-frequencies-multiple.pl that allows for compensating for network tests being removed
Comment 37 Allen Smith 2003-01-17 20:53:55 UTC
Attached are the results of doing hit-frequencies-multiple.pl -r -N 0.1 -P
nobayes-net on the following:

ham-nobayes-net-easmith.log
ham-nobayes-net-jm.log
ham-nobayes-net-kramer.log
ham-nobayes-net-martinr.log
ham-nobayes-net-mgm.log
ham-nobayes-net-mss.log
ham-nobayes-net-quinlan2.log
ham-nobayes-net-rODbegbie.log
ham-nobayes-net-rpuhek.log
ham-nobayes-net-theo.log
spam-nobayes-net-easmith.log
spam-nobayes-net-jm.log
spam-nobayes-net-kramer.log
spam-nobayes-net-martinr.log
spam-nobayes-net-mgm.log
spam-nobayes-net-mss.log
spam-nobayes-net-quinlan2.log
spam-nobayes-net-rODbegbie.log
spam-nobayes-net-rpuhek.log
spam-nobayes-net-theo.log

after removing network tests older than 6 months (see Bug 1365).

	-Allen
Comment 38 Allen Smith 2003-01-17 20:54:44 UTC
Created attachment 593 [details]
New ranking results for mass-check logs (network tests older than 6 months removed)
Comment 39 Allen Smith 2003-01-17 21:08:40 UTC
Created attachment 594 [details]
Old ranking (with -A 38887:67613) for mass-check logs with net tests prior to last 6 months removed
Comment 40 Justin Mason 2003-01-19 15:53:44 UTC
Allen,  I reckon the name of this stat should change from "RANK"
or "RANK2" to "INFO".  That's more accurate, as, as you say, it's
not a good reflection of its "rank" in the traditional sense, more
of how much information it provides (in the Shannon sense if I
heard right ;).
Comment 41 Allen Smith 2003-01-19 16:01:49 UTC
Subject: Re:  New ranking specifically for figuring out which tests to keep/not

>------- Additional Comments From jm@jmason.org  2003-01-19 15:53
>Allen,  I reckon the name of this stat should change from "RANK"
>or "RANK2" to "INFO".  That's more accurate, as, as you say, it's
>not a good reflection of its "rank" in the traditional sense, more
>of how much information it provides

Good point; thanks! I'm currently converting it into rankings mainly because
of headaches figuring out other ways of interpreting it than "more is
better", at least at the moment...

> (in the Shannon sense if I heard right ;).

Essentially, although a bit of a scaling adjustment would need to be made to
convert it into informational entropy terms, since I simply used log()
instead of log-base-2.
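The scaling adjustment is just a change of logarithm base (a trivial Python
sketch):

```python
import math

def nats_to_bits(info_nats):
    """Convert information measured in nats (natural log, as in the
    patch) to bits (log base 2, as in Shannon entropy)."""
    return info_nats / math.log(2)
```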

	-Allen

Comment 42 Allen Smith 2003-01-20 09:03:22 UTC
Created attachment 598 [details]
Latest version of patch to hit-frequencies - license as per current License file
Comment 43 Allen Smith 2003-01-20 13:41:00 UTC
Created attachment 601 [details]
hit-frequencies-multiple.pl - latest version
Comment 44 Theo Van Dinter 2003-02-03 21:49:37 UTC
Since this isn't going to be in 2.50, I'm aiming it at 2.60 instead. :)
Comment 45 Theo Van Dinter 2004-03-11 14:57:37 UTC
old ticket, no movement.