SA Bugzilla – Bug 7943
TxRep gives nonsensical scores?
Last modified: 2023-11-24 09:12:42 UTC
TxRep seems to return nonsensical scores. I'm using a MySQL table, if it matters (DB files became unusable for me long ago due to heavy locking and timeouts).

I've finally taken some time to debug it. The first issue was that 3.4.6 was generating many identical MSGID tokens ("da39a3ee5e6b4b0d3255bfef95601890afd80709@sa_generated" had count>10 within a few minutes), which would then get reused by ham and spam because "that mail was already seen". (I've partially tracked that problem down to how the SHA-1 hash for "xxxxxx@sa_generated" is created in 3.4.6: TxRep was using "Mail::SpamAssassin::Plugin::Bayes->get_msgid()", which seems to be case-sensitive and only works for one spelling of "Message-Id"; otherwise it tries to fall back to a hash of the date/body, but...)

Anyway, I've seen that SVN trunk has changed that part of the code, so I simply disabled MSGID tokens with "txrep_track_messages 0" and truncated the txrep table, hoping that would solve the issue. It did not: it still returned strange results (spammy scores for hams, etc.).

I then tried the SVN trunk version of TxRep.pm, with no luck (it still worked wrong, and I had to copy the new generate_msgid() to make it work).

I then nuked the txrep table, added some debug output, and started feeding one clearly ham e-mail several times through "spamassassin -L -t".
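A side note on the repeated MSGID token: da39a3ee5e6b4b0d3255bfef95601890afd80709 is the SHA-1 digest of empty input. That would be consistent with the fallback hashing nothing at all when the Message-Id header is not picked up, although this is only an inference from the digest value and not something confirmed in the plugin source. A quick check, written in Python purely for illustration (the helper name sha1_hex is hypothetical):

```python
import hashlib

# Token reported in the txrep table, without the "@sa_generated" suffix.
REPORTED_TOKEN = "da39a3ee5e6b4b0d3255bfef95601890afd80709"


def sha1_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


# Compare the digest of zero bytes of input with the reported token.
empty_digest = sha1_hex(b"")
print(empty_digest == REPORTED_TOKEN)
```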
This is how the MySQL table looked after each of the first 5 runs of that ham message (I'm only focusing on the EMAILIP tag here, but the same problem occurs with the others):

    +----------+---------------+------+----------+----------+----------+---------------------+
    | username | email         | ip   | msgcount | totscore | signedby | last_hit            |
    +----------+---------------+------+----------+----------+----------+---------------------+
1st | amavis   | hepi@hep.hr   | none |        1 |   -10.21 | spf      | 2021-11-12 03:07:03 |
2nd | amavis   | hepi@hep.hr   | none |        2 |   -10.21 | spf      | 2021-11-12 03:09:27 |
3rd | amavis   | hepi@hep.hr   | none |        3 |   -10.21 | spf      | 2021-11-12 03:10:24 |
4th | amavis   | hepi@hep.hr   | none |        4 |   -10.21 | spf      | 2021-11-12 03:11:17 |
5th | amavis   | hepi@hep.hr   | none |        5 |   -10.21 | spf      | 2021-11-12 03:12:54 |

I've added the following debug statement just after the delta calculation:

$delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;
dbg("TxRep: mn %s _formula delta = (total()=%0.3f + msgscore=%0.3f) / (1 + count()=%0.3f) - msgscore=%0.3f = %0.3f", $tag_id, $self->total(), $msgscore, $self->count(), $msgscore, $delta);

And this is what it printed for those first 5 runs:

dbg: TxRep: mn EMAILIP _formula delta = (total()=0.000 + msgscore=-10.210) / (1 + count()=0.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=1.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=2.000) - msgscore=-10.210 = 3.403
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=3.000) - msgscore=-10.210 = 5.105
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=4.000) - msgscore=-10.210 = 6.126

This looks wrong.
I started with TXREP contributing a score of 0, and after receiving 5 HAM messages from that sender, TXREP now returns a high positive SPAM score:

3.1 TXREP TXREP: Score normalizing based on sender's reputation

The more HAM I feed it, the higher the SPAM score gets. I would expect $delta to get slightly more negative with each HAM that passes through, or at least to remain the same, and definitely not to start classifying the email as SPAM. Is my assumption correct? Any idea how the $delta calculation should actually work here?
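The logged deltas can be replayed outside SpamAssassin. Below is a minimal Python sketch (Python rather than Perl, just to do the arithmetic) of the quoted formula. The names txrep_delta and replay_frozen_total are hypothetical, and the assumption that totscore stays at -10.21 while msgcount keeps growing is taken from the table above, not from the plugin code.

```python
def txrep_delta(total: float, count: float, msgscore: float) -> float:
    """Delta as in the quoted line: (total + msgscore) / (1 + count) - msgscore."""
    return (total + msgscore) / (1 + count) - msgscore


def replay_frozen_total(msgscore: float, runs: int) -> list[float]:
    """Replay identical messages where totscore is stored once and never grows."""
    deltas = []
    total, count = 0.0, 0
    for _ in range(runs):
        deltas.append(round(txrep_delta(total, count, msgscore), 3))
        total = msgscore  # assumption from the table: totscore stays at the first score
        count += 1        # msgcount increases on every run
    return deltas


print(replay_frozen_total(-10.21, 5))
```

With these inputs the sketch reproduces the five logged deltas (0.000, 0.000, 3.403, 5.105, 6.126). More generally, with total fixed at s and count n >= 1, the formula reduces to s * (1 - n) / (1 + n), which approaches -s as n grows. For s = -10.21 the delta therefore climbs toward +10.21, matching the observation that more ham produces a more spammy TXREP score.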
One observation: it seems that "totscore" is not always being changed while "msgcount" is. Should it have been? If it were changed at the same rate, that formula *would* keep delta at zero, e.g.:

dbg: TxRep: mn EMAILIP _formula delta = (total()=0.000 + msgscore=-10.210) / (1 + count()=0.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=1.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-20.420 + msgscore=-10.210) / (1 + count()=2.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-30.630 + msgscore=-10.210) / (1 + count()=3.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-40.840 + msgscore=-10.210) / (1 + count()=4.000) - msgscore=-10.210 = 0.000

I've seen in the code that calling add_score() is sometimes tied to the (non-default) "txrep_autolearn 1". Enabling autolearn does indeed make "totscore" change, but in a wrong way too, and "msgcount" also gets increased by 2 instead of by 1.
Even with autolearn enabled, though, the miscalculation leading from ham to spam is still there:

+----------+---------------+------+----------+----------+----------+---------------------+
| username | email         | ip   | msgcount | totscore | signedby | last_hit            |
+----------+---------------+------+----------+----------+----------+---------------------+
| amavis   | hepi@hep.hr   | none |        2 |   -30.21 | spf      | 2021-11-12 04:41:52 |
| amavis   | hepi@hep.hr   | none |        4 | -23.4033 | spf      | 2021-11-12 04:43:22 |
| amavis   | hepi@hep.hr   | none |        6 |  -22.042 | spf      | 2021-11-12 04:43:58 |
| amavis   | hepi@hep.hr   | none |        8 | -21.4586 | spf      | 2021-11-12 04:44:30 |
| amavis   | hepi@hep.hr   | none |       10 | -21.1344 | spf      | 2021-11-12 04:44:59 |

dbg: TxRep: mn EMAILIP _formula delta = (total()=0.000 + msgscore=-10.210) / (1 + count()=0.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-30.210 + msgscore=-10.210) / (1 + count()=2.000) - msgscore=-10.210 = -3.263
dbg: TxRep: mn EMAILIP _formula delta = (total()=-23.403 + msgscore=-10.210) / (1 + count()=4.000) - msgscore=-10.210 = 3.487
dbg: TxRep: mn EMAILIP _formula delta = (total()=-22.042 + msgscore=-10.210) / (1 + count()=6.000) - msgscore=-10.210 = 5.603
dbg: TxRep: mn EMAILIP _formula delta = (total()=-21.459 + msgscore=-10.210) / (1 + count()=8.000) - msgscore=-10.210 = 6.691

3.3 TXREP TXREP: Score normalizing based on sender's reputation
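Both cases in this comment can be checked with the same arithmetic. The Python sketch below is illustrative only: txrep_delta is a hypothetical helper mirroring the quoted formula, the first list models the hypothetical case where totscore grows by msgscore on every run, and the second feeds in the (totscore, msgcount) pairs copied from the autolearn table as seen by runs 2 to 5.

```python
def txrep_delta(total: float, count: float, msgscore: float) -> float:
    """Delta as in the quoted line: (total + msgscore) / (1 + count) - msgscore."""
    return (total + msgscore) / (1 + count) - msgscore


MSGSCORE = -10.21

# Hypothetical case: totscore grows by msgscore each run, so total = n * msgscore.
consistent = [round(txrep_delta(n * MSGSCORE, n, MSGSCORE), 3) for n in range(5)]

# (totscore, msgcount) pairs seen by runs 2-5 with "txrep_autolearn 1",
# copied from the table in this comment.
AUTOLEARN_ROWS = [(-30.21, 2), (-23.4033, 4), (-22.042, 6), (-21.4586, 8)]
autolearn = [round(txrep_delta(t, c, MSGSCORE), 3) for t, c in AUTOLEARN_ROWS]

print(consistent)
print(autolearn)
```

The first list is zero up to floating-point rounding, since (n + 1) * s / (n + 1) - s cancels exactly in real arithmetic. The second list matches the logged deltas for runs 2 to 5, so the debug output is consistent with the stored rows; whatever goes wrong happens in how totscore and msgcount are updated, not in the evaluation of the formula itself.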
Hi - thanks for taking the time to get all this information together.

I think this could be partly linked to https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7965

Regarding the +2: this is as "intended", as coded. The first score is for the standard reputation, and the second score for the key is the "learned" score, whether autolearned or manually learned. See

https://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm?revision=1896315&view=markup#l1283
and
https://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm?revision=1896315&view=markup#l1504
etc.

With this in mind and the recent adjustment to trunk, could you retest your situation? Feel free to come back with more information to help pinpoint the issue if these updates don't help.
Thanks Paul for your efforts!

Unfortunately, I haven't had a chance to try your fix yet: I had to drop TxRep in favor of AWL in early 2022 to make production functional again, and I haven't had time to test it and bring it back...

However, since AWL with the SQL backend also seems buggy, and I'll have to invest time in rebuilding the database anyway, I think I might give TxRep another try. It might be worth doing before 4.0 gets out, in order to iron out bugs there and save other people some headaches.

However, I've found another bug in SQLBasedAddrList.pm which seems like it might affect not only AWL but TxRep as well:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8072

Could you take a look there to see whether that would affect TxRep as well?

Also, can I just grab SQLBasedAddrList.pm and TxRep.pm from trunk, or do I have to go full-trunk (which would be much harder to swallow, as I can basically test it only by deploying it in production)?
Created attachment 5883
Possible fix

The delta formula is:

$delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;

Consider the case where:
- the TxRep database has 15 matching emails ($self->count() = 15)
- the spam message has a score of 40 (spam)
- the calculated TxRep score is 20 (spam)
- the new TxRep score will be (20 + 40) / (1 + 15) - 40 = -36.25

In this case the spam message will have a total score of 40 - 36.25 = 3.75 and it won't be flagged as spam.

The attached patch doesn't consider those messages in the delta calculation.
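The arithmetic of this scenario can be replayed with a short Python sketch. It only evaluates the formula with the numbers given in the comment and does not model the attached patch. txrep_delta is a hypothetical helper name, and the threshold of 5.0 is an assumption (SpamAssassin's default required_score) used to illustrate "won't be flagged as spam".

```python
def txrep_delta(total: float, count: float, msgscore: float) -> float:
    """Delta as in the quoted line: (total + msgscore) / (1 + count) - msgscore."""
    return (total + msgscore) / (1 + count) - msgscore


REQUIRED_SCORE = 5.0  # assumption: SpamAssassin's default required_score

total, count, msgscore = 20.0, 15, 40.0   # values from the scenario in the comment
delta = txrep_delta(total, count, msgscore)
final_score = msgscore + delta            # message score after the TxRep adjustment

print(delta)                              # -36.25
print(final_score)                        # 3.75
print(final_score >= REQUIRED_SCORE)      # False: below the assumed threshold
```

Under these numbers a message that scored 40 on its own ends up at 3.75, below the assumed threshold, which is the failure mode the comment describes.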
Sending lib/Mail/SpamAssassin/Plugin/TxRep.pm
Transmitting file data .done
Committing transaction...
Committed revision 1909608.