Bug 7943 - TxRep gives nonsensical scores?
Summary: TxRep gives nonsensical scores?
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: 3.4.6
Hardware: PC Linux
Importance: P2 normal
Target Milestone: 4.0.1
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-12 03:03 UTC by Matija Nalis
Modified: 2023-11-24 09:12 UTC
CC List: 4 users



Attachment Type Modified Status Actions Submitter/CLA Status
Possible fix patch None Giovanni Bechis [HasCLA]

Description Matija Nalis 2021-11-12 03:03:58 UTC
TxRep seems to return nonsensical scores. I'm using a MySQL table, if it matters (DB files became unusable for me long ago due to heavy locking and timeouts).

I've finally taken some time to try to debug it, and the first issue was that 3.4.6 was generating many identical MSGID tokens ("da39a3ee5e6b4b0d3255bfef95601890afd80709@sa_generated" had count>10 within a few minutes), which would then get reused by both ham and spam because "that mail was already seen".

(I've partially tracked that problem down to how the SHA-1 hash for "xxxxxx@sa_generated" is created in 3.4.6: TxRep was using "Mail::SpamAssassin::Plugin::Bayes->get_msgid()", which seems to be case-sensitive and only works for one capitalization of "Message-Id"; otherwise it tries to fall back to using a hash of date/body, but...)
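For reference, da39a3ee5e6b4b0d3255bfef95601890afd80709 is the SHA-1 digest of empty input, which suggests (an inference, not verified in this report) that the fallback ended up hashing an empty string for many messages. The sketch below illustrates the behavior being asked for: find Message-ID regardless of header capitalization, and only fall back to a hash of date and body when it is genuinely absent. The function name generate_msgid_sketch, the list of (name, value) header pairs, and the exact fallback input are assumptions for illustration, not SpamAssassin's Perl API.

```python
import hashlib


def generate_msgid_sketch(headers, date, body):
    """Hypothetical helper: return an identifier for reputation tracking.

    headers is a list of (name, value) pairs; names are compared
    case-insensitively, so "Message-Id", "Message-ID" and "message-id"
    are all found.
    """
    for name, value in headers:
        if name.lower() == "message-id" and value.strip():
            # Drop surrounding whitespace and angle brackets.
            return value.strip().strip("<>")
    # Assumed fallback: hash date + body. If both are empty this is the
    # SHA-1 of empty input, so unrelated messages share one token.
    digest = hashlib.sha1((date + body).encode("utf-8")).hexdigest()
    return digest + "@sa_generated"


print(generate_msgid_sketch([("Message-Id", "<abc@example.org>")], "", ""))
# abc@example.org
print(generate_msgid_sketch([("Subject", "hi")], "", ""))
# da39a3ee5e6b4b0d3255bfef95601890afd80709@sa_generated
```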

Anyway, I saw that SVN trunk has changed that part of the code, so I simply disabled MSGID tokens with "txrep_track_messages 0" and truncated the txrep table, hoping that would solve the issue. It did not: it still returned strange results (spammy scores for ham, etc.).

I then tried the SVN trunk version of TxRep.pm, with no luck (it still behaved wrongly, and I had to copy over the new generate_msgid() to make it work).

I then nuked the txrep table, added some debug output, and started feeding one clearly-ham email through "spamassassin -L -t" several times. This is how the MySQL table looked after each of the first 5 runs (I'm only focusing on the EMAILIP tag here, but the same problem occurs with the others):

        +----------+---------------+------+----------+----------+----------+---------------------+
        | username | email         | ip   | msgcount | totscore | signedby | last_hit            |
        +----------+---------------+------+----------+----------+----------+---------------------+
1st     | amavis   | hepi@hep.hr   | none |        1 |   -10.21 | spf      | 2021-11-12 03:07:03 |
2nd     | amavis   | hepi@hep.hr   | none |        2 |   -10.21 | spf      | 2021-11-12 03:09:27 |
3rd     | amavis   | hepi@hep.hr   | none |        3 |   -10.21 | spf      | 2021-11-12 03:10:24 |
4th     | amavis   | hepi@hep.hr   | none |        4 |   -10.21 | spf      | 2021-11-12 03:11:17 |
5th     | amavis   | hepi@hep.hr   | none |        5 |   -10.21 | spf      | 2021-11-12 03:12:54 |

I added the following debug line just after the existing delta calculation:

    # existing line in TxRep.pm
    $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;
    # added debug output
    dbg("TxRep:   mn %s _formula delta = (total()=%0.3f + msgscore=%0.3f) / (1 + count()=%0.3f) - msgscore=%0.3f = %0.3f",
        $tag_id, $self->total(), $msgscore, $self->count(), $msgscore, $delta);


And this is what it printed for those first 5 runs:
dbg: TxRep: mn EMAILIP _formula delta = (total()=0.000 + msgscore=-10.210) / (1 + count()=0.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=1.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=2.000) - msgscore=-10.210 = 3.403
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=3.000) - msgscore=-10.210 = 5.105
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=4.000) - msgscore=-10.210 = 6.126
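The printed deltas follow directly from the formula once totscore stays frozen at -10.21 while msgcount keeps growing. A minimal Python reproduction of the Perl expression quoted above (txrep_delta is a hypothetical helper name, not part of TxRep.pm), fed with the (totscore, msgcount) values read back before each run:

```python
def txrep_delta(total, count, msgscore):
    """Mirror of: $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;"""
    return (total + msgscore) / (1 + count) - msgscore


msgscore = -10.21
# (totscore, msgcount) seen before each of the 5 runs: the table starts empty,
# then totscore stays at -10.21 while msgcount increments by 1 per run.
history = [(0.0, 0), (-10.21, 1), (-10.21, 2), (-10.21, 3), (-10.21, 4)]
for total, count in history:
    print(f"count={count} delta={txrep_delta(total, count, msgscore):.3f}")
```

Because the stored total no longer grows, the average it implies shrinks toward 0, and each additional ham run pushes delta further toward +10.21.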

This looks wrong. I started with a TXREP score of 0 in SA, and after receiving 5 ham messages from that sender, TXREP now returns a high positive (spam) score:
 3.1 TXREP                  TXREP: Score normalizing based on sender's reputation

The more HAM I feed it, the higher the SPAM score gets.

I'd expect $delta to become slightly more negative with each ham message that passes through, or at least to stay the same, and definitely not to start classifying the email as spam. Is my assumption correct? Any idea how the $delta calculation is actually supposed to work here?
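Rewriting the expression makes the behavior easier to see. Let T = total(), n = count() and s = msgscore. The second line additionally assumes (this is not stated in the code quoted above) that the stored total is the sum of n earlier message scores with mean \bar{m}; the third line is the situation in the log, where the total stays frozen at T = s:

```latex
\begin{aligned}
\Delta &= \frac{T + s}{1 + n} - s = \frac{T - n\,s}{n + 1} \\
T = n\,\bar{m} \;&\Rightarrow\; \Delta = \frac{n}{n + 1}\,(\bar{m} - s) \\
T = s \ (\text{frozen}) \;&\Rightarrow\; \Delta = -s\,\frac{n - 1}{n + 1} \;\longrightarrow\; -s \quad (n \to \infty)
\end{aligned}
```

With s = -10.21 the last line gives 10.21 · 3/5 = 6.126 for n = 4, matching the log. Under the second line's assumption, delta pulls a message toward the sender's historical mean and is 0 for a sender whose messages always score the same; it only turns positive for a ham sender here because the stored total stopped tracking the count.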
Comment 1 Matija Nalis 2021-11-12 03:48:07 UTC
One observation: it seems that "totscore" is not always being updated while "msgcount" is. Should it be?
Because if it were updated at the same rate, that formula *would* keep delta at zero, e.g.:

dbg: TxRep: mn EMAILIP _formula delta = (total()=0.000 + msgscore=-10.210) / (1 + count()=0.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-10.210 + msgscore=-10.210) / (1 + count()=1.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-20.420 + msgscore=-10.210) / (1 + count()=2.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-30.630 + msgscore=-10.210) / (1 + count()=3.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-40.840 + msgscore=-10.210) / (1 + count()=4.000) - msgscore=-10.210 = 0.000
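The hypothetical log above, where totscore grows by msgscore on every run, can be checked numerically. A minimal sketch; txrep_delta is a hypothetical Python mirror of the Perl delta line from the description, and the "consistent update" rule is the assumption being tested, not what TxRep.pm currently does:

```python
def txrep_delta(total, count, msgscore):
    """Mirror of: $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;"""
    return (total + msgscore) / (1 + count) - msgscore


msgscore = -10.21
total, count = 0.0, 0
deltas = []
for _ in range(5):
    deltas.append(txrep_delta(total, count, msgscore))
    # Assumed consistent update: totscore and msgcount both advance every run.
    total += msgscore
    count += 1

# Every delta is zero up to floating-point rounding.
print(all(abs(d) < 1e-9 for d in deltas))  # True
```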


I saw in the code that calls to add_score() are sometimes tied to the (non-default) "txrep_autolearn 1" setting. Enabling autolearn does indeed make "totscore" change, but also in the wrong way, and "msgcount" gets increased by 2 instead of by 1. The miscalculation that turns ham into spam is still there even with autolearn enabled, though:

+----------+---------------+------+----------+----------+----------+---------------------+
| username | email         | ip   | msgcount | totscore | signedby | last_hit            |
+----------+---------------+------+----------+----------+----------+---------------------+
| amavis   | hepi@hep.hr   | none |        2 |   -30.21 | spf      | 2021-11-12 04:41:52 |
| amavis   | hepi@hep.hr   | none |        4 | -23.4033 | spf      | 2021-11-12 04:43:22 |
| amavis   | hepi@hep.hr   | none |        6 |  -22.042 | spf      | 2021-11-12 04:43:58 |
| amavis   | hepi@hep.hr   | none |        8 | -21.4586 | spf      | 2021-11-12 04:44:30 |
| amavis   | hepi@hep.hr   | none |       10 | -21.1344 | spf      | 2021-11-12 04:44:59 |




dbg: TxRep: mn EMAILIP _formula delta = (total()=0.000 + msgscore=-10.210) / (1 + count()=0.000) - msgscore=-10.210 = 0.000
dbg: TxRep: mn EMAILIP _formula delta = (total()=-30.210 + msgscore=-10.210) / (1 + count()=2.000) - msgscore=-10.210 = -3.263
dbg: TxRep: mn EMAILIP _formula delta = (total()=-23.403 + msgscore=-10.210) / (1 + count()=4.000) - msgscore=-10.210 = 3.487
dbg: TxRep: mn EMAILIP _formula delta = (total()=-22.042 + msgscore=-10.210) / (1 + count()=6.000) - msgscore=-10.210 = 5.603
dbg: TxRep: mn EMAILIP _formula delta = (total()=-21.459 + msgscore=-10.210) / (1 + count()=8.000) - msgscore=-10.210 = 6.691

 3.3 TXREP                  TXREP: Score normalizing based on sender's reputation
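Feeding the rows of the autolearn table into the same expression reproduces these deltas and shows why they still climb: msgcount now advances by 2 per run while totscore drifts back up toward about -21, so the per-entry average totscore / msgcount shrinks toward 0. A sketch, where txrep_delta is a hypothetical Python mirror of the Perl delta line from the description and each run reads the row stored by the previous run:

```python
def txrep_delta(total, count, msgscore):
    """Mirror of: $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;"""
    return (total + msgscore) / (1 + count) - msgscore


msgscore = -10.21
# (totscore, msgcount) before each run with "txrep_autolearn 1":
# run 1 starts from an empty table, runs 2-5 see the row stored by the previous run.
before_run = [(0.0, 0), (-30.21, 2), (-23.4033, 4), (-22.042, 6), (-21.4586, 8)]
for total, count in before_run:
    mean = total / count if count else 0.0
    delta = txrep_delta(total, count, msgscore)
    print(f"count={count:2d} mean={mean:8.3f} delta={delta:6.3f}")
```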
Comment 2 Paul Stead 2022-04-11 10:19:48 UTC
Hi - thanks for taking the time to get all this information together.

I think this could be partly linked to https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7965

Regarding the +2: this is "intended", as coded. The first score is for the standard reputation, and the second score for the key is the "learned" score, whether from autolearning or manual learning.

https://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm?revision=1896315&view=markup#l1283
and
https://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm?revision=1896315&view=markup#l1504
etc

With this in mind and the recent adjustment to trunk, could you retest your situation? Feel free to come back with more information to help pinpoint the issue if these updates don't help.
Comment 3 Matija Nalis 2022-11-09 23:45:59 UTC
Thanks Paul for your efforts! 

Unfortunately, I haven't had a chance to try your fix yet, as I had to drop TxRep in favor of AWL in early 2022 to get production working again, and haven't had time to test it and bring it back...

However, since AWL with the SQL backend also seems buggy, and I'll have to invest time in rebuilding the database anyway, I think I might give TxRep another try. It might be worth doing before 4.0 is released, to iron out bugs there and save other people some headaches.

However, I've found another bug in SQLBasedAddrList.pm which might affect not only AWL but TxRep as well: https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8072

Could you take a look there and check whether it affects TxRep as well?

Also, can I just grab SQLBasedAddrList.pm and TxRep.pm from trunk, or do I have to go full trunk (which would be much harder to swallow, as I can basically only test it by deploying it in production)?
Comment 4 Giovanni Bechis 2023-05-02 08:43:22 UTC
Created attachment 5883
Possible fix

The delta formula is:
$delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;

Consider the case where:
- the TxRep database has 15 matching emails ($self->count() = 15)
- a spam message has a score of 40 (spam)
- the calculated TxRep score is 20 (spam)
- the new TxRep score will be (20 + 40) / (1 + 15) - 40 = -36.25
In this case the spam message will have a total score of 40 - 36.25 = 3.75 and it won't be flagged as spam.
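The arithmetic in this example can be checked directly. A short sketch; txrep_delta is a hypothetical Python mirror of the Perl formula quoted above, and, following the comment, the resulting delta is added straight onto the message score:

```python
def txrep_delta(total, count, msgscore):
    """Mirror of: $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore;"""
    return (total + msgscore) / (1 + count) - msgscore


msgscore = 40.0  # spam message score from the example
total = 20.0     # TxRep value plugged in as total() in the example
count = 15       # matching emails in the TxRep database
delta = txrep_delta(total, count, msgscore)
print(delta, msgscore + delta)  # -36.25 3.75
```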

The attached patch doesn't consider those messages in the delta calculation.
Comment 5 Giovanni Bechis 2023-05-04 16:15:02 UTC
Sending        lib/Mail/SpamAssassin/Plugin/TxRep.pm
Transmitting file data .done
Committing transaction...
Committed revision 1909608.