SA Bugzilla – Bug 7235
Plugin::TxRep: txrep.count field reaching 32-bit maxint
Last modified: 2018-10-29 12:56:52 UTC
Found the following in our logs after using the TxRep plugin for a couple of months: SA info: auto-whitelist: sql-based add_score/update 4294967295|-1.709|vscan|no-reply@dropbox.com|amazonses.com|dropbox.com|none: SQL error: Out of range value for column 'count' at row 1 SA info: auto-whitelist: sql-based add_score/update 4294967295|-1.709|vscan|amazonses.com dropbox.com|amazonses.com|dropbox.com|none: SQL error: Out of range value for column 'count' at row 1 SA info: auto-whitelist: sql-based add_score/update 4294967295|2.265|vscan|messages-noreply@fitbit.com|email.fitbit.com|fitbit.com|none: SQL error: Out of range value for column 'count' at row 1 SA info: auto-whitelist: sql-based add_score/update 4294967295|2.265|vscan|email.fitbit.com fitbit.com|email.fitbit.com|fitbit.com|none: SQL error: Out of range value for column 'count' at row 1 SA info: auto-whitelist: sql-based add_score/update 4294967295|2.786|vscan|messages-noreply@fitbit.com|email.fitbit.com|fitbit.com|none: SQL error: Out of range value for column 'count' at row 1 SA info: auto-whitelist: sql-based add_score/update 4294967295|2.786|vscan|email.fitbit.com fitbit.com|email.fitbit.com|fitbit.com|none: SQL error: Out of range value for column 'count' at row 1 Seems the txrep.count in MySQL table is limited to a 32-bit integer. Can't imagine what chain of events lead to this large value in a couple of months, we are not receiving hundreds of messages per second. The info() is reported by SQLBasedAddrList.pm, called from TxRep.pm line 1569: sub add_score [...] } else { $self->{checker}->add_score($self->{entry}, $score); } mysql> select count(*) from txrep where count=2147483647; +----------+ | count(*) | +----------+ | 330 | +----------+ mysql> select count(*) from txrep where count>2147483647; +----------+ | count(*) | +----------+ | 0 | +----------+ mysql> select count(*) from txrep; +----------+ | count(*) | +----------+ | 1374582 | +----------+ mysql> select email,ip,count,totscore,signedby from txrep order by count desc limit 10; +---------------------------------------+------+------------+----------+--------------------+ | email | ip | count | totscore | signedby | +---------------------------------------+------+------------+----------+--------------------+ | 500px.com mandrillapp.com | none | 2147483647 | 88.6 | 500px.com | | amazon-offers@amazon.co.uk | none | 2147483647 | 23.79 | amazonses.com | | 30.memberemail.com novusbio-email.com | none | 2147483647 | -10.291 | 30.memberemail.com | | amazonses.com 1nadan.si | none | 2147483647 | 331.591 | amazonses.com | | amazonses.com 1nadan.si | none | 2147483647 | 331.591 | 1nadan.si | | 30.memberemail.com novusbio-email.com | none | 2147483647 | -10.291 | novusbio-email.com | | 500px.com mandrillapp.com | none | 2147483647 | 88.6 | mandrillapp.com | | amazon-offers@amazon.co.uk | none | 2147483647 | 23.79 | amazon.co.uk | | amavis.org gmail.com | none | 2147483647 | -171.07 | gmail.com | | amavis.org gmail.com | none | 2147483647 | -171.07 | amavis.org | +---------------------------------------+------+------------+----------+--------------------+ After some 350 records the sorted txrep.count slowly starts to fall to lower (but still large) values.
Yep, gotta be a bug. I see it as well.
This is caused by the following lines in lib/Mail/SpamAssassin/SQLBasedAddrList.pm while ( defined($aryref = $sth->fetchrow_arrayref()) ) { if (defined $entry->{count} && defined $aryref->[1]) { $entry->{count} += $aryref->[0]; $entry->{totscore} += $aryref->[1]; } When a message is signed by multiple domains, $entry->{count} is the total of all rows for those different signers. Then when they get written back to the table, each row's new count value is $entry->{count} + 1. With two signers, this changes the behavior from (n + 1) to (2n + 1), meaning you get to that max value after just 31 iterations. For a reproduction, test any message that has multiple valid DKIM signatures from different domains. Not sure how to fix it yet, since it's pretty deep into SQLBasedAddrList and that count value is used in many places throughout txrep.
Presumably $entry->{count} is needed for the calculation, but the individual counts should be separately incremented in the database. The currently behaviour is only right for a single row.
Due to the number of places that would have to be touched in order to properly handle the fact that multiple reputation rows match the message, as a workaround for now we'll just use the newest record's data. If anyone wants to put the time into a better fix, feel free. Committed revision 1698165.