Bug 8064 - Sa-learn takes a very long time to learn each letter
Summary: Sa-learn takes a very long time to learn each letter
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: 3.4.6
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-14 14:09 UTC by Allex
Modified: 2022-10-17 11:52 UTC
CC List: 2 users



Attachment Type Modified Status Actions Submitter/CLA Status
Log and settings application/zip None Allex [NoCLA]
MSG 111.msg and 222.msg application/zip None Allex [NoCLA]

Description Allex 2022-10-14 14:09:47 UTC
Created attachment 5845 [details]
Log and settings

When I train the Bayesian classifier, learning each message takes a very long time, more than 30 seconds. I can't understand why this is happening. Here is a piece of the sa-learn log where you can see the delay (the timestamps jump from 16:18:14 to 16:18:49, and then from 16:18:49 to 16:19:25):


---- begin -----
sa-learn -D --spam --no-sync --username=vmail /tmp/111.msg
...
Oct 14 16:18:14.126 [482455] dbg: uri: canonicalizing parsed uri: mailto:allex@mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: cleaned uri: mailto:allex@mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: added host: mydomain.com domain: mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: canonicalizing domainkeys uri: domainkeys:mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: cleaned uri: domainkeys:mydomain.com
Oct 14 16:18:14.126 [482455] dbg: uri: added host: mydomain.com domain: mydomain.com
Oct 14 16:18:14.358 [482455] dbg: bayes: tokenized body: 11 tokens
Oct 14 16:18:14.358 [482455] dbg: bayes: tokenized uri: 5 tokens
Oct 14 16:18:14.358 [482455] dbg: bayes: tokenized invisible: 0 tokens
Oct 14 16:18:14.360 [482455] dbg: bayes: tokenized header: 145 tokens
Oct 14 16:18:49.346 [482455] dbg: bayes: tokenized body: 11 tokens
Oct 14 16:18:49.346 [482455] dbg: bayes: tokenized uri: 5 tokens
Oct 14 16:18:49.346 [482455] dbg: bayes: tokenized invisible: 0 tokens
Oct 14 16:18:49.347 [482455] dbg: bayes: tokenized header: 145 tokens
Oct 14 16:19:25.725 [482455] dbg: bayes: seen (92892bf23689ce621c550aee0ed36d2e8264a618@sa_generated) put
Oct 14 16:19:25.725 [482455] dbg: bayes: learned '92892bf23689ce621c550aee0ed36d2e8264a618@sa_generated', atime: 1665752160
Oct 14 16:19:25.725 [482455] dbg: TxRep: learning a message
Oct 14 16:19:25.725 [482455] dbg: check: pms new, time limit in 228.393 s
Oct 14 16:19:25.725 [482455] dbg: message: using Return-Path header as EnvelopeFrom: 'allex@mydomain.com'
Oct 14 16:19:25.725 [482455] dbg: check: tagrun - tag SENDERDOMAIN is now ready, value: mydomain.com
Oct 14 16:19:25.725 [482455] dbg: check: tagrun - tag AUTHORDOMAIN is now ready, value: mydomain.com
...

...
----- end ------
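
The pauses in an excerpt like this are easier to spot when the timestamps are compared mechanically. Below is a minimal sketch in Python, assuming every debug line starts with the `Mon DD HH:MM:SS.mmm [pid] level:` prefix seen above. The function name `find_gaps` and the 5-second threshold are illustrative choices, not part of SpamAssassin.

```python
import re

# Timestamp prefix as it appears in the `sa-learn -D` excerpt above, e.g.
# "Oct 14 16:18:14.360 [482455] dbg: bayes: tokenized header: 145 tokens"
LINE_RE = re.compile(r"^\w{3} +\d+ (\d\d):(\d\d):(\d\d\.\d+) \[\d+\] \w+: (.*)$")


def find_gaps(lines, threshold=5.0):
    """Return (gap_seconds, message_before, message_after) for each pause
    of at least `threshold` seconds between consecutive debug lines."""
    gaps = []
    prev_time = prev_msg = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip "..." and anything that is not a debug line
        hours, minutes, seconds, msg = m.groups()
        now = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        if prev_time is not None:
            delta = now - prev_time
            if delta < 0:
                delta += 86400  # the log rolled over midnight
            if delta >= threshold:
                gaps.append((round(delta, 3), prev_msg, msg))
        prev_time, prev_msg = now, msg
    return gaps


# Four lines copied from the excerpt above.
sample = [
    "Oct 14 16:18:14.360 [482455] dbg: bayes: tokenized header: 145 tokens",
    "Oct 14 16:18:49.346 [482455] dbg: bayes: tokenized body: 11 tokens",
    "Oct 14 16:18:49.347 [482455] dbg: bayes: tokenized header: 145 tokens",
    "Oct 14 16:19:25.725 [482455] dbg: bayes: seen (92892bf23689ce621c550aee0ed36d2e8264a618@sa_generated) put",
]

for gap, before, after in find_gaps(sample):
    print(f"{gap:7.3f}s between '{before}' and '{after}'")
```

On these lines it reports two pauses, about 35 seconds before the second tokenization pass and about 36 seconds before the `seen ... put` line. Both fall inside the `bayes:` steps rather than in the URI parsing before them.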

At first I thought it might be AskDNS, so I commented out the plugin in the v340.pre file, but it didn't help.

I can't understand why there is a delay at these points. I tried running SpamAssassin without MySQL, and the training delay was about the same.
I haven't set any unusual parameters. Everything was set up from a clean install.
I attach the full sa-learn log output, as well as all my configuration files.

Otherwise, SpamAssassin works as it should in a Postfix + Dovecot + SpamAssassin + Roundcube setup (Ubuntu 20.04). I need to get rid of the delay, because when a user clicks the "Spam" button in Roundcube, it takes a very long time for the email to be processed. Users complain about the long wait.
Comment 1 Henrik Krohns 2022-10-15 12:33:05 UTC
Is this for any message or that specific 111.msg? Can you share it?
Comment 2 Allex 2022-10-17 11:52:25 UTC
Created attachment 5850 [details]
MSG 111.msg and 222.msg

The delay is not limited to this email; it occurs on all emails. I have now tried moving some emails to spam using the "Spam" button in Roundcube. Some emails move quickly, in about 2-3 seconds. For the others the delay is different each time, from 10 to 60 seconds. I am attaching 111.msg, along with 222.msg, which moved very quickly through the Roundcube web interface. I suspected DNS might be the cause, so I tried different DNS servers on the server (1.1.1.1, 8.8.8.8), but the problem still exists.

I do not know what else can be done. :(