Bug 7881 - sa-learn queries uribl for already known messages
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: 3.4.4
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2021-01-20 14:04 UTC by Alexander Kauer
Modified: 2021-11-10 01:00 UTC

Description Alexander Kauer 2021-01-20 14:04:38 UTC
When sa-learn is run on messages it has already learned, it still makes uribl queries for the URLs in them. With a lot of messages, this can result in being blocked by uribl (URIBL_BLOCKED) for making too many requests in a short time. A caching name server doesn't help, because the messages can contain a lot of different URLs.

How to reproduce:

1. Start tcpdump to obtain all queries made:

        tcpdump -i lo udp and port 53 | grep uribl

   If you don't use a local DNS server, you might need to change "lo" to your network device.

2. Run sa-learn, e.g.

        /usr/bin/sa-learn --ham /home/USER/Maildir/cur/

3. The tcpdump output shows a lot of domains like example.org.multi.uribl.com.

4. Repeat running sa-learn.

5. You see the DNS queries again.

My bayes_seen file seems to get updated; at least the file's timestamp changes. Running sa-learn with -L gets rid of the queries and of tcpdump's output (also related: bug 5837). The issue isn't that sa-learn makes queries in general, but that it makes them for already known messages.
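
To gauge how many distinct uribl lookups one run causes, the tcpdump output from step 1 can be reduced to the unique queried names. This is only a sketch: the function name `unique_uribl_names` and the file name `queries.txt` are made up for illustration, and it assumes the queried names end in multi.uribl.com as in step 3.

```shell
# Hypothetical helper (not part of SpamAssassin): read tcpdump text output
# on stdin and print each distinct *.multi.uribl.com name once.
unique_uribl_names() {
    grep -oE '[A-Za-z0-9._-]+\.multi\.uribl\.com' | sort -u
}

# Example usage, assuming the capture was saved to a file first:
#   tcpdump -l -i lo udp and port 53 > queries.txt   (stop with Ctrl-C)
#   unique_uribl_names < queries.txt | wc -l
```

If the count stays about the same on a second sa-learn run over the same directory, the queries are being repeated for messages that are already known.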
Comment 1 Henrik Krohns 2021-04-08 07:05:50 UTC
I can't reproduce this with 3.4.4 or newer versions.

What does sa-learn display if you add -D (debug) option? Are there dns: or async: lines that clearly show launching queries?
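
One way to pick those lines out of the debug output is sketched below. The helper name `dns_debug_lines` is hypothetical, and the sketch assumes that debug output goes to stderr and that each line carries its channel name followed by a colon and a space (as in "dns: " and "async: ").

```shell
# Hypothetical helper (not part of SpamAssassin): keep only the debug
# lines that belong to the dns: or async: channels.
dns_debug_lines() {
    grep -E '(dns|async): '
}

# Example usage; 2>&1 merges stderr into the pipe so grep can see it:
#   sa-learn -D --ham /home/USER/Maildir/cur/ 2>&1 | dns_debug_lines
```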
Comment 2 Henrik Krohns 2021-04-15 05:07:01 UTC
Please try latest version. Make sure the queries don't come from elsewhere on the server. If you still have the problem, feel free to reopen.
Comment 3 Henrik Krohns 2021-04-15 12:53:21 UTC
Well, it seems that using TxRep with 3.4 launches DNS queries for some reason.
Comment 4 Greg Troxel 2021-04-15 16:29:19 UTC
I have seen this too, when running sa-learn with TxRep enabled, without -L, on 3.4.5.

The basic problem (summarizing unreasonably) seems to be that sa-learn is an interface to the learn method of any plugin that has one, and that the claims the man page makes about network access aren't true. Also, the man page may create the expectation that learning is a quick process of scanning tokens (as it is for bayes), but for TxRep it seems to involve scoring the message.

I suggest two changes:

Fix sa-learn(1) to say that it's an interface to learning methods, and that the behavior is controlled by the plugin. Remove the notion that -L is not needed, and instead say that -L is strongly recommended for bulk learning. Add a command to show the current learn methods that would be invoked.

Change TxRep to not rescore on learning, and instead just use the configured ham/spam scores. Once the user has declared a message ham or spam, the score is not really of interest.
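
Until the man page says so, the -L recommendation for bulk learning could be enforced with a small wrapper. This is a sketch, not something SpamAssassin ships: the function name `learn_local` and the `SA_LEARN` variable are assumptions made for illustration. Only the -L, --ham and --spam options of sa-learn are used.

```shell
# Hypothetical wrapper (not part of SpamAssassin): always learn with -L
# so that no plugin can make network queries during bulk learning.
# SA_LEARN may be overridden to point at a different sa-learn binary.
SA_LEARN=${SA_LEARN:-sa-learn}

learn_local() {
    class=$1   # "ham" or "spam"
    case $class in
        ham|spam) shift ;;
        *) echo "usage: learn_local ham|spam PATH..." >&2; return 2 ;;
    esac
    "$SA_LEARN" -L "--$class" "$@"
}

# Example usage:
#   learn_local ham /home/USER/Maildir/cur/
```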
Comment 5 Henrik Krohns 2021-04-18 14:47:21 UTC
Some plugins in 3.4 launched queries at parsed_metadata, which apparently wasn't called without TxRep. TxRep explicitly calls extract_message_metadata() because it needs relay data.

Committed some fixes for 3.4, but I doubt a 3.4.7 release will be seen; all work is for 4.0 now.

Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/ASN.pm
Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/AskDNS.pm
Sending        spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        spamassassin-3.4/lib/Mail/SpamAssassin.pm
Transmitting file data ....done
Committing transaction...
Committed revision 1888903.
Comment 6 Greg Troxel 2021-04-18 14:50:34 UTC
Are you saying this is fixed in 4, or wasn't a bug there, or ?
Comment 7 Henrik Krohns 2021-04-18 14:54:00 UTC
(In reply to Greg Troxel from comment #6)
> Are you saying this is fixed in 4, or wasn't a bug there, or ?

Plugins in 4 work differently, so there is no issue there.