SA Bugzilla – Bug 7881
sa-learn queries uribl for already known messages
Last modified: 2021-11-10 01:00:19 UTC
When running sa-learn on already known messages, uribl queries are made for the URLs in them. This can result in getting blacklisted by uribl (URIBL_BLOCKED) due to too many requests in a short time if one has a lot of messages. A caching name server doesn't help, as the messages can contain a lot of different URLs.

How to reproduce:
1. Start tcpdump to capture all queries made:
   tcpdump -i lo udp and port 53 | grep uribl
   If you don't use a local DNS server, you might need to change "lo" to your network device.
2. Run sa-learn, e.g.
   /usr/bin/sa-learn --ham /home/USER/Maildir/cur/
3. The tcpdump output shows a lot of domains like example.org.multi.uribl.com.
4. Run sa-learn again.
5. The DNS queries appear again.

My bayes_seen file seems to get updated; at least the file's timestamp changes. Running sa-learn with -L gets rid of the queries and of tcpdump's output (also related: bug 5837). The issue isn't that sa-learn makes queries in general, but that they are made for already known messages.
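To compare the first and second sa-learn run with numbers rather than by eye, the tcpdump output can be saved to a file and summarized. The following is only a sketch: the function name count_uribl_queries is made up for illustration, and it assumes the capture was saved as text (for example by redirecting tcpdump's output to a file) and that the looked-up names end in multi.uribl.com as in step 3.

```shell
#!/bin/sh
# Hypothetical helper: count the distinct names under multi.uribl.com
# that appear in tcpdump text output read from stdin.
count_uribl_queries() {
    # -o prints only the matching part of each line, one match per line;
    # sort -u then collapses repeated lookups of the same name.
    grep -Eo '[A-Za-z0-9._-]+\.multi\.uribl\.com' | sort -u | wc -l | tr -d ' '
}
```

Usage, assuming the two captures were saved as run1.txt and run2.txt: `count_uribl_queries < run1.txt` and `count_uribl_queries < run2.txt`. A non-zero count for the second run would indicate that queries are repeated for already learned messages.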
I can't reproduce this with 3.4.4 or newer versions. What does sa-learn display if you add the -D (debug) option? Are there dns: or async: lines that clearly show queries being launched?
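One way to pull just those lines out of the debug output is sketched below. The function name filter_dns_debug is hypothetical, and the sketch assumes that debug lines carry a "dns: " or "async: " channel prefix as described above and that the debug output is written to stderr.

```shell
#!/bin/sh
# Hypothetical helper: keep only debug lines from the dns and async channels.
filter_dns_debug() {
    grep -E '(dns|async): '
}

# Possible usage (2>&1 because debug output is assumed to go to stderr):
#   sa-learn -D --ham /home/USER/Maildir/cur/ 2>&1 | filter_dns_debug
```

If this prints nothing for a second run over the same directory, sa-learn itself is probably not the source of the queries.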
Please try the latest version. Make sure the queries don't come from elsewhere on the server. If you still have the problem, feel free to reopen.
Well, it seems using TxRep with 3.4 launches DNS queries for some reason.
I have seen this too, when running sa-learn with TxRep enabled, without -L, on 3.4.5.

The basic problem (to oversimplify) seems to be that sa-learn is an interface to the learn method of any plugin that has one, and that while the man page makes claims about network access, they aren't true. Also, the man page may create an expectation that learning is a quick process of scanning tokens (as it is for bayes), but for TxRep it seems to involve scoring the message.

I suggest two changes:
1. Fix sa-learn(1) to say that it's an interface to learning methods, and that the behavior is controlled by the plugin. Remove the notion that -L is not needed, and instead say that -L is strongly recommended for bulk learning. Add a command to show the current learn methods that would be invoked.
2. Change TxRep to not rescore on learning, and instead just use the configured ham/spam scores. Once the user has declared a message ham or spam, the score is not really of interest.
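Until the documentation or the plugin changes, bulk learning can simply always pass -L. A minimal wrapper sketch follows; the function name learn_local and the SA_LEARN override variable are made up for illustration and are not part of SpamAssassin.

```shell
#!/bin/sh
# Hypothetical wrapper: always run sa-learn in local mode (-L), so that
# no plugin can make network lookups during bulk learning.
#   $1 = ham or spam
#   $2 = mail directory or file to learn from
# SA_LEARN is an assumed override for the binary path; it defaults to sa-learn.
learn_local() {
    ${SA_LEARN:-sa-learn} -L "--$1" "$2"
}
```

Possible usage: `learn_local ham /home/USER/Maildir/cur/`.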
Some plugins in 3.4 launched queries at parsed_metadata, which apparently wasn't called without TxRep. TxRep explicitly calls extract_message_metadata() because it needs relay data.

Committed some fixes for 3.4, but I doubt a 3.4.7 release will be seen; all work is for 4.0 now.

Sending spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/ASN.pm
Sending spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/AskDNS.pm
Sending spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending spamassassin-3.4/lib/Mail/SpamAssassin.pm
Transmitting file data ....done
Committing transaction...
Committed revision 1888903.
Are you saying this is fixed in 4, or wasn't a bug there, or ?
(In reply to Greg Troxel from comment #6)
> Are you saying this is fixed in 4, or wasn't a bug there, or ?

Plugins in 4 work differently, so no issue there.