Bug 7934 - archive-iterator errors in sa-learn when something else marks messages as read in maildir
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Learner
Version: 3.4.6
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Depends on:
Reported: 2021-10-06 05:07 UTC by david
Modified: 2021-10-06 05:51 UTC
Description david 2021-10-06 05:07:42 UTC
If I run sa-learn on the cur folder of a Maildir, and simultaneously mark an unread message as read in that same Maildir, I get errors like this:

archive-iterator: unable to open [redacted]:2,: No such file or directory

Looking at https://github.com/apache/spamassassin/blob/spamassassin_release_3_4_6/lib/Mail/SpamAssassin/ArchiveIterator.pm, I think _scan_targets finds all the files before I mark the message as read, and then _run iterates through those messages. Marking a message as read while _run is iterating renames the file, so when _run_file is called on that file, it calls _mail_open, which fails to open the old filename. Right below where _run_file calls _mail_open, https://github.com/apache/spamassassin/blob/4a1fe99da9296364be0c50f02d2a73b5af74207a/lib/Mail/SpamAssassin/ArchiveIterator.pm#L354-L357 appears to check whether the file exists and to ignore it if it doesn't. Should that check happen before the call to _mail_open? I'm not used to Perl, though, so I'm not confident I understood that correctly.
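
The suspected sequence (scan first, rename in between, open afterwards) can be reproduced outside SpamAssassin. The following is a minimal Python sketch of that race, not the ArchiveIterator code itself: `scan_targets`, `run`, and `demo`, along with the sample filename, are hypothetical stand-ins chosen for illustration, and the rename simply appends the Maildir `S` (seen) flag the way a mail client would.

```python
import os
import tempfile


def scan_targets(cur_dir):
    """Snapshot the file list up front (hypothetical stand-in for the scan phase)."""
    return sorted(os.path.join(cur_dir, name) for name in os.listdir(cur_dir))


def run(paths):
    """Open each snapshotted path and record the ones that no longer exist."""
    opened, missing = [], []
    for path in paths:
        try:
            with open(path, "rb") as fh:
                fh.read()
            opened.append(path)
        except FileNotFoundError:
            # The name seen during the scan is stale: the file was renamed.
            missing.append(path)
    return opened, missing


def demo():
    with tempfile.TemporaryDirectory() as maildir:
        cur = os.path.join(maildir, "cur")
        os.mkdir(cur)
        # Hypothetical unread message: empty flag list after ":2,".
        unread = os.path.join(cur, "1633496862.M1P1.host:2,")
        with open(unread, "wb") as fh:
            fh.write(b"Subject: test\n\nbody\n")

        paths = scan_targets(cur)        # the scan happens first
        os.rename(unread, unread + "S")  # a client marks the message as read
        return run(paths)                # the old name can no longer be opened


opened, missing = demo()
print(len(opened), len(missing))  # prints "0 1"
```

The only file in the snapshot ends up in `missing`, which matches the "No such file or directory" error above for a name ending in `:2,`.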

Would it be possible to ignore files that disappear from a Maildir while sa-learn is running? Or maybe scan for files and run at the same time, to shorten the window in which this problem can happen?
Comment 1 Henrik Krohns 2021-10-06 05:51:44 UTC
Yeah, the code is a bit convoluted.

To properly ignore disappeared Maildir files, _run_message/_run_file need to be passed info that a Maildir is being handled and act accordingly. Otherwise, ignoring missing files across the board would mean it might not warn for other cases of missing files.
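
As a rough illustration of that idea, here is a minimal Python sketch (not the Perl in ArchiveIterator.pm) of an open routine that is told whether it is handling a Maildir. The function `run_file` and its `is_maildir` parameter are hypothetical names invented for this sketch and do not exist in SpamAssassin.

```python
import warnings


def run_file(path, is_maildir=False):
    """Return the file's bytes, or None if the file could not be found.

    `is_maildir` is a hypothetical flag: the caller sets it when the path
    came from scanning a Maildir, where renames are a normal event.
    """
    try:
        with open(path, "rb") as fh:
            return fh.read()
    except FileNotFoundError:
        if is_maildir:
            # Flag changes rename files in a live Maildir, so skip quietly.
            return None
        # For any other target a missing file is unexpected: keep the warning.
        warnings.warn(f"unable to open {path}: No such file or directory")
        return None
```

With this shape, `run_file(path, is_maildir=True)` silently skips a message that was renamed after the scan, while `run_file(path)` still warns about a missing mbox or plain file, so the existing diagnostics are preserved for the non-Maildir cases.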