Bug 7445

Summary: The default mbox separator regex is dangerously pedantic
Product: Spamassassin Reporter: RW <rwmaillists>
Component: Libraries Assignee: SpamAssassin Developer Mailing List <dev>
Status: RESOLVED FIXED    
Severity: blocker CC: billcole, kmcgrail
Priority: P2    
Version: 3.4.1   
Target Milestone: 3.4.2   
Hardware: All   
OS: All   
Whiteboard:
Attachments: patch to allow single spacing

Description RW 2017-07-10 20:36:34 UTC
Created attachment 5456
patch to allow single spacing

In the user list thread "sa-learn won't read db created via MSTOR" sa-learn found no emails in an mbox file because the separator looked like this:

From - Sat Jul 8 01:02:28 2017

The important detail is the single space before the 8. The default regex looks for " .\d ", which requires two characters for the day of the month, so a single-digit day has to be padded with either an extra space or a leading 0.

What's particularly bad is that the failure is not consistent. The OP was lucky that it happened on the 8th; a couple of days later, on a two-digit date, it would have appeared to work. The worst case is a file where the dates are mixed: separators with an unpadded single-digit day are not recognised, so many of the emails get concatenated. Changing "." to ".?" fixes the problem - see patch.
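
A rough way to see the effect, sketched in Python rather than in SpamAssassin's own Perl. Only the day-of-month fragment (" .\d " versus " .?\d ") is taken from this report; the rest of each pattern, and the names STRICT, RELAXED and count_messages, are simplified assumptions for illustration and not the actual default regex.

```python
import re

# Hypothetical, simplified separator patterns. Only the day-of-month part
# (" .\d " versus " .?\d ") comes from the report; the rest is an assumption.
STRICT = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} .\d \d\d:\d\d:\d\d \d{4}$"
)
RELAXED = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} .?\d \d\d:\d\d:\d\d \d{4}$"
)


def count_messages(lines, pattern):
    """Count the lines that the given pattern accepts as mbox separators."""
    return sum(1 for line in lines if pattern.match(line))


# A mixed mbox: unpadded single-digit days around a two-digit day.
mbox = [
    "From - Sat Jul 8 01:02:28 2017",
    "Subject: first",
    "",
    "From - Mon Jul 10 09:15:00 2017",
    "Subject: second",
    "",
    "From - Tue Aug 1 11:30:45 2017",
    "Subject: third",
    "",
]

print(count_messages(mbox, STRICT))   # only the "Jul 10" separator matches
print(count_messages(mbox, RELAXED))  # all three separators match
```

With the strict pattern the first message is never found and the third is not split off from the second, which is the mixed-date case described above. Days padded with an extra space or a leading 0 match under both patterns.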
Comment 1 Bill Cole 2018-09-04 22:28:30 UTC
This change looks good to me. Makes padding of single-digit dates in mbox separator lines optional. 

I believe this qualifies as trivial under this criterion on the DevelopmentMode page:


    very simple, non-controversial, and absolutely safe bug fixes (i.e.: removing repetitive my() enclosing sections)

Committed in r1840072