SA Bugzilla – Full Text Bug Listing
Summary: | [review] MIME_QP_LONG_LINE triggering on valid email | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Jason Haar <jhaar> |
Component: | Rules | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | axb.lists, blentz, bugzilla.spamassassin.org, kmcgrail, ralston |
Priority: | P5 | ||
Version: | 3.2.0 | ||
Target Milestone: | 3.3.2 | ||
Hardware: | Other | ||
OS: | other | ||
Whiteboard: | ready to commit | ||
Attachments: | email demonstrating bad QP; Proposed patch to change a QP line limit from 76 to 78 | ||
Description
Jason Haar
2007-05-31 19:32:02 UTC
Created attachment 3955
email demonstrating bad QP
I concur. I've had several FPs at my site, with this rule being the straw that breaks the camel's back. This is mostly happening on newsletter-type emails (Boston Globe, for example), scoring a combination of MIME_HTML_ONLY + MIME_QP_LONG_LINE + DCC/Razor2/Pyzor (because it's bulk, but it's not spam). I'm thinking maybe the point value should be adjusted down, rather than changing the length, if the existing length is based on a particular RFC defining QP.

FWIW, I wrote up a quoted-printable length function similar to the base64 length function. Here are some results:

| OVERALL% | SPAM% | HAM% | S/O | RANK | SCORE | NAME |
|---|---|---|---|---|---|---|
| 0.352 | 0.4118 | 0.0100 | 0.976 | 1.00 | 0.00 | T_QP_LENGTH_84_85 |
| 0.253 | 0.2971 | 0.0000 | 1.000 | 0.92 | 0.00 | T_QP_LENGTH_82_83 |
| 0.314 | 0.3666 | 0.0100 | 0.973 | 0.88 | 0.00 | T_QP_LENGTH_83_84 |
| 2.567 | 2.9953 | 0.0903 | 0.971 | 0.79 | 0.00 | T_QP_LENGTH_81_82 |
| 4.209 | 4.8874 | 0.2911 | 0.944 | 0.71 | 0.00 | T_QP_LENGTH_79_80 |
| 14.397 | 16.4744 | 2.3989 | 0.873 | 0.53 | 0.00 | MIME_QP_LONG_LINE |
| 4.975 | 5.7196 | 0.6725 | 0.895 | 0.50 | 0.00 | T_QP_LENGTH_90_INF |
| 5.701 | 6.4911 | 1.1342 | 0.851 | 0.46 | 0.00 | T_QP_LENGTH_78_79 |
| 0.142 | 0.1668 | 0.0000 | 1.000 | 0.45 | 0.00 | T_QP_LENGTH_89_90 |
| 2.755 | 3.1656 | 0.3814 | 0.892 | 0.42 | 0.00 | T_QP_LENGTH_80_81 |
| 10.542 | 9.4239 | 17.0029 | 0.357 | 0.25 | 0.00 | T_QP_LENGTH_77_78 |
| 0.230 | 0.2571 | 0.0703 | 0.785 | 0.24 | 0.00 | T_QP_LENGTH_87_88 |
| 0.267 | 0.2902 | 0.1305 | 0.690 | 0.15 | 0.00 | T_QP_LENGTH_86_87 |
| 0.243 | 0.2624 | 0.1305 | 0.668 | 0.03 | 0.00 | T_QP_LENGTH_85_86 |
| 0.095 | 0.0990 | 0.0703 | 0.585 | 0.00 | 0.00 | T_QP_LENGTH_88_89 |

So there's no clear winner here, though 81-85 may be interesting.

76 is the max, BUT that is supposed to exclude the trailing CR/LF (if used), so testing for 76 or 78 seems acceptable.

I can confirm this behavior as of 2009-07-07. This mailer:

X-Mailer: Apple Mail (2.935.3)

has an error in the way it performs QP encoding. Specifically, if the last character in a line is a raw "=" character, it doesn't seem to include the expansion caused by encoding ("=" -> "=3D") in its line-length calculation, which means that it will generate a QP line that is 2 characters too long.
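To make the length arithmetic in the Apple Mail report concrete, here is a minimal Python sketch that QP-encodes the report's 66-character example line without wrapping and measures the result. It is not Apple Mail's encoder or SpamAssassin code: `qp_encode_line` and `RFC2045_MAX` are hypothetical names, and the encoder is simplified (it ignores RFC 2045's trailing-whitespace rule).

```python
RFC2045_MAX = 76  # max encoded line length, counting a trailing soft-break "="


def qp_encode_line(text: str) -> str:
    """Quoted-printable encode one line WITHOUT inserting soft line breaks.

    Simplified sketch: printable ASCII (32-126) except "=" is kept literally,
    every other byte becomes "=XX". RFC 2045's trailing-whitespace rule is
    not handled.
    """
    out = []
    for byte in text.encode("utf-8"):
        if byte == ord("=") or not 32 <= byte <= 126:
            out.append(f"={byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


raw = "blah " * 9 + "foo " * 4 + "====="  # the 66-character line from the report
# Each "=" expands to "=3D" (+2 chars); the final "=" appended here is the
# soft line break seen at the end of the reported 77-character line.
encoded = qp_encode_line(raw) + "="
print(len(raw), len(encoded), len(encoded) > RFC2045_MAX)  # 66 77 True
```

The five raw "=" characters add 10 characters and the soft line break adds one more, so 66 + 10 + 1 = 77, one over the RFC 2045 limit of 76.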
For example, this 66-character line:

blah blah blah blah blah blah blah blah blah foo foo foo foo =====

gets QP-encoded to this 77-character line:

blah blah blah blah blah blah blah blah blah foo foo foo foo =3D=3D=3D=3D=3D=

This trips the MIME_QP_LONG_LINE test. I suspect "Apple Mail (2.935.3)" could produce a 78-character line as well, depending on the column in which the final raw "=" falls. Can we change the maximum length for the MIME_QP_LONG_LINE test from 76 characters to 78 characters, please? That should stop this test from erroneously hitting on ham "Apple Mail (2.935.3)" mail.

Status? I'm still seeing problems with this hitting on ham, granted I'm still running 3.2.5. I got a note from the user that the mail that triggered this was composed using Outlook Web Access.

Created attachment 4766
Proposed patch to change a QP line limit from 76 to 78

- if (length > 77) {
+ # RFC 5322: Each line SHOULD be no more than 78 characters,
+ # excluding the CRLF
+ # RFC 2045: The Quoted-Printable encoding REQUIRES that
+ # encoded lines be no more than 76 characters long.
+ # Bug 5491: 6% of email classified as HAM by SA triggered the
+ # MIME_QP_LONG_LINE rule. Apple Mail can generate a QP-line
+ # that is 2 chars too long. Same goes for Outlook Web Access.
+ # lines include one trailing \n character
+ # if (length > 76+1) { # conforms to RFC 5322 and RFC 2045
+ if (length > 78+1) { # conforms to RFC 5322 only, not RFC 2045

trunk:
Bug 5491: MIME_QP_LONG_LINE triggering on valid email
change a QP line limit from 76 to 78
Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Committed revision 951065.

(In reply to comment #8)
+1

Before we commit this, can we create a test rule that gives us an idea of the change in HAM/SPAM %'s that hit the rule when it's changed from 76 to 78?

In short, the code looks great but my concern is that by NOT enforcing the RFC, is the rule going to be pointless.

(In reply to comment #10)
> Before we commit this, can we create a test rule that gives us an idea of the
> change in HAM/SPAM %'s that hit the rule when it's changed from 76 to 78?
>
> In short, the code looks great but my concern is that by NOT enforcing the RFC,
> is the rule going to be pointless.

I'll change my vote to a +1. To delay the change is almost as pointless.

KAM

branch 3.3:
Bug 5491: MIME_QP_LONG_LINE triggering on valid email
Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Committed revision 1100005.

Closing.
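The effect of the limit change in attachment 4766 can be approximated with the Python sketch below. It is not the Perl in MIMEEval.pm: `qp_long_line`, `OLD_LIMIT`, and `NEW_LIMIT` are hypothetical names, and the sketch simply compares each encoded line (without its terminator) against a maximum, which for newline-terminated lines matches the patch's "length > max+1" test on lines that include one trailing "\n".

```python
def qp_long_line(qp_body: str, max_len: int) -> bool:
    """Return True if any line of a QP-encoded body exceeds max_len characters.

    Length excludes the line terminator (CRLF is normalized to LF first).
    For newline-terminated lines this matches a "length > max_len+1" test
    on lines measured together with one trailing "\\n".
    """
    lines = qp_body.replace("\r\n", "\n").split("\n")
    return any(len(line) > max_len for line in lines)


OLD_LIMIT = 76  # RFC 2045 limit: "if (length > 77)" before the patch
NEW_LIMIT = 78  # RFC 5322 SHOULD limit: "if (length > 78+1)" after the patch

line77 = "blah " * 9 + "foo " * 4 + "=3D" * 5 + "="  # Apple Mail example
body = line77 + "\r\n" + "short line\r\n"

print(qp_long_line(body, OLD_LIMIT))  # True: 77 > 76
print(qp_long_line(body, NEW_LIMIT))  # False: 77 <= 78
```

With the 78-character limit, a 79-character line still hits, so the rule keeps flagging clearly overlong QP lines while tolerating the 1-2 character overshoot described above for Apple Mail and Outlook Web Access.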