SA Bugzilla – Full Text Bug Listing

| Field | Value | Field | Value |
|---|---|---|---|
| Summary | BODY_SINGLE_WORD triggers on base64-encoded text with more than one word. | | |
| Product | Spamassassin | Reporter | Mark London <mrl> |
| Component | Rules | Assignee | SpamAssassin Developer Mailing List <dev> |
| Status | RESOLVED DUPLICATE | | |
| Severity | normal | CC | billcole, rwmaillists |
| Priority | P2 | | |
| Version | 3.4.0 | | |
| Target Milestone | Undefined | | |
| Hardware | Other | | |
| OS | Linux | | |
| Whiteboard | | | |
| Attachments | Email shows problem. | | |
That message is badly malformed. The Content-Type header is invalid (missing spaces), there is no MIME-Version header, the Message-ID header is invalid (missing angle brackets), and some of the putative MIME parts are improperly encoded into lines an order of magnitude longer than MIME allows. As a result, there is no formally correct way to parse this message. That any software can make sense of it at all is a tribute to how lenient mail software is.

It is unclear to me why it hits BODY_SINGLE_WORD, but it also incorrectly hits HTML_IMAGE_ONLY_20 and BODY_URI_ONLY, and I expect that all of these are due to SA being confused by the compound pathology of the message.

Note that the rules it correctly hits (BASE64_LENGTH_79_INF, BAYES_50, MIME_HEADER_CTYPE_ONLY, MISSING_SUBJECT, and INVALID_MSGID) add up to 5.3, which is above the default required_score of 5.0. So even if we figured out precisely how the three bogus hits happened and fixed that, SA would (by default) still call it spam.

The "garbage in, garbage out" principle applies here. It is not a bug for SpamAssassin to misparse a message that technically has no correct parsing.

Actually, it is a bug that I pointed out some time ago; I don't recall the bug number. The problem is in:

```
body     __BODY_TEXT_LINE         /^\s*\S/
tflags   __BODY_TEXT_LINE         multiple maxhits=3
```

The count usually includes the Subject line, but only if the header is present and contains a non-space character. In the attached email, the multi-word paragraph is counted as if it were the subject. IMO it should be:

```
body     __BODY_TEXT_LINE_FULL    /^\s*\S/
tflags   __BODY_TEXT_LINE_FULL    multiple maxhits=3
header   __SUBJECT_HAS_NON_SPACE  Subject =~ /\S/
meta     __BODY_TEXT_LINE         __BODY_TEXT_LINE_FULL - __SUBJECT_HAS_NON_SPACE
```

The arithmetic for `__BODY_SINGLE_WORD`, `__BODY_URI_ONLY`, and `__EMPTY_BODY` then needs to be adjusted for `__BODY_TEXT_LINE` being one smaller.
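The counting argument in the last comment can be modeled in a few lines of Python. This is a minimal sketch, not SpamAssassin code. It assumes, as the comment states, that body rules see the Subject (when the header exists) as the first chunk of body text. It ignores everything else SA does when rendering a body, such as decoding, HTML rendering, and paragraph splitting. The function names are hypothetical.

```python
import re
from typing import List, Optional

# Pattern from __BODY_TEXT_LINE / __BODY_TEXT_LINE_FULL: a chunk with a non-space character.
TEXT_LINE = re.compile(r"^\s*\S")
# Pattern from the proposed __SUBJECT_HAS_NON_SPACE header rule.
NON_SPACE = re.compile(r"\S")


def rendered_body_chunks(subject: Optional[str], body: List[str]) -> List[str]:
    """Hypothetical model of the text body rules see: the Subject (if the header
    exists) comes first, followed by the decoded body chunks."""
    chunks = [subject] if subject is not None else []
    return chunks + list(body)


def count_hits(pattern: "re.Pattern[str]", chunks: List[str], maxhits: int) -> int:
    """Model of 'tflags multiple maxhits=N': count matching chunks, capped at N."""
    hits = sum(1 for chunk in chunks if pattern.search(chunk))
    return min(hits, maxhits)


def body_text_line_current(subject: Optional[str], body: List[str]) -> int:
    # Current rule: a non-blank Subject silently occupies one of the counted slots.
    return count_hits(TEXT_LINE, rendered_body_chunks(subject, body), maxhits=3)


def body_text_line_proposed(subject: Optional[str], body: List[str]) -> int:
    # Proposed meta: __BODY_TEXT_LINE_FULL - __SUBJECT_HAS_NON_SPACE.
    full = count_hits(TEXT_LINE, rendered_body_chunks(subject, body), maxhits=3)
    subject_has_non_space = int(subject is not None and bool(NON_SPACE.search(subject)))
    return full - subject_has_non_space


if __name__ == "__main__":
    paragraph = ["This paragraph has several words in it."]
    print(body_text_line_current(None, paragraph), body_text_line_proposed(None, paragraph))  # prints: 1 1
    print(body_text_line_current("Hello", paragraph), body_text_line_proposed("Hello", paragraph))  # prints: 2 1
```

Under the current rule, a count of 1 means "Subject only" when a Subject exists, but "one body chunk" when it does not. That ambiguity is what the comment describes. The proposed meta always counts body chunks only. With a Subject and three or more body chunks, the capped count drops from 3 to 2, which is why the downstream arithmetic needs adjusting.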
Created attachment 5493
Email shows problem.

See attachment. There is a paragraph of text in the following MIME attachment, but it's triggering the "one word text message" rule:

```
Content-Type: text/plain;Name="text_0.txt";Charset="utf-8"
Content-Disposition: Attachment;Filename="text_0.txt";Charset="utf-8"
Content-Location: text_0.txt
Content-Transfer-Encoding: base64
```
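The sketch below shows how a lenient parser reads this header block. It uses Python's standard `email` package, which is not what SpamAssassin uses. The package accepts the headers as written, including parameters with no spaces after the semicolons and mixed-case parameter names. The paragraph text is a made-up stand-in because the real attachment content is not reproduced in the report. `build_part` and `describe` are hypothetical helper names.

```python
import base64
from email import message_from_string
from email.message import Message

# Header block copied verbatim from the report above.
HEADERS = (
    'Content-Type: text/plain;Name="text_0.txt";Charset="utf-8"\n'
    'Content-Disposition: Attachment;Filename="text_0.txt";Charset="utf-8"\n'
    "Content-Location: text_0.txt\n"
    "Content-Transfer-Encoding: base64\n"
)

# Hypothetical stand-in for the attachment's paragraph (the real text is not shown).
SAMPLE_TEXT = "This is a hypothetical paragraph with several words in it."


def build_part(text: str) -> Message:
    """Hypothetical helper: base64-encode `text` under the reported headers and parse it."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return message_from_string(HEADERS + "\n" + encoded + "\n")


def describe(part: Message) -> dict:
    """Report what a lenient parser extracts from the part."""
    charset = part.get_content_charset() or "utf-8"
    decoded = part.get_payload(decode=True).decode(charset)
    return {
        "content_type": part.get_content_type(),        # parameters parse without spaces
        "charset": charset,                             # "Charset" matched case-insensitively
        "filename": part.get_filename(),                # read from Content-Disposition
        "disposition": part.get_content_disposition(),  # lowercased disposition type
        "word_count": len(decoded.split()),
    }


if __name__ == "__main__":
    print(describe(build_part(SAMPLE_TEXT)))
```

With the stand-in text, `describe` reports a ten-word `text/plain` attachment. Substituting the real decoded payload would be a quick way to check what the body rules ought to have seen.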