SA Bugzilla – Bug 7844
U+1D5B5 MATHEMATICAL SANS-SERIF...
Last modified: 2020-08-12 19:54:16 UTC
It seems this slips through SpamAssassin:

From: "π‘π πππππ π€π£" <newsletter@express.be>
Subject: 𝖭𝖾𝗏𝖾𝗋 𝖭𝖾𝖾𝖽 𝖵𝗂𝖺𝗀𝗋𝖺 𝖠𝗀𝖺𝗂𝗇

$ unicode 𝖵𝗂𝖺𝗀𝗋𝖺 | grep ^U+
U+1D5B5 MATHEMATICAL SANS-SERIF CAPITAL V
U+1D5C2 MATHEMATICAL SANS-SERIF SMALL I
U+1D5BA MATHEMATICAL SANS-SERIF SMALL A
U+1D5C0 MATHEMATICAL SANS-SERIF SMALL G
U+1D5CB MATHEMATICAL SANS-SERIF SMALL R
U+1D5BA MATHEMATICAL SANS-SERIF SMALL A
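For anyone without the `unicode` utility, a listing like the `unicode ... | grep ^U+` output can be reproduced with Python's standard `unicodedata` module. This is only a sketch: the word is spelled with escape sequences (an assumption made so the example does not depend on how a page or terminal renders the glyphs), and `describe` is a hypothetical helper name.

```python
import unicodedata

# The obfuscated word from the Subject, spelled with escapes so this file
# stays ASCII: MATHEMATICAL SANS-SERIF CAPITAL V, then small i, a, g, r, a.
word = "\U0001D5B5\U0001D5C2\U0001D5BA\U0001D5C0\U0001D5CB\U0001D5BA"


def describe(text):
    """Return one 'U+XXXX NAME' line per character, like `unicode ... | grep ^U+`."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(word):
    print(line)
```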
Do you have a spample of this you can put on pastebin?
Just copy and paste the raw Subject above into a new test mail. SA Bugzilla did not mangle the Unicode. The raw characters you need are right there embedded into this bug report web page.
Has anybody else started working on this already? If not, I'll get started.
John, I was looking for a spample and was going to run it through my tests to see whether the replace_tags in KAM.cf and the stock replace_tags hit. They might just need a few small adjustments. However, I don't recommend working on a snippet alone. We need to see what the entire message scored and whether this would move the needle. Without a real-world spample, I think this should be paused.
Try

$ echo test | mail -s '𝖭𝖾𝗏𝖾𝗋 𝖭𝖾𝖾𝖽 𝖵𝗂𝖺𝗀𝗋𝖺 𝖠𝗀𝖺𝗂𝗇' $USER

That is all you need. I am certain.
(In reply to jidanni from comment #2)
> Just copy and paste the raw Subject above into a new test mail.
> SA Bugzilla did not mangle the Unicode. The raw characters you need are
> right there embedded into this bug report web page.

The exact formatting of the subject is important, and synthesizing it is not good. For example, when I process just what you posted, it looks like the V is Unicode uD835uDDB5, but you posted u1D5B5, so I can't get a hit with replace_tags that works. The subject will have to be encoded to work, because I think subjects have to be in ASCII. What's the exact subject code from a source view of the email? This is why spamples are important.
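The two spellings in the comment above should be the same character: `uD835uDDB5` is what U+1D5B5 looks like as a UTF-16 surrogate pair (the form JavaScript or Java string escapes show), while a UTF-8 message carries the same code point as the four bytes F0 9D 96 B5. A small Python sketch that derives both forms from the code point; `utf16_surrogates` is a hypothetical helper, not part of any SpamAssassin tooling.

```python
def utf16_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if cp <= 0xFFFF:
        raise ValueError("only supplementary-plane code points use surrogates")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


cp = 0x1D5B5  # MATHEMATICAL SANS-SERIF CAPITAL V
high, low = utf16_surrogates(cp)
print(f"code point: U+{cp:04X}")
print(f"UTF-16:     \\u{high:04X}\\u{low:04X}")
print("UTF-8:     ", chr(cp).encode("utf-8").hex(" ").upper())
```

Python's own `chr(cp).encode("utf-16-be")` yields the same two surrogates, which is a quick way to cross-check the arithmetic.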
$ echo ... | mutt -s ...

made

Subject: =?utf-8?B?8J2WrfCdlr7wnZeP8J2WvvCdl4sg?=
 =?utf-8?B?8J2WrfCdlr7wnZa+8J2WvSDwnZa18J2XgvCdlrrwnZeA8J2Xi/Cdlrog?=
 =?utf-8?B?8J2WoPCdl4DwnZa68J2XgvCdl4c=?=
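Decoding those three RFC 2047 encoded-words shows what the test subject actually contains. A minimal sketch using Python's standard `email.header` module (a tool choice for illustration; nothing in this bug uses Python):

```python
from email.header import decode_header, make_header

# The folded Subject header produced by mutt, joined onto one line.
raw = (
    "=?utf-8?B?8J2WrfCdlr7wnZeP8J2WvvCdl4sg?= "
    "=?utf-8?B?8J2WrfCdlr7wnZa+8J2WvSDwnZa18J2XgvCdlrrwnZeA8J2Xi/Cdlrog?= "
    "=?utf-8?B?8J2WoPCdl4DwnZa68J2XgvCdl4c=?="
)

# decode_header() base64-decodes each encoded-word and drops the whitespace
# between adjacent encoded-words (RFC 2047, section 6.2); make_header() then
# turns the resulting UTF-8 bytes back into one Unicode string.
subject = str(make_header(decode_header(raw)))
print(subject)
```

The result is "Never Need Viagra Again" spelled entirely with MATHEMATICAL SANS-SERIF letters; the spaces come from inside the encoded-words, not from the folding whitespace.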
$ echo 𝖵𝗂𝖺𝗀𝗋𝖺|base64
8J2WtfCdl4LwnZa68J2XgPCdl4vwnZa6Cg==
Why can't you provide a real-world spample?
The original message is "too dangerous to ever let anyone see", OK? Just make sure SpamAssassin can catch the word Viagra in a subject, no matter what charset. Thanks.
Without a spample, this is going nowhere. At a minimum, I recommend you provide unadulterated Subject and From headers. John, otherwise I recommend closing as WORKSFORME. You decide.
If the danger is you feel it's a security issue, reclassify the bug to security. That makes the information non-public.
Created attachment 5713
Crafted example of the issue

I have constructed an example of the issue.
I'll see what the spample does.
(In reply to John Hardin from comment #14)
> I'll see what the spample does.

Assuming that spample is indicative of a real-world spam, it requires two additions to the replace tag rules in KAM.cf. Look at __KAM_VIAGRA2 and the replace tags for G1 and R1.
(The original message was full of
https://en.wikipedia.org/wiki/Personal_data
https://en.wikipedia.org/wiki/Web_beacon
Even asking the user to retrieve it via his Goofy Inc. Pro Mail Browser etc. to send it to you guys would have triggered them. So you will have to make do with just the Subject (which, by the way, it turns out was just raw, not base64, QP, etc.). Anyway, the point is: just catch the V word in the subject. Thanks.)
(In reply to jidanni from comment #16)
> (The original message was full of
> https://en.wikipedia.org/wiki/Personal_data
> https://en.wikipedia.org/wiki/Web_beacon
> Even asking the user to retrieve it via his
> Goofy Inc. Pro Mail Browser etc.
> to send it to you guys would have triggered them.
> So you will have to just do with the Subject,

Which the attached pseudo-message uses.

> (which by the way was just raw, not base64, QP, etc.
> it turns out.)

Really? That's not what you said earlier, when you provided a base64-encoded version of the header. It's hard to know what to believe, absent an actual message.

Real raw non-ASCII characters in headers are non-compliant with relevant RFCs and may actually cause concrete problems with some software, so they are rare.
Without a spample, this is a waste of too many people's time and energy.
I think this is a bit harsh. He may have stated it badly, but we know what the problem is. It's not as if it's new: the use of mathematical sans-serif for obfuscation was discussed in the user list thread "base64 encoded sextorsion". I suspect that the use of any of the mathematical typesetting characters in a Subject header is worth scoring in its own right. As regards encoding, I would hope it makes no difference to header rules (without :raw). Non-encoded 8-bit header text should be left as it is.
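If header rules without :raw see the decoded Subject, the "score it in its own right" idea reduces to a code-point test that does not care whether the header arrived raw or MIME-encoded. A rough Python sketch of that test, assuming the check runs on already-decoded text (the constant and function names are illustrative, not SpamAssassin's): it counts characters in the Mathematical Alphanumeric Symbols block, U+1D400 to U+1D7FF, and leaves ordinary accented Latin or Greek text alone.

```python
# Mathematical Alphanumeric Symbols block: bold, italic, script, fraktur,
# double-struck, sans-serif and monospace Latin/Greek letters, plus digits.
MATH_ALNUM_FIRST = 0x1D400
MATH_ALNUM_LAST = 0x1D7FF


def count_math_alnum(text):
    """Count characters of already-decoded text that fall in U+1D400..U+1D7FF."""
    return sum(MATH_ALNUM_FIRST <= ord(ch) <= MATH_ALNUM_LAST for ch in text)


samples = {
    "sans-serif Viagra": "\U0001D5B5\U0001D5C2\U0001D5BA\U0001D5C0\U0001D5CB\U0001D5BA",
    "accented Latin and Greek": "Caf\u00e9 menu for \u03c0 day",
    "plain ASCII": "Your invoice is attached",
}
for label, text in samples.items():
    print(f"{label}: {count_math_alnum(text)}")
```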
(In reply to jidanni from comment #16)
> (The original message was full of
> https://en.wikipedia.org/wiki/Personal_data
> https://en.wikipedia.org/wiki/Web_beacon
> Even asking the user to retrieve it via his
> Goofy Inc. Pro Mail Browser etc.

Sanitizing the PII would probably not corrupt the analysis of the message, but to avoid even that risk you could provide the spample privately to *one* of us as a gzipped email attachment. I have long experience dealing with confidential information, and I'm sure Kevin does too.

Are you willing to send an intact copy to me privately so that I can test changes against it? Absent that, I can ensure all of the mathematical glyphs are in the replace list and *hope* that works.

> to send it to you guys would have triggered them.

SA doesn't follow links when it scans, and I doubt any of us will view a spample in an HTML-enabled MUA.

> So you will have to just do with the Subject,
> (which by the way was just raw, not base64, QP, etc.
> it turns out.)

Which means that the spample Bill ginned up for the bug is not an accurate representation of your real-life spam.

> Anyway the point is: just catch the V word in the subject.
> Thanks.)
Committed revision 1880592. This should fix "viagra". I'm working on adding the remaining letters for full coverage.
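SpamAssassin's replace_tags let a rule spell a word as a sequence of tags, each expanding to a regex fragment that matches look-alikes of one letter, which is why "fixing viagra" meant adding glyphs letter by letter. As a rough Python analogue (not replace_tags syntax, and not the contents of revision 1880592), the sketch below builds such per-letter classes automatically: for each letter it collects every code point in U+1D400 to U+1D7FF whose NFKC compatibility normalization is that letter. `math_variants` and `obfuscation_pattern` are hypothetical helper names.

```python
import re
import unicodedata

MATH_ALNUM = range(0x1D400, 0x1D800)  # Mathematical Alphanumeric Symbols


def math_variants(letter):
    """Every code point in the block whose NFKC form is exactly `letter`."""
    return [chr(cp) for cp in MATH_ALNUM
            if unicodedata.normalize("NFKC", chr(cp)) == letter]


def obfuscation_pattern(word):
    """Compile a regex in which each letter of `word` becomes a character class
    holding its lower- and upper-case ASCII forms and all their math variants
    (roughly one replace tag per letter)."""
    classes = []
    for letter in word:
        lower, upper = letter.lower(), letter.upper()
        members = [lower, upper] + math_variants(lower) + math_variants(upper)
        classes.append("[" + "".join(re.escape(m) for m in members) + "]")
    return re.compile("".join(classes))


VIAGRA = obfuscation_pattern("viagra")

for text in ["\U0001D5B5\U0001D5C2\U0001D5BA\U0001D5C0\U0001D5CB\U0001D5BA",  # sans-serif
             "\U0001D415iagra",  # bold V followed by ASCII
             "Vlagra"]:          # different word, should not match
    print(repr(text), bool(VIAGRA.search(text)))
```

Because each position is its own class, the pattern also matches words that mix glyphs from different mathematical alphabets, or mix them with plain ASCII.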
(Well, in general, telling Grandma/Grandpa "That's terrible that you got a spam. Send me a copy so I can tell the SpamAssassin team." will certainly end up with them "opening the spam message", with all the dangers involved. Indeed, they will open it several times before figuring out how to forward it.)
The changes to the tags bring in FUZZY_VPILL at 0.8.

I think this is more of a hammer-shaped problem. If anyone is interested, I'm going with this:

header SUBJ_UCMATH Subject =~ /\xf0\x9d[\x90-\x9f][\x80-\xbf]/
meta SUBJ_UNENC_UCMATH SUBJ_UCMATH && __SUBJECT_NEEDS_MIME

I doubt these characters are particularly common in email, and they are probably much less common in subjects. Even if someone pastes them into a subject field, they will normally be MIME-encoded. I've received a few with this kind of subject obfuscation through Gmail. Like the OP's, they had no encoding (technically RFC-compliant at Gmail) or broken encoding (see bug 6352 for an example).
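The SUBJ_UCMATH pattern is written in terms of UTF-8 bytes: a lead of F0 9D covers U+1D000 to U+1DFFF, and limiting the third byte to 90 to 9F narrows that to U+1D400 to U+1D7FF, which is exactly the Mathematical Alphanumeric Symbols block. A quick Python check of that claim, using `re` on bytes as a stand-in for Perl (an assumption for illustration; it is not how SpamAssassin runs the rule):

```python
import re

# The SUBJ_UCMATH pattern, applied to UTF-8 bytes with Python's re module.
SUBJ_UCMATH = re.compile(rb"\xf0\x9d[\x90-\x9f][\x80-\xbf]")


def matches_cp(cp):
    """True if the UTF-8 encoding of code point `cp` matches the pattern."""
    return SUBJ_UCMATH.search(chr(cp).encode("utf-8")) is not None


# Every code point whose UTF-8 form starts with F0 9D lies in U+1D000..U+1DFFF.
matched = [cp for cp in range(0x1D000, 0x1E000) if matches_cp(cp)]
print(f"U+{matched[0]:04X}..U+{matched[-1]:04X}: {len(matched)} code points")
```

Since UTF-8 only ever uses F0 as a lead byte, the pattern cannot straddle two characters, so a hit anywhere in the subject means at least one glyph from that block is present.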
Yeah, except for some math professors, nobody should be using MATHEMATICAL SANS-SERIF CAPITALs etc., so nab 'em!
(In reply to jidanni from comment #22)
> (Well in general telling Grandma/Grandpa "That's terrible that you got a
> spam. Send me a copy so can tell the SpamAssassin team." Will end up in them
> certainly "opening the spam message" with all the dangers involved. Indeed,
> opening it several times, before figuring out how to forward it.)

Ah, ok. I was assuming this was something you had administrative access to.
Committed revision 1880815. This adds the remaining letters from the mathematical glyph ranges.