SA Bugzilla – Bug 6541
ReplaceTags: Experience matches french word "expérience"
Last modified: 2011-05-16 05:44:10 UTC
I am sending a newsletter in French that contains the word "expérience". When the mail passes through SpamAssassin, it triggers the "ReplaceTags: Experience" rule, which adds 3.0 points. I don't think this rule should match valid words with accented characters. It seems far too easy for a legitimate mail written in French to be marked as spam this way. I'm guessing it could also happen with other French words (médication, crédit, ...).
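To illustrate the kind of false positive described here, the following is a minimal sketch, not the actual FRT_EXPERIENCE rule. It assumes the rule is roughly shaped like "match any look-alike spelling of the word, unless it is the plain ASCII spelling". The letter class `E`, the pattern `OBFUSCATED`, and the helper `hits` are hypothetical names made up for this example, and the real ReplaceTags expansions are considerably broader.

```python
import re

# Hypothetical, heavily simplified letter class: the plain letter plus a few
# look-alikes an obfuscating spammer might use (assumption, not the real tag).
E = "[eèéêë3]"
OBFUSCATED = f"{E}xp{E}r[i1!]{E}nc{E}"

# Skip the plain spelling, fire on anything else that fits the look-alike pattern.
rule = re.compile(rf"\b(?!experience\b){OBFUSCATED}\b", re.IGNORECASE)


def hits(text):
    """Return True if the sketch rule would fire on the given text."""
    return rule.search(text) is not None


print(hits("years of experience"))       # False: plain spelling is excluded
print(hits("years of exp3rience"))       # True: obfuscated spelling
print(hits("notre expérience client"))   # True: legitimate French word, a false positive
```

Because the exclusion only names the unaccented spelling, the accented French word falls into the same bucket as a deliberately obfuscated one.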
I could have sworn this had already been reported, but can't find the bug. Regardless, I've long since disabled FRT_EXPERIENCE and FRT_APPROV locally due to FPs in French. Quickly scanning logs, FRT_DIPLOMA also occasionally hits on "diplomé", "diplôme", though this scores less and rarely causes FPs.
Checked in r1075489 and r1077335 to introduce variants for the following words:

credit
penis
medication
million
approve
experience
diploma

I did not add an exclusion for "médication" because it's way too obscure, though to counter that point, we now exclude the Polish "dyplom" for diploma.

Anybody looking to help on this front should look at rulesrc/sandbox/emailed/00_FVGT_File001.cf and rules/25_replace.cf or perhaps the entire collection with commands like these:

egrep -ri '^(raw|body|header.*subject).*\(\?![a-z?]{2,}\)' rules*
grep -ri '(?![^)]*[\[(?\\].*).*><' rules*

I'm resolving this bug. Feel free to re-open with new FP examples.
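The idea of adding legitimate variants to a rule's exclusion can be sketched as below. This is an illustration under assumptions, not the change committed in the revisions above: the helper `rule_with_exclusions`, the letter class `E`, and the pattern `EXPERIENCE` are hypothetical, and the real rules are built from ReplaceTags expansions rather than hand-written classes.

```python
import re


def rule_with_exclusions(obfuscated, allowed):
    """Build a pattern matching `obfuscated` unless the word is in `allowed`."""
    alternatives = "|".join(re.escape(word) for word in allowed)
    # The negative lookahead rejects any whole word listed in `allowed`.
    return re.compile(rf"\b(?!(?:{alternatives})\b){obfuscated}\b", re.IGNORECASE)


# Hypothetical look-alike pattern, far simpler than the real expansions.
E = "[eèéêë3]"
EXPERIENCE = f"{E}xp{E}r[i1!]{E}nc{E}"

old_rule = rule_with_exclusions(EXPERIENCE, ["experience"])
new_rule = rule_with_exclusions(EXPERIENCE, ["experience", "expérience"])

for word in ["experience", "expérience", "exp3rience"]:
    print(word, bool(old_rule.search(word)), bool(new_rule.search(word)))
```

With the wider exclusion list, the French spelling no longer matches while the digit-substituted spelling still does. The trade-off is that every legitimate variant has to be listed explicitly, which is why obscure words may reasonably be left out.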
(In reply to comment #2)
> grep -ri '(?![^)]*[\[(?\\].*).*><' rules*

Okay, that's hard to do without grep -P ... here's a more complete query:

grep --color -riP '\(\?\!\K[^)]*[\[(?\\\w].*(?=\)[<>\w]{1,30}><)'

or else using UNIX grep plus perl:

grep -r . rules* |perl -ne 'print if /\(\?\![^)]*[\[(?\\\w].*\)[<>\w]{1,30}></'

... with colors:

grep -r . trunk/rules* |perl -ne '
  if (/^([^:]*)(.*\(\?\!)([^)]*[\[(?\\\w].*)(\)[<>\w]{1,30}><.*)/) {
    print "\e[0;35m$1\e[0;0m$2\e[1;32m$3\e[0;0m$4\n";
  }'
Created attachment 4886
mail sent to french debian list - FP on french "experience"