Bug 7497 - Certain Language Specific Rule entries are breaking rule gen on the new box
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: RuleQA
Version: SVN Trunk (Latest Devel Version)
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2017-11-03 12:22 UTC by Kevin A. McGrail
Modified: 2019-06-26 07:19 UTC



Description Kevin A. McGrail 2017-11-03 12:22:08 UTC
Merijn van den Kroonenberg identified that grep on the new box has problems with one line of a long-standing rule in my sandbox, a line that carries multiple language definitions. Thanks, Merijn.

Re: the grep issue, on older CentOS boxes, I would be looking for a LANG=en_US setting as one possible culprit.  Any ideas where that might be on Ubuntu?

For triage, I nuked the line:

[kmcgrail@talon2 kmcgrail]$ svn commit -m 'Removing language specific descriptions which are not grepping properly on new box for masscheck'
Sending        20_rules_to_sandbox.cf
Transmitting file data .
Committed revision 1813992.

This worked, but the underlying reason I had to do it remains unexplained. This ticket exists to track that issue so we can revert the rule change once it is fixed.
Comment 1 Merijn van den Kroonenberg 2017-11-03 13:00:14 UTC
The actual code causing the problem:

./masses/rule-update-score-gen/generate-new-scores.sh:202:grep -v ^score rules/72_active.cf > rules/72_active.cf-scoreless
./masses/rule-update-score-gen/generate-new-scores.sh:203:mv -f rules/72_active.cf-scoreless rules/72_active.cf

In effect, these two lines strip every line beginning with "score" from 72_active.cf.

Additionally, I am wondering whether those grep statements could be removed altogether. Right now they do nothing, because all score statements in 72_active.cf are already commented out. I think this is done by the build/mkrules program.
 
build/mkrules line 455:
      # comment "score" lines for sandbox rules (bug 5558)
      # use generated scores, though, if the rule is active
      if ($type eq 'score' && $issandbox &&
        !($isscores && $active_rules->{$name}))
      {
        $orig =~ s/^/#/g;
      }
 
But I don't understand the code above well enough to be sure.
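
A quick way to check the "not doing a thing" reasoning: a line that has been prefixed with # no longer starts with "score", so ^score cannot match it. The sketch below uses a made-up two-line rules file (DEMO_RULE is hypothetical), not the real 72_active.cf, and only illustrates the pattern behaviour. It says nothing about whether mkrules really comments out every score line.

```shell
#!/bin/sh
# Hypothetical stand-in for a generated rules file; DEMO_RULE is made up.
demo=$(mktemp)
printf '#score DEMO_RULE 1.0\nbody DEMO_RULE /demo/\n' > "$demo"

# Count lines that begin with "score". grep -c prints 0 and exits 1
# when nothing matches, hence the "|| true".
score_lines=$(LC_ALL=C grep -c '^score' "$demo" || true)
echo "lines starting with score: $score_lines"

# With no matching lines, filtering reproduces the file byte for byte.
LC_ALL=C grep -v '^score' "$demo" | cmp -s - "$demo" && echo "grep -v is a no-op here"
```

If the same count on the real 72_active.cf is also 0, the two lines in generate-new-scores.sh change nothing there.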

To test or experiment with grep and special characters:

wget http://sa-update.ena.com/1813258.tar.gz
Extract 72_active.cf from the archive.
Then experiment with locale settings (e.g. export LANG=en_US)
and run:
grep -v ^score 72_active.cf > test.cf
test.cf should not end with the line:
Binary file 72_active.cf matches
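
For experimenting without the tarball, here is a minimal sketch. It assumes the trigger is a byte sequence that is not valid in the active locale's encoding, which is one of the conditions under which GNU grep treats input as binary. Whether that is what actually happens on the new box is not confirmed in this ticket. The sample file and DEMO_RULE are made up. The LC_ALL=C and -a settings are a possible workaround to try, not a change anyone has agreed on.

```shell
#!/bin/sh
# Build a small stand-in for 72_active.cf (hypothetical content, not the real file).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.cf"
{
  printf 'describe DEMO_RULE Demo rule\n'
  # \351 is a Latin-1 e-acute: a single byte that is not valid UTF-8 on its own.
  printf 'lang fr describe DEMO_RULE R\351gle de d\351monstration\n'
  printf 'score DEMO_RULE 1.0\n'
  printf 'body DEMO_RULE /demo/\n'
} > "$sample"

# LC_ALL=C makes grep match byte-wise, and -a tells it to treat the input
# as text, so it never substitutes a "Binary file ... matches" notice.
LC_ALL=C grep -a -v '^score' "$sample" > "$tmpdir/scoreless.cf"

# Show how many lines survived the filter.
wc -l < "$tmpdir/scoreless.cf"
```

Running the plain grep -v ^score on the same sample under different LANG values should show whether the notice appears on a given box.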
Comment 2 Bill Cole 2017-11-03 18:18:38 UTC
(In reply to Kevin A. McGrail from comment #0)
 
> Re: the grep issue, on older CentOS boxes, I would be looking for a
> LANG=en_US setting as one possible culprit.  Any ideas where that might be
> on Ubuntu?

/etc/default/locale (or /etc/environment on older versions)
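
A small sketch for checking both files and comparing them with what the current shell actually uses. The function name show_locale_sources is made up for illustration. A cron job or other non-login shell may see different values than an interactive login, so it is worth running this in the same context as the rule generation job.

```shell
#!/bin/sh
# Print the locale variables from the system-wide files named above, if the
# files exist, followed by the locale the current shell actually uses.
show_locale_sources() {
  for f in /etc/default/locale /etc/environment; do
    if [ -r "$f" ]; then
      echo "== $f"
      # Only LANG/LANGUAGE/LC_* assignments are relevant here.
      grep -E '^(LANG|LANGUAGE|LC_[A-Z]+)=' "$f" || echo "(no locale variables set)"
    else
      echo "== $f (not present)"
    fi
  done
  echo "== effective settings"
  if command -v locale >/dev/null 2>&1; then
    locale
  else
    # Fallback when the locale utility is missing.
    env | grep -E '^(LANG|LC_)' || echo "(none set)"
  fi
}

show_locale_sources
```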
Comment 3 Kevin A. McGrail 2018-08-28 23:40:34 UTC
Merijn & Dave, is this resolved?
Comment 4 Henrik Krohns 2019-06-26 07:19:09 UTC
So is the locale fixed?