Bug 6527 - mkrules erroneously omits nopublish rules from masscheck
Summary: mkrules erroneously omits nopublish rules from masscheck
Status: NEW
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Building & Packaging (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: PC Linux
: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-28 02:49 UTC by Warren Togami
Modified: 2016-06-08 16:38 UTC (History)
4 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status

Note You need to log in before you can comment on or make changes to this bug.
Description Warren Togami 2010-12-28 02:49:44 UTC
http://www.gossamer-threads.com/lists/spamassassin/devel/158708
Earlier I discovered that certain rules wrapped in ifplugin were no longer being included in the nightly/weekly masscheck.  For example:

http://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/sandbox/wtogami/20_bug_6220_sem.cf

ifplugin Mail::SpamAssassin::Plugin::URIDNSBL
# SEM-URI
urirhssub SEM_URI uribl.spameatingmonkey.net. A 2
body SEM_URI eval:check_uridnsbl('SEM_URI')
describe SEM_URI Contains a URI listed by SEM-URI
tflags SEM_URI net nopublish
...
endif

This and many other rules are excluded from masscheck due to a buggy new behavior in mkrules, apparently related to being marked nopublish.  The intent of nopublish is to prevent them from being published in sa-update, not to prevent them from being tested in masscheck.  Indeed other rules like RCVD_IN_SEMBLACK are marked nopublish, but they continue to work in masschecks.

# build/mkrules --exit_on_no_src --src rulesrc --out rules --manifest MANIFEST --manifestskip MANIFEST.SKIP
rulesrc/10_force_active.cf: 0 active rules, 0 other
rulesrc/sandbox/axb/20_axb_misc.cf: 2 active rules, 9 other
rulesrc/sandbox/axb/20_axb_pdf.cf: 2 active rules, 0 other
rulesrc/sandbox/axb/20_bug_6215.cf: 1 active rules, 0 other
rulesrc/sandbox/dos/20_uri.cf: 1 active rules, 2 other
rulesrc/sandbox/dos/70_bugs.cf: 0 active rules, 0 other
rulesrc/sandbox/dos/70_other.cf: 14 active rules, 108 other
rulesrc/sandbox/duncf/20_debt.cf: 2 active rules, 0 other
rulesrc/sandbox/duncf/20_header.cf: 1 active rules, 2 other
rulesrc/sandbox/emailed/00_FVGT_File001.cf: 234 active rules, 371 other
rulesrc/sandbox/emailed/99_alex_dev.cf: 6 active rules, 3 other
rulesrc/sandbox/emailed/99_alex_test.cf: 1 active rules, 0 other
rulesrc/sandbox/fanf/10_headers.cf: 1 active rules, 3 other
rulesrc/sandbox/fanf/20_uri_tests.cf: 0 active rules, 1 other
rulesrc/sandbox/fanf/30_text.cf: 3 active rules, 1 other
rulesrc/sandbox/felicity/70_dnswl.cf: 5 active rules, 0 other
rulesrc/sandbox/felicity/70_iadb.cf: 27 active rules, 0 other
rulesrc/sandbox/felicity/70_other.cf: 56 active rules, 47 other
rulesrc/sandbox/felicity/70_phishing.cf: 10 active rules, 46 other
rulesrc/sandbox/fredt/99_zFVGT_FakeReply.cf: 1 active rules, 70 other
rulesrc/sandbox/hege/20_hk.cf: 19 active rules, 87 other
rulesrc/sandbox/hstern/20_body_tests.cf: 6 active rules, 2 other
rulesrc/sandbox/hstern/20_head_tests.cf: 3 active rules, 2 other
rulesrc/sandbox/hstern/20_meta_tests.cf: 1 active rules, 0 other
rulesrc/sandbox/hstern/20_uri_tests.cf: 3 active rules, 13 other
rulesrc/sandbox/hstern/70_syndicate.cf: 1 active rules, 15 other
rulesrc/sandbox/jhardin/20_MIME_no_text.cf: 3 active rules, 3 other
rulesrc/sandbox/jhardin/20_advance_fee_reevolved.cf: 7 active rules, 9 other
rulesrc/sandbox/jhardin/20_fillform.cf: 14 active rules, 0 other
rulesrc/sandbox/jhardin/20_lotsa_money.cf: 29 active rules, 94 other
rulesrc/sandbox/jhardin/20_misc_testing.cf: 54 active rules, 94 other
rulesrc/sandbox/jhardin/20_tbird_image_spam.cf: 11 active rules, 13 other
rulesrc/sandbox/jhardin/20_uri_obfu_ws.cf: 2 active rules, 0 other
rulesrc/sandbox/jm/20_basic.cf: 48 active rules, 69 other
rulesrc/sandbox/jm/20_bug_5920.cf: 0 active rules, 3 other
rulesrc/sandbox/jm/20_bug_5984.cf: 2 active rules, 0 other
rulesrc/sandbox/jm/20_bug_6000.cf: 0 active rules, 2 other
rulesrc/sandbox/jm/20_bug_6152.cf: 1 active rules, 0 other
rulesrc/sandbox/jm/20_dob.cf: 1 active rules, 0 other
rulesrc/sandbox/jm/20_games.cf: 0 active rules, 3 other
rulesrc/sandbox/jm/20_sought.cf: 0 active rules, 192 other
rulesrc/sandbox/jm/20_sought2.cf: 0 active rules, 3 other
rulesrc/sandbox/jm/20_sought_fraud.cf: 3 active rules, 481 other
rulesrc/sandbox/jm/20_storm.cf: 0 active rules, 2 other
rulesrc/sandbox/jm/20_xmailer.cf: 43 active rules, 86 other
rulesrc/sandbox/jm/22_bug_5667.cf: 0 active rules, 0 other
rulesrc/sandbox/jm/70_tt_drugs.cf: 2 active rules, 10 other
rulesrc/sandbox/kb/20_bug_6156.cf: 2 active rules, 0 other
rulesrc/sandbox/kb/20_header.cf: 3 active rules, 2 other
rulesrc/sandbox/kb/70_misc.cf: 7 active rules, 19 other
rulesrc/sandbox/kb/75_de.cf: 0 active rules, 1 other
rulesrc/sandbox/khopesh/20_bug_6271.cf: 8 active rules, 0 other
rulesrc/sandbox/khopesh/20_chickenpox.cf: 0 active rules, 10 other
rulesrc/sandbox/khopesh/20_isc_attackers.cf: 0 active rules, 2 other
rulesrc/sandbox/khopesh/20_khop_bl.cf: 5 active rules, 15 other
rulesrc/sandbox/khopesh/20_khop_blessed.cf: 6 active rules, 9 other
rulesrc/sandbox/khopesh/20_khop_dynamic.cf: 0 active rules, 12 other
rulesrc/sandbox/khopesh/20_khop_experimental.cf: 8 active rules, 76 other
rulesrc/sandbox/khopesh/20_khop_general.cf: 8 active rules, 6 other
rulesrc/sandbox/khopesh/20_khop_lists.cf: 0 active rules, 16 other
rulesrc/sandbox/khopesh/20_khop_sc_bug_6114.cf: 0 active rules, 22 other
rulesrc/sandbox/khopesh/20_rcd_rdns.cf: 0 active rules, 24 other
rulesrc/sandbox/khopesh/20_s25r.cf: 2 active rules, 11 other
rulesrc/sandbox/khopesh/20_spamdb_subjects.cf: 0 active rules, 1 other
rulesrc/sandbox/khopesh/20_trust.cf: 5 active rules, 2 other
rulesrc/sandbox/khopesh/65_debian.cf: 0 active rules, 11 other
rulesrc/sandbox/kmcgrail/20_test.cf: 2 active rules, 41 other
rulesrc/sandbox/kmcgrail/70_mx.cf: 0 active rules, 4 other
rulesrc/sandbox/kmcgrail/70_phishing.cf: 0 active rules, 1 other
rulesrc/sandbox/kmcgrail/80_deadrules.cf: 6 active rules, 0 other
rulesrc/sandbox/maddoc/99_doc_test.cf: 11 active rules, 35 other
rulesrc/sandbox/maddoc/99_fsl_testing.cf: 7 active rules, 8 other
rulesrc/sandbox/mkettler/20_drugs.cf: 2 active rules, 0 other
rulesrc/sandbox/mkettler/25_uribl.cf: 5 active rules, 0 other
rulesrc/sandbox/mmartinec/20_misc.cf: 0 active rules, 13 other
rulesrc/sandbox/mmartinec/20_rpvalid.cf: 1 active rules, 0 other
rulesrc/sandbox/sidney/70_other.cf: 0 active rules, 1 other
rulesrc/sandbox/wtogami/20_bug_6212_hostkarma.cf: 4 active rules, 5 other
rulesrc/sandbox/wtogami/20_bug_6220_sem.cf: 3 active rules, 1 other
rulesrc/sandbox/wtogami/20_googlecache.cf: 0 active rules, 1 other
rulesrc/sandbox/wtogami/20_mailspike.cf: 0 active rules, 15 other
rulesrc/sandbox/wtogami/20_misc.cf: 0 active rules, 0 other
rulesrc/sandbox/wtogami/20_rp_certified.cf: 3 active rules, 0 other
rulesrc/sandbox/wtogami/20_ubl.cf: 0 active rules, 0 other
rulesrc/sandbox/wtogami/20_unsafe.cf: 0 active rules, 0 other
rulesrc/sandbox/wtogami/20_vanity.cf: 1 active rules, 22 other



omitting rule T_DATE_IN_DISTANT_FUTURE ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish)
omitting rule T_DATE_IN_FUTURE_1Y_4Y ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish)
omitting rule T_DATE_IN_FUTURE_96_WEEK ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish)
omitting rule T_DATE_IN_FUTURE_MONTH ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish)
omitting rule T_DATE_IN_FUTURE_WEEK ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish)
omitting rule T_DATE_IN_FUTURE_YEAR ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish)
omitting rule T_DNSBL_INDIRECT ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish)
omitting rule T_DNSBL_INDIRECT_UNSAFE ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish)
omitting rule T_DNSBL_INDIRECT_UNSAFE_2 ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish)
omitting rule T_KHOP_FOREIGN_CLICK due to tflags nopublish (tflags nopublish)
omitting rule T_KHOP_PGP_INLINE if ! plugin (Mail::SpamAssassin::Plugin::OpenPGP) due to tflags nopublish (tflags nice noautolearn nopublish)
omitting rule T_KHOP_PGP_SIGNED if ! plugin (Mail::SpamAssassin::Plugin::OpenPGP) due to tflags nopublish (tflags nice noautolearn nopublish)
omitting rule T_MONEY_FREEMAIL ifplugin Mail::SpamAssassin::Plugin::FreeMail due to tflags nopublish (tflags nopublish)
omitting rule T_RCVD_IN_NIX_SPAM ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish)
omitting rule T_RCVD_IN_SPAMCOP ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish)
omitting rule T_SEM_FRESH ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
omitting rule T_SEM_URI ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
omitting rule T_SEM_URIRED ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
omitting rule T_URIBL_HOSTKARMA_BL ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
omitting rule T_URIBL_HOSTKARMA_BR ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
omitting rule T_URIBL_HOSTKARMA_FRESH_10D ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
omitting rule T_URIBL_HOSTKARMA_FRESH_2D ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
omitting rule T_URIBL_META_SURBL_ANY ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
Comment 1 Warren Togami 2010-12-28 02:57:17 UTC
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6297#c14
This bug was introduced here.
Comment 2 Kevin A. McGrail 2011-06-27 21:38:31 UTC
Additional problems:

1 - 70_sandbox.cf is not correct from the fixes in 6297 which leave rules out of masscheck

2 - remnants of rules that are #testrules with no publish still get published.

3 - If it doesn't break masschecks, we should consider code to avoid publishing ANY rules T_ and also add a safety valve parameter that requires a --test-rules to run them even if they are published.
Comment 3 Warren Togami 2011-06-27 21:47:24 UTC
(In reply to comment #2)
> Additional problems:
> 
> 1 - 70_sandbox.cf is not correct from the fixes in 6297 which leave rules out
> of masscheck
> 
> 2 - remnants of rules that are #testrules with no publish still get published.

#testrules and "tflags nopublish" are two separate parameters.  It seems #testrules in this case was only partially successful.  SEM rules are lacking "tflags nopublish" which likely would have stopped this.

> 
> 3 - If it doesn't break masschecks, we should consider code to avoid publishing
> ANY rules T_ and also add a safety valve parameter that requires a --test-rules
> to run them even if they are published.

In my opinion --test-rules is a needless complication.
Comment 4 Kevin A. McGrail 2011-06-27 21:50:08 UTC
> #testrules and "tflags nopublish" are two separate parameters.  It seems
> #testrules in this case was only partially successful.  

#testrules was completely succesful.  It published the rules as T_...

> SEM rules are lacking
> "tflags nopublish" which likely would have stopped this.

Correct which someone removed I'm guessing when nopublish took rules out of 70_sandbox.cf avoiding masscheck.

> > 3 - If it doesn't break masschecks, we should consider code to avoid publishing
> > ANY rules T_ and also add a safety valve parameter that requires a --test-rules
> > to run them even if they are published.
> 
> In my opinion --test-rules is a needless complication.

We'll agree to disagree.  I don't like egg on my face and don't want to see this happen a third time.
Comment 5 Kevin A. McGrail 2011-06-27 22:48:43 UTC
(In reply to comment #2)
> Additional problems:
> 
> 1 - 70_sandbox.cf is not correct from the fixes in 6297 which leave rules out
> of masscheck
> 
> 2 - remnants of rules that are #testrules with no publish still get published.
> 
> 3 - If it doesn't break masschecks, we should consider code to avoid publishing
> ANY rules T_ and also add a safety valve parameter that requires a --test-rules
> to run them even if they are published.

4 - According to http://wiki.apache.org/spamassassin/SaUpdateBackend, T_ rules are never published and this page needs to be updated to document #testrules
Comment 6 Kevin A. McGrail 2011-06-28 13:56:05 UTC
Additional problems (recapping as many as I know about from discussions on dev):
 
1 - 70_sandbox.cf is not correct from the fixes in 6297 which leave rules out of masscheck

2 - remnants of rules that are #testrules with no publish still get published.
 
3 - If it doesn't break masschecks, we should consider code to avoid publishing ANY rules T_ and also add a safety valve parameter that requires a --test-rules to run them even if they are published.

4 - According to http://wiki.apache.org/spamassassin/SaUpdateBackend, T_ rules are never published and this page needs to be updated to document #testrules

5 - There are rules like T_SURBL_MULTI2 that are in mkettler's sandbox.  There is currently no reason I can see for why these are being published.  This is NOT an error in his sandbox.

6 - There might be race conditions where items on SaUpdateBackend flag for always publish and never publish.  Which takes precedence?  I clearly thing NOT publish should be the default but these conditions need to be clarified and the code changed to reflect the conditions so that we never have this messed up again.
Comment 7 Justin Mason 2011-06-28 20:37:59 UTC
(In reply to comment #6)
> 5 - There are rules like T_SURBL_MULTI2 that are in mkettler's sandbox.  There
> is currently no reason I can see for why these are being published.  This is
> NOT an error in his sandbox.

If I recall correctly, T_ rules *can* be dragged into updates if they are required as dependencies of a published meta rule.
Comment 8 Adam Katz 2011-11-08 01:59:43 UTC
Any updates on this?

Why not just rename rules that have been auto-promoted via dependencies so that they aren't published with the T_ prefix?  (I assume the GA will end up scoring these things near-zero, but if that's an unfair assumption, the promotion process should score them at +/- 0.001 so they are effectively __ rules.)

There are currently 73 T_ rules live in 3.3.2 and 77 on the trunk.

% host -ttxt 0.4.3.updates.spamassassin.org
0.4.3.updates.spamassassin.org is an alias for 2.3.3.updates.spamassassin.org.
2.3.3.updates.spamassassin.org descriptive text "1195874"
% host -ttxt  mirrors.updates.spamassassin.org.
mirrors.updates.spamassassin.org descriptive text
"http://spamassassin.apache.org/updates/MIRRORED.BY"
% curl http://spamassassin.apache.org/updates/MIRRORED.BY |grep -m1 ^h
http://daryl.dostech.ca/sa-update/asf/ weight=5
% curl -s http://daryl.dostech.ca/sa-update/asf/1139740.tar.gz |tar -zxf -
% awk '$1$2 !~ /#/ && $2 ~ /^T_/ && $1 !~ /^(reuse|describe|replace_rules)/ {rule++} END {print rule}' *.cf
73

Here is the list of rules on trunk (remove the 'pr -t3' and perhaps the '{print $2}' if you want to do something with this list):

% awk '$1$2 !~ /#/ && $2 ~ /^T_/ && $1 !~ /^(reuse|describe|replace_rules)/ {print $2}' trunk/rules/72_active.cf |sort |uniq |pr -t3

T_DKIM_INVALID		T_FRT_HOUR		T_HK_NAME_FM_MR_MRS
T_DOS_OUTLOOK_TO_MX_IMA T_FRT_INCOME		T_HK_NAME_FROM
T_DOS_ZIP_HARDCORE	T_FRT_INTEREST		T_HK_NAME_MR_MRS
T_EMRCP			T_FRT_LITTLE		T_HTML_ATTACH
T_FILL_THIS_FORM_SHORT	T_FRT_LOLITA1		T_KHOP_FOREIGN_CLICK
T_FORGED_TBIRD_IMG_SIZE T_FRT_OPPORTUN1		T_LARGE_PCT_AFTER_MANY
T_FREEMAIL_DOC_PDF	T_FRT_PACKAGE		T_LFUZ_PWRMALE
T_FREEMAIL_FORGED_FROMD T_FRT_PAYMENT		T_LOTTO_AGENT_FM
T_FREEMAIL_RVW_ATTCH	T_FRT_PHARMAC		T_LOTTO_AGENT_RPLY
T_FRM_SILVER_GOLD	T_FRT_POSSIBLE		T_LOTTO_DEPT
T_FRT_ABSOLUT		T_FRT_PROFILE1		T_LOTTO_URI
T_FRT_ADULT2		T_FRT_PROFILE2		T_MIME_NO_TEXT
T_FRT_BEFORE		T_FRT_PROFIT1		T_MONEY_PERCENT
T_FRT_BELOW2		T_FRT_PROFIT2		T_OBFU_DOC_ATTACH
T_FRT_CANSPAM		T_FRT_PUSSY		T_OBFU_GIF_ATTACH
T_FRT_CLICK		T_FRT_SLUT		T_OBFU_HTML_ATTACH
T_FRT_COCK		T_FRT_STOCK1		T_OBFU_JPG_ATTACH
T_FRT_CONTACT		T_FRT_STOCK2		T_OBFU_PDF_ATTACH
T_FRT_ESTABLISH		T_FRT_VIRGIN1		T_REMOTE_IMAGE
T_FRT_EXPERIENCE	T_FUZZY_OPTOUT		T_SHARE_50_50
T_FRT_FOLLOW1		T_FUZZY_SPRM		T_TVD_FUZZY_SECTOR
T_FRT_FOLLOW2		T_HEADER_FROM_DIFFERENT T_TVD_FUZZY_SECURITIES
T_FRT_FREE		T_HK_NAME_DR		T_TVD_FW_GRAPHIC_ID2
T_FRT_FRIEND		T_HK_NAME_FM_DR		T_TVD_MIME_EPI
T_FRT_FUCK1		T_HK_NAME_FM_FROM	T_TVD_MIME_NO_HEADERS
T_FRT_HEALTH
Comment 9 AXB 2011-11-08 08:15:20 UTC
could you pls check if this only happens with rules which are encapsulated with 
"ifplugin"
Comment 10 John Hardin 2016-06-08 16:38:41 UTC
(In reply to Justin Mason from comment #7)
> (In reply to comment #6)
> > 5 - There are rules like T_SURBL_MULTI2 that are in mkettler's sandbox.  There
> > is currently no reason I can see for why these are being published.  This is
> > NOT an error in his sandbox.
> 
> If I recall correctly, T_ rules *can* be dragged into updates if they are
> required as dependencies of a published meta rule.

This zombie thread is perhaps a better place to make this comment than a new bug...

I just noticed that if a T_ rule *does* get published, it gets published *without* an explicit score - so if the rule definition in the sandbox has a score limit that's less than one, the T_ rule starts adding a full point to the message's score when it hits, when it would add fewer points if it were properly published.

I see two possible approaches to fix:

(1) If a T_ rule *does* get published, and the score limit is < 1, an explicit score for it should also be published at that score limit.

(2) T_ rules should always be published with an explicit "informational" score (0.0001) so that they still can be used in metas but they don't materially affect the overall score.


Should this be a new, separate bug?