SA Bugzilla – Bug 6527
mkrules erroneously omits nopublish rules from masscheck
Last modified: 2024-03-04 04:23:50 UTC
http://www.gossamer-threads.com/lists/spamassassin/devel/158708 Earlier I discovered that certain rules wrapped in ifplugin were no longer being included in the nightly/weekly masscheck. For example: http://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/sandbox/wtogami/20_bug_6220_sem.cf ifplugin Mail::SpamAssassin::Plugin::URIDNSBL # SEM-URI urirhssub SEM_URI uribl.spameatingmonkey.net. A 2 body SEM_URI eval:check_uridnsbl('SEM_URI') describe SEM_URI Contains a URI listed by SEM-URI tflags SEM_URI net nopublish ... endif This and many other rules are excluded from masscheck due to a buggy new behavior in mkrules, apparently related to being marked nopublish. The intent of nopublish is to prevent them from being published in sa-update, not to prevent them from being tested in masscheck. Indeed other rules like RCVD_IN_SEMBLACK are marked nopublish, but they continue to work in masschecks. # build/mkrules --exit_on_no_src --src rulesrc --out rules --manifest MANIFEST --manifestskip MANIFEST.SKIP rulesrc/10_force_active.cf: 0 active rules, 0 other rulesrc/sandbox/axb/20_axb_misc.cf: 2 active rules, 9 other rulesrc/sandbox/axb/20_axb_pdf.cf: 2 active rules, 0 other rulesrc/sandbox/axb/20_bug_6215.cf: 1 active rules, 0 other rulesrc/sandbox/dos/20_uri.cf: 1 active rules, 2 other rulesrc/sandbox/dos/70_bugs.cf: 0 active rules, 0 other rulesrc/sandbox/dos/70_other.cf: 14 active rules, 108 other rulesrc/sandbox/duncf/20_debt.cf: 2 active rules, 0 other rulesrc/sandbox/duncf/20_header.cf: 1 active rules, 2 other rulesrc/sandbox/emailed/00_FVGT_File001.cf: 234 active rules, 371 other rulesrc/sandbox/emailed/99_alex_dev.cf: 6 active rules, 3 other rulesrc/sandbox/emailed/99_alex_test.cf: 1 active rules, 0 other rulesrc/sandbox/fanf/10_headers.cf: 1 active rules, 3 other rulesrc/sandbox/fanf/20_uri_tests.cf: 0 active rules, 1 other rulesrc/sandbox/fanf/30_text.cf: 3 active rules, 1 other rulesrc/sandbox/felicity/70_dnswl.cf: 5 active rules, 0 other rulesrc/sandbox/felicity/70_iadb.cf: 27 active rules, 0 other rulesrc/sandbox/felicity/70_other.cf: 56 active rules, 47 other rulesrc/sandbox/felicity/70_phishing.cf: 10 active rules, 46 other rulesrc/sandbox/fredt/99_zFVGT_FakeReply.cf: 1 active rules, 70 other rulesrc/sandbox/hege/20_hk.cf: 19 active rules, 87 other rulesrc/sandbox/hstern/20_body_tests.cf: 6 active rules, 2 other rulesrc/sandbox/hstern/20_head_tests.cf: 3 active rules, 2 other rulesrc/sandbox/hstern/20_meta_tests.cf: 1 active rules, 0 other rulesrc/sandbox/hstern/20_uri_tests.cf: 3 active rules, 13 other rulesrc/sandbox/hstern/70_syndicate.cf: 1 active rules, 15 other rulesrc/sandbox/jhardin/20_MIME_no_text.cf: 3 active rules, 3 other rulesrc/sandbox/jhardin/20_advance_fee_reevolved.cf: 7 active rules, 9 other rulesrc/sandbox/jhardin/20_fillform.cf: 14 active rules, 0 other rulesrc/sandbox/jhardin/20_lotsa_money.cf: 29 active rules, 94 other rulesrc/sandbox/jhardin/20_misc_testing.cf: 54 active rules, 94 other rulesrc/sandbox/jhardin/20_tbird_image_spam.cf: 11 active rules, 13 other rulesrc/sandbox/jhardin/20_uri_obfu_ws.cf: 2 active rules, 0 other rulesrc/sandbox/jm/20_basic.cf: 48 active rules, 69 other rulesrc/sandbox/jm/20_bug_5920.cf: 0 active rules, 3 other rulesrc/sandbox/jm/20_bug_5984.cf: 2 active rules, 0 other rulesrc/sandbox/jm/20_bug_6000.cf: 0 active rules, 2 other rulesrc/sandbox/jm/20_bug_6152.cf: 1 active rules, 0 other rulesrc/sandbox/jm/20_dob.cf: 1 active rules, 0 other rulesrc/sandbox/jm/20_games.cf: 0 active rules, 3 other rulesrc/sandbox/jm/20_sought.cf: 0 active rules, 192 other rulesrc/sandbox/jm/20_sought2.cf: 0 active rules, 3 other rulesrc/sandbox/jm/20_sought_fraud.cf: 3 active rules, 481 other rulesrc/sandbox/jm/20_storm.cf: 0 active rules, 2 other rulesrc/sandbox/jm/20_xmailer.cf: 43 active rules, 86 other rulesrc/sandbox/jm/22_bug_5667.cf: 0 active rules, 0 other rulesrc/sandbox/jm/70_tt_drugs.cf: 2 active rules, 10 other rulesrc/sandbox/kb/20_bug_6156.cf: 2 active rules, 0 other rulesrc/sandbox/kb/20_header.cf: 3 active rules, 2 other rulesrc/sandbox/kb/70_misc.cf: 7 active rules, 19 other rulesrc/sandbox/kb/75_de.cf: 0 active rules, 1 other rulesrc/sandbox/khopesh/20_bug_6271.cf: 8 active rules, 0 other rulesrc/sandbox/khopesh/20_chickenpox.cf: 0 active rules, 10 other rulesrc/sandbox/khopesh/20_isc_attackers.cf: 0 active rules, 2 other rulesrc/sandbox/khopesh/20_khop_bl.cf: 5 active rules, 15 other rulesrc/sandbox/khopesh/20_khop_blessed.cf: 6 active rules, 9 other rulesrc/sandbox/khopesh/20_khop_dynamic.cf: 0 active rules, 12 other rulesrc/sandbox/khopesh/20_khop_experimental.cf: 8 active rules, 76 other rulesrc/sandbox/khopesh/20_khop_general.cf: 8 active rules, 6 other rulesrc/sandbox/khopesh/20_khop_lists.cf: 0 active rules, 16 other rulesrc/sandbox/khopesh/20_khop_sc_bug_6114.cf: 0 active rules, 22 other rulesrc/sandbox/khopesh/20_rcd_rdns.cf: 0 active rules, 24 other rulesrc/sandbox/khopesh/20_s25r.cf: 2 active rules, 11 other rulesrc/sandbox/khopesh/20_spamdb_subjects.cf: 0 active rules, 1 other rulesrc/sandbox/khopesh/20_trust.cf: 5 active rules, 2 other rulesrc/sandbox/khopesh/65_debian.cf: 0 active rules, 11 other rulesrc/sandbox/kmcgrail/20_test.cf: 2 active rules, 41 other rulesrc/sandbox/kmcgrail/70_mx.cf: 0 active rules, 4 other rulesrc/sandbox/kmcgrail/70_phishing.cf: 0 active rules, 1 other rulesrc/sandbox/kmcgrail/80_deadrules.cf: 6 active rules, 0 other rulesrc/sandbox/maddoc/99_doc_test.cf: 11 active rules, 35 other rulesrc/sandbox/maddoc/99_fsl_testing.cf: 7 active rules, 8 other rulesrc/sandbox/mkettler/20_drugs.cf: 2 active rules, 0 other rulesrc/sandbox/mkettler/25_uribl.cf: 5 active rules, 0 other rulesrc/sandbox/mmartinec/20_misc.cf: 0 active rules, 13 other rulesrc/sandbox/mmartinec/20_rpvalid.cf: 1 active rules, 0 other rulesrc/sandbox/sidney/70_other.cf: 0 active rules, 1 other rulesrc/sandbox/wtogami/20_bug_6212_hostkarma.cf: 4 active rules, 5 other rulesrc/sandbox/wtogami/20_bug_6220_sem.cf: 3 active rules, 1 other rulesrc/sandbox/wtogami/20_googlecache.cf: 0 active rules, 1 other rulesrc/sandbox/wtogami/20_mailspike.cf: 0 active rules, 15 other rulesrc/sandbox/wtogami/20_misc.cf: 0 active rules, 0 other rulesrc/sandbox/wtogami/20_rp_certified.cf: 3 active rules, 0 other rulesrc/sandbox/wtogami/20_ubl.cf: 0 active rules, 0 other rulesrc/sandbox/wtogami/20_unsafe.cf: 0 active rules, 0 other rulesrc/sandbox/wtogami/20_vanity.cf: 1 active rules, 22 other omitting rule T_DATE_IN_DISTANT_FUTURE ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish) omitting rule T_DATE_IN_FUTURE_1Y_4Y ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish) omitting rule T_DATE_IN_FUTURE_96_WEEK ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish) omitting rule T_DATE_IN_FUTURE_MONTH ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish) omitting rule T_DATE_IN_FUTURE_WEEK ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish) omitting rule T_DATE_IN_FUTURE_YEAR ifplugin Mail::SpamAssassin::Plugin::HeaderEval due to tflags nopublish (tflags nopublish) omitting rule T_DNSBL_INDIRECT ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish) omitting rule T_DNSBL_INDIRECT_UNSAFE ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish) omitting rule T_DNSBL_INDIRECT_UNSAFE_2 ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish) omitting rule T_KHOP_FOREIGN_CLICK due to tflags nopublish (tflags nopublish) omitting rule T_KHOP_PGP_INLINE if ! plugin (Mail::SpamAssassin::Plugin::OpenPGP) due to tflags nopublish (tflags nice noautolearn nopublish) omitting rule T_KHOP_PGP_SIGNED if ! plugin (Mail::SpamAssassin::Plugin::OpenPGP) due to tflags nopublish (tflags nice noautolearn nopublish) omitting rule T_MONEY_FREEMAIL ifplugin Mail::SpamAssassin::Plugin::FreeMail due to tflags nopublish (tflags nopublish) omitting rule T_RCVD_IN_NIX_SPAM ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish) omitting rule T_RCVD_IN_SPAMCOP ifplugin Mail::SpamAssassin::Plugin::DNSEval # { due to tflags nopublish (tflags net nopublish) omitting rule T_SEM_FRESH ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish) omitting rule T_SEM_URI ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish) omitting rule T_SEM_URIRED ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish) omitting rule T_URIBL_HOSTKARMA_BL ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish) omitting rule T_URIBL_HOSTKARMA_BR ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish) omitting rule T_URIBL_HOSTKARMA_FRESH_10D ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish) omitting rule T_URIBL_HOSTKARMA_FRESH_2D ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish) omitting rule T_URIBL_META_SURBL_ANY ifplugin Mail::SpamAssassin::Plugin::URIDNSBL due to tflags nopublish (tflags net nopublish)
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6297#c14 This bug was introduced here.
Additional problems: 1 - 70_sandbox.cf is not correct from the fixes in 6297 which leave rules out of masscheck 2 - remnants of rules that are #testrules with no publish still get published. 3 - If it doesn't break masschecks, we should consider code to avoid publishing ANY rules T_ and also add a safety valve parameter that requires a --test-rules to run them even if they are published.
(In reply to comment #2) > Additional problems: > > 1 - 70_sandbox.cf is not correct from the fixes in 6297 which leave rules out > of masscheck > > 2 - remnants of rules that are #testrules with no publish still get published. #testrules and "tflags nopublish" are two separate parameters. It seems #testrules in this case was only partially successful. SEM rules are lacking "tflags nopublish" which likely would have stopped this. > > 3 - If it doesn't break masschecks, we should consider code to avoid publishing > ANY rules T_ and also add a safety valve parameter that requires a --test-rules > to run them even if they are published. In my opinion --test-rules is a needless complication.
> #testrules and "tflags nopublish" are two separate parameters. It seems > #testrules in this case was only partially successful. #testrules was completely succesful. It published the rules as T_... > SEM rules are lacking > "tflags nopublish" which likely would have stopped this. Correct which someone removed I'm guessing when nopublish took rules out of 70_sandbox.cf avoiding masscheck. > > 3 - If it doesn't break masschecks, we should consider code to avoid publishing > > ANY rules T_ and also add a safety valve parameter that requires a --test-rules > > to run them even if they are published. > > In my opinion --test-rules is a needless complication. We'll agree to disagree. I don't like egg on my face and don't want to see this happen a third time.
(In reply to comment #2) > Additional problems: > > 1 - 70_sandbox.cf is not correct from the fixes in 6297 which leave rules out > of masscheck > > 2 - remnants of rules that are #testrules with no publish still get published. > > 3 - If it doesn't break masschecks, we should consider code to avoid publishing > ANY rules T_ and also add a safety valve parameter that requires a --test-rules > to run them even if they are published. 4 - According to http://wiki.apache.org/spamassassin/SaUpdateBackend, T_ rules are never published and this page needs to be updated to document #testrules
Additional problems (recapping as many as I know about from discussions on dev): 1 - 70_sandbox.cf is not correct from the fixes in 6297 which leave rules out of masscheck 2 - remnants of rules that are #testrules with no publish still get published. 3 - If it doesn't break masschecks, we should consider code to avoid publishing ANY rules T_ and also add a safety valve parameter that requires a --test-rules to run them even if they are published. 4 - According to http://wiki.apache.org/spamassassin/SaUpdateBackend, T_ rules are never published and this page needs to be updated to document #testrules 5 - There are rules like T_SURBL_MULTI2 that are in mkettler's sandbox. There is currently no reason I can see for why these are being published. This is NOT an error in his sandbox. 6 - There might be race conditions where items on SaUpdateBackend flag for always publish and never publish. Which takes precedence? I clearly thing NOT publish should be the default but these conditions need to be clarified and the code changed to reflect the conditions so that we never have this messed up again.
(In reply to comment #6) > 5 - There are rules like T_SURBL_MULTI2 that are in mkettler's sandbox. There > is currently no reason I can see for why these are being published. This is > NOT an error in his sandbox. If I recall correctly, T_ rules *can* be dragged into updates if they are required as dependencies of a published meta rule.
Any updates on this? Why not just rename rules that have been auto-promoted via dependencies so that they aren't published with the T_ prefix? (I assume the GA will end up scoring these things near-zero, but if that's an unfair assumption, the promotion process should score them at +/- 0.001 so they are effectively __ rules.) There are currently 73 T_ rules live in 3.3.2 and 77 on the trunk. % host -ttxt 0.4.3.updates.spamassassin.org 0.4.3.updates.spamassassin.org is an alias for 2.3.3.updates.spamassassin.org. 2.3.3.updates.spamassassin.org descriptive text "1195874" % host -ttxt mirrors.updates.spamassassin.org. mirrors.updates.spamassassin.org descriptive text "http://spamassassin.apache.org/updates/MIRRORED.BY" % curl http://spamassassin.apache.org/updates/MIRRORED.BY |grep -m1 ^h http://daryl.dostech.ca/sa-update/asf/ weight=5 % curl -s http://daryl.dostech.ca/sa-update/asf/1139740.tar.gz |tar -zxf - % awk '$1$2 !~ /#/ && $2 ~ /^T_/ && $1 !~ /^(reuse|describe|replace_rules)/ {rule++} END {print rule}' *.cf 73 Here is the list of rules on trunk (remove the 'pr -t3' and perhaps the '{print $2}' if you want to do something with this list): % awk '$1$2 !~ /#/ && $2 ~ /^T_/ && $1 !~ /^(reuse|describe|replace_rules)/ {print $2}' trunk/rules/72_active.cf |sort |uniq |pr -t3 T_DKIM_INVALID T_FRT_HOUR T_HK_NAME_FM_MR_MRS T_DOS_OUTLOOK_TO_MX_IMA T_FRT_INCOME T_HK_NAME_FROM T_DOS_ZIP_HARDCORE T_FRT_INTEREST T_HK_NAME_MR_MRS T_EMRCP T_FRT_LITTLE T_HTML_ATTACH T_FILL_THIS_FORM_SHORT T_FRT_LOLITA1 T_KHOP_FOREIGN_CLICK T_FORGED_TBIRD_IMG_SIZE T_FRT_OPPORTUN1 T_LARGE_PCT_AFTER_MANY T_FREEMAIL_DOC_PDF T_FRT_PACKAGE T_LFUZ_PWRMALE T_FREEMAIL_FORGED_FROMD T_FRT_PAYMENT T_LOTTO_AGENT_FM T_FREEMAIL_RVW_ATTCH T_FRT_PHARMAC T_LOTTO_AGENT_RPLY T_FRM_SILVER_GOLD T_FRT_POSSIBLE T_LOTTO_DEPT T_FRT_ABSOLUT T_FRT_PROFILE1 T_LOTTO_URI T_FRT_ADULT2 T_FRT_PROFILE2 T_MIME_NO_TEXT T_FRT_BEFORE T_FRT_PROFIT1 T_MONEY_PERCENT T_FRT_BELOW2 T_FRT_PROFIT2 T_OBFU_DOC_ATTACH T_FRT_CANSPAM T_FRT_PUSSY T_OBFU_GIF_ATTACH T_FRT_CLICK T_FRT_SLUT T_OBFU_HTML_ATTACH T_FRT_COCK T_FRT_STOCK1 T_OBFU_JPG_ATTACH T_FRT_CONTACT T_FRT_STOCK2 T_OBFU_PDF_ATTACH T_FRT_ESTABLISH T_FRT_VIRGIN1 T_REMOTE_IMAGE T_FRT_EXPERIENCE T_FUZZY_OPTOUT T_SHARE_50_50 T_FRT_FOLLOW1 T_FUZZY_SPRM T_TVD_FUZZY_SECTOR T_FRT_FOLLOW2 T_HEADER_FROM_DIFFERENT T_TVD_FUZZY_SECURITIES T_FRT_FREE T_HK_NAME_DR T_TVD_FW_GRAPHIC_ID2 T_FRT_FRIEND T_HK_NAME_FM_DR T_TVD_MIME_EPI T_FRT_FUCK1 T_HK_NAME_FM_FROM T_TVD_MIME_NO_HEADERS T_FRT_HEALTH
could you pls check if this only happens with rules which are encapsulated with "ifplugin"
(In reply to Justin Mason from comment #7) > (In reply to comment #6) > > 5 - There are rules like T_SURBL_MULTI2 that are in mkettler's sandbox. There > > is currently no reason I can see for why these are being published. This is > > NOT an error in his sandbox. > > If I recall correctly, T_ rules *can* be dragged into updates if they are > required as dependencies of a published meta rule. This zombie thread is perhaps a better place to make this comment than a new bug... I just noticed that if a T_ rule *does* get published, it gets published *without* an explicit score - so if the rule definition in the sandbox has a score limit that's less than one, the T_ rule starts adding a full point to the message's score when it hits, when it would add fewer points if it were properly published. I see two possible approaches to fix: (1) If a T_ rule *does* get published, and the score limit is < 1, an explicit score for it should also be published at that score limit. (2) T_ rules should always be published with an explicit "informational" score (0.0001) so that they still can be used in metas but they don't materially affect the overall score. Should this be a new, separate bug?