SA Bugzilla – Bug 6419
[review] SpamD/SpamC report email as not spam but also reports Hits = Required Threshold
Last modified: 2010-07-31 08:06:26 UTC
spamc called with parameter -R is printing the report with "The first line of the output is the message score and the threshold, in this format: score/threshold" However, the first line does NOT agree with the content analysis details. Here's the output from /usr/local/bin/spamc -R -z -d localhost -u root < /tmp/3: 6.5/6.5 ...(STUFF REMOVED) Content analysis details: (6.4 points, 6.5 required) 1.1 URIBL_GREY Contains an URL listed in the URIBL greylist [URIs: constantcontact.com] 1.0 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry -0.5 TEMP TEMP 2.4 ONLINE_PHARMACY BODY: Online Pharmacy 1.2 TVD_VISIT_PHARMA BODY: TVD_VISIT_PHARMA 0.0 HTML_MESSAGE BODY: HTML included in message 2.3 ADVANCE_FEE_2 Appears to be advance fee fraud (Nigerian 419) 0.0 T_LOTS_OF_MONEY Huge... sums of money -1.0 KAM_RPTR_PASSED Passed Mail Relay Reverse DNS Test -0.0 AWL AWL: From: address is in the auto white-list I am using spamc -R and using the first line of output from spamc to determine if the email was spam by checking if score is >= thresholdto test. OK, so long and short, this is bug 2607 rearing it's head in spamd. I have a patch against 3.3.0 that works. However, centralizing the routine that I've duplicated for PerMsgStatus.pm would likely be a better solution.
Created attachment 4750 [details] Patch of PerMsgStatus.pm for clarity and duplication of routine in spamd
I believe this is a fairly big bug as spamd is reporting different scores that PerMsgStatus.pm. I have re-written the patch to move the rounding function from PMS to Util.pm and changed PMS and Spamd to both use the same routine for consistency.
Created attachment 4752 [details] Patch of PerMsgStatus.pm, Util.pm and spamd to round spam score consistently using the same routine
+1, looks good Even though I would have preferred the public function to stay in PMS, but that might just be me and weird in another way. Oh, and could you please do an s/;;/;/ on the Util.pm hunk of the patch? ;)
Forgot to mention: I didn't run the code, just checked the diff -- based on the previous discussion on the dev list.
Created attachment 4761 [details] VERY small change to remove an extra semi-colon (In reply to comment #4) > Even though I would have preferred the public function to stay in PMS, but that > might just be me and weird in another way. Oh, and could you please do an > s/;;/;/ on the Util.pm hunk of the patch? ;) From memory, PMS is a class and I needed a procedural call to use the same function uniformly throughout. This issue really came down to implementations to do the same functionality that were not kept in sync. re: double semi-colon, new patch attached. (In reply to comment #5) > Forgot to mention: I didn't run the code, just checked the diff -- based on the > previous discussion on the dev list. No worries, I've been running in production now since the day of the patch so I stand behind it.
+0.1 I don't mind the proposed change. I'm just rather skeptical on this entire rounding business - the deeper these tricks go and the more complex they become, the more difficult it is to explain how this all works. Like opening a can of worms. If it were my code, I'd ditch the whole rounding magic and just let the sprintf %f do its rounding job to whatever format is required in each presentation text. Users should realize by now (during the last 50 years) how this floating point arithmetics works.
(In reply to comment #7) > +0.1 > > I don't mind the proposed change. I'm just rather skeptical > on this entire rounding business - the deeper these tricks go > and the more complex they become, the more difficult it is > to explain how this all works. Like opening a can of worms. > If it were my code, I'd ditch the whole rounding magic and > just let the sprintf %f do its rounding job to whatever format > is required in each presentation text. Users should realize > by now (during the last 50 years) how this floating point > arithmetics works. Mark, I don't disagree but bug 2607 was the original bug. All this patch is doing is using the same code everywhere that was previously implemented. The previous implementation missed code in spamc/spamd that has been overlooked until now. And I think for me, a major change would be to report to 5 decimal places. Because this really isn't a floating point arithmetic issue. This is a rounding up for the reporting but using the real number for the is_spam flag. Regards, KAM
+1 ready for commit to 3.3 branch
Shouldn't this have been committed to trunk already?
(In reply to comment #10) > Shouldn't this have been committed to trunk already? I don't know. I usually wait for the vote to apply to trunk. Am I doing that incorrectly? Trunk commit: svn commit -m 'Bug 6419 patch commit to resolve rounding issue irregularity with spamd/spamc' Sending lib/Mail/SpamAssassin/PerMsgStatus.pm Sending lib/Mail/SpamAssassin/Util.pm Sending spamd/spamd.raw Transmitting file data ... Committed revision 966355. 3.3 branch commit: svn commit -m 'Bug 6419 patch commit to resolve rounding issue irregularity with spamd/spamc' Sending lib/Mail/SpamAssassin/PerMsgStatus.pm Sending lib/Mail/SpamAssassin/Util.pm Sending spamd/spamd.raw Transmitting file data ... Committed revision 966357. Regards, KAM
(in reply to comment #11) Commits to trunk are CTR (Commit then review), no votes required (or rather, only the one implied vote of the person committing it is required) but anyone can notice a problem and veto it to have the commit reverted. Commits to branches are RTC, requiring three votes and no vetoes before committing. Since only one vote is needed to commit to trunk, in general the committer who is proposing a patch for branch commits to trunk at the same time that they post the patch and call for votes for branch. Of course it can go somewhat differently if you are asking for opinions or review before you are sure that you like your own proposal, or if the patch is submitted by a non-committer and so doesn't even have one vote yet. But it is important to have the process such that it is difficult to accidentally forget to commit something to trunk that has been committed to branch. That has happened before with unfortunate results.
(In reply to comment #12) > (in reply to comment #11) > > Commits to trunk are CTR (Commit then review), no votes required (or rather, > only the one implied vote of the person committing it is required) but anyone > can notice a problem and veto it to have the commit reverted. > > Commits to branches are RTC, requiring three votes and no vetoes before > committing. > > Since only one vote is needed to commit to trunk, in general the committer who > is proposing a patch for branch commits to trunk at the same time that they > post the patch and call for votes for branch. > > Of course it can go somewhat differently if you are asking for opinions or > review before you are sure that you like your own proposal, or if the patch is > submitted by a non-committer and so doesn't even have one vote yet. > > But it is important to have the process such that it is difficult to > accidentally forget to commit something to trunk that has been committed to > branch. That has happened before with unfortunate results. Thanks for the detailed explanation. I will change my behavior with the trunk commits to be in line with this procedure. Regards, KAM
(in reply to comment #13) Oh, I should have mentioned that trunk goes to RTC for a freeze period as part of the release procedure, and you should see the wiki page about all this for more details: http://wiki.apache.org/spamassassin/DevelopmentMode
Just so there's the proper number of votes +1 to commit to 3.3 I agree that we don't really want a ton of decimal places, and variable could cause formatting issues (although we could possibly make this a configure-time option at some point, for those who really want more, that seems like a lot of work for not much point.) Regardless, if we're going to round, it is crucial we do it the same way everywhere.