Bug 1373 - Excessive Commenting in code
Summary: Excessive Commenting in code
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: unspecified
Hardware: Other other
Importance: P2 enhancement
Target Milestone: 2.60
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-01-14 12:07 UTC by Robert J. Accettura
Modified: 2003-05-19 04:06 UTC
Description Robert J. Accettura 2003-01-14 12:07:39 UTC
SpamAssassin should check whether emails contain excessive amounts of comments
in the HTML:

<!-- SOME TEXT, OR NO TEXT-->

Spammers often use this to keep spam filters such as SpamAssassin from
correctly detecting spam.  There is no reason to use a comment in HTML mail
anyway, since it's never hand-edited and isn't processed by a server (SSI
tags and such), so it's a pretty good indication of spam.
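To make the request concrete, here is a minimal Python sketch (SpamAssassin itself is written in Perl) that pulls out the text of every <!-- ... --> comment in an HTML body. It assumes a plain regular expression is good enough for illustration; a real HTML parser treats malformed comments differently, and the helper name find_html_comments is hypothetical.

```python
import re
from typing import List

# Non-greedy so adjacent comments are matched separately;
# DOTALL lets a single comment span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_html_comments(html: str) -> List[str]:
    """Return the text inside each <!-- ... --> comment (hypothetical helper)."""
    return COMMENT_RE.findall(html)


# A word broken up by comments, one with text and one empty.
sample = "Bu<!-- x -->y n<!---->ow <b>cheap</b>"
comments = find_html_comments(sample)
print(len(comments), comments)  # prints: 2 [' x ', '']
```

Counting or measuring these matches is the raw input any "excessive comments" test would need.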
Comment 1 Theo Van Dinter 2003-01-14 12:26:38 UTC
Comments in general or comments in the middle of words?  The latter is OBFUSCATING_COMMENT?  If the former, please attach a sample.
Comment 2 Daniel Quinlan 2003-01-14 12:34:59 UTC
I thought he meant comments in general, but now I'm not quite so sure.

If he did...

This rule might be usable if we create a meta and "and" it with CTYPE_JUST_HTML.
Actually, now that I think about it, a lot of our failed HTML rules might work
well enough to use if we "and" them with CTYPE_JUST_HTML.
Comment 3 Robert J. Accettura 2003-01-14 12:52:43 UTC
I meant in general, since that catches both.  There is no reason for comments in
HTML mail, since it isn't hand edited, nor does it go through a parser.  It's
limited to being a spammer technique.
Comment 4 Daniel Quinlan 2003-01-14 13:11:31 UTC
Subject: Re: [SAdev]  Excessive Commenting in code

robert@accettura.com writes:

> I meant in general, since that catches both.  There is no reason for
> comments in HTML mail, since it isn't hand edited, nor does it go
> through a parser.  It's limited to being a spammer technique.

You have to be careful.  It's difficult to tell the difference between a
legitimate attached HTML file and an HTML part that's the HTML version
of text (I suppose you could compare them, but we're not set up for
that.)

Also, some legitimate newsletters use HTML editors as well as
hand-edited HTML and contain mark-up not found in your average HTML
email, which does, in fact, sometimes include comments.

That's why I said it would be necessary to require CTYPE_JUST_HTML (and
even then, it might not work well enough for general use due to
newsletters and such).  Actually, I just did a quick hand-test: 63 of
the 335 messages containing text/html in my spam corpus (about 19%)
contain HTML comments.

Comment 5 Robert J. Accettura 2003-01-14 13:18:53 UTC
But newsletters tend to be sent out by majordomo, auto-whitelisted, etc., and
there are the typical legit-newsletter rules (I'm researching whether there are
more to add).  If those rules are accurate enough (and they should get better),
the comment rule could get a small point value without really harming
legitimate mail, no?

I think it's worth less than 1.0 (at least at this time).  Perhaps a 0.2.

>You have to be careful.  It's difficult to tell the difference between a
>legitimate attached HTML file and an HTML part that's the HTML version
>of text (I suppose you could compare them, but we're not set up for
>that.)

Didn't realize that.  Coming to 3.0?
Comment 6 Justin Mason 2003-01-14 14:35:24 UTC
Subject: Re: [SAdev]  Excessive Commenting in code 


> I meant in general, since that catches both.  There is no reason for
> comments in HTML mail, since it isn't hand edited, nor does it go
> through a parser.  It's limited to being a spammer technique.

Data point: MS Word documents saved as HTML do contain comments
(Word stores conditional statements in them).

Comment 7 Michael Moncur 2003-01-14 15:00:53 UTC
I grepped my nonspam corpus for HTML comments and found:

 - Several messages from friends that appear to be composed in MS Word 
(WordMail for Outlook probably) and have a bunch of comments (metadata, not 
human-specified comments)
 - Daily Dilbert newsletters from unitedmedia.com, which use comments around a 
script
 - A GoDaddy.com domain renewal notice that has the entire text repeated in an 
HTML comment for some strange reason
 - A Register.com renewal notice with a "saved from url" comment
 - An HTML newsletter from a software company that has the text version in an 
HTML comment rather than a MIME part
 - An HTML newsletter from an online store with some kind of tracking 
information in a comment
 - A message sent to a YahooGroups list via their online interface, containing 
a blank comment for some reason

...so I'm not sure how useful this test will be.
Comment 8 Malte S. Stretz 2003-01-14 15:23:20 UTC
I've got various obviously script-generated newsletters containing many  
comments. I'd say WONTFIX. 
Comment 9 Daniel Quinlan 2003-01-14 22:14:06 UTC
Subject: Re: [SAdev]  Excessive Commenting in code

> I've got various obviously script-generated newsletters containing many  
> comments. I'd say WONTFIX. 

It's possible some spam has insanely high percentages of comments, like
90% commenting.  Maybe worth a try after 2.50 is out.

Do something like:

in html_comment, keep a counter like:

  $self->{html}{comment_length} += length($text) + 7;	# 7 = "<!--" + "-->"

then do:

  if ($self->{html}{non_uri_len}) {
    $self->{html}{comment_ratio} = $self->{html}{comment_length} / $self->{html}{non_uri_len};
  }

then a range test in 10% increments, etc.
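A minimal Python sketch of the same bookkeeping, assuming non_uri_len is supplied by the caller (how SpamAssassin's HTML parser computes it is not shown here) and that each 10% range includes its lower bound. The names comment_ratio and ratio_bucket are hypothetical; the bucket names mirror the HTML_COMMENT_RATIO_xx_yy rules tested later in this bug.

```python
import re
from typing import Optional

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
# Same constant as the Perl snippet: len("<!--") + len("-->") == 7.
DELIMITER_LEN = len("<!--") + len("-->")


def comment_ratio(html: str, non_uri_len: int) -> Optional[float]:
    """Comment bytes (delimiters included) divided by non_uri_len.

    Returns None when non_uri_len is 0, mirroring the guard in the Perl code.
    """
    if not non_uri_len:
        return None
    comment_length = sum(len(text) + DELIMITER_LEN for text in COMMENT_RE.findall(html))
    return comment_length / non_uri_len


def ratio_bucket(ratio: float) -> str:
    """Map a ratio to a 10% range name; anything >= 0.9 lands in 90_100."""
    lo = min(int(ratio * 10), 9) * 10
    return f"HTML_COMMENT_RATIO_{lo:02d}_{lo + 10}"


html = "ab<!--cdef-->gh"
# Assumption for the demo: the whole string length stands in for non_uri_len.
r = comment_ratio(html, len(html))
print(f"{r:.3f}", ratio_bucket(r))  # prints: 0.733 HTML_COMMENT_RATIO_70_80
```

Capping the bucket index at 9 keeps ratios at or above 100% in the top range instead of producing a nonexistent 100_110 rule.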

Comment 10 Robert J. Accettura 2003-01-15 06:55:17 UTC
I've been hand-analysing mail in three categories:

Spam
Newsletters/Mass legit mailings
Normal Mail


What I have found is that spammers tend to use random data in comments between
words, as well as what looks to me like totally random placement.  I guess it's
to obscure the code from easy view.

I have also found that those that do so tend to use massive quantities, so that
comments appear to make up more than 40% of the code (just an estimate).  It
tends to be all or nothing.
Comment 11 Theo Van Dinter 2003-03-02 16:42:40 UTC
ok, I've put this in testing for 2.60.  It looks pretty good for me, but I 
don't get a lot of HTML mail so ...   The results I got were:

Just the ratios, not looking for MIME_HTML_ONLY:

OVERALL%   SPAM% NONSPAM%      S/O   RANK   SCORE  NAME
  4.059   6.0792   0.2261    0.964   0.81    1.00  __HTML_COMMENT_RATIO_00_10
  1.558   2.3703   0.0174    0.993   0.89    1.00  __HTML_COMMENT_RATIO_10_20
  0.402   0.6052   0.0174    0.972   0.82    1.00  __HTML_COMMENT_RATIO_20_30
  0.600   0.9123   0.0087    0.991   0.88    1.00  __HTML_COMMENT_RATIO_30_40
  0.189   0.2888   0.0000    1.000   0.90    1.00  __HTML_COMMENT_RATIO_40_50
  0.183   0.2797   0.0000    1.000   0.90    1.00  __HTML_COMMENT_RATIO_50_60
  0.165   0.2476   0.0087    0.966   0.80    1.00  __HTML_COMMENT_RATIO_60_70
  0.315   0.4814   0.0000    1.000   0.91    1.00  __HTML_COMMENT_RATIO_70_80
  0.201   0.3072   0.0000    1.000   0.90    1.00  __HTML_COMMENT_RATIO_80_90
  0.000   0.0000   0.0000    0.500   0.00    1.00  __HTML_COMMENT_RATIO_90_100

Making it a meta with MIME_HTML_ONLY ...

OVERALL%   SPAM% NONSPAM%      S/O   RANK   SCORE  NAME
  2.426   3.6860   0.0348    0.991   0.88    0.01  T_HTML_COMMENT_RATIO_00_10
  1.225   1.8705   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_10_20
  0.357   0.5456   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_20_30
  0.564   0.8619   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_30_40
  0.177   0.2705   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_40_50
  0.159   0.2430   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_50_60
  0.156   0.2384   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_60_70
  0.306   0.4676   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_70_80
  0.195   0.2980   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_80_90
  0.000   0.0000   0.0000    0.500   0.00    0.01  T_HTML_COMMENT_RATIO_90_100
Comment 12 Daniel Quinlan 2003-03-02 20:36:23 UTC
Subject: Re: [SAdev]  Excessive Commenting in code

felicity@kluge.net wrote:

> Just the ratios, not looking for MIME_HTML_ONLY:

How about without MIME_HTML_ONLY?  That rule is so effective (and has a
high spam hit rate) that I've started worrying it's way too easy to rely on
it as an FP reduction tool.  I have no data to back this up, though.  :-)

Looking at HTML rules with __MIME_HTML or HTML_MESSAGE seems pretty
safe to me, though.
 
Comment 13 Theo Van Dinter 2003-03-02 20:41:59 UTC
Subject: Re: [SAdev]  Excessive Commenting in code

On Sun, Mar 02, 2003 at 08:36:24PM -0800, bugzilla-daemon@hughes-family.org wrote:
> How about without MIME_HTML_ONLY?  That rule is so effective (and has a
> high spam hit rate) that I've started worrying way too easy to rely on
> it as a FP reduction tool.  I have no data to back this up, though.  :-)

I'm not sure what you're asking.  Are you asking for the comment ratio
results with the meta, or for "HTML_COMMENT_RATIO... && !MIME_HTML_ONLY"?

If the former, that was posted.  If the latter, I don't know, but we
could try it if you think it would be a useful set of tests.

Comment 14 Daniel Quinlan 2003-03-02 21:38:07 UTC
Subject: Re: [SAdev]  Excessive Commenting in code

bugzilla-daemon@hughes-family.org writes:

> I'm not sure what you're asking.  Are you asking for the comment ratio
> results with the meta, or for "HTML_COMMENT_RATIO... && !MIME_HTML_ONLY"?
> 
> If the former, that was posted.  If the latter, I don't know, but we
> could try it if you think it would be a useful set of tests.

I was mostly saying I'd rather not have this meta test require
MIME_HTML_ONLY if it works well enough with HTML_MESSAGE or __MIME_HTML.

It might be interesting to see results for (HTML_COMMENT_RATIO... &&
__MIME_HTML) and (HTML_COMMENT_RATIO... && HTML_MESSAGE) compared with
the (HTML_COMMENT_RATIO... && MIME_HTML_ONLY) ones.

Comment 15 Theo Van Dinter 2003-03-03 06:27:26 UTC
Subject: Re: [SAdev]  Excessive Commenting in code

On Sun, Mar 02, 2003 at 09:38:08PM -0800, bugzilla-daemon@hughes-family.org wrote:
> It might be interesting to see results for (HTML_COMMENT_RATIO... &&
> __MIME_HTML) and (HTML_COMMENT_RATIO... && HTML_MESSAGE) compared with
> the (HTML_COMMENT_RATIO... && MIME_HTML_ONLY) ones.

Below are my results, sorted by rule name.  MIME_HTML_ONLY produces the
best S/O ratios at 0.991 (0-10) or 1.0 (10-100) while catching 5.57% of
all messages.  HTML_MESSAGE has S/O ratios ranging from 0.963 - 1 and
catches 7.65% of all messages.  __MIME_HTML has S/O ratios of 0.962 -
1 and catches 7.42% of all messages.  So I still like MIME_HTML_ONLY:
It catches fewer messages overall, but is more accurate.  I've committed
the new rules for testing in a larger arena. :)

OVERALL%   SPAM% NONSPAM%      S/O   RANK   SCORE  NAME
  4.069   6.0889   0.2343    0.963   0.80    0.01  T_HTML_COMMENT_RATIO_00_10_HTML_MESSAGE
  2.431   3.6935   0.0347    0.991   0.88    0.01  T_HTML_COMMENT_RATIO_00_10_MIME_HTML_ONLY
  4.009   5.9974   0.2343    0.962   0.80    0.01  T_HTML_COMMENT_RATIO_00_10___MIME_HTML
  1.551   2.3587   0.0174    0.993   0.89    0.01  T_HTML_COMMENT_RATIO_10_20_HTML_MESSAGE
  1.222   1.8651   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_10_20_MIME_HTML_ONLY
  1.362   2.0708   0.0174    0.992   0.88    0.01  T_HTML_COMMENT_RATIO_10_20___MIME_HTML
  0.401   0.6034   0.0174    0.972   0.82    0.01  T_HTML_COMMENT_RATIO_20_30_HTML_MESSAGE
  0.356   0.5440   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_20_30_MIME_HTML_ONLY
  0.398   0.5988   0.0174    0.972   0.82    0.01  T_HTML_COMMENT_RATIO_20_30___MIME_HTML
  0.596   0.9051   0.0087    0.991   0.88    0.01  T_HTML_COMMENT_RATIO_30_40_HTML_MESSAGE
  0.563   0.8594   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_30_40_MIME_HTML_ONLY
  0.599   0.9097   0.0087    0.991   0.88    0.01  T_HTML_COMMENT_RATIO_30_40___MIME_HTML
  0.186   0.2834   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_40_50_HTML_MESSAGE
  0.177   0.2697   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_40_50_MIME_HTML_ONLY
  0.189   0.2880   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_40_50___MIME_HTML
  0.177   0.2697   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_50_60_HTML_MESSAGE
  0.159   0.2423   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_50_60_MIME_HTML_ONLY
  0.183   0.2788   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_50_60___MIME_HTML
  0.168   0.2514   0.0087    0.967   0.80    0.01  T_HTML_COMMENT_RATIO_60_70_HTML_MESSAGE
  0.162   0.2468   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_60_70_MIME_HTML_ONLY
  0.171   0.2560   0.0087    0.967   0.81    0.01  T_HTML_COMMENT_RATIO_60_70___MIME_HTML
  0.305   0.4663   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_70_80_HTML_MESSAGE
  0.305   0.4663   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_70_80_MIME_HTML_ONLY
  0.314   0.4800   0.0000    1.000   0.91    0.01  T_HTML_COMMENT_RATIO_70_80___MIME_HTML
  0.195   0.2971   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_80_90_HTML_MESSAGE
  0.195   0.2971   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_80_90_MIME_HTML_ONLY
  0.195   0.2971   0.0000    1.000   0.90    0.01  T_HTML_COMMENT_RATIO_80_90___MIME_HTML
  0.000   0.0000   0.0000    0.500   0.00    0.01  T_HTML_COMMENT_RATIO_90_100_HTML_MESSAGE
  0.000   0.0000   0.0000    0.500   0.00    0.01  T_HTML_COMMENT_RATIO_90_100_MIME_HTML_ONLY
  0.000   0.0000   0.0000    0.500   0.00    0.01  T_HTML_COMMENT_RATIO_90_100___MIME_HTML
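The S/O column is what the comparison above turns on. Reading the columns as OVERALL%, SPAM%, NONSPAM%, S/O, RANK, SCORE and NAME (an assumption based on SpamAssassin's hit-frequency reports), S/O appears to be the spam hit rate divided by the sum of the spam and nonspam hit rates, with 0.500 for a rule that never fires. This small Python check uses values copied from the table above and reproduces the printed S/O figures; so_ratio is a hypothetical helper, not SpamAssassin code.

```python
def so_ratio(spam_pct: float, nonspam_pct: float) -> float:
    """Fraction of a rule's (percentage-normalized) hits that fall on spam.

    Returns 0.5 when the rule never fires, matching the 0.500 rows.
    """
    total = spam_pct + nonspam_pct
    if total == 0:
        return 0.5
    return spam_pct / total


# (rule name, SPAM%, NONSPAM%) taken from the table above.
rows = [
    ("T_HTML_COMMENT_RATIO_00_10_HTML_MESSAGE", 6.0889, 0.2343),
    ("T_HTML_COMMENT_RATIO_00_10_MIME_HTML_ONLY", 3.6935, 0.0347),
    ("T_HTML_COMMENT_RATIO_90_100_MIME_HTML_ONLY", 0.0, 0.0),
]
for name, spam, nonspam in rows:
    # Prints 0.963, 0.991 and 0.500, matching the S/O column.
    print(f"{so_ratio(spam, nonspam):.3f}  {name}")
```

Working from percentages rather than raw counts keeps the ratio from being skewed when the spam and nonspam corpora differ in size.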

Comment 16 Daniel Quinlan 2003-03-05 21:14:46 UTC
Reopening for further comment (not a simple rule in terms of what we should do).

Here are the HTML_MESSAGE scores for this rule (rod/theo/quinlan) for last
night's corpus run:

OVERALL%   SPAM% NONSPAM%      S/O   RANK   SCORE  NAME
  0.460   0.5072   0.0000    1.000   0.96    0.01  T_HTML_COMMENT_RATIO_70_80_MIME_HTML_ONLY
  0.330   0.3644   0.0000    1.000   0.96    0.01  T_HTML_COMMENT_RATIO_80_90_MIME_HTML_ONLY
  0.477   0.5184   0.0725    0.877   0.65    0.01  T_HTML_COMMENT_RATIO_70_80___MIME_HTML
  0.477   0.5184   0.0725    0.877   0.65    0.01  T_HTML_COMMENT_RATIO_70_80_HTML_MESSAGE
  0.351   0.3794   0.0725    0.840   0.57    0.01  T_HTML_COMMENT_RATIO_80_90_HTML_MESSAGE
  0.344   0.3719   0.0725    0.837   0.56    0.01  T_HTML_COMMENT_RATIO_80_90___MIME_HTML
  1.991   2.1000   0.9424    0.690   0.32    0.01  T_HTML_COMMENT_BLANK
  0.000   0.0000   0.0000    0.500   0.12    0.01  T_HTML_COMMENT_RATIO_90_100_MIME_HTML_ONLY
  0.000   0.0000   0.0000    0.500   0.12    0.01  T_HTML_COMMENT_RATIO_90_100___MIME_HTML
  0.000   0.0000   0.0000    0.500   0.12    0.01  T_HTML_COMMENT_RATIO_90_100_HTML_MESSAGE
  1.014   1.0106   1.0511    0.490   0.11    0.01  T_HTML_COMMENT_RATIO_30_40_MIME_HTML_ONLY
  5.484   5.2707   7.5390    0.411   0.07    0.01  T_HTML_COMMENT_RATIO_00_10_MIME_HTML_ONLY
 10.181   9.3317  18.3762    0.337   0.05    0.01  T_HTML_COMMENT_RATIO_00_10___MIME_HTML
 10.290   9.4181  18.7024    0.335   0.05    0.01  T_HTML_COMMENT_RATIO_00_10_HTML_MESSAGE
  0.310   0.2893   0.5074    0.363   0.05    0.01  T_HTML_COMMENT_RATIO_60_70_MIME_HTML_ONLY
  2.481   2.2841   4.3856    0.342   0.04    0.01  T_HTML_COMMENT_NO_ALPHANUM
  0.327   0.3005   0.5799    0.341   0.04    0.01  T_HTML_COMMENT_RATIO_60_70_HTML_MESSAGE
  0.327   0.3005   0.5799    0.341   0.04    0.01  T_HTML_COMMENT_RATIO_60_70___MIME_HTML
  2.607   2.3705   4.8931    0.326   0.04    0.01  T_HTML_COMMENT_RATIO_10_20_MIME_HTML_ONLY
  0.807   0.7363   1.4860    0.331   0.04    0.01  T_HTML_COMMENT_RATIO_20_30_MIME_HTML_ONLY
  3.469   3.0392   7.6115    0.285   0.03    0.01  T_HTML_COMMENT_RATIO_10_20_HTML_MESSAGE
  3.213   2.7725   7.4665    0.271   0.02    0.01  T_HTML_COMMENT_RATIO_10_20___MIME_HTML
  1.256   1.0857   2.8996    0.272   0.02    0.01  T_HTML_COMMENT_RATIO_30_40___MIME_HTML
  1.256   1.0857   2.8996    0.272   0.02    0.01  T_HTML_COMMENT_RATIO_30_40_HTML_MESSAGE
  1.008   0.8077   2.9358    0.216   0.01    0.01  T_HTML_COMMENT_RATIO_20_30___MIME_HTML
  1.014   0.8115   2.9721    0.214   0.01    0.01  T_HTML_COMMENT_RATIO_20_30_HTML_MESSAGE
  0.425   0.3343   1.3048    0.204   0.01    0.01  T_HTML_COMMENT_RATIO_40_50_MIME_HTML_ONLY
  0.432   0.3268   1.4498    0.184   0.01    0.01  T_HTML_COMMENT_RATIO_50_60___MIME_HTML
  0.432   0.3268   1.4498    0.184   0.01    0.01  T_HTML_COMMENT_RATIO_50_60_HTML_MESSAGE
  0.470   0.3531   1.5948    0.181   0.01    0.01  T_HTML_COMMENT_RATIO_40_50_HTML_MESSAGE
  0.466   0.3494   1.5948    0.180   0.01    0.01  T_HTML_COMMENT_RATIO_40_50___MIME_HTML
  0.391   0.2930   1.3411    0.179   0.01    0.01  T_HTML_COMMENT_RATIO_50_60_MIME_HTML_ONLY

It looks like 70 and above are usable.  The average rank for 70 and above
is higher for the MIME_HTML_ONLY versions, so I would be okay using it.
It looks safe to use __MIME_HTML_ONLY, though, so I'd suggest that just
in case someone manages to successfully forge a hotmail message.

The S/O ratio is also so low for the lower end of the range that we might
as well leave all of these rules in and see if any are usable by the GA
as compensation rules (we don't have to explicitly tag stuff as nice, do we?)

Dan
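The proposal amounts to a meta test: fire only when the message is HTML-only and its comment ratio is 70% or more. Here is a hedged Python sketch of that logic (in SpamAssassin itself this would be a meta rule in the rules file, not Python), with the MessageFlags type and its fields as hypothetical stand-ins for the results of the underlying __MIME_HTML_ONLY and comment-ratio sub-tests.

```python
from dataclasses import dataclass


@dataclass
class MessageFlags:
    # Hypothetical per-message results of the sub-tests.
    mime_html_only: bool   # stand-in for __MIME_HTML_ONLY
    comment_ratio: float   # comment bytes / non-URI bytes


def comment_ratio_meta(flags: MessageFlags, threshold: float = 0.7) -> bool:
    """AND the HTML-only test with a 'ratio >= threshold' range test."""
    return flags.mime_html_only and flags.comment_ratio >= threshold


msg = MessageFlags(mime_html_only=True, comment_ratio=0.82)
print(comment_ratio_meta(msg))  # prints: True
```

Keeping the threshold as a parameter makes it easy to compare the 70% cut-off against the 80% and 90% ranges in the table above.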
Comment 17 Daniel Quinlan 2003-03-05 21:15:37 UTC
Let me try that table again:

OVERALL%   SPAM% NONSPAM%      S/O   RANK   SCORE  NAME
  0.460   0.5072   0.0000    1.000   0.96    0.01  T_HTML_COMMENT_RATIO_70_80_MIME_HTML_ONLY
  0.330   0.3644   0.0000    1.000   0.96    0.01  T_HTML_COMMENT_RATIO_80_90_MIME_HTML_ONLY
  0.477   0.5184   0.0725    0.877   0.65    0.01  T_HTML_COMMENT_RATIO_70_80___MIME_HTML
  0.477   0.5184   0.0725    0.877   0.65    0.01  T_HTML_COMMENT_RATIO_70_80_HTML_MESSAGE
  0.351   0.3794   0.0725    0.840   0.57    0.01  T_HTML_COMMENT_RATIO_80_90_HTML_MESSAGE
  0.344   0.3719   0.0725    0.837   0.56    0.01  T_HTML_COMMENT_RATIO_80_90___MIME_HTML
  1.991   2.1000   0.9424    0.690   0.32    0.01  T_HTML_COMMENT_BLANK
  0.000   0.0000   0.0000    0.500   0.12    0.01  T_HTML_COMMENT_RATIO_90_100_MIME_HTML_ONLY
  0.000   0.0000   0.0000    0.500   0.12    0.01  T_HTML_COMMENT_RATIO_90_100___MIME_HTML
  0.000   0.0000   0.0000    0.500   0.12    0.01  T_HTML_COMMENT_RATIO_90_100_HTML_MESSAGE
  1.014   1.0106   1.0511    0.490   0.11    0.01  T_HTML_COMMENT_RATIO_30_40_MIME_HTML_ONLY
  5.484   5.2707   7.5390    0.411   0.07    0.01  T_HTML_COMMENT_RATIO_00_10_MIME_HTML_ONLY
 10.181   9.3317  18.3762    0.337   0.05    0.01  T_HTML_COMMENT_RATIO_00_10___MIME_HTML
 10.290   9.4181  18.7024    0.335   0.05    0.01  T_HTML_COMMENT_RATIO_00_10_HTML_MESSAGE
  0.310   0.2893   0.5074    0.363   0.05    0.01  T_HTML_COMMENT_RATIO_60_70_MIME_HTML_ONLY
  2.481   2.2841   4.3856    0.342   0.04    0.01  T_HTML_COMMENT_NO_ALPHANUM
  0.327   0.3005   0.5799    0.341   0.04    0.01  T_HTML_COMMENT_RATIO_60_70_HTML_MESSAGE
  0.327   0.3005   0.5799    0.341   0.04    0.01  T_HTML_COMMENT_RATIO_60_70___MIME_HTML
  2.607   2.3705   4.8931    0.326   0.04    0.01  T_HTML_COMMENT_RATIO_10_20_MIME_HTML_ONLY
  0.807   0.7363   1.4860    0.331   0.04    0.01  T_HTML_COMMENT_RATIO_20_30_MIME_HTML_ONLY
  3.469   3.0392   7.6115    0.285   0.03    0.01  T_HTML_COMMENT_RATIO_10_20_HTML_MESSAGE
  3.213   2.7725   7.4665    0.271   0.02    0.01  T_HTML_COMMENT_RATIO_10_20___MIME_HTML
  1.256   1.0857   2.8996    0.272   0.02    0.01  T_HTML_COMMENT_RATIO_30_40___MIME_HTML
  1.256   1.0857   2.8996    0.272   0.02    0.01  T_HTML_COMMENT_RATIO_30_40_HTML_MESSAGE
  1.008   0.8077   2.9358    0.216   0.01    0.01  T_HTML_COMMENT_RATIO_20_30___MIME_HTML
  1.014   0.8115   2.9721    0.214   0.01    0.01  T_HTML_COMMENT_RATIO_20_30_HTML_MESSAGE
  0.425   0.3343   1.3048    0.204   0.01    0.01  T_HTML_COMMENT_RATIO_40_50_MIME_HTML_ONLY
  0.432   0.3268   1.4498    0.184   0.01    0.01  T_HTML_COMMENT_RATIO_50_60___MIME_HTML
  0.432   0.3268   1.4498    0.184   0.01    0.01  T_HTML_COMMENT_RATIO_50_60_HTML_MESSAGE
  0.470   0.3531   1.5948    0.181   0.01    0.01  T_HTML_COMMENT_RATIO_40_50_HTML_MESSAGE
  0.466   0.3494   1.5948    0.180   0.01    0.01  T_HTML_COMMENT_RATIO_40_50___MIME_HTML
  0.391   0.2930   1.3411    0.179   0.01    0.01  T_HTML_COMMENT_RATIO_50_60_MIME_HTML_ONLY
Comment 18 Theo Van Dinter 2003-03-06 06:30:20 UTC
Subject: Re: [SAdev]  Excessive Commenting in code

On Wed, Mar 05, 2003 at 09:14:47PM -0800, bugzilla-daemon@hughes-family.org wrote:
> The S/O ratio is also so low for the lower end of the range that we might
> as well leave all of these rules in and see if any are usable by the GA
> as compensation rules (we don't have to explicitly tag stuff as nice, do we?)

In the current GA code, yes, we would have to.  Any rule not marked as
"nice" is forced to have a >= 0 score.

I'd first like to pick what set of those rules we want to use, then leave
in the whole set for further testing (right now we only have 3 people
total doing nightly runs, so the results are telling but not conclusive.)
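The constraint Theo describes can be modelled as a clamp on each candidate score set: any rule not tagged as nice cannot go below zero, so a rule can only act as a compensation rule if it is explicitly marked nice. This Python sketch is a hypothetical model of that constraint, not the GA's actual code; constrain_scores and the example scores are invented for illustration.

```python
from typing import Dict, Set


def constrain_scores(scores: Dict[str, float], nice_rules: Set[str]) -> Dict[str, float]:
    """Clamp every non-nice rule's score to >= 0; nice rules keep negative scores."""
    return {
        name: score if name in nice_rules else max(score, 0.0)
        for name, score in scores.items()
    }


# Invented candidate scores, only to show the effect of the nice tag.
candidate = {"RULE_A": -0.4, "RULE_B": 1.2}
print(constrain_scores(candidate, set()))       # prints: {'RULE_A': 0.0, 'RULE_B': 1.2}
print(constrain_scores(candidate, {"RULE_A"}))  # prints: {'RULE_A': -0.4, 'RULE_B': 1.2}
```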

Comment 19 Theo Van Dinter 2003-05-19 12:06:53 UTC
there is now a version checked into 2.60:

OVERALL%   SPAM% NONSPAM%      S/O   RANK   SCORE  NAME
  0.266   0.8842   0.0061    0.993   0.94    1.00  HTML_COMMENT_RATIO

which works out pretty well I think. :)