Bug 6149 - False positives for ISO-2022-JP (Japanese)
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Score Generation
Version: 3.2.5
Hardware: All All
Importance: P2 normal
Target Milestone: 3.3.0
Assignee: SpamAssassin Developer Mailing List
Reported: 2009-07-07 08:38 UTC by Thomas Maadie
Modified: 2009-08-18 13:13 UTC

Attachment | Type | Status | Submitter/CLA Status
Triggers PLING_QUERY,TVD_SPACE_RATIO,WEIRD_QUOTING | text/plain | None | Thomas Maadie [NoCLA]
Triggers GAPPY_SUBJECT,TVD_SPACE_RATIO | text/plain | None | Thomas Maadie [NoCLA]
Triggers PLING_QUERY,TVD_SPACE_RATIO | text/plain | None | Thomas Maadie [NoCLA]
Triggers OBSCURED_EMAIL,TVD_SPACE_RATIO,WEIRD_QUOTING | text/plain | None | Thomas Maadie [NoCLA]
Test: Put "Roadrunner" in Japanese katakana into the Subject. Encode in ISO-2022-JP before sending it... | text/plain | None | Thomas Maadie [NoCLA]
Test case 1, part 2: Using the word "Roadrunner" in Japanese katakana in the subject line, encoded in UTF-8. | text/plain | None | Thomas Maadie [NoCLA]
Test case 2: Using the word "Print" in Japanese katakana in the subject line, encoded in ISO-2022-JP. | text/plain | None | Thomas Maadie [NoCLA]
Test case 3: Using the word "Amazon" in double wide (2-byte) characters in the Subject line, encoded in ISO-2022-JP. | text/plain | None | Thomas Maadie [NoCLA]
Test case 4: Japanese Subject line test for GAPPY_SUBJECT, encoded in ISO-2022-JP. | text/plain | None | Thomas Maadie [NoCLA]
Test case 5: Japanese Subject line test for GAPPY_SUBJECT, encoded in ISO-2022-JP. | text/plain | None | Thomas Maadie [NoCLA]
Test case 6: Japanese Subject line test for GAPPY_SUBJECT, encoded in ISO-2022-JP. | text/plain | None | Thomas Maadie [NoCLA]
Test case 7: Japanese Subject line test for PLING_QUERY, encoded in ISO-2022-JP. | text/plain | None | Thomas Maadie [NoCLA]
Test case 8: Japanese Subject line test for PLING_QUERY, encoded in ISO-2022-JP. | text/plain | None | Thomas Maadie [NoCLA]
Test case 9: Japanese Subject line test for PLING_QUERY, encoded in ISO-2022-JP. | text/plain | None | Thomas Maadie [NoCLA]

Description Thomas Maadie 2009-07-07 08:38:12 UTC
A high rate of false positives occurs for ISO-2022-JP encoded e-mails.  This is a very critical problem for all Japanese users.

As examples, merely having any one of the following 3 legitimate lines of text in the Subject will trigger PLING_QUERY (2.2) and TVD_SPACE_RATIO (2.9), flagging the e-mail as SPAM:

【麻生内閣メールマガジン 第36号】安心・活力・責任(2009/06/25)

コンピュータサービスの件

[物件内見]メールフォームからメールが送信されました

In the e-mail's Body, certain legitimate text may also trigger WEIRD_QUOTING (2.8), OBSCURED_EMAIL (1.9), or other conditions.  Examples will be supplied upon request.

Such false positives DO NOT OCCUR for utf-8 encoding using the exact same text.

(This problem may be due to double-byte-character processing.  If so, other double-byte-character sets should be checked for the same problem, such as Big5 in Chinese.)
Comment 1 Justin Mason 2009-07-07 08:47:03 UTC
hi.  please attach plenty of false positives; the more we have, the more likely this can be fixed.
Comment 2 Thomas Maadie 2009-07-07 09:06:05 UTC
Created attachment 4478 [details]
Triggers PLING_QUERY,TVD_SPACE_RATIO,WEIRD_QUOTING

Test case 1:  Triggers PLING_QUERY,TVD_SPACE_RATIO,WEIRD_QUOTING

Use ISO-2022-JP encoding.

Use attached text as Body.

Use the following line as Subject:  
【麻生内閣メールマガジン 第36号】安心・活力・責任(2009/06/25)
Comment 3 Thomas Maadie 2009-07-07 09:19:44 UTC
Created attachment 4479 [details]
Triggers GAPPY_SUBJECT,TVD_SPACE_RATIO

Test Case 2:  Triggers GAPPY_SUBJECT,TVD_SPACE_RATIO

Use ISO-2022-JP encoding.

Use attachment as Body.

Use the following subject line:  
ロードランナー
Comment 4 Thomas Maadie 2009-07-07 09:35:05 UTC
Created attachment 4480 [details]
Triggers PLING_QUERY,TVD_SPACE_RATIO

Test Case 3:  Triggers PLING_QUERY,TVD_SPACE_RATIO

Use ISO-2022-JP encoding.

Use attachment as body. (This is the same text as in "Test Case 2".)

Use the following line as Subject:
[物件内見]メールフォームからメールが送信されました
Comment 5 Thomas Maadie 2009-07-07 10:04:28 UTC
Created attachment 4481 [details]
Triggers OBSCURED_EMAIL,TVD_SPACE_RATIO,WEIRD_QUOTING

Test case 4:  Triggers OBSCURED_EMAIL,TVD_SPACE_RATIO,WEIRD_QUOTING

Use ISO-2022-JP encoding

Use attached text as body

Use the following line as Subject:
【麻生内閣メールマガジン 第37号】国家・国民の安全と繁栄を目指して(2009/07/02)
Comment 6 Thomas Maadie 2009-07-07 10:22:42 UTC
(In reply to comment #1)
> hi.  please attach plenty of false positives; the more we have, the more likely
> this can be fixed.

I'm done supplying attachments for the e-mail Body (Cases 1 thru 4).  I believe WEIRD_QUOTING and OBSCURED_EMAIL have to do with the Body only.

The other 3 conditions (PLING_QUERY, GAPPY_SUBJECT, and TVD_SPACE_RATIO) can probably be tested using the Subject line only, with an empty Body.
Comment 7 Thomas Maadie 2009-07-07 13:07:57 UTC
Please do the easy testing first.  And here they are...
Here are examples in the SUBJECT LINE which trigger GAPPY_SUBJECT,TVD_SPACE_RATIO...

ロードランナー
プリント
Amazon
X-cart テンプレート
FW: web のFTP情報変更とご連絡願い
Fwd: FW: コンドミニアム2410?2の矢野 8月の宿泊予約について
RE: Corporation  のIPアドレス、Subnet 、Gateway、DNS

Please put any one of the above 7 examples in your subject line.  And please test them using ISO-2022-JP encoding, and then use utf-8 encoding for comparison.

If the above qualifies as "G.a.p.p.y" in the Subject line, where are the offending spaces or characters in-between?  These are false positives.
Comment 8 Thomas Maadie 2009-07-07 20:54:29 UTC
I notice that most of the false positives for ISO-2022-JP occur in the header.  I presume that SpamAssassin cannot correctly parse Japanese headers encoded in ISO-2022-JP.  Is this so?

May I suggest that you use MIME decoders (and encoders) specific to ISO-2022-JP, before parsing them?

For decoding ISO-2022-JP header text, you should consider using &mimedecode($string,$kout) in mimer.pl (below)...
http://www.cc.rim.or.jp/~ikuta/mime_pls/mimer.pl

For encoding ISO-2022-JP header text, you should consider using &mimeencode($string) in mimew.pl (below)...
http://www.cc.rim.or.jp/~ikuta/mime_pls/mimew.pl

You can download these by clicking mime_pls202.tgz or mime_pls202.rat (under "Ver.2.02") here...
http://www.cc.rim.or.jp/~ikuta/mime_pls/download.html

If you need to contact the author, Mr. Noboru Ikuta, for permission or acknowledgement, you can contact him at noboru@ikuta.ichihara.chiba.jp .

Let me know if you need translation when contacting Mr. Ikuta, or if I should contact him directly in Japanese.
Comment 9 Justin Mason 2009-07-15 14:01:26 UTC
will look at this for 3.3.0
Comment 10 Justin Mason 2009-07-23 04:45:13 UTC
hi -- sorry for the delay, but these samples are not usable; they are not full email messages.  We need RFC-822 format messages: http://wiki.apache.org/spamassassin/Rfc822Format

in particular, you will need to ensure the raw encoding format is used in the headers and body, without any decoding from 7-bit ASCII.  we can't use the decoded form of ISO-2022-JP data.
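As an illustration of what "raw" means here, the following is a minimal Python sketch that builds a message whose Subject stays in its RFC 2047 encoded form, so the whole text is 7-bit ASCII. `build_raw_sample` is a hypothetical helper and the addresses are placeholders; a message exported unchanged from a real mailbox is still the better sample, since this sketch omits headers (such as Date) that a real message would carry.

```python
from email.header import Header


def build_raw_sample(subject, charset="iso-2022-jp"):
    """Return a minimal RFC-822 style message with an encoded Subject.

    Hypothetical helper for illustration only; addresses are placeholders.
    """
    # Header.encode() produces an RFC 2047 encoded-word, which is plain ASCII.
    encoded_subject = Header(subject, charset).encode()
    lines = [
        "From: sender@example.com",
        "To: recipient@example.com",
        "Subject: " + encoded_subject,
        "",  # blank line separates the headers from the (empty) body
        "",
    ]
    return "\r\n".join(lines)


# Build the same Subject in both encodings for a side-by-side comparison.
for cs in ("iso-2022-jp", "utf-8"):
    print(build_raw_sample("ロードランナー", cs))
```

Saving each result to its own file gives a pair of samples that differ only in the Subject encoding.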
Comment 11 Justin Mason 2009-07-23 04:45:54 UTC
will re-add to milestone when we get usable samples
Comment 12 Thomas Maadie 2009-07-23 12:09:20 UTC
(In reply to comment #10)
> hi -- sorry for the delay, but these samples are not usable; they are not full
> email messages.  We need RFC-822 format messages:
> http://wiki.apache.org/spamassassin/Rfc822Format
> 
> in particular, you will need to ensure the raw encoding format is used in the
> headers and body, without any decoding from 7-bit ASCII.  we can't use the
> decoded form of ISO-2022-JP data.

Hello Justin,

I appreciate your looking into this.

In addition to the ISO-2022-JP encoding, do you also
need the utf-8 encoded version for comparison?
You'll see different SpamAssassin scores,
depending on the encoding I use.

I'll use Outlook Express 6 and/or Thunderbird 2.0 to create
the raw format, per your linked instructions.
(I don't know if my Japanese version of Eudora 6.2 will do the job,
but I'll try that also.)

I'll work on it over the weekend.

Thomas Maadie
Comment 13 Justin Mason 2009-07-23 13:07:49 UTC
(In reply to comment #12)
> I'll use Outlook Express 6 and/or Thunderbird 2.0 to create
> the raw format, per your linked instructions.
> (I don't know if my Japanese version of Eudora 6.2 will do the job,
> but I'll try that also.)

actually, Eudora 6.2 can almost definitely get a decent copy of the original messages more easily than Outlook Express can. look for some way of getting access to the "mbox" format of the folder, or ask on the users mailing list and I'm sure someone there will know how to do it...
Comment 14 Warren Togami 2009-08-03 05:22:33 UTC
I split my two Japanese users into their own masscheck named wt-japanese.  Unfortunately I have only ~1000 messages in this corpus, but it is showing definite signs of some rules being bad.  These rules seem to have very low Correctly Spam ratios across all corpora.

http://ruleqa.spamassassin.org/20090801-r799815-n/TVD_SPACE_RATIO/detail
26.7% FP rate
http://ruleqa.spamassassin.org/20090801-r799815-n/PLING_QUERY/detail
11.5% FP rate
http://ruleqa.spamassassin.org/20090802-r800007-n/OBSCURED_EMAIL/detail
6.5% FP rate
http://ruleqa.spamassassin.org/20090801-r799815-n/WEIRD_QUOTING/detail
4.8% FP rate
http://ruleqa.spamassassin.org/20090801-r799815-n/GAPPY_SUBJECT/detail
4.7% FP rate

These users insist that they have confirmed their Ham boxes manually.  I would like to split out folders containing specific rule hits and ask them to choose a few to submit as samples, but mboxget is misbehaving.  It would be nice if mboxget could output in mbox format.
Comment 15 Warren Togami 2009-08-03 05:25:02 UTC
Is there any way to easily detect if other wt-japanese rule FP ratios are differing substantially from the other corpora?
Comment 16 Karsten Bräckelmann 2009-08-03 08:14:30 UTC
At least the TVD_SPACE_RATIO and GAPPY_SUBJECT are known to FP on Japanese (and Chinese IIRC), bugs filed.

(In reply to comment #15)
> Is there any way to easily detect if other wt-japanese rule FP ratios are
> differing substantially from the other corpora?

You're using mbox format, so a hack like this should get you some nice hit-rates at least. Anything close to the top with a positive score is a candidate.

formail -c -x X-Spam-Status -s < MBOX | \
  sed -re 's/^.+tests=(.+) autolearn.+/\1/' -e 's/[, \t]+/\n/g' | \
  sort | uniq -c | sort -r -n

Hope this helps.
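For readers without formail, roughly the same tally can be sketched in Python. This is an illustration, not a drop-in replacement: `count_rule_hits` is a hypothetical helper, the sample header values are invented, and it assumes the `tests=... autolearn=...` layout that the sed expression above relies on.

```python
import re
from collections import Counter

# Capture everything between "tests=" and " autolearn", across folded lines.
TESTS_RE = re.compile(r"tests=(.*?)\s+autolearn", re.DOTALL)


def count_rule_hits(status_headers):
    """Count how often each rule name appears in X-Spam-Status values."""
    counts = Counter()
    for value in status_headers:
        match = TESTS_RE.search(value)
        if not match:
            continue  # header without a tests= list
        # Rule names are separated by commas and/or folding whitespace.
        counts.update(r for r in re.split(r"[,\s]+", match.group(1)) if r)
    return counts


# Invented sample values; with a real mbox the list could be built with:
#   import mailbox
#   headers = [m.get("X-Spam-Status", "") for m in mailbox.mbox("MBOX")]
headers = [
    "No, score=4.9 required=5.0 tests=GAPPY_SUBJECT,\n\tTVD_SPACE_RATIO autolearn=no version=3.2.5",
    "Yes, score=5.1 required=5.0 tests=PLING_QUERY,TVD_SPACE_RATIO autolearn=no version=3.2.5",
]
for rule, hits in count_rule_hits(headers).most_common():
    print(hits, rule)
```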
Comment 17 Warren Togami 2009-08-03 08:28:04 UTC
http://ruleqa.spamassassin.org/
No, I meant scores in the corpora as compared to the average from other contributors.
Comment 18 Karsten Bräckelmann 2009-08-03 09:06:52 UTC
Well, just run it. Have a look at the top rules and apply some common sense. It should be dead easy to spot common FPs in there.

If in doubt, you can check back with ruleqa. But that hack *will* show you heavy hitters on your ham corpus.
Comment 19 Warren Togami 2009-08-03 09:09:01 UTC
Not exactly, much of the hand sorted mail didn't go through spamassassin at all.
Comment 20 Karsten Bräckelmann 2009-08-03 09:26:11 UTC
Ah. In that case, having a look at similar reports gathered from your mass-check logs probably should do. A fully automated check of your local hits against the ones on ruleqa probably can't be done easily.

Anyway, could we please move such discussions to the dev list? :)
Comment 21 Thomas Maadie 2009-08-03 12:20:33 UTC
Created attachment 4491 [details]
Test:  Put "Roadrunner" in Japanese katakana into the Subject.  Encode in ISO-2022-JP before sending it...

Uploading new tests for GAPPY_SUBJECT.

Test case 1: Gappy-Roadrunner-I.txt

Using the word "Roadrunner" in Japanese katakana in the subject line, encoded in ISO-2022-JP.  No body.

Please look at the "Subject:" line only, because that's all that matters for this test.

Encoded in ISO-2022-JP...
Subject: =?ISO-2022-JP?B?GyRCJW0hPCVJJWklcyVKITwbKEI=?=

Encoded in UTF-8...
Subject: =?UTF-8?B?44Ot44O844OJ44Op44Oz44OK44O8?=

The ISO-2022-JP one triggers GAPPY_SUBJECT & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
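To see why the two encodings can behave differently, it may help to look at the bytes behind each Subject. The sketch below uses Python's standard `email.header` module to base64-decode both encoded-words; `raw_and_text` is a hypothetical helper name. It only shows what the bytes are and makes no claim about how SpamAssassin processes them internally.

```python
from email.header import decode_header

ISO_SUBJECT = "=?ISO-2022-JP?B?GyRCJW0hPCVJJWklcyVKITwbKEI=?="
UTF8_SUBJECT = "=?UTF-8?B?44Ot44O844OJ44Op44Oz44OK44O8?="


def raw_and_text(encoded_word):
    """Return (payload bytes after base64 decoding, decoded Unicode text)."""
    payload, charset = decode_header(encoded_word)[0]
    return payload, payload.decode(charset)


iso_raw, iso_text = raw_and_text(ISO_SUBJECT)
utf8_raw, utf8_text = raw_and_text(UTF8_SUBJECT)

print(iso_raw)   # escape sequences wrapped around printable ASCII bytes
print(utf8_raw)  # every byte is 0x80 or above
print(iso_text == utf8_text)
```

The ISO-2022-JP payload is made of 7-bit bytes such as `%`, `!` and `<` mixed with letters, so a pattern applied to the undecoded bytes sees what looks like punctuation-separated ASCII. The UTF-8 payload contains no ASCII bytes at all.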
Comment 22 Thomas Maadie 2009-08-03 12:24:48 UTC
Created attachment 4492 [details]
Test case 1, part 2:  Using the word "Roadrunner" in Japanese katakana in the subject line, encoded in UTF-8. 

Test case 1, part 2: Gappy-Roadrunner-U.txt

Using the word "Roadrunner" in Japanese katakana in the subject line, encoded in UTF-8.  No body.

See the previous upload, Gappy-Roadrunner-I.txt, and compare the two.
Comment 23 Thomas Maadie 2009-08-03 12:37:22 UTC
Created attachment 4493 [details]
Test case 2:  Using the word "Print" in Japanese katakana in the subject line, encoded in ISO-2022-JP.

Case 2 tests for GAPPY_SUBJECT.  Please look at the "Subject:" line only, because that's all that matters for this test.

Encoded in ISO-2022-JP...
Subject: =?ISO-2022-JP?B?GyRCJVclaiVzJUgbKEI=?=

Encoded in UTF-8...
Subject: =?UTF-8?B?44OX44Oq44Oz44OI?=

The ISO-2022-JP one triggers GAPPY_SUBJECT & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
Comment 24 Thomas Maadie 2009-08-03 12:47:03 UTC
Created attachment 4494 [details]
Test case 3:  Using the word "Amazon" in double wide (2-byte) characters in the Subject line, encoded in ISO-2022-JP.

Case 3 also tests for GAPPY_SUBJECT.  Please look at the "Subject:" line only, because that's all that matters here.

Encoded in ISO-2022-JP...
Subject: =?ISO-2022-JP?B?GyRCI0EjbSNhI3ojbyNuGyhC?=

Encoded in UTF-8...
Subject: =?UTF-8?B?77yh772N772B772a772P772O?=

The ISO-2022-JP one triggers GAPPY_SUBJECT & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
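The "Amazon" case makes the likely mechanism easy to see. The sketch below base64-decodes both payloads and applies `GAPPY_LIKE`, a simplified stand-in pattern (four letters in a row, each followed by an ASCII punctuation byte). That pattern is an assumption for illustration and is not the actual GAPPY_SUBJECT definition. `payload` is a hypothetical helper.

```python
import base64
import re

# Simplified stand-in for a "gappy text" check. NOT the real rule definition:
# four letters in a row, each followed by one ASCII punctuation byte.
GAPPY_LIKE = re.compile(rb"(?:[A-Za-z][!-/:-@\[-`{-~]){4}")


def payload(encoded_word):
    """Base64-decode the payload of a '=?charset?B?...?=' encoded-word."""
    return base64.b64decode(encoded_word.split("?")[3])


iso = payload("=?ISO-2022-JP?B?GyRCI0EjbSNhI3ojbyNuGyhC?=")
utf8 = payload("=?UTF-8?B?77yh772N772B772a772P772O?=")

print(iso)                           # contains the bytes "#A#m#a#z#o#n"
print(bool(GAPPY_LIKE.search(iso)))
print(bool(GAPPY_LIKE.search(utf8)))
```

In the ISO-2022-JP bytes, each double-wide letter is a `#` followed by the matching ASCII letter, which is the same shape as deliberately gapped text. A check that runs on the decoded characters would not see those `#` bytes.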
Comment 25 Thomas Maadie 2009-08-03 18:50:27 UTC
Created attachment 4495 [details]
Test case 4:  Japanese Subject line test for GAPPY_SUBJECT, encoded in ISO-2022-JP.

Case 4 tests for GAPPY_SUBJECT.  See the "Subject:" line.

Subject line text reads "web no FTP johou hennkou to gorennraku-negai" in Japanese.  (Original Japanese text and translation will be supplied upon request).

Encoded in ISO-2022-JP...
Subject: FW: web =?ISO-2022-JP?B?GyRCJE4jRiNUI1A+cEpzSlE5OSRIJDRPIk1tGyhC?=
 =?ISO-2022-JP?B?GyRCNGokJBsoQg==?=

Encoded in UTF-8...
Subject: FW: web =?UTF-8?B?44Gu77ym77y077yw5oOF5aCx5aSJ5pu044Go44GU6YCj57Wh?=
 =?UTF-8?B?6aGY44GE?=

The ISO-2022-JP one triggers GAPPY_SUBJECT & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
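This Subject is folded across two lines and mixes plain ASCII with two encoded-words. As a sketch of how such a header can be turned back into readable text, Python's standard `email.header` functions can unfold and join the parts; `decode_subject` is a hypothetical helper name.

```python
from email.header import decode_header, make_header


def decode_subject(value):
    """Decode a possibly folded Subject value with RFC 2047 encoded-words."""
    # decode_header splits the value into (bytes, charset) parts;
    # make_header joins them back into a single Unicode string.
    return str(make_header(decode_header(value)))


ISO_SUBJECT = (
    "FW: web =?ISO-2022-JP?B?GyRCJE4jRiNUI1A+cEpzSlE5OSRIJDRPIk1tGyhC?=\n"
    " =?ISO-2022-JP?B?GyRCNGokJBsoQg==?="
)
UTF8_SUBJECT = (
    "FW: web =?UTF-8?B?44Gu77ym77y077yw5oOF5aCx5aSJ5pu044Go44GU6YCj57Wh?=\n"
    " =?UTF-8?B?6aGY44GE?="
)

iso_text = decode_subject(ISO_SUBJECT)
utf8_text = decode_subject(UTF8_SUBJECT)
print(iso_text)
print(utf8_text)
print(iso_text == utf8_text)
```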
Comment 26 Thomas Maadie 2009-08-03 18:57:26 UTC
Created attachment 4496 [details]
Test case 5:  Japanese Subject line test for GAPPY_SUBJECT, encoded in ISO-2022-JP.

Case 5 tests for GAPPY_SUBJECT.  See the "Subject:" line.

Subject line text reads "Fwd: FW: condominium 2410?2 no Yano 8-gatsu no shukuhaku yoyaku ni tsuite" in Japanese.  (Original Japanese text and translation will be supplied upon request).

Encoded in ISO-2022-JP...
Subject: Fwd: FW: =?ISO-2022-JP?B?GyRCJTMlcyVJJV8lSyUiJWAjMiM0IzEjMBsoQj8bJEIjMhsoQg==?=
 =?ISO-2022-JP?B?GyRCJE5McExuISEjODduJE49SUdxTT1McyRLJEQkJCRGGyhC?=

Encoded in UTF-8...
Subject: Fwd: FW: =?UTF-8?B?44Kz44Oz44OJ44Of44OL44Ki44Og77yS77yU77yR77yQPw==?=
 =?UTF-8?B?77yS44Gu55+i6YeO44CA77yY5pyI44Gu5a6/5rOK5LqI57SE44Gr44Gk44GE44Gm?=

The ISO-2022-JP one triggers GAPPY_SUBJECT & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
Comment 27 Thomas Maadie 2009-08-03 19:02:22 UTC
Created attachment 4497 [details]
Test case 6:  Japanese Subject line test for GAPPY_SUBJECT, encoded in ISO-2022-JP.

Case 6 tests for GAPPY_SUBJECT.  See the "Subject:" line.

Subject line text reads "Re: Corporation no IP address, Subnet, Gateway, DNS" in mixed English and Japanese.  (Original text and translation will be supplied upon request).

Encoded in ISO-2022-JP...
Subject: RE: Corporation  =?ISO-2022-JP?B?GyRCJE4jSSNQJSIlSSVsJTkhIhsoQlN1?=
 =?ISO-2022-JP?B?Ym5ldCAbJEIhIhsoQkdhdGV3YXkbJEIhIhsoQkROUw==?=

Encoded in UTF-8...
Subject: RE: Corporation  =?UTF-8?B?44Gu77yp77yw44Ki44OJ44Os44K544CBU3Vibg==?=
 =?UTF-8?B?ZXQg44CBR2F0ZXdheeOAgUROUw==?=

The ISO-2022-JP one triggers GAPPY_SUBJECT & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
Comment 28 Thomas Maadie 2009-08-03 19:09:17 UTC
Created attachment 4498 [details]
Test case 7:  Japanese Subject line test for PLING_QUERY, encoded in ISO-2022-JP.

Case 7 tests for PLING_QUERY.  See the "Subject:" line.

Subject line text reads "Computer service no ken" in Japanese.  (Original Japanese text and translation will be supplied upon request).

Encoded in ISO-2022-JP...
Subject: =?ISO-2022-JP?B?GyRCJTMlcyVUJWUhPCU/JTUhPCVTJTkkTjdvGyhC?=

Encoded in UTF-8...
Subject: =?UTF-8?B?44Kz44Oz44OU44Ol44O844K/44K144O844OT44K544Gu5Lu2?=

The ISO-2022-JP one triggers PLING_QUERY & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
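The rule name suggests a check for "!" (pling) and "?" (query) in the Subject. Assuming that reading, the sketch below shows that the base64-decoded ISO-2022-JP payload of this Subject happens to contain both bytes, while the UTF-8 payload contains neither. `has_pling_and_query` is a simplified stand-in, not the actual rule, and `payload` is a hypothetical helper.

```python
import base64


def payload(encoded_word):
    """Base64-decode the payload of a '=?charset?B?...?=' encoded-word."""
    return base64.b64decode(encoded_word.split("?")[3])


def has_pling_and_query(data):
    """Simplified stand-in check: both '!' and '?' occur in the bytes."""
    return b"!" in data and b"?" in data


iso = payload("=?ISO-2022-JP?B?GyRCJTMlcyVUJWUhPCU/JTUhPCVTJTkkTjdvGyhC?=")
utf8 = payload("=?UTF-8?B?44Kz44Oz44OU44Ol44O844K/44K144O844OT44K544Gu5Lu2?=")

print(iso)
print(has_pling_and_query(iso))
print(has_pling_and_query(utf8))
```

Here the second byte of the long-vowel mark is preceded by `!` and the second byte of one katakana character is `?`, so neither is punctuation in the decoded text.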
Comment 29 Thomas Maadie 2009-08-03 19:39:19 UTC
Created attachment 4499 [details]
Test case 8:  Japanese Subject line test for PLING_QUERY, encoded in ISO-2022-JP.

Case 8 tests for PLING_QUERY.  See the "Subject:" line.

Subject line text reads "Mail address soushinn no ken" in Japanese.  (Original Japanese text and translation will be supplied upon request).

Encoded in ISO-2022-JP...
Subject: =?ISO-2022-JP?B?GyRCJWEhPCVrJSIlSSVsJTkhIUF3Py4kTjdvGyhC?=

Encoded in UTF-8...
Subject: =?UTF-8?B?44Oh44O844Or44Ki44OJ44Os44K544CA6YCB5L+h44Gu5Lu2?=

The ISO-2022-JP one triggers PLING_QUERY & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
Comment 30 Thomas Maadie 2009-08-03 19:54:23 UTC
Created attachment 4500 [details]
Test case 9:  Japanese Subject line test for PLING_QUERY, encoded in ISO-2022-JP.

Case 9 tests for PLING_QUERY.  See the "Subject:" line.

Subject line text reads "Shin-shohin Tekoge Kakaku-minaoshi" in Japanese.  (Original Japanese text and translation will be supplied upon request).

Encoded in ISO-2022-JP...
Subject: Fw: =?ISO-2022-JP?B?GyRCPzc+JklKISE8aj5HJDIhITJBM0o4K0Q+JDcbKEI=?=

Encoded in UTF-8...
Subject: Fw: =?UTF-8?B?5paw5ZWG5ZOB44CA5omL54Sm44GS44CA5L6h5qC86KaL55u044GX?=

The ISO-2022-JP one triggers PLING_QUERY & TVD_SPACE_RATIO.
The UTF-8 one triggers TVD_SPACE_RATIO only.
Comment 32 Justin Mason 2009-08-18 13:02:43 UTC
(In reply to comment #30)
> Created an attachment (id=4500) [details]
> Test case 9:  Japanese Subject line test for PLING_QUERY, encoded in
> ISO-2022-JP.

thanks for all these samples Thomas, they helped a lot!  they're all now in the rule regression test suite too.

: 680...; svn commit -m "bug 6149: fix false positives on ISO-2022-JP mail for TVD_SPACE_RATIO, GAPPY_SUBJECT, PLING_QUERY reported by Thomas Maadie"
Sending        rules/20_body_tests.cf
Sending        rules/20_head_tests.cf
Adding         t.rules/GAPPY_SUBJECT
Adding         t.rules/GAPPY_SUBJECT/fp_bug6149_thomas_att4491_Gappy-Roadrunner-I
Adding         t.rules/GAPPY_SUBJECT/fp_bug6149_thomas_att4493
Adding         t.rules/GAPPY_SUBJECT/fp_bug6149_thomas_att4494
Adding         t.rules/GAPPY_SUBJECT/fp_bug6149_thomas_att4495
Adding         t.rules/GAPPY_SUBJECT/fp_bug6149_thomas_att4497
Adding         t.rules/PLING_QUERY
Adding         t.rules/PLING_QUERY/fp_bug6149_thomas_att4496
Adding         t.rules/PLING_QUERY/fp_bug6149_thomas_att4498
Adding         t.rules/PLING_QUERY/fp_bug6149_thomas_att4499
Adding         t.rules/PLING_QUERY/fp_bug6149_thomas_att4500
Adding         t.rules/TVD_SPACE_RATIO
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4491_Gappy-Roadrunner-I
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4492_Gappy-Roadrunner-U
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4493
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4494
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4495
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4496
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4497
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4498
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4499
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_att4500
Sending        t.rules/run
Transmitting file data ......................
Committed revision 805559.
Comment 33 Justin Mason 2009-08-18 13:13:23 UTC
and the remaining ones (as best I could reconstruct them):

: 716...; svn commit -m "bug 6149: fix more false positives on ISO-2022-JP mail for TVD_SPACE_RATIO reported by Thomas Maadie. add more samples to t.rules test suite for WEIRD_QUOTING, TVD_SPACE_RATIO, GAPPY_SUBJECT, OBSCURED_EMAIL to avoid regressions in future"
Sending        rules/20_body_tests.cf
Adding         t.rules/GAPPY_SUBJECT/fp_bug6149_thomas_4479
Adding         t.rules/OBSCURED_EMAIL
Adding         t.rules/OBSCURED_EMAIL/fp_bug6149_thomas_4481
Adding         t.rules/PLING_QUERY/fp_bug6149_thomas_4478
Adding         t.rules/PLING_QUERY/fp_bug6149_thomas_4480
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_4478
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_4479
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_4480
Adding         t.rules/TVD_SPACE_RATIO/fp_bug6149_thomas_4481
Adding         t.rules/WEIRD_QUOTING
Adding         t.rules/WEIRD_QUOTING/fp_bug6149_thomas_4478
Adding         t.rules/WEIRD_QUOTING/fp_bug6149_thomas_4481
Transmitting file data ...........
Committed revision 805563.


better samples for those first 4 attachments would be welcome, I suspect my reconstructions may have missed something.