SA Bugzilla – Bug 5349
scanning w/ v320-trunk shows diff/missing header displays in FuzzyOCR test output
Last modified: 2012-01-17 15:20:54 UTC
i can't get any discussion abt _whether_ this *is* a bug, so i'll simply file it as one, and decide-by-discussion here ...

testing,

  spamassassin --version
    SpamAssassin version 3.2.0-pre1-r499012
      running on Perl version 5.8.8

& using with,

  FuzzyOCR 3.5.1 plugin.

two test cases,

  (1) spamassassin @ cmd_line
  (2) sent/recd email

show similar behavior of missing/truncated fuzzyocr headers, only in the case of v320/trunk; v318/trunk is ok, and does not display this behavior.

case (1): test with,

  spamassassin -D -t -x < /usr/ports/FuzzyOcr/samples/ocr-animated.eml

in 'verbose' fuzzyocr.log,

  ...
  2007-02-22 14:07:35 [6252] Found: 1 images
  2007-02-22 14:07:35 [6252] Found GIF header name="CIMG0980.gif"
  2007-02-22 14:07:36 [6252] Image is interlaced or animated...
  2007-02-22 14:07:36 [6252] File contains <7> images, deanimating...
  2007-02-22 14:07:37 [6252] Calculating image hash for: /tmp/.spamassassin6252Qdn9h3tmp/CIMG0980.gif.pnm
  2007-02-22 14:07:37 [6252] Updating Exact info File:'CIMG0980.gif' Type:'image/gif'
  2007-02-22 14:07:37 [6252] Found Score <15.500> for Exact Image Hash
  2007-02-22 14:07:37 [6252] Matched [1] time(s). Prev match: 15 min. 40 sec. ago
  2007-02-22 14:07:37 [6252] Message is SPAM. Words found:
  "investor" in 1 lines
  "price" in 2 lines
  "company" in 1 lines
  "alert" in 1 lines
  "valium" in 1 lines
  "trade" in 1 lines
  "banking" in 1 lines
  "news" in 1 lines
  (13.5 word occurrences found)
  %

but, at console, i _only_ see, ...
  Content analysis details:   (43.7 points, 5.0 required)

   pts rule name              description
  ---- ---------------------- --------------------------------------------------
   0.1 RDNS_NONE              Delivered to trusted network by a host with no rDNS
   4.5 HELO_LOCALHOST         HELO_LOCALHOST
   0.5 FH_MSGID_01C67         Special MSGID
   2.3 CTYPE_001C_A           CTYPE_001C_A
   1.7 OUTLOOK_3416           Claims to be sent by an unusual build of Outlook (3416)
   0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
   3.3 DATE_IN_FUTURE_12_24   Date: is 12 to 24 hours after Received: date
   5.0 BOTNET                 Relay might be a spambot or virusbot
                              [botnet0.7,ip=58.186.156.15,nordns]
   0.0 DKIM_POLICY_SIGNSOME   Domain Keys Identified Mail: policy says domain signs some mails
   0.0 BOTNET_NORDNS          Relay's IP address has no PTR record
                              [botnet_nordns,ip=58.186.156.15]
   0.0 HTML_MESSAGE           BODY: HTML included in message
   1.9 TVD_VIS_HIDDEN         RAW: TVD_VIS_HIDDEN
   1.8 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
   1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level above 50%
                              [cf: 100]
   0.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
   1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level above 50%
                              [cf: 100]
   0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
                              [cf: 100]
   1.4 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
   0.0 DIGEST_MULTIPLE        Message hits more than one network digest check
   3.6 XMAILER_MIMEOLE_OL_465CD XMAILER_MIMEOLE_OL_465CD
   1.9 HDR_ORDER_FTSDMCXX_001C Header order similar to spam (FTSDMCXX/MID variant)
   0.7 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
    11 FUZZY_OCR              BODY:
  %

NOTE, there's NO detail to the FUZZY_OCR header output.
case (2): test, with a 'sent/recd' email, rather than just a file test @ cmd_line

similarly, with this image,

  http://img181.imageshack.us/img181/2156/spamsc2.gif

attached to an otherwise blank email, on receipt, i see in "FuzzyOCR.log",

  2007-02-22 14:22:57 [27803] Processing Message with ID "<1172182945.27063@spamassassin_spamd_init>" (ignore@compiling.spamassassin.taint.org -> <no receipients>)
  2007-02-22 14:25:10 [6298] Processing Message with ID "<45EC358A.4080104@gmail.com>" (SnowCrash <schneecrash+spamassassin@gmail.com> -> "SnowCrash" <snowcrash@mydomain.com>)
  2007-02-22 14:25:10 [6298] GIF: [320x512] spam.gif (10195)
  2007-02-22 14:25:10 [6298] Found: 1 images
  2007-02-22 14:25:10 [6298] Found GIF header name="spam.gif"
  2007-02-22 14:25:11 [6298] Image is single non-interlaced...
  2007-02-22 14:25:12 [6298] Calculating image hash for: /tmp/.spamassassin6298Zhf5nItmp/spam.gif.pnm
  2007-02-22 14:25:12 [6298] Scanset Order: ocrad(0) ocrad-invert(0) ocrad-decolorize-invert(0) ocrad-decolorize(0) gocr(0) gocr-180(0)
  2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "target" with fuzz of 0.0000 line: "target s"
  2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "investor" with fuzz of 0.2500 line: " fhe lncreasing inrest receilled br th liile gotwtg"
  2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "breaking" with fuzz of 0.2500 line: " fhe lncreasing inrest receilled br th liile gotwtg"
  2007-02-22 14:25:22 [6298] Scanset "ocrad-decolorize" found word "target" with fuzz of 0.0000 line: "target s"
  2007-02-22 14:25:22 [6298] Scanset "ocrad-decolorize" found word "investor" with fuzz of 0.2500 line: " fhe lncreasing inrest receilled br th liile gotwtg"
  2007-02-22 14:25:23 [6298] Scanset "ocrad-decolorize" found word "breaking" with fuzz of 0.2500 line: " fhe lncreasing inrest receilled br th liile gotwtg"
  2007-02-22 14:25:23 [6298] Scanset "gocr" found word "erectile" with fuzz of 0.2500 line: " e increasln ingrest receiled hr j lirg ne t u t "
  2007-02-22 14:25:23 [6298] Scanset "gocr" found word "target" with fuzz of 0.0000 line: "target "
  2007-02-22 14:25:24 [6298] Scanset "gocr" found word "erectile" with fuzz of 0.2500 line: "eincreaslningrestreceiledhrjlirgnetut"
  2007-02-22 14:25:24 [6298] Scanset "gocr" found word "buy" with fuzz of 0.0000 line: "momemnsborqbuy"
  2007-02-22 14:25:24 [6298] Scanset "gocr" found word "target" with fuzz of 0.0000 line: "target"
  2007-02-22 14:25:25 [6298] Scanset "gocr-180" found word "target" with fuzz of 0.0000 line: "target "
  2007-02-22 14:25:26 [6298] Scanset "gocr-180" found word "buy" with fuzz of 0.0000 line: "momemnsborqbuy"
  2007-02-22 14:25:26 [6298] Scanset "gocr-180" found word "target" with fuzz of 0.0000 line: "target"
  2007-02-22 14:25:26 [6298] Message is spam, score = 9.500
  2007-02-22 14:25:26 [6298] Adding Hash to "/var/mail/spamassassin/local/FuzzyOcr.db" with score "9.500"
  2007-02-22 14:25:26 [6298] Words found:
  "erectile" in 1 lines
  "target" in 1 lines
  "erectile" in 1 lines
  "buy" in 1 lines
  "target" in 1 lines
  (7.5 word occurrences found)

in the rec'd message's header, i see only,

  ...
  X-Spam-Report:
    *  0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
    *  0.0 DK_POLICY_SIGNSOME Domain Keys: policy says domain signs some mails
    *  0.0 DKIM_POLICY_SIGNSOME Domain Keys Identified Mail: policy says domain
    *      signs some mails
    *  0.0 DK_SIGNED Domain Keys: message has a signature
    *  0.0 DKIM_SIGNED Domain Keys Identified Mail: message has a signature
    *  1.0 DC_IMG_TEXT_RATIO BODY: Low body to pixel area ratio
    *  0.0 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
    *      [score: 0.0002]
    *  2.2 TVD_SPACE_RATIO BODY: TVD_SPACE_RATIO
    *  1.2 SARE_GIF_ATTACH FULL: Email has a inline gif
    *  9.5 FUZZY_OCR BODY:
  ...
*again*, with no header 'detail' for the FUZZY_OCR BODY header :-/

since i'm seeing the same 'missing header' biz on both,

  (1) rec'd email proc'd via spamd running on my mailserver
  (2) test file submitted to spamassassin via cmd line,

and, differing behavior for sa v318 & v320, with the same version of FuzzyOCR, i suspect this is a SA-related issue.
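For anyone who wants to check their own output for the same symptom, here is a minimal sketch in Python. It is not part of SpamAssassin or FuzzyOCR; the function name `rules_without_detail` and the regular expressions are assumptions made for illustration. It scans a report (either the console "Content analysis details" table or the body of an X-Spam-Report header) and lists the rules whose description is empty once an area prefix such as "BODY:" is removed. Lines that do not look like a rule line are treated as continuation lines of the previous rule, which is only a heuristic.

```python
import re

# Heuristic: a rule line is "<score> <RULE_NAME> <description>", optionally
# preceded by the "*" used inside an X-Spam-Report header.
RULE_LINE = re.compile(r"^\s*(?:\*\s*)?(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]+)\b\s*(.*)$")

# Area prefixes seen in the reports above (BODY:, RAW:, FULL:), plus two
# assumed ones (URI:, HEADER:).
AREA_PREFIX = re.compile(r"^(?:BODY|RAW|FULL|URI|HEADER):\s*")


def rules_without_detail(report_text):
    """Return the names of rules that carry no description text at all."""
    entries = []  # each entry is [rule_name, [description parts]]
    for line in report_text.splitlines():
        match = RULE_LINE.match(line)
        if match:
            entries.append([match.group(2), [match.group(3)]])
        elif entries:
            # Anything else is assumed to continue the previous rule.
            entries[-1][1].append(line.strip().lstrip("*").strip())

    missing = []
    for name, parts in entries:
        text = AREA_PREFIX.sub("", " ".join(parts).strip()).strip()
        if not text:
            missing.append(name)
    return missing


# A shortened copy of the X-Spam-Report lines from case (2).
sample = """\
* 1.0 DC_IMG_TEXT_RATIO BODY: Low body to pixel area ratio
* 0.0 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
*      [score: 0.0002]
* 9.5 FUZZY_OCR BODY:
"""

print(rules_without_detail(sample))
```

Running the same check against v318 and v320 output for the same message would show whether the empty FUZZY_OCR description appears only under v320.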
Since you're complaining about a third party plugin, we have no way to debug the issue. I'd talk to the author and have them do some testing. If there's a SA bug that can be demonstrated, the ticket can be reopened. :)
the issue occurs in the presence of sa v320, but not sa v318. looks like an SA issue to me.

as far as "complaining"? you might want to reconsider *asking* people to provide feedback, file bugs, ask on the list, etc. if you consider this "complaining". you had an opportunity to participate/comment in the irc channel and on the list ... you chose not to. *now* you complain that i'm complaining?

fix it yourself, if you care.
.
so this is still an issue with released 3.2.0, I hear. If someone from FuzzyOCR could post the code they're using to generate multi-line reports, we may be able to help...
This is a 3rd party plugin and per http://fuzzyocr.own-hero.net/, it is no longer maintained:

  "This project is UNMAINTAINED as of 2009-06-01. Use it at your own risk. If you want to fork this project, drop me a note (decoder[at]own-hero.net)."