Bug 5349 - scanning w/ v320-trunk shows diff/missing header displays in FuzzyOCR test output
Status: RESOLVED INVALID
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: Other other
Importance: P5 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2007-02-22 16:22 UTC by snowcrash+apache
Modified: 2012-01-17 15:20 UTC

Description snowcrash+apache 2007-02-22 16:22:18 UTC
i can't get any discussion abt _whether_ this *is* a bug, so i'll simply file it
as one, and decide-by-discussion here ...

testing,

 spamassassin --version
   SpamAssassin version 3.2.0-pre1-r499012
     running on Perl version 5.8.8

& using with,

 FuzzyOCR 3.5.1

plugin.

two test cases,

  (1) spamassassin @ cmd_line
  (2) sent/recd email

show similar behavior of missing/truncated fuzzyocr headers, only in the case of
v320/trunk; v318/trunk is ok, and does not display this behavior.


case (1):

test with,

 spamassassin -D -t -x < /usr/ports/FuzzyOcr/samples/ocr-animated.eml

in 'verbose' fuzzyocr.log,

 ...
 2007-02-22 14:07:35 [6252] Found: 1 images
 2007-02-22 14:07:35 [6252] Found GIF header name="CIMG0980.gif"
 2007-02-22 14:07:36 [6252] Image is interlaced or animated...
 2007-02-22 14:07:36 [6252] File contains <7> images, deanimating...
 2007-02-22 14:07:37 [6252] Calculating image hash for: /tmp/.spamassassin6252Qdn9h3tmp/CIMG0980.gif.pnm
 2007-02-22 14:07:37 [6252] Updating Exact info File:'CIMG0980.gif' Type:'image/gif'
 2007-02-22 14:07:37 [6252] Found Score <15.500> for Exact Image Hash
 2007-02-22 14:07:37 [6252] Matched [1] time(s). Prev match:  15 min. 40 sec. ago
 2007-02-22 14:07:37 [6252] Message is SPAM. Words found:
             "investor" in 1 lines
             "price" in 2 lines
             "company" in 1 lines
             "alert" in 1 lines
             "valium" in 1 lines
             "trade" in 1 lines
             "banking" in 1 lines
             "news" in 1 lines
             (13.5 word occurrences found)

 %

but, at console, i _only_ see,

 ...
 Content analysis details:   (43.7 points, 5.0 required)

  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  0.1 RDNS_NONE              Delivered to trusted network by a host with no rDNS
  4.5 HELO_LOCALHOST         HELO_LOCALHOST
  0.5 FH_MSGID_01C67         Special MSGID
  2.3 CTYPE_001C_A           CTYPE_001C_A
  1.7 OUTLOOK_3416           Claims to be sent by an unusual build of Outlook (3416)
  0.0 DK_POLICY_SIGNSOME     Domain Keys: policy says domain signs some mails
  3.3 DATE_IN_FUTURE_12_24   Date: is 12 to 24 hours after Received: date
  5.0 BOTNET                 Relay might be a spambot or virusbot
               [botnet0.7,ip=58.186.156.15,nordns]
  0.0 DKIM_POLICY_SIGNSOME   Domain Keys Identified Mail: policy says domain
               signs some mails
  0.0 BOTNET_NORDNS          Relay's IP address has no PTR record
               [botnet_nordns,ip=58.186.156.15]
  0.0 HTML_MESSAGE           BODY: HTML included in message
  1.9 TVD_VIS_HIDDEN         RAW: TVD_VIS_HIDDEN
  1.8 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
               above 50%
               [cf: 100]
  0.5 RAZOR2_CHECK           Listed in Razor2 (http://razor.sf.net/)
  1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level
               above 50%
               [cf: 100]
  0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
               [cf: 100]
  1.4 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
  0.0 DIGEST_MULTIPLE        Message hits more than one network digest check
  3.6 XMAILER_MIMEOLE_OL_465CD XMAILER_MIMEOLE_OL_465CD
  1.9 HDR_ORDER_FTSDMCXX_001C Header order similar to spam (FTSDMCXX/MID
               variant)
  0.7 SHORT_HELO_AND_INLINE_IMAGE Short HELO string, with inline image
   11 FUZZY_OCR              BODY:
 %


NOTE, there's NO detail to the FUZZY_OCR header output.
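
A quick way to spot this symptom in saved console output is to scan the "Content analysis details" table for rows whose description is empty once the area tag is removed. The following is a minimal sketch, not part of SpamAssassin or FuzzyOCR; the function name `rules_without_detail` is hypothetical, and the list of area tags (BODY, RAW, FULL) is assumed from the output shown above.

```python
import re

# One rule row of the table: score, rule name, then the description text.
ROW = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\s*(.*)$")

# Area tags seen in the report above (assumption: this list is not exhaustive).
AREA = re.compile(r"^(BODY|RAW|FULL):\s*")


def rules_without_detail(report_text):
    """Return names of rules whose description is empty after the area tag."""
    empty = []
    for line in report_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headings, separators and continuation lines
        name, desc = m.group(2), m.group(3)
        if AREA.sub("", desc).strip() == "":
            empty.append(name)
    return empty
```

Fed the table above, this should report only FUZZY_OCR, since every other row carries some description text.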


case (2):

test, with a 'sent/recd' email, rather than just a file test @ cmd_line

similarly, with this image,

       http://img181.imageshack.us/img181/2156/spamsc2.gif

attached to an otherwise blank email, on receipt, i see in "FuzzyOCR.log",

 2007-02-22 14:22:57 [27803] Processing Message with ID "<1172182945.27063@spamassassin_spamd_init>" (ignore@compiling.spamassassin.taint.org -> <no receipients>)
 2007-02-22 14:25:10 [6298] Processing Message with ID "<45EC358A.4080104@gmail.com>" (SnowCrash <schneecrash+spamassassin@gmail.com> -> "SnowCrash" <snowcrash@mydomain.com>)
 2007-02-22 14:25:10 [6298] GIF: [320x512] spam.gif (10195)
 2007-02-22 14:25:10 [6298] Found: 1 images
 2007-02-22 14:25:10 [6298] Found GIF header name="spam.gif"
 2007-02-22 14:25:11 [6298] Image is single non-interlaced...
 2007-02-22 14:25:12 [6298] Calculating image hash for: /tmp/.spamassassin6298Zhf5nItmp/spam.gif.pnm
 2007-02-22 14:25:12 [6298] Scanset Order: ocrad(0) ocrad-invert(0) ocrad-decolorize-invert(0) ocrad-decolorize(0) gocr(0) gocr-180(0)
 2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "target" with fuzz of 0.0000
     line: "target s"
 2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "investor" with fuzz of 0.2500
     line: " fhe lncreasing inrest receilled br th liile gotwtg"
 2007-02-22 14:25:14 [6298] Scanset "ocrad" found word "breaking" with fuzz of 0.2500
     line: " fhe lncreasing inrest receilled br th liile gotwtg"
 2007-02-22 14:25:22 [6298] Scanset "ocrad-decolorize" found word "target" with fuzz of 0.0000
     line: "target s"
 2007-02-22 14:25:22 [6298] Scanset "ocrad-decolorize" found word "investor" with fuzz of 0.2500
     line: " fhe lncreasing inrest receilled br th liile gotwtg"
 2007-02-22 14:25:23 [6298] Scanset "ocrad-decolorize" found word "breaking" with fuzz of 0.2500
     line: " fhe lncreasing inrest receilled br th liile gotwtg"
 2007-02-22 14:25:23 [6298] Scanset "gocr" found word "erectile" with fuzz of 0.2500
     line: " e increasln ingrest receiled hr j lirg ne  t u t  "
 2007-02-22 14:25:23 [6298] Scanset "gocr" found word "target" with fuzz of 0.0000
     line: "target "
 2007-02-22 14:25:24 [6298] Scanset "gocr" found word "erectile" with fuzz of 0.2500
     line: "eincreaslningrestreceiledhrjlirgnetut"
 2007-02-22 14:25:24 [6298] Scanset "gocr" found word "buy" with fuzz of 0.0000
     line: "momemnsborqbuy"
 2007-02-22 14:25:24 [6298] Scanset "gocr" found word "target" with fuzz of 0.0000
     line: "target"
 2007-02-22 14:25:25 [6298] Scanset "gocr-180" found word "target" with fuzz of 0.0000
     line: "target "
 2007-02-22 14:25:26 [6298] Scanset "gocr-180" found word "buy" with fuzz of 0.0000
     line: "momemnsborqbuy"
 2007-02-22 14:25:26 [6298] Scanset "gocr-180" found word "target" with fuzz of 0.0000
     line: "target"
 2007-02-22 14:25:26 [6298] Message is spam, score = 9.500
 2007-02-22 14:25:26 [6298] Adding Hash to "/var/mail/spamassassin/local/FuzzyOcr.db" with score "9.500"
 2007-02-22 14:25:26 [6298] Words found:
             "erectile" in 1 lines
             "target" in 1 lines
             "erectile" in 1 lines
             "buy" in 1 lines
             "target" in 1 lines
             (7.5 word occurrences found)
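
To confirm that the plugin itself produced the detail that later goes missing, the log excerpt above can be summarized per scanset. This is a rough sketch that assumes the log line format is exactly as shown here; `hits_by_scanset` is a hypothetical helper, not something shipped with FuzzyOCR.

```python
import re
from collections import defaultdict

# \s+ also matches newlines, so log entries that were hard-wrapped
# when pasted still match.
HIT = re.compile(
    r'Scanset\s+"([^"]+)"\s+found\s+word\s+"([^"]+)"\s+with\s+fuzz\s+of\s+(\d+\.\d+)'
)


def hits_by_scanset(log_text):
    """Map each scanset name to the (word, fuzz) pairs it logged, in order."""
    hits = defaultdict(list)
    for scanset, word, fuzz in HIT.findall(log_text):
        hits[scanset].append((word, float(fuzz)))
    return dict(hits)
```

A non-empty result for a message whose FUZZY_OCR report line is bare suggests the detail is lost between the plugin and the report, not in the OCR step.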


in the rec'd message's header, i see only,

 ...
 X-Spam-Report:
   *  0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
   *  0.0 DK_POLICY_SIGNSOME Domain Keys: policy says domain signs some mails
   *  0.0 DKIM_POLICY_SIGNSOME Domain Keys Identified Mail: policy says domain
   *       signs some mails
   *  0.0 DK_SIGNED Domain Keys: message has a signature
   *  0.0 DKIM_SIGNED Domain Keys Identified Mail: message has a signature
   *  1.0 DC_IMG_TEXT_RATIO BODY: Low body to pixel area ratio
   *  0.0 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
   *      [score: 0.0002]
   *  2.2 TVD_SPACE_RATIO BODY: TVD_SPACE_RATIO
   *  1.2 SARE_GIF_ATTACH FULL: Email has a inline gif
   *  9.5 FUZZY_OCR BODY:
 ...


*again*, with no header 'detail' for the FUZZY_OCR BODY header :-/
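
For comparing what two SpamAssassin versions put into the header, it can help to fold the X-Spam-Report continuation lines back onto their rule entries first. The sketch below assumes the header layout shown above (a `*` prefix, then score, rule name and description, with bare `*` lines continuing the previous entry); `fold_report` is a hypothetical name.

```python
import re

# A rule entry: "*", score, rule name, then the description text.
ENTRY = re.compile(r"^\s*\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\s*(.*)$")
# A continuation line: "*" followed by more text for the previous rule.
CONT = re.compile(r"^\s*\*\s+(.*)$")


def fold_report(header_text):
    """Return [(score, rule, description)] with continuation lines folded in."""
    entries = []
    for line in header_text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append([float(m.group(1)), m.group(2), m.group(3).strip()])
            continue
        c = CONT.match(line)
        if c and entries:
            # Append the continuation text to the most recent rule entry.
            entries[-1][2] = (entries[-1][2] + " " + c.group(1).strip()).strip()
    return [tuple(e) for e in entries]
```

Running this over the header from each version and comparing the FUZZY_OCR tuples would show exactly which text is present under one version and absent under the other.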


since i'm seeing the same 'missing header' biz on both,

  (1) rec'd email proc'd via spamd running on my mailserver
  (2) test file submitted to spamassassin via cmd line,

and, differing behavior for sa v318 & v320, with the same version of
FuzzyOCR, i suspect this is a SA-related issue.
Comment 1 Theo Van Dinter 2007-02-22 16:28:38 UTC
Since you're complaining about a third party plugin, we have no way to debug the
issue.  I'd talk to the author and have them do some testing.  If there's an SA
bug that can be demonstrated, the ticket can be reopened. :)
Comment 2 snowcrash+apache 2007-02-22 17:07:31 UTC
the issue occurs in the presence of sa v320, but not sa v318.  looks like an SA
issue to me.

as far as "complaining?"

you might want to reconsider *asking* people to provide feedback, file bugs, ask
on the list, etc if you consider this "complaining".  you had an opportunity to
participate/comment in irc channel and on the list ... you chose not to.

*now* you complain that i'm complaining?

fix it yourself, if you care.
Comment 3 snowcrash+apache 2007-02-22 17:17:55 UTC
.
Comment 4 snowcrash+apache 2007-02-22 17:18:26 UTC
.
Comment 5 Justin Mason 2007-05-06 05:34:14 UTC
so this is still an issue with released 3.2.0, I hear.  If someone from FuzzyOCR
could post the code they're using to generate multi-line reports, we may be able
to help...
Comment 6 Kevin A. McGrail 2012-01-17 15:20:54 UTC
This is a 3rd party plugin and per http://fuzzyocr.own-hero.net/, it is no longer maintained:

This project is UNMAINTAINED as of 2009-06-01. Use it at your own risk. If you want to fork this project, drop me a note (decoder[at]own-hero.net).