Bug 2101 - HTML parser is run up to 4 times per message
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: SVN Trunk (Latest Devel Version)
Hardware: Other All
Importance: P5 normal
Target Milestone: 2.70
Assignee: SpamAssassin Developer Mailing List
Depends on: 1527
Reported: 2003-06-19 19:11 UTC by Daniel Quinlan
Modified: 2004-01-24 11:48 UTC
Description Daniel Quinlan 2003-06-19 19:11:24 UTC
This is not a good thing since it probably affects some of the HTML
tests (changing results depending on how many times it's called and when
and with which options turned on or off).

Worse, it slows stuff down.

1. get_content_preview() calls get_decoded_stripped_body_text_array() which
   renders the entire HTML document again.  This is probably the worst of the
   lot since it doesn't even need the entire HTML document.  We could just
   save off a relatively small hunk of text the first time we run
   get_decoded_stripped_body_text_array().

2. Bayes also calls get_decoded_stripped_body_text_array()

3. PerMsgStatus also has another repeat of the call.  I think this one has
   been here longer. I don't remember HTML::Parser running more than
   once per message back when I put it in originally, but that was a while
   ago, so maybe I've forgotten.
Comment 1 Antony Mawer 2003-06-19 19:52:48 UTC
Subject: Re: [SAdev]  New: HTML parser is run up to 4 times per message 


> 1. get_content_preview() calls get_decoded_stripped_body_text_array() which
>    renders the entire HTML document again.  This is probably the worst of the
>    lot since it doesn't even need the entire HTML document.  We could just
>    save off a relatively small hunk of text the first time we run
>    get_decoded_stripped_body_text_array().

true.

> 2. Bayes also calls get_decoded_stripped_body_text_array()

BTW are the results of this not cached?  They used to be, and they should
be.

--j.

Comment 2 Daniel Quinlan 2003-06-19 19:55:47 UTC
Subject: Re:  HTML parser is run up to 4 times per message

> BTW are the results of this not cached?  They used to be, and they
> should be.

Well, there's only one return statement and it's at the end of the
function.

Comment 3 Duncan Findlay 2003-06-19 20:06:01 UTC
Subject: Re: [SAdev]  HTML parser is run up to 4 times per message

> Well, there's only one return statement and it's at the end of the
> function.

Alright... so we just need to cache it :-)
Comment 4 Theo Van Dinter 2004-01-24 20:48:16 UTC
This is all fixed in 2.70: HTML is only rendered once, and content preview gets a cached
version of the rendered output.
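
For readers who want to see the shape of the fix discussed above, here is a minimal sketch of "render once, reuse the cached output". It is written in Python purely for illustration and is not SpamAssassin's code. The class name `MessageSketch`, the `render_count` counter, the `PREVIEW_CHARS` value, and the naive tag-stripping regex are all assumptions; only the two method names are borrowed from the discussion.

```python
import re


class MessageSketch:
    """Hypothetical per-message object; not SpamAssassin's actual API."""

    PREVIEW_CHARS = 200  # assumed preview length, chosen for illustration

    def __init__(self, raw_html):
        self.raw_html = raw_html
        self.render_count = 0   # counts how often the expensive step runs
        self._rendered = None   # cached rendered text lines, filled on first use

    def _render_html(self):
        # Stand-in for the expensive HTML parse: strip tags naively and
        # collapse whitespace. A real renderer does far more than this.
        self.render_count += 1
        text = re.sub(r"<[^>]+>", " ", self.raw_html)
        return [" ".join(text.split())]

    def get_decoded_stripped_body_text_array(self):
        # Return early from the cache so repeat callers never re-render.
        if self._rendered is None:
            self._rendered = self._render_html()
        return self._rendered

    def get_content_preview(self):
        # The preview reads the cached rendering instead of parsing again.
        text = " ".join(self.get_decoded_stripped_body_text_array())
        return text[: self.PREVIEW_CHARS]
```

With this layout, any number of callers (the preview, a Bayes-style tokenizer, the main scan) can ask for the rendered text, and `render_count` stays at 1 for the message.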