SA Bugzilla – Bug 5763
[review] Problem with invisible context extraction - whitespace chars dropped
Last modified: 2011-05-23 17:51:15 UTC
There seems to be a problem with text extracted into the "invisible" context: whitespace characters are dropped. During HTML parsing (HTML.pm), "whitespace" is always treated as visible. On the other hand, in the display_text() API, the trailing whitespace of the previous element is trimmed when the current element is whitespace, and the leading whitespace of the current element is trimmed when the previous element is whitespace. So when invisible text is extracted, it contains no whitespace, because either the trailing or the leading whitespace has already been trimmed.
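To make the mechanism concrete, here is a minimal Python sketch of the trimming described in this report. It is a simplified model and not the Perl code in HTML.pm: the element dictionaries, the function signature, and the invisible_text() helper are assumptions made for illustration only.

```python
import re

def display_text(elements, text, whitespace=False, invisible=False):
    """Append one element, trimming as described in the report (model only)."""
    if whitespace:
        # Current element is whitespace: trim one trailing space from the
        # previous element, regardless of whether it is invisible.
        if elements and not elements[-1]["whitespace"]:
            if elements[-1]["text"].endswith(" "):
                elements[-1]["text"] = elements[-1]["text"][:-1]
    else:
        # Collapse whitespace runs to a single space.
        text = re.sub(r"[ \t\n\r\f\v]+", " ", text)
        # Previous element was whitespace: trim one leading space.
        if elements and elements[-1]["whitespace"] and text.startswith(" "):
            text = text[1:]
    elements.append({"text": text, "whitespace": whitespace, "invisible": invisible})

def invisible_text(elements):
    """Join only the elements flagged invisible (hypothetical helper)."""
    return "".join(e["text"] for e in elements if e["invisible"])

elements = []
display_text(elements, "hidden one ", invisible=True)
display_text(elements, "\n", whitespace=True)   # always recorded as visible
display_text(elements, " hidden two", invisible=True)

result = invisible_text(elements)
print(repr(result))  # 'hidden onehidden two'
```

The newline element is recorded as visible whitespace, so it never reaches the invisible context, and by then both neighbouring spaces have been trimmed.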
Created attachment 4226 [details] Test msg with invisible text
Created attachment 4227 [details] the invisible text extracted
Note how, in the invisible context, some lines are juxtaposed directly together without any whitespace between them, even though there is whitespace (e.g. newline chars and in some cases spaces) in the original message.
Created attachment 4228 [details] Proposed change to SpamAssassin HTML.pm
In the display_text() API, do not trim the trailing whitespace of the last element if the last element is invisible text; do not trim the leading whitespace of the current element if the current element is invisible.
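A minimal Python sketch of the rule proposed here, assuming a simplified list of element dictionaries. This is an illustrative model and not the attached patch itself; the function signature and the invisible_text() helper are hypothetical.

```python
import re

def display_text(elements, text, whitespace=False, invisible=False):
    """Model of the proposed rule: leave whitespace around invisible text alone."""
    if whitespace:
        prev = elements[-1] if elements else None
        # Trim the previous element's trailing space only if it is
        # neither whitespace nor invisible.
        if prev and not prev["whitespace"] and not prev["invisible"]:
            if prev["text"].endswith(" "):
                prev["text"] = prev["text"][:-1]
    else:
        # Collapse whitespace runs to a single space.
        text = re.sub(r"[ \t\n\r\f\v]+", " ", text)
        prev_is_ws = bool(elements) and elements[-1]["whitespace"]
        # Trim the leading space only if the current element is visible.
        if prev_is_ws and not invisible and text.startswith(" "):
            text = text[1:]
    elements.append({"text": text, "whitespace": whitespace, "invisible": invisible})

def invisible_text(elements):
    """Join only the elements flagged invisible (hypothetical helper)."""
    return "".join(e["text"] for e in elements if e["invisible"])

elements = []
display_text(elements, "hidden one ", invisible=True)
display_text(elements, "\n", whitespace=True)
display_text(elements, "hidden two", invisible=True)
display_text(elements, "\n", whitespace=True)
display_text(elements, " hidden three", invisible=True)

result = invisible_text(elements)
print(repr(result))  # 'hidden one hidden two hidden three'
```

Visible elements are still trimmed exactly as before in this model; only elements flagged invisible keep their surrounding spaces.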
thanks for the report and fix! at first glance, it looks good; aiming at 3.2.5.
checked into trunk:

: jm 419...; svn commit -m "bug 5763: whitespace characters are dropped from 'invisible' text sections. fix, thanks to Yanyan Yang" lib/Mail/SpamAssassin/HTML.pm
Sending lib/Mail/SpamAssassin/HTML.pm
Transmitting file data .
Committed revision 610204.

+1 for application to 3.2.5.
+1
moving to 3.2.6 so that we can release a 3.2.5
This got committed to trunk when 3.2.5 was current, so it should be in 3.3, and should be closed, right? 3 years since last update.
Close. Yup, 3.3 was branched from trunk January 21 2010, two years after this was committed to trunk, so this should be closed. http://svn.apache.org/viewvc/spamassassin/branches/3.3/?view=log
closing, fixed in 3.3.0