SA Bugzilla – Bug 8218
HTML URIs with linebreaks not parsed with Content-Transfer-Encoding: quoted-printable
Last modified: 2024-03-15 07:30:18 UTC
It appears PerMsgStatus.pm will not parse URIs that contain a line break when the message uses Content-Transfer-Encoding: quoted-printable. For example, this blob:

<A style=3D"FONT-SIZE: 16px; TEXT-DECORATION: none; FONT-FAMILY: Helvetica,=
sans-serif; BACKGROUND: rgb(16,163,127) 0% 50%; FONT-WEIGHT: 400; COLOR: wh=
ite; PADDING-BOTTOM: 11px; PADDING-TOP: 12px; PADDING-LEFT: 20px; MARGIN: 0=
px; LINE-HEIGHT: 24px; PADDING-RIGHT: 20px" href="http://hashbltest.s=
urbl.org/example_uri">Verify email address</A>

will return the following entry from get_uri_detail_list:

$VAR1 = {
          'types' => {
                       'schemeless' => 1,
                       'parsed' => 1,
                       'unlinked' => 1
                     },
          'cleaned' => [
                         'http://urbl.org/example_uri'
                       ],
          'hosts' => {
                       'urbl.org' => 'urbl.org'
                     },
          'domains' => {
                         'urbl.org' => 1
                       }
        };

rather than the entire URI (http://hashbltest.surbl.org/example_uri), as we would expect.
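For reference, the decoding the report expects can be sketched outside SpamAssassin. This is only an illustration in Python using the standard quopri and html.parser modules, not SpamAssassin's own code path; RAW_BODY is a shortened copy of the snippet above, and HrefCollector and extract_hrefs are hypothetical helper names. A trailing "=" is a quoted-printable soft line break, so decoding should rejoin "hashbltest.s" and "urbl.org" before the HTML is parsed.

```python
import quopri
from html.parser import HTMLParser

# Shortened version of the reported snippet (an assumption for brevity).
# Each "=" at end of line is a quoted-printable soft line break; the last
# one splits the host name in the middle.
RAW_BODY = (
    b'<A style=3D"FONT-SIZE: 16px; MARGIN: 0=\n'
    b'px; LINE-HEIGHT: 24px; PADDING-RIGHT: 20px" href="http://hashbltest.s=\n'
    b'urbl.org/example_uri">Verify email address</A>\n'
)


class HrefCollector(HTMLParser):
    """Collects href attribute values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def extract_hrefs(qp_body: bytes) -> list:
    # Decode quoted-printable first: soft line breaks are removed and
    # "=3D" becomes "=". Only then hand the text to the HTML parser.
    decoded = quopri.decodestring(qp_body).decode("ascii")
    collector = HrefCollector()
    collector.feed(decoded)
    collector.close()
    return collector.hrefs


print(extract_hrefs(RAW_BODY))
```

If the soft line breaks are handled before HTML parsing, the only href found should be the full http://hashbltest.surbl.org/example_uri.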
Created attachment 5942
Test email

Can you attach a full test email that demonstrates the problem? I made your snippet into an email with quoted-printable and ran

./spamassassin -t -D uri,message < uriwraptest.eml

Did I miss something? I am testing using current trunk. Here is the debug output showing the correct URL being parsed:

Mar 15 20:23:08.229 [99779] dbg: message: _decode_header date: Thu, 2 May 2002 00:02:49 +1200
Mar 15 20:23:08.229 [99779] dbg: message: _decode_header subject: foo
Mar 15 20:23:08.229 [99779] dbg: message: _decode_header to: <bar@example.org>
Mar 15 20:23:08.229 [99779] dbg: message: _decode_header from: <baz@example.com>
Mar 15 20:23:08.229 [99779] dbg: message: _decode_header message-id: <INTM-6516584-3669405-2002.08.01-16.21.51--f@example.com>
Mar 15 20:23:08.229 [99779] dbg: message: _decode_header mime-version: 1.0
Mar 15 20:23:08.229 [99779] dbg: message: _decode_header content-type: text/html; charset=US-ASCII
Mar 15 20:23:08.229 [99779] dbg: message: _decode_header content-transfer-encoding: quoted-printable
Mar 15 20:23:08.230 [99779] dbg: message: main message type: text/html
Mar 15 20:23:08.235 [99779] dbg: message: ---- MIME PARSER START ----
Mar 15 20:23:08.235 [99779] dbg: message: parsing normal part
Mar 15 20:23:08.235 [99779] dbg: message: storing a body to memory
Mar 15 20:23:08.235 [99779] dbg: message: ---- MIME PARSER END ----
Mar 15 20:23:08.235 [99779] dbg: message: decoding quoted-printable
Mar 15 20:23:08.235 [99779] dbg: message: contains only US-ASCII characters, declared US-ASCII, not decoding
Mar 15 20:23:08.235 [99779] dbg: message: HTML::Parser utf8_mode off (default, assumed Unicode characters)
Mar 15 20:23:08.236 [99779] dbg: message: spaces (octets) in HTML: 3 out of 21, chars!?
Mar 15 20:23:08.242 [99779] dbg: uri: canonicalizing html uri: http://hashbltest.surbl.org/example_uri
Mar 15 20:23:08.242 [99779] dbg: uri: cleaned uri: http://hashbltest.surbl.org/example_uri
Mar 15 20:23:08.242 [99779] dbg: uri: added host: hashbltest.surbl.org domain: surbl.org
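For anyone who wants to retry this, one way to build a comparable test message is sketched below in Python. The header values are copied from the debug output above and the body is the snippet from the original report. The actual contents of attachment 5942 are not shown in this thread, so the exact layout (header order, body with no surrounding markup) is an assumption, and build_test_message is a hypothetical helper name.

```python
from email import message_from_string

# Header values as they appear in the debug output above.
HEADERS = [
    ("Date", "Thu, 2 May 2002 00:02:49 +1200"),
    ("Subject", "foo"),
    ("To", "<bar@example.org>"),
    ("From", "<baz@example.com>"),
    ("Message-ID", "<INTM-6516584-3669405-2002.08.01-16.21.51--f@example.com>"),
    ("MIME-Version", "1.0"),
    ("Content-Type", "text/html; charset=US-ASCII"),
    ("Content-Transfer-Encoding", "quoted-printable"),
]

# Body lines from the original report; a trailing "=" is a soft line break.
QP_BODY_LINES = [
    '<A style=3D"FONT-SIZE: 16px; TEXT-DECORATION: none; FONT-FAMILY: Helvetica,=',
    'sans-serif; BACKGROUND: rgb(16,163,127) 0% 50%; FONT-WEIGHT: 400; COLOR: wh=',
    'ite; PADDING-BOTTOM: 11px; PADDING-TOP: 12px; PADDING-LEFT: 20px; MARGIN: 0=',
    'px; LINE-HEIGHT: 24px; PADDING-RIGHT: 20px" href="http://hashbltest.s=',
    'urbl.org/example_uri">Verify email address</A>',
]


def build_test_message() -> str:
    """Return the raw message text: headers, blank line, QP-encoded body."""
    header_block = "\n".join(f"{name}: {value}" for name, value in HEADERS)
    return header_block + "\n\n" + "\n".join(QP_BODY_LINES) + "\n"


if __name__ == "__main__":
    # Redirect stdout to a file (e.g. uriwraptest.eml) to get a test message.
    print(build_test_message(), end="")
```

Saving the output as uriwraptest.eml and running the spamassassin command quoted above should then show whether the uri debug lines report the full host. Parsing the text back with message_from_string and calling get_payload(decode=True) is a quick sanity check that the body decodes to the intended href.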