Bug 8071 - get_uri_detail_list & missing links
Summary: get_uri_detail_list & missing links
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries
Version: 3.4.6
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 8024
Depends on:
Blocks:
 
Reported: 2022-11-09 12:55 UTC by Pascal
Modified: 2023-11-24 07:53 UTC
CC List: 7 users



Attachment Type Modified Status Actions Submitter/CLA Status
impacted message message/rfc822 None Pascal [NoCLA]

Description Pascal 2022-11-09 12:55:48 UTC
Some links are not identified when the HTML code contains conditional comment tags like these:

<!--[if mso]>
<!--[if !mso]><!-->
<!--[if gte mso 9]>
...
<!--<![endif]-->
<![endif]-->
Comment 1 Pascal 2022-11-09 13:37:39 UTC
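My workaround:

Before calling get_uri_detail_list, I remove these tags:

# Strip closing markers: <![endif]--> and <!--<![endif]-->
s/(?:<!--)?<!\[endif\]-->//g for @lines;
# Strip opening markers: <!--[if ...]> and <!--[if ...]><!-->
s/<!--\[if [()|!\w\s]+\]>(?:<!-->)?//g for @lines;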
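
The two substitutions above can be tried outside SpamAssassin with a small standalone script. This is only a sketch: strip_mso_conditionals is a hypothetical helper name (not part of SpamAssassin), the sample lines are made up, and example.com is a placeholder domain. It shows only that the marker lines disappear and the enclosed markup is left as ordinary HTML. It says nothing about how get_uri_detail_list itself parses the message.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin): returns copies of the
# input lines with Outlook conditional-comment markers removed.
sub strip_mso_conditionals {
    my @lines = @_;    # work on a copy so the caller's data is untouched
    for (@lines) {
        # Closing markers: <![endif]--> and <!--<![endif]-->
        s/(?:<!--)?<!\[endif\]-->//g;
        # Opening markers: <!--[if ...]> and <!--[if ...]><!-->
        s/<!--\[if [()|!\w\s]+\]>(?:<!-->)?//g;
    }
    return @lines;
}

# Made-up sample input for illustration only.
my @html = (
    '<!--[if mso]>',
    '<a href="http://example.com/a.jpg">image</a>',
    '<![endif]-->',
    '<!--[if !mso]><!-->',
    '<a href="http://example.com/b.jpg">image</a>',
    '<!--<![endif]-->',
);

# Print whatever is left after stripping, skipping lines that became empty.
my @clean = strip_mso_conditionals(@html);
print "$_\n" for grep { length } @clean;
```

With this sample input, only the two anchor lines remain. The character class in the opening-marker pattern does not cover every possible condition expression (for example, it does not accept "&"), so conditions written with other operators would be left in place.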
Comment 2 John Hardin 2022-11-10 02:02:12 UTC
Those are HTML comments, not links or tags, and they have no place in the uri list.

If you want to see that content, use a rawbody rule.
Comment 3 Pascal 2022-11-10 02:08:27 UTC
I mean that when HTML code containing links sits inside these comment tags, the function does not identify the links.
Comment 4 Henrik Krohns 2022-11-19 09:58:02 UTC
(In reply to John Hardin from comment #2)
> Those are HTML comments, not links or tags, and they have no place in the
> uri list.

I think he's referring to this:

https://email2go.io/blog/outlook-conditional-code

Outlook will display anything inside those.
Comment 5 Pascal 2022-11-19 10:25:01 UTC
Exactly.
It really is displayed in some email clients.

It is probably the same issue as here:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8024
Comment 6 Kevin A. McGrail 2022-11-28 15:29:12 UTC
I think we should look at this for 4.0.1 and consider whether this is a dupe of bug 8024 as well.
Comment 7 Henrik Krohns 2022-11-28 16:17:17 UTC
*** Bug 8024 has been marked as a duplicate of this bug. ***
Comment 8 Henrik Krohns 2022-11-28 16:20:35 UTC
Agree for 4.0.1; it needs some thought and proper testing so that we don't break HTML stuff. Maybe someone can also look at whether these are good spam indicators for rules.
Comment 9 Pascal 2023-02-06 22:45:31 UTC
Created attachment 5878 [details]
impacted message

Here is a message for you to test.
The get_uri_detail_list function can't identify the .jpg images in it.
Comment 10 Giovanni Bechis 2023-10-09 10:32:08 UTC
The .jpg linked images in the provided spample are detected for me on trunk.
Comment 11 Benny Pedersen 2023-10-09 12:52:51 UTC
It's valid HTML5, with 418 warnings and 0 errors :)

If SpamAssassin had htmltidy, it would be a piece of cake to get this information about how valid the HTML of spam or non-spam is. IMHO we should not treat the comment failures as spammy; it's a Microsoft bug that should be fixed.
Comment 12 Sidney Markowitz 2023-11-24 07:53:01 UTC
Unless someone wants to argue with this being closed as WORKSFORME and reopen it, it should not have a target milestone, so I'm changing that from 4.0.1 as part of my 4.0.1 release cleanup.