SA Bugzilla – Bug 8206
uri_list_canonicalize adds more domains than it should
Last modified: 2024-01-03 16:21:52 UTC
Created attachment 5930
Sample email

In the attached sample, the tag <img src="undefined/favicon.ico"> is wrongly translated into an http://undefined.com URI.
Created attachment 5931
Fix for the issue
I don't have any specific examples right at hand, but I posted on the users list about essentially this issue in April last year, with another specific case. See https://lists.apache.org/thread/gf3kyq2y3j1v1lj37g5tpngmk82wgmcz. I don't recall whether any patches were committed as a result of that thread.

Looking at your patch, I think it is too narrow (if only because it omits .png, .webp, and who knows what other image types some sender might use) and applied far too late in the process to fix the root cause. There is a long list of other HTML elements that get filed in the "URI" bin and can trigger this problem. To solve it properly, I think potential URIs from HTML elements need to be preprocessed more tightly (and discarded) ahead of the rest of the canonicalization process.

I have docucomments in my local configuration with this:

  dbg: uri: canonicalizing html uri: none
  dbg: uri: cleaned uri: http://www.none.com
  dbg: uri: added host: www.none.com domain: none.com
  dbg: uri: cleaned uri: none
  dbg: uri: cleaned uri: http://none

(likely from that particular case I posted about) and:

  dbg: uri: canonicalizing html uri: assets/css/styles.css
  dbg: uri: cleaned uri: http://www.assets.com/css/styles.css

along with matching uridnsbl_skip_domain entries for

  none.com
  assets.com

(I also have "none" listed, but that doesn't seem to work to suppress the entry) and

  background.com
  www.com

For the latter two I don't have debug detail recorded, but both originated from essentially the same source: HTML/CSS elements (not content/text!) that specify a relative URI in some context or form. None of these were in text that a mail program *would* often turn into a clickable link.
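
To illustrate the kind of preprocessing meant above, here is a minimal sketch in Python rather than SpamAssassin's Perl. It is not SpamAssassin code and not a proposed patch. The function names `has_explicit_host` and `filter_html_uris` are hypothetical, and the rule it applies (keep only attribute values that carry an explicit host) is an assumption about what "tighter preprocessing" could look like.

```python
from urllib.parse import urlsplit


def has_explicit_host(raw: str) -> bool:
    """Return True only if an HTML attribute value names a host explicitly.

    Hypothetical pre-filter: values such as "none" or
    "assets/css/styles.css" are relative references and carry no host,
    so no domain should be guessed from them.
    """
    value = raw.strip()
    if not value:
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        # Malformed input (for example a broken IPv6 literal): discard.
        return False
    # netloc is only populated when the value contains "//host",
    # i.e. "scheme://host/..." or protocol-relative "//host/...".
    return bool(parts.netloc)


def filter_html_uris(candidates):
    """Drop relative references before any canonicalization step runs."""
    return [c for c in candidates if has_explicit_host(c)]


if __name__ == "__main__":
    samples = [
        "undefined/favicon.ico",
        "none",
        "assets/css/styles.css",
        "http://example.com/a.png",
        "//cdn.example.com/x.css",
    ]
    for kept in filter_html_uris(samples):
        print(kept)
```

With a filter like this, "undefined/favicon.ico", "none" and "assets/css/styles.css" would never reach the step that appends ".com". Note that this sketch also drops schemeless values such as "www.example.com/page" and host-less schemes such as "mailto:"; whether that is acceptable for HTML element attributes (as opposed to body text) is a design decision this sketch does not settle.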