SA Bugzilla – Bug 8268
trim whitespace from anchor text in uri_detail_list
Last modified: 2024-07-11 21:09:52 UTC
It would be convenient if leading and trailing whitespace were removed from anchor_text in uri_detail_list. For example, HTML such as:

<a href="#">
 Download File
</a>

will end up with anchor_text containing "\n Download File\n". This leads to unexpected results if you have a rule such as:

uri_detail RULENAME text =~ /^download file$/i

The workaround is to not use regex anchors, or to explicitly allow whitespace in the regex:

uri_detail RULENAME text =~ /^\s*download file/i

However, I think this is non-intuitive, and it has tripped me up several times. I don't think there is any harm in removing the whitespace, since the rules of HTML whitespace dictate that the HTML above should parse identically to this HTML:

<a href="#">Download File</a>

Please see the attached patch and provide feedback.
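To illustrate the behavior described above, here is a minimal sketch in Python rather than SpamAssassin's own Perl. It is not the attached patch; the variable names are hypothetical, and the anchor text value is simply the one quoted in this report.

```python
import re

# Anchor text as quoted in this report (leading newline + space, trailing newline).
anchor_text = "\n Download File\n"

# The anchored pattern from the example rule, and the whitespace-tolerant workaround.
anchored = re.compile(r"^download file$", re.IGNORECASE)
workaround = re.compile(r"^\s*download file", re.IGNORECASE)

# Without trimming, the anchored pattern fails: "^" only matches at the very
# start of the string, where a newline sits in front of the text.
matches_raw = anchored.search(anchor_text) is not None

# After trimming leading and trailing whitespace, the same pattern matches.
matches_trimmed = anchored.search(anchor_text.strip()) is not None

# The workaround matches the untrimmed text because "\s*" absorbs the whitespace.
matches_workaround = workaround.search(anchor_text) is not None

print(matches_raw, matches_trimmed, matches_workaround)
```

The point of the sketch is only that trimming lets the intuitive anchored rule match without rule authors having to remember the workaround.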
Created attachment 5959: patch
I think it might be good to collapse runs of whitespace between words as well, again following the rules of HTML whitespace, in case someone does this:

<a href="#"> Download File </a>
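A rough sketch of what trimming plus collapsing could look like, written in Python for illustration only. The function name normalize_anchor_text and the sample input are hypothetical, and this assumes "whitespace" means the ASCII whitespace characters HTML defines (space, tab, LF, FF, CR); the real change would live in SpamAssassin's Perl code.

```python
import re

# Runs of HTML ASCII whitespace: space, tab, line feed, form feed, carriage return.
HTML_WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")

def normalize_anchor_text(text: str) -> str:
    """Collapse each whitespace run to one space, then trim both ends."""
    collapsed = HTML_WHITESPACE_RUN.sub(" ", text)
    return collapsed.strip(" ")

# Hypothetical anchor text with extra whitespace around and between the words.
sample = "\n  Download \t\n   File \n"
normalized = normalize_anchor_text(sample)
print(repr(normalized))
```

With this normalization, a rule such as text =~ /^download file$/i would match regardless of how the link text was wrapped or indented in the HTML source.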