Apache OpenOffice (AOO) Bugzilla – Issue 109005
Automatic URL recognition will not work for URL containing underscore
Last modified: 2017-05-20 11:33:52 UTC
The URL noted above will not work when entered into a document in OO Writer. The problem is the underscore (_) between "lucinda" and "ball". When this underscore is removed, OO Writer will form a link. I transferred the document containing this link to Word97 on WindowsME, and it worked fine. I emailed the document to a friend with Open Office 3.0 running on WindowsXP, and it WOULD NOT work. Therefore, I feel that Open Office Writer has a bug that won't allow an underscore (_) in the name portion of a URL before the .com. The underscores in the portion of the address after the .com are not a problem.

Note that I created a hyperlink as part of my effort, but when clicked on, it would only link to OO as a browser, not the system default browser, and I could find no way to change this. After going to OO as a browser, it would not return to OO Writer.
Reproducible with "Ooo-Dev 3.3.0 multilingual version English UI WIN XP: [DEV300m70 (Build 9478)]" and "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]"! You can simply try it with a self-created URL.

Steps to reproduce:
1. Open a new WRITER document
2. Activate automatic URL recognition
3. Type "http://www.a-b.org"
4. Press the space bar
   expected: Blank inserted, automatic URL recognition will create hyperlink
   actual: as expected
10. Press <Enter>
11. Type "http://www.a_b.org"
    expected: Blank inserted, automatic URL recognition will create hyperlink
    actual: Blank inserted, but NO hyperlink created

Same with CALC, DRAW.

I did some further tests in WRITER (3.1.1):
"xy@a-b.org" will be recognized and a mailto link will be created
"xy@a_b.org" will be recognized and a mailto link will be created, but the mailto link is "xy@a" and the link will be created for the text part "xy@a_"
"x_y@a-b.org" will be recognized and a mailto link will be created

@mr_lauren: I don't understand your "I created a hyperlink as part of my effort ..." problem. Please file a separate issue if you find that the problem still exists in a current stable version (3.1.1 or later).
Can also confirm this on Win. Something like www.aaa_bbb.org will not be autoformatted as hyperlink.
The autocorrection calls URIHelper::FindFirstURLInText(...), which calls, via a local scanDomain(...), INetURLObject::scanDomain(...). This stops detection at '_' because the underscore does not satisfy INetMIME::isAlphanumeric().

->sb: Would it make sense to add the underscore as an exception in scanDomain(), or should it be added to INetMIME::isAlphanumeric()?
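To illustrate the behavior described above, here is a minimal Python sketch of a domain scanner that stops at the first character it does not accept. It is a simplified model, not the actual INetURLObject::scanDomain() code: the function name scan_domain, the allow_underscore flag, and the label rule (alphanumerics with interior hyphens) are assumptions made for illustration only.

```python
import string

# Characters accepted as "alphanumeric" in this model (ASCII letters and digits).
ALNUM = set(string.ascii_letters + string.digits)


def scan_domain(text, start=0, allow_underscore=False):
    """Return the end index of the longest dot-separated host name at `start`.

    Hypothetical simplification: a label is a run of alphanumerics with
    optional interior hyphens (and interior underscores if allowed).
    """
    def is_label_char(ch):
        return ch in ALNUM or ch == "-" or (allow_underscore and ch == "_")

    pos = start
    end = start  # end of the last complete, valid label
    while True:
        label_start = pos
        while pos < len(text) and is_label_char(text[pos]):
            pos += 1
        label = text[label_start:pos]
        # A label must be non-empty and must start and end with an alphanumeric.
        if not label or label[0] not in ALNUM or label[-1] not in ALNUM:
            break
        end = pos
        if pos < len(text) and text[pos] == ".":
            pos += 1  # step over the dot and scan the next label
        else:
            break
    return end


host = "www.a_b.org"
print(host[:scan_domain(host)])                         # www.a
print(host[:scan_domain(host, allow_underscore=True)])  # www.a_b.org
```

In this model, scanning "xy@a_b.org" from just after the "@" also stops after "a", which is consistent with the truncated "xy@a" mailto target reported earlier in this issue. Whether the exception belongs in the scanner or in the character classifier is the open question; the flag here only shows the effect of accepting the underscore.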
Interesting. Even the most recent URI syntax RFC 3986 suggests that host names in http URLs should be formed from the venerable [A-Za-z0-9.-] set (<http://tools.ietf.org/html/rfc3986#section-3.2.2>, in turn referencing RFCs 1034 and 1123), so that "lucinda_ball.blogspot.com" would be ruled out. However, resolving that name via dig or entering it into a browser does indeed work, so it appears that either the hostname rules have been relaxed (and I am not aware of that) or the hostname is indeed invalid but most software is lax enough to make it appear as if it were good.

However, given that the somewhat similar issues 53184 and 80134 have been fixed in nearby code (to allow underscores in UNC server names in file URLs), it might be a good idea to clean the code up somewhat (maybe generally allowing the relaxed RFC 3986 reg-name syntax, independent of the actual scheme). Though, for automatic URL recognition, we have to be careful not to become too relaxed and produce too many false positives.

Anyway, to work around this issue you can always manually mark a URL in Writer text as such via "Insert - Hyperlink."
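The trade-off between the strict host name set and the relaxed RFC 3986 reg-name syntax can be sketched with two regular expressions in Python. This is only an illustration under stated assumptions: the names STRICT_HOST, REG_NAME, is_strict_host, and is_reg_name are hypothetical, and the strict pattern models the letter-digit-hyphen rule of RFC 1123 without length limits.

```python
import re

# Strict host: dot-separated labels of letters and digits with interior hyphens.
LDH_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
STRICT_HOST = re.compile(rf"(?:{LDH_LABEL})(?:\.(?:{LDH_LABEL}))*")

# RFC 3986: reg-name = *( unreserved / pct-encoded / sub-delims )
#   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
REG_NAME = re.compile(r"(?:[A-Za-z0-9._~!$&'()*+,;=-]|%[0-9A-Fa-f]{2})*")


def is_strict_host(s):
    """True if the whole string is a letter-digit-hyphen host name."""
    return STRICT_HOST.fullmatch(s) is not None


def is_reg_name(s):
    """True if the whole string matches the RFC 3986 reg-name grammar."""
    return REG_NAME.fullmatch(s) is not None


for candidate in ("lucinda-ball.blogspot.com",
                  "lucinda_ball.blogspot.com",
                  "example.org,"):
    print(candidate, is_strict_host(candidate), is_reg_name(candidate))
```

The underscore host is rejected by the strict pattern and accepted by reg-name, but so is "example.org," with its trailing comma, because the comma is a sub-delim. That is the kind of false positive automatic recognition in running text would have to guard against, for example by trimming trailing punctuation.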
Reset assignee to the default "issues@openoffice.apache.org".