Issue 109005 - Automatic URL recognition will not work for URL containing underscore
Summary: Automatic URL recognition will not work for URL containing underscore
Status: ACCEPTED
Alias: None
Product: ui
Classification: Code
Component: ui
Version: OOo 2.0.4
Hardware: PC All
Importance: P3 Trivial
Target Milestone: 4.x
Assignee: AOO issues mailing list
QA Contact:
URL: http://lucinda_ball.blogspot.com/2009...
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2010-02-06 04:13 UTC by mr_lauren
Modified: 2017-05-20 11:33 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description mr_lauren 2010-02-06 04:13:20 UTC
The URL noted above will not work when entered into a document in OO writer.  The 
problem is the underscore (_) between "lucinda" and "ball".  When this underscore is 
removed, then OO Writer will form a link.  I transferred the document containing this 
link to Word97 in WindowsME, and it worked fine.  I emailed the document to a friend 
with Open Office 3.0 running on WindowsXP, and it WOULD NOT work.  Therefore, I feel 
that Open Office Writer has a bug that won't allow an underscore (_) in the name 
portion of a URL before the .com.  The underscores in the portion of the address 
after the .com are not a problem.

Note that I created a hyperlink as part of my effort, but when clicked on, it would 
only link to OO as a browser, not the system default browser, and I could find no way 
to change this.  After going to OO as a browser, it would not return to OO writer.
Comment 1 Rainer Bielefeld 2010-02-06 09:42:07 UTC
Reproducible with "Ooo-Dev 3.3.0 multilingual version English UI WIN XP:
[DEV300m70 (Build 9478)]" and "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]"!

You can simply try it with a self-created URL. Steps to reproduce:
1. open new WRITER document
2. activate automatic URL recognition
3. type "http://www.a-b.org"
4. press space bar
   expected: Blank inserted, automatic URL recognition will create hyperlink
   actual: as expected
5. press <Enter>
6. type "http://www.a_b.org"
7. press space bar
   expected: Blank inserted, automatic URL recognition will create hyperlink
   actual: Blank inserted, but NO hyperlink created

Same with CALC, DRAW

I did some further Tests in WRITER (3.1.1):
"xy@a-b.org" will be recognized and a mailto link will be created
"xy@a_b.org" will be recognized and a mailto link will be created, but 
             mailto link is "xy@a" and link will be created for text part
             "xy@a_"
"x_y@a-b.org" will be recognized and a mailto link will be created

@mr_lauren:
I don't understand your "I created a hyperlink as part of my effort ..."
problem. Please file a separate issue if you find that the problem still
exists in a current stable version (3.1.1 or later).
Comment 2 michael.ruess 2010-02-08 09:44:17 UTC
Can also confirm this on Win. Something like www.aaa_bbb.org will not be
autoformatted as hyperlink.
Comment 3 Oliver Specht 2010-02-08 12:36:07 UTC
The autocorrection calls 
URIHelper::FindFirstURLInText(...)
which calls via local scanDomain(...)
INetURLObject::scanDomain(...)

This stops detection at '_' because it's not INetMIME::isAlphanumeric()

->sb: Would it make sense to add the underscore as exception in scanDomain() or
should it be added to INetMIME::isAlphanumeric()?
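
To illustrate the behaviour described in this comment, here is a minimal sketch of a domain scanner that stops at the first character it does not accept. It is not the actual INetURLObject::scanDomain() code: the names isAsciiAlphanumeric and scanDomainSketch and the allowUnderscore flag are hypothetical, and the label rules are simplified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for INetMIME::isAlphanumeric(): ASCII letters and digits only.
inline bool isAsciiAlphanumeric(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

// Hypothetical, simplified scanner (an assumption, not the real implementation).
// Returns the length of the longest prefix of `text` that looks like a
// dot-separated domain name. Labels start with a label character and may
// contain hyphens; when allowUnderscore is true, '_' counts as a label character.
std::size_t scanDomainSketch(const std::string& text, bool allowUnderscore) {
    std::size_t lastGood = 0;  // one past the last accepted label character
    bool inLabel = false;      // true once the current label has at least one character
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        const bool labelChar =
            isAsciiAlphanumeric(c) || (allowUnderscore && c == '_');
        if (labelChar) {
            inLabel = true;
            lastGood = i + 1;
        } else if (c == '-' && inLabel) {
            // Hyphen inside a label: keep scanning, but do not end the match here.
        } else if (c == '.' && inLabel) {
            inLabel = false;   // label separator: a new label must follow
        } else {
            break;             // any other character ends the domain
        }
    }
    return lastGood;
}
```

With allowUnderscore set to false, scanDomainSketch("www.a_b.org", false) returns 5, so only "www.a" is accepted, and scanDomainSketch("a_b.org", false) returns 1, which is consistent with the truncated "xy@a" mailto target seen in comment 1. With the flag set to true, the whole of "www.a_b.org" (11 characters) is accepted. Putting the exception in the scanner rather than in the general alphanumeric predicate keeps the change local to domain scanning.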
Comment 4 Stephan Bergmann 2010-02-08 14:19:14 UTC
Interesting.  Even the most recent URI syntax RFC 3986 suggests that host names
in http URLs should be formed from the venerable [A-Za-z0-9.-] set
(<http://tools.ietf.org/html/rfc3986#section-3.2.2>, in turn referencing RFCs
1034 and 1123), so that "lucinda_ball.blogspot.com" would be ruled out. 
However, resolving that name via dig, or entering it into a browser does indeed
work, so it appears either hostname rules have been relaxed (and I am not aware
of that) or that hostname is indeed invalid but most software is lax enough to
make it appear as if the hostname were good.

However, given that somewhat similar issues 53184 and 80134 have been fixed in
nearby code (to allow underscores in UNC server names in file URLs), it might be
a good idea to clean the code up somewhat (maybe generally allowing the relaxed
RFC 3986 reg-name syntax, independent of actual scheme).  Though, for automatic
URL recognition, we have to be careful not to become too relaxed and produce too
many false positives.

Anyway, to work around this issue you can always manually mark a URL in Writer
text as such via "Insert - Hyperlink."
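
The two host-name rules contrasted in this comment can be sketched as follows. This is only an illustration: the function names isStrictHostName and isRegName are hypothetical and are not part of the OpenOffice code base. The character sets are the [A-Za-z0-9.-] set quoted above and the RFC 3986 reg-name production (unreserved, pct-encoded and sub-delims characters).

```cpp
#include <cstddef>
#include <string>

inline bool isAsciiAlnum(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

inline bool isHexDigit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
}

// Strict check (hypothetical helper): only the [A-Za-z0-9.-] set is allowed.
bool isStrictHostName(const std::string& host) {
    if (host.empty()) return false;
    for (char c : host) {
        if (!(isAsciiAlnum(c) || c == '.' || c == '-')) return false;
    }
    return true;
}

// Relaxed check (hypothetical helper) following the RFC 3986 grammar:
//   reg-name = *( unreserved / pct-encoded / sub-delims )
// Note that the grammar also permits an empty reg-name.
bool isRegName(const std::string& host) {
    // unreserved punctuation "-._~" followed by the sub-delims characters
    const std::string punctuation = "-._~!$&'()*+,;=";
    for (std::size_t i = 0; i < host.size(); ++i) {
        const char c = host[i];
        if (isAsciiAlnum(c) || punctuation.find(c) != std::string::npos) continue;
        // pct-encoded = "%" HEXDIG HEXDIG
        if (c == '%' && i + 2 < host.size() &&
            isHexDigit(host[i + 1]) && isHexDigit(host[i + 2])) {
            i += 2;
            continue;
        }
        return false;
    }
    return true;
}
```

Under these definitions "lucinda_ball.blogspot.com" fails isStrictHostName but passes isRegName, while "www.a-b.org" passes both. The relaxed set also admits characters such as ',' and ';' that often follow a URL in running text, which is one way the false positives mentioned above could arise in automatic recognition.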
Comment 5 Marcus 2017-05-20 11:33:52 UTC
Reset assignee to the default "issues@openoffice.apache.org".