Issue 80657 - Importing apostrophes from HTML fails
Summary: Importing apostrophes from HTML fails
Status: RESOLVED FIXED
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: OOo 2.2.1
Hardware: All All
Importance: P3 Minor
Target Milestone: 4.1.14
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-13 18:29 UTC by ronnystandtke
Modified: 2023-01-04 00:18 UTC
CC List: 4 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: 4.2.0-dev
Developer Difficulty: Simple


Attachments
HTML file with apostrophes (229 bytes, text/html)
2007-08-13 18:30 UTC, ronnystandtke

Description ronnystandtke 2007-08-13 18:29:51 UTC
If an HTML file contains apostrophe entities (&apos;), Writer imports them
verbatim without mapping them to real apostrophes.
I will attach a test file...
If you open this file in a browser of your choice you will see the
string: 'test'
If you open this file in OOo Writer you will see this string: &apos;test&apos;
Comment 1 ronnystandtke 2007-08-13 18:30:34 UTC
Created attachment 47515
HTML file with apostrophes
Comment 2 michael.ruess 2007-08-14 08:11:49 UTC
Reassigned to ES.
Comment 3 eric.savary 2007-08-14 08:39:34 UTC
Although "&apos;" may be supported by many browsers, it is not a valid HTML entity.

See: http://www.w3.org/TR/html4/sgml/entities.html

Please use a plain text ' or &#39; instead.

*** This issue has been marked as a duplicate of 9457 ***
Comment 4 eric.savary 2007-08-14 08:39:46 UTC
closed
Comment 5 ronnystandtke 2007-08-14 09:26:33 UTC
The 'apos' is a standard html/xhtml entity, see 
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
Comment 6 eric.savary 2007-08-14 10:30:31 UTC
XHTML yes, HTML (4.0) not.

Requalifying as enhancement.

Reassigned
Comment 7 ronnystandtke 2010-05-17 13:28:15 UTC
Now using v3.2, unfortunately, I still have to manually sed many html files
before opening them with OOo because of this issue...
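
The preprocessing workaround mentioned above can be sketched in C++ as follows. This is only an illustration of the idea, assuming the goal is to rewrite each "&apos;" to the numeric character reference "&#39;", which HTML 4 parsers accept. The helper name replaceAposEntity is hypothetical and is not part of OOo.

```cpp
#include <string>

// Hypothetical helper (not part of OOo): rewrite every "&apos;" in an HTML
// string to the numeric character reference "&#39;".
std::string replaceAposEntity(std::string html) {
    const std::string from = "&apos;";
    const std::string to = "&#39;";
    std::string::size_type pos = 0;
    while ((pos = html.find(from, pos)) != std::string::npos) {
        html.replace(pos, from.size(), to);
        pos += to.size();  // continue searching after the inserted text
    }
    return html;
}
```

Applied to the content of the attached test file, this would turn &apos;test&apos; into &#39;test&#39; before the file is opened in Writer.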
Comment 8 Edwin Sharp 2014-04-06 12:46:57 UTC
As given in description
AOO410m15(Build:9761)  -  Rev. 1583666
2014-04-01 13:50 - Linux x86_64
Debian
Comment 9 damjan 2023-01-03 07:15:09 UTC
Still an issue in the latest Git.

We just need 2 lines of code added to fix this bug. The patch below allows reading "&apos;" from HTML, but doesn't write it to HTML.

But should we unconditionally support the "&apos;" entity, or only when the HTML version is >= 5? If web browsers were supporting it in 2007, while the first draft of HTML 5 was in 2008 and HTML 5 only became a stable recommendation in October 2014, then we may have to support it in any HTML version for better compatibility.


diff --git a/main/svtools/inc/svtools/htmlkywd.hxx b/main/svtools/inc/svtools/htmlkywd.hxx
index ff11057f1a..5ec2e37c79 100644
--- a/main/svtools/inc/svtools/htmlkywd.hxx
+++ b/main/svtools/inc/svtools/htmlkywd.hxx
@@ -182,6 +182,7 @@
 #define OOO_STRING_SVTOOLS_HTML_C_lt "lt"
 #define OOO_STRING_SVTOOLS_HTML_C_gt "gt"
 #define OOO_STRING_SVTOOLS_HTML_C_amp "amp"
+#define OOO_STRING_SVTOOLS_HTML_C_apos "apos"
 #define OOO_STRING_SVTOOLS_HTML_C_quot "quot"
 #define OOO_STRING_SVTOOLS_HTML_C_Aacute "Aacute"
 #define OOO_STRING_SVTOOLS_HTML_C_Agrave "Agrave"
diff --git a/main/svtools/source/svhtml/htmlkywd.cxx b/main/svtools/source/svhtml/htmlkywd.cxx
index 24b3160009..7554343ec6 100644
--- a/main/svtools/source/svhtml/htmlkywd.cxx
+++ b/main/svtools/source/svhtml/htmlkywd.cxx
@@ -278,6 +278,7 @@ static HTML_CharEntry __FAR_DATA aHTMLCharNameTab[] = {
        {{OOO_STRING_SVTOOLS_HTML_C_lt},                         60},
        {{OOO_STRING_SVTOOLS_HTML_C_gt},                         62},
        {{OOO_STRING_SVTOOLS_HTML_C_amp},                38},
+       {{OOO_STRING_SVTOOLS_HTML_C_apos},               39},
        {{OOO_STRING_SVTOOLS_HTML_C_quot},               34},
 
        {{OOO_STRING_SVTOOLS_HTML_C_Agrave},            192},
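
To illustrate what the added table entry does, here is a minimal, self-contained sketch of a name-to-code-point entity table. The CharEntry struct, the kEntries array, and lookupEntity are assumptions made for this example; they are not the real HTML_CharEntry definition or the lookup code in svtools, which may be organized differently.

```cpp
#include <cstring>

// Illustrative stand-in for an entity table. The names and layout here are
// assumptions, not the actual svtools definitions.
struct CharEntry {
    const char* name;   // entity name without the leading '&' and trailing ';'
    unsigned int code;  // code point the entity maps to
};

static const CharEntry kEntries[] = {
    {"lt", 60},
    {"gt", 62},
    {"amp", 38},
    {"apos", 39},  // corresponds to the entry the patch adds
    {"quot", 34},
};

// Returns the code point for a known entity name, or 0 if the name is unknown.
// A linear scan keeps the sketch short.
unsigned int lookupEntity(const char* name) {
    for (const CharEntry& entry : kEntries) {
        if (std::strcmp(entry.name, name) == 0) {
            return entry.code;
        }
    }
    return 0;
}
```

With the "apos" row present, lookupEntity("apos") yields 39, the code point of the plain apostrophe. Without it, the name would be unknown and an importer would have no replacement for the entity, which matches the verbatim "&apos;" reported in the description.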
Comment 10 damjan 2023-01-03 07:32:34 UTC
Firefox imports "&apos;" as "'" even when the HTML file has version 2.0 set:

<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">

so I am going to commit this.

Fixed by commit 3304210c5c53f441cdb2c462fbbf6d8351380b01.
Resolving FIXED.

Thank you for your bug report and sample file!