Issue 80657

Summary: Importing apostrophes from HTML fails
Product: Writer
Reporter: ronnystandtke <ronny.standtke>
Component: open-import
Assignee: AOO issues mailing list <issues>
Status: RESOLVED FIXED
QA Contact:
Severity: Minor
Priority: P3
CC: damjan, elish, issues, mseidel
Version: OOo 2.2.1
Target Milestone: 4.1.14
Hardware: All
OS: All
Issue Type: ENHANCEMENT
Latest Confirmation in: 4.2.0-dev
Developer Difficulty: Simple
Attachments:
HTML file with apostrophes (flags: none)

Description ronnystandtke 2007-08-13 18:29:51 UTC
If an HTML file contains apostrophe entities (&apos;), Writer imports them
verbatim without mapping them to real apostrophes.
I will attach a test file...
If you open this file in a browser of your choice, you will see the
string: 'test'
If you open this file in OOo Writer, you will see this string: &apos;test&apos;
Comment 1 ronnystandtke 2007-08-13 18:30:34 UTC
Created attachment 47515 [details]
HTML file with apostrophes
Comment 2 michael.ruess 2007-08-14 08:11:49 UTC
Reassigned to ES.
Comment 3 eric.savary 2007-08-14 08:39:34 UTC
"&apos;" may be supported by many browsers, but it is not a valid HTML entity.

See: http://www.w3.org/TR/html4/sgml/entities.html

Please use a plain text ' or &#39; instead.

*** This issue has been marked as a duplicate of 9457 ***
Comment 4 eric.savary 2007-08-14 08:39:46 UTC
closed
Comment 5 ronnystandtke 2007-08-14 09:26:33 UTC
The 'apos' is a standard html/xhtml entity, see 
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
Comment 6 eric.savary 2007-08-14 10:30:31 UTC
XHTML yes, HTML (4.0) no.

Requalifying as enhancement.

Reassigned
Comment 7 ronnystandtke 2010-05-17 13:28:15 UTC
Now using v3.2. Unfortunately, I still have to manually run sed on many HTML
files before opening them with OOo because of this issue...
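
The workaround described above amounts to rewriting every "&apos;" as the numeric reference "&#39;" (the form suggested in comment 3) before the file is opened. A minimal sketch of that preprocessing step in C++, assuming the file contents are already held in a std::string; the function name replaceAposEntity is hypothetical and not part of OOo:

```cpp
#include <string>

// Hypothetical helper (not part of OOo): rewrites each "&apos;" as "&#39;",
// the numeric character reference suggested in comment 3.
std::string replaceAposEntity(std::string html)
{
    const std::string from = "&apos;";
    const std::string to = "&#39;";
    std::string::size_type pos = 0;
    while ((pos = html.find(from, pos)) != std::string::npos)
    {
        html.replace(pos, from.size(), to);
        pos += to.size(); // continue searching after the inserted text
    }
    return html;
}
```

Text that does not contain "&apos;" is returned unchanged, so the function can be applied to every file in a batch.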
Comment 8 Edwin Sharp 2014-04-06 12:46:57 UTC
As given in description
AOO410m15(Build:9761)  -  Rev. 1583666
2014-04-01 13:50 - Linux x86_64
Debian
Comment 9 damjan 2023-01-03 07:15:09 UTC
Still an issue in the latest Git.

We just need 2 lines of code added to fix this bug. The patch below allows reading "&apos;" from HTML, but doesn't write it to HTML.

But should we unconditionally support the "&apos;" entity, or only when the HTML version is >= 5? If web browsers were supporting it in 2007, while the first draft of HTML 5 was in 2008 and HTML 5 only became a stable recommendation in October 2014, then we may have to support it in any HTML version for better compatibility.


diff --git a/main/svtools/inc/svtools/htmlkywd.hxx b/main/svtools/inc/svtools/htmlkywd.hxx
index ff11057f1a..5ec2e37c79 100644
--- a/main/svtools/inc/svtools/htmlkywd.hxx
+++ b/main/svtools/inc/svtools/htmlkywd.hxx
@@ -182,6 +182,7 @@
 #define OOO_STRING_SVTOOLS_HTML_C_lt "lt"
 #define OOO_STRING_SVTOOLS_HTML_C_gt "gt"
 #define OOO_STRING_SVTOOLS_HTML_C_amp "amp"
+#define OOO_STRING_SVTOOLS_HTML_C_apos "apos"
 #define OOO_STRING_SVTOOLS_HTML_C_quot "quot"
 #define OOO_STRING_SVTOOLS_HTML_C_Aacute "Aacute"
 #define OOO_STRING_SVTOOLS_HTML_C_Agrave "Agrave"
diff --git a/main/svtools/source/svhtml/htmlkywd.cxx b/main/svtools/source/svhtml/htmlkywd.cxx
index 24b3160009..7554343ec6 100644
--- a/main/svtools/source/svhtml/htmlkywd.cxx
+++ b/main/svtools/source/svhtml/htmlkywd.cxx
@@ -278,6 +278,7 @@ static HTML_CharEntry __FAR_DATA aHTMLCharNameTab[] = {
        {{OOO_STRING_SVTOOLS_HTML_C_lt},                         60},
        {{OOO_STRING_SVTOOLS_HTML_C_gt},                         62},
        {{OOO_STRING_SVTOOLS_HTML_C_amp},                38},
+       {{OOO_STRING_SVTOOLS_HTML_C_apos},               39},
        {{OOO_STRING_SVTOOLS_HTML_C_quot},               34},
 
        {{OOO_STRING_SVTOOLS_HTML_C_Agrave},            192},
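
The effect of the new table entry can be illustrated with a simplified stand-in for the entity table. This is only a sketch: the struct, table, and function names below are assumptions rather than the real svtools declarations, and the linear scan is a simplification that may not match how htmlkywd.cxx actually searches its table.

```cpp
#include <cstring>

// Simplified stand-in for the entity table in htmlkywd.cxx; all names here
// are hypothetical and chosen for illustration only.
struct CharEntry
{
    const char* name; // entity name without '&' and ';'
    unsigned code;    // character code the entity maps to
};

static const CharEntry aCharNameTab[] = {
    {"lt", 60},
    {"gt", 62},
    {"amp", 38},
    {"apos", 39}, // the mapping added by the patch
    {"quot", 34},
};

// Returns the character code for an entity name, or 0 if the name is unknown.
unsigned lookupCharEntity(const char* name)
{
    for (const CharEntry& entry : aCharNameTab)
    {
        if (std::strcmp(entry.name, name) == 0)
            return entry.code;
    }
    return 0;
}
```

Without the "apos" row the lookup returns 0 for that name, which corresponds to the reported behaviour of the entity being left in the text verbatim.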
Comment 10 damjan 2023-01-03 07:32:34 UTC
Firefox imports "&apos;" as "'" even when the HTML file has version 2.0 set:

<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">

so I am going to commit this.

Fixed by commit 3304210c5c53f441cdb2c462fbbf6d8351380b01.
Resolving FIXED.

Thank you for your bug report and sample file!