Issue 115301 - Calc confused by unclosed HTML tags
Summary: Calc confused by unclosed HTML tags
Status: CONFIRMED
Alias: None
Product: Calc
Classification: Application
Component: open-import
Version: OOo 3.2.1
Hardware: Unknown Linux, all
Importance: P3 Trivial with 1 vote
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact: Ephraim Purcell
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-28 17:21 UTC by psychonaut
Modified: 2013-02-04 16:44 UTC
CC List: 1 user

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
HTML spreadsheet Calc fails to import properly (334 bytes, text/html)
2010-10-28 17:22 UTC, psychonaut
Screenshot showing how Calc renders the attached spreadsheet (45.35 KB, image/png)
2010-10-28 17:23 UTC, psychonaut
Screenshot showing how Writer more correctly imports the HTML file (29.49 KB, image/png)
2010-10-28 17:26 UTC, psychonaut

Description psychonaut 2010-10-28 17:21:22 UTC
The Calc HTML importer mishandles unclosed HTML tags.  When an HTML file
containing an unclosed tag is opened in Calc, only the content up to the
unclosed tag is imported; everything after it is dropped.  The Writer HTML
importer is much more resilient and gracefully ignores unclosed tags.

Reproducibility: Always

Steps to reproduce:
1. Create an HTML file containing a table with unclosed tags.  Example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>spreadsheet</title>
  </head>
  <body>
    <table>
      <tr><td>a1</td><td>b1</td><td>c1</td></tr>
      <tr><td>a2</td><td><a href="foo">b2</td><td>c2</td></tr>
      <tr><td>a3</td><td>b3</td><td>c3</td></tr>
    </table>
  </body>
</html>

2. Open the file in Calc.  (If the file has an .htm or .html extension, you will
need to set the filter to "HTML Document (OpenOffice.org Calc) (*.html;*.htm)"
in the file selection dialog, or else OpenOffice.org will try to open it with
Writer.)

Observed behaviour:
3. Calc renders the spreadsheet as follows, stopping at the unclosed <a> tag:
spreadsheet
a1 b1 c1
a2 

Expected behaviour:
3. Calc should have rendered the spreadsheet as follows:
spreadsheet
a1 b1 c1
a2 b2 c2
a3 b3 c3
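
For comparison, here is a minimal sketch of a tolerant table import, written
with Python's standard html.parser module.  It is not the code of either the
Calc or the Writer importer; the names TableGrid and parse_table are
hypothetical and exist only for this illustration.  The idea is to track only
the row and cell tags and to ignore every other tag, so an unclosed <a> cannot
end the import early:

```python
from html.parser import HTMLParser

# The sample document from the steps to reproduce, including the unclosed <a>.
SAMPLE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>spreadsheet</title>
  </head>
  <body>
    <table>
      <tr><td>a1</td><td>b1</td><td>c1</td></tr>
      <tr><td>a2</td><td><a href="foo">b2</td><td>c2</td></tr>
      <tr><td>a3</td><td>b3</td><td>c3</td></tr>
    </table>
  </body>
</html>
"""


class TableGrid(HTMLParser):
    """Hypothetical helper: collect cell text, tracking only <tr>, <td>, <th>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of cell strings
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:          # cell without an explicit <tr>
                self.rows.append([])
            self._cell = []
        # Any other start tag (such as the unclosed <a>) is ignored.

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html):
    """Return the table contents as a list of rows of cell strings."""
    parser = TableGrid()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in parse_table(SAMPLE):
        print(" ".join(row))
```

Under these assumptions the unclosed <a> in cell b2 has no effect on the cell
boundaries, so the sketch yields all three rows, matching the expected
behaviour listed above.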
Comment 1 psychonaut 2010-10-28 17:22:59 UTC
Created attachment 72768
HTML spreadsheet Calc fails to import properly
Comment 2 psychonaut 2010-10-28 17:23:36 UTC
Created attachment 72769
Screenshot showing how Calc renders the attached spreadsheet
Comment 3 psychonaut 2010-10-28 17:26:51 UTC
Created attachment 72770
Screenshot showing how Writer more correctly imports the HTML file
Comment 4 Ephraim Purcell 2013-02-04 16:44:12 UTC
Confirmed: the Calc and Writer imports behave as described on AOO 3.2 and 3.4.