Apache OpenOffice (AOO) Bugzilla – Issue 115301
Calc confused by unclosed HTML tags
Last modified: 2013-02-04 16:44:12 UTC
The Calc HTML importer is completely confused by unclosed HTML tags. If you try to open an HTML file in Calc which contains unclosed HTML tags, it will import only up to the unclosed tag. The Writer HTML importer is much more resilient, and will gracefully ignore unclosed tags.

Reproducibility: Always

Steps to reproduce:

1. Create an HTML file containing a table with unclosed tags. Example:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <title>spreadsheet</title>
    </head>
    <body>
    <table>
    <tr><td>a1</td><td>b1</td><td>c1</td></tr>
    <tr><td>a2</td><td><a href="foo">b2</td><td>c2</td></tr>
    <tr><td>a3</td><td>b3</td><td>c3</td></tr>
    </table>
    </body>
    </html>

2. Open the file in Calc. (If the file has an .htm or .html extension, you will need to set the filter to "HTML Document (OpenOffice.org Calc) (*.html;*.htm)" in the file selection dialog, or else OpenOffice.org will try to open it with Writer.)

Observed behaviour:

3. Calc renders the spreadsheet as follows:

    spreadsheet
    a1  b1  c1
    a2

Expected behaviour:

3. Calc should have rendered the spreadsheet as follows:

    spreadsheet
    a1  b1  c1
    a2  b2  c2
    a3  b3  c3
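For illustration only, here is a minimal Python sketch of the tolerant behaviour expected above. It is not the Calc or Writer importer code. It assumes that any tag other than tr, td and table can simply be ignored, so that the unclosed <a> in the example does not end the import. The names TableExtractor, extract_rows and SAMPLE are hypothetical.

```python
from html.parser import HTMLParser

# The example file from the steps to reproduce (note the unclosed <a>).
SAMPLE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>spreadsheet</title>
</head>
<body>
<table>
<tr><td>a1</td><td>b1</td><td>c1</td></tr>
<tr><td>a2</td><td><a href="foo">b2</td><td>c2</td></tr>
<tr><td>a3</td><td>b3</td><td>c3</td></tr>
</table>
</body>
</html>
"""


class TableExtractor(HTMLParser):
    """Collects table cell text; tags it does not know about are ignored."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()       # tolerate a missing </tr>
            self._row = []
        elif tag == "td":
            self._close_cell()      # tolerate a missing </td>
            if self._row is None:   # tolerate a <td> outside any <tr>
                self._row = []
            self._cell = []
        # Any other tag (such as the unclosed <a>) is deliberately ignored.

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Text outside a cell (for example the <title>) is dropped.
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table cells of html as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows
```

Calling extract_rows(SAMPLE) yields all three rows, including b2, c2 and the whole third row, because the stray <a> never affects cell or row boundaries. Unlike the expected Calc rendering, this sketch does not emit the page title as a row.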
Created attachment 72768: HTML spreadsheet Calc fails to import properly
Created attachment 72769: Screenshot showing how Calc renders the attached spreadsheet
Created attachment 72770: Screenshot showing how Writer more correctly imports the HTML file
Confirmed: the Calc and Writer imports behave as described on AOO 3.2 and 3.4.