Bug 60500 - Incorrect rowspan attribute value when converting a Word file containing a table with merged rows to HTML
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.15-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2016-12-20 15:19 UTC by tbrielle
Modified: 2017-04-10 17:04 UTC

Attachments
Word file containing a table with merged rows (120.50 KB, application/msword)
2016-12-20 15:19 UTC, tbrielle
Generated HTML (6.43 KB, text/html)
2016-12-20 15:19 UTC, tbrielle

Description tbrielle 2016-12-20 15:19:13 UTC
Created attachment 34538
Word file containing a table with merged rows

Hi,

The attached Word document contains a table with multiple merged rows and columns. The following Java 8 code converts this document into HTML.

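//----
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.AbstractWordUtils;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;

// Load the Word document; "file" points to the attached .doc file.
HWPFDocumentCore wordDocument;
try (FileInputStream fileInputStream = new FileInputStream(file)) {
    wordDocument = AbstractWordUtils.loadDoc(fileInputStream);
}

// Convert the document into an HTML DOM tree.
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
        DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
wordToHtmlConverter.processDocument(wordDocument);
Document htmlDocument = wordToHtmlConverter.getDocument();

// Serialize the DOM tree as indented UTF-8 HTML.
ByteArrayOutputStream out = new ByteArrayOutputStream();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));

System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
//----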

In the generated HTML (see the attached HTML file), the "td" element for the lower-right table cell has a "rowspan" attribute with a value of "8". According to the original table layout, this value should be "9".
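
One way to check the reported value without reading the serialized HTML by eye is to query the DOM tree returned by getDocument() directly. The following is only a minimal sketch and not part of the original reproduction: the class name RowspanInspector and the method rowspans are hypothetical, and the hand-built table in main merely stands in for the converter's output.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RowspanInspector {

    /**
     * Returns the rowspan of every "td" element in document order.
     * Cells without a rowspan attribute are reported as 1.
     */
    static List<Integer> rowspans(Document doc) {
        List<Integer> spans = new ArrayList<>();
        NodeList cells = doc.getElementsByTagName("td");
        for (int i = 0; i < cells.getLength(); i++) {
            Element td = (Element) cells.item(i);
            String value = td.getAttribute("rowspan"); // empty string when absent
            spans.add(value.isEmpty() ? 1 : Integer.parseInt(value));
        }
        return spans;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a tiny hand-built table replaces wordToHtmlConverter.getDocument().
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element table = doc.createElement("table");
        doc.appendChild(table);
        Element tr = doc.createElement("tr");
        table.appendChild(tr);

        Element plain = doc.createElement("td");
        tr.appendChild(plain);

        Element merged = doc.createElement("td");
        merged.setAttribute("rowspan", "9");
        tr.appendChild(merged);

        System.out.println(rowspans(doc)); // prints [1, 9]
    }
}
```

Passing htmlDocument from the code above to rowspans(...) should list the span of each cell, so the last entries can be compared against the merged rows visible in the Word table.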
Comment 1 tbrielle 2016-12-20 15:19:47 UTC
Created attachment 34539
Generated HTML