Created attachment 34538: Word file containing a table with merged rows

Hi,

The attached Word document contains a table with multiple merged rows and columns. The following Java 8 code converts this document into HTML:

//----
// file: the attached Word document
HWPFDocumentCore wordDocument;
try (FileInputStream fileInputStream = new FileInputStream(file)) {
    wordDocument = AbstractWordUtils.loadDoc(fileInputStream);
}

// Convert the Word document into an HTML DOM
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
        DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
wordToHtmlConverter.processDocument(wordDocument);
Document htmlDocument = wordToHtmlConverter.getDocument();

// Serialize the DOM as UTF-8 encoded HTML and print it
ByteArrayOutputStream out = new ByteArrayOutputStream();
DOMSource domSource = new DOMSource(htmlDocument);
StreamResult streamResult = new StreamResult(out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
serializer.setOutputProperty(OutputKeys.INDENT, "yes");
serializer.setOutputProperty(OutputKeys.METHOD, "html");
serializer.transform(domSource, streamResult);
out.close();
System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
//----

In the generated HTML (see the attached HTML), the "td" element for the lower-right table cell has a "rowspan" attribute of "8", although according to the original table layout the value should be "9".
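For reference, Word records a vertically merged region by flagging its top cell as the start of the merge and each cell below it as a continuation; an HTML converter then emits one "td" whose rowspan counts the start cell plus its continuation cells. The sketch below shows that counting under simplifying assumptions: every row shares the same column grid (real Word tables may not), and the RowSpanSketch class and VMerge enum are hypothetical names for illustration only, not Apache POI's implementation.

```java
/**
 * Hypothetical sketch: derives HTML rowspan values from Word-style
 * vertical-merge flags. This is NOT Apache POI's implementation.
 */
public class RowSpanSketch {

    /** Vertical-merge state of one cell (a simplified, assumed model). */
    enum VMerge { NONE, RESTART, CONTINUE }

    /**
     * Returns the rowspan for the cell at (row, col):
     * 1 for an unmerged cell, 1 + the number of consecutive CONTINUE cells
     * directly below a RESTART cell, and 0 for a CONTINUE cell (which emits
     * no td at all because the cell above covers it).
     * Assumes every row has the same column layout.
     */
    static int rowSpan(VMerge[][] grid, int row, int col) {
        VMerge state = grid[row][col];
        if (state == VMerge.CONTINUE) {
            return 0; // covered by a merged cell above
        }
        int span = 1;
        if (state == VMerge.RESTART) {
            // Scan downward, up to and including the table's last row.
            for (int r = row + 1; r < grid.length && grid[r][col] == VMerge.CONTINUE; r++) {
                span++;
            }
        }
        return span;
    }

    public static void main(String[] args) {
        // Hypothetical 9-row, 2-column table whose right column is a single
        // merged cell running from the first row down to the last row.
        VMerge[][] grid = new VMerge[9][2];
        for (int r = 0; r < grid.length; r++) {
            grid[r][0] = VMerge.NONE;
            grid[r][1] = (r == 0) ? VMerge.RESTART : VMerge.CONTINUE;
        }
        System.out.println(rowSpan(grid, 0, 1)); // prints 9
    }
}
```

In this model the expected "9" is the start cell plus eight continuation rows, and a scan that stopped one row before the end of the table would report 8 for this layout. That only illustrates how such a symptom can arise; it is not a diagnosis of WordToHtmlConverter.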
Created attachment 34539: Generated HTML
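Instead of reading the rowspan values off the attached HTML by hand, they can be listed directly from the DOM that wordToHtmlConverter.getDocument() returns, using only the JDK's org.w3c.dom API. This sketch assumes the cell elements are named plain "td" (no prefix); the RowSpanDump class and its rowSpans helper are hypothetical names, and the tiny table built in main merely stands in for real converter output.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper for inspecting rowspan values in an HTML DOM. */
public class RowSpanDump {

    /** Collects the rowspan attribute of every td element in document order ("" if absent). */
    static List<String> rowSpans(Document doc) {
        List<String> spans = new ArrayList<>();
        NodeList cells = doc.getElementsByTagName("td");
        for (int i = 0; i < cells.getLength(); i++) {
            Element td = (Element) cells.item(i);
            // getAttribute returns an empty string when the attribute is missing
            spans.add(td.getAttribute("rowspan"));
        }
        return spans;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        // Tiny stand-in for the converter's output: one row with two cells.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element table = doc.createElement("table");
        Element tr = doc.createElement("tr");
        Element plain = doc.createElement("td");
        Element merged = doc.createElement("td");
        merged.setAttribute("rowspan", "9");
        tr.appendChild(plain);
        tr.appendChild(merged);
        table.appendChild(tr);
        doc.appendChild(table);
        System.out.println(rowSpans(doc)); // prints [, 9]
    }
}
```

Passing htmlDocument from the conversion code to rowSpans would print the rowspan of every generated cell, making the "8" versus "9" discrepancy easy to spot.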