Created attachment 37517
Example file to reproduce the bug

Loading and saving a file adds an unwanted <w:p/> tag to the output file. Here's the code to reproduce the problem:

    FileInputStream inputFis = new FileInputStream("test6.docx");
    XWPFDocument doc = new XWPFDocument(inputFis);
    doc.write(new FileOutputStream(new File("test6_out.docx")));
    doc.close();
Does it actually cause a problem, or is some custom parsing of yours just strict about the elements? I don't see a visual difference caused by this tag, and if I load and save the file with LibreOffice, the structure is stored completely differently anyway.
Yes, there is a problem: a new line is added. I view it with LibreOffice Writer. Do you see that a new <w:p/> tag is added? That's a new paragraph. Why should loading a document and saving it with Apache POI change its structure?
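One way to confirm the extra paragraph without opening the files in a word processor is to count the <w:p> elements in word/document.xml before and after the round trip. The sketch below uses only the JDK (no POI). The class and method names are hypothetical, and it assumes the main document part is stored at word/document.xml, which is the usual location but not guaranteed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper for this report; not part of Apache POI. */
final class ParagraphCount {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the number of w:p elements in word/document.xml, or -1 if that part is missing. */
    static int countParagraphs(byte[] docxBytes) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Copy the entry first so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                return dom.getElementsByTagNameNS(W_NS, "p").getLength();
            }
            return -1;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read document", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: ParagraphCount <input.docx> <output.docx>");
            return;
        }
        int before = countParagraphs(Files.readAllBytes(Paths.get(args[0])));
        int after = countParagraphs(Files.readAllBytes(Paths.get(args[1])));
        System.out.println("paragraphs before: " + before + ", after: " + after);
    }
}
```

Running it against test6.docx and test6_out.docx should show whether the counts differ; a difference would point to an element added during the load/save cycle.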
I think some versions of word processing software will report an error when opening your document: a table cell should include at least one block-level element. For example, if you add a paragraph to the cell yourself, the extra paragraph will not be added automatically.
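To see whether a given document has cells that break this rule, the document XML can be scanned for <w:tc> elements with no block-level child. The following is a minimal JDK-only sketch with hypothetical names. As a simplification it treats only <w:p> and <w:tbl> as block-level content, so a cell holding some other block-level element would be flagged as well.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for this report; not part of Apache POI. */
final class EmptyCellCheck {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Counts w:tc elements that have neither a w:p nor a w:tbl child. */
    static int countCellsWithoutBlockContent(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document dom = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
            NodeList cells = dom.getElementsByTagNameNS(W_NS, "tc");
            int count = 0;
            for (int i = 0; i < cells.getLength(); i++) {
                if (!hasBlockChild(cells.item(i))) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    /** Simplification: only direct w:p and w:tbl children count as block-level content. */
    private static boolean hasBlockChild(Node cell) {
        for (Node child = cell.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && W_NS.equals(child.getNamespaceURI())
                    && ("p".equals(child.getLocalName()) || "tbl".equals(child.getLocalName()))) {
                return true;
            }
        }
        return false;
    }
}
```

A non-zero result for the contents of word/document.xml would mean the file contains cells that a strict consumer might reject.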
Is there a way to make POI simulate opening a file in LibreOffice and saving it, in order to fix these kinds of issues?
The following change would remove this check and the automatic addition of the paragraph. However, the check seems to have been added deliberately a long time ago, probably for a reason similar to what Sayi explained, so I fear that reverting it in general might have side effects on other documents.

Index: src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFTableCell.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFTableCell.java (revision d427ca10e24dfc4b43985a0dd87ad750ca1e18ba)
+++ src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFTableCell.java (revision 3304c1bdfa571730386f147116586917ac40762f)
@@ -80,9 +80,7 @@
         this.ctTc = cell;
         this.part = part;
         this.tableRow = tableRow;
-        // NB: If a table cell does not include at least one block-level element, then this document shall be considered corrupt.
-        if (cell.sizeOfPArray() < 1)
-            cell.addNewP();
+
         bodyElements = new ArrayList<>();
         paragraphs = new ArrayList<>();
         tables = new ArrayList<>();
I thought some more about this and decided to try removing this behavior, as it seems unwanted and is likely a very old remnant of the early implementation. We'll see if there are cases that need this behavior and can then re-add it under better conditions that do not change files in such an unexpected way. This was done in r1884958.
see bug #65292