Bug 64838

Summary: Loading and saving a file adds an unwanted <w:p/> tag resulting in additional newlines in the document
Product: POI
Reporter: NadavB <nadavbenedek>
Component: XWPF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Example file to regenerate the bug

Description NadavB 2020-10-21 17:40:14 UTC
Created attachment 37517
Example file to regenerate the bug

Loading a file and saving it again adds an unwanted <w:p/> tag to the output file.

Here's the code to regenerate the problem:

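        // Needs java.io.FileInputStream, java.io.FileOutputStream and org.apache.poi.xwpf.usermodel.XWPFDocument
        try (FileInputStream inputFis = new FileInputStream("test6.docx");
             XWPFDocument doc = new XWPFDocument(inputFis);
             FileOutputStream outputFos = new FileOutputStream("test6_out.docx")) {
            doc.write(outputFos); // plain round trip: no edits between load and save
        }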
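
One way to confirm the difference without opening a word processor is to count the <w:p> elements in word/document.xml of the input and output files. The following is only a sketch that uses the JDK alone (no POI); the class and method names are made up for illustration, and it assumes the file uses the transitional WordprocessingML namespace.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of POI.
final class ParagraphCount {
    // Assumption: transitional (non-strict) WordprocessingML namespace.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Counts every w:p element in a WordprocessingML stream. */
    static int countParagraphs(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document dom = factory.newDocumentBuilder().parse(xml);
        return dom.getElementsByTagNameNS(W_NS, "p").getLength();
    }

    /** Counts w:p elements in word/document.xml of a .docx package. */
    static int countParagraphsInDocx(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("no word/document.xml in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return countParagraphs(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ParagraphCount <before.docx> <after.docx>");
            return;
        }
        System.out.println(args[0] + ": " + countParagraphsInDocx(args[0]));
        System.out.println(args[1] + ": " + countParagraphsInDocx(args[1]));
    }
}
```

Run against test6.docx and test6_out.docx, a higher count for the second file would point to paragraphs added during the round trip.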
Comment 1 Dominik Stadler 2020-10-25 20:49:26 UTC
Does it actually cause a problem, or is it just that some of your custom parsing is strict about the elements?

I don't see a visual difference caused by this tag and if I load/save the file with LibreOffice, the structure is stored completely differently anyway.
Comment 2 NadavB 2020-10-26 15:31:56 UTC
Yes, there is a problem - a new line is added. I view it with LibreOffice Writer.

Do you see that a new tag <w:p/> is added? That's a new paragraph.

Why should loading a document's XML and saving it with Apache POI change its structure?
Comment 3 Sayi 2020-10-27 07:25:49 UTC
I think some versions of word processing software will report an error when opening your document: a table cell should include at least one block-level element.

For example, if you add a paragraph to the cell, the new paragraph will not be added automatically.
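
To see which cells are affected, the document XML can be scanned for <w:tc> elements that have no <w:p> child. This is a rough sketch using only the JDK's DOM parser; the names are hypothetical, and it tests only for a direct <w:p> child (mirroring POI's sizeOfPArray() check) rather than for every kind of block-level element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of POI.
final class EmptyCellCheck {
    // Assumption: transitional (non-strict) WordprocessingML namespace.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns how many w:tc elements have no direct w:p child. */
    static int countCellsWithoutParagraph(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));
        NodeList cells = dom.getElementsByTagNameNS(W_NS, "tc");
        int missing = 0;
        for (int i = 0; i < cells.getLength(); i++) {
            if (!hasDirectParagraph((Element) cells.item(i))) {
                missing++;
            }
        }
        return missing;
    }

    /** True if the cell has a w:p element among its immediate children. */
    static boolean hasDirectParagraph(Element cell) {
        for (Node c = cell.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && W_NS.equals(c.getNamespaceURI())
                    && "p".equals(c.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Two cells: the first holds a paragraph, the second only cell properties.
        String xml = "<w:tbl xmlns:w=\"" + W_NS + "\"><w:tr>"
                + "<w:tc><w:p/></w:tc><w:tc><w:tcPr/></w:tc>"
                + "</w:tr></w:tbl>";
        System.out.println(countCellsWithoutParagraph(xml));
    }
}
```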
Comment 4 NadavB 2020-10-30 14:30:56 UTC
Is there a way to use POI to simulate opening a file in LibreOffice and saving it, in order to fix these kinds of issues?
Comment 5 Dominik Stadler 2020-11-01 18:36:47 UTC
The following change would remove this check and the automatic addition of the paragraph. However, the check seems to have been added for a reason a long time ago, probably one similar to what Sayi explained, so I fear that reverting it in general might have side effects on other documents.


Index: src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFTableCell.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFTableCell.java	(revision d427ca10e24dfc4b43985a0dd87ad750ca1e18ba)
+++ src/ooxml/java/org/apache/poi/xwpf/usermodel/XWPFTableCell.java	(revision 3304c1bdfa571730386f147116586917ac40762f)
@@ -80,9 +80,7 @@
         this.ctTc = cell;
         this.part = part;
         this.tableRow = tableRow;
-        // NB: If a table cell does not include at least one block-level element, then this document shall be considered corrupt.
-        if (cell.sizeOfPArray() < 1)
-            cell.addNewP();
+
         bodyElements = new ArrayList<>();
         paragraphs = new ArrayList<>();
         tables = new ArrayList<>();
Comment 6 Dominik Stadler 2020-12-30 21:41:35 UTC
I thought some more about this and decided to try to remove this behavior as it seems unwanted and is likely a very old remnant of the early implementation here. 

We'll see if there are cases that need this behavior and can then re-add it under some better conditions that do not change files in such an unexpected way.

This is done via r1884958.
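
For callers who still want every cell to contain a paragraph, one possible approach is to make the fix-up an explicit step rather than a side effect of loading. The sketch below works on the raw document XML with the JDK's DOM API; it is not POI code, the names are hypothetical, and it only illustrates the opt-in idea.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of POI.
final class CellNormalizer {
    // Assumption: transitional (non-strict) WordprocessingML namespace.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Appends an empty w:p to every w:tc that has no direct w:p child.
     * Returns the number of cells that were changed.
     */
    static int ensureParagraphInEveryCell(Document dom) {
        NodeList cells = dom.getElementsByTagNameNS(W_NS, "tc");
        int changed = 0;
        for (int i = 0; i < cells.getLength(); i++) {
            Element cell = (Element) cells.item(i);
            if (hasDirectParagraph(cell)) {
                continue;
            }
            // Reuse the cell's own prefix so the new element matches the file.
            String prefix = cell.getPrefix();
            String qName = (prefix == null || prefix.isEmpty()) ? "p" : prefix + ":p";
            cell.appendChild(dom.createElementNS(W_NS, qName));
            changed++;
        }
        return changed;
    }

    static boolean hasDirectParagraph(Element cell) {
        for (Node c = cell.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && W_NS.equals(c.getNamespaceURI())
                    && "p".equals(c.getLocalName())) {
                return true;
            }
        }
        return false;
    }
}
```

Because the caller invokes ensureParagraphInEveryCell deliberately, a plain load and save leaves the file untouched, and the returned count shows how many cells were modified.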
Comment 7 Sayi 2021-05-20 10:08:35 UTC
See bug #65292.