Bug 65292 - createRow() leads to corrupted word file
Summary: createRow() leads to corrupted word file
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 5.0.0-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-07 14:08 UTC by Thomas Hoffmann
Modified: 2021-05-20 10:02 UTC
CC List: 1 user



Attachments
Test word file (14.33 KB, application/x-zip-compressed)
2021-05-07 14:08 UTC, Thomas Hoffmann

Description Thomas Hoffmann 2021-05-07 14:08:57 UTC
Created attachment 37849
Test word file

Hello,
we have a docx file with a table in it.
If we add a new row via createRow() and save the document, it gets corrupted.

MS Word says "Word found unreadable content in ... Do you want to recover the contents of this document? If you trust the source of this document, click yes"

The code to reproduce this is quite small:

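    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;

    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFTable;

    public class CreateRowRepro
    {
        public static void main(String[] args) throws Exception
        {
            // Attached test document (path fixed: "c\\" -> "c:\\")
            File templateFile = new File("c:\\test.docx");

            // Load the document; the input stream can be closed after parsing
            XWPFDocument xwpfDocument = null;
            try (InputStream input = new FileInputStream(templateFile))
            {
                xwpfDocument = new XWPFDocument(input);
            }

            // Append one row to the first table
            XWPFTable table = xwpfDocument.getTableArray(0);
            table.createRow();

            // Saving the modified document produces the file Word reports as corrupt
            try (FileOutputStream output = new FileOutputStream("c:\\test_out.docx"))
            {
                xwpfDocument.write(output);
            }

            xwpfDocument.close();
        }
    }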

Without the line table.createRow() everything works well. Adding a row to the table renders the file corrupt when saving.

Could you check and verify the problem with the current version, 5.0.0?

Thanks in advance!
Comment 1 Thomas Hoffmann 2021-05-07 14:52:00 UTC
Digging into the POI source code, I found that the empty cells are causing the problem.

Class XWPFTable:

    private void addColumn(XWPFTableRow tabRow, int sizeCol) {
        if (sizeCol > 0) {
            for (int i = 0; i < sizeCol; i++) {
                tabRow.createCell();
            }
        }
    }

If I change the loop to additionally fill each cell with a paragraph, MS Word no longer complains about a corrupted file.

So, instead of "tabRow.createCell()" I do:

    XWPFTableCell c = tabRow.createCell();
    c.addParagraph();

With this change to the POI sources, the error is gone. Maybe you can check whether empty cells violate the specification.
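On the specification question: if I read ECMA-376 (WordprocessingML, `tc` element) correctly, a table cell must contain at least one block-level element, so a bare `<w:tc/>` would be invalid, which would explain Word's complaint. The sketch below illustrates the same fix at the XML level using only the JDK's DOM API: it finds `w:tc` elements without a direct `w:p` child in a fragment of `word/document.xml` and appends an empty paragraph, the XML equivalent of the `c.addParagraph()` change above. The class `EmptyCellFixer` is hypothetical (not part of POI), and it only checks for a paragraph, not any other content rules for cells.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper working on raw WordprocessingML XML, not on POI objects.
final class EmptyCellFixer {
    // Namespace bound to the "w:" prefix in document.xml
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private EmptyCellFixer() {}

    // Parses an XML string with namespace support enabled
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // True if the cell has at least one direct <w:p> child.
    // Other block-level content (e.g. a nested <w:tbl>) is not considered.
    static boolean hasParagraph(Element tc) {
        for (Node n = tc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && W_NS.equals(n.getNamespaceURI())
                    && "p".equals(n.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    // Appends an empty <w:p/> to every <w:tc> without a paragraph
    // (after any <w:tcPr>) and returns how many cells were changed.
    static int fixEmptyCells(Document doc) {
        NodeList cells = doc.getElementsByTagNameNS(W_NS, "tc");
        int fixed = 0;
        for (int i = 0; i < cells.getLength(); i++) {
            Element tc = (Element) cells.item(i);
            if (!hasParagraph(tc)) {
                tc.appendChild(doc.createElementNS(W_NS, "w:p"));
                fixed++;
            }
        }
        return fixed;
    }
}
```

Running `fixEmptyCells` a second time returns 0, since every cell then has a paragraph.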
Comment 2 Sayi 2021-05-07 15:51:56 UTC
It does report an error when opened in MS Word, but it seems to be correct when opened in LibreOffice.

Please check bug #64838 (https://bz.apache.org/bugzilla/show_bug.cgi?id=64838).

Possible solutions:
1. Do not add a paragraph when loading an existing empty table cell, but add one by default when creating a new cell?
2. Never add a paragraph: to open the file without errors, users add the paragraph themselves.
3. Always add a paragraph by default: this changes the structure of empty cells in existing documents.
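Option 2 would leave this to user code. A minimal sketch of such a workaround, assuming POI 5.x on the classpath; `RowInit` and `createInitializedRow` are hypothetical names, not part of POI's API. It only uses existing POI calls (`XWPFTable.createRow()`, `XWPFTableRow.getTableCells()`, `XWPFTableCell.getParagraphs()`, `XWPFTableCell.addParagraph()`).

```java
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// Hypothetical user-side helper, not part of POI.
final class RowInit {
    private RowInit() {}

    // Appends a row and makes sure every cell in it holds a paragraph.
    // With 5.0.0 the cells created by createRow() are empty; if a POI
    // version already adds a paragraph, the check leaves the cell alone.
    static XWPFTableRow createInitializedRow(XWPFTable table) {
        XWPFTableRow row = table.createRow();
        for (XWPFTableCell cell : row.getTableCells()) {
            if (cell.getParagraphs().isEmpty()) {
                cell.addParagraph();
            }
        }
        return row;
    }
}
```

In the reproduction code from the description, `RowInit.createInitializedRow(table)` would take the place of `table.createRow()`.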
Comment 3 Thomas Hoffmann 2021-05-07 21:12:39 UTC
Hello,
thanks for the quick response.
I think option 1 sounds reasonable.

POI shouldn't change a file's content or structure when opening it.
But when I add a new row, it should be initialized "correctly".

MS Word displays the file correctly, but the error message will worry users and requires an extra confirmation when opening the file.

This didn't happen with POI 4.x, so something probably changed in POI 5.0.0.

Another option would be a parameter controlling whether the cells are initialized when a new row is created, e.g. tabRow.createCell(true).

(Option 2 would cause many errors, because programmers would have to "post-process" every newly added row, which is not obvious at first.)

Regards, Thomas
Comment 4 Sayi 2021-05-20 10:02:41 UTC
I have reverted r1884958. The final solution is to add a paragraph by default when creating a cell, but not when loading an existing table cell.

This is done via r1890042.