Bug 57603 - failed to create Word 2003 with seven or more columns
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.11-FINAL
Hardware: PC All
Importance: P5 major
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Duplicates: 55541
Depends on:
Blocks:
 
Reported: 2015-02-19 18:10 UTC by E.G.Miranda
Modified: 2021-03-02 19:35 UTC (History)
CC List: 2 users



Attachments
Test program and word documents (11.71 KB, application/x-rar)
2015-02-19 18:10 UTC, E.G.Miranda
Empty document with a seven column table (41.50 KB, application/msword)
2016-01-05 16:31 UTC, Thomas Schwery
Patch demonstrating exception resolution (4.39 KB, patch)
2021-03-02 19:35 UTC, Marius Volkhart

Description E.G.Miranda 2015-02-19 18:10:23 UTC
Created attachment 32498
Test program and word documents

When HWPFDocument reads a Word 2003 document (.doc) with six columns, it writes a new document without problems. If the Word document has seven or more columns, the write process creates a corrupt document that MS Word cannot read.
Comment 1 Thomas Schwery 2016-01-05 16:31:42 UTC
Created attachment 33408
Empty document with a seven column table

This bug also occurs with the latest Apache POI 3.14-beta1, on an empty document that is written back without modifications.

To reproduce, you only need to open the attached file and write it back:
        HWPFDocument doc = new HWPFDocument(new FileInputStream(inFile));
        FileOutputStream fos = new FileOutputStream(outFile);
        doc.write(fos);
        fos.close();

Word then displays an error when the output file is opened.
Comment 2 Thomas Schwery 2016-01-19 12:16:16 UTC
After some additional tests, it seems that writing such a file and then opening the result again throws an exception. Here is a small test case that reproduces the error using the previously attached document:

    public void testSevenRowTable() throws Exception
    {
        HWPFDocument hwpfDocument = new HWPFDocument( POIDataSamples
                .getDocumentInstance().openResourceAsStream( "Bug57603-sevencolumns.doc" ) );

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        hwpfDocument.write(out);
        out.close();

        HWPFDocument hwpfDocument2 = new HWPFDocument(new ByteArrayInputStream(out.toByteArray()));
    }
Comment 3 Dominik Stadler 2016-07-17 21:18:34 UTC
I have added a disabled unit-test for this via r1753120.
Comment 4 Dominik Stadler 2016-07-26 13:03:41 UTC
*** Bug 55541 has been marked as a duplicate of this bug. ***
Comment 5 Marius Volkhart 2021-03-02 19:35:31 UTC
Created attachment 37753
Patch demonstrating exception resolution

I believe the root cause to be that PAPBinTable#writeTo() uses the tableStream on line 408. It should be the dataStream.

Seven columns appear to cause the creation of a "huge grpprl". According to MS-DOC, those are stored in the dataStream (search the spec for sprmPHugePapx).

I have a relatively simple way to keep the exception from happening. However, since there is currently no support for saving dataStream content that has been altered, changes made to the document could be silently lost during the save. It is also possible that this might corrupt the document.

During the save, the contents of the huge grpprl get written to the dataStream, and the pointer to the huge grpprl's location is modified in the PAPFormattedDiskPage. That pointer, if it were saved (and I'm not sure it is), would be incorrect, since we reuse the old dataStream during save.
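
To make the stale-pointer concern concrete, here is a toy model in plain Java. It is not POI code: the class name, the helper methods, the 4-byte "old" stream and the 600-byte payload are all made up for illustration. It only assumes what is described above, namely that the huge grpprl is appended to a working copy of the dataStream and its offset is recorded, while the old dataStream is what actually ends up in the saved file.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Toy model only; none of these names exist in Apache POI.
class StaleHugeGrpprlPointer {

    /** Appends payload to the stream and returns the offset where it starts. */
    static int append(ByteArrayOutputStream stream, byte[] payload) {
        int offset = stream.size();
        stream.write(payload, 0, payload.length);
        return offset;
    }

    /** Returns length bytes starting at offset, or null if that range is out of bounds. */
    static byte[] readAt(byte[] stream, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > stream.length) {
            return null;
        }
        return Arrays.copyOfRange(stream, offset, offset + length);
    }

    public static void main(String[] args) {
        // Hypothetical data stream as it was read from the input file.
        byte[] oldDataStream = {1, 2, 3, 4};

        // Hypothetical huge grpprl (size chosen arbitrarily for the demo).
        byte[] hugeGrpprl = new byte[600];
        Arrays.fill(hugeGrpprl, (byte) 0x7F);

        // During save: the grpprl is appended to a working stream and its
        // offset is recorded as the pointer.
        ByteArrayOutputStream workingDataStream = new ByteArrayOutputStream();
        workingDataStream.write(oldDataStream, 0, oldDataStream.length);
        int pointer = append(workingDataStream, hugeGrpprl);

        // Resolving the pointer against the stream it was computed for works.
        byte[] resolved = readAt(workingDataStream.toByteArray(), pointer, hugeGrpprl.length);
        System.out.println("resolved against working stream: "
                + Arrays.equals(resolved, hugeGrpprl));

        // Resolving it against the reused old stream does not: the range
        // lies past the end of the old data.
        byte[] stale = readAt(oldDataStream, pointer, hugeGrpprl.length);
        System.out.println("resolved against old stream: " + (stale != null));
    }
}
```

In this model the pointer is only meaningful together with the stream it was computed against, which is why writing the pointer while reusing the old dataStream would be a problem if the pointer is in fact persisted.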