Created attachment 32498: Test program and Word documents. When HWPFDocument reads a Word document (.doc, Word 2003) with a six-column table, it writes a new document without problems. If the document has seven or more columns, the write process creates a corrupt document that MS Word cannot open.
Created attachment 33408: Empty document with a seven-column table. This bug also occurs with the latest Apache POI 3.14-beta1 on an empty document that is written back without modifications. To reproduce, open the attached file and write it back:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.poi.hwpf.HWPFDocument;
    try (FileInputStream in = new FileInputStream(inFile);
         FileOutputStream fos = new FileOutputStream(outFile)) {
        HWPFDocument doc = new HWPFDocument(in);
        doc.write(fos);
    }

Word then displays an error when the output file is opened.
After some additional tests, it seems that writing such a file and opening it again leads to an exception. Here is a small test case that reproduces the error using the previously attached document:

    public void testSevenRowTable() throws Exception {
        HWPFDocument hwpfDocument = new HWPFDocument(POIDataSamples
                .getDocumentInstance().openResourceAsStream("Bug57603-sevencolumns.doc"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        hwpfDocument.write(out);
        out.close();
        // Re-reading the bytes that were just written throws the exception
        HWPFDocument hwpfDocument2 = new HWPFDocument(new ByteArrayInputStream(out.toByteArray()));
    }
I have added a disabled unit test for this in r1753120.
*** Bug 55541 has been marked as a duplicate of this bug. ***
Created attachment 37753: Patch demonstrating exception resolution. I believe the root cause is that PAPBinTable#writeTo() uses the tableStream on line 408, where it should use the dataStream. A table with seven columns appears to trigger the creation of a "huge grpprl", i.e. a paragraph property list too large to be stored in the PAPX FKP itself. According to [MS-DOC], these are stored in the dataStream (search the spec for sprmPHugePapx).

I have a relatively simple way of preventing the exception. However, since there is currently no support for saving altered content in the dataStream, changes made to the document could be silently lost during the save, and the document might even be corrupted. During the save, the contents of the huge grpprl are written to the dataStream, and a pointer to their location is modified in the PAPFormattedDiskPage. That pointer, if it were saved (and I'm not sure it is), would be incorrect, because we reuse the old dataStream during save.
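To make the stream mix-up easier to follow, here is a minimal sketch in plain Java (no POI classes) of how a huge grpprl reference works, assuming the [MS-DOC] layout: the sprm bytes are appended to the data stream as a PrcData block (2-byte size followed by the sprms), and the FKP keeps only the 2-byte istd, the sprmPHugePapx opcode (0x6646) and a 4-byte offset into the data stream. The class and method names and the 488-byte inline limit are hypothetical values chosen for illustration; this is not POI's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch of spilling an oversized paragraph grpprl into a data stream. */
final class HugePapxSketch {
    static final int SPRM_P_HUGE_PAPX = 0x6646; // sprmPHugePapx opcode per [MS-DOC]
    static final int MAX_INLINE_GRPPRL = 488;   // assumed inline limit, for illustration only

    /** Returns the grpprl to store in the FKP, moving the sprms to dataStream if too big. */
    static byte[] toFkpGrpprl(byte[] grpprl, ByteArrayOutputStream dataStream) {
        if (grpprl.length <= MAX_INLINE_GRPPRL) {
            return grpprl; // small enough to stay inline in the FKP
        }
        int offset = dataStream.size(); // only meaningful relative to THIS stream
        int cb = grpprl.length - 2;     // sprm bytes, excluding the 2-byte istd
        ByteBuffer prcData = ByteBuffer.allocate(2 + cb).order(ByteOrder.LITTLE_ENDIAN);
        prcData.putShort((short) cb).put(grpprl, 2, cb); // PrcData: cbGrpprl + GrpPrl
        dataStream.write(prcData.array(), 0, prcData.capacity());

        ByteBuffer fkp = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        fkp.put(grpprl, 0, 2)                  // keep the istd
           .putShort((short) SPRM_P_HUGE_PAPX) // sprm pointing into the data stream
           .putInt(offset);                    // 4-byte offset of the PrcData
        return fkp.array();
    }

    /** Rebuilds the full grpprl, following a sprmPHugePapx reference if present. */
    static byte[] resolve(byte[] fkpGrpprl, byte[] dataStream) {
        ByteBuffer in = ByteBuffer.wrap(fkpGrpprl).order(ByteOrder.LITTLE_ENDIAN);
        if (fkpGrpprl.length != 8 || (in.getShort(2) & 0xFFFF) != SPRM_P_HUGE_PAPX) {
            return fkpGrpprl; // ordinary inline grpprl
        }
        int offset = in.getInt(4);
        ByteBuffer data = ByteBuffer.wrap(dataStream).order(ByteOrder.LITTLE_ENDIAN);
        int cb = data.getShort(offset) & 0xFFFF;
        byte[] full = new byte[2 + cb];
        System.arraycopy(fkpGrpprl, 0, full, 0, 2);            // istd from the FKP
        System.arraycopy(dataStream, offset + 2, full, 2, cb); // sprms from the data stream
        return full;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream dataStream = new ByteArrayOutputStream();
        byte[] grpprl = new byte[600]; // pretend oversized paragraph grpprl
        byte[] inFkp = toFkpGrpprl(grpprl, dataStream);
        System.out.println(inFkp.length); // 8
        System.out.println(Arrays.equals(grpprl, resolve(inFkp, dataStream.toByteArray()))); // true
    }
}
```

The round trip only works when the reader resolves the offset against the same stream the bytes were appended to. Resolving it against the table stream, or against a stale copy of the original data stream, yields the wrong bytes or an out-of-range read, which is consistent with the failures reported in this bug.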