Bug 60217

Summary: Word document with a single table gets corrupted after load/save with no changes
Product: POI
Reporter: Kostiantyn Miklevskyi <kostiantyn.miklevskyi>
Component: HWPF
Assignee: POI Developers List <dev>
Status: RESOLVED DUPLICATE    
Severity: major    
Priority: P2    
Version: 3.15-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: Maven project with document corruption example
output .doc file after running unit test
Screenshot of Word error message when opening a corrupted file
LibreOffice 5.2.2.2 original file and corrupted file side-to-side

Description Kostiantyn Miklevskyi 2016-10-07 14:11:55 UTC
Created attachment 34333 [details]
Maven project with document corruption example

Attaching a sample Word document that gets corrupted when we open it and save it to another file with code like:

final POIDocument doc = new HWPFDocument(new FileInputStream(DOCUMENT_NAME));
final File copy = new File(CORRUPTED_PREFIX + "-" + DOCUMENT_NAME);
doc.write(copy);

The source document opens fine.
After the load/save round trip, Microsoft Word reports that the saved document is corrupted and cannot be recovered.
Comment 1 Mark Murphy 2016-10-07 19:23:58 UTC
Can POI read the document after load/save?
Comment 2 Javen O'Neal 2016-10-08 22:16:16 UTC
Created attachment 34340 [details]
output .doc file after running unit test

Using the DocumentWithOneTable.doc from your attachment, the unit test below creates the attached file. LibreOffice does not complain about this file. Can you check if Word reports that the attached file is corrupted?

Added to TestHPSFBugs.java:
public void test60217() throws Exception {
    InputStream fis = new FileInputStream("/tmp/bug60217.doc");
    POIDocument doc = new HWPFDocument(fis);
    fis.close();
    doc.write(new File("/tmp/bug60217-out.doc"));
    doc.close();
}
Comment 3 Kostiantyn Miklevskyi 2016-10-10 06:36:28 UTC
>Mark Murphy 2016-10-07 19:23:58 UTC
>Can POI read the document after load/save?

No, it throws an exception.
I should have included this in the initial report, since I had actually tried it.

Here's the code:

        final POIDocument doc = new HWPFDocument(SaveToAnotherDocumentBug.class.getClassLoader().getResourceAsStream(DOCUMENT_NAME));
        final File copy = new File(CORRUPTED_PREFIX + "-" + DOCUMENT_NAME);
        doc.write(copy);
        doc.close();

        new HWPFDocument(new FileInputStream(copy));

It throws with this stack trace:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1845343745
	at org.apache.poi.util.LittleEndian.getUByte(LittleEndian.java:274)
	at org.apache.poi.hwpf.model.FormattedDiskPage.<init>(FormattedDiskPage.java:61)
	at org.apache.poi.hwpf.model.PAPFormattedDiskPage.<init>(PAPFormattedDiskPage.java:85)
	at org.apache.poi.hwpf.model.PAPBinTable.<init>(PAPBinTable.java:75)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:226)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:157)
	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:145)
	at com.cosi.SaveToAnotherDocumentBug.main(SaveToAnotherDocumentBug.java:20)
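
A note on the index in the trace above: -1845343745 equals -1845344256 + 511, and -1845344256 is an exact multiple of 512. That would be consistent with the reader computing a page offset as 512 * pageNumber in 32-bit int arithmetic from a garbage page number in the saved file, then reading the last byte of that page. This is an assumption drawn from the frames in the trace, not something confirmed in this report. The sketch below only reproduces the arithmetic; PageOffsetCheck, PAGE_SIZE, COUNT_BYTE and the method names are hypothetical and are not POI APIs.

```java
// Hypothetical helper, not part of POI: reproduces the index arithmetic only.
final class PageOffsetCheck {
    // Assumption: formatted disk pages are 512 bytes long.
    static final int PAGE_SIZE = 512;
    // Assumption: the byte being read is the last byte of the page.
    static final int COUNT_BYTE = PAGE_SIZE - 1;

    /** Index that would be read for a page number, using wrapping 32-bit int math. */
    static int countByteIndex(int pageNumber) {
        return PAGE_SIZE * pageNumber + COUNT_BYTE; // silently wraps on overflow
    }

    /** Signed page number implied by a reported index, if it fits the pattern. */
    static int impliedPageNumber(int reportedIndex) {
        long pageOffset = (long) reportedIndex - COUNT_BYTE;
        if (pageOffset % PAGE_SIZE != 0) {
            throw new IllegalArgumentException("index is not pageOffset + " + COUNT_BYTE);
        }
        return (int) (pageOffset / PAGE_SIZE);
    }

    /** Overflow-safe check that a whole page lies inside a stream of the given length. */
    static boolean isValidPage(long pageNumber, long streamLength) {
        long offset = pageNumber * PAGE_SIZE; // long math, so no int wrap-around
        return pageNumber >= 0 && offset + PAGE_SIZE <= streamLength;
    }

    public static void main(String[] args) {
        int reported = -1845343745; // index from the stack trace
        int page = impliedPageNumber(reported);
        System.out.println("implied page number: " + page);
        System.out.println("recomputed index:    " + countByteIndex(page));
    }
}
```

Under these assumptions the implied page number is -3604188 (or 4784420 if only the low 23 bits are considered, since both wrap to the same int offset). Either value is far outside any plausible stream, so a bounds check like isValidPage would reject the entry before the array access.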
Comment 4 Kostiantyn Miklevskyi 2016-10-10 06:41:24 UTC
>Javen O'Neal 2016-10-08 22:16:16 UTC
>Can you check if Word reports that the attached file is corrupted?

Yes. The same error message that Word reported previously.
Attaching a screenshot.
Comment 5 Kostiantyn Miklevskyi 2016-10-10 06:42:33 UTC
Created attachment 34350 [details]
Screenshot of Word error message when opening a corrupted file
Comment 6 Kostiantyn Miklevskyi 2016-10-10 07:05:36 UTC
Created attachment 34351 [details]
LibreOffice 5.2.2.2 original file and corrupted file side-to-side

I downloaded the latest stable LibreOffice version, 5.2.2.2, and it indeed doesn't complain about the corruption, so I opened the original document and the corrupted one side by side to show the difference.
Comment 7 Dominik Stadler 2019-08-29 18:04:02 UTC
This looks quite similar to bug #60097, so I am closing this one as a duplicate to keep the discussion in one place.

*** This bug has been marked as a duplicate of bug 60097 ***