Bug 51834

Summary: Opening and Writing .doc file results in corrupt document
Product: POI
Reporter: Gilbert <roger.varley>
Component: HWPF
Assignee: POI Developers List <dev>
Status: REOPENED
Severity: major
CC: melanie.reiter, poi.dev.art
Priority: P2
Version: 3.8-dev
Target Milestone: ---
Hardware: PC
OS: Windows XP
Attachments:
 - 27508: Opening and re-writing this file corrupts the output
 - 27667: Result doc (correct one)
 - 27668: Validation result
 - 28099: Opening and re-writing this file corrupts the output

Description Gilbert 2011-09-16 11:47:25 UTC
Created attachment 27508
Opening and re-writing this file corrupts the output

Running this code against the attached document produces a corrupt Word document: MS Word 2003 crashes on it and Word 2007 refuses to open it.

    // Needs java.io.*, org.apache.poi.hwpf.HWPFDocument and
    // org.apache.poi.poifs.filesystem.POIFSFileSystem.
    private void start() throws IOException {
        HWPFDocument hwpfdoc;
        // Read the template from the classpath.
        try (InputStream in = getClass().getResourceAsStream("/com/blackbox/admin/templates/rma.doc")) {
            hwpfdoc = new HWPFDocument(new POIFSFileSystem(in));
        }
        // Write it back out unchanged; the stream is closed even if write() fails.
        try (FileOutputStream fos = new FileOutputStream(new File("C:\\temp\\newTemplate.doc"))) {
            hwpfdoc.write(fos);
        }
        System.out.println("Opened");
    }
Comment 1 Sergey Vladimirov 2011-10-02 01:08:32 UTC
Please check the latest code from trunk and the attachment with the saved document. The saved document passes Microsoft BFFValidator.

Several bugs were fixed:
 - summary properties handling
 - extended FIB handling
 - lists handling
Comment 2 Sergey Vladimirov 2011-10-02 01:09:18 UTC
Created attachment 27667
Result doc (correct one)
Comment 3 Sergey Vladimirov 2011-10-02 01:09:48 UTC
Created attachment 27668
Validation result
Comment 4 poi.dev.art 2011-12-26 16:23:03 UTC
Created attachment 28099
Opening and re-writing this file corrupts the output

Table cells seem to be the problem.

Tested:
 - Merging any cells in the input document (using Word 2007) before re-writing it makes the output clean.
 - Removing the table also produces a clean output.
Comment 5 poi.dev.art 2011-12-26 16:25:06 UTC
Reopening for 3.8-beta5: see previous comment.
Comment 6 melanie.reiter 2014-03-11 11:54:10 UTC
This bug still exists in version 3.10 final.

The following situation occurred:

My Word document contains a table, and I want to replace some text in a cell.
The replacement itself works and the resulting file opens in Word 2010, but not in Word 2003 (it is a .doc file).

There are three cases after replacing the text:

1. The replacement has the same length as the old text: no problem, the file opens in Word 2003.

2. The old text is longer than the replacement: Word 2003 can open the file only via "open and repair".

3. The old text is shorter than the replacement: Word 2003 crashes.

It is possible to open all documents with Word 2010.

As another test, I replaced text that is contained in an enumeration but not in a table, and it shows the same behavior.
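
To make the three cases above easy to cover in a regression test, here is a minimal sketch in plain Java. It makes no POI calls; the class ReplacementCases, the enum Outcome and the method classify are hypothetical names made up for illustration. It only encodes the Word 2003 outcomes reported in this comment, keyed on the length difference between the old text and its replacement, and says nothing about the underlying cause.

```java
// Hypothetical helper, not part of Apache POI: it only encodes the three
// Word 2003 outcomes reported in this comment after a text replacement.
class ReplacementCases {

    enum Outcome { OPENS, OPEN_AND_REPAIR, CRASHES }

    // Maps the length relationship between old and new text to the reported outcome.
    static Outcome classify(String oldText, String replacement) {
        int delta = replacement.length() - oldText.length();
        if (delta == 0) {
            return Outcome.OPENS;            // case 1: same length
        }
        if (delta < 0) {
            return Outcome.OPEN_AND_REPAIR;  // case 2: old text is longer
        }
        return Outcome.CRASHES;              // case 3: old text is shorter
    }

    public static void main(String[] args) {
        String placeholder = "${name}"; // assumed placeholder text, 7 characters
        String[] replacements = {"1234567", "abc", "a much longer value"};
        for (String r : replacements) {
            System.out.println(r.length() + " chars -> " + classify(placeholder, r));
        }
    }
}
```

A test matrix built this way needs one replacement string per outcome for each placeholder, applied once to text inside a table cell and once to text inside an enumeration.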