Bug 56006 - XWPFDocument increasing size and gets unusable after several "read-write" iterations
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 3.9-FINAL
Hardware: All All
Importance: P2 blocker
Target Milestone: ---
Assignee: POI Developers List
Reported: 2014-01-14 16:49 UTC by Andres Fuentes
Modified: 2014-07-31 16:23 UTC
CC List: 1 user



Attachments
Footnotes in first iteration (2.19 KB, text/xml)
2014-01-14 17:03 UTC, Andres Fuentes
Footnotes in 10th iteration (560.10 KB, text/xml)
2014-01-14 17:03 UTC, Andres Fuentes

Description Andres Fuentes 2014-01-14 16:49:22 UTC
Hello,
  first of all, sorry about my English. I hope someone can tell me whether the problem is in my code or in the POI library. I'm using POI 3.9.

Here is what happens:

1.- I read a file with XWPFDocument:

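     // needs java.io.File, java.io.FileInputStream, org.apache.poi.xwpf.usermodel.XWPFDocument
     File docFile = new File(fileUrl);
     // try-with-resources closes the input stream even if parsing fails
     try (FileInputStream fis = new FileInputStream(docFile)) {
         XWPFDocument doc = new XWPFDocument(fis);
         // steps 2 and 3 use doc here
     }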

2.- I do some operations, but I've also tried with no code at all in this step and the problem still happens, so I'll skip it.

3.- I write the file to disk:

       // write the document and always close the output stream
       try (FileOutputStream out = new FileOutputStream(outFile)) {
           doc.write(out);
       }

The size of the document changes, but this doesn't worry me. The problem is that it keeps growing: the file gets bigger on every read-write iteration. After about 14 or 15 "open-close" iterations, the program gets stuck on this line:

      XWPFDocument doc = new XWPFDocument(fis);

It uses 100% of the CPU and all of the available Java heap until an OutOfMemoryError is thrown. I think POI modifies the document's internal structure, adding something to it on every save. If I open the file in Word and save it, it returns to its original state and everything is OK, but I need to modify it automatically many times.
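A small harness makes this growth easy to measure. The sketch below is an assumption, not code from this report: the class GrowthCheck, the RoundTrip interface and both methods are hypothetical names. It applies one read-write cycle repeatedly to an in-memory copy of the file, feeds each result into the next cycle, and records the size after every save, so a file that keeps getting bigger after the first save is detected.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class GrowthCheck {
    // Hypothetical: one read-write cycle that receives the current file bytes
    // and returns the bytes produced by re-saving them.
    @FunctionalInterface
    interface RoundTrip {
        byte[] apply(byte[] input) throws IOException;
    }

    // Applies the round trip 'iterations' times, feeding each output into the
    // next cycle, and returns the size in bytes after each save.
    // The size of the original input is not included in the list.
    static List<Integer> sizesAfterEachIteration(byte[] original, RoundTrip roundTrip, int iterations)
            throws IOException {
        List<Integer> sizes = new ArrayList<>();
        byte[] current = original;
        for (int i = 0; i < iterations; i++) {
            current = roundTrip.apply(current);
            sizes.add(current.length);
        }
        return sizes;
    }

    // True if every save after the first produced a larger file than the save before it.
    // The one-time change from the original to the first save is ignored, since it
    // is not part of the list returned above.
    static boolean growsEveryIteration(List<Integer> sizes) {
        if (sizes.size() < 2) {
            return false;
        }
        for (int i = 1; i < sizes.size(); i++) {
            if (sizes.get(i) <= sizes.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```

To reproduce the report, the round trip would be the POI calls from steps 1 and 3 applied to bytes instead of files: build new XWPFDocument(new ByteArrayInputStream(bytes)), call doc.write(...) on a ByteArrayOutputStream, and return its toByteArray(). Working on byte arrays keeps the test independent of the file system and makes each iteration's size available directly.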
Comment 1 Nick Burch 2014-01-14 16:56:00 UTC
A .docx file is a zip of XML files in a certain structure

Could you therefore unzip two files, one that works, and one that doesn't, and see where the main differences are?
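One way to find the main differences without unzipping by hand is to read each file's zip directory and compare the uncompressed size of every part. This is a minimal sketch using only java.util.zip; the class DocxPartSizes and its methods are hypothetical names, and it compares part sizes only, not the XML content inside them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class DocxPartSizes {
    // Maps each part name in the zip (e.g. word/footnotes.xml) to its
    // uncompressed size in bytes, as recorded in the zip's central directory.
    static Map<String, Long> partSizes(String path) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                sizes.put(entry.getName(), entry.getSize());
            }
        }
        return sizes;
    }

    // Lists every part that was added, removed or changed size,
    // formatted as "name: before -> after", sorted by part name.
    static List<String> changedParts(Map<String, Long> before, Map<String, Long> after) {
        Set<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        List<String> changes = new ArrayList<>();
        for (String name : names) {
            Long oldSize = before.get(name);
            Long newSize = after.get(name);
            if (oldSize == null || !oldSize.equals(newSize)) {
                changes.add(name + ": " + describe(oldSize) + " -> " + describe(newSize));
            }
        }
        return changes;
    }

    private static String describe(Long size) {
        return size == null ? "missing" : size + " bytes";
    }
}
```

Calling changedParts(partSizes(a), partSizes(b)) on a working file a and a broken file b (both hypothetical paths) lists only the parts whose size differs, which points straight at the XML file worth opening.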
Comment 2 Andres Fuentes 2014-01-14 17:01:50 UTC
(In reply to Nick Burch from comment #1)
> A .docx file is a zip of XML files in a certain structure
> 
> Could you therefore unzip two files, one that works, and one that doesn't,
> and see where the main differences are?

Thank you for your comment. The first big difference I see is in /word/footnotes.xml. The original is 3 KB, while the one from the 10th iteration is 561 KB. Do you know how this happens?
Comment 3 Andres Fuentes 2014-01-14 17:03:15 UTC
Created attachment 31204 [details]
Footnotes in first iteration
Comment 4 Andres Fuentes 2014-01-14 17:03:55 UTC
Created attachment 31205 [details]
Footnotes in 10th iteration
Comment 5 Nick Burch 2014-01-14 17:10:26 UTC
Before we look too much more, could you try with a recent nightly build of POI / a build of POI from a SVN checkout of trunk?
Comment 6 Andres Fuentes 2014-01-14 17:23:39 UTC
I'll download 3.10.beta3 (20140111) and give it a try. I'll update you asap.
Comment 7 Andres Fuentes 2014-01-14 17:42:07 UTC
I've tried 3.10-beta3-20140110 (the 20140111 build fails to download) and it works perfectly: I've run 30 iterations and the size does not change. Your page says: "These builds should not be used in production: they are only intended for use by developers to help with resolving bugs and evaluating new features." Should I wait until this is released, or can I use this beta?
Comment 8 Nick Burch 2014-07-31 16:23:52 UTC
Closing, as this was fixed in 3.10 final (which has been available for some months now).