Apache OpenOffice (AOO) Bugzilla – Issue 3103
failed to open large size document
Last modified: 2013-08-07 14:43:23 UTC
Every time I try to open a large document, it fails with an soffice.exe application error message such as: The instruction at "0x00231f5a" referenced memory at "0x06061d07". The memory could not be read... PS: My document is zipped from an XML file that is generated by a report program. At the URL link you will find a big file that cannot be opened, but you can see the XML file if you unzip it. The small file works fine.
Created attachment 1047: a large doc file that can't be opened (zipped from an xml file)
Reassigned to Éric.
ES->DVO: Well, I also get a GPF in the 641 build. In the current build there is no GPF, just an empty page. That looks normal, because the "sxw" file only contains a huge content.xml, without manifest, style or meta... Should we open an empty page or display an error message such as "Unknown format!"?
Hello Eric, Regarding your response, I have two questions: 1. I was not aware that my file contains an unknown format. It does work with a slightly smaller file. All of the formatting appears to follow the OpenOffice XML file format, so it should open neither as an empty page nor as an unknown format... 2. I've included all styles in the single content.xml file. I think I don't need the remaining files such as manifest, meta, settings.... Is it required to keep all the XML files separate and then zip them together? Best regards, Rachel
dvo->rachel, es: We support a single content.xml without any other streams, but it's not an 'official' feature. Either way, any GPF is a bug, and so is an empty page (provided the document itself is OK). Both of those will likely occur with a multi-stream package, too.
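For reference, the single-stream package discussed above can be sketched as follows. This is a minimal illustration only, assuming the report program already produces the bytes of content.xml; the function name `package_single_stream` is hypothetical and not part of OpenOffice. As dvo notes, a package holding only content.xml is not an officially supported layout.

```python
import io
import zipfile


def package_single_stream(content_xml: bytes) -> bytes:
    """Zip one content.xml stream into an in-memory package.

    Hypothetical helper: it writes no manifest, styles, meta or settings
    streams, which matches the unofficial single-stream layout from the thread.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The stream name must be exactly "content.xml".
        zf.writestr("content.xml", content_xml)
    return buf.getvalue()
```

The returned bytes can be written to a file with the .sxw extension; whether a given build loads it correctly is exactly what this issue is about.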
This is indeed a problem with large documents: the item reference counter overflows because there is too much 'hard' formatting. [details in: SfxPoolItem::AddRef(), svtools/poolitem.hxx, line 347] Fixing this will probably require a substantial rework of the code base, so I don't think we can do that anytime soon. There's a fairly easy work-around though, namely to use style sheets. If I move all paragraph styles (all those called P<number>) from the <office:automatic-styles> element into the <office:styles> element, they become templates. No 'hard' paragraph formatting -> no refcount overflow -> no problem. (Actually, using style sheets is better anyway...) dvo->rachel: Please comment on whether you can load the file with the work-around, and whether this is good enough for you. dvo->rachel: Btw, the document has some syntax errors: all tables have the same name, which the format doesn't allow. This doesn't cause any bugs, though. Just thought you might want to know.
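The work-around described above can be sketched as a small script. This is a minimal sketch under stated assumptions, not the procedure dvo actually used: elements and attributes are matched by local name only (so no namespace URI is assumed), the function name `promote_paragraph_styles` and its helpers are hypothetical, and the styles are assumed to be direct children of the two container elements named in the comment.

```python
import re
import xml.etree.ElementTree as ET

# Automatic paragraph styles are the ones named P<number> in the thread.
PARA_STYLE_NAME = re.compile(r"^P\d+$")


def local(name):
    """Strip the '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1] if isinstance(name, str) else ""


def find_child(parent, name):
    """Return the first direct child with the given local name, or None."""
    for child in parent:
        if local(child.tag) == name:
            return child
    return None


def attr(elem, name):
    """Return the attribute value with the given local name, or None."""
    for key, value in elem.attrib.items():
        if local(key) == name:
            return value
    return None


def promote_paragraph_styles(root):
    """Move P<number> styles from automatic-styles into styles.

    Returns the number of styles moved.
    """
    auto = find_child(root, "automatic-styles")
    if auto is None:
        return 0
    styles = find_child(root, "styles")
    if styles is None:
        # Create the styles element in the same namespace, just before
        # automatic-styles.
        ns_prefix = auto.tag[: -len("automatic-styles")]
        styles = ET.Element(ns_prefix + "styles")
        root.insert(list(root).index(auto), styles)
    moved = 0
    for style in list(auto):  # copy: the element is modified while iterating
        name = attr(style, "name") or ""
        if local(style.tag) == "style" and PARA_STYLE_NAME.match(name):
            auto.remove(style)
            styles.append(style)
            moved += 1
    return moved
```

A typical use would be to parse content.xml with `ET.parse`, call `promote_paragraph_styles` on the root element, and write the tree back before zipping. The paragraphs keep referring to the same style names, so only the location of the style definitions changes.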
Hello Daniel, Thank you so much for helping me out. I'm able to open a big file with the work-around now :-)) I also corrected my syntax errors. (I thought that was allowed...) However, it seems to take a very long time to fully load a big file. For example, I tried to open a 600-page file. It took 1 minute to show up in the window, and another 4 minutes to be fully loaded. I say "fully loaded" because when the document first shows a few pages, it also shows a much bigger total page count (such as 2270, some kind of estimate?) which is not correct, while the CPU is working very hard (full CPU usage) to load the remaining pages. I can't scroll down to the last page until it shows the right total page count, which takes those additional 4 minutes. Is this normal? How long should I expect it to take to open such a big file? Once again, thanks a lot, Rachel
dvo->rachel: Two comments: 1) performance: Indeed, performance with (large) tables could be improved in Writer. The loading isn't that fast, and I've also heard about problems in Writer. Alas, this is hard to fix. 2) 2000 pages: When loading a file, the text document is being formatted. The layout initially distributes the contents to the pages according to a simple heuristic. In this case, it apparently overestimates the number of required pages; that's where the 2000 pages come from. Then the 'real' formatting of the document starts; that's where the high CPU load comes from. In the end, the document is properly formatted, and the number of pages should be correct again. I think you can already start working on the formatted parts of the document. This behaviour is OK (except for performance). I'm not sure what to do with this issue. I think I'll split it into two (one for performance, one for the limit on hard formatting), and mark both as 'resolution: later'. This way, the issues won't get lost, and we can fix them later on.
dvo: The limit on 'hard' formatting will apparently be removed as part of issue 5000. Work is in progress.
dvo: I am marking this as 'duplicate' because the limit on hard formatting is resolved as part of issue i5000. Performance has improved over time; I'm not sure whether the original problem still persists. *** This issue has been marked as a duplicate of 5000 ***
As mentioned on the qa dev list on March 5th I will close all resolved duplicate issues. Please see this posting for details. First step in IssueZilla is unfortunately to set them to verified.