Bug 58963 - OutOfMemoryError while reading some Excel files
Summary: OutOfMemoryError while reading some Excel files
Status: RESOLVED DUPLICATE of bug 57031
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: 3.13-FINAL
Hardware: PC All
: P2 critical with 4 votes (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-03 09:35 UTC by Mircea
Modified: 2016-04-01 21:19 UTC (History)
1 user (show)



Attachments
sample project + file to reproduce the error (826.57 KB, application/zip)
2016-02-03 09:35 UTC, Mircea
Details
VisualVM screenshot after the test failed - OOM (46.07 KB, image/png)
2016-02-03 09:36 UTC, Mircea
Details
VisualVM memory sampler screenshot (59.32 KB, image/png)
2016-03-22 06:58 UTC, Mircea
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mircea 2016-02-03 09:35:47 UTC
Created attachment 33518 [details]
sample project + file to reproduce the error

Hi!

Today we received an Excel file which can't be added into the system, because POI triggers OOM while trying to open it.
The xlsx file has ~300KB and the application -Xmx750m. It does the same with -Xmx2750m so it's definitely not this.

This OOM happens with both 3.13 and 3.14-beta1.

I will attach a sample maven project + the problematic file called "eu-triggers-oom.xlsx" available in /src/test/resources

To replicate the issue, just execute the test from class POIExcelOOMNGTest.
I will also a VisualVM memory usage screenshot while running the test.

Here is the stacktrace of the test project:
shouldNotThrowOOMWhileReadingExcel(poi.excel.oom.POIExcelOOMNGTest)  Time elapsed: 136.287 sec  <<< FAILURE!
org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
	at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:62)
	at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:465)
	at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:173)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:278)
	at poi.excel.oom.POIExcelOOMNGTest.shouldNotThrowOOMWhileReadingExcel(POIExcelOOMNGTest.java:12)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:60)
	... 37 more
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.xmlbeans.impl.store.CharUtil.allocate(CharUtil.java:397)
	at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:506)
	at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:419)
	at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:489)
	at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.text(Cur.java:2927)
	at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.stripText(Cur.java:3130)
	at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.text(Cur.java:3143)
	at org.apache.xmlbeans.impl.store.Locale$SaxHandler.characters(Locale.java:3291)
	at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportCdata(Piccolo.java:992)
	at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1290)
	at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
	at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4812)
	at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
	at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
	at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
	at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
	at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
	at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
	at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
	at org.openxmlformats.schemas.spreadsheetml.x2006.main.SstDocument$Factory.parse(Unknown Source)
	at org.apache.poi.xssf.model.SharedStringsTable.readFrom(SharedStringsTable.java:119)
	at org.apache.poi.xssf.model.SharedStringsTable.<init>(SharedStringsTable.java:106)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:60)
	at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:465)
	at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:173)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:278)
	at poi.excel.oom.POIExcelOOMNGTest.shouldNotThrowOOMWhileReadingExcel(POIExcelOOMNGTest.java:12)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Comment 1 Mircea 2016-02-03 09:36:55 UTC
Created attachment 33519 [details]
VisualVM screenshot after the test failed - OOM
Comment 2 Javen O'Neal 2016-02-03 19:10:29 UTC
By any chance have you used the same file on an older version of POI without getting an OOM?
Comment 3 Mircea 2016-02-03 19:12:06 UTC
We received the file just today from a complaining user, then we tested it ourselves with 3.13 and 3.14-beta1, but nothing else.
I am sorry.
Comment 4 Javen O'Neal 2016-03-22 06:37:34 UTC
Looking through eu-triggers-oom.xlsx, I didn't see any XML bombs likely to bloat memory. The main memory consumer here is shared strings table containing 5700 unique values. The rest of the XML files are mostly empty.

To build up a test case to make sure there isn't anything else in the Excel file, a unit test could be: read in a dictionary of random Norwegian words (including æ, ø, å--if that matters), generate 5700 "sentences" composed of 10 random words each, and look at memory consumption.

The SharedStringsTable [1] uses an Array<CTRst> and a Map<String, Int> to store the strings. The Map makes string lookup faster at the cost of increasing the memory requirements, but both structures should be able to handle 6700 entries without OOM'ing.

What kinds of objects did VisualVM indicate were consuming the largest amount of memory?

[1] https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/xssf/model/SharedStringsTable.java?view=markup
Comment 5 Mircea 2016-03-22 06:58:59 UTC
Created attachment 33690 [details]
VisualVM memory sampler screenshot

I have just reproduced the error and took a screenshot from VisualVM's memory sampler.
It seems that char[] is taking most of the memory.
Comment 6 Javen O'Neal 2016-03-22 07:25:49 UTC
> It seems that char[] is taking most of the memory.
Which would make sense because the shared strings table is disproportionately large, and rest of the xml nodes are just arrays/pointers+strings.

Andi fixed some OOM's in the current trunk build on bug 57031. Read through that bug to see if it's relevant to you, and if so see if you still get OOM's on a 3.15 beta 1 nightly [1]

[1] https://builds.apache.org/job/POI/lastSuccessfulBuild/artifact/
Comment 7 Mircea 2016-03-22 09:06:25 UTC
I'll give it a try.
What should I write in my Maven pom.xml file in order to get the nightly build instead of 3.13?
I didn't find the nightly builds Maven repository.
Comment 8 Dominik Stadler 2016-03-29 15:35:43 UTC
Nightly builds are not available via Maven but only as manual download via https://builds.apache.org/view/POI/job/POI/lastSuccessfulBuild/artifact/build/dist
Comment 9 Mircea 2016-04-01 19:41:58 UTC
I looked into that link but I couldn't find the working dist version.
I tried what's in build/dist, but no success. Apache POI classes were missing.

As you probably already have POI setup in your IDE, can you please send me (via Dropbox, OneDrive etc) the working dist version?

Otherwise, what should I do to get it working?
I spent quite a lot of time taking the src zip and trying to fix all the missing dependencies, but there are many which aren't there.

On the other side, I provided the test case XML and Java code (3 lines).
It's 1000x easier for one of POI developers to copy those 3 lines + the file in a project compiled against the latest version of POI.
Then I wouldn't have to spend a lot of time trying to build POI myself.

Thanks a lot. :)
Comment 10 Nick Burch 2016-04-01 19:59:25 UTC
(In reply to Mircea from comment #9)
> Otherwise, what should I do to get it working?
> I spent quite a lot of time taking the src zip and trying to fix all the
> missing dependencies, but there are many which aren't there.

Just run "ant jar" from a svn checkout / git checkout / source download, and all the dependencies you need will be fetched for you on demand

Otherwise, the nightly builds are available from Jenkins at https://builds.apache.org/job/POI/lastSuccessfulBuild/artifact/build/dist/ - grab the POI jars from the bin, and add in any dependencies from the previous full POI release's bin package
Comment 11 Mircea 2016-04-01 21:19:03 UTC
Great! Thanks for the clarification.
I built it with ant, then added the remaining jars.

It seems to work properly now. No more memory issue.
Most probably bug 57031 has taken care of it.

Thanks, Apache POI!

*** This bug has been marked as a duplicate of bug 57031 ***