Bug 56294 - Cannot parse an XLSX
Summary: Cannot parse an XLSX
Status: RESOLVED LATER
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: 3.10-FINAL
Hardware: PC All
Importance: P2 critical
Target Milestone: ---
Assignee: POI Developers List
Reported: 2014-03-20 12:17 UTC by Eamonn Young
Modified: 2014-09-05 18:56 UTC
Description Eamonn Young 2014-03-20 12:17:33 UTC
I am trying to open a workbook in XLSX format in order to parse it and retrieve certain elements from certain cells. The file is provided by a customer. Please find (a modified version of) the file attached. I am able to open the file directly in Excel, but not in POI.

I'm running Java version jdk1.6.0_23 and have installed version 3.10 of the POI library package.

When attempting to parse the file using "org.apache.poi.xssf.usermodel.XSSFWorkbook", memory usage jumps from under 200KB to over 1.1GB. CPU usage also jumps to 100%.

The program then stalls and does not recover. The Debug window in Eclipse shows a large number (380) of Threads as "Running".

I would really appreciate it if you could investigate the cause behind this issue.

Thanks in advance,
Eamonn.

P.S. I looked at bug 55769, but we do not get any error message - Java just hangs...
Comment 1 Nick Burch 2014-03-20 13:11:00 UTC
Can you try bumping up the heap size more, to see if that helps?

Can you run the JVM so it reports when it does garbage collection, so you can see if the hang is due to terminal-GC or something else?

If you take a thread dump, what are the threads doing when it hangs?
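
For reference, the three checks above are usually done from outside the program: -Xmx raises the heap limit, -verbose:gc makes the JVM log each collection, and jstack <pid> prints a thread dump. If attaching tools is awkward, roughly the same information can be read from inside the process. The following is only a minimal sketch using the JDK's own management beans; the class name HangDiagnostics and its method names are made up for illustration and are not part of POI.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of POI): reports heap limit, GC activity and thread states.
public class HangDiagnostics {

    /** Maximum heap the JVM will try to use, in MiB (this is what -Xmx controls). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** One line per collector: name, number of collections so far, total time spent. */
    static String gcSummary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    /** One line per live thread: name, state and the top stack frame. */
    static String threadSummary() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details, which keeps the dump cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            StackTraceElement[] stack = info.getStackTrace();
            sb.append(info.getThreadName())
              .append(" [").append(info.getThreadState()).append("] ")
              .append(stack.length > 0 ? stack[0].toString() : "(no frames)")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.print(gcSummary());
        System.out.print(threadSummary());
    }
}
```

Calling gcSummary() periodically from a background thread while the workbook loads would show whether the collection count and time keep climbing, which is what a GC-bound hang tends to look like.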
Comment 2 Eamonn Young 2014-03-21 16:51:40 UTC
(In reply to Nick Burch from comment #1)
> Can you try bumping up the heap size more, to see if that helps?
> 
> Can you run the jvm so it reports when it does garbage collection, so you
> can see if the hang is due to terminal-GC or something else?
> 
> If you take a thread dump, what are the threads doing when it hangs?

Nick,

Thanks a million for your swift response. We increased the heap size and that allows us to process the file. So thanks for the suggestion. It is slow, but it gets there.

Thanks again,
Eamonn.
Comment 3 Nick Burch 2014-03-21 17:07:26 UTC
OK, that's good. Next thing to check - is the problem with your code or file?

There are two example programs that can be used to check generation speed and reading speed. Try using SSPerformanceTest to check how long it takes to generate a simple file of your desired size. Next, check whether your program can load it any quicker. Then use XLSX2CSV to read both your file and your generated test file: how slow or quick is each, and are they about the same speed?
Comment 4 Arik 2014-03-26 13:39:19 UTC
Hi,

We encountered the same behavior with a 25 MB file of 382K rows, 15 columns each.

Anything lower than that and we get an OOM exception.

We are using POI-3.10-FINAL on JDK 7 with the latest patch.

Please advise.

Thank you,
Arik
Comment 5 Nick Burch 2014-03-26 13:52:17 UTC
Have you tried following the suggestions/strategy from the FAQ entry "I think POI is using too much memory! What can I do?"?
http://poi.apache.org/faq.html#faq-N10109
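
As a rough illustration of why a streaming read needs far less heap than loading the whole workbook: an XLSX file is a zip archive whose sheets are XML, so the rows can be walked one at a time without keeping them in memory. The sketch below is not POI API; it uses only the JDK's zip and SAX classes. The class name StreamingRowCounter is made up, and it assumes the sheet XML uses unprefixed <row> and <c> elements and that the caller knows the entry name (commonly something like xl/worksheets/sheet1.xml, but that is an assumption about the file). POI's own event-based reader, which the XLSX2CSV example is built on, is the more complete route.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical JDK-only sketch (not POI API): counts rows and cells while streaming.
public class StreamingRowCounter {

    /** Counts <row> and <c> elements as they go past; nothing is retained. */
    static class CountingHandler extends DefaultHandler {
        long rows;
        long cells;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default parser is not namespace-aware, so qName is the raw tag name.
            if ("row".equals(qName)) {
                rows++;
            } else if ("c".equals(qName)) {
                cells++;
            }
        }
    }

    /** Streams one sheet's XML through SAX and returns the counts. */
    static CountingHandler count(InputStream sheetXml) throws Exception {
        CountingHandler handler = new CountingHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        return handler;
    }

    /** Opens the .xlsx as a zip and streams the named sheet entry. */
    static CountingHandler countInXlsx(String xlsxPath, String sheetEntry) throws Exception {
        ZipFile zip = new ZipFile(xlsxPath);
        try {
            ZipEntry entry = zip.getEntry(sheetEntry);
            if (entry == null) {
                throw new IllegalArgumentException("No such entry: " + sheetEntry);
            }
            InputStream in = zip.getInputStream(entry);
            try {
                return count(in);
            } finally {
                in.close();
            }
        } finally {
            zip.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory stand-in for a sheet, so the sketch runs without a real file.
        String xml = "<worksheet><sheetData>"
                + "<row><c/><c/></row><row><c/></row>"
                + "</sheetData></worksheet>";
        CountingHandler h = count(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        System.out.println("rows=" + h.rows + ", cells=" + h.cells); // prints rows=2, cells=3
    }
}
```

Memory use here stays roughly constant regardless of row count, because each element is discarded as soon as the handler has seen it.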
Comment 6 Andreas Beeker 2014-09-05 18:56:48 UTC
There was no feedback over the last 5 months, so I'm closing this now.
We can re-open it later anyway ...