|Summary:||Another HSSF OOM Problem - Small XLS File|
|Product:||POI||Reporter:||Jacob Steijn <jsteijn>|
|Component:||HSSF||Assignee:||POI Developers List <dev>|
|OS:||Windows Server 2003|
this zip has a sheet that kills hssf, and the larger file is a spreadsheet with VBA that generates the killer sheet.
Java module that calls hssf, to set context of failure.
Description Jacob Steijn 2009-11-16 21:12:39 UTC
Created attachment 24545: this zip has a sheet that kills hssf, and the larger file is a spreadsheet with VBA that generates the killer sheet.

I have a spreadsheet generated by Excel 2003 that reliably causes POI HSSF to request all the memory my JVM has and then ask for more, resulting in a heap dump. I have not seen this reported for small sheets in the bug tracker or on the user lists, and I wonder whether this is a unique error.

My spreadsheet has only 22 records and about 11 columns. All data is in General or Text format. My application runs under Java 1.4 in a 2 GB JVM on a quad-processor server with 8 GB of physical memory. No other applications run on the server. The application is IBM Maximo Asset Manager 6.2 on a WebSphere 6.0.2_27 application server.

Attached are: the XLS that kills HSSF, the XLS workbook with the VBA script that creates that file, and sample log data with a stack trace.

This was originally noted on a system using POI 2.5 FINAL. It was subsequently tested on the same system with the POI 3.5 final library in place, with no change in behavior.

Note that twenty to thirty PCs produce Excel sheets that we access with the POI API, and only two machines are definitely associated with creating sheets with this pathological condition. For all I know they are running defective versions of DAO or some other MS toolkit. The cause is irrelevant. I would not expect HSSF to be able to read every file we cook up, but I would like a way to protect my application from what is in effect a denial of service.

Thanks to anyone who can provide a patch, a workaround or any other trick we can use to protect our system while preserving the functionality.
Stack trace:

[11/11/09 6:50:04:243 EST] 00000038 O UOW= source=SystemOut org=IBM prod=WebSphere component=Application Server thread=[maximo-LaborEntry.labentry_1]
2009-11-11 06:50:04,227 ERROR maximo.crontaskmanager - java.lang.OutOfMemoryError
    at java.util.ArrayList.add(ArrayList.java(Compiled Code))
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java(Compiled Code))
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:210)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:191)
    at com.csc.dupont.interfaces.inbound.labor.beans.ExcelDocToLaborEntry.readExcel(ExcelDocToLaborEntry.java:45)
    at com.csc.dupont.interfaces.inbound.labor.LaborEntryMaximo.processLaborEntry(LaborEntryMaximo.java:335)
    at com.csc.dupont.interfaces.inbound.labor.LaborEntryMaximo.processLaborEntryXlsDir(LaborEntryMaximo.java:291)
    at com.csc.dupont.interfaces.inbound.labor.LaborEntryMaximo.cronAction(LaborEntryMaximo.java:143)
    at psdi.server.CronTaskManager$CronThread.run(CronTaskManager.java:1297)
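As a stopgap for the "protect my application" request, one option is to wrap the workbook load so that a bad file fails only its own cron run. The following is a minimal sketch, not a POI feature: `GuardedParse`, `Result` and `attempt` are hypothetical names, and the `Callable` passed in is assumed to be whatever code constructs the `HSSFWorkbook`. Catching `OutOfMemoryError` is best effort only, since other threads in the same JVM may already have failed allocations by the time it is thrown.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of POI): isolates a caller from a parser that fails badly. */
final class GuardedParse {

    /** Outcome of one guarded attempt: either a value or the failure that prevented it. */
    static final class Result<T> {
        final T value;          // null when the parse failed
        final Throwable error;  // null when the parse succeeded

        Result(T value, Throwable error) {
            this.value = value;
            this.error = error;
        }

        boolean ok() {
            return error == null;
        }
    }

    /** Runs the parse and converts an exception or OutOfMemoryError into a failed Result. */
    static <T> Result<T> attempt(Callable<T> parse) {
        try {
            return new Result<>(parse.call(), null);
        } catch (OutOfMemoryError e) {
            // The half-built object graph is unreachable from here on, so the
            // garbage collector can reclaim it before the next file is tried.
            return new Result<>(null, e);
        } catch (Exception e) {
            // Covers checked exceptions as well as RuntimeException from a corrupt file.
            return new Result<>(null, e);
        }
    }
}
```

The caller would check `ok()` after each file, log `error` for the rejected sheet, and move on to the next file instead of letting the error end the whole task.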
Comment 1 Josh Micich 2009-11-17 12:47:18 UTC
I experimented a little with the smaller XLS file from your zip (name: "Stine Nov. 9 lk.xls", md5sum: bba472f37e3df4a8fb9a83459b54bec6). It appears that this file is definitely broken. Excel 2003 and 2007 both report data loss (several times) while opening this file. So while it is probably OK for POI to throw some sort of exception, I agree that OutOfMemoryError is not the right one.

Having said that, it appears that the latest POI (svn trunk) does *not* throw OOME. The line numbers in the stack trace you supplied don't correspond to any recent version of POI. This leads me to believe that the bug has been fixed since whatever version of POI you are running. I have marked this bug as 'fixed', but please re-open if you can show the OOME on a more recent version of POI.

For the record, here is the exception I get:

java.lang.RuntimeException: Buffer underrun - requested 2 bytes but 1 was available
    at org.apache.poi.poifs.filesystem.DocumentInputStream.checkAvaliable(DocumentInputStream.java:202)
    at org.apache.poi.poifs.filesystem.DocumentInputStream.readUShort(DocumentInputStream.java:300)
    at org.apache.poi.poifs.filesystem.DocumentInputStream.readShort(DocumentInputStream.java:220)
    at org.apache.poi.hssf.record.RecordInputStream.readShort(RecordInputStream.java:234)
    at org.apache.poi.hssf.record.PrintSetupRecord.<init>(PrintSetupRecord.java:81)
    ... 10 more
Comment 2 David Fisher 2009-11-17 12:57:06 UTC
Could have been fixed with this one - https://issues.apache.org/bugzilla/show_bug.cgi?id=48085
Comment 3 Jacob Steijn 2009-11-17 21:36:01 UTC
Created attachment 24555: Java module that calls hssf, to set context of failure.
Comment 4 Jacob Steijn 2009-11-17 21:45:31 UTC
I just re-reviewed the stack trace I referenced and compared it with the original stack trace. The line numbers are the same. That throws me, because I absolutely, positively removed the POI 2.5 jar and replaced it with the POI 3.5 jar, both in my Eclipse installation (and referenced external jars) and in the target system library. At this time of day on the east coast I am mentally running on empty, but I have another local resource who can give me a second set of eyes on the setup and see whether he can spot something I've missed. I would think that rebuilding with the newer jar would at least give us new line numbers, if not a better result.
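When line numbers do not change after a jar swap, one quick check is to ask the JVM where it actually loaded the class from. The sketch below uses only standard `java.lang.Class` and `java.security` calls. `ClassOrigin` and `locationOf` are hypothetical names, and the assumption is that the application would pass `HSSFWorkbook.class` and log the result on the target server.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports which jar or directory a class was loaded from. */
final class ClassOrigin {

    static String locationOf(Class<?> type) {
        // Classes loaded by the bootstrap loader (such as java.lang.String) have no code source.
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null) {
            return "(bootstrap or unknown)";
        }
        URL location = source.getLocation();
        return location == null ? "(unknown)" : location.toString();
    }

    public static void main(String[] args) {
        // In the application this would be locationOf(HSSFWorkbook.class).
        System.out.println(locationOf(ClassOrigin.class));
    }
}
```

If the printed location still points at the old jar, an earlier classpath entry or a server-level shared library is shadowing the new one.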
Comment 5 Jacob Steijn 2009-11-18 18:54:43 UTC
The issue occurs in POI 2.5; the functions used are not present in 3.5. Time to update the code and test again.