Bug 59307 - IOException while reading /xl/sharedStrings.xml from a XLSX
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2016-04-12 03:33 UTC by Vikram Gupta
Modified: 2016-05-20 18:57 UTC
Description Vikram Gupta 2016-04-12 03:33:31 UTC
I am using the POI-3.10-FINAL-20140208 release.

I have an agent server that picks up files from the file system and reads them. Below is the code I am using.

OPCPackage opcPackage = OPCPackage.open(filename);
XSSFReader reader = new XSSFReader(opcPackage);

From the reader I get the workbook XML as an InputStream and use a custom SAX parser to get the sheet names and the corresponding RIDs.

InputStream workbookData = reader.getWorkbookData();

Using the RIDs, I get the individual sheet XMLs as InputStream objects and parse them with a custom SAX parser. I pass the shared strings table and the styles table from the reader to the custom parser, which uses them while parsing the sheet data.

DefaultSheetParser sheetParser = new DefaultSheetParser(reader.getSharedStringsTable(), reader.getStylesTable());

InputStream sheet = reader.getSheet(relId);


All of this works fine, but then, all of a sudden, I start getting "IOException - Can't obtain the input stream from /xl/sharedStrings.xml" at the reader.getSharedStringsTable() call.

The files open in Excel without any error. Most of the failing files are 400KB in size.

Once I restart the agent server and reprocess the same files, the error does not occur.
I checked the memory settings of the JVM: there is enough memory allocated (about 4GB), and I do not get any OutOfMemoryError.
Comment 1 Javen O'Neal 2016-04-12 03:53:40 UTC
Keeping too many file handles open, perhaps? This would be more likely to show up in a long-running server process.

Make sure you close your resource streams when you're done with them and you don't have too many open simultaneously. Once you've read through your code, skim through the POI classes that you're using to see if they leak any file handles/resources. Eclipse or other tools might make the process of finding leaked resources easier.
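
To illustrate the "close what you open" advice, here is a minimal sketch using try-with-resources. It deliberately uses plain java.util.zip instead of POI (an .xlsx is a ZIP container), so it runs without any extra jars; the class and method names (CloseDemo, readEntry, demo) are made up for the example. With POI, the analogous resources would be the OPCPackage and each InputStream obtained from XSSFReader; whether OPCPackage can be used directly in try-with-resources in 3.10 is an assumption to check against that release's Javadoc.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical example class; not part of POI.
class CloseDemo {

    // Reads one entry from a ZIP container and returns it as UTF-8 text.
    // Both the container and the entry stream are closed on every exit path,
    // including when an exception is thrown while reading.
    static String readEntry(Path zipPath, String entryName) {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IllegalArgumentException("no such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a tiny throwaway archive with an xl/sharedStrings.xml entry,
    // reads it back, and deletes the temporary file afterwards.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("demo", ".zip");
            try {
                try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(tmp))) {
                    zos.putNextEntry(new ZipEntry("xl/sharedStrings.xml"));
                    zos.write("<sst/>".getBytes(StandardCharsets.UTF_8));
                    zos.closeEntry();
                }
                return readEntry(tmp, "xl/sharedStrings.xml");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "<sst/>"
    }
}
```

The point of the nesting is that the inner stream is closed before the container, and neither stays open if parsing fails halfway through, which is the case that tends to leak handles in a long-running process.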
Comment 2 Javen O'Neal 2016-04-12 03:57:33 UTC
I think Hotspot, included in JDK, can show instantaneous resource usage (CPU, heap, permgen, and file handles) on running processes. Check that before you start looking for file handle leaks.
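
If attaching a monitoring tool is not convenient, the open file descriptor count can also be read from inside the running JVM. The sketch below is only an illustration: FdMonitor and its method names are invented here, and it relies on com.sun.management.UnixOperatingSystemMXBean, which is available on HotSpot-based JVMs on Unix-like systems. On other platforms the helper returns -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper; the class and method names are illustrative only.
class FdMonitor {

    // Number of file descriptors currently open in this JVM process,
    // or -1 when the platform MXBean does not expose that figure.
    static long openFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1L;
    }

    // Per-process descriptor limit, or -1 when it is not available.
    static long maxFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getMaxFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFileDescriptorCount()
                + " / max: " + maxFileDescriptorCount());
    }
}
```

Logging this value after each processed file should show fairly quickly whether the count keeps climbing towards the limit.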
Comment 3 Nick Burch 2016-04-12 04:17:24 UTC
As well as ensuring you close your resources as Javen says, note that 3.10 is over 2 years old (the clue is the date in the filename!). You might want to try 3.14, or better, wait a few more days and then try 3.15 beta 1.
Comment 4 Vikram Gupta 2016-04-13 06:31:30 UTC
I am closing the workbook with the code below, after I finish reading from it:

opcPackage.close();

Is there any other handle that needs to be closed?
Comment 5 Vikram Gupta 2016-04-13 06:38:12 UTC
I have updated the code to close all the InputStream handles once the custom SAX parsers have finished parsing the workbook and sheet streams.
Comment 6 Dominik Stadler 2016-05-20 18:57:58 UTC
Also, on Unix you can look at the output of "ls /proc/<pid>/fd", with the pid of the server process, to see which files are currently open. This might indicate which part of your application is leaking file handles (if that is the actual problem here).
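
The same listing can be produced from within the server process itself, for example at the moment the IOException is caught. This is a rough sketch, assuming a Linux-style /proc filesystem; OpenFiles and list() are names made up for the example, and on systems without /proc/self/fd the method simply returns an empty list.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
class OpenFiles {

    // Returns one "fd -> target" line per descriptor open in this process.
    // Returns an empty list when /proc/self/fd is not available.
    static List<String> list() {
        List<String> targets = new ArrayList<>();
        Path fdDir = Paths.get("/proc/self/fd");
        if (!Files.isDirectory(fdDir)) {
            return targets;
        }
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    // Each entry is a symlink to the file, socket or pipe behind it.
                    targets.add(fd.getFileName() + " -> " + Files.readSymbolicLink(fd));
                } catch (IOException e) {
                    // The descriptor may have been closed in the meantime.
                    targets.add(fd.getFileName() + " -> ?");
                }
            }
        } catch (IOException | DirectoryIteratorException e) {
            // Keep whatever was collected before the listing failed.
        }
        return targets;
    }

    public static void main(String[] args) {
        for (String line : list()) {
            System.out.println(line);
        }
    }
}
```

Many entries pointing at the same .xlsx files would suggest that packages or streams are not being closed after processing.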

Anyway, I don't see an actual problem in POI here for now. We have extensive tests which verify that file handles are closed properly as long as the respective close() method is called.

If there is still a problem, please update to a current version and retry. If it still does not work after that, please reopen this bug and include the list of open files at the time the application fails.