Bug 55124 - java.io.IOException: block[ 2 ] already removed
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS
Version: 3.9-FINAL
Hardware: PC Linux
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
 
Reported: 2013-06-20 18:44 UTC by doubiman
Modified: 2013-06-26 13:22 UTC

Attachments
File giving the cited error. (47.00 KB, application/vnd.ms-excel)
2013-06-20 18:44 UTC, doubiman

Description doubiman 2013-06-20 18:44:13 UTC
Created attachment 30466
File giving the cited error.

Hello,

I'm getting the following error when trying to open files from a particular client:

java.io.IOException: block[ 2 ] already removed - does your POIFS have circular or duplicate block references?
        at org.apache.poi.poifs.storage.BlockListImpl.remove(BlockListImpl.java:89)
        at org.apache.poi.poifs.storage.RawDataBlockList.remove(RawDataBlockList.java:34)
        at org.apache.poi.poifs.storage.BlockAllocationTableReader.fetchBlocks(BlockAllocationTableReader.java:221)
        at org.apache.poi.poifs.storage.BlockListImpl.fetchBlocks(BlockListImpl.java:123)
        at org.apache.poi.poifs.storage.RawDataBlockList.fetchBlocks(RawDataBlockList.java:34)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.processProperties(POIFSFileSystem.java:528)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:163)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:322)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:303)
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:70)
        at excelmunger.XLSParser.parse(XLSParser.java:37)
        at excelmunger.Main.main(Main.java:34)

I'm aware this problem has reared its ugly head before:

#42941 was a non-bug

#45290 was meant to have been fixed in v3.2 back in 2008

#46904 was fixed in 2009, despite being for an old, unsupported file format.

#52915 has been in NEEDINFO, waiting on the requestor, for over a year and should probably be closed; in any case, on the face of it, it seems to be related to network issues.


I've seen some other supposed fixes for this issue around the web:

"Move the file to be parsed into the same dir as the .jar doing the work": http://mail-archives.apache.org/mod_mbox/poi-user/201011.mbox/%3C1289246829453-3255661.post@n5.nabble.com%3E . I didn't even want to try this as it seemed so cargo-cultish, but did anyway, and it didn't work.

"Switch from POIFSFileSystem to NPOIFSFileSystem": http://stackoverflow.com/questions/13689843/jxl-poi-incompatibility .
This sounds promising, but the POIFSFileSystem object seems to be set up sufficiently far down inside the POI call chain that I'm reluctant to go poking about blindly in that way.

I got back to the client and found out that the files they're sending are generated with PHPExcel, version 2012-05-19 (v1.7.7).

We were able to parse these spreadsheets with Perl's Spreadsheet::ParseExcel, but tried to standardise on using POI for efficiency. I didn't want to suddenly refuse to accept them any more without being able to tell the customer exactly what it is that's wrong, so we've fallen back to Spreadsheet::ParseExcel in the meantime.

Not knowing what's wrong, I don't know if a workaround is appropriate in this case.

If you could at least tell me in some detail what the problem is though, I'd be happy to go raise the issue with the PHPExcel project.

I've attached as minimal an example as I can lay my hands on.

Best regards,

--doubi
Comment 1 Nick Burch 2013-06-26 13:22:39 UTC
WorkbookFactory will take a NPOIFSFileSystem just as easily as a POIFSFileSystem, so I'd suggest you just switch your code to using the former instead of the latter. With NPOIFSFileSystem I can open your file without issues.
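For anyone landing here with the same stack trace, the switch might look roughly like the sketch below. It assumes POI 3.9 on the classpath; the class name OpenWithNpoifs and the default path example.xls are made up for illustration, and the sheet count is printed only to show that the workbook opened.

```java
import java.io.File;

import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class OpenWithNpoifs {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; substitute the attached file or your own .xls.
        File input = new File(args.length > 0 ? args[0] : "example.xls");

        // Open the OLE2 container with NPOIFS instead of the older POIFS reader.
        NPOIFSFileSystem fs = new NPOIFSFileSystem(input);
        try {
            // WorkbookFactory accepts the NPOIFSFileSystem directly.
            Workbook wb = WorkbookFactory.create(fs);
            System.out.println("Sheets: " + wb.getNumberOfSheets());
        } finally {
            // Release the underlying file once the workbook is no longer needed.
            fs.close();
        }
    }
}
```

The rest of the parsing code should be able to keep using the Workbook interface unchanged, since only the way the file is opened differs.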

The POIFSFileSystem has one or two hard-baked assumptions about the file format that are almost, but not quite, always correct. I think your file has hit one of those. NPOIFSFileSystem takes a slightly different approach, which avoids this class of problem, and also has the advantage of needing less memory.
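The error text itself ("circular or duplicate block references") hints at the kind of check involved: a reader that hands each block out exactly once will complain if two chains lead to the same block. The following is a simplified sketch of that idea only, not POI's actual code and not a diagnosis of the attached file; BlockChainSketch, fetchChain and the table layout are made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class BlockChainSketch {
    /** End-of-chain marker used by this sketch. */
    public static final int END_OF_CHAIN = -2;

    /**
     * Follows a chain through the allocation table, claiming each block once.
     * table[i] holds the index of the block that follows block i;
     * claimed[i] records whether block i has already been handed out.
     */
    public static List<Integer> fetchChain(int start, int[] table, boolean[] claimed)
            throws IOException {
        List<Integer> chain = new ArrayList<Integer>();
        int index = start;
        while (index != END_OF_CHAIN) {
            if (index < 0 || index >= table.length) {
                throw new IOException("block[ " + index + " ] is out of range");
            }
            if (claimed[index]) {
                // A second visit means a loop, or two chains sharing a block.
                throw new IOException("block[ " + index + " ] already removed");
            }
            claimed[index] = true;
            chain.add(index);
            index = table[index];
        }
        return chain;
    }

    public static void main(String[] args) throws IOException {
        // Two chains: 0 -> 1 -> 2 -> end, and 3 -> 2 -> end (block 2 is shared).
        int[] table = {1, 2, END_OF_CHAIN, 2};
        boolean[] claimed = new boolean[table.length];

        System.out.println(fetchChain(0, table, claimed)); // prints [0, 1, 2]
        try {
            fetchChain(3, table, claimed);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints block[ 2 ] already removed
        }
    }
}
```

A reader that only looks blocks up, without removing them as it goes, would not trip over a shared block in this way, which is one plausible reason a different reading strategy can open a file that the stricter one rejects.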

The plan was to switch everything to NPOIFSFileSystem one release after write support was finished in it, but there hasn't been any drive/funding/need of late to complete the write support...