Bug 44898

Summary: Regression: Cannot remove block
Product: POI Reporter: Trejkaz (pen name) <trejkaz>
Component: POIFSAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: major CC: sackley
Priority: P2    
Version: 3.0-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Windows Vista   
Attachments: Test files (zipped)
Proposed fix
Proposed fix v2

Description Trejkaz (pen name) 2008-04-28 16:02:33 UTC
A couple of files in our own test suite which used to parse fine under 3.0.1 now fail under 3.0.2 with the following error:

java.io.IOException: Cannot remove block[ 2545 ]; out of range
	at org.apache.poi.poifs.storage.BlockListImpl.remove(BlockListImpl.java:104)
	at org.apache.poi.poifs.storage.BlockAllocationTableReader.fetchBlocks(BlockAllocationTableReader.java:190)
	at org.apache.poi.poifs.storage.BlockListImpl.fetchBlocks(BlockListImpl.java:129)
	at org.apache.poi.poifs.filesystem.POIFSFileSystem.processProperties(POIFSFileSystem.java:448)
	at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:109)
Comment 1 Trejkaz (pen name) 2008-04-28 16:14:31 UTC
Created attachment 21876 [details]
Test files (zipped)

Attaching test files in a zip (was too big to upload raw.)
Comment 2 Josh Micich 2008-04-29 11:04:14 UTC
Looks similar to bug 28239.  Is this a POIFS block size problem?
Comment 3 Trejkaz (pen name) 2008-04-29 17:15:15 UTC
I saw that other bug, but I thought it might be a different issue as that bug has been around since 2004, yet this one only started in version 3.0.2.

Did something about the way blocks were loaded change in 3.0.2?  I've had a bit of a look myself by diffing the poifs package between 3.0.1 and 3.0.2, and the only change I can see which might cause the issue is:

public RawDataBlock(final InputStream stream, int blockSize)

+               // IOUtils.readFully will always read the
+               //  requested number of bytes, unless it hits
+               //  an EOF
+            _eof = true;

This is in the code which logs a warning instead of throwing an error, if the final block isn't a multiple of 512... I do recall that being fixed in 3.0.2 but it's possible the way it was fixed causes an error in some files.
Comment 4 Trejkaz (pen name) 2008-04-29 17:29:03 UTC
Created attachment 21882 [details]
Proposed fix

It is as I thought, removing that one line makes all our files parse correctly.

...I put it on 28239 too, purely by accident.  I don't know if it fixes that bug yet as I haven't tested it.
Comment 5 Trejkaz (pen name) 2008-04-29 17:41:11 UTC
Before and after my fix, TestRawDataBlock.testShortConstructor is failing.  It seems to be expecting an IOException, which has been removed since 3.0.2.

Seems someone "fixed" testEmptyConstructor by removing the fail() call but not testShortConstructor.  But perhaps catching IOException in the unit test is also a bad idea?
Comment 6 Trejkaz (pen name) 2008-04-29 17:58:33 UTC
Created attachment 21883 [details]
Proposed fix v2

Here's one which fixes the failures in TestRawDataBlock and TestRawDataBlockList.
Comment 7 Nick Burch 2008-05-06 08:29:37 UTC
I don't like the idea of just hiding the EOF

Maybe the right solution is that if we have a short last block, we detect this, then pad it out to the required 512 byte multiple?
Comment 8 Trejkaz (pen name) 2008-05-06 16:01:24 UTC
It isn't really "hiding" the EOF, it's simply moving it to where the file is actually EOF.

Previously it would be:
  * 512 byte block
  * ...
  * 512 byte block
  * short block (claims to be EOF)

Whatever reads it from the outside said "oh, EOF... so discard the entire block."

Now it's:
  * 512 byte block
  * ...
  * 512 byte block
  * short block
  * EOF

I'm not really sure what's wrong with representing it how it really is.  But if you want to pad the file, go ahead... not padding it doesn't seem to have any drawback either since if it's a short block then the document entry for it won't result in reading past the end of the data which is actually there.
Comment 9 Nick Burch 2008-05-20 08:42:31 UTC
Fixed in SVN trunk.

We now have a difference between "stream EoF reached" and "I have data", so things work correctly now
Comment 10 Nick Burch 2008-05-20 08:43:09 UTC
*** Bug 28239 has been marked as a duplicate of this bug. ***
Comment 11 Josh Micich 2009-01-29 11:41:50 UTC
(In reply to comment #9)
> Fixed in SVN trunk.


svn r653945 added a unit test and improved the error message. This change made it into 3.1-beta2.

svn r658285 completed the fix for this bug and was available in 3.1-final