Bug 28231 - POIFS throws an IOException when trying to open a file whose size is not a multiple of 512
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS
Version: unspecified
Hardware: All Linux
Importance: P1 major with 2 votes
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2004-04-06 15:00 UTC by Ryan Ackley
Modified: 2008-03-07 15:42 UTC
CC List: 2 users

Workaround for this issue (1.58 KB, patch)
2006-03-21 23:16 UTC, Trejkaz (pen name)
Better workaround :) (1.65 KB, patch)
2006-03-22 03:03 UTC, Trejkaz (pen name)

Description Ryan Ackley 2004-04-06 15:00:03 UTC
This is a big problem because the creating applications (Word, Excel) will open 
these files with no problems. I get the following stacktrace:
java.io.IOException: Unable to read entire block; 1 byte read; expected 512
	at org.apache.poi.poifs.storage.RawDataBlock.<init>
	at org.apache.poi.poifs.storage.RawDataBlockList.<init>
	at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>
	at td.plugin.thumbnail.OleThumbnailRenderer.createRenderedImage
	at td.plugin.thumbnail.OleThumbnailRenderer.main
Comment 1 Trejkaz (pen name) 2006-03-21 23:14:47 UTC
I get this a lot, particularly on Microsoft Works documents.
Comment 2 Trejkaz (pen name) 2006-03-21 23:16:52 UTC
Created attachment 17933
Workaround for this issue

Here's how I worked around this issue.

However, you'll probably find that the file has been truncated, so an error
will come from somewhere else once you avoid this one.
Comment 3 Trejkaz (pen name) 2006-03-22 03:03:05 UTC
Created attachment 17934
Better workaround :)
Comment 4 Trejkaz (pen name) 2006-11-23 20:14:19 UTC
Any chance of getting this one committed?  Is there something else required?
Comment 5 Paul King 2007-07-29 20:23:38 UTC
Possibly the same as 42834. Similar issues, similar but different patch.
Comment 6 Scott 2008-03-05 14:29:01 UTC
I am new to POI and I get this error with the latest version, 3.0.2. I have only tried this with Word 2007 documents, since that is what I am concerned with. I see this bug dates back to 2004. Has anyone developed a real solution to this problem yet? I would send some sample docs, but I am on a secure network so I can't.

Comment 7 Trejkaz (pen name) 2008-03-05 14:31:28 UTC
Does the patch not work for you?
Comment 8 Scott 2008-03-05 14:37:07 UTC
I assumed a patch from 2006 would be included in the latest release. So are you saying it is not? I did not look at the source yet...
Comment 9 Trejkaz (pen name) 2008-03-05 14:48:41 UTC
A rule of thumb is that if a bug is still NEW, the patch hasn't been put in.  Otherwise it would be RESOLVED FIXED.

Certainly this patch works for us.  Why it hasn't been committed is anyone's guess.  Probably someone wants to keep throwing the error because the document is "technically" invalid OLE2.  Unfortunately Microsoft's own software generates files with this problem when the last block isn't completely used...
Comment 10 Paul King 2008-03-05 17:43:47 UTC
Yes, I supplied a similar patch for issue 42834, which was never applied "because it didn't follow OLE2 rules", but documents produced by Crystal Reports, by Microsoft themselves, and by other vendors don't follow these rules. We manually patched POI ourselves and have successfully been using the patched version for two years with these kinds of files. It sure would be nice for the official version not to be broken, though, rather than us having to keep patching it.
Comment 11 Nick Burch 2008-03-06 07:52:51 UTC
OK, looks like the changes for 42834 haven't fixed this.

I've updated svn trunk to issue an error about a probably truncated file, but still carry on. Affected people may wish to open bug reports with the people who make the software that produces these files, as OLE2 documents are required to be multiples of 512 bytes...
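
The "warn but carry on" behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code committed to trunk: LenientBlockReader and readBlocks are hypothetical names, and warnings are collected in a list rather than sent through POI's logger.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not POI's RawDataBlock: reads fixed-size blocks and
// tolerates a short final block instead of throwing an IOException.
final class LenientBlockReader {
    static final int BLOCK_SIZE = 512;

    /** Reads all blocks; a short final block is zero-padded and reported. */
    static List<byte[]> readBlocks(InputStream in, List<String> warnings) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            byte[] block = new byte[BLOCK_SIZE]; // starts out zero-filled
            int total = 0;
            // read() may return fewer bytes than requested, so loop until
            // the block is full or the stream ends.
            while (total < BLOCK_SIZE) {
                int n = in.read(block, total, BLOCK_SIZE - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
            if (total == 0) {
                return blocks; // clean end of stream on a block boundary
            }
            blocks.add(block);
            if (total < BLOCK_SIZE) {
                // Short final block: keep the bytes we got, leave the rest as zeros.
                warnings.add("Probably truncated: last block has " + total
                        + " of " + BLOCK_SIZE + " bytes");
                return blocks;
            }
        }
    }
}
```

Fed a 513-byte stream, this sketch returns two blocks and records one warning, where the strict reader in the original stack trace would have thrown.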
Comment 12 Trejkaz (pen name) 2008-03-06 12:52:51 UTC
Microsoft don't even have a bug tracking system, and filing a bug won't solve all the existing files anyway.  But thanks for getting this in, it's one less custom patch for our local branch.

It's true that they're supposed to be multiples of 512 bytes, but if the last block in the file contains the end of a stream (happens fairly often) and the stream itself isn't a multiple of 512 bytes (also happens fairly often), sometimes they just don't pad it out to the 512-byte boundary.  But since the stream length is declared elsewhere in the file and all the bytes in the stream are present, it seems silly to even warn about it possibly being truncated as none of the real data has been lost, only the padding.

If a stream itself is truncated (declared to be longer than the amount of data available in its blocks) or if a block declared in the file isn't actually present... that sort of thing is a critical error still, of course.
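
To make the distinction above concrete, here is a minimal sketch of the check. It assumes the short block is the final block of a stream stored in regular 512-byte blocks and that the declared stream length has already been read from elsewhere in the file; LastBlockCheck and its members are hypothetical names, not part of POI.

```java
// Hypothetical helper, not part of POI: decides whether a short final block
// has lost real stream data or only its padding.
final class LastBlockCheck {
    static final int BLOCK_SIZE = 512;

    enum Status { COMPLETE, PADDING_ONLY_MISSING, STREAM_TRUNCATED }

    /**
     * @param declaredStreamLength    stream length as declared in the file (assumed known)
     * @param bytesPresentInLastBlock bytes actually read for the stream's final block
     */
    static Status classify(long declaredStreamLength, int bytesPresentInLastBlock) {
        if (declaredStreamLength <= 0
                || bytesPresentInLastBlock < 0
                || bytesPresentInLastBlock > BLOCK_SIZE) {
            throw new IllegalArgumentException("invalid length or byte count");
        }
        if (bytesPresentInLastBlock == BLOCK_SIZE) {
            return Status.COMPLETE;
        }
        // Real data needed in the final block: the remainder of the stream
        // length, or a whole block when the length is an exact multiple of 512.
        int remainder = (int) (declaredStreamLength % BLOCK_SIZE);
        int needed = (remainder == 0) ? BLOCK_SIZE : remainder;
        return (bytesPresentInLastBlock >= needed)
                ? Status.PADDING_ONLY_MISSING
                : Status.STREAM_TRUNCATED;
    }
}
```

For a 1000-byte stream the final block needs 488 bytes of real data, so a final block with exactly 488 bytes has lost only its padding, while one with 487 bytes is genuinely truncated.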
Comment 13 Scott 2008-03-06 16:58:27 UTC
Guys, thanks for readdressing this. I didn't have time to try it today with the patch, but I am sure there will be lots of people very interested in using this product who assumed it didn't work for MS documents. I am not sure why we generate hundreds of documents which all fail, but they do.

Is it possible to use the poibrowser to identify what the block size is? 512 or other? I believe the poibrowser reads in these documents fine.
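
On the block-size question: independently of poibrowser, the OLE2 header itself records the sector size as a power of two, in a 16-bit little-endian "sector shift" field at byte offset 30 (9 means 512 bytes, 12 means 4096 bytes). A minimal sketch that reads it from the first bytes of a file is below; SectorSizeProbe is a hypothetical name, not a POI class.

```java
// Hypothetical helper, not a POI class: reads the sector size from the
// first 32 bytes of an OLE2 compound file header.
final class SectorSizeProbe {
    // The 8-byte OLE2 compound file signature.
    private static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns the sector size in bytes declared by the given header bytes. */
    static int sectorSize(byte[] header) {
        if (header.length < 32) {
            throw new IllegalArgumentException("need at least 32 header bytes");
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (header[i] != SIGNATURE[i]) {
                throw new IllegalArgumentException("not an OLE2 file");
            }
        }
        // Sector shift: unsigned 16-bit little-endian value at offset 30.
        int shift = (header[30] & 0xFF) | ((header[31] & 0xFF) << 8);
        if (shift != 9 && shift != 12) {
            throw new IllegalArgumentException("unexpected sector shift: " + shift);
        }
        return 1 << shift;
    }
}
```

Note that this only reports the declared sector size; it says nothing about whether the file's total length is a multiple of it.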
Comment 14 Nick Burch 2008-03-07 02:20:26 UTC
I'm not sure how our current block parsing code would know that we're on the last block, and that it's due to be short, so that it could skip the warning when there's no padding.

Do please submit a patch if you can think of a nice way to tell the block code not to issue a warning :)
Comment 15 Paul King 2008-03-07 02:42:29 UTC
Thanks for fixing this. Great news. Personally, I would have just set the log level to 'warning' and said 'possibly' truncated rather than 'probably' truncated, as we have thousands of files that aren't truncated that will trigger this log message every day, and I would be surprised if we have had more than 1 or 2 actually truncated files in 2 years (if we have ever had any; I certainly don't recall any).
Comment 16 Scott 2008-03-07 15:42:53 UTC
FYI, I pulled down your patch, rebuilt the code, and as expected it worked like a charm. Thanks!