Bug 35928 - [PATCH] POIFS hardcodes big-block size to 512
Summary: [PATCH] POIFS hardcodes big-block size to 512
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS
Version: unspecified
Hardware: Other other
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-07-29 15:10 UTC by nutello
Modified: 2010-05-03 13:21 UTC
CC List: 2 users



Attachments
Patch for current SVN tree; checks the first few bytes of a POIFS file to see if the big-block size is 512 or 4096. (18.29 KB, patch)
2006-09-16 18:08 UTC, linkert
docfile with incorrect(?) block size (31.12 KB, application/msword)
2008-03-13 03:46 UTC, Yury Batrakov

Description nutello 2005-07-29 15:10:33 UTC
I have come across some files generated by scientific instruments whose
big-block size is not 512, but rather 4096. The power of two (12) is properly
stored in the header, but POIFS ignores it entirely and falls back to a built-in
constant. I'd post some files, but they average 260MB each.

I can help develop/test this, but I'll probably need some guidance first.
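The header field described above can be read directly instead of assuming a constant. A minimal sketch, assuming the standard Compound File Binary layout (8-byte signature at offset 0, little-endian 16-bit big-block "sector shift" at offset 0x1E). The class and method names are hypothetical and are not the POIFS API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class HeaderBlockSize {
    // OLE2 / Compound File magic number in the first 8 bytes of the header
    private static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Offset of the little-endian 16-bit big-block shift (assumed CFB layout)
    private static final int SECTOR_SHIFT_OFFSET = 0x1E;

    /** Returns the big-block size implied by the header: 512 or 4096. */
    public static int bigBlockSize(byte[] header) throws IOException {
        // Length is checked first so copyOf never pads a short array
        if (header.length < SECTOR_SHIFT_OFFSET + 2
                || !Arrays.equals(Arrays.copyOf(header, 8), SIGNATURE)) {
            throw new IOException("Not an OLE2 compound file header");
        }
        int shift = ByteBuffer.wrap(header)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort(SECTOR_SHIFT_OFFSET) & 0xFFFF;
        if (shift != 9 && shift != 12) {
            throw new IOException("Unsupported big-block shift: " + shift);
        }
        return 1 << shift; // 2^9 = 512, 2^12 = 4096
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[512];
        System.arraycopy(SIGNATURE, 0, header, 0, SIGNATURE.length);
        header[SECTOR_SHIFT_OFFSET] = 12; // low byte of the shift value
        System.out.println(bigBlockSize(header)); // prints 4096
    }
}
```

Rejecting any shift other than 9 or 12 mirrors the two sizes discussed in this bug; a reader that wanted to be more permissive could drop that check.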
Comment 1 Andy Oliver 2005-07-29 15:39:14 UTC
Please write to the dev list. I'm unsure about this one because, IIRC, XLS files have a
default block size of 4096 (for smaller files)... It could be that we said "heck
with it" if it liked the smaller one too.
Comment 2 Andy Oliver 2005-07-29 15:46:18 UTC
Marc confirmed this. He needs a file, though. Is there any way to generate a smaller
file? If not, and you have ssh (scp in particular), email me and I can
give you a spot to upload it to. (Please tar/bz2 or tar/gz it first :-) )
Comment 3 linkert 2006-09-16 18:08:20 UTC
Created attachment 18875
Patch for current SVN tree; checks the first few bytes of a POIFS file to see if the big-block size is 512 or 4096.
Comment 4 Nick Burch 2008-01-09 01:40:17 UTC
The patch doesn't look thread-safe to me.

If we had two files open, one with a 512-byte block size and another with a 4096-byte
block size, then I think it'd fail, as it all uses a single static int on
POIFSFileSystem.

I think before we could apply this, we would need a sample file with the
alternate block size (so we can write a unit test for all this), and the patch
would need to be slightly reworked to be thread-safe (i.e. not use a static for
something that can vary between concurrently open files).
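The thread-safety concern above comes down to where the block size lives. A minimal sketch, assuming the block size is stored as an immutable per-instance field rather than a static shared by every open file; `OpenFile` is a hypothetical stand-in, not POIFSFileSystem and not necessarily how the eventual fix was written. The offset formula assumes the Compound File rule that big block n starts at (n + 1) × block size, because the header occupies the first block-sized slot.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerFileBlockSize {
    // Hypothetical open-file object: block size is final and per instance,
    // so two files with different block sizes cannot interfere.
    static final class OpenFile {
        private final int bigBlockSize;

        OpenFile(int bigBlockSize) {
            if (bigBlockSize != 512 && bigBlockSize != 4096) {
                throw new IllegalArgumentException("Unsupported block size: " + bigBlockSize);
            }
            this.bigBlockSize = bigBlockSize;
        }

        /** Byte offset of big block n; the header fills the first block-sized slot. */
        long offsetOfBlock(int n) {
            return (long) (n + 1) * bigBlockSize;
        }
    }

    public static void main(String[] args) throws Exception {
        OpenFile small = new OpenFile(512);
        OpenFile large = new OpenFile(4096);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Both files are used concurrently; each sees only its own block size
        Future<Long> a = pool.submit(() -> small.offsetOfBlock(3));
        Future<Long> b = pool.submit(() -> large.offsetOfBlock(3));
        System.out.println(a.get() + " " + b.get()); // prints "2048 16384"
        pool.shutdown();
    }
}
```

Because the field is `final` and set in the constructor, it is safely published to other threads without extra synchronization.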
Comment 5 Yury Batrakov 2008-03-13 03:39:01 UTC
> I think before we could apply this, we would need a sample file with the alternate block size

Maybe the attached file could help: WordExtractor fails to decode it with:
java.io.IOException: Unable to read entire block; 122 bytes read before EOF; expected 512 bytes
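The message above indicates that a full 512-byte block was requested but the stream ended after only 122 bytes. A minimal sketch of that kind of check, assuming a loop that keeps calling `read` until the block is full; `BlockReader.readFullBlock` is hypothetical and only mirrors the quoted message, it is not POI's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockReader {
    /**
     * Fills buf completely from in, looping because a single read() may
     * return fewer bytes than requested. Reaching EOF first means the
     * stream ended mid-block (for example, a truncated file).
     */
    static void readFullBlock(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                throw new IOException("Unable to read entire block; " + total
                        + " bytes read before EOF; expected " + buf.length + " bytes");
            }
            total += n;
        }
    }

    public static void main(String[] args) {
        // One full 512-byte block followed by a partial block of 122 bytes
        InputStream in = new ByteArrayInputStream(new byte[512 + 122]);
        byte[] block = new byte[512];
        try {
            readFullBlock(in, block); // succeeds
            readFullBlock(in, block); // fails on the partial block
        } catch (IOException e) {
            System.out.println(e.getMessage());
            // prints "Unable to read entire block; 122 bytes read before EOF; expected 512 bytes"
        }
    }
}
```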
Comment 6 Yury Batrakov 2008-03-13 03:46:14 UTC
Created attachment 21663
docfile with incorrect(?) block size
Comment 7 Nick Burch 2008-03-13 06:23:18 UTC
Word files should always use 512-byte blocks, so I think attachment 21663 isn't quite appropriate for this bug; it's probably just a truncated file.
Comment 8 Yury Batrakov 2008-03-17 05:13:57 UTC
But it opens OK in Word 2003 :)
Comment 9 Nick Burch 2009-05-17 11:54:49 UTC
There has been partial, thread-safe support for this in SVN for a while now.

However, without a file with a 4096-byte block size, we can't test that this works properly.

If you do have a file with 4096-byte blocks, please re-open the bug and upload it, then we can write a unit test for it. Alas, all the files we can find (Word, PowerPoint, Excel, Visio, etc.) use 512-byte blocks.
Comment 10 Nick Burch 2010-05-03 13:21:33 UTC
This has now been properly solved, along with sample files for unit tests, see bug #49139