Bug 52446 - BufferUnderlowException in NPropertyTable
Status: RESOLVED FIXED
Product: POI
Classification: Unclassified
Component: POIFS
Version: 3.8-dev
Hardware: PC All
Importance: P2 normal
Assignee: POI Developers List
Reported: 2012-01-10 15:18 UTC by Antoni Mylka
Modified: 2012-01-11 12:01 UTC

Attachments
The patch with my workaround (1.04 KB, patch)
2012-01-10 15:18 UTC, Antoni Mylka

Description Antoni Mylka 2012-01-10 15:18:17 UTC
Created attachment 28131
The patch with my workaround

I have a .doc file that is OK from the MS Office point of view, but opening it with NPOIFSFileSystem fails with:

java.nio.BufferUnderflowException
 at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
 at org.apache.poi.poifs.property.NPropertyTable.buildProperties(NPropertyTable.java:93)
 at org.apache.poi.poifs.property.NPropertyTable.<init>(NPropertyTable.java:62)
 at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:379)
 at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:293)


The BFFValidator returns 

<BFFValidation 
  path="twenty-tips.doc" 
  datetime="01/10/12 16:07:25" 
  result="ERROR 0x80030109. Docfile zostal uszkodzony.  " 
  reason="The Microsoft Office Binary File Format Validator encountered an error reading the file you specified.">
</BFFValidation>

In English it's "Docfile has been corrupted". 

I came up with a workaround. In NPropertyTable.buildProperties, instead of 

data = new byte[bigBlockSize.getBigBlockSize()];

I would put:

int dataSize = bigBlockSize.getBigBlockSize() <= bb.remaining() ?
        bigBlockSize.getBigBlockSize() : bb.remaining();
data = new byte[dataSize];

That is, allocate a full big block only if at least that many bytes remain in the buffer. Otherwise, allocate just the remaining bytes.
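The clamping idea above can be sketched in isolation as follows. This is a minimal illustration only: `ClampedBlockRead` and `readClamped` are hypothetical names, not part of POI, and `Math.min` is used as an equivalent of the ternary in the patch.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-alone sketch; not POI code.
final class ClampedBlockRead {
    // Reads at most blockSize bytes from bb, or fewer if the buffer is short.
    static byte[] readClamped(ByteBuffer bb, int blockSize) {
        int dataSize = Math.min(blockSize, bb.remaining());
        byte[] data = new byte[dataSize];
        bb.get(data); // cannot underflow: dataSize <= bb.remaining()
        return data;
    }

    public static void main(String[] args) {
        // A truncated 510-byte "block" where 512 bytes were expected.
        ByteBuffer truncated = ByteBuffer.wrap(new byte[510]);
        System.out.println(readClamped(truncated, 512).length);
    }
}
```

With a full 512-byte buffer the helper behaves exactly like the unclamped read, which is why the change should be invisible for well-formed files.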

The file is obviously corrupted, yet it opens up just fine in Word and I can get fulltext and metadata with the old POIFSFileSystem. This problem popped up in my regression tests when I switched to NPOIFSFileSystem. It seems like a safe workaround to me. For correct files it won't change anything; for other corrupted files it will probably move the error to somewhere within PropertyFactory.convertToProperties. For my file, it's the difference between life and death.

Unfortunately I can't share the file.
Comment 1 Nick Burch 2012-01-10 15:25:23 UTC
Are you able to give us a bit more info on the property stream that's misbehaving? 

I'd be interested in knowing:
 * How long is it, in bytes?
 * How many blocks is the property stream split over?
 * If you look at the bytes of the problem block, is it null padded?
Comment 2 Antoni Mylka 2012-01-11 10:40:17 UTC
I took a very close look in the debugger. POIFSViewer seems to work at a higher level, where blocks are already combined into streams. I know nothing about the POI format, yet from what I understand it goes like this:

NPropertyTable is constructed with an iterator over byte buffers. Each byte buffer represents a single block. In this file the blocks are 512 bytes large. The NPropertyTable constructor goes through this stack trace twice:

ByteArrayBackedDataSource.read(int, long) line: 48	
NPOIFSFileSystem.getBlockAt(int) line: 420	
NPOIFSStream$StreamBlockByteBufferIterator.next() line: 213	
NPOIFSStream$StreamBlockByteBufferIterator.next() line: 1	
NPropertyTable.buildProperties(Iterator<ByteBuffer>, POIFSBigBlockSize) line: 84	

The first time, getBlockAt is called with 946. At offset 947*512=484864 within the file there are four UTF-16 strings: "Root Entry", "Data", "1Table", and "WordDocument". AFAIU these are names of top-level directory entries. This block is parsed correctly by PropertyFactory.convertToProperties(data, properties).

Afterwards comes the second block, index 956. It also comes down to ByteArrayBackedDataSource.read(int, long) line: 48. Unfortunately, the end of that block (957*512 + 512 = 490496) exceeds the size of the file. The returned byte buffer is only 510 bytes long, hence the BufferUnderflowException. I don't know how many blocks there should be (there is a BAT, but I don't understand it). What I do know is that this file has been truncated somewhere in the process.

When the second block is parsed, with 510 bytes, the PropertyFactory.convertToProperties begins with 

int property_count = data.length / POIFSConstants.PROPERTY_SIZE;

In my case this evaluates to 3. The last 126 bytes are not taken into account - hence no errors. The second block, when viewed in XVI shows UTF-16 strings "SummaryInformation", "DocumentSummaryInformation", and "\u0001CompObj" (the three "correct" properties). The fourth, truncated property contains only zeros:

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00

Therefore no information is lost. I think that my workaround is actually correct.
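The property-count arithmetic above can be checked with a small sketch. It assumes a property entry size of 128 bytes, which is what a count of 3 with 126 bytes left over implies for a 510-byte block; the class and method names are hypothetical and not part of POI.

```java
// Hypothetical stand-alone sketch; not POI code.
final class PropertyCountCheck {
    // Assumed size of one property entry in bytes.
    static final int PROPERTY_SIZE = 128;

    // Number of whole property entries that fit into the data.
    static int propertyCount(int dataLength) {
        return dataLength / PROPERTY_SIZE;
    }

    // Trailing bytes that do not form a whole entry and are skipped.
    static int ignoredBytes(int dataLength) {
        return dataLength % PROPERTY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(propertyCount(510)); // truncated block
        System.out.println(ignoredBytes(510));
        System.out.println(propertyCount(512)); // full block
    }
}
```

For the truncated block this gives 3 whole entries and 126 ignored bytes; a full 512-byte block gives 4 entries and nothing ignored.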
Comment 3 Nick Burch 2012-01-11 11:12:53 UTC
Just to check - is your file size a multiple of 512? (It's supposed to be, but based on what you're saying I think it might be 2 bytes short)
Comment 4 Antoni Mylka 2012-01-11 11:31:03 UTC
It's 490 494.

490494 div 512: 957
490494 mod 512: 510

It's 2 bytes short.
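As a rough cross-check of these numbers, the sketch below computes where block 956 starts and how many of its bytes actually exist in a 490494-byte file. It assumes, as in the offsets quoted earlier in this thread, that block n starts at (n + 1) * 512 because the first 512 bytes hold the header. The class and method names are hypothetical.

```java
// Hypothetical stand-alone sketch; not POI code.
final class TruncationCheck {
    // Start offset of a block, assuming one header block precedes block 0.
    static long blockOffset(int blockIndex, int blockSize) {
        return (blockIndex + 1L) * blockSize;
    }

    // Bytes of the given block that are actually present in the file.
    static long availableBytes(long fileSize, int blockIndex, int blockSize) {
        long start = blockOffset(blockIndex, blockSize);
        return Math.max(0L, Math.min((long) blockSize, fileSize - start));
    }

    public static void main(String[] args) {
        long fileSize = 490494L;
        System.out.println(blockOffset(956, 512));                    // start of the last block
        System.out.println(availableBytes(fileSize, 956, 512));       // bytes present in it
        System.out.println(512 - availableBytes(fileSize, 956, 512)); // bytes missing
    }
}
```

The last block starts at 489984, has 510 bytes available, and is therefore 2 bytes short of a full block.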
Comment 5 Nick Burch 2012-01-11 11:48:44 UTC
I think this should be fixed in r1229963. I've taken a slightly different approach, where we log the situation and pad the byte array with zeros (rather than passing a short byte array). Can you see if that solves it for your file, and close the bug if so?
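For illustration, the "log and pad with zeros" approach might look roughly like the sketch below. This is not the code from r1229963; `PaddedBlockRead` and `readPadded` are hypothetical names, and the logging is reduced to a line on stderr.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-alone sketch; not the actual POI fix.
final class PaddedBlockRead {
    // Always returns blockSize bytes; a short buffer is zero-padded at the end.
    static byte[] readPadded(ByteBuffer bb, int blockSize) {
        byte[] data = new byte[blockSize]; // Java zero-initialises new arrays
        int toRead = Math.min(blockSize, bb.remaining());
        if (toRead < blockSize) {
            System.err.println("Short block: expected " + blockSize
                    + " bytes but only " + toRead + " available; padding with zeros");
        }
        bb.get(data, 0, toRead); // the tail of data stays zero
        return data;
    }

    public static void main(String[] args) {
        byte[] source = new byte[510];
        java.util.Arrays.fill(source, (byte) 1);
        byte[] block = readPadded(ByteBuffer.wrap(source), 512);
        System.out.println(block.length + " " + block[509] + " " + block[511]);
    }
}
```

Compared with returning a short array, padding keeps every block at the full block size, so downstream code that assumes fixed-size blocks does not need to change.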
Comment 6 Antoni Mylka 2012-01-11 12:01:50 UTC
Yup, works. 

Thanks a lot.