Bug 49323 - LittleEndian throws ArrayIndexOutOfBoundle Exception
Summary: LittleEndian throws ArrayIndexOutOfBoundle Exception
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall (show other bugs)
Version: 3.7-dev
Hardware: PC Windows XP
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-05-20 15:10 UTC by Zhang Zhang
Modified: 2010-05-26 18:51 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Zhang Zhang 2010-05-20 15:10:01 UTC
For methods in LittleEndian util class in org.apache.poi.util package, there is no border check of getXXXX methods, for example:
getInt(byte[] data, int offset) method, if the offset is less then 0 or greater then data.length-4, then there would be an ArrayIndexOutOfBound Exception happens.

I was planning to fix it with adding a check in the beginning of the method, but there is a test case named TestIntegerField are expecting the exception to happen, so that made me confused, what the expect behavior in this case? In my fix, I set the value to 0 in order to let the program run without an exception.

I would love to check in my fix with unit test, just not sure which one is the expected behavior? Exception or Return 0.


Zhang Zhang
Comment 1 Josh Micich 2010-05-20 19:44:02 UTC
From what I understand, you are proposing that the getXXXX() methods on LittleEndian should return 0 when an invalid offset is supplied.  While it's easy to imagine code that might depend on this behaviour, it is nonetheless very much against the coding style of POI.  POI tries to avoid 'silent errors'.  Specifying an invalid array index really shouldn't be standard usage for POI.
Comment 2 Zhang Zhang 2010-05-20 19:49:14 UTC
Got it, thanks.
Comment 3 Zhang Zhang 2010-05-25 18:37:55 UTC
You are right about the coding style, but in real scenario, we experiening un-expected interruption in long running process. The process is processing numbers of PPT files, one of them is crappy and has ArrayIndexOufOfBoundsException which is a runtime error and breaks the process.

So instead of reset the value to 0, can we throw a checked exception?


Thanks
Zhang Zhang
Comment 4 Nick Burch 2010-05-26 04:40:15 UTC
It really is an ArrayIndexOutOfBoundsException though. To process the file, we decide we need x bytes of data. There aren't x bytes available...

The fix is to identify the records which are short, verify that office can read these files without issue, and then patch POI to detect the short vs long forms of the records and handle those.

(This typically occurs when there is an older and newer format for a record, with the newer having extra fields, and POI only supporting the newer format as that's all we've ever seen)

Once you have identified a record which is allowed to be shorter than POI currently accepts, please open a new bug and upload the problem file + description of where the record processing is incorrect
Comment 5 Zhang Zhang 2010-05-26 12:49:09 UTC
The original file can be opened successfully by office 2007, but unfortunately, the file is belongs to my client and it is confidential, I haven't got authorization to submit it, also I can't re-produce it with a artifical office file.

Thanks for the help.


Thanks
JJ
Comment 6 Nick Burch 2010-05-26 13:01:50 UTC
You could always try using the HSLF dev tools to dump just the problem record? It'll need a bit of work, since it'll blow up on the broken record, but if you run it under a debugger you might be able to figure out just the bytes the correspond to that one record
Comment 7 Zhang Zhang 2010-05-26 18:51:41 UTC
(In reply to comment #6)
> You could always try using the HSLF dev tools to dump just the problem record?
> It'll need a bit of work, since it'll blow up on the broken record, but if you
> run it under a debugger you might be able to figure out just the bytes the
> correspond to that one record

Good point, I definitely will try that. Thanks