Created attachment 29666
Java test to generate exception

When the attached test example is run against the attached document, it throws the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.poi.util.LittleEndian.getByteArray(LittleEndian.java:72)
	at org.apache.poi.hpsf.UnicodeString.<init>(UnicodeString.java:44)
	at org.apache.poi.hpsf.TypedPropertyValue.readValue(TypedPropertyValue.java:162)
	at org.apache.poi.hpsf.Vector.read(Vector.java:74)
	at org.apache.poi.hpsf.TypedPropertyValue.readValue(TypedPropertyValue.java:218)
	at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:163)
	at org.apache.poi.hpsf.Property.<init>(Property.java:164)
	at org.apache.poi.hpsf.Section.<init>(Section.java:277)
	at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:451)
	at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:246)
	at org.alfresco.sample.TestPoi.main(TestPoi.java:46)

Information: the test document was generated using Aspose: http://www.aspose.com/

An analysis of the document's content and format can be found here: https://issues.alfresco.com/jira/browse/ALF-16896

The question is:
"It appears that the length is little endian but in this file it always starts on a 4 byte boundary. I don't know if that is what should happen or if this is an error in the file. However, as a result I have been able to work out a patch (UnicodeString.java.patch attached) which, when applied to our POI, works for both this file and existing files."
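To illustrate how a 2-byte misalignment can turn into this exception, here is a small standalone sketch. It uses plain java.nio.ByteBuffer rather than POI's LittleEndian class, and it assumes a hypothetical buffer in which two zero padding bytes precede a UnicodeString stored on a 4-byte boundary, with a little-endian length that counts characters including the NUL terminator. Reading the length two bytes early picks up the padding and shifts the real count 16 bits to the left, so the reader asks for a string far longer than the data. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MisalignedLengthDemo {

    /** Reads a 4-byte little-endian length at 'offset' as an unsigned value. */
    static long readLengthLE(byte[] data, int offset) {
        return Integer.toUnsignedLong(
                ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset));
    }

    /**
     * Hypothetical 18-byte buffer: zero bytes at 0..7 (including two padding
     * bytes at 6..7), then a UnicodeString at the 4-byte boundary 8 holding
     * "hi" plus a NUL terminator, i.e. a character count of 3.
     */
    static byte[] sampleBuffer() {
        byte[] data = new byte[8 + 4 + 3 * 2];
        ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).putInt(8, 3);
        data[12] = 'h'; // UTF-16LE: each character's high byte stays 0
        data[14] = 'i'; // bytes 16..17 stay 0 and form the NUL terminator
        return data;
    }

    public static void main(String[] args) {
        byte[] data = sampleBuffer();
        long aligned = readLengthLE(data, 8);    // bytes 03 00 00 00 -> 3
        long misaligned = readLengthLE(data, 6); // bytes 00 00 03 00 -> 3 << 16
        System.out.println(aligned + " " + misaligned); // prints "3 196608"
        // Copying 'misaligned' two-byte characters would need 393216 bytes,
        // far past the end of this 18-byte buffer.
    }
}
```

A reader that trusts the misread count would allocate a large array and then fail when copying past the end of the source data, which is consistent with the System.arraycopy frame at the top of the trace above.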
Created attachment 29667
Document used to generate the exception

Document used to generate the exception, generated using Aspose.
Created attachment 29668
Proposed patch
The attached UnicodeString.java.patch allows POI to recover from the type of error found in files generated by Aspose (http://www.aspose.com). The file specifies an offset to a UnicodeString that is off by 2 bytes; the real string starts on the next 4-byte boundary. The patch works by checking the offsets provided to make sure the UnicodeString appears valid. The original code checked that the UnicodeString ends in a NULL character AFTER it had copied the string into a new byte[]. The patch does this check BEFORE the copy, avoiding the creation of a very large byte[] followed by an ArrayIndexOutOfBoundsException. As a result, it can also check whether moving the offset to a 4-byte boundary would solve the problem.
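The validate-before-copy strategy described above can be sketched roughly as follows. This is not the code in UnicodeString.java.patch or in the eventual POI fix; it is a minimal standalone sketch assuming that a UnicodeString is a 4-byte little-endian character count (including the NUL terminator) followed by that many UTF-16LE characters. The class and method names (TolerantUnicodeStringReader, looksValid, alignTo4, charCount) are hypothetical. As the next comment in this thread shows, the real patch broke an existing POI unit test, so any check along these lines needs to be run against POI's full test suite.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class TolerantUnicodeStringReader {

    /** Rounds a non-negative offset up to the next multiple of 4. */
    static int alignTo4(int offset) {
        return (offset + 3) & ~3;
    }

    /** Reads the 4-byte little-endian character count at 'offset' as an unsigned value. */
    static long charCount(byte[] data, int offset) {
        return Integer.toUnsignedLong(
                ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset));
    }

    /**
     * Checks, without copying anything, that a UnicodeString at 'offset' fits
     * inside 'data' and ends with a UTF-16LE NUL character. A bogus count is
     * rejected here instead of triggering a huge allocation later.
     */
    static boolean looksValid(byte[] data, int offset) {
        if (offset < 0 || offset > data.length - 4) {
            return false;
        }
        long chars = charCount(data, offset);
        long end = offset + 4L + chars * 2; // exclusive end of the character data
        if (chars == 0 || end > data.length) {
            return false;
        }
        int last = (int) end;
        return data[last - 2] == 0 && data[last - 1] == 0;
    }

    /** Reads the string at 'offset', retrying once at the next 4-byte boundary. */
    static String read(byte[] data, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("Negative offset " + offset);
        }
        int start = offset;
        if (!looksValid(data, start)) {
            start = alignTo4(offset);
            if (start == offset || !looksValid(data, start)) {
                throw new IllegalArgumentException(
                        "No valid UnicodeString at offset " + offset);
            }
        }
        int chars = (int) charCount(data, start);
        // Decode the characters, dropping the trailing NUL terminator.
        return new String(data, start + 4, (chars - 1) * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Hypothetical buffer: the string "hi" (3 chars with NUL) stored at the
        // 4-byte boundary 8, while the declared offset is 6 (off by 2 bytes).
        byte[] data = new byte[18];
        ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).putInt(8, 3);
        data[12] = 'h';
        data[14] = 'i';
        System.out.println(read(data, 6)); // prints "hi"
    }
}
```

Only the next 4-byte boundary is tried as a fallback; if the declared offset is already aligned, the sketch gives up rather than guessing further.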
It needs some work. At least one unit test started to fail after I applied your patch:

org.apache.poi.hpsf.IllegalPropertySetDataException: UnicodeString started at offset #68 is not NULL-terminated
	at org.apache.poi.hpsf.UnicodeString.<init>(UnicodeString.java:48)
	at org.apache.poi.hpsf.TypedPropertyValue.readValue(TypedPropertyValue.java:162)
	at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:166)
	at org.apache.poi.hpsf.Property.<init>(Property.java:164)
	at org.apache.poi.hpsf.Section.<init>(Section.java:277)
	at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:451)
	at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:246)
	at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:59)
	at org.apache.poi.POIDocument.getPropertySet(POIDocument.java:165)
	at org.apache.poi.POIDocument.readProperties(POIDocument.java:126)
	at org.apache.poi.POIDocument.getSummaryInformation(POIDocument.java:93)
	at org.apache.poi.TestPOIDocumentMain.testCreateNewPropertiesOnExistingFile(TestPOIDocumentMain.java:161)

Please run the "test" ant target and ensure it completes OK.

Yegor
Alan and/or Philippe - any luck on a version of the patch that doesn't break the unit tests?
I've had a go at fixing this in r1496675. All the POI tests pass with my fix, and the code is hopefully a little easier to follow than in the original patch. I've added a unit test based on the supplied sample file, which shows that we can now read the metadata without error.

This just missed being in POI 3.10 beta 1, though, so I guess we're stuck with a patched copy of POI in Alfresco for a little bit longer :/ If it helps, I can raise a new Alfresco support ticket, and/or buy a round in the Bear...!