Bug 43670 - Negative length of ChunkHeader while reading a VISIO file
Status: RESOLVED FIXED
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.0-dev
Hardware: PC other
Importance: P2 normal with 4 votes
Target Milestone: ---
Assigned To: POI Developers List
Duplicates: 44596 44781
Depends on:
Blocks: 44501

Reported: 2007-10-21 23:59 UTC by Raiko Eckstein
Modified: 2010-06-03 07:15 UTC
CC List: 4 users



Attachments
Problematic visio file (413.50 KB, application/vnd.visio)
2007-10-22 22:33 UTC, Raiko Eckstein
The attachment that (205.00 KB, application/octet-stream)
2008-04-02 11:21 UTC, Durga Deep Tirunagari
The attachment that's causing this error (205.00 KB, application/octet-stream)
2008-04-02 11:21 UTC, Durga Deep Tirunagari
file that causes exception (116.00 KB, application/octet-stream)
2009-06-30 04:41 UTC, Maxim Valyanskiy
Proposed patch (1.55 KB, patch)
2010-01-27 06:48 UTC, Jukka Zitting

Description Raiko Eckstein 2007-10-21 23:59:26 UTC
When trying to extract text from a Visio file, I encounter a
java.lang.NegativeArraySizeException, which is caused by a negative length in a
ChunkHeader. That seems to be caused by casting the result of
LittleEndian.getUInt(byte[],int), which returns a long, to int.

Stacktrace:
java.lang.NegativeArraySizeException
	at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:155)
	at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:54)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:92)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:99)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:99)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:89)
	at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:44)
	at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:48)
	at org.raikoeckstein.search.indexer.handlingtypes.visio.POIVisioHandler.getDocument(POIVisioHandler.java:35)
	at test.org.forflow.search.indexer.handlingtypes.POIVisioHandlerTest.testGetDocument(POIVisioHandlerTest.java:55)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at org.junit.internal.runners.TestMethodRunner.executeMethodBody(TestMethodRunner.java:99)
	at org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:81)
	at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
	at org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
	at org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
	at org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
	at org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
	at org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
	at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
	at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:38)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
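The cast described in the report can be reproduced without POI. This is a minimal sketch under stated assumptions: `readUInt32LE` is a hypothetical stand-in for `LittleEndian.getUInt`, assembling four little-endian bytes into an unsigned value held in a `long`. Any value of 2^31 or more becomes negative when narrowed to `int`, and sizing an array with it throws the same `NegativeArraySizeException`.

```java
public class UnsignedCastDemo {
    // Hypothetical helper: reads 4 little-endian bytes as an unsigned
    // 32-bit value. A long is needed because Java has no unsigned int.
    static long readUInt32LE(byte[] data, int offset) {
        return (data[offset] & 0xFFL)
                | (data[offset + 1] & 0xFFL) << 8
                | (data[offset + 2] & 0xFFL) << 16
                | (data[offset + 3] & 0xFFL) << 24;
    }

    public static void main(String[] args) {
        // A length field of 0x80000000 (2^31), as a misread header might contain.
        byte[] header = {0x00, 0x00, 0x00, (byte) 0x80};
        long length = readUInt32LE(header, 0);
        int narrowed = (int) length; // the high bit becomes the sign bit
        System.out.println(length + " -> " + narrowed);
        try {
            byte[] contents = new byte[narrowed];
            System.out.println("allocated " + contents.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException, as in the report");
        }
    }
}
```

Running it prints `2147483648 -> -2147483648` and then reports the exception.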
Comment 1 Nick Burch 2007-10-22 02:27:32 UTC
Can you upload the problem file? That way we can check whether there are any
other issues, and we'll have a test case to ensure we don't break this again once
it's fixed :)
Comment 2 Raiko Eckstein 2007-10-22 22:33:20 UTC
Created attachment 21022
Problematic visio file
Comment 3 Nick Burch 2007-10-28 13:08:50 UTC
Java arrays are indexed by an int, not a long, so if the chunk length
really were that large we'd be stuffed anyway.

Looking at the header for that chunk, all the values look very large.
I'm not sure whether the problem is that we're decompressing the stream incorrectly
(the chunk is in a compressed stream), or that we're getting the size of a previous
chunk wrong (so we wind on by the wrong amount to reach this chunk).

It's going to need some more investigating, probably comparing lots of things
with vsdump, but that'll have to happen another time :/
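Nick's point about int-indexed arrays suggests validating a header length before allocating. The sketch below is a hypothetical helper (`checkedChunkLength` is not a POI method). It uses `Math.toIntExact` (Java 8+), which throws `ArithmeticException` instead of silently wrapping, and also rejects lengths larger than the bytes actually left, which is the more telling sanity check when a header has been misread. A guard like this only turns a crash into a clearer error; it does not fix a misparse.

```java
public class ChunkLengthCheck {
    // Hypothetical helper: validates a length read from a chunk header
    // before it is used to size a byte array.
    static int checkedChunkLength(long declaredLength, int bytesRemaining) {
        if (declaredLength < 0) {
            throw new IllegalArgumentException("Negative chunk length: " + declaredLength);
        }
        // Throws ArithmeticException if the value does not fit in an int.
        int length = Math.toIntExact(declaredLength);
        if (length > bytesRemaining) {
            throw new IllegalArgumentException("Chunk claims " + length
                    + " bytes but only " + bytesRemaining + " remain");
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(checkedChunkLength(16, 100)); // fits
        try {
            checkedChunkLength(0x80000000L, 100); // 2^31: too big for an int
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        try {
            checkedChunkLength(500, 100); // larger than the remaining data
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```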
Comment 4 Nick Burch 2008-03-13 06:21:54 UTC
*** Bug 44596 has been marked as a duplicate of this bug. ***
Comment 5 Durga Deep Tirunagari 2008-04-02 11:21:10 UTC
Created attachment 21769
The attachment that
Comment 6 Durga Deep Tirunagari 2008-04-02 11:21:34 UTC
Created attachment 21770
The attachment that's causing this error
Comment 7 Nick Burch 2008-04-07 08:21:52 UTC
This should now be fixed.
Comment 8 Yury Batrakov 2008-05-15 04:39:39 UTC
Still throws an exception.
Comment 9 Brent Farrell 2009-01-12 13:11:17 UTC
I still get this bug with POI 3.1 and 3.2.
Comment 10 Maxim Valyanskiy 2009-06-30 04:40:02 UTC
Same problem in 3.5-beta7-20090630.
Comment 11 Maxim Valyanskiy 2009-06-30 04:41:29 UTC
Created attachment 23910
file that causes exception
Comment 12 Maxim Valyanskiy 2009-06-30 04:43:11 UTC
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@15ee671
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:85)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:116)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:57)
Caused by: java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
	at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
	at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:98)
	at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:52)
	at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:49)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
Comment 13 sits 2009-08-31 18:40:46 UTC
I also still see this exact exception on my 4 .vsd files I put through 3.5 beta 6.

java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
	at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
	at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:98)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:59)

And something slightly different:

Needed 19 bytes to create the next chunk header, but only found 4 bytes, ignoring rest of data
Needed 19 bytes to create the next chunk header, but only found 4 bytes, ignoring rest of data
Needed 19 bytes to create the next chunk header, but only found 4 bytes, ignoring rest of data
Needed 19 bytes to create the next chunk header, but only found 4 bytes, ignoring rest of data

java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
	at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
	at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:98)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:59)
Comment 14 Trejkaz (pen name) 2009-08-31 19:48:11 UTC
That's a bugger.

Speaking of POI 3.5 beta 6, did it pass all the other tests? I think the last one I tried was before that one. I've been waiting for a build that doesn't fail our existing tests, so that we can at least upgrade to the beta and pass a few more tests that were shelved because they need features only available in 3.5.
Comment 15 Trejkaz (pen name) 2009-08-31 19:48:51 UTC
Ack. That comment wasn't supposed to be here, sorry about that (and about this redundant one too).
Comment 16 Mike Hays 2009-11-03 09:51:16 UTC
Is there an update on this issue?  Thanks.
Comment 17 Jukka Zitting 2010-01-26 09:55:50 UTC
I just tested this and can verify that the problem still exists in POI 3.6-FINAL.
The original NegativeArraySizeException has simply been replaced by an
IllegalArgumentException.
Comment 18 Jukka Zitting 2010-01-27 06:48:35 UTC
Created attachment 24895
Proposed patch

I inspected the troublesome files and the byte patterns in the chunk streams
seem to indicate that the parsing logic is not correctly detecting some separator
bytes.

The proposed patch adds the missing logic for all the misdiagnosed entries in
the attached example files, though it seems likely that there are also other
cases out there where the current logic would fail. Without better information
about the semantics of the chunk header fields it's hard to do anything better.
With this patch all the attached files get parsed without problems.
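How a missed separator yields the huge (or, once narrowed to int, negative) lengths reported in this thread can be shown with a toy format. This is a hypothetical illustration, not POI's parser: it assumes a made-up layout where each chunk is a 4-byte little-endian length, then the payload, then optionally a 4-byte separator, and `hasSeparator` stands in for whatever header-based rule the real code applies. When the rule misses a separator, the next "length" is read from the separator bytes; here they are all `0xFF`, giving 4294967295, which is -1 as an `int`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class SeparatorDemo {
    static final int HEADER_SIZE = 4;    // hypothetical: length field only
    static final int SEPARATOR_SIZE = 4; // hypothetical separator width

    static long readUInt32LE(byte[] d, int off) {
        return (d[off] & 0xFFL) | (d[off + 1] & 0xFFL) << 8
                | (d[off + 2] & 0xFFL) << 16 | (d[off + 3] & 0xFFL) << 24;
    }

    // Walks the stream and returns each chunk's declared length.
    // hasSeparator decides, per chunk index, whether a separator follows.
    static List<Long> walk(byte[] stream, IntPredicate hasSeparator) {
        List<Long> lengths = new ArrayList<>();
        int pos = 0;
        for (int i = 0; pos + HEADER_SIZE <= stream.length; i++) {
            long len = readUInt32LE(stream, pos);
            lengths.add(len);
            int end = pos + HEADER_SIZE;
            if (len > stream.length - end) break; // implausible length: stop
            pos = end + (int) len + (hasSeparator.test(i) ? SEPARATOR_SIZE : 0);
        }
        return lengths;
    }

    public static void main(String[] args) {
        byte[] stream = {
            2, 0, 0, 0, 'A', 'B',                                // chunk 0: "AB"
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,  // separator
            3, 0, 0, 0, 'C', 'D', 'E'                            // chunk 1: "CDE"
        };
        System.out.println(walk(stream, i -> i == 0)); // [2, 3]
        System.out.println(walk(stream, i -> false));  // [2, 4294967295]
    }
}
```

Stopping at the first implausible length only limits the damage; as Jukka notes, the real fix is getting the separator rule right.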

The patch also contains a change to the chunks_parse_cmds.tbl file to avoid
incorrect parsing of a chunk in attachment 21770. The entry that I commented out
seemed vague in the first place, so I don't believe this change will cause (m)any
regressions.
Comment 19 Nick Burch 2010-01-28 04:21:55 UTC
Thanks for investigating this in detail, Jukka.

I've applied your patch for the v11 chunk header. As vsdump didn't have an issue with the short string on type 45 / format 52, I decided to just have a string length chunk and treat those cases as an empty string.

The result is that we can extract text without error from the files! :)
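Nick's choice to treat a too-short string as empty, rather than fail, can be sketched as a lenient read. This is a hypothetical helper, not POI's code, and the layout is an assumption for illustration only: a 2-byte little-endian character count followed by that many UTF-16LE code units. If the data is too short for what the count promises, it returns an empty string instead of throwing.

```java
import java.nio.charset.StandardCharsets;

public class LenientStringRead {
    // Hypothetical layout: 2-byte little-endian character count, then
    // that many UTF-16LE code units. Not the actual Visio/POI format.
    static String readStringOrEmpty(byte[] data, int offset) {
        if (offset + 2 > data.length) {
            return ""; // not even room for the count
        }
        int chars = (data[offset] & 0xFF) | (data[offset + 1] & 0xFF) << 8;
        int byteLen = chars * 2;
        int start = offset + 2;
        if (byteLen > data.length - start) {
            return ""; // shorter than declared: treat as an empty string
        }
        return new String(data, start, byteLen, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] ok = {2, 0, 'H', 0, 'i', 0};
        byte[] tooShort = {5, 0, 'H', 0};
        System.out.println("[" + readStringOrEmpty(ok, 0) + "]");       // [Hi]
        System.out.println("[" + readStringOrEmpty(tooShort, 0) + "]"); // []
    }
}
```

The trade-off is that a genuinely corrupt string is silently dropped, which is acceptable for text extraction but would hide errors in a stricter parser.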
Comment 20 Nick Burch 2010-06-03 07:15:14 UTC
*** Bug 44781 has been marked as a duplicate of this bug. ***