Bug 44501 - ArrayIndexOutOfBoundsException when extracting text from Visio Files on Linux.
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: HDGF
Version: 3.5-dev
Hardware: PC Windows Vista
Importance: P2 major with 2 votes
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Duplicates: 44594 44687 44717
Depends on: 43670
Blocks:
Reported: 2008-02-27 08:51 UTC by esimons
Modified: 2015-03-22 12:17 UTC
CC List: 4 users



Attachments
zip file containing 3 Visio files which produced the error (154.19 KB, application/zip)
2008-02-27 08:51 UTC, esimons
test file (21.50 KB, application/octet-stream)
2008-03-04 05:48 UTC, fathi.nemeur
java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed (205.00 KB, application/octet-stream)
2008-04-09 11:12 UTC, Durga Deep Tirunagari

Description esimons 2008-02-27 08:51:11 UTC
Created attachment 21595
zip file containing 3 Visio files which produced the error

I'm working on Linux and trying to extract text from Visio files with VisioTextExtractor. A sample of the code I've written to do this is below.

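import java.io.FileInputStream;
import org.apache.poi.hdgf.extractor.VisioTextExtractor;

protected static String extractVSD(String filename){
   try {
     FileInputStream fin = new FileInputStream(filename);
     // The exception in the stack trace below is thrown from this constructor call
     VisioTextExtractor extractor = new VisioTextExtractor(fin);
     // (rest of the method elided in the original report)
     .
     .
     .
}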

When the code constructs a new VisioTextExtractor, I get the following error. This error occurs for every VSD file that I have tried. I'm using POI scratchpad version 3.0.2, but I've also tried version 3.0.1 and encountered the same error. I've attached a zip file containing 3 of the simple test files that produced the error.


Stacktrace:
java.lang.ArrayIndexOutOfBoundsException: 1991
        at org.apache.poi.util.LittleEndian.getNumber(LittleEndian.java:492)
        at org.apache.poi.util.LittleEndian.getUInt(LittleEndian.java:164)
        at org.apache.poi.hdgf.chunks.ChunkHeader.createChunkHeader(ChunkHeader.java:43)
        at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:108)
        at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:54)
        at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:92)
        at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:99)
        at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:99)
        at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:92)
        at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:46)
        at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:50)
        at com.lmco.atl.soser.collector.FileContentsUtil.extractVSD(FileContentsUtil.java:298)
        at com.lmco.atl.soser.collector.FileContentsUtil.main(FileContentsUtil.java:442)
Comment 1 fathi.nemeur 2008-02-28 08:01:13 UTC
I have the same problem with my Visio files (even an empty one).
Comment 2 Nick Burch 2008-03-04 05:16:12 UTC
Out of interest, if you open one of these files in Visio and do "Save As", does the resulting file still have the same problem?

(It looks like there's more data in a chunk stream than there are chunks, so we're running out of data while creating a chunk)
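
The failure mode described above can be reproduced without POI. The following is a minimal sketch, assuming a parser that reads a 4-byte little-endian unsigned integer at a given offset, as the `LittleEndian.getUInt` frame in the stack trace suggests. The class `ShortChunkDemo` and its methods are hypothetical and are not POI's implementation.

```java
// Hypothetical stand-alone illustration; this is NOT POI's LittleEndian class.
class ShortChunkDemo {

    // Reads a 4-byte little-endian unsigned int starting at offset.
    // Throws ArrayIndexOutOfBoundsException if fewer than 4 bytes remain.
    static long readUInt(byte[] data, int offset) {
        long value = 0;
        for (int i = 3; i >= 0; i--) {
            value = (value << 8) | (data[offset + i] & 0xFF);
        }
        return value;
    }

    // Returns true when headerSize bytes are available at offset.
    static boolean hasRoomForHeader(byte[] data, int offset, int headerSize) {
        return offset >= 0 && headerSize >= 0 && offset <= data.length - headerSize;
    }

    public static void main(String[] args) {
        byte[] stream = new byte[10];   // stand-in for a decoded chunk stream
        stream[0] = 0x2A;

        System.out.println(readUInt(stream, 0));             // a full header is available
        System.out.println(hasRoomForHeader(stream, 8, 4));  // only 2 bytes remain at offset 8

        try {
            readUInt(stream, 8);        // unchecked read runs off the end of the array
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ran out of data: " + e.getMessage());
        }
    }
}
```

A check like `hasRoomForHeader` before each read is one way a parser could detect trailing data that is too short to be a chunk, instead of failing inside the low-level read.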
Comment 3 fathi.nemeur 2008-03-04 05:48:49 UTC
Created attachment 21624
test file

I have the same problem with all my Visio files.
Comment 4 fathi.nemeur 2008-03-04 05:49:41 UTC
I can't extract the content of my Visio file (even if I do "Save As" before extracting).
I have attached my document, and this is the stack trace from my JUnit test.

java.lang.ArrayIndexOutOfBoundsException: 57
	at org.apache.poi.util.LittleEndian.getNumber(LittleEndian.java:492)
	at org.apache.poi.util.LittleEndian.getDouble(LittleEndian.java:220)
	at org.apache.poi.hdgf.chunks.Chunk.processCommands(Chunk.java:174)
	at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:171)
	at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:54)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:92)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:92)
	at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:46)
	at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:50)
	at fr.sacem.amely.document.repository.core.extractor.office.VisioContentExtractor.getPlainText(VisioContentExtractor.java:56)
	at fr.sacem.amely.document.repository.core.extractor.ContentExtractorManager.getPlainText(ContentExtractorManager.java:117)
	at fr.sacem.amely.document.repository.core.extractor.ContentExtractorManagerTestCase.testExtract(ContentExtractorManagerTestCase.java:78)
Comment 5 Nick Burch 2008-03-04 07:05:55 UTC
Do you happen to know what version of Visio produced the files?

(It's odd that you're both finding lots of files that trigger this, but none of mine ever have)

I've added your problem files to svn, so they're available for writing tests against.
Comment 6 fathi.nemeur 2008-03-04 07:14:01 UTC
This is the version of Visio I use:
Microsoft Visio Professional 2002 SP-2 (10.0.6865) (French version)

(Do you have embedded stencils in your Visio document?)
Comment 7 esimons 2008-03-04 08:34:13 UTC
My Visio files were created with Microsoft Office Visio Professional 2007 (12.0.4518.1014) MSO (12.0.6017.5000)
Comment 8 fathi.nemeur 2008-03-06 08:40:43 UTC
Additional info has been provided.
Comment 9 Nick Burch 2008-03-13 06:21:01 UTC
*** Bug 44594 has been marked as a duplicate of this bug. ***
Comment 10 Nick Burch 2008-03-19 06:35:18 UTC
I think before we spend a lot of time trying to work around these short chunks, we'll want to be sure we're correctly decoding them in the first place.

So, I'll put this bug on hold until we've fixed the decompression problem from bug #43670
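
For illustration only, a workaround for short chunks could look like the sketch below: it stops cleanly when the remaining bytes cannot hold a complete chunk, and it also rejects negative lengths. The chunk layout (a 4-byte little-endian length followed by the payload) and the class `TolerantChunkReader` are assumptions made for this example. The real Visio chunk format is more involved, and this is not how POI's `ChunkFactory` works.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunk reader over an ASSUMED layout: [4-byte LE length][payload]...
class TolerantChunkReader {
    static final int HEADER_SIZE = 4; // assumed header size for this sketch

    // Reads a 4-byte little-endian signed int; the caller guarantees 4 bytes are available.
    static int readLength(byte[] data, int offset) {
        return (data[offset] & 0xFF)
             | (data[offset + 1] & 0xFF) << 8
             | (data[offset + 2] & 0xFF) << 16
             | (data[offset + 3] & 0xFF) << 24;
    }

    // Collects complete chunks and stops at the first short or corrupt one.
    static List<byte[]> findChunks(byte[] data) {
        List<byte[]> chunks = new ArrayList<>();
        int pos = 0;
        while (data.length - pos >= HEADER_SIZE) {
            int length = readLength(data, pos);
            int start = pos + HEADER_SIZE;
            if (length < 0 || length > data.length - start) {
                break; // negative or truncated chunk: keep what was read so far
            }
            byte[] payload = new byte[length];
            System.arraycopy(data, start, payload, 0, length);
            chunks.add(payload);
            pos = start + length;
        }
        return chunks;
    }
}
```

The trade-off is the one raised in the comment above: a tolerant loop hides the symptom, so if the short chunks come from a decoding error upstream, the extracted text may be silently incomplete.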
Comment 11 Nick Burch 2008-03-27 04:58:54 UTC
*** Bug 44687 has been marked as a duplicate of this bug. ***
Comment 12 Durga Deep Tirunagari 2008-03-31 18:21:14 UTC
*** Bug 44717 has been marked as a duplicate of this bug. ***
Comment 13 Nick Burch 2008-04-07 08:21:28 UTC
This should now be fixed in svn trunk
Comment 14 Durga Deep Tirunagari 2008-04-09 11:12:00 UTC
Created attachment 21800
java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
Comment 15 fathi.nemeur 2008-05-06 05:01:59 UTC
I have tried with the new release "poi-bin-3.1-beta1-20080428".
I still have the same problem.

Stacktrace: 

java.lang.ArrayIndexOutOfBoundsException: 57
	at org.apache.poi.util.LittleEndian.getNumber(LittleEndian.java:502)
	at org.apache.poi.util.LittleEndian.getDouble(LittleEndian.java:220)
	at org.apache.poi.hdgf.chunks.Chunk.processCommands(Chunk.java:174)
	at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:171)
	at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:58)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:92)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:92)
	at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:47)
	at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:51)
Comment 16 Liyu 2009-07-23 02:45:58 UTC
The bug still exists in POI 3.5 beta. Does anyone have any idea what causes the error? I can do nothing if I can't create the HDGFDiagram object...
Comment 17 Jan 2011-09-28 14:26:45 UTC
We're seeing the same bug in Apache POI 3.8.0 beta 3.  Is there anyone who is able to fix this?
Comment 18 rendangpei 2011-12-07 09:15:57 UTC
The same happens in 3.8 beta 4.
Stack trace:

java.lang.ArrayIndexOutOfBoundsException: Illegal offset 8 (String data is of length 8)
at org.apache.poi.util.StringUtil.getFromUnicodeLE(StringUtil.java:70)
at org.apache.poi.hdgf.chunks.Chunk.processCommands(Chunk.java:203)
...
Comment 19 Christian Czech 2012-06-21 09:52:05 UTC
The same bug occurs with Apache POI 3.8-20120326.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 57
	at org.apache.poi.util.LittleEndian.getLong(LittleEndian.java:191)
	at org.apache.poi.util.LittleEndian.getDouble(LittleEndian.java:104)
	at org.apache.poi.hdgf.chunks.Chunk.processCommands(Chunk.java:175)
	at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:180)
	at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
	at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:106)
	at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:70)
	at VsdTextExtractor.<init>(VsdTextExtractor.java:56)
	at VsdTextExtractor.<init>(VsdTextExtractor.java:53)
	at VsdTextExtractor.<init>(VsdTextExtractor.java:66)
	at TestVsdTextExtractor.test(TestVsdTextExtractor.java:13)
	at TestVsdTextExtractor.main(TestVsdTextExtractor.java:6)
Comment 20 Mark 2012-08-28 15:22:28 UTC
Has there been any update on this? In 3.8 I get the same error with some simple Visio documents.
Comment 21 Dominik Stadler 2015-03-22 12:17:55 UTC
I tried all of the files provided here, and all could be read successfully without any exception. Therefore I am closing this bug now. If you still see this, I think it would be best to open a new bug entry with a sample file and code to reproduce the problem, preferably as a JUnit test.