Bug 51319 - ArrayIndexOutOfBoundsException when trying to use PublisherTextExtractor on a MS Publisher 2010 file
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HPBF
Version: 3.2-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2011-06-03 17:51 UTC by Dmitry Goldenberg
Modified: 2015-03-22 19:29 UTC



Attachments
File to repro with (213.00 KB, application/vnd.ms-publisher)
2011-06-03 17:51 UTC, Dmitry Goldenberg
Sample.pub saved-as with Publisher 2010 (71.00 KB, application/vnd.ms-publisher)
2011-06-03 23:33 UTC, Dmitry Goldenberg
Sample2.pub saved-as with Publisher 2010 (71.50 KB, application/vnd.ms-publisher)
2011-06-03 23:34 UTC, Dmitry Goldenberg
Sample3.pub saved-as with Publisher 2010. (71.00 KB, application/vnd.ms-publisher)
2011-06-03 23:34 UTC, Dmitry Goldenberg
Sample4.pub saved-as with Publisher 2010. (71.00 KB, application/vnd.ms-publisher)
2011-06-03 23:35 UTC, Dmitry Goldenberg

Description Dmitry Goldenberg 2011-06-03 17:51:45 UTC
Created attachment 27105
File to repro with

The error was:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 70
	at org.apache.poi.util.LittleEndian.getUShort(LittleEndian.java:59)
	at org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:214)
	at org.apache.poi.hpbf.model.qcbits.QCPLCBit$Type12.<init>(QCPLCBit.java:182)
	at org.apache.poi.hpbf.model.qcbits.QCPLCBit.createQCPLCBit(QCPLCBit.java:89)
	at org.apache.poi.hpbf.model.QuillContents.<init>(QuillContents.java:69)
	at org.apache.poi.hpbf.HPBFDocument.<init>(HPBFDocument.java:63)
	at org.apache.poi.hpbf.HPBFDocument.<init>(HPBFDocument.java:47)
	at org.apache.poi.hpbf.extractor.PublisherTextExtractor.<init>(PublisherTextExtractor.java:48)
	at org.apache.poi.hpbf.extractor.PublisherTextExtractor.<init>(PublisherTextExtractor.java:51)
	at org.apache.poi.hpbf.extractor.PublisherTextExtractor.<init>(PublisherTextExtractor.java:56)
	at org.apache.poi.hpbf.extractor.PublisherTextExtractor.main(PublisherTextExtractor.java:136)
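
For readers unfamiliar with this failure mode, the sketch below illustrates how a little-endian unsigned-short read can throw when the offset runs past the end of the buffer. The message "70" in the trace suggests a read at index 70 of an array holding at most 70 bytes. This is an illustration only, not POI's actual source: the class name `UShortBoundsSketch` and the method `getUShortChecked` are hypothetical.

```java
final class UShortBoundsSketch {
    /** Reads an unsigned 16-bit little-endian value starting at offset. */
    public static int getUShort(byte[] data, int offset) {
        int b0 = data[offset] & 0xFF;      // low byte; throws if offset >= data.length
        int b1 = data[offset + 1] & 0xFF;  // high byte; throws if offset + 1 >= data.length
        return (b1 << 8) | b0;
    }

    /** Hypothetical defensive variant: verifies both bytes exist before reading. */
    public static int getUShortChecked(byte[] data, int offset) {
        if (offset < 0 || offset + 2 > data.length) {
            throw new IllegalArgumentException(
                "Need 2 bytes at offset " + offset + " but array length is " + data.length);
        }
        return getUShort(data, offset);
    }

    public static void main(String[] args) {
        byte[] data = new byte[70];
        data[0] = 0x34;
        data[1] = 0x12;
        // Bytes 0x34, 0x12 in little-endian order decode to 0x1234.
        System.out.println(Integer.toHexString(getUShort(data, 0)));
        try {
            getUShort(data, 70); // one past the last valid index (69)
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text differs between JVM versions.
            System.out.println("Out of bounds: " + e.getMessage());
        }
    }
}
```

A parser that reads a count from the file and then loops over that many entries can hit this when the count does not match the actual record length, which would be consistent with only some files failing.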
Comment 1 Nick Burch 2011-06-03 19:40:09 UTC
I don't have a copy of Publisher 2010, but I suspect you might?

If so, could you try opening all the Sample files (Sample.pub through Sample4.pub) from test-data/publisher and saving them as 2010 files? They're all quite simple files, where we know what the contents are and what the older files look like. By comparing them we may be able to spot what's different. Plus, they're smaller to work with when debugging, which is handy!

FYI: once we do have the files, using POIFSViewer may help spot the changes.
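
As a rough complement to a structural viewer, a byte-level comparison can show where an old and a re-saved file first diverge. The sketch below is a minimal stdlib-only helper under stated assumptions: `ByteDiffSketch` and `differingOffsets` are hypothetical names, and the two inputs are assumed to be comparable byte dumps (for example, the same stream extracted from each file), since whole-file diffs of container formats tend to be noisy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ByteDiffSketch {
    /**
     * Returns the offsets at which a and b differ, scanning only the
     * range both arrays share and stopping after maxResults hits.
     */
    public static List<Integer> differingOffsets(byte[] a, byte[] b, int maxResults) {
        List<Integer> offsets = new ArrayList<>();
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common && offsets.size() < maxResults; i++) {
            if (a[i] != b[i]) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java ByteDiffSketch <oldDump> <newDump>");
            return;
        }
        byte[] oldBytes = Files.readAllBytes(Paths.get(args[0]));
        byte[] newBytes = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("Lengths: " + oldBytes.length + " vs " + newBytes.length);
        // Report at most the first 20 differing offsets to keep output readable.
        for (int offset : differingOffsets(oldBytes, newBytes, 20)) {
            System.out.printf("offset %d: 0x%02X vs 0x%02X%n",
                offset, oldBytes[offset] & 0xFF, newBytes[offset] & 0xFF);
        }
    }
}
```

A length mismatch alone is worth noting here, because a record that is shorter than the parser expects is one way to end up reading past the end of an array.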
Comment 2 Dmitry Goldenberg 2011-06-03 23:33:30 UTC
Created attachment 27108
Sample.pub saved-as with Publisher 2010
Comment 3 Dmitry Goldenberg 2011-06-03 23:34:11 UTC
Created attachment 27109
Sample2.pub saved-as with Publisher 2010
Comment 4 Dmitry Goldenberg 2011-06-03 23:34:45 UTC
Created attachment 27110
Sample3.pub saved-as with Publisher 2010.
Comment 5 Dmitry Goldenberg 2011-06-03 23:35:22 UTC
Created attachment 27111
Sample4.pub saved-as with Publisher 2010.
Comment 6 Dmitry Goldenberg 2011-06-03 23:40:27 UTC
Nick,

I've attached all 4 files, saved-as with Publisher 2010. They don't seem to differ much size-wise; I'm not sure about the rest.

Sample2_2010.pub worked for me.

Another .pub document created purely with Publisher 2010 also works. But the first file, Publisher_2010.pub, causes an ArrayIndexOutOfBoundsException in LittleEndian.getUShort. It must be something very specific that we hit; it may not be a general 2010 issue.
Comment 7 Dominik Stadler 2015-03-22 19:29:09 UTC
All the samples now seem to work fine, so I am closing this. If you still see the problem, please test with version 3.12-beta1 or newer and report new issues for anything that still fails.