Bug 60484 - NullPointerException at org.apache.poi.hwpf.usermodel.Picture.getRawContent
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.15-FINAL
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-15 19:26 UTC by Jorge Spinsanti
Modified: 2017-01-03 04:39 UTC

Attachments
NPE to reproduce the bug (187.74 KB, application/msword)
2017-01-02 19:51 UTC, Jorge Spinsanti

Description Jorge Spinsanti 2016-12-15 19:26:06 UTC
I got the following stack trace when using Tika for a DOCX->TXT conversion:

Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3507568d
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	... 16 more
Caused by: java.lang.NullPointerException
	at org.apache.poi.hwpf.usermodel.Picture.getRawContent(Picture.java:422)
	at org.apache.poi.hwpf.usermodel.Picture.fillImageContent(Picture.java:131)
	at org.apache.poi.hwpf.usermodel.Picture.getContent(Picture.java:286)
	at org.apache.tika.parser.microsoft.WordExtractor.handlePictureCharacterRun(WordExtractor.java:609)
	at org.apache.tika.parser.microsoft.WordExtractor.handleSpecialCharacterRuns(WordExtractor.java:517)
	at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:346)
	at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:273)
	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:179)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:169)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:130)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	... 22 more
Comment 1 Dominik Stadler 2017-01-02 19:40:13 UTC
Can you provide a sample-file so we can reproduce the error and ensure it stays fixed via a unit-test?
Comment 2 Jorge Spinsanti 2017-01-02 19:51:54 UTC
Created attachment 34575
NPE to reproduce the bug
Comment 3 Javen O'Neal 2017-01-03 04:39:08 UTC
Thanks for attaching a problematic document.

I was able to reproduce the NPE with your attachment by adding the doc file to test-data/document and using TestWordToConverterSuite.java.
Since this is just an NPE and it looks like the submitted file may have some personal information in it, I don't think we need to include this file in our unit test corpus.

> Testcase: testHtml[88: /home/onealj/Downloads/poi/trunk/test-data/document/../document/60484.doc] took 0.063 sec
>         Caused an ERROR
> null
> java.lang.NullPointerException
>   at org.apache.poi.hwpf.usermodel.Picture.getRawContent(Picture.java:424)
>   at org.apache.poi.hwpf.usermodel.Picture.fillImageContent(Picture.java:133)
>   at org.apache.poi.hwpf.usermodel.Picture.<init>(Picture.java:124)
>   at org.apache.poi.hwpf.model.PicturesTable.extractPicture(PicturesTable.java:162)
>   at org.apache.poi.hwpf.converter.AbstractWordConverter.processCharacters(AbstractWordConverter.java:489)
>   at org.apache.poi.hwpf.converter.AbstractWordConverter.processField(AbstractWordConverter.java:906)
>   at org.apache.poi.hwpf.converter.AbstractWordConverter.processCharacters(AbstractWordConverter.java:430)
>   at org.apache.poi.hwpf.converter.WordToHtmlConverter.processParagraph(WordToHtmlConverter.java:576)
>   at org.apache.poi.hwpf.converter.AbstractWordConverter.processParagraphes(AbstractWordConverter.java:1113)
>   at org.apache.poi.hwpf.converter.WordToHtmlConverter.processTable(WordToHtmlConverter.java:683)
>   at org.apache.poi.hwpf.converter.AbstractWordConverter.processParagraphes(AbstractWordConverter.java:1075)
>   at org.apache.poi.hwpf.converter.WordToHtmlConverter.processSingleSection(WordToHtmlConverter.java:608)
>   at org.apache.poi.hwpf.converter.AbstractWordConverter.processDocument(AbstractWordConverter.java:721)
>   at org.apache.poi.hwpf.converter.TestWordToConverterSuite.testHtml(TestWordToConverterSuite.java:112)

The same error occurs for testText and testFO.

The NPE is caused by getBlipRecord() returning null in this expression [1]:
> ( (EscherBSERecord) escherRecord ).getBlipRecord().getPicturedata();

I changed this in r1777063 to return byte[0] if getBlipRecord() returns null.

[1] https://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/usermodel/Picture.java?revision=1751007&view=markup#l402
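
For readers without the POI source at hand, the null guard described above can be sketched roughly as follows. This is not the code committed in r1777063: BlipRecord and BseRecord are hypothetical stand-ins for the POI record classes, kept only so the example compiles on its own, and getRawContent here is a simplified static helper rather than the real Picture method.

```java
public class BlipNullGuardSketch {

    // Hypothetical stand-in for the blip record that holds the picture bytes.
    static class BlipRecord {
        private final byte[] pictureData;

        BlipRecord(byte[] pictureData) {
            this.pictureData = pictureData;
        }

        byte[] getPicturedata() {
            return pictureData;
        }
    }

    // Hypothetical stand-in for the BSE record; its blip may be missing.
    static class BseRecord {
        private final BlipRecord blip;

        BseRecord(BlipRecord blip) {
            this.blip = blip;
        }

        BlipRecord getBlipRecord() {
            return blip;
        }
    }

    // Returns the picture bytes, or an empty array when there is no blip
    // record, instead of dereferencing null.
    static byte[] getRawContent(BseRecord bse) {
        BlipRecord blip = bse.getBlipRecord();
        if (blip == null) {
            return new byte[0];
        }
        return blip.getPicturedata();
    }

    public static void main(String[] args) {
        BseRecord withData = new BseRecord(new BlipRecord(new byte[] {1, 2, 3}));
        BseRecord withoutBlip = new BseRecord(null);

        System.out.println(getRawContent(withData).length);    // 3
        System.out.println(getRawContent(withoutBlip).length); // 0
    }
}
```

Returning an empty array rather than null means callers that only read the length or copy the bytes, such as the Tika WordExtractor frames in the original stack trace, need no null check of their own.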