Bug 52420 - [PATCH] WordToHtmlConverter NullPointerException in compactChildNodesR method
Summary: [PATCH] WordToHtmlConverter NullPointerException in compactChildNodesR method
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.8-dev
Hardware: Other other
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Keywords: PatchAvailable
Depends on:
Reported: 2012-01-04 11:18 UTC by Sachin Gorade
Modified: 2016-06-18 05:55 UTC
CC List: 1 user

Attachments:
With this file you can reproduce the exception (32.00 KB, application/msword), 2013-11-14 12:05 UTC, Yanis
patch (458 bytes, patch), 2013-11-14 14:48 UTC, Yanis

Description Sachin Gorade 2012-01-04 11:18:11 UTC
While running Apache POI on Android with a simple application, I found that the AbstractWordUtils class throws a NullPointerException in the compactChildNodesR method.
This is caused by the following code:

    while ( child2.getChildNodes().getLength() > 0 )
        child1.appendChild( child2.getFirstChild() );
    // the following line causes a NullPointerException
    child2.getParentNode().removeChild( child2 );
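For context, a DOM node's getParentNode() returns null when the node is not attached to a tree (never inserted, or already removed), so the unguarded call above dereferences null in that case. The JDK-only sketch below demonstrates that failure mode; the class and method names are hypothetical and not part of POI, and it does not establish which condition actually triggers the error on Android.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class for illustration; not part of Apache POI.
public class DetachedNodeDemo {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the unguarded removal pattern from compactChildNodesR
    // throws a NullPointerException once the node no longer has a parent.
    public static boolean unguardedRemoveThrows() {
        Document doc = newDocument();
        Element p = doc.createElement("p");
        doc.appendChild(p);
        Element span = doc.createElement("span");
        p.appendChild(span);

        // First removal succeeds because span is still attached to p.
        span.getParentNode().removeChild(span);

        // span is now detached, so getParentNode() returns null here.
        try {
            span.getParentNode().removeChild(span);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on detached node: " + unguardedRemoveThrows());
    }
}
```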

I think there should be a null check on the parent node before removing this child. After adding such a check, my simple application is able to convert .doc files to HTML on the Android platform.

Following is the code change I made:

    while ( child2.getChildNodes().getLength() > 0 )
        child1.appendChild( child2.getFirstChild() );
    // only detach child2 if it still has a parent
    if ( child2.getParentNode() != null )
        child2.getParentNode().removeChild( child2 );
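The merge step from compactChildNodesR, with the parent null check described above, can be sketched as a standalone helper using only the JDK's org.w3c.dom API. This is an illustration under that assumption, not POI's actual code; the names SpanMerge, mergeInto and demo are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper for illustration; not part of Apache POI.
public class SpanMerge {

    // Moves all children of child2 into child1, then detaches child2
    // only if it still has a parent (the proposed null check).
    public static void mergeInto(Node child1, Node child2) {
        while (child2.getChildNodes().getLength() > 0) {
            // appendChild moves the node, so child2 loses one child per iteration
            child1.appendChild(child2.getFirstChild());
        }
        Node parent = child2.getParentNode();
        if (parent != null) {
            parent.removeChild(child2);
        }
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <p><span>Hello, </span><span>world</span></p>, merges the spans,
    // and reports the number of children of <p> plus the merged text.
    public static String demo() {
        Document doc = newDocument();
        Element p = doc.createElement("p");
        doc.appendChild(p);
        Element first = doc.createElement("span");
        first.appendChild(doc.createTextNode("Hello, "));
        Element second = doc.createElement("span");
        second.appendChild(doc.createTextNode("world"));
        p.appendChild(first);
        p.appendChild(second);

        mergeInto(first, second);
        // second is now detached; a repeated call hits the null check instead of throwing
        mergeInto(first, second);

        return p.getChildNodes().getLength() + " " + first.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "1 Hello, world"
    }
}
```

The guard makes the removal safe to repeat, which avoids the crash; it does not explain why the parent is missing in the first place, so any underlying problem in how the sibling nodes are iterated would remain.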
Comment 1 Sergey Vladimirov 2012-11-05 15:53:14 UTC

Could you please provide an example file that produces the exception?

Comment 2 Yanis 2013-11-14 12:05:37 UTC
Created attachment 31043
With this file you can reproduce the exception

11-14 13:25:53.108: WARN/System.err(8630): Caused by: java.lang.NullPointerException
11-14 13:25:53.108: WARN/System.err(8630): at org.apache.poi.hwpf.converter.AbstractWordUtils.compactChildNodesR(AbstractWordUtils.java:146)
11-14 13:25:53.108: WARN/System.err(8630): at org.apache.poi.hwpf.converter.WordToHtmlUtils.compactSpans(WordToHtmlUtils.java:238)
11-14 13:25:53.108: WARN/System.err(8630): at org.apache.poi.hwpf.converter.WordToHtmlConverter.processParagraph(WordToHtmlConverter.java:596)
11-14 13:25:53.108: WARN/System.err(8630): at org.apache.poi.hwpf.converter.AbstractWordConverter.processParagraphes(AbstractWordConverter.java:1113)
11-14 13:25:53.108: WARN/System.err(8630): at org.apache.poi.hwpf.converter.WordToHtmlConverter.processSingleSection(WordToHtmlConverter.java:617)
11-14 13:25:53.108: WARN/System.err(8630): at org.apache.poi.hwpf.converter.AbstractWordConverter.processDocument(AbstractWordConverter.java:722)

This code will solve this error:

    if ( child2.getParentNode() != null )
        child2.getParentNode().removeChild( child2 );

However, the converted HTML does not contain all of the data from the .doc file (another bug?).
Comment 3 Yanis 2013-11-14 14:48:08 UTC
Created attachment 31044

Patch to fix this error.
Comment 4 Dominik Stadler 2015-01-02 22:40:39 UTC
I have tried to reproduce the issue that you reported but couldn't; see r1649147 for the related test case that I added.

Can you please retry this with the latest version of POI and, if you still see the problem, provide some more information, ideally via a self-contained unit test?

Also, I could not find any text missing from the resulting document. Which exact part was missing for you? Maybe this was fixed by some other changes in the meantime...
Comment 5 Dominik Stadler 2016-06-18 05:55:10 UTC
Could not reproduce, and there has been no update for some time, therefore closing this as WORKSFORME.