Created attachment 22629
Word document causing NullPointerException

Scanning the parts of the attached document with QuickTest results in:

java.lang.NullPointerException
	at org.apache.poi.hwpf.sprm.ParagraphSprmUncompressor.uncompressPAP(ParagraphSprmUncompressor.java:50)
	at org.apache.poi.hwpf.model.PAPX.getParagraphProperties(PAPX.java:135)
	at org.apache.poi.hwpf.usermodel.Range.getParagraph(Range.java:822)
	at org.apache.poi.hwpf.QuickTest.main(QuickTest.java:45)

After some debugging, I found that:

1) The istd value of the problematic paragraph style is 10.
2) In the getParagraphProperties method, baseStyle is null; then, in the uncompressPAP method, the null variable causing the exception is "parent".
3) During StyleSheet constructor execution, _parahraphDescriptions[10] is not null, but _parahraphDescriptions[10].getPap() returns null.
4) In createPAP(10) (called by the second loop in the constructor), the "pap" and "papx" local variables are *both* null, so createPAP does not create the PAP for istd=10.

Stranger still, deleting or changing other parts of the document, for example removing other paragraphs, makes the crash disappear...
Created attachment 23783
Patch that modifies stylesheets.

During the initialization of "org.apache.poi.hwpf.model.StyleSheet", POI runs through all the style descriptions and adds them to the StyleDescription array in the createPAP method. If a description has an improperly set parent (i.e. its parent is null), it still tries to run the ParagraphSprmUncompressor. Doing so attempts to clone the parent and throws a NullPointerException.

The attached patch simply skips the uncompressor if the parent is null. I've tested this: it passes all tests and fixes the previously attached document as well as some of my own.

If this fix is deemed inappropriate, please do not fix it by throwing a runtime error. The document's contents have still been loaded correctly and should still be usable in certain contexts.
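For readers without the attachment, the guard described above might look roughly like the following. This is only a sketch under stated assumptions, not the attached patch: ParaProps, StyleDesc, NIL, and the single-integer "papx" are hypothetical stand-ins for the real POI types, chosen so that the example runs on its own.

```java
// Hypothetical, self-contained model of the stylesheet initialization.
// None of these classes are the real POI classes.
final class NullParentGuardSketch {
    static final int NIL = -1; // assumed marker for "no base style"

    static final class ParaProps {
        int justification;

        ParaProps copy() {
            ParaProps p = new ParaProps();
            p.justification = justification;
            return p;
        }
    }

    static final class StyleDesc {
        final int baseIstd;  // index of the parent style, or NIL
        final Integer papx;  // stand-in for the compressed overrides; null when absent
        ParaProps pap;       // resolved properties; stays null until created

        StyleDesc(int baseIstd, Integer papx) {
            this.baseIstd = baseIstd;
            this.papx = papx;
        }
    }

    // Stand-in for the uncompressor: copies the parent, then applies the override.
    static ParaProps uncompress(ParaProps parent, int papx) {
        ParaProps result = parent.copy(); // would throw an NPE if parent were null
        result.justification = papx;
        return result;
    }

    static void createPap(StyleDesc[] styles, int istd) {
        StyleDesc sd = styles[istd];
        // Nothing to do if the slot is empty, already resolved, or has no papx.
        if (sd == null || sd.pap != null || sd.papx == null) {
            return;
        }
        ParaProps parent;
        if (sd.baseIstd == NIL || sd.baseIstd == istd) {
            parent = new ParaProps(); // no usable base: start from defaults
        } else {
            createPap(styles, sd.baseIstd); // resolve the base style first
            StyleDesc base = styles[sd.baseIstd];
            parent = (base == null) ? null : base.pap;
        }
        if (parent == null) {
            return; // the guard: skip the uncompressor instead of throwing
        }
        sd.pap = uncompress(parent, sd.papx);
    }

    public static void main(String[] args) {
        StyleDesc[] styles = {
            new StyleDesc(NIL, 0),    // 0: root style
            new StyleDesc(NIL, null), // 1: no papx, so its pap is never created
            new StyleDesc(1, 2),      // 2: based on the unresolved style 1
            new StyleDesc(0, 3),      // 3: based on the root style
        };
        for (int i = 0; i < styles.length; i++) {
            createPap(styles, i);
        }
        System.out.println(styles[2].pap == null);       // prints true
        System.out.println(styles[3].pap.justification); // prints 3
    }
}
```

With the guard in place, style 2 is left without resolved properties rather than aborting the whole load, which matches the request above that the rest of the document remain usable.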
Created attachment 24082
Patches PAPX.java to avoid NPE when a paragraph's PAPX is based upon a character style

I've encountered this error while decoding .doc files saved by OpenOffice Writer. Some paragraphs have a PAPX with an istd that is a character style, and PAPX.getParagraphProperties throws an NPE as a result. I've attached a patch to PAPX.java to work around this.

I don't think Chris Walter's patch to StyleSheet.java is necessary.
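One plausible shape for such a workaround is to fall back to default paragraph properties whenever the istd does not resolve to a paragraph style. The sketch below only illustrates that idea and makes no claim about what the attached patch does; Style, ParaProps, baseFor, and the indent field are hypothetical names, not POI API.

```java
// Hypothetical, self-contained model of a PAPX lookup with a fallback.
// None of these classes are the real POI classes.
final class PapxFallbackSketch {
    static final class ParaProps {
        int indent;

        ParaProps copy() {
            ParaProps p = new ParaProps();
            p.indent = indent;
            return p;
        }
    }

    static final class Style {
        final ParaProps pap; // assumed to be null for character styles

        Style(ParaProps pap) {
            this.pap = pap;
        }
    }

    // Returns a fresh base to build on: a copy of the style's properties,
    // or defaults when the istd is out of range or names a style without a pap.
    static ParaProps baseFor(Style[] styles, int istd) {
        if (istd < 0 || istd >= styles.length) {
            return new ParaProps();
        }
        Style s = styles[istd];
        if (s == null || s.pap == null) {
            return new ParaProps(); // e.g. a character style: nothing to inherit
        }
        return s.pap.copy();
    }

    // Stand-in for getParagraphProperties: base properties plus one override.
    static ParaProps paragraphProperties(Style[] styles, int istd, int indentOverride) {
        ParaProps props = baseFor(styles, istd);
        props.indent = indentOverride;
        return props;
    }

    public static void main(String[] args) {
        ParaProps body = new ParaProps();
        body.indent = 360;
        Style[] styles = {
            new Style(body), // 0: paragraph style
            new Style(null), // 1: character style, no paragraph properties
        };
        System.out.println(baseFor(styles, 0).indent);                  // prints 360
        System.out.println(baseFor(styles, 1).indent);                  // prints 0
        System.out.println(paragraphProperties(styles, 1, 720).indent); // prints 720
    }
}
```

Because the fallback happens at lookup time, this approach leaves the stylesheet itself untouched, which is the distinction drawn above between patching PAPX.java and patching StyleSheet.java.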
I've just tried opening your document with POI on svn head. It loaded fine, and we could get the text without error. It looks like the bug was fixed at some point between when you reported this and today.