Bug 64197

Summary: AssertionError thrown when processing embedded EMF in doc file
Product: POI
Reporter: Raman Gupta <rocketraman>
Component: POIFS
Assignee: POI Developers List <dev>
Status: NEW ---    
Severity: normal    
Priority: P2    
Version: 4.1.1-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: Same file with EMF that throws error

Description Raman Gupta 2020-03-04 16:30:45 UTC
When JVM assertions are enabled via the `-ea` flag, parsing the attached file throws an AssertionError. When run through Tika 1.23, the stack trace is:

Exception in thread "main" java.lang.AssertionError
        at org.apache.poi.hemf.record.emfplus.HemfPlusRecordIterator._next(HemfPlusRecordIterator.java:84)
        at org.apache.poi.hemf.record.emfplus.HemfPlusRecordIterator.next(HemfPlusRecordIterator.java:55)
        at org.apache.poi.hemf.record.emfplus.HemfPlusRecordIterator.next(HemfPlusRecordIterator.java:26)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at org.apache.poi.hemf.record.emf.HemfComment$EmfCommentDataPlus.init(HemfComment.java:292)
        at org.apache.poi.hemf.record.emf.HemfComment$EmfCommentDataIterator._next(HemfComment.java:216)
        at org.apache.poi.hemf.record.emf.HemfComment$EmfCommentDataIterator.<init>(HemfComment.java:155)
        at org.apache.poi.hemf.record.emf.HemfComment$EmfComment.init(HemfComment.java:110)
        at org.apache.poi.hemf.record.emf.HemfRecordIterator._next(HemfRecordIterator.java:76)
        at org.apache.poi.hemf.record.emf.HemfRecordIterator.next(HemfRecordIterator.java:48)
        at org.apache.poi.hemf.record.emf.HemfRecordIterator.next(HemfRecordIterator.java:27)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at org.apache.poi.hemf.usermodel.HemfPicture.getRecords(HemfPicture.java:78)
        at org.apache.poi.hemf.usermodel.HemfPicture.iterator(HemfPicture.java:91)
        at org.apache.tika.parser.microsoft.EMFParser.parse(EMFParser.java:80)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
        at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
        at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:104)
        at org.apache.tika.extractor.EmbeddedDocumentUtil.parseEmbedded(EmbeddedDocumentUtil.java:220)
        at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:133)
        at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:107)
        at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:100)
        at org.apache.tika.parser.microsoft.WordExtractor.handlePictureCharacterRun(WordExtractor.java:561)
        at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:365)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:187)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
        at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
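
For anyone trying to reproduce this, the error only surfaces when the JVM actually runs with assertions enabled, so it may help to confirm that first. The sketch below is stdlib-only and does not touch POI. `AssertionCheck`, `assertionsEnabled` and `parseRecord` are hypothetical names, and the size comparison is only a stand-in invariant. It is not a claim about what the assert at HemfPlusRecordIterator.java:84 checks.

```java
// Minimal sketch: detect whether -ea is active and show how a failed
// assert surfaces. All names here are hypothetical, not part of POI or Tika.
class AssertionCheck {

    /** Returns true only when this class runs with assertions enabled (-ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert executes only when assertions are on.
        assert enabled = true;
        return enabled;
    }

    /**
     * Stand-in for a parser invariant (assumption: NOT the actual POI check).
     * With -ea a mismatch raises AssertionError; without it the assert is skipped.
     */
    static String parseRecord(int declaredSize, int bytesRead) {
        try {
            assert declaredSize == bytesRead : "record size mismatch";
            return "parsed";
        } catch (AssertionError e) {
            return "AssertionError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println(parseRecord(12, 8));
    }
}
```

Running this with `java -ea AssertionCheck` should report that assertions are enabled and show the AssertionError path. Without `-ea` the same mismatch passes silently, which would be consistent with the file parsing without this error when assertions are off.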
Comment 1 Raman Gupta 2020-03-04 16:31:18 UTC
Created attachment 37061
Same file with EMF that throws error