Created attachment 28001 [details]
Multi-embedded test Word document

This bug was introduced by the changes submitted in revision 1190347. It was discovered using daily builds of Tika and POI.

Tika exposes the bug through a call to getMasterSheet() whose result is assigned to an unused variable (a bug has been submitted to Tika for the unused variable too). Essentially, the return type of getMasterSheet() accidentally changed between revisions, from XSLFSlideMaster to XSLFSlideLayout.

The patch changes the return type back to what it was before, leaving the newly added @Override annotation in place. The patch file and the multi-embedded Word document used with a Tika-based RecursiveMetadataParser are included.

Stack Trace:

ERROR LogFaultActivity org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
	at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:81)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
	at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
	at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
	at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:228)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:148)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:113)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
	at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
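
For context on why a changed return type surfaces as a NoSuchMethodError rather than a compile error: the JVM links a call by method name plus descriptor, and the descriptor includes the return type. Code compiled against the old getMasterSheet() therefore looks for a method returning XSLFSlideMaster and finds none in the newer jar. The sketch below imitates that lookup with java.lang.invoke; the classes Slide, SlideLayout and SlideMaster are hypothetical stand-ins, not the real POI types, and the lookup reports the mismatch as a checked NoSuchMethodException rather than the linkage error seen in the trace.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ReturnTypeLinkageDemo {
    // Hypothetical stand-ins for the POI classes; not the real API.
    static class SlideMaster {}
    static class SlideLayout {}

    static class Slide {
        // Mirrors the new signature: the layout is returned, not the master.
        public SlideLayout getMasterSheet() {
            return new SlideLayout();
        }
    }

    /** Returns true if Slide declares getMasterSheet() with exactly this return type. */
    static boolean resolves(Class<?> returnType) {
        try {
            // The lookup matches on name AND full method type, return type included.
            MethodHandles.lookup().findVirtual(
                    Slide.class, "getMasterSheet", MethodType.methodType(returnType));
            return true;
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // What a caller compiled against the old signature would ask for.
        System.out.println("old return type resolves: " + resolves(SlideMaster.class));
        // What a caller recompiled against the new signature asks for.
        System.out.println("new return type resolves: " + resolves(SlideLayout.class));
    }
}
```

Recompiling the caller against the newer jar updates the descriptor it asks for, which is why the error only appears when jars built from different revisions are mixed.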
Created attachment 28002 [details]
Patch file with changes

Patch for the XSLFSlide.java file
TIKA bug #795 opened for issue as well.
It was an intentional change. The master sheet of a slide is its slide layout, and the master of a slide layout is the slide master. To be clear, my change reflects the sheet hierarchy in the .pptx format:

slide.xml <-- slideLayout.xml <-- slideMaster.xml

The immediate fix on the Tika side is to use XSLFSlide.getSlideMaster() instead of XSLFSlide.getMasterSheet(). With this change everything should compile and run.

Meanwhile, I'm going to rework Tika's XSLFPowerPointExtractorDecorator - most of the logic can be simplified and written in a much nicer form.

Yegor

(In reply to comment #2)
> TIKA bug #795 opened for issue as well.
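
A minimal sketch of the sheet hierarchy described in this comment, for readers who want to see the two accessors side by side. The classes are hypothetical stand-ins for XSLFSlide, XSLFSlideLayout and XSLFSlideMaster, and the assumption that getSlideMaster() simply walks slide -> layout -> master is an illustration, not a statement about POI's actual implementation.

```java
public class SheetHierarchySketch {
    // Hypothetical stand-in for XSLFSlideMaster (slideMaster.xml).
    static class SlideMaster {
        final String name;

        SlideMaster(String name) {
            this.name = name;
        }
    }

    // Hypothetical stand-in for XSLFSlideLayout (slideLayout.xml).
    static class SlideLayout {
        private final SlideMaster master;

        SlideLayout(SlideMaster master) {
            this.master = master;
        }

        // The master sheet of a layout is the slide master.
        SlideMaster getMasterSheet() {
            return master;
        }
    }

    // Hypothetical stand-in for XSLFSlide (slide.xml).
    static class Slide {
        private final SlideLayout layout;

        Slide(SlideLayout layout) {
            this.layout = layout;
        }

        // The master sheet of a slide is its layout, one level up.
        SlideLayout getMasterSheet() {
            return layout;
        }

        // Assumed convenience accessor: walks two levels up to the slide master.
        SlideMaster getSlideMaster() {
            return layout.getMasterSheet();
        }
    }

    public static void main(String[] args) {
        SlideMaster master = new SlideMaster("master1");
        Slide slide = new Slide(new SlideLayout(master));

        // Callers that want the slide master ask for it explicitly.
        System.out.println(slide.getSlideMaster().name);
    }
}
```

In this model a caller that previously relied on getMasterSheet() to reach the slide master switches to getSlideMaster(), which matches the fix suggested for Tika.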
Sounds good, thanks Yegor. Nick already added the patch to TIKA-700 yesterday using that additional method; I missed it when I was looking at it. After seeing his patch I began to suspect that the change was made on purpose and for a reason. I'll update the status to resolved and note it as an invalid bug. Thanks again... keep up the great work!!