Created attachment 34555
embedded extractor - changes not related to common ss

Find attached an extractor for various objects embedded in Excel files. This is based on the work for [1] and [2]. Apart from evaluating the ClassIDs of Ole10Native objects, it also finds PDFs hidden in EMFs, which seems to be a peculiarity of Mac Excel 2011.

I'm not sure whether the extraction part in org.apache.poi.ss.extractor.EmbeddedExtractor should be part of POI or maybe Tika, but we didn't make this distinction for other types of extraction helpers either.

The code depends on changes to Common SS, which I document in a separate issue, but both need to be committed together. I'll commit the code on 30.12.2016 if no one objects before then.

[1] http://stackoverflow.com/questions/41101012
[2] http://stackoverflow.com/questions/27011634
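To illustrate the "PDF hidden in an EMF" case, here is a minimal sketch of one way such a payload could be located: scan the raw picture bytes for the PDF header marker `%PDF-` and the last `%%EOF` trailer, and copy out everything in between. This is not necessarily how the attached patch does it. The class and method names (`PdfInEmfSketch`, `extractPdf`) are hypothetical, and the sketch assumes the PDF is stored contiguously in the picture data, which may not hold if the content is split across several EMF records.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of POI: finds a PDF byte range inside picture data.
public class PdfInEmfSketch {
    private static final byte[] PDF_HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PDF_TRAILER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if needle occurs in haystack at the given offset. */
    private static boolean matchesAt(byte[] haystack, byte[] needle, int offset) {
        for (int j = 0; j < needle.length; j++) {
            if (haystack[offset + j] != needle[j]) {
                return false;
            }
        }
        return true;
    }

    /** Index of the first occurrence of needle, or -1 if absent. */
    static int indexOf(byte[] haystack, byte[] needle) {
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            if (matchesAt(haystack, needle, i)) {
                return i;
            }
        }
        return -1;
    }

    /** Index of the last occurrence of needle, or -1 if absent. */
    static int lastIndexOf(byte[] haystack, byte[] needle) {
        for (int i = haystack.length - needle.length; i >= 0; i--) {
            if (matchesAt(haystack, needle, i)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the bytes from the first "%PDF-" up to and including the last
     * "%%EOF", or null if no such range exists. The last trailer is used
     * because a PDF with incremental updates contains several "%%EOF" markers.
     */
    static byte[] extractPdf(byte[] pictureData) {
        int start = indexOf(pictureData, PDF_HEADER);
        if (start < 0) {
            return null;
        }
        int end = lastIndexOf(pictureData, PDF_TRAILER);
        if (end < start) {
            return null;
        }
        return Arrays.copyOfRange(pictureData, start, end + PDF_TRAILER.length);
    }

    public static void main(String[] args) {
        // Fake picture data standing in for real EMF bytes.
        byte[] fake = "EMF-HEAD..%PDF-1.4\nbody\n%%EOF..EMF-TAIL"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] pdf = extractPdf(fake);
        System.out.println(pdf == null ? "no PDF found" : pdf.length + " PDF bytes found");
    }
}
```

In a real workbook the input would be the raw data of a picture shape; the byte scan itself does not depend on POI.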
The test data for EMF with embedded PDF can be found under https://people.apache.org/~kiwiwings/Basic_Expense_Template_2011.xls
Applied via r1776819