Over on https://issues.apache.org/jira/browse/TIKA-3526, matcha007 shared a ppt file created by WPS 表格 that handles embedded files slightly differently from a standard ppt. I tried some basic things with 5.1.0 and still had little luck.

The file is:
https://issues.apache.org/jira/secure/attachment/13032100/13032100_embedded+attachment.ppt

When I do the usual thing (iterate through the slides, then iterate through the shapes looking for HSLFObjectShape), objectShape.getObjectData() returns null because, as matcha007 pointed out, the _exEmbed is not found in HSLFObjectShape's

    private ExEmbed getExEmbed(boolean create) {...

matcha007 found that adding 3 to the objectId in getExEmbed seemed to work on this file, but there's no motivation for that (that I know of), and it looks like it would break everything else.

I can extract the embedded files if I iterate through the HSLFObjectData at the slideshow level:

    POIFSFileSystem pfs = new POIFSFileSystem(p.toFile());
    try (HSLFSlideShow ss = new HSLFSlideShow(pfs.getRoot())) {
        HSLFObjectData[] objectData = ss.getEmbeddedObjects();
        // ...
    }

However, I can't then link those objects back to the ids in the shapes for this particular file.

What can we do with this file?
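
In case it helps anyone reproduce this, here is a rough sketch of the two views side by side: what the object shapes report, and what the slideshow-level list contains. The class and method names (EmbeddedObjectProbe, describeObjectShapes, describeEmbeddedObjects) are made up for illustration. It only looks at shapes directly on each slide, not inside groups, and I have only reasoned through it against the 5.1.0 API rather than treating it as a proper test case.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hslf.usermodel.HSLFObjectData;
import org.apache.poi.hslf.usermodel.HSLFObjectShape;
import org.apache.poi.hslf.usermodel.HSLFShape;
import org.apache.poi.hslf.usermodel.HSLFSlide;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class, not part of POI or Tika.
public class EmbeddedObjectProbe {

    /** Shape-level view: one line per HSLFObjectShape found directly on a slide. */
    static List<String> describeObjectShapes(HSLFSlideShow ss) {
        List<String> lines = new ArrayList<>();
        for (HSLFSlide slide : ss.getSlides()) {
            for (HSLFShape shape : slide.getShapes()) {
                if (shape instanceof HSLFObjectShape) {
                    HSLFObjectShape ole = (HSLFObjectShape) shape;
                    // Returns null when no matching ExEmbed record is found.
                    HSLFObjectData data = ole.getObjectData();
                    lines.add("slide " + slide.getSlideNumber()
                            + " objectId=" + ole.getObjectID()
                            + " data=" + (data == null ? "null" : "found"));
                }
            }
        }
        return lines;
    }

    /** Slideshow-level view: one line per embedded object storage record. */
    static List<String> describeEmbeddedObjects(HSLFSlideShow ss) {
        List<String> lines = new ArrayList<>();
        for (HSLFObjectData data : ss.getEmbeddedObjects()) {
            lines.add("persistId=" + data.getExOleObjStg().getPersistId());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the .ppt file to inspect.
        try (POIFSFileSystem pfs = new POIFSFileSystem(new File(args[0]));
             HSLFSlideShow ss = new HSLFSlideShow(pfs.getRoot())) {
            describeObjectShapes(ss).forEach(System.out::println);
            describeEmbeddedObjects(ss).forEach(System.out::println);
        }
    }
}
```

Comparing the objectId values from the first list with the persist ids from the second is only meant to make the mismatch visible; it does not by itself say how the two should be linked for this file.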