On TIKA-2129, Seva Alekseyev shared a file that throws:

java.lang.IllegalArgumentException: Unknown shape type: 4095
    at org.apache.poi.sl.usermodel.ShapeType.forId(ShapeType.java:314)
    at org.apache.poi.hslf.usermodel.HSLFShapeFactory.createSimpleShape(HSLFShapeFactory.java:98)
    at org.apache.poi.hslf.usermodel.HSLFShapeFactory.createShape(HSLFShapeFactory.java:62)
    at org.apache.poi.hslf.usermodel.HSLFSheet.getShapes(HSLFSheet.java:173)
    at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:93)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)

When I save the file as pptx, the shape is called "AutoShape". According to one vendor of MSOffice processing tools, 4095 is a recognized code for 'Unknown'. Any objections to adding this to ShapeType?
0x0FFF (4095) is not defined as a shape type (see MSOSPT [1]); I could only find that value for MSODGMT [2]. When the file is saved as .pptx, PowerPoint 2016 creates a custom geometry shape depicting a rectangle, but in the .ppt I can't find the corresponding freeform shape structure. So I would reuse the shape type NOT_PRIMITIVE in this case rather than introduce another enumeration value.

[1] https://msdn.microsoft.com/en-us/library/dd949385(v=office.12).aspx
[2] https://msdn.microsoft.com/en-us/library/dd953236(v=office.12).aspx
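To make the proposal concrete, here is a minimal, self-contained sketch of the lookup behaviour it implies. It does not use the POI classes and is not the actual change: the names ShapeTypeFallbackSketch, SketchShapeType and forNativeId are hypothetical, and only a few MSOSPT values are listed for illustration. The assumption is that 0x0FFF alone is mapped to NOT_PRIMITIVE, while any other undefined id still raises the exception seen in the stack trace.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shape type lookup; not the POI ShapeType class.
final class ShapeTypeFallbackSketch {

    // A small illustrative subset of MSOSPT values (native .ppt ids).
    enum SketchShapeType {
        NOT_PRIMITIVE(0),
        RECT(1),
        ROUND_RECT(2),
        ELLIPSE(3);

        final int nativeId;

        SketchShapeType(int nativeId) {
            this.nativeId = nativeId;
        }
    }

    // The value found in the problem file; not defined in MSOSPT.
    static final int UNDEFINED_NATIVE_ID = 0x0FFF;

    private static final Map<Integer, SketchShapeType> BY_NATIVE_ID = new HashMap<>();

    static {
        for (SketchShapeType t : SketchShapeType.values()) {
            BY_NATIVE_ID.put(t.nativeId, t);
        }
    }

    // Resolves a native id. Assumption: only 0x0FFF is treated as
    // NOT_PRIMITIVE; every other unknown id is still rejected.
    static SketchShapeType forNativeId(int nativeId) {
        if (nativeId == UNDEFINED_NATIVE_ID) {
            return SketchShapeType.NOT_PRIMITIVE;
        }
        SketchShapeType t = BY_NATIVE_ID.get(nativeId);
        if (t == null) {
            throw new IllegalArgumentException("Unknown shape type: " + nativeId);
        }
        return t;
    }
}
```

With this sketch, forNativeId(4095) yields NOT_PRIMITIVE instead of throwing, so no new enumeration value is needed and callers that already handle NOT_PRIMITIVE need no changes.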
fixed via r1766227
... besides reading the .ppt, I've also successfully rendered it ...
Thank you, Andi!