Bug 60294 - Add "unknown" ShapeType for 4095
Alias: None
Product: POI
Classification: Unclassified
Component: HSLF
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2016-10-21 16:24 UTC by Tim Allison
Modified: 2016-10-24 16:44 UTC


Description Tim Allison 2016-10-21 16:24:04 UTC
On TIKA-2129, Seva Alekseyev shared a file that throws:

java.lang.IllegalArgumentException: Unknown shape type: 4095
at org.apache.poi.sl.usermodel.ShapeType.forId(ShapeType.java:314)
at org.apache.poi.hslf.usermodel.HSLFShapeFactory.createSimpleShape(HSLFShapeFactory.java:98)
at org.apache.poi.hslf.usermodel.HSLFShapeFactory.createShape(HSLFShapeFactory.java:62)
at org.apache.poi.hslf.usermodel.HSLFSheet.getShapes(HSLFSheet.java:173)
at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:93)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)

When I save the file as pptx, the shape is called "AutoShape". According to one vendor of MSOffice processing tools, 4095 is a recognized code for 'Unknown'.

Any objections to adding this to ShapeType?
Comment 1 Andreas Beeker 2016-10-22 18:52:53 UTC
0x0FFF is not defined as a shape type (see MSOSPT [1]); I could only find it defined for MSODGMT [2]. When the file is saved as .pptx, PowerPoint 2016 creates a custom geometry shape depicting a rectangle, but in the .ppt I can't find that freeform shape structure.
So I would reuse the shape type NOT_PRIMITIVE in this case rather than introduce another enumeration value.

[1] https://msdn.microsoft.com/en-us/library/dd949385(v=office.12).aspx
[2] https://msdn.microsoft.com/en-us/library/dd953236(v=office.12).aspx
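
To illustrate the proposal above, here is a minimal sketch of mapping the undefined id 0x0FFF (4095) onto NOT_PRIMITIVE instead of adding a new enumeration value. SketchShapeType, UNDEFINED_ID and ShapeTypeFallbackSketch are hypothetical names made up for this example; this is not POI's actual ShapeType code and not necessarily how the committed fix is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of POI.
class ShapeTypeFallbackSketch {
    public static void main(String[] args) {
        // 4095 is the id from the stack trace in the description.
        System.out.println(SketchShapeType.forId(4095)); // prints NOT_PRIMITIVE
    }
}

// Hypothetical stand-in for a shape-type enum; only a few MSOSPT ids are listed.
enum SketchShapeType {
    NOT_PRIMITIVE(0),
    RECT(1),
    ELLIPSE(3);

    // Assumption for this sketch: treat 0x0FFF as "no predefined geometry".
    static final int UNDEFINED_ID = 0x0FFF;

    private static final Map<Integer, SketchShapeType> BY_ID = new HashMap<>();
    static {
        for (SketchShapeType t : values()) {
            BY_ID.put(t.id, t);
        }
    }

    final int id;

    SketchShapeType(int id) {
        this.id = id;
    }

    static SketchShapeType forId(int id) {
        // Reuse NOT_PRIMITIVE for the undefined id rather than throwing.
        if (id == UNDEFINED_ID) {
            return NOT_PRIMITIVE;
        }
        SketchShapeType t = BY_ID.get(id);
        if (t == null) {
            // Any other unrecognized id is still reported as an error.
            throw new IllegalArgumentException("Unknown shape type: " + id);
        }
        return t;
    }
}
```

In this sketch only 0x0FFF gets the fallback; other unrecognized ids still raise the same IllegalArgumentException, so genuinely corrupt values are not silently hidden.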
Comment 2 Andreas Beeker 2016-10-22 19:35:30 UTC
fixed via r1766227
Comment 3 Andreas Beeker 2016-10-22 19:39:01 UTC
... besides reading the .ppt, I've also successfully rendered it ...
Comment 4 Tim Allison 2016-10-24 16:44:20 UTC
Thank you, Andi!