Bug 66509 - NullPointerException when invoking getAnchor() on groups/shapes that are missing the off attribute
Summary: NullPointerException when invoking getAnchor() on groups/shapes that are miss...
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: XSLF (show other bugs)
Version: 5.2.2-FINAL
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2023-03-06 03:39 UTC by Henry Iguaro
Modified: 2023-03-09 05:35 UTC (History)
0 users

Zip file containing a Maven project that reproduces the issue (35.00 KB, application/zip)
2023-03-06 03:39 UTC, Henry Iguaro

Note You need to log in before you can comment on or make changes to this bug.
Description Henry Iguaro 2023-03-06 03:39:06 UTC
Created attachment 38518 [details]
Zip file containing a Maven project that reproduces the issue

When parsing PowerPoint files, some files may be missing the `<a:off>` tag from the `<a:xfrm>` section. While PowerPoint and other applications can open these files without issue, POI throws a `NullPointerException` when trying to retrieve the `anchor` of a shape that is missing this tag.

Steps to reproduce:

1. Download the attached zip file containing a Maven project, unzip the folder and `cd` into its directory.
2. Run `maven install` and then `mvn exec:java`.
3. Observe the `NullPointerException` that is thrown.

java.lang.NullPointerException: Cannot invoke "org.openxmlformats.schemas.drawingml.x2006.main.CTPoint2D.xgetX()" because "off" is null
    at org.apache.poi.xslf.usermodel.XSLFGroupShape.getAnchor (XSLFGroupShape.java:87)
    at com.example.Main.main (Main.java:17)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:104)
    at java.lang.reflect.Method.invoke (Method.java:578)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
    at java.lang.Thread.run (Thread.java:1589)

Expected behavior:
POI could gracefully handle missing `<a:off>` tags and not throw a `NullPointerException` when trying to retrieve the anchor of a shape that is missing this tag.

Proposed solution:
We are going to submit a pull request that makes the library return `null` in cases where the `<a:off>` tag is missing for both `XSLFGroupShape` and `XSLFSimpleShape`, instead of allowing the `NullPointerException` to occur.
Comment 1 Henry Iguaro 2023-03-06 03:52:38 UTC
Link to a Pull Request with a potential fix: https://github.com/higuaro/poi/pull/1
Comment 2 PJ Fanning 2023-03-06 14:08:25 UTC
Where did the test file come from? We've gotten away without these null checks until now so it appears that it is not common to have the `<a:off>` tag not appear?

Have you run the POI tests with your code changes?

You haven't specified the POI version you are using.
Comment 3 Henry Iguaro 2023-03-06 23:44:09 UTC
I crafted an artificial file (included in the demo project) that reproduces the problem in the bare minimum since I cannot share user data.

The version of POI I'm using is 5.2.3, which is specified in the demo project. I couldn't select this version in the Bugzilla UI, so I updated to from `unspecified` to `5.2.2-FINAL` instead.

Regarding the POI tests, yes, I ran them with my code changes and all tests passed successfully.
Comment 4 PJ Fanning 2023-03-08 10:34:29 UTC
I'm not sure about this change. The existing methods don't appear to have ever been expected to return null. I'd prefer if any change involved us returning an exception. Since the existing methods don't expose exceptions either, we would be stuck with using something that extends RuntimeException (to avoid API changes in a patch release).

In the short term, anyone who hits this issue will need to catch NullPointerException.

Until we get other people reporting similar issues, I think we can assume this is not an endemic issue.
Comment 5 Henry Iguaro 2023-03-09 05:35:15 UTC
Thank you for your response. I would like to clarify that `XSLFSimpleShape#getAnchor()` already returns `null` in a couple of cases. The current implementation checks for the presence of the `xfrm` element and whether the `off` attribute is set, and returns `null` if either of these conditions is false.

However, in the specific case that triggered this issue, the `xfrm` element was present, the `off` attribute was missing, and the method did not return `null`, causing the NPE. The proposed change addresses this specific case and ensures that the method returns `null`. I also extended the approach to `XSLFGroupShape#getAnchor()` for consistency.

As a user of the library, I believe that changing the method signature to throw an exception would be excessive, and I do think that returning `null` is a simple enough and not a terrible solution.