Bug 58002 - EscherAggregate.createAggregate doesn't always read NoteRecord
Summary: EscherAggregate.createAggregate doesn't always read NoteRecord
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.12-FINAL
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2015-06-04 12:47 UTC by Robert Kish
Modified: 2019-01-06 13:00 UTC (History)
0 users

Sample code to read comments. (1.26 KB, text/x-java-source)
2015-06-04 12:47 UTC, Robert Kish
Debug version of TestDrawingAggregate (148.26 KB, text/x-java-source)
2015-06-30 23:38 UTC, Robert Kish
Debug version of EsceherAggregate.java (39.21 KB, text/x-java-source)
2015-06-30 23:46 UTC, Robert Kish
Small addition to the integration tests to verify the comments which triggers this bug (2.52 KB, patch)
2015-07-07 07:37 UTC, Dominik Stadler
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Kish 2015-06-04 12:47:51 UTC
Created attachment 32793 [details]
Sample code to read comments.

I have this XLS file that when I attempt to retrieve all HSSFComment via worksheet's drawing patriarch's children, I get a NullPointerException when I attempt to get the comment's column (and row). The _note component didn't get initialized. I have tracked this issue down to:

org.apache.poi.hssf.record.EscherAggregate.createAggregate, near line 442:

        // any NoteRecords that follow the drawing block must be aggregated and and saved in the tailRec collection
        while (loc < records.size()) {
            if (sid(records, loc) == NoteRecord.sid) {
                NoteRecord r = (NoteRecord) records.get(loc);
                agg.tailRec.put(r.getShapeId(), r);
            } else {

If the current loc points to a record where the sid is NOT NoteRecord, the loop exits without checking additional records. In my test file, the next record is a DrawingSelectionRecord [MSODRAWINGSELECTION] with sid 237 (0xED). The fix that works for my document is to remove the else break, as in:

        // any NoteRecords that follow the drawing block must be aggregated and and saved in the tailRec collection
        while (loc < records.size()) {
            if (sid(records, loc) == NoteRecord.sid) {
                NoteRecord r = (NoteRecord) records.get(loc);
                agg.tailRec.put(r.getShapeId(), r);

This code change seems without side effects because loc is not referenced anymore in the method.

Attached is my document that causes this error. I tried changing the pictures or adding new comments, and it ends up rewriting the file in such a way that there's no error. So I have it in the original form. (The pictures come from Windows 7 sample pictures). The document is currently 1.6 MB and might be too large to upload.

Also attached is a simple program (Java 8) to read the file comments:

Expected output (when I use the debugger to change loc to the next value before testing if it's a NoteRecord):
Robert Kish:
Yo!: 0,2

Robert Kish:
2nd note!: 1,2

Robert Kish:
Woah!: 22,45

Robert Kish:
Woah top!: 21,0

Actual output:
Robert Kish:
Yo!: Exception in thread "main" java.lang.NullPointerException
	at org.apache.poi.hssf.usermodel.HSSFComment.getColumn(HSSFComment.java:184)
	at NoteTest.main(NoteTest.java:26)

Thank you.
Comment 1 Robert Kish 2015-06-04 13:09:36 UTC
As mentioned before, when I try to change the pictures in the xls file, the file gets rewritten in a better way and I can't reproduce the error. If you want to see the file, please recommend some file sharing service I should use. The error message that bugzilla displayed didn't provide recommendations. Or even email works.
Comment 2 Dominik Stadler 2015-06-20 09:46:14 UTC
Does it get smaller when you compress it using zip? Oterwise please try sending the document via mail to the poi-dev or user-list or upload it to dropbox or some other file sharing functionality.
Comment 3 Robert Kish 2015-06-23 19:59:16 UTC
I tried dropbox and placed a zip file with an xls inside.

Comment 4 Dominik Stadler 2015-06-29 13:12:56 UTC
FYI, the proposed change breaks unit tests, at least TestDrawingAggregate, not sure if these are expected or if they indicate a problem with the changes...
Comment 5 Robert Kish 2015-06-29 21:41:15 UTC
You are correct; my patch fails :(. I downloaded trunk from source and tried at it some more. This will take some time for me to figure out, I'll give it a try in the next week or so and try to submit a new patch. The problem is still the same, NoteRecords out of sequence. What I learned is that the code assumes the records are always in sequence, so it can remove records start to end. But what we really have is start1 to end1, start2 to end2, etc. The referenced JUnit has the same assumption and needs to be updated also.
Comment 6 Robert Kish 2015-06-30 23:38:50 UTC
Created attachment 32868 [details]
Debug version of TestDrawingAggregate

get method in this test now detects all DrawingRecord blocks in the file - not just the first. There is debugging with println and stacktrace to aid in trying to get all the unit tests to pass.
Comment 7 Robert Kish 2015-06-30 23:46:03 UTC
Created attachment 32869 [details]
Debug version of EsceherAggregate.java

This debug version has a modified version of createAggregate which processes multiple DrawingRecords - not just the one mentioned at locFirstDrawingRecord. There is use of println debugging.

The section "// Create one big buffer" will now find all sections to copy bytes.

The section "// Associate the object records with the shapes" now has an additional loop around it so that each DrawingRecord can be processed. The records added to the aggregate are removed at the end of the loop.

The EscherAggregate is added after the loop.

With the updated JUnit, this still fails some files. Many files pass, including those with multiple DrawingRecords. But some files fail.
Comment 8 Robert Kish 2015-06-30 23:48:24 UTC
I have spent many hours in the past day trying to figure this out, but it is beyond my current level of experience in the POI innards. I leave my work to someone else who may continue at what I started, or learn from my mistakes and come up with a better fix.
Comment 9 Dominik Stadler 2015-07-07 07:37:23 UTC
Created attachment 32887 [details]
Small addition to the integration tests to verify the comments which triggers this bug

Attached a small addition to the integration test which we can apply when this is fixed as it currently triggers this bug as well.