Bug 47924

Summary: cell.getCellComment may return wrong comment
Product: POI Reporter: Karl Eilebrecht <Karl.Eilebrecht>
Component: HSSFAssignee: POI Developers List <dev>
Severity: normal CC: Karl.Eilebrecht
Priority: P2    
Version: 3.5-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Attachments: standalone quickfix for testing

Description Karl Eilebrecht 2009-10-01 04:31:08 UTC
Created attachment 24328 [details]
standalone quickfix for testing

The method getCellComment() located at HSSFCell may return a wrong comment if
more comments have been added later.

Problem: The method HSSFCell.findCellComment() makes an assumption about
         the ordering of text records that is probably wrong.
         Line 1084
         " //find the next TextObjectRecord which holds the comment's text
           //the order of TXO matches the order of NoteRecords: i-th TXO record 
             corresponds to the i-th NoteRecord"         

Steps to reproduce:

- create an excel sheet with some cell comments
- close and re-open it
- add a new comment to a cell above(!) the existing comments
- go through existing cells and output the result of cell.getCellComment()

Some comments will not match these you typed in in excel.

Solution proposal: 
Match objectId of text record with note's shape Id for
a correct assignment.

Replace list by a map in line 1057
      Map<Integer, TextObjectRecord> noteTxo = 
                       new HashMap<Integer, TextObjectRecord>();
Change line 1089
      noteTxo.put(cmo.getObjectId(), (TextObjectRecord) rec);

Change line 1065
      TextObjectRecord txo = noteTxo.get(note.getShapeId());

WARNING: This is a new assumption. I did not verify this under all
circumstances or against documentation. But in my environment it runs fine.

For those being in a hurry I attach a standalone quick fix hack (works without
patching the jar).
Comment 1 Yegor Kozlov 2009-10-29 09:56:05 UTC

A very good catch, thanks!

The logic to match NoteRecord and TextObjectRecord should not rely on the order of records, instead it should match them by objectId, as you suggested. I easily reproduced the bug in Excel 2003 and Excel 2007. 

P.S. I wrote this code some years ago when the Excel spec was closed. Most of it was derived by trial and error and hence can be not optimal.