Bug 48018

Summary: Defect in - org.apache.poi.hwpf.model.ListLevel
Product: POI Reporter: ashokkumar755
Component: HWPFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 3.5-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Attachments: Fix multiple list processing bug in ListTables.java
Fix Character/Paragraph formatting problem in ListLevel.java

Description ashokkumar755 2009-10-17 05:29:53 UTC
Please check List level processing in Class org.apache.poi.hwpf.model.ListLevel.
 
I think it needs to be changed to copy character formatting (_grpprlChpx) first followed by paragraph formatting(_grpprlPapx).

/* Change to copy _grpprlChpx first and then _grpprlPapx */
System.arraycopy(buf, offset, _grpprlChpx, 0, _cbGrpprlChpx);
offset += _cbGrpprlChpx;

System.arraycopy(buf, offset, _grpprlPapx, 0, _cbGrpprlPapx);
offset += _cbGrpprlPapx;

/* Change Ends */

Looks like this bug is carried over from the MS Word (97-2007) Binary File format specification (page 47-48).
Comment 1 ashokkumar755 2009-10-18 08:52:27 UTC
On second thought may be the actual problem is in method ListLevel.toByteArray rather than in the ListLevel constructor method. 

Method ListLevel.toByteArray() needs to be changed for copying _grpprlPapx first and then copy _grpprlChpx into the buffer (buf). This fixes the paragraph/character formatting issues with list processing.
Comment 2 ashokkumar755 2009-10-20 11:47:02 UTC
Created attachment 24401 [details]
Fix multiple list processing bug in ListTables.java
Comment 3 ashokkumar755 2009-10-20 11:48:12 UTC
Created attachment 24402 [details]
Fix Character/Paragraph formatting problem in ListLevel.java
Comment 4 ashokkumar755 2009-10-20 11:48:56 UTC
There is another related bug in Class ListTables which causes discrepancy in the way MS Word displays multiple lists in a document. Most of the time Word replaces bullets of 2nd and subsequent lists in the document, with bullets retrieved from the first LSTF/LVLF records. 

This seems to occur due to re-ordering of ListData(LSTF/LVLF) when it is read into a HashMap (_listMap) in ListTables constructor method and then written back to the word document in method writeListDataTo(). 

Even though the  MS Word (97-2007) Binary File format specification (Paragraph List Formatting, page 48, point 3) states that the LSTF record will be identified with "List id" from corresponding LFO record by searching, this doesn't seem to happen inside Word, still there is some dependency on the original order of LSTF/LVLF records.  

This is fixed in the attached ListTables.java source with the use of a ArrayList (_ListArr) in the Constructor and writeListDataTo methods in place of the HashMap (_ListMap).

Also attached is the changed ListLevel.java (method toByteArray()) source for fixing the Character/Paragraph formatting errors.
Comment 5 Josh Micich 2009-10-21 14:44:36 UTC
Rather than just making arbitrary changes to the code, it would be nice to have some additional justification - junits are usually a good start.  Can you give details as to how the current code causes errors?

I understand very little about the code that you are changing, but there seems to be a few outstanding issues in your patch that either need fixing or explaining.

ListLevel.java

Problems with the order of field serialisation.

the current order is:
<init>()      { .., cbGrpprlChpx, cbGrpprlPapx, .., grpprlPapx, grpprlChpx, .. }
toByteArray() { .., cbGrpprlChpx, cbGrpprlPapx, .., grpprlChpx, grpprlPapx, .. }

but you suggest:
<init>()      { .., cbGrpprlPapx, cbGrpprlChpx, .., grpprlPapx, grpprlChpx, .. }
toByteArray() { .., cbGrpprlPapx, cbGrpprlChpx, .., grpprlPapx, grpprlChpx, ..  }

So the patch does the exact opposite of what you describe in (comment 0) which suggests the correct code should be 'Ch' before 'Pa'.  Note - the ordering in the current code is internally inconsistent at lines 106-109 (the only place that currently has 'Pa' before 'Ch').  Perhaps this is the one place you wanted to change.

I am pretty sure that the proposed change at line 122 makes no difference:
-    _numberText[x] = (char)LittleEndian.getShort(buf, offset);
+    _numberText[x] = (char)LittleEndian.getUShort(buf, offset);


ListTables.java

You've made a new field _listArr which seems to store the same data as _listMap.   Around line 106 should there be an extra line added, similar to what you have done at line 62?
     _listMap.put(new Integer(lsid), lst);
+    _listArr.add(lst);

It would also be good to add a comment explaining why you're forcing the ListData objects to be serialised in the same order they were read in.



Lastly, can you make your changes with respect to the latest code from trunk?  Your original change seems to be from prior to svn r805492 .  Try composing the patch using "svn diff" or "git diff".  Patch/diff files are easier to manage.
Comment 6 ashokkumar755 2009-10-22 01:25:07 UTC
Hi Josh,
Thanks for replying. I'll try to give the requested information in the appropriate format shortly.  But in the meanwhile to clear up your understanding of the problem I will re-state it again:

1. ListLevel.java: 
   Only issue here is in the toByteArray() method, field grpprlPapx need to be 
   copied before grpprlChpx into buf. Please ignore any other changes that you 
   found in the code.

2. ListTables.java 
   Here again the only problem I faced is with multiple lists, Word was not 
   displaying the lists consistently it was showing bullets and formatting from 
   the first list (in LSTF/LVLF records) for most of the other lists. This I 
   found out while testing is caused due to re-ordering of LSTF/LVLF records 
   when using a HashMap. With the ArrayList (as the order is preserved) the 
   Word is displaying all the lists as expected.

Lastly I am using source takens out of poi-src-3.5-FINAL-20090928.zip.

Thanks Again.
Comment 7 ashokkumar755 2009-10-22 01:26:22 UTC
Hi Josh,
Thanks for replying. I'll try to give the requested information in the appropriate format shortly.  But in the meanwhile to clear up your understanding of the problem I will re-state it again:

1. ListLevel.java: 
   Only issue here is in the toByteArray() method, field grpprlPapx need to be 
   copied before grpprlChpx into buf. Please ignore any other changes that you 
   found in the code.

2. ListTables.java 
   Here again the only problem I faced is with multiple lists, Word was not 
   displaying the lists consistently it was showing bullets and formatting from 
   the first list (in LSTF/LVLF records) for most of the other lists. This I 
   found out while testing is caused due to re-ordering of LSTF/LVLF records 
   when using a HashMap. With the ArrayList (as the order is preserved) the 
   Word is displaying all the lists as expected.

Lastly I am using source taken out of poi-src-3.5-FINAL-20090928.zip.

Thanks Again.
Comment 8 Nick Burch 2010-09-20 11:23:14 UTC
I believe this has been fixed in r998943, along with lots of unit tests that seem to show everything working fine.

If the problem still remains, please re-open the bug, and ideally upload a failing unit test that shows the problem.