Bug 49820 - ParagraphProperties.getLvl() returns 0 for both Level 1 and Body text
Status: RESOLVED FIXED
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.7-dev
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assigned To: POI Developers List
Depends on:
Blocks:
Reported: 2010-08-25 07:16 UTC by Viliam Anirud
Modified: 2010-09-20 07:46 UTC
CC List: 0 users



Attachments
Fixed version (29.70 KB, text/x-java)
2010-08-25 07:16 UTC, Viliam Anirud
Patched files and test case (29.73 KB, application/octet-stream)
2010-08-26 02:17 UTC, Viliam Anirud
Patched files and test case version 2 (26.23 KB, application/octet-stream)
2010-09-08 03:01 UTC, Viliam Anirud
Diff file to be applied to repository (2.34 KB, patch)
2010-09-13 05:14 UTC, Viliam Anirud
New files not included in the diff (7.67 KB, application/octet-stream)
2010-09-13 05:15 UTC, Viliam Anirud

Description Viliam Anirud 2010-08-25 07:16:27 UTC
Created attachment 25934
Fixed version

When you call ParagraphProperties.getLvl() for any style that is part of the outline, it returns a value from 0 to 8 for Heading 1 to Heading 9. But for normal styles it also returns 0, so Heading 1 and body text are indistinguishable.

The MICROSOFT OFFICE WORD 97-2007 BINARY FILE FORMAT SPECIFICATION states the following:

  The standard PAP is all zeros except: 
  fWidowControl  1 
  fMultLineSpace  1 
  dyaLine  240 twips 
  Lvl  9

I solved this problem by changing the initial value of the property org.apache.poi.hwpf.model.types.PAPAbstractType#field_58_lvl to 9. After this, I can read the same outline levels that Word shows: getLvl() returns values from 0 to 9, where 9 is Body text and 0..8 are outline levels 1..9.

Attached is a modified version of PAPAbstractType.java, which also changes the initial value of the property field_17_fWidowControl according to the specification. There are no properties for fMultLineSpace and dyaLine in that class; they are probably for some newer version of Word.
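To make the convention above concrete, here is a minimal Java sketch of how a caller might interpret the value returned by getLvl() once the default is 9. The class and method names (OutlineLevelSketch, describe) are hypothetical and not part of POI; the mapping assumed is the one stated in this report: 0..8 are outline levels 1..9 and 9 is Body text.

```java
/**
 * Hypothetical helper (not part of POI) that turns a PAP "lvl" value into
 * the outline level Word displays. Assumes the convention from this report:
 * 0..8 = Level 1..9, 9 = Body text (the standard PAP default).
 */
final class OutlineLevelSketch {
    /** Default lvl of the standard PAP, meaning "Body text". */
    static final int BODY_TEXT = 9;

    private OutlineLevelSketch() {}

    static boolean isBodyText(int lvl) {
        return lvl == BODY_TEXT;
    }

    /** Returns the label Word shows for the given lvl value. */
    static String describe(int lvl) {
        if (isBodyText(lvl)) {
            return "Body text";
        }
        if (lvl < 0 || lvl > 8) {
            throw new IllegalArgumentException("lvl out of range: " + lvl);
        }
        return "Level " + (lvl + 1); // lvl is zero-based
    }

    public static void main(String[] args) {
        // With a default of 0, body text would be reported as "Level 1" here.
        for (int lvl : new int[] {0, 4, 8, 9}) {
            System.out.println(lvl + " -> " + describe(lvl));
        }
    }
}
```

The sentinel 9 only works if the default really is 9; with a default of 0, body text and Heading 1 collapse into the same value, which is the bug reported here.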
Comment 1 Nick Burch 2010-08-25 07:46:33 UTC
Any chance you could do a quick unit test that shows it still getting the correct values for the headings after the change, as well as getting the correct value for normal text with the change?
Comment 2 Viliam Anirud 2010-08-26 02:12:05 UTC
I studied the problem further and found several other problems:

- the ParagraphSprmUncompressor class incorrectly stores the value of operation 0x40 in ilvl; according to the binary format specification, it should go to lvl

- the lvl in operation 0x40 should always be set, not only when istd is between 1 and 9. The binary format specification does state that condition, but in Word you can set the outline level of any paragraph except Heading 1..9, and with the condition in place POI does not see it. With the condition commented out, everything seems OK (see the test case).

- I added a getLvl() method to Paragraph, which makes it possible to read the outline level of an individual paragraph.
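The first two points can be illustrated with a simplified Java sketch. It assumes a stripped-down model of the paragraph properties; SprmOutLvlSketch, applyBefore and applyAfter are hypothetical names, and this is not POI's actual ParagraphSprmUncompressor code. It contrasts storing operation 0x40 in ilvl only for istd 1..9 with always storing it in lvl.

```java
/**
 * Hypothetical, simplified model of how operation 0x40 (outline level)
 * updates paragraph properties. Not POI's ParagraphSprmUncompressor.
 */
final class SprmOutLvlSketch {
    static final int OP_OUTLINE_LEVEL = 0x40;

    int istd;     // style index of the paragraph
    int lvl = 9;  // outline level; 9 = Body text (standard PAP default)
    int ilvl;     // list level: a different property, defaults to 0

    SprmOutLvlSketch(int istd) {
        this.istd = istd;
    }

    /** Behaviour described in the report: wrong field, and only for istd 1..9. */
    void applyBefore(int operation, int operand) {
        if (operation == OP_OUTLINE_LEVEL && istd >= 1 && istd <= 9) {
            ilvl = operand;
        }
    }

    /** Proposed behaviour: always store the operand in lvl. */
    void applyAfter(int operation, int operand) {
        if (operation == OP_OUTLINE_LEVEL) {
            lvl = operand;
        }
    }

    public static void main(String[] args) {
        // A non-heading paragraph (istd 0 here) set to outline Level 5 in Word.
        SprmOutLvlSketch before = new SprmOutLvlSketch(0);
        before.applyBefore(OP_OUTLINE_LEVEL, 4);
        SprmOutLvlSketch after = new SprmOutLvlSketch(0);
        after.applyAfter(OP_OUTLINE_LEVEL, 4);
        System.out.println("before: lvl=" + before.lvl + ", ilvl=" + before.ilvl);
        System.out.println("after:  lvl=" + after.lvl + ", ilvl=" + after.ilvl);
    }
}
```

In the "before" variant the operand is dropped entirely for a non-heading style, and even for headings it lands in the list level, so getLvl() never sees it.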

Attached is the JUnit test case. This is its output with the current version (3.7b2); the style level is shown in place of the paragraph level, because the current version does not support reading the level of an individual paragraph:
Style level: 0, paragraph level: 0, text: Heading 1
Style level: 0, paragraph level: 0, text: Heading 2
Style level: 0, paragraph level: 0, text: Heading 3
Style level: 0, paragraph level: 0, text: Heading 4
Style level: 0, paragraph level: 0, text: Heading 5
Style level: 0, paragraph level: 0, text: Heading 6
Style level: 0, paragraph level: 0, text: Heading 7
Style level: 0, paragraph level: 0, text: Heading 8
Style level: 0, paragraph level: 0, text: Heading 9
Style level: 0, paragraph level: 0, text: Body text with unchanged outline level
Style level: 0, paragraph level: 0, text: Body text with outline level changed to Level 1
Style level: 0, paragraph level: 0, text: Body text with outline level changed to Level 5


This is the output after applying the attached patches to version 3.7b2:
Style level: 0, paragraph level: 0, text: Heading 1
Style level: 1, paragraph level: 1, text: Heading 2
Style level: 2, paragraph level: 2, text: Heading 3
Style level: 3, paragraph level: 3, text: Heading 4
Style level: 4, paragraph level: 4, text: Heading 5
Style level: 5, paragraph level: 5, text: Heading 6
Style level: 6, paragraph level: 6, text: Heading 7
Style level: 7, paragraph level: 7, text: Heading 8
Style level: 8, paragraph level: 8, text: Heading 9
Style level: 9, paragraph level: 9, text: Body text with unchanged outline level
Style level: 9, paragraph level: 0, text: Body text with outline level changed to Level 1
Style level: 9, paragraph level: 4, text: Body text with outline level changed to Level 5


Specification used: http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/Word97-2007BinaryFileFormat(doc)Specification.pdf
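The last rows of the patched output follow from a simple rule, sketched below under the assumption that a paragraph's own outline-level setting (operation 0x40) overrides the level inherited from its style. EffectiveLevelSketch and paragraphLevel are hypothetical names; the sketch only recomputes the reported values and does not read a .doc file.

```java
/**
 * Hypothetical sketch (not POI code): a paragraph inherits lvl from its
 * style unless its own properties carry an outline-level override.
 */
final class EffectiveLevelSketch {
    private EffectiveLevelSketch() {}

    /** override == null means the paragraph does not set its own outline level. */
    static int paragraphLevel(int styleLevel, Integer override) {
        return override != null ? override : styleLevel;
    }

    public static void main(String[] args) {
        String[] texts = {
            "Heading 1",
            "Body text with unchanged outline level",
            "Body text with outline level changed to Level 1",
            "Body text with outline level changed to Level 5",
        };
        int[] styleLevels = {0, 9, 9, 9};
        Integer[] overrides = {null, null, 0, 4};
        for (int i = 0; i < texts.length; i++) {
            System.out.printf("Style level: %d, paragraph level: %d, text: %s%n",
                    styleLevels[i], paragraphLevel(styleLevels[i], overrides[i]), texts[i]);
        }
    }
}
```

This is why the style level stays 9 for all three body-text rows while the paragraph level differs: only the paragraph carries the changed outline level.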
Comment 3 Viliam Anirud 2010-08-26 02:17:00 UTC
Created attachment 25946
Patched files and test case
Comment 4 Viliam Anirud 2010-09-08 03:00:31 UTC
I found a mistake in my patch. I added the initial values to PAPAbstractType, but they were already set in the ParagraphProperties constructor. However, the constructor incorrectly assigned the initial value for lvl to ilvl, which is a different property; that is the problem to be fixed.

I will attach a new patch. Please apply it to version 3.7 so that we can use the final release without patching.
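The constructor mix-up can be pictured with a minimal Java sketch of the default PAP, assuming the defaults quoted from the specification in the description (fWidowControl 1, fMultLineSpace 1, dyaLine 240 twips, lvl 9). DefaultPapSketch and its factory methods are hypothetical; they only mirror field names from this bug and are not the real ParagraphProperties class.

```java
/**
 * Hypothetical model of the standard PAP defaults (not POI's ParagraphProperties).
 * All fields start at zero, matching "the standard PAP is all zeros except ...".
 */
final class DefaultPapSketch {
    int fWidowControl;
    int fMultLineSpace;
    int dyaLine;   // line spacing in twips
    int lvl;       // outline level
    int ilvl;      // list level, a different property

    private DefaultPapSketch() {}

    /** Non-zero defaults shared by both variants below. */
    private static DefaultPapSketch withCommonDefaults() {
        DefaultPapSketch pap = new DefaultPapSketch();
        pap.fWidowControl = 1;
        pap.fMultLineSpace = 1;
        pap.dyaLine = 240;
        return pap;
    }

    /** The mistake: the default meant for lvl is written to ilvl. */
    static DefaultPapSketch buggy() {
        DefaultPapSketch pap = withCommonDefaults();
        pap.ilvl = 9;
        return pap;
    }

    /** The fix: lvl gets 9 (Body text) and ilvl stays 0. */
    static DefaultPapSketch fixed() {
        DefaultPapSketch pap = withCommonDefaults();
        pap.lvl = 9;
        return pap;
    }

    public static void main(String[] args) {
        DefaultPapSketch b = buggy();
        DefaultPapSketch f = fixed();
        System.out.println("buggy: lvl=" + b.lvl + ", ilvl=" + b.ilvl);
        System.out.println("fixed: lvl=" + f.lvl + ", ilvl=" + f.ilvl);
    }
}
```

With the buggy defaults, every paragraph without an explicit outline level keeps lvl 0, which is exactly the "Level 1 and Body text both return 0" symptom from the title.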
Comment 5 Viliam Anirud 2010-09-08 03:01:39 UTC
Created attachment 26000
Patched files and test case version 2
Comment 6 Viliam Anirud 2010-09-13 05:14:58 UTC
Created attachment 26020
Diff file to be applied to repository

This is a patch, along with a test case, created following the rules in the POI Contribution Guidelines. It is based on the current repository head. Please review it and merge it if appropriate.
Comment 7 Viliam Anirud 2010-09-13 05:15:48 UTC
Created attachment 26021
New files not included in the diff
Comment 8 Nick Burch 2010-09-20 07:46:02 UTC
Thanks, patch applied in r998897.