Summary: | [PATCH] Nested Field Codes are not parsed correctly | ||
---|---|---|---|
Product: | POI | Reporter: | Josh Holthaus <josh.holthaus> |
Component: | HWPF | Assignee: | POI Developers List <dev> |
Status: | NEW --- | ||
Severity: | normal | Keywords: | PatchAvailable |
Priority: | P2 | ||
Version: | 3.9-FINAL | ||
Target Milestone: | --- | ||
Hardware: | PC | ||
OS: | All | ||
Attachments: | patch, Sample Document | ||
Created attachment 31509 [details]
Sample Document
This is a sample document that contains nested field codes. The patch runs junit against this file.
Hm, the patch seems to remove quite a bit of code. I don't know this area well, so I am not sure why that code was necessary before or how it becomes obsolete after your changes. Can you please explain in a few sentences how you kept the same behaviour and added the fix with your changes?

I am basically redoing the iteration in the parseFieldStructureImpl method. The current switch statement ignores a field code if a second FIELD_BEGIN_MARK is encountered before a FIELD_END_MARK. Example: { OUTER_FIELD { INNER_FIELD }}. In this case OUTER_FIELD is ignored because INNER_FIELD starts before OUTER_FIELD is closed. I changed the method to store all the begin marks in a list so that their order is retained when fields are nested. When a FIELD_END_MARK is encountered, I take the last FIELD_BEGIN_MARK and its corresponding FIELD_SEPARATOR_MARK and add them to the results. I no longer need the binarySearch method to determine the next start, since the method now iterates through all fields between startOffsetInclusive and endOffsetExclusive.
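For readers who have not opened the patch, the matching idea described above can be sketched roughly as follows. This is not the patch itself and does not use the HWPF classes: NestedFieldSketch, Kind, Mark, Field, Open and parse are hypothetical names made up for illustration, and the offsets are arbitrary. It assumes the marks arrive in document order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in types for illustration; these are NOT the HWPF classes.
class NestedFieldSketch {
    enum Kind { BEGIN, SEPARATOR, END }

    // One field mark and its character offset in the document.
    static final class Mark {
        final Kind kind;
        final int offset;
        Mark(Kind kind, int offset) { this.kind = kind; this.offset = offset; }
    }

    // A completed field: begin, optional separator (-1 if absent), end.
    static final class Field {
        final int begin;
        final int separator;
        final int end;
        Field(int begin, int separator, int end) {
            this.begin = begin;
            this.separator = separator;
            this.end = end;
        }
        @Override public String toString() {
            return "[" + begin + "," + separator + "," + end + "]";
        }
    }

    // A field whose begin mark has been seen but whose end mark has not.
    private static final class Open {
        final int begin;
        int separator = -1;
        Open(int begin) { this.begin = begin; }
    }

    // Walks the marks in order, keeping every unclosed begin mark so that
    // an inner BEGIN does not discard the outer field.
    static List<Field> parse(List<Mark> marks) {
        Deque<Open> open = new ArrayDeque<>();
        List<Field> result = new ArrayList<>();
        for (Mark m : marks) {
            switch (m.kind) {
                case BEGIN:
                    open.push(new Open(m.offset));
                    break;
                case SEPARATOR:
                    // The separator belongs to the most recently opened field.
                    if (!open.isEmpty()) open.peek().separator = m.offset;
                    break;
                case END:
                    // Close the most recently opened field; unmatched ends are skipped.
                    if (!open.isEmpty()) {
                        Open o = open.pop();
                        result.add(new Field(o.begin, o.separator, m.offset));
                    }
                    break;
            }
        }
        return result;
    }
}
```

With marks for { OUTER_FIELD { INNER_FIELD }}, the sketch reports both fields, inner first, because a field is only added to the results once its end mark is reached. Whether the real method needs the results sorted by start offset afterwards is not something this sketch settles.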
Created attachment 31508 [details]
patch
When a document contains nested field codes, only the innermost field code is processed. This causes unwanted text to appear when converting a document to HTML. Example field code: { SYMBOL { =32+ { SEQ IPItem \n } \f "TDPUNDNO" \h \s 10 }{ADVANCE \l 11 } This example has one field code, SYMBOL, which contains two more field codes within.