Bug 56384 - [PATCH] Nested Field Codes are not parsed correctly
Summary: [PATCH] Nested Field Codes are not parsed correctly
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.9-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Keywords: PatchAvailable
Depends on:
Reported: 2014-04-10 16:07 UTC by Josh Holthaus
Modified: 2015-08-10 09:43 UTC

patch (13.44 KB, text/plain)
2014-04-10 16:07 UTC, Josh Holthaus
Sample Document (27.00 KB, application/octet-stream)
2014-04-10 16:09 UTC, Josh Holthaus

Description Josh Holthaus 2014-04-10 16:07:21 UTC
Created attachment 31508

When a document contains nested field codes, only the innermost field code is processed. This causes unwanted text to appear when converting a document to HTML.

Example field code:
{ SYMBOL { =32+ { SEQ IPItem \n } \f "TDPUNDNO" \h \s 10 }{ADVANCE \l 11 }
This example has one field code, SYMBOL, which contains two more field codes nested within it.
Comment 1 Josh Holthaus 2014-04-10 16:09:05 UTC
Created attachment 31509
Sample Document

This is a sample document that contains nested field codes. The patch runs JUnit against this file.
Comment 2 Dominik Stadler 2015-03-11 20:38:55 UTC
Hm, the patch seems to remove quite a bit of code. I don't know this area well, so I am not sure why that code was necessary before and how it becomes obsolete after your changes. Can you please explain in a few sentences how you managed to keep the same behaviour and add the fix with your changes?
Comment 3 Josh Holthaus 2015-03-12 21:11:54 UTC
I am basically redoing the iteration in the parseFieldStructureImpl method. The current switch statement ignores a field code if a second FIELD_BEGIN_MARK is encountered before a FIELD_END_MARK. Example: { OUTER_FIELD { INNER_FIELD }}. In this case, OUTER_FIELD would be ignored because INNER_FIELD starts before OUTER_FIELD is closed. I changed the method to store all the begin marks in a list so that their order is retained when fields are nested. When a FIELD_END_MARK is encountered, I take the last FIELD_BEGIN_MARK and its corresponding FIELD_SEPARATOR_MARK and add them to the results. I no longer need the binarySearch method to determine the next start, since the method now iterates through all fields between startOffsetInclusive and endOffsetExclusive.
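
For readers who do not have the patch open, the approach described above can be sketched roughly as follows. This is not the patch itself and does not use the HWPF classes: Mark, MarkType, ParsedField and parse are hypothetical names invented for illustration, and a Deque is used as a stack in place of the list of begin marks. It only shows why keeping every open begin mark lets the outer field survive when an inner field begins before the outer one ends.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class NestedFieldSketch {
    // Hypothetical stand-ins for the three kinds of field marks.
    enum MarkType { BEGIN, SEPARATOR, END }

    /** Hypothetical field mark: its type and its character offset. */
    static final class Mark {
        final MarkType type;
        final int offset;
        Mark(MarkType type, int offset) { this.type = type; this.offset = offset; }
    }

    /** Hypothetical result: offsets of one completed field (separator is -1 if absent). */
    static final class ParsedField {
        final int begin, separator, end;
        ParsedField(int begin, int separator, int end) {
            this.begin = begin; this.separator = separator; this.end = end;
        }
        @Override public String toString() {
            return "[" + begin + "," + separator + "," + end + "]";
        }
    }

    /** An open (not yet closed) field: its begin offset and any separator seen for it. */
    private static final class Open {
        final int begin;
        int separator = -1;
        Open(int begin) { this.begin = begin; }
    }

    /** Pairs each END mark with the most recent unclosed BEGIN mark. */
    static List<ParsedField> parse(List<Mark> marks) {
        Deque<Open> open = new ArrayDeque<>();
        List<ParsedField> result = new ArrayList<>();
        for (Mark m : marks) {
            switch (m.type) {
                case BEGIN:
                    // Remember every begin mark instead of dropping the outer one.
                    open.push(new Open(m.offset));
                    break;
                case SEPARATOR:
                    // A separator belongs to the innermost field that is still open.
                    if (!open.isEmpty()) open.peek().separator = m.offset;
                    break;
                case END:
                    // Close the innermost open field; unmatched end marks are skipped.
                    if (!open.isEmpty()) {
                        Open o = open.pop();
                        result.add(new ParsedField(o.begin, o.separator, m.offset));
                    }
                    break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Offsets of the braces in "{ OUTER_FIELD { INNER_FIELD }}".
        List<Mark> marks = Arrays.asList(
                new Mark(MarkType.BEGIN, 0),
                new Mark(MarkType.BEGIN, 14),
                new Mark(MarkType.END, 28),
                new Mark(MarkType.END, 29));
        // Both fields are reported; the inner one is completed first.
        System.out.println(parse(marks));
    }
}
```

In this sketch the results come out in closing order (inner field first), so a caller that needs document order would have to sort them by begin offset afterwards. Whether the patch does this is not stated in the comment above.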