56384 – [PATCH] Nested Field Codes are not parsed correctly

Summary: [PATCH] Nested Field Codes are not parsed correctly

Status:	NEW

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HWPF
Version:	3.9-FINAL
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:	PatchAvailable

Depends on:
Blocks:

Reported:	2014-04-10 16:07 UTC by Josh Holthaus
Modified:	2015-08-10 09:43 UTC
CC List:	0 users

Attachments
patch (13.44 KB, text/plain) 2014-04-10 16:07 UTC, Josh Holthaus
Sample Document (27.00 KB, application/octet-stream) 2014-04-10 16:09 UTC, Josh Holthaus

Description Josh Holthaus 2014-04-10 16:07:21 UTC

Created attachment 31508
patch

When a document contains nested field codes, only the innermost field code is processed. This causes unwanted text to appear when converting a document to HTML.

Example field code:
{ SYMBOL { =32+ { SEQ IPItem \n } \f "TDPUNDNO" \h \s 10 }{ADVANCE \l 11 }
This example has one outer field code, SYMBOL, which contains two more field codes nested within it.

Comment 1 Josh Holthaus 2014-04-10 16:09:05 UTC

Created attachment 31509
Sample Document

This is a sample document that contains nested field codes. The JUnit test included in the patch runs against this file.

Comment 2 Dominik Stadler 2015-03-11 20:38:55 UTC

Hm, the patch seems to remove quite a bit of code. I don't know this area well, so I am not sure why that code was necessary before and how it becomes obsolete after your changes. Can you please explain in a few sentences how you achieved the same behaviour plus the fix with your changes?

Comment 3 Josh Holthaus 2015-03-12 21:11:54 UTC

I am basically redoing the iteration in the parseFieldStructureImpl method. The current switch statement ignores a field code if a second FIELD_BEGIN_MARK is encountered before a FIELD_END_MARK. Example: { OUTER_FIELD { INNER_FIELD }}. In this case the OUTER_FIELD would be ignored because the INNER_FIELD starts before the OUTER_FIELD is closed. I changed the method to store all the begin marks in a list, which retains their order when fields are nested. When a FIELD_END_MARK is encountered, I take the last FIELD_BEGIN_MARK and its corresponding FIELD_SEPARATOR_MARK and add them to the results. I no longer need the binarySearch method to determine the next start, since the method now iterates through all fields between startOffsetInclusive and endOffsetExclusive.
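
For readers without the attachment at hand, the pairing logic described in Comment 3 can be sketched roughly as follows. This is a minimal illustration and not the attached patch: the Mark, MarkAt and Field types and the parse method are hypothetical stand-ins rather than POI classes, and the sketch assumes the marks arrive already sorted by offset. Each begin mark is pushed onto a stack, a separator is attached to the innermost open field, and an end mark closes that innermost field.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class NestedFieldSketch {
    // Hypothetical stand-ins for the begin/separator/end marks named in Comment 3.
    enum Mark { BEGIN, SEPARATOR, END }

    // One mark at a character offset (illustrative type, not a POI class).
    static final class MarkAt {
        final Mark type;
        final int offset;
        MarkAt(Mark type, int offset) { this.type = type; this.offset = offset; }
    }

    // A completed field; separator is -1 when the field has no separator mark.
    static final class Field {
        final int begin;
        final int separator;
        final int end;
        Field(int begin, int separator, int end) {
            this.begin = begin;
            this.separator = separator;
            this.end = end;
        }
    }

    // Pairs marks into fields. Inner fields are emitted before the fields
    // that enclose them, because they are closed first.
    static List<Field> parse(List<MarkAt> marks) {
        // Each stack entry holds {beginOffset, separatorOffset} for an open field.
        Deque<int[]> open = new ArrayDeque<>();
        List<Field> result = new ArrayList<>();
        for (MarkAt m : marks) {
            switch (m.type) {
                case BEGIN:
                    open.push(new int[] { m.offset, -1 });
                    break;
                case SEPARATOR:
                    // Attach the separator to the innermost open field, if any.
                    if (!open.isEmpty()) {
                        open.peek()[1] = m.offset;
                    }
                    break;
                case END:
                    // Close the innermost open field; a stray end mark is skipped.
                    if (!open.isEmpty()) {
                        int[] b = open.pop();
                        result.add(new Field(b[0], b[1], m.offset));
                    }
                    break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mark layout for something shaped like { OUTER_FIELD { INNER_FIELD }}
        List<MarkAt> marks = Arrays.asList(
                new MarkAt(Mark.BEGIN, 0),
                new MarkAt(Mark.BEGIN, 14),
                new MarkAt(Mark.END, 28),
                new MarkAt(Mark.END, 29));
        for (Field f : parse(marks)) {
            System.out.println("field " + f.begin + ".." + f.end);
        }
        // prints "field 14..28" then "field 0..29"
    }
}
```

With a single "current field" slot, the second BEGIN would overwrite the first and the outer field would be lost, which matches the symptom reported here. Keeping every open begin mark on a stack (or a list used as one) is what lets the outer field survive until its own end mark is reached.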