Bug 56384

Summary:	[PATCH] Nested Field Codes are not parsed correctly
Product:	POI	Reporter:	Josh Holthaus <josh.holthaus>
Component:	HWPF	Assignee:	POI Developers List <dev>
Status:	NEW ---
Severity:	normal	Keywords:	PatchAvailable
Priority:	P2
Version:	3.9-FINAL
Target Milestone:	---
Hardware:	PC
OS:	All
Attachments:	patch Sample Document

Description Josh Holthaus 2014-04-10 16:07:21 UTC

Created attachment 31508
patch

When a document contains nested field codes, only the innermost field code is processed. This causes unwanted text to appear when converting a document to HTML.

Example field code:
{ SYMBOL { =32+ { SEQ IPItem \n } \f "TDPUNDNO" \h \s 10 }{ADVANCE \l 11 }
This example has one field code, SYMBOL, which contains two more field codes nested within it.

Comment 1 Josh Holthaus 2014-04-10 16:09:05 UTC

Created attachment 31509
Sample Document

This is a sample document that contains nested field codes. The patch includes a JUnit test that runs against this file.

Comment 2 Dominik Stadler 2015-03-11 20:38:55 UTC

Hm, the patch seems to remove quite a bit of code. I don't know this area well, so I am not sure why that code was necessary before and how it becomes obsolete after your changes. Can you please explain in a few sentences how you managed the same behaviour plus the fix with your changes?

Comment 3 Josh Holthaus 2015-03-12 21:11:54 UTC

I am basically redoing the iteration in the parseFieldStructureImpl method. The current switch statement will ignore a field code if a second FIELD_BEGIN_MARK is encountered before a FIELD_END_MARK. Example: { OUTER_FIELD { INNER_FIELD }}. In this case, the OUTER_FIELD would be ignored because the start of the INNER_FIELD occurs before the OUTER_FIELD is closed. I changed the method to store all the begin marks in a list so that their order is retained when fields are nested. When a FIELD_END_MARK is encountered, I take the last FIELD_BEGIN_MARK and its corresponding FIELD_SEPARATOR_MARK and add them to the results. I no longer need the binarySearch method to determine the next start, since the method now iterates through all fields between startOffsetInclusive and endOffsetExclusive.
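
For readers who have not opened the patch, the pairing idea described above can be illustrated with a small stand-alone sketch. This is not the attached patch and does not use the HWPF classes: the class NestedFieldSketch, the method pairMarks, and the parallel-array input are hypothetical names chosen for illustration. The mark values 0x13, 0x14 and 0x15 are, to my understanding, the field begin, separator and end characters of the Word binary format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration only; not the HWPF implementation or the attached patch. */
final class NestedFieldSketch {
    // Assumed mark values (field begin, separator, end characters).
    static final int FIELD_BEGIN_MARK = 0x13;
    static final int FIELD_SEPARATOR_MARK = 0x14;
    static final int FIELD_END_MARK = 0x15;

    /**
     * Pairs marks into fields. Each result is {begin, separator, end};
     * separator is -1 when the field has none. Fields are returned in the
     * order they are closed, so inner fields come before the outer one.
     */
    static List<int[]> pairMarks(int[] kinds, int[] offsets) {
        Deque<int[]> open = new ArrayDeque<>(); // begin marks not yet closed
        List<int[]> fields = new ArrayList<>();
        for (int i = 0; i < kinds.length; i++) {
            switch (kinds[i]) {
                case FIELD_BEGIN_MARK:
                    // Remember every begin mark instead of discarding the outer one.
                    open.push(new int[] { offsets[i], -1, -1 });
                    break;
                case FIELD_SEPARATOR_MARK:
                    // A separator belongs to the most recently opened field.
                    if (!open.isEmpty()) {
                        open.peek()[1] = offsets[i];
                    }
                    break;
                case FIELD_END_MARK:
                    // An end mark closes the most recently opened field.
                    if (!open.isEmpty()) {
                        int[] field = open.pop();
                        field[2] = offsets[i];
                        fields.add(field);
                    }
                    break;
                default:
                    break; // unknown mark kinds are skipped in this sketch
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Models { OUTER_FIELD { INNER_FIELD }} with made-up offsets.
        int[] kinds = { FIELD_BEGIN_MARK, FIELD_BEGIN_MARK, FIELD_END_MARK, FIELD_END_MARK };
        int[] offsets = { 0, 5, 9, 10 };
        for (int[] f : pairMarks(kinds, offsets)) {
            System.out.println("begin=" + f[0] + " separator=" + f[1] + " end=" + f[2]);
        }
    }
}
```

With this approach both the inner and the outer field end up in the result, because an end mark always closes the most recently opened begin mark. Unmatched end or separator marks are simply skipped here; the real code may need to handle them differently.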