Bug 45001 - Range.insertBefore() and Range.delete() fail with Unicode characters
Summary: Range.insertBefore() and Range.delete() fail with Unicode characters
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF (show other bugs)
Version: 3.0-dev
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks: 45252
  Show dependency tree
 
Reported: 2008-05-15 00:21 UTC by N. Hira
Modified: 2008-06-28 11:53 UTC (History)
0 users



Attachments
patch to Range, FileInformationBlock, and TextPiece to address problem described above (see bug text for limitations of patch) (4.64 KB, patch)
2008-05-15 00:21 UTC, N. Hira
Details | Diff
JUnit to test Range.insertBefore() when Range uses Unicode (use with sample document from next attachment) (3.85 KB, text/plain)
2008-06-13 22:38 UTC, N. Hira
Details
Sample document used to test Range.insertBefore() when Range uses Unicode (use with test case from previous attachment) (102.00 KB, application/msword)
2008-06-13 22:42 UTC, N. Hira
Details
Zip file that contains a patch, a test case, and an MS Word document to support the test case (5.42 KB, application/zip)
2008-06-18 16:12 UTC, N. Hira
Details
Patch for TextPiece, with unit test and illustrative document showing problem with delete() (5.77 KB, application/zip)
2008-06-22 21:03 UTC, N. Hira
Details

Note You need to log in before you can comment on or make changes to this bug.
Description N. Hira 2008-05-15 00:21:34 UTC
Created attachment 21966 [details]
patch to Range, FileInformationBlock, and TextPiece to address problem described above (see bug text for limitations of patch)

When OpenOffice.org creates MS Word 97 formatted *.doc files, it uses Unicode.  When Range.insertBefore() and Range.delete() are used with these multi-byte representations, a couple of different problems occur:
1.  The indices are not calculated correctly so delete() seems to delete arbitrary characters or fail with IndexOutOfBoundsExceptions
2.  For the same reason, insertBefore() seems to insert text at an arbitrary position and subsequent operations fail with IndexOutOfBoundsExceptions

There is a marginally related problem with these operations; they do not update FileInformationBlock.CCPText, and this throws OpenOffice.org for a loop.  It stops reading character text prematurely and renders document headers and footers incorrectly.

(see attachment for a partial patch to address both problems; note that the patch does not address overloaded versions of insertBefore(), nor does it address insertAfter())
Comment 1 Nick Burch 2008-05-20 09:57:27 UTC
Thanks for this patch, applied to trunk

Any chance you could also do us a little unit test, so we can be sure this doesn't get broken again in the future? I'm leaving the bug open for now, until we've got one
Comment 2 N. Hira 2008-06-13 22:38:46 UTC
Created attachment 22125 [details]
JUnit to test Range.insertBefore() when Range uses Unicode (use with sample document from next attachment)
Comment 3 N. Hira 2008-06-13 22:42:41 UTC
Created attachment 22126 [details]
Sample document used to test Range.insertBefore() when Range uses Unicode (use with test case from previous attachment)
Comment 4 N. Hira 2008-06-13 22:49:19 UTC
Sorry for the delay.  Also have a replaceText() that can be cleaned up and would make a great addition to the API for mail-merge-type uses...

/**
 *	Replace (one instance of) a piece of text with another...
 *
 *	@param	pPlaceHolder		The text to be replaced (e.g., "${company}")
 *	@param	pValue				The replacement text (e.g., "Cognocys, Inc.")
 *	@param	pDocument			The <code>HWPFDocument</code> in which the placeholder was found
 *	@param	pStartOffset		The offset or index where the <code>CharacterRun</code> begins
 *	@param	pPlaceHolderIndex	The offset or index of the placeholder, relative to the
 *								<code>CharacterRun</code> where <code>pPlaceHolder</code> was found
 *
 *	@throws	DocumentFillerException
 */
protected void replaceText(String pPlaceHolder, String pValue, 
	int pStartOffset, int pPlaceHolderIndex, HWPFDocument pDocument) 
	throws DocumentFillerException {

	int absPlaceHolderIndex = pStartOffset + pPlaceHolderIndex;
	Range subRange = new Range(
		absPlaceHolderIndex, 
		(absPlaceHolderIndex + pPlaceHolder.length()), pDocument);
	if (subRange.usesUnicode()) {

		absPlaceHolderIndex = pStartOffset + (pPlaceHolderIndex * 2);
		subRange = new Range(
			absPlaceHolderIndex, 
			(absPlaceHolderIndex + (pPlaceHolder.length() * 2)), 
			pDocument);
	}

	subRange.insertBefore(pValue);

	// re-create the sub-range so we can delete it
	subRange = new Range(
		(absPlaceHolderIndex + pValue.length()),
		(absPlaceHolderIndex + pPlaceHolder.length() + pValue.length()), 
			pDocument);
	if (subRange.usesUnicode())
		subRange = new Range(
			(absPlaceHolderIndex + (pValue.length() * 2)),
			(absPlaceHolderIndex + (pPlaceHolder.length() * 2) + 
				(pValue.length() * 2)), pDocument);

	subRange.delete();
}
Comment 5 Nick Burch 2008-06-16 05:50:33 UTC
Thanks for the test case, added to svn

Any chance you could do a unit test for your new replaceText method too? I've added that to svn too.
Comment 6 N. Hira 2008-06-18 16:12:49 UTC
Created attachment 22139 [details]
Zip file that contains a patch, a test case, and an MS Word document to support the test case
Comment 7 N. Hira 2008-06-18 16:14:44 UTC
The attachment includes the test case and a patch to Range to simplify replaceText()...

(Thanks, Nick.)
Comment 8 Nick Burch 2008-06-19 04:47:52 UTC
Thanks for this patch+test, applied to svn
Comment 9 N. Hira 2008-06-22 20:57:39 UTC
Follow up...

(Please let me know if I should create a new bug for this kind of thing in future.)

I've discovered that the original patch to TextPiece does not function as expected in that a delete() on a Unicode TextPiece results in the TextPiece being adjusted to an incorrect length.  

For every N characters deleted, the new length should be (previousLength - N), but the current code sets it to (previousLength - (N/2)) when the TextPiece uses Unicode.

I've attached a Unit Test, an illustrative document, and another patch to (the current version of) TextPiece.

The Unit Test also illustrates how one can delete all instances of some text from a Range.
Comment 10 N. Hira 2008-06-22 21:03:55 UTC
Created attachment 22156 [details]
Patch for TextPiece, with unit test and illustrative document showing problem with delete()
Comment 11 Nick Burch 2008-06-28 11:53:55 UTC
Thanks for the latest patch + test, applied to svn trunk