Bug 52285 - [Patch] Enhance XWPF Paragraph to parse (nested) smart tags
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-12-05 12:48 UTC by Fabian Lange
Modified: 2011-12-06 04:31 UTC



Attachments
patch tar.gz and xml file (14.87 KB, application/zip)
2011-12-05 12:48 UTC, Fabian Lange

Description Fabian Lange 2011-12-05 12:48:05 UTC
Created attachment 28026
patch tar.gz and xml file

Word sometimes adds smart tags to text entered by the user.

They might be simple, like this:
			<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags"
				w:element="country-region">
				<w:r>
					<w:rPr>
						<w:lang w:val="en-US" />
					</w:rPr>
					<w:t>India</w:t>
				</w:r>
			</w:smartTag>

or even nested:

			<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags"
				w:element="PersonName">
				<w:smartTag w:uri="urn:schemas:contacts" w:element="GivenName">
					<w:r>
						<w:rPr>
							<w:lang w:val="en-US" />
						</w:rPr>
						<w:t>Marilyn</w:t>
					</w:r>
				</w:smartTag>
				<w:r>
					<w:rPr>
						<w:lang w:val="en-US" />
					</w:rPr>
					<w:t xml:space="preserve"> </w:t>
				</w:r>
				<w:smartTag w:uri="urn:schemas:contacts" w:element="Sn">
					<w:r>
						<w:rPr>
							<w:lang w:val="en-US" />
						</w:rPr>
						<w:t>Monroe</w:t>
					</w:r>
				</w:smartTag>
			</w:smartTag>

The current paragraph implementation simply ignores instances of CTSmartTagRun.
My proposed patch introduces recursive parsing for CTSmartTagRun.
I considered parsing all tags recursively, but that made other tests fail. It might be an option for further improvement.
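
To illustrate the idea, here is a minimal, self-contained sketch of the recursion. It is not the code from the attached patch and does not use the POI or XMLBeans classes; it walks a plain DOM tree instead, and the names SmartTagTextSketch, extractText, collectRuns and appendRunText are made up for this example. Runs directly inside the paragraph are read as before, while a w:smartTag element is descended into so that nested runs are found at any depth.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration only: plain DOM, not the POI implementation.
final class SmartTagTextSketch {
    // WordprocessingML main namespace (the "w" prefix in the samples above).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parses a single <w:p> fragment and returns the text of all its runs.
    static String extractText(String paragraphXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(paragraphXml)));
        StringBuilder out = new StringBuilder();
        collectRuns(doc.getDocumentElement(), out);
        return out.toString();
    }

    // Reads runs in document order; recurses only into smart tags.
    static void collectRuns(Element parent, StringBuilder out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !W_NS.equals(n.getNamespaceURI())) {
                continue; // skip whitespace text nodes and foreign elements
            }
            Element el = (Element) n;
            if ("r".equals(el.getLocalName())) {
                appendRunText(el, out);
            } else if ("smartTag".equals(el.getLocalName())) {
                // The smart tag's own uri/element attributes are discarded.
                collectRuns(el, out);
            }
        }
    }

    // Appends the content of every <w:t> child of a run.
    static void appendRunText(Element run, StringBuilder out) {
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W_NS.equals(n.getNamespaceURI())
                    && "t".equals(n.getLocalName())) {
                out.append(n.getTextContent());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:smartTag w:element=\"PersonName\">"
            + "<w:smartTag w:element=\"GivenName\"><w:r><w:t>Marilyn</w:t></w:r></w:smartTag>"
            + "<w:r><w:t xml:space=\"preserve\"> </w:t></w:r>"
            + "<w:smartTag w:element=\"Sn\"><w:r><w:t>Monroe</w:t></w:r></w:smartTag>"
            + "</w:smartTag></w:p>";
        System.out.println(extractText(xml)); // prints "Marilyn Monroe"
    }
}
```

Restricting the recursion to w:smartTag mirrors the choice described above: other container elements are left alone, so their handling does not change.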

This makes test cases checking for smart tags pass and fixes two issues in Tika.

Note that my implementation discards the information carried by the smart tag itself (such as its URI and element name).

The patch also contains a minor cleanup of the mixed tab/space indentation in this class, and removes a duplicate document != null check.
Comment 1 Nick Burch 2011-12-06 04:31:57 UTC
Thanks for this, applied (with a few little tweaks) in r1210774.

(For future reference, it might be better to avoid trying to reformat the rest of the code, as it makes the reviewing a bit harder. We should really standardise, but not always in a feature patch!)