Bug 52285

Summary: [Patch] Enhance XWPF Paragraph to parse (nested) smart tags
Product: POI
Reporter: Fabian Lange <fabian.lange>
Component: XWPF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: patch tar.gz and xml file

Description Fabian Lange 2011-12-05 12:48:05 UTC
Created attachment 28026
patch tar.gz and xml file

Word sometimes adds smart tags to text entered by the user.

They might be simple, like this:
			<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags"
				w:element="country-region">
				<w:r>
					<w:rPr>
						<w:lang w:val="en-US" />
					</w:rPr>
					<w:t>India</w:t>
				</w:r>
			</w:smartTag>

or even nested:

			<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags"
				w:element="PersonName">
				<w:smartTag w:uri="urn:schemas:contacts" w:element="GivenName">
					<w:r>
						<w:rPr>
							<w:lang w:val="en-US" />
						</w:rPr>
						<w:t>Marilyn</w:t>
					</w:r>
				</w:smartTag>
				<w:r>
					<w:rPr>
						<w:lang w:val="en-US" />
					</w:rPr>
					<w:t xml:space="preserve"> </w:t>
				</w:r>
				<w:smartTag w:uri="urn:schemas:contacts" w:element="Sn">
					<w:r>
						<w:rPr>
							<w:lang w:val="en-US" />
						</w:rPr>
						<w:t>Monroe</w:t>
					</w:r>
				</w:smartTag>
			</w:smartTag>

The previous implementation for a paragraph simply ignores instances of CTSmartTagRun, so any text inside a smart tag is lost.
My proposed patch introduces recursive parsing for CTSmartTagRun.
I did consider parsing all tags recursively, but that made other tests fail. It might still be an option for a later improvement.
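
For illustration only (this is not the code in the attached patch): a minimal sketch of the recursive idea, using the JDK's DOM parser instead of the XMLBeans types such as CTSmartTagRun that XWPF works with. The class and method names are made up for this example, and the smart tag's uri/element attributes are dropped, as in the patch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the paragraph parsing; not part of POI.
class SmartTagTextSketch {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Walks the direct children of a paragraph or smart tag in document order.
    static void collectRuns(Element parent, StringBuilder out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !W_NS.equals(n.getNamespaceURI())) {
                continue;
            }
            Element e = (Element) n;
            if ("r".equals(e.getLocalName())) {
                appendRunText(e, out);
            } else if ("smartTag".equals(e.getLocalName())) {
                // Only smart tags are followed recursively; their own
                // attributes (w:uri, w:element) are discarded.
                collectRuns(e, out);
            }
        }
    }

    // Appends the text of every w:t directly inside a run.
    static void appendRunText(Element run, StringBuilder out) {
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W_NS.equals(n.getNamespaceURI())
                    && "t".equals(n.getLocalName())) {
                out.append(n.getTextContent());
            }
        }
    }

    // Parses a single w:p element given as a string and returns its text.
    static String paragraphText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            StringBuilder out = new StringBuilder();
            collectRuns(root, out);
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse paragraph XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<w:p xmlns:w='" + W_NS + "'>"
            + "<w:smartTag w:uri='urn:schemas-microsoft-com:office:smarttags' w:element='PersonName'>"
            + "<w:smartTag w:uri='urn:schemas:contacts' w:element='GivenName'>"
            + "<w:r><w:t>Marilyn</w:t></w:r>"
            + "</w:smartTag>"
            + "<w:r><w:t xml:space='preserve'> </w:t></w:r>"
            + "<w:smartTag w:uri='urn:schemas:contacts' w:element='Sn'>"
            + "<w:r><w:t>Monroe</w:t></w:r>"
            + "</w:smartTag>"
            + "</w:smartTag>"
            + "</w:p>";
        System.out.println(paragraphText(xml)); // prints "Marilyn Monroe"
    }
}
```

Without the smartTag branch, the same input yields an empty string, which matches the behaviour described above.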

This makes test cases checking for smart tags pass and fixes two issues in Tika.

Note that my implementation discards the information carried by the smart tag itself; only the runs it contains are kept.

The patch also contains a minor cleanup of the mixed tab/space indentation in this class, and removes a duplicate document != null check.

Comment 1 Nick Burch 2011-12-06 04:31:57 UTC
Thanks for this, applied (with a few little tweaks) in r1210774.

(For future reference, it might be better to avoid trying to reformat the rest of the code, as it makes the reviewing a bit harder. We should really standardise, but not always in a feature patch!)