Bug 45592

Summary: OOXML text extraction improvement
Product: POI
Reporter: Andrzej Bialecki <ab>
Component: HWPF
Assignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Windows XP   
Attachments: Patch

Description Andrzej Bialecki 2008-08-07 14:36:26 UTC
Created attachment 22406

This patch improves text extraction from docx documents by processing nested tables and by extracting text from pictures that contain text.
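For illustration only, here is a minimal sketch of the idea, not the attached patch and not the POI API. It uses only the JDK's DOM parser to walk a WordprocessingML fragment recursively, so text inside a table nested in a cell (w:tbl inside w:tc) is reached the same way as top-level text. It assumes that picture text lives in a VML text box (w:pict, v:textbox, w:txbxContent); the class name DocxTextSketch and the method extract are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper, not part of POI: collects the text of every w:t
// element, descending into nested tables and picture text boxes.
class DocxTextSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Takes the XML of a document body and returns its text,
    // with one newline emitted at the end of each paragraph (w:p).
    static String extract(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName()
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(
                            documentXml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk. Because every element is visited regardless of its
    // parent, a w:tbl inside a w:tc or a w:p inside w:txbxContent needs no
    // special case.
    private static void walk(Node node, StringBuilder out) {
        boolean inWordNs = W_NS.equals(node.getNamespaceURI());
        if (inWordNs && "t".equals(node.getLocalName())) {
            out.append(node.getTextContent());
            return;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk(c, out);
            }
        }
        if (inWordNs && "p".equals(node.getLocalName())) {
            out.append('\n');
        }
    }
}
```

Calling DocxTextSketch.extract on the contents of word/document.xml would return the text in document order. A paragraph that hosts a picture text box closes after the text box's own paragraphs, so it contributes an extra newline.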
Comment 1 Nick Burch 2008-08-09 03:46:00 UTC
Thanks for this patch. It has been applied to the ooxml branch.

Is there any chance you could upload a sample file that includes both pictures containing text and tables? That would allow us to write a unit test for this, to ensure it doesn't get broken in the future.
Comment 2 Nick Burch 2010-09-19 07:13:20 UTC
Unit tests for XWPF extraction of images and text have since been added to svn