Bug 45592 - OOXML text extraction improvement
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: unspecified
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2008-08-07 14:36 UTC by Andrzej Bialecki
Modified: 2010-09-19 07:13 UTC

Patch (2.58 KB, patch)
2008-08-07 14:36 UTC, Andrzej Bialecki

Description Andrzej Bialecki 2008-08-07 14:36:26 UTC
Created attachment 22406

This patch improves text extraction from docx documents by processing nested tables and by extracting text from pictures that contain text.
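
For readers without the attachment, here is a minimal sketch of the general idea. It is not the attached patch and does not use POI's API. It assumes the extractor can reach the raw WordprocessingML, and it walks the XML depth-first so that text in nested tables (a w:tbl inside a w:tc) and in picture text boxes (w:txbxContent under w:pict) is collected as well. The class and method names are hypothetical, and only the JDK's DOM parser is used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of POI.
class NestedTextSketch {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parses a WordprocessingML fragment and returns its text,
    // with one line break after every paragraph (w:p).
    static String extractText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getLocalName()
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        collect(doc.getDocumentElement(), out);
        return out.toString();
    }

    // Depth-first walk: recursion descends into every container,
    // so tables inside cells and text boxes inside pictures are
    // visited without any special casing.
    private static void collect(Node node, StringBuilder out) {
        boolean isWordElement = node.getNodeType() == Node.ELEMENT_NODE
            && W_NS.equals(node.getNamespaceURI());
        if (isWordElement && "t".equals(node.getLocalName())) {
            out.append(node.getTextContent()); // w:t holds the run text
            return;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
        if (isWordElement && "p".equals(node.getLocalName())) {
            out.append('\n'); // end of paragraph
        }
    }
}
```

Calling NestedTextSketch.extractText(...) on a body whose table cell holds a second table returns the inner cell's text along with the outer cell's. An extractor that only iterates over top-level paragraphs and first-level table cells would miss it.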
Comment 1 Nick Burch 2008-08-09 03:46:00 UTC
Thanks for this patch; it has been applied to the ooxml branch.

Is there any chance you could upload a sample file that includes both pictures containing text and tables? That would allow us to write a unit test for this, to ensure it doesn't get broken in the future.
Comment 2 Nick Burch 2010-09-19 07:13:20 UTC
Unit tests for XWPF extraction of images and text have since been added to svn.