Bug 53556

Summary: Mispositioned Textboxes In Reading Doc Files Through HWPF
Product: POI Reporter: Vipul Kumar <vipulucky93>
Component: HWPF Assignee: POI Developers List <dev>
Status: RESOLVED WONTFIX    
Severity: major CC: vipulucky93
Priority: P2 Keywords: APIBug
Version: 3.8-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Attachments: This is the document which i was unable to read properly.

Description Vipul Kumar 2012-07-17 07:40:03 UTC
Created attachment 29070
This is the document which I was unable to read properly.

I tried reading .doc and .docx files using Apache POI 3.8. It worked fine until I encountered textboxes.

If the format of the document is like this: 
paragraph 1 
textbox 1 
paragraph 2 
textbox 2 
paragraph 3 

Then the output should be: 
paragraph 1 textbox 1 paragraph 2 textbox 2 paragraph 3 
But HWPF reads such .doc file as: 
paragraph 1 paragraph 2 paragraph 3 textbox 1 textbox 2 

HWPF seems to append the textbox contents at the end instead of placing them where they belong, i.e. between the paragraphs.

In the case of .docx files, XWPF didn't read the textboxes at all.

I tried the methods getText(), getTextFromPieces(), extractText() and getParagraphText(), but none of them helped.
Comment 1 Sergey Vladimirov 2012-11-06 16:42:33 UTC
Vipul,

Textboxes are graphical objects. Currently POI is unable to detect the exact place where a textbox sits on the page. Another problem: a textbox can be anchored to the page (not to a particular paragraph), and in that case there is no way to find the position in the text at which to insert the textbox content without rendering the page (which POI doesn't do).

Patches to detect the textbox anchor position (the exact point at which to insert the textbox content into the document) are always welcome.
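
To illustrate what such a patch would have to provide, here is a minimal sketch in plain Java of how textbox content could be merged into the paragraph stream once the anchors are known. It does not use any POI API: the class TextboxInterleaver, the holder AnchoredTextbox and the idea of an "anchor paragraph index" are all hypothetical names introduced for this example, and detecting that index from a .doc file is exactly the part that is missing. Textboxes without a usable anchor (for example page-anchored ones) are simply appended at the end, which matches what HWPF output looks like today.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of POI: merges textbox text into paragraph text
// given an (assumed) anchor paragraph index for each textbox.
class TextboxInterleaver {

    // Hypothetical holder: textbox text plus the index of the paragraph it is
    // anchored to, or a negative value if the anchor is unknown (e.g. page-anchored).
    static final class AnchoredTextbox {
        final String text;
        final int anchorParagraph;

        AnchoredTextbox(String text, int anchorParagraph) {
            this.text = text;
            this.anchorParagraph = anchorParagraph;
        }
    }

    // Emits each paragraph followed by the textboxes anchored to it.
    // Textboxes with no valid anchor are appended after the last paragraph.
    static List<String> interleave(List<String> paragraphs, List<AnchoredTextbox> textboxes) {
        Map<Integer, List<String>> byAnchor = new HashMap<>();
        List<String> unanchored = new ArrayList<>();
        for (AnchoredTextbox tb : textboxes) {
            if (tb.anchorParagraph < 0 || tb.anchorParagraph >= paragraphs.size()) {
                unanchored.add(tb.text);
            } else {
                byAnchor.computeIfAbsent(tb.anchorParagraph, k -> new ArrayList<>()).add(tb.text);
            }
        }

        List<String> out = new ArrayList<>();
        for (int i = 0; i < paragraphs.size(); i++) {
            out.add(paragraphs.get(i));
            List<String> anchored = byAnchor.get(i);
            if (anchored != null) {
                out.addAll(anchored);
            }
        }
        out.addAll(unanchored);
        return out;
    }

    public static void main(String[] args) {
        List<String> paragraphs = List.of("paragraph 1", "paragraph 2", "paragraph 3");
        List<AnchoredTextbox> textboxes = List.of(
                new AnchoredTextbox("textbox 1", 0),
                new AnchoredTextbox("textbox 2", 1));
        // prints "paragraph 1 textbox 1 paragraph 2 textbox 2 paragraph 3"
        System.out.println(String.join(" ", interleave(paragraphs, textboxes)));
    }
}
```

With both anchors set to -1 the same call yields "paragraph 1 paragraph 2 paragraph 3 textbox 1 textbox 2", i.e. the order reported in the description.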
Comment 2 Dominik Stadler 2017-09-11 19:36:23 UTC
There has been no update on this for a very long time and, as explained above, it is very hard to get this right for all cases. I am therefore closing this as WONTFIX. Please report new bugs if there are any contributions in this area.