Bug 63576 - WordExtractor - capitalized text
Summary: WordExtractor - capitalized text
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2019-07-21 10:02 UTC by Franz Seidl
Modified: 2019-07-21 10:07 UTC
CC List: 1 user

Example (5.91 KB, application/zip)
2019-07-21 10:02 UTC, Franz Seidl

Description Franz Seidl 2019-07-21 10:02:04 UTC
Created attachment 36671

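WordExtractor doesn't respect text that is formatted as capitalized (presumably Word's "All caps" character formatting): the text is extracted in the case in which it was typed, not in the uppercase that Word displays.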

See attached example:
  - WordTextExtractorDoc.java: test program
  - capitalized.doc: test file
  - capitalized.txt: "text only" version saved with Word

I expect the text: "The following word is: CAPITALIZED."
Instead I get: "The following word is: capitalized."
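A minimal sketch of the output a caps-aware extraction would produce, assuming (not confirmed in this report) that capitalized.doc uses Word's "All caps" formatting, which stores the word in lowercase and only displays it in uppercase. `CapsAwareText`, `Run` and `join` are hypothetical names, not POI API, and this is not how WordExtractor is implemented. With HWPF, the runs would presumably come from `HWPFDocument.getRange()` and `Range.getCharacterRun(i)`, with `CharacterRun.isCapitalized()` supplying the flag; that wiring is left out so the sketch runs without POI on the classpath.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical model of caps-aware text extraction (not POI code). */
final class CapsAwareText {

    /** Stands in for one HWPF CharacterRun: stored text plus the All caps flag. */
    static final class Run {
        final String text;
        final boolean capitalized;

        Run(String text, boolean capitalized) {
            this.text = text;
            this.capitalized = capitalized;
        }
    }

    private CapsAwareText() {}

    /** Concatenates runs, upper-casing those flagged as capitalized (text as displayed, not as stored). */
    static String join(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run run : runs) {
            sb.append(run.capitalized ? run.text.toUpperCase(Locale.ROOT) : run.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Runs modelled on the attached example: only the middle word carries All caps.
        List<Run> runs = List.of(
                new Run("The following word is: ", false),
                new Run("capitalized", true),
                new Run(".", false));
        // Prints: The following word is: CAPITALIZED.
        System.out.println(join(runs));
    }
}
```

Upper-casing per run (rather than the whole paragraph) keeps the unformatted text untouched, which matches the "text only" version saved from Word.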
Comment 1 Franz Seidl 2019-07-21 10:04:08 UTC
Similar to Bug 63575.