Bug 63576 - WordExtractor - capitalized text
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2019-07-21 10:02 UTC by Franz Seidl
Modified: 2022-08-29 12:43 UTC

Attachments
Example (5.91 KB, application/zip)
2019-07-21 10:02 UTC, Franz Seidl

Description Franz Seidl 2019-07-21 10:02:04 UTC
Created attachment 36671
Example

WordExtractor does not respect text that is formatted as capitalized: it returns the text as typed rather than as Word displays it.

See attached example:
  - WordTextExtractorDoc.java: test program
  - capitalized.doc: test file
  - capitalized.txt: "text only" version saved with Word

I expect the text: "The following word is: CAPITALIZED."
Instead I get: "The following word is: capitalized."
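
The expected behaviour can be illustrated with a minimal sketch. It assumes the document stores the text as typed and keeps the capitalization as a per-run formatting flag, so an extractor has to apply the flag itself. StyledRun and CapsAwareExtractor are hypothetical names used only for illustration; they are not POI classes, and this is not necessarily how the extractor works internally.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical model (not POI's API): a run of text as typed,
// plus a flag recording whether it is formatted as capitalized.
final class StyledRun {
    final String text;
    final boolean allCaps;

    StyledRun(String text, boolean allCaps) {
        this.text = text;
        this.allCaps = allCaps;
    }
}

// Hypothetical extractor: concatenates the runs, upper-casing
// those whose formatting flag is set.
final class CapsAwareExtractor {
    static String extract(List<StyledRun> runs) {
        StringBuilder out = new StringBuilder();
        for (StyledRun run : runs) {
            // Locale.ROOT keeps the result independent of the default locale.
            out.append(run.allCaps ? run.text.toUpperCase(Locale.ROOT) : run.text);
        }
        return out.toString();
    }
}
```

With the three runs "The following word is: " (flag off), "capitalized" (flag on) and "." (flag off), extract yields the expected text; ignoring the flag yields the text actually observed.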
Comment 1 Franz Seidl 2019-07-21 10:04:08 UTC
Similar to Bug 63575.
Comment 2 PJ Fanning 2022-08-28 14:16:15 UTC
Added r1903738.