Bug 56130 - Let's expose more of the codepage detection components in HSMF/Outlook
Summary: Let's expose more of the codepage detection components in HSMF/Outlook
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: HSMF
Version: 3.10-dev
Hardware: PC All
Importance: P2 enhancement
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-12 16:35 UTC by Tim Allison
Modified: 2014-02-12 16:39 UTC
CC List: 0 users



Attachments
First version attached (6.85 KB, patch)
2014-02-12 16:39 UTC, Tim Allison

Description Tim Allison 2014-02-12 16:35:08 UTC
HSMF currently gets codepage information from one of three sources: the properties, the headers, or the HTML body (if one exists). Let's refactor guess7BitEncoding() so that clients can retrieve this information themselves.

Patch on the way.
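To illustrate what exposing the three sources separately might look like, here is a minimal sketch in plain Java with no POI dependencies. The class `CodepageSources` and its methods are hypothetical names chosen for illustration, not POI API. The codepage-number mapping covers only a few common Windows IDs, the regular expressions are simplified, and `guess` tries the sources in the order the description lists them.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: these names are illustrative, not part of the POI API.
final class CodepageSources {
    // charset=... parameter inside a Content-Type header value (quotes optional)
    private static final Pattern HEADER_CHARSET =
        Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE);
    // charset declared anywhere inside an HTML <meta ...> tag
    private static final Pattern HTML_CHARSET =
        Pattern.compile("<meta[^>]*charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    private CodepageSources() {}

    /** Source 1: a numeric codepage property; only a few common IDs are mapped here. */
    static Optional<String> fromCodepageProperty(int codepage) {
        switch (codepage) {
            case 65001: return Optional.of("UTF-8");
            case 20127: return Optional.of("US-ASCII");
            case 28591: return Optional.of("ISO-8859-1");
            default:
                if (codepage >= 1250 && codepage <= 1258) {
                    return Optional.of("windows-" + codepage);
                }
                return Optional.empty();
        }
    }

    /** Source 2: the charset parameter of the first Content-Type header that has one. */
    static Optional<String> fromHeaders(List<String> headers) {
        for (String header : headers) {
            if (header.regionMatches(true, 0, "Content-Type:", 0, 13)) {
                Matcher m = HEADER_CHARSET.matcher(header);
                if (m.find()) {
                    return Optional.of(m.group(1));
                }
            }
        }
        return Optional.empty();
    }

    /** Source 3: a charset declared in an HTML meta tag, if there is an HTML body. */
    static Optional<String> fromHtmlBody(String html) {
        if (html == null) {
            return Optional.empty();
        }
        Matcher m = HTML_CHARSET.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Tries property, then headers, then HTML body; returns the first hit. */
    static Optional<String> guess(Integer codepage, List<String> headers, String html) {
        Optional<String> fromProperty =
            codepage == null ? Optional.empty() : fromCodepageProperty(codepage);
        if (fromProperty.isPresent()) {
            return fromProperty;
        }
        Optional<String> fromHeader = fromHeaders(headers);
        return fromHeader.isPresent() ? fromHeader : fromHtmlBody(html);
    }
}
```

Keeping each source behind its own method would let a client inspect every candidate (for example, to notice a header charset that disagrees with the HTML meta tag) rather than seeing only the single value that guess7BitEncoding() settles on.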
Comment 1 Tim Allison 2014-02-12 16:39:31 UTC
Created attachment 31305
First version attached

I noticed that the headers extraction component checks for "utf-8" (and ignores the value if it is "utf-8"). Do we want to add that check to the codepage and HTML extraction chunks, too?
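If the answer is yes, one option is to run every source's candidate through the same filter instead of repeating the check in each component. A minimal sketch, assuming each source is exposed as a supplier of an optional charset name; `Utf8AwareGuesser`, `isUsable` and `firstNonUtf8` are hypothetical names, and the case-insensitive, whitespace-trimmed comparison is an assumption about how the existing header check behaves.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: applies the header component's "skip utf-8" rule to every source.
final class Utf8AwareGuesser {
    private Utf8AwareGuesser() {}

    /** True if the candidate is a non-blank charset name other than "utf-8" (any case). */
    static boolean isUsable(String charset) {
        return charset != null
            && !charset.trim().isEmpty()
            && !charset.trim().equalsIgnoreCase("utf-8");
    }

    /**
     * Returns the first candidate that passes isUsable, trying the sources in order.
     * Suppliers keep later sources (such as scanning an HTML body) lazy.
     */
    @SafeVarargs
    static Optional<String> firstNonUtf8(Supplier<Optional<String>>... sources) {
        for (Supplier<Optional<String>> source : sources) {
            Optional<String> candidate = source.get();
            if (candidate.isPresent() && isUsable(candidate.get())) {
                return Optional.of(candidate.get().trim());
            }
        }
        return Optional.empty();
    }
}
```

With the filter in one place, the codepage, header and HTML sources would all share one rule, and dropping or changing that rule later would touch a single method.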