Bug 56130

Summary: Let's expose more of the codepage detection components in HSMF/Outlook
Product: POI
Reporter: Tim Allison <tallison>
Component: HSMF
Assignee: POI Developers List <dev>
Status: NEW ---    
Severity: enhancement    
Priority: P2    
Version: 3.10-dev   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: First version attached

Description Tim Allison 2014-02-12 16:35:08 UTC
HSMF currently gets code page information from up to three sources: the properties, the headers, and the htmlbody (if it exists). Let's refactor guess7BitEncoding() and enable clients to get this information.
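
To make the three sources concrete, here is a minimal sketch of what separately exposed detection steps could look like. It is not the POI API and not the attached patch: the class EncodingSources, its method names, and the small code page table are all hypothetical, and the regex is a simplification of real header and HTML parsing.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in POI.
public class EncodingSources {
    // Finds charset=... in a Content-Type header line or an HTML meta tag.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Source 1: a numeric Windows code page taken from the message properties. */
    static Optional<String> fromCodePage(int codePage) {
        // Tiny illustrative subset of the Windows code page identifiers.
        switch (codePage) {
            case 1250:  return Optional.of("windows-1250");
            case 1251:  return Optional.of("windows-1251");
            case 1252:  return Optional.of("windows-1252");
            case 20127: return Optional.of("US-ASCII");
            case 28591: return Optional.of("ISO-8859-1");
            case 65001: return Optional.of("UTF-8");
            default:    return Optional.empty();
        }
    }

    /** Source 2: the charset parameter of a Content-Type header. */
    static Optional<String> fromHeaders(String headers) {
        return findCharset(headers);
    }

    /** Source 3: the charset declared in the HTML body, if there is one. */
    static Optional<String> fromHtmlBody(String html) {
        return findCharset(html);
    }

    // Returns the first charset name found, or empty if text is null or has none.
    private static Optional<String> findCharset(String text) {
        if (text == null) {
            return Optional.empty();
        }
        Matcher m = CHARSET.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(fromCodePage(1251));
        System.out.println(fromHeaders("Content-Type: text/plain; charset=\"windows-1252\""));
        System.out.println(fromHtmlBody(
                "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=koi8-r\">"));
    }
}
```

Returning an Optional from each step lets a client see which sources produced a value and pick its own precedence, rather than only seeing a single combined guess.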

Patch on the way.

Comment 1 Tim Allison 2014-02-12 16:39:31 UTC
Created attachment 31305
First version attached

I noticed that the headers extraction component checks for "utf-8" and ignores the value if it is "utf-8". Do we want to add that check to the codepage and HTML extraction chunks, too?
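
If the answer is yes, one option would be to route all three sources through a single shared filter so the "utf-8" rule lives in one place. The following is only a sketch under that assumption; Utf8Filter and usableCharset are hypothetical names, not existing POI code.

```java
import java.util.Optional;

// Hypothetical helper; shows one shared "ignore utf-8" rule for every source.
public class Utf8Filter {
    /**
     * Returns the detected charset name unless it is null, blank,
     * or names UTF-8 (compared case-insensitively).
     */
    static Optional<String> usableCharset(String detected) {
        return Optional.ofNullable(detected)
                .map(String::trim)
                .filter(cs -> !cs.isEmpty())
                .filter(cs -> !cs.equalsIgnoreCase("utf-8"));
    }

    public static void main(String[] args) {
        // The same call would be applied to the properties, headers, and HTML results.
        System.out.println(usableCharset("UTF-8").isPresent());
        System.out.println(usableCharset(" windows-1251 ").orElse("(none)"));
    }
}
```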