Created attachment 27616
Patch for issue

Some message files appear to carry additional parameters after the charset name when dealing with some US-ASCII types.

The attached patch looks for a semicolon in the charset string and, if one is present, keeps only the substring before it. NOTE: this won't work if a valid charset name can itself contain a semicolon. Another option would be to modify the Pattern used to extract charsets.

Actual m.group(1) string returned from Content-Type: "US-ASCII; format=flowed; delsp=yes"

Unable to attach a sample file due to its sensitive nature.

Exception message and stack trace (POI-3.8-beta4):

BaseTextExtractionService - Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2ddd595d
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2ddd595d
Caused by: java.lang.RuntimeException: Encoding not found - US-ASCII; format=flowed
    at org.apache.poi.hsmf.datatypes.StringChunk.parseAs7BitData(StringChunk.java:155)
    at org.apache.poi.hsmf.datatypes.StringChunk.parseString(StringChunk.java:86)
    at org.apache.poi.hsmf.datatypes.StringChunk.set7BitEncoding(StringChunk.java:74)
    at org.apache.poi.hsmf.MAPIMessage.set7BitEncoding(MAPIMessage.java:413)
    at org.apache.poi.hsmf.MAPIMessage.guess7BitEncoding(MAPIMessage.java:373)
    at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:73)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:219)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 49 more
Caused by: java.io.UnsupportedEncodingException: US-ASCII; format=flowed
    at java.lang.StringCoding.decode(StringCoding.java:170)
    at java.lang.String.<init>(String.java:443)
    at java.lang.String.<init>(String.java:515)
    at org.apache.poi.hsmf.datatypes.StringChunk.parseAs7BitData(StringChunk.java:153)
    ... 56 more
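
For readers without access to the attachment, the semicolon workaround described in this report could look roughly like the sketch below. This is not the attached patch: the class name CharsetTrim and the method stripParameters are hypothetical, chosen only for illustration, and the sketch assumes (as the NOTE says) that no valid charset name contains a semicolon.

```java
public class CharsetTrim {
    /**
     * Hypothetical helper (not part of POI): returns the charset name with
     * any trailing parameters removed, e.g. "; format=flowed; delsp=yes".
     * Assumes a valid charset name never contains a semicolon.
     */
    public static String stripParameters(String charset) {
        if (charset == null) {
            return null;
        }
        int semicolon = charset.indexOf(';');
        // Keep only the part before the first semicolon, if there is one.
        String name = (semicolon >= 0) ? charset.substring(0, semicolon) : charset;
        return name.trim();
    }

    public static void main(String[] args) {
        // The value reported for m.group(1) in this issue.
        System.out.println(stripParameters("US-ASCII; format=flowed; delsp=yes"));
    }
}
```

The trimmed name could then be handed to the String decoding call that currently throws UnsupportedEncodingException.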
Thanks. I updated the regular expression that searches for the charset in revision r1176780.
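
The actual change in r1176780 is not reproduced in this thread. As an illustration only, a pattern that stops the captured charset at the first semicolon, quote, or whitespace might look like the sketch below. The class name CharsetExtractor, the method extractCharset, and the pattern itself are assumptions, not the code committed to POI.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetExtractor {
    // Hypothetical pattern (not the one from r1176780): capture the charset
    // value up to the first whitespace, semicolon, or double quote.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset name from a Content-Type value, or null if none is found. */
    public static String extractCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A header value shaped like the one reported in this issue.
        System.out.println(extractCharset(
                "text/plain; charset=US-ASCII; format=flowed; delsp=yes"));
    }
}
```

With a pattern of this shape, trailing parameters such as format=flowed never reach the decoding step, so no separate substring handling is needed.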
(In reply to comment #1)
> Thanx. I updated regular expression that searches for charset in revision
> r1176780.

Thanks for the quick fix. I tested it and confirmed that it works against the files I was having an issue with.