Bug 53951 - java.io.UnsupportedEncodingException: Codepage number may not be 0
Summary: java.io.UnsupportedEncodingException: Codepage number may not be 0
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HPSF
Version: unspecified
Hardware: Macintosh other
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-02 12:33 UTC by Matt MacDonald
Modified: 2012-10-04 11:32 UTC
CC List: 0 users



Description Matt MacDonald 2012-10-02 12:33:37 UTC
Hi,

I'm using Nutch to crawl websites, with Tika parsing the fetched documents. I encountered the following ERROR and thought this would be the right place to report it.

2012-09-22 22:30:03,321 ERROR tika.TikaParser - Error parsing http://www.montpelier-vt.org/upload/groups/384/files/meac_11.17.10.doc
java.io.UnsupportedEncodingException: Codepage number may not be 0
	at org.apache.poi.hpsf.VariantSupport.codepageToEncoding(VariantSupport.java:338)
	at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:240)
	at org.apache.poi.hpsf.Property.<init>(Property.java:164)
	at org.apache.poi.hpsf.Section.<init>(Section.java:277)
	at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:452)
	at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:247)
	at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:67)
	at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:57)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:182)
	at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:124)
	at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:36)
	at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:23)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)
2012-09-22 22:30:03,322 WARN  parse.ParseUtil - Unable to successfully parse content http://www.montpelier-vt.org/upload/groups/384/files/meac_11.17.10.doc of type application/x-tika-msoffice
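The trace shows the failure starting in POI's `VariantSupport.codepageToEncoding`, which rejects a codepage of 0 while HPSF reads a property from the document's summary information. The exception then propagates through Tika's `SummaryExtractor` and `OfficeParser.parse`, so the whole document fails to parse rather than just its metadata. The Java sketch below is a hypothetical illustration of that failure mode and of a lenient alternative that falls back to a default charset. It is not POI's actual implementation: the class and method names (`CodepageSketch`, `codepageToEncoding`, `codepageToCharsetLenient`), the small codepage table, and the `"cp" + n` naming fallback are all assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodepageSketch {

    // Hypothetical strict mapping. It mirrors the error message in the trace,
    // but it is NOT POI's real VariantSupport code.
    static String codepageToEncoding(int codepage) throws UnsupportedEncodingException {
        if (codepage == 0) {
            // Same message as the exception in the report.
            throw new UnsupportedEncodingException("Codepage number may not be " + codepage);
        }
        switch (codepage) {
            case 1200:  return "UTF-16LE";
            case 1252:  return "windows-1252";
            case 65001: return "UTF-8";
            // Assumption: many Windows codepages are reachable via Java's "cpNNNN" aliases.
            default:    return "cp" + codepage;
        }
    }

    // Hypothetical lenient wrapper: instead of aborting the whole document,
    // fall back to a caller-supplied charset when the codepage is 0 or unknown.
    static Charset codepageToCharsetLenient(int codepage, Charset fallback) {
        try {
            return Charset.forName(codepageToEncoding(codepage));
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // IllegalArgumentException covers UnsupportedCharsetException and
            // IllegalCharsetNameException thrown by Charset.forName.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(codepageToCharsetLenient(1252, StandardCharsets.UTF_8)); // prints windows-1252
        System.out.println(codepageToCharsetLenient(0, StandardCharsets.UTF_8));    // prints UTF-8
    }
}
```

Falling back on a default charset trades the fidelity of a few metadata strings for the ability to extract the rest of the document. Whether that trade-off is acceptable is a design choice for the library; the report does not say how poi-trunk handles this file.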
Comment 1 Yegor Kozlov 2012-10-04 11:32:29 UTC
poi-trunk parses the referenced file without problems. Please upgrade the POI jars in your Nutch distribution, or wait for the next Tika release.

Yegor