Summary: | Reading special characters from the MS Word Document thru POI and webdav | ||
---|---|---|---|
Product: | POI | Reporter: | Srivani <srivani.tsama> |
Component: | HPSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED INVALID | ||
Severity: | critical | ||
Priority: | P3 | ||
Version: | 2.5-FINAL | ||
Target Milestone: | --- | ||
Hardware: | Sun | ||
OS: | Solaris | ||
Attachments: | Checkinfilter.java, Sample Document | ||
Description
Srivani
2004-08-23 18:16:51 UTC
I don't understand what you are doing, and in particular I don't know what it means to "read from the MS Word Document thru POI and webdav". You should give more details so we can help better. However, I suppose that this is not a POI problem since, as you say, reading the POI file under Windows works. Did you set the LANG environment variable under Solaris to a sensible value? If you don't, the JVM reads ASCII characters only and transforms anything else to '?' characters.

I did change the lang property to UTF8/ISO8859-1, but I still have the problem. What I am trying to do here is:

1. WebDAV folders work like Windows Explorer folders: they follow the HTTP protocol and are accessible from My Network Places.
2. I drop the MS Word doc into the WebDAV folder thru My Network Places. (This should automatically check the doc in to the CMS server.)
3. When I drop in the file, I apply the POI library to read the title from the MS Word doc before it is checked in to the Content Server. (Just FYI, the content server allows some check-in filters, and the code is enclosed here.)

    public int doFilter(Workspace ws, DataBinder binder, ExecutionContext cxt)
            throws DataException, ServiceException
    {
        if (isWordDoc(binder))
        {
            String fileName = binder.getLocal("primaryFile:path");
            try {
                POIFSReader r = new POIFSReader();
                MyPOIFSReaderListener listner = new MyPOIFSReaderListener();
                //r.registerListener(listner,"");
                r.registerListener(listner, "\005SummaryInformation");
                r.read(new FileInputStream(fileName));
                String title = listner.getTitle();
                System.out.println(" My Title: \"" + title + "\"");
                if (title != null)
                    binder.putLocal("dDocTitle", title);
            } catch (java.io.FileNotFoundException e) {
                System.out.println("FileNotFoundException : " + fileName);
            } catch (java.io.IOException e) {
                System.out.println("IOException : " + fileName);
            }
        }
        // filter executed correctly. Return CONTINUE
        return CONTINUE;
    }

Two questions:
- What does your MyPOIFSReaderListener look like?
- Can you provide a sample document together with the output of your CMS filter code?

Created attachment 12537
Checkinfilter.java
Created attachment 12538
Sample Document
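The LANG question raised earlier in this thread can be checked from inside the JVM itself. The following is only a minimal sketch using the standard library, not part of the attached filter: the class name `CharsetCheck`, the helper `roundTrip` and the sample string are made up for illustration. It prints the charset the JVM is using by default and shows how a string with non-ASCII characters degrades to '?' when it is encoded with a charset that cannot represent them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    /** Encodes text with the given charset and decodes it again, roughly what a console write does. */
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) replaces unmappable characters with the charset's replacement byte ('?').
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // The charset this JVM uses by default (older JVMs derive it from the locale, e.g. LANG).
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Hypothetical title containing an accented letter and an en dash.
        String sample = "Pr\u00e9sentation \u2013 2004";
        System.out.println("US-ASCII:   " + roundTrip(sample, StandardCharsets.US_ASCII));
        System.out.println("ISO-8859-1: " + roundTrip(sample, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + roundTrip(sample, StandardCharsets.UTF_8));
    }
}
```

If the default charset printed on the Solaris box is an ASCII one, '?' characters in the console output would be expected no matter what POI returns.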
Attached the CheckinFilter.java and Sample Document that I am reading from. The output of getTitle is:

������ Network Appliance - Press Release - 02/17/2004�议��

The sample document contains those funny characters in the title, and POI extracts them correctly. The rest of the sample document looks fine. How the special characters got into the title property, and whether that's correct or not, is outside the scope of POI, or more specifically HPSF.
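To see which characters a title string really contains, independent of how the console renders them, one option is to print its code points. This is a minimal sketch under assumptions: `TitleDump` and `codePoints` are hypothetical names, and the default title below is a stand-in, not the value from the sample document. In the filter, the string returned by `listner.getTitle()` could be passed in instead.

```java
public class TitleDump {
    /** Renders each code point of s as U+XXXX so the console encoding cannot hide anything. */
    static String codePoints(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in value for illustration; pass the real title as the first argument.
        String title = args.length > 0 ? args[0] : "\uFFFD Press Release";
        System.out.println(codePoints(title));
    }
}
```

If the dump shows code points outside the expected range, the odd characters are stored in the document's title property itself rather than introduced by the console or the locale.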