Bug 56347 - Get digital title error
Summary: Get digital title error
Status: RESOLVED WORKSFORME
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.8-FINAL
Hardware: PC All
Importance: P2 critical
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-04 02:50 UTC by klark_pang
Modified: 2017-07-22 08:11 UTC

Attachments
test word file (29.00 KB, application/msword)
2014-04-04 02:50 UTC, klark_pang

Description klark_pang 2014-04-04 02:50:10 UTC
Created attachment 31474
test word file

Hi POI team,
    I use poi-3.8 to extract text from a Word document, but the numbers of numbered headings are not extracted correctly. For example:
    
    Original text in Word:
    一、  i like poi
       1、  i like poi
       2、  i like poi
    二、  i like poi
    三、  i like poi
    
    
    Text returned by poi-3.8:
    1、  i like poi
    1、  i like poi
    2、  i like poi
    1、  i like poi
    2、  i like poi

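Java code:
        import java.io.File;
        import java.io.FileInputStream;
        import org.apache.poi.hwpf.HWPFDocument;
        import org.apache.poi.hwpf.converter.WordToTextConverter;

        // Load the attached .doc file
        FileInputStream fis = new FileInputStream(new File("e:/test.doc"));
        HWPFDocument doc = new HWPFDocument(fis);
        WordToTextConverter wordToTextConverter = new WordToTextConverter();

        // Convert the document to plain text and print the result
        wordToTextConverter.processDocument(doc);
        System.out.println(wordToTextConverter.getText());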
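
For reference, the numbering expected in the original text can be sketched in plain Java. This is only an illustration of the expected behaviour, not POI's implementation: the class ListNumberingSketch, its methods, and the rule that level 0 uses Chinese numerals while deeper levels use Arabic digits are hypothetical and inferred from the sample text above.

```java
import java.util.ArrayList;
import java.util.List;

public class ListNumberingSketch {
    // Chinese numerals for 1..10; enough for a short illustration.
    private static final String[] CHINESE =
            {"一", "二", "三", "四", "五", "六", "七", "八", "九", "十"};

    /** Formats a counter: level 0 uses Chinese numerals (assumed), deeper levels use digits. */
    static String label(int level, int counter) {
        if (level == 0 && counter >= 1 && counter <= CHINESE.length) {
            return CHINESE[counter - 1] + "、";
        }
        return counter + "、";
    }

    /** Numbers paragraphs by list level; each counter persists until a shallower item resets it. */
    static List<String> number(int[] levels, String text) {
        int[] counters = new int[9]; // Word lists have up to nine levels
        List<String> out = new ArrayList<>();
        for (int level : levels) {
            counters[level]++;
            // A new item at this level restarts every deeper level.
            for (int deeper = level + 1; deeper < counters.length; deeper++) {
                counters[deeper] = 0;
            }
            out.add(label(level, counters[level]) + "  " + text);
        }
        return out;
    }

    public static void main(String[] args) {
        // Levels of the five paragraphs in the sample document.
        int[] levels = {0, 1, 1, 0, 0};
        for (String line : number(levels, "i like poi")) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is that the level-0 counter keeps counting (一, 二, 三) across the nested items, whereas the output above restarts it and prints it with Arabic digits.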
Comment 1 klark_pang 2014-04-04 06:39:41 UTC
Sorry, there is a typo in the description: "world" should read "Word".
Comment 2 Nick Burch 2014-04-04 08:46:29 UTC
I don't believe that WordToTextConverter does any style-related processing.

There are other converters which do, including ones in Apache POI, along with Apache Tika. I'd suggest you try one of those.
Comment 3 klark_pang 2014-04-08 03:19:49 UTC
Hi Nick,
     Thank you for your answer. Could you suggest which converters I should try? I can't find the API for them.
(In reply to Nick Burch from comment #2)
> I don't believe that WordToTextConverter does any style related things
> 
> There are other converters which do, including ones in Apache POI, along
> with Apache Tika, I'd suggest you try one of those
Comment 4 Dominik Stadler 2017-07-22 08:11:11 UTC
The discussion here stalled at some point. If you still need more information about ways to extract text/style information, please start a discussion on StackOverflow or on the mailing lists.