Bug 51678 - Extracting text from Bug51524.zip is slow
Summary: Extracting text from Bug51524.zip is slow
Status: VERIFIED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.8-dev
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-18 13:18 UTC by Antoni Mylka
Modified: 2011-08-18 15:11 UTC
CC List: 0 users



Description Antoni Mylka 2011-08-18 13:18:11 UTC
The fix for bug 51524 solved the problem of the slow constructor: it now takes 2 seconds on my machine. It is still difficult to get any text out of that document, though:

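import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.HWPFTestDataSamples;
import org.apache.poi.hwpf.extractor.WordExtractor;
HWPFDocument d = HWPFTestDataSamples.openSampleFileFromArchive( "Bug51524.zip" );
WordExtractor e = new WordExtractor(d);
e.getText();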

It seems to spend 99.99% of its time in org.apache.poi.hwpf.usermodel.Range.findRange(). I don't know whether anything can be done about it.
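The report does not quote the findRange() source, so the following is only a hypothetical simplification of why such a lookup can dominate extraction time. Assume the text properties are stored as a list of runs sorted by character offset and non-overlapping, and that a lookup returns the runs overlapping a range [start, end) by scanning that list. Each call then costs O(n) in the number of runs, and an extractor that performs one lookup per paragraph does roughly O(n^2) work on a large document. `LinearFindRange` and `Run` are illustrative names, not POI API.

```java
import java.util.List;

/** Hypothetical model of a linear-scan range lookup (not POI's actual code). */
final class LinearFindRange {

    /** Hypothetical text run covering character offsets [start, end). */
    static final class Run {
        final int start; // inclusive
        final int end;   // exclusive

        Run(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns {from, to} such that runs.get(from) .. runs.get(to - 1) are the runs
     * overlapping [start, end). Assumes runs are sorted and non-overlapping.
     * Every call scans from index 0, so the cost is O(n) per lookup.
     */
    static int[] findRange(List<Run> runs, int start, int end) {
        int from = 0;
        // Skip runs that end at or before the requested start.
        while (from < runs.size() && runs.get(from).end <= start) {
            from++;
        }
        int to = from;
        // Include every following run that begins before the requested end.
        while (to < runs.size() && runs.get(to).start < end) {
            to++;
        }
        return new int[] {from, to};
    }
}
```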
Comment 1 Sergey Vladimirov 2011-08-18 14:29:57 UTC
It now takes 4 seconds in trunk (including the constructor).
Comment 2 Antoni Mylka 2011-08-18 15:11:25 UTC
You're fast. After 4 hours I had a binary search implementation of my own that was only 90% working. I really need to brush up on my TopCoder skills.
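The binary-search approach mentioned here can be sketched as follows. This is not the code committed for this bug (the report does not show it). It assumes a hypothetical model in which runs are sorted, non-overlapping and cover [start, end) character offsets, and `BinaryFindRange` and `Run` are illustrative names. Two lower-bound searches find the first run ending after `start` and the first run starting at or after `end`, so each lookup costs O(log n) instead of O(n).

```java
import java.util.List;

/** Hypothetical binary-search range lookup (not POI's actual code). */
final class BinaryFindRange {

    /** Hypothetical text run covering character offsets [start, end). */
    static final class Run {
        final int start; // inclusive
        final int end;   // exclusive

        Run(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Index of the first run whose end is greater than offset, or runs.size() if none. */
    private static int firstEndingAfter(List<Run> runs, int offset) {
        int lo = 0, hi = runs.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (runs.get(mid).end <= offset) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Index of the first run whose start is at or after offset, or runs.size() if none. */
    private static int firstStartingAtOrAfter(List<Run> runs, int offset) {
        int lo = 0, hi = runs.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (runs.get(mid).start < offset) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Returns {from, to}: runs.get(from) .. runs.get(to - 1) overlap [start, end). */
    static int[] findRange(List<Run> runs, int start, int end) {
        int from = firstEndingAfter(runs, start);
        int to = Math.max(from, firstStartingAtOrAfter(runs, end));
        return new int[] {from, to};
    }
}
```

Both searches rely on the runs being sorted by offset and non-overlapping, which makes their end offsets sorted as well. Without that invariant the predicates would not be monotonic and the binary search could return wrong indices.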

Thanks very much anyway.