Bug 64020 - getHostAddress called unexpectedly, causing significant performance hit
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 4.1.1-FINAL
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2019-12-19 11:22 UTC by Jamie
Modified: 2019-12-19 12:02 UTC

stack trace (4.26 KB, text/plain)
2019-12-19 11:22 UTC, Jamie

Description Jamie 2019-12-19 11:22:20 UTC
Created attachment 36923
stack trace

Our server uses POI for text extraction. When processing some documents, performance deteriorates because of unexpected calls to URLStreamHandler.getHostAddress(). Please refer to the attached stack trace for an illustration of how this happens. It is due to a known oddity in the way URL.hashCode() and URL.equals() are implemented: they attempt to resolve the URL's host name to an IP address for equality testing purposes. A possible workaround might be to use the URI class instead of URL.
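
To illustrate the URI idea, here is a minimal sketch, assuming the code that keeps the hash-based collection is code you control. The class and method names (UriKeys, toUriSet) are hypothetical and are not part of POI, XMLBeans, or Tomcat. URI.hashCode() and URI.equals() compare the components textually and never resolve the host name, so no DNS lookup happens when a URI is added to a HashSet.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
public class UriKeys {

    /** Collects locations in a hash-based set without triggering DNS lookups. */
    public static Set<URI> toUriSet(String... locations) {
        Set<URI> seen = new HashSet<>();
        for (String location : locations) {
            // Unlike java.net.URL, URI hashes and compares its components as text
            // (scheme and host case-insensitively) and never resolves the host.
            seen.add(URI.create(location));
        }
        return seen;
    }
}
```

For example, passing "http://example.invalid/a.dtd", "http://EXAMPLE.invalid/a.dtd" and "http://example.invalid/b.dtd" yields a set of two entries, because the first two differ only in the case of the host name.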
Comment 1 PJ Fanning 2019-12-19 11:53:23 UTC
In your stack trace, it appears to be org.apache.catalina.loader.WebappClassLoaderBase that is using the HashSet, not XMLBeans or POI code.
Comment 2 PJ Fanning 2019-12-19 11:55:10 UTC
I'm not sure it would help, but it might be useful if we added some options to XMLBeans to get it to configure the SAX parser not to read external files at all.
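
As a rough sketch of what "not reading external files" could look like at the SAX level, the following uses only standard JAXP and SAX calls. It is not an existing XMLBeans or POI option; the class and method names (NoExternalSax, newReader, extractText) are hypothetical. It also assumes a Xerces-based parser such as the JDK's built-in one, since the load-external-dtd feature is Xerces-specific and other parsers may reject it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
public class NoExternalSax {

    /** Builds an XMLReader that does not fetch external DTDs or entities. */
    public static XMLReader newReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Standard SAX features: do not include external entities.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // Xerces-specific feature (assumption: a Xerces-based parser is in use).
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Safety net: anything that still reaches the resolver gets empty content.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return reader;
    }

    /** Parses the given XML and returns its character content. */
    public static String extractText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = newReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }
}
```

With this configuration, a document whose DOCTYPE points at an unreachable DTD can still be parsed without the parser attempting to fetch that DTD.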
Comment 3 Jamie 2019-12-19 12:02:49 UTC
My apologies. I guess I was skimming the stack too quickly and missed that. Yes, it would be a great help if there was an option not to read external files. It would be especially useful when performing text extraction on older documents for which the external references are likely to no longer exist. It could also be beneficial if some sort of parsing timeout could be implemented.
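
Until something like that exists in the library, a caller-side timeout can be approximated by running the extraction on a worker thread. This is a minimal sketch using java.util.concurrent only; the names (TimedExtraction, extractWithTimeout) are hypothetical, and the Callable stands in for whatever extraction call the application makes. One caveat: cancelling only interrupts the worker, and a thread blocked in a DNS lookup or socket read may not respond to the interrupt straight away.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only.
public class TimedExtraction {

    /** Runs the task and returns its result, or null if it exceeds the timeout. */
    public static String extractWithTimeout(Callable<String> task, long timeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort: interrupts the worker, which may or may not stop promptly.
            future.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A server would normally share a bounded thread pool rather than create an executor per document; the per-call executor here just keeps the sketch short.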