Bug 54838 - StringBuffer could be replaced by StringBuilder for performance
Summary: StringBuffer could be replaced by StringBuilder for performance
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall (show other bugs)
Version: 3.9-FINAL
Hardware: Macintosh other
: P2 enhancement (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-13 10:01 UTC by Fabian Zeindl
Modified: 2015-08-16 21:19 UTC (History)
1 user (show)



Attachments
List of places where StringBuffer or Vector are used currently (60.28 KB, text/plain)
2013-05-25 15:23 UTC, Dominik Stadler
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Fabian Zeindl 2013-04-13 10:01:34 UTC
I noticed the usage of StringBuffer in classes like ReadOnlySharedStringTable.java.

Since POI isn't thread-safe, the StringBuffers could be replaced by StringBuilder which are faster, since they require no-locking. Same goes for possibly used Vectors etc.
Comment 1 Nick Burch 2013-04-14 17:59:26 UTC
Any chance you could work up a patch for this, and some tests that show what the performance improvement works out as?
Comment 2 Fabian Zeindl 2013-04-14 20:22:40 UTC
Not at the moment, sorry.

Most of it could probably be done by simply replacing HashTable with HashMap, Vector with ArrayList and StringBuffer with StringBuilder, since I think they're all API-compatible.

If you are threading somewhere and make use of the thread-safety of these classes there, that would need special care. A quick search for the words "synchronized" and "Thread" gives some results, for example RecordContainer.java, HWPFOutputStream.java etc.

It's also possible that users use POI in a multi-threaded manner and it works for them because of the threadsafe Collections, or because they are lucky. This usage might be broken after changing that.

Regards.
Comment 3 Nick Burch 2013-04-14 20:35:54 UTC
The general rule on POI and thread safety is that you should only ever have one thread working on a given file, but you should always be fine to have multiple threads each working on their own individual files.

As long as your patch (when you get a chance to write it!) keeps us within those rules, and makes it faster, it'll be much appreciated :)
Comment 4 Dominik Stadler 2013-05-25 15:23:19 UTC
Created attachment 30324 [details]
List of places where StringBuffer or Vector are used currently

I scanned the current source code for StringBuffer, Vector and HashTable. 

HashTable seems to be not used anywhere currently. 

For StringBuffer and Vector I found 551 occurrences all over the place. 

I would not blindly start replacing all those, risking bugs or race-conditions being introduced underway, without some idea where in the code performance is important and should be improved. 

Is there a set of unit tests or enhanced tests which run a set of performance-sensitive operations which we can use to profile and analyze where performance improvements are best targeted at? Or are there certain things where you would like to gain performance which could benefit from using the non-thread-safe variants?
Comment 5 Nick Burch 2014-07-24 16:43:26 UTC
In r1613186, I've updated the Vector uses that are clearly fine to ArrayList

Where a StringBuilder is used within a method, then toString'd at the end and discarded, I can't see why those can't be quickly and easily updated. Where the StringBuffer is persisted beyond one method in any way, or passed outside the method, then someone would need to think a little bit.

Does someone fancy updating the StringBuffer instances where are self contained (eg used in toString() methods)? We can then review what's left, which'll hopefully be a much smaller list
Comment 6 Dominik Stadler 2015-08-16 21:19:41 UTC
I don't think it makes a lot of sense to start replacing those in all places and potentially introducing hard to find bugs in exchange for none or only minor performance improvements. We should rather find out cases where there is a performance penalty because of these and fix those cases one by one. 

Feel free to report bugs for any such place together with a simple unit-test which allows to see the effect of the change.