CellFormat.getInstance() uses a global static cache which shares CellFormat instances across threads. But a CellFormat with a CellDateFormatter wraps a java.text.SimpleDateFormat which isn't thread-safe. The result is that calls like DataFormatter.formatCellValue() aren't thread-safe even when each thread uses a different instance of DataFormatter. Here's a test case that reproduces the problem. If you remove the parallelism from this test then it passes, otherwise it almost always fails: import org.apache.poi.ss.usermodel.DataFormatter; import org.junit.Test; import java.util.concurrent.CompletableFuture; import static org.junit.Assert.assertEquals; public class FormatterTest { @Test public void testCellFormat() { int formatIndex = 105; String formatString = "[$-F400]m/d/yy h:mm:ss\\ AM/PM;[$-F400]m/d/yy h:mm:ss\\ AM/PM;_-* \"\"??_-;_-@_-"; CompletableFuture<Void> future1 = CompletableFuture.runAsync( () -> doFormatTest(43551.50990171296, "3/27/19 12:14:15 PM", formatIndex, formatString)); CompletableFuture<Void> future2 = CompletableFuture.runAsync( () -> doFormatTest(36104.424780092595, "11/5/98 10:11:41 AM", formatIndex, formatString)); future1.join(); future2.join(); } private void doFormatTest(double n, String expected, int formatIndex, String formatString) { DataFormatter formatter = new DataFormatter(); for (int i = 0; i < 1_000; i++) { String actual = formatter.formatRawCellContents(n, formatIndex, formatString); assertEquals("Failed on iteration " + i, expected, actual); } } }
Maybe we should switch to java.time formatters.
Thanks for the nice test-case. The cache-map itself is guarded via "synchronized getInstance()", so only the SimpleDateFormat seems to be the culprit. Minimal change would be to synchronize during access to the dateFmt object in CellDateFormatter. An option would also be to add a dependency on commons-lang3 and use FastDateFormat which is a drop-in replacement.
I added a test using https://github.com/apache/poi/commit/ee5fa41dccdd7f18034cd73ed139b654429a98e0 It passes without any code modification. It would be great if someone could review my test to see if I made a mistake with it.
In your 'doFormatTestConcurrent' method all threads are using the same value of 'n' so they're all getting the same string. That's why the test passes. In my original version, one thread is using 43551.50990171296 while the other is using 36104.424780092595. Eventually one thread returns a result with field values from the other.
Thanks. I've reworked the test and updated the code to get it to pass.
https://github.com/apache/poi/commit/cd74839d17d9b3813d9622a4b42be638ce4a3cfa
Thanks! I haven't actually run your changes locally, but one thing to verify is that the fix still works if each thread uses a different instance of DataFormatter. The actual problem call to 'SimpleDateFormat' occurs from this line: new CellFormatResultWrapper( cfmt.apply(cellValueO) ) which is wrapped by the 'private synchronized Format getFormat' which synchronizes on the DataFormatter instance, not globally. I have a concern that two different DataFormatter instances could still race on the same instance of 'SimpleDateFormat'. The test case doesn't cover this. It might be better to add the 'synchronize' to the CellDateFormatter.formatValue() method instead of to the methods in DataFormatter.
You are right, I can make the updated test fail by using two different concurrent DataFormatter instances. Since all the other format types appear to be thread-safe, I think it is best to make thread safety a documented part of the CellFormatter abstract methods. Then CellDateFormatter can handle its own synchronization on the specific SimpleDateFormat instances, as you suggest. Removing synchronized from DataFormatter.getFormat() and adding it to CellDateFormatter.formatValue() passes the updated test that fails for multiple concurrent DataFormatter instances, which I believe is a superset of the testing for a single DataFormatter instance, so this change fixes all the cases. committed in revision 1856647.