Bug 63291

Summary: CellFormat global cache isn't thread-safe
Product: POI Reporter: Shawn Smith <shawn>
Component: SS CommonAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 4.0.0-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X 10.1   

Description Shawn Smith 2019-03-27 17:55:14 UTC
CellFormat.getInstance() uses a global static cache which shares CellFormat instances across threads.  But a CellFormat with a CellDateFormatter wraps a java.text.SimpleDateFormat which isn't thread-safe.  The result is that calls like DataFormatter.formatCellValue() aren't thread-safe even when each thread uses a different instance of DataFormatter.

Here's a test case that reproduces the problem.  If you remove the parallelism from this test then it passes, otherwise it almost always fails:

import org.apache.poi.ss.usermodel.DataFormatter;
import org.junit.Test;

import java.util.concurrent.CompletableFuture;

import static org.junit.Assert.assertEquals;

public class FormatterTest {
    @Test
    public void testCellFormat() {
        int formatIndex = 105;
        String formatString = "[$-F400]m/d/yy h:mm:ss\\ AM/PM;[$-F400]m/d/yy h:mm:ss\\ AM/PM;_-* \"\"??_-;_-@_-";

        CompletableFuture<Void> future1 = CompletableFuture.runAsync(
                () -> doFormatTest(43551.50990171296, "3/27/19 12:14:15 PM", formatIndex, formatString));
        CompletableFuture<Void> future2 = CompletableFuture.runAsync(
                () -> doFormatTest(36104.424780092595, "11/5/98 10:11:41 AM", formatIndex, formatString));
        future1.join();
        future2.join();
    }

    private void doFormatTest(double n, String expected, int formatIndex, String formatString) {
        DataFormatter formatter = new DataFormatter();
        for (int i = 0; i < 1_000; i++) {
            String actual = formatter.formatRawCellContents(n, formatIndex, formatString);
            assertEquals("Failed on iteration " + i, expected, actual);
        }
    }
}
Comment 1 PJ Fanning 2019-03-27 18:06:02 UTC
Maybe we should switch to java.time formatters.
Comment 2 Dominik Stadler 2019-03-27 18:52:41 UTC
Thanks for the nice test-case.

The cache-map itself is guarded via "synchronized getInstance()", so only the SimpleDateFormat seems to be the culprit. 

Minimal change would be to synchronize during access to the dateFmt object in CellDateFormatter.

An option would also be to add a dependency on commons-lang3 and use FastDateFormat which is a drop-in replacement.
Comment 3 PJ Fanning 2019-03-27 21:54:16 UTC
I added a test using https://github.com/apache/poi/commit/ee5fa41dccdd7f18034cd73ed139b654429a98e0

It passes without any code modification.

It would be great if someone could review my test to see if I made a mistake with it.
Comment 4 Shawn Smith 2019-03-27 22:06:07 UTC
In your 'doFormatTestConcurrent' method all threads are using the same value of 'n' so they're all getting the same string.  That's why the test passes.

In my original version, one thread is using 43551.50990171296 while the other is using 36104.424780092595.  Eventually one thread returns a result with field values from the other.
Comment 5 PJ Fanning 2019-03-27 23:12:03 UTC
Thanks. I've reworked the test and updated the code to get it to pass.
Comment 7 Shawn Smith 2019-03-27 23:27:55 UTC
Thanks!

I haven't actually run your changes locally, but one thing to verify is that the fix still works if each thread uses a different instance of DataFormatter.

The actual problem call to 'SimpleDateFormat' occurs from this line:

  new CellFormatResultWrapper( cfmt.apply(cellValueO) )

which is wrapped by the 'private synchronized Format getFormat' which synchronizes on the DataFormatter instance, not globally.  I have a concern that two different DataFormatter instances could still race on the same instance of 'SimpleDateFormat'.  The test case doesn't cover this.

It might be better to add the 'synchronize' to the CellDateFormatter.formatValue() method instead of to the methods in DataFormatter.
Comment 8 Greg Woolsey 2019-03-30 19:33:53 UTC
You are right, I can make the updated test fail by using two different concurrent DataFormatter instances.

Since all the other format types appear to be thread-safe, I think it is best to make thread safety a documented part of the CellFormatter abstract methods.  Then CellDateFormatter can handle its own synchronization on the specific SimpleDateFormat instances, as you suggest.

Removing synchronized from DataFormatter.getFormat() and adding it to CellDateFormatter.formatValue() passes the updated test that fails for multiple concurrent DataFormatter instances, which I believe is a superset of the testing for a single DataFormatter instance, so this change fixes all the cases.

committed in revision 1856647.