Summary: | Multithreading bug when reading 2 similar files | ||
---|---|---|---|
Product: | POI | Reporter: | Uri Sherman <urishe> |
Component: | HSSF | Assignee: | POI Developers List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | seebass |
Priority: | P2 | ||
Version: | 3.10-FINAL | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | All | ||
Attachments: | A file for which the problem reproduces |
This was introduced in Bug 55612, unfortunately only a lock on a static object will suffice unless we replace it with something a bit more sophisticated, e.g. a thread-local. Fixed via r1597637, we now use ThreadLocals to keep the cache and thus avoid multi-threading issues. There should not be a huge overhead per Thread as we keep a string, a short and a list of (int+boolean+string), which is replaced for every cache-entry, so it cannot grow unbounded. *** Bug 56453 has been marked as a duplicate of this bug. *** A short suggestion - I saw you are trying to avoid synchronization on a static variable here. Note that every call to DateUtil.isCellDateFormatted(a-numeric-cell) ends up in a call to DateUtil.isADateFormat() which itself also maintains a static cache of the last result and synchronizes on DateUtil.class. In my case for example, the issue with the formatString arises by a call to DateUtil.isCellDateFormatted(cell), so I still get monitor locks upon each call even though you use thread locals for the style formatString cache. Seems like it would be a good idea to use the same strategy and switch the DateUtil.isADateFormat() cache to be ThreadLocal based rather than synchronization based. |
Created attachment 31660 [details] A file for which the problem reproduces When reading two copies of the same file (doesn't necessarily need to be the exact same file, but do need to contain the same styles or something along those lines), from two threads simultaneously (each thread processes its own file), the format string of cell styles get mixed up. The following code demonstrates this, printing to System.out every time a non date cell is mistakenly recognized as date. Note that starting only one of the threads yields no System.out messages. public class CellFormatBugExample implements Runnable { public static void main(String[] args) { new Thread(new CellFormatBugExample("C:/temp/file_1.xls")).start(); new Thread(new CellFormatBugExample("C:/temp/file_2.xls")).start(); } String filePath; public CellFormatBugExample(String filePath) { this.filePath = filePath; } @Override public void run() { File inputFile = new File(filePath); try (FileInputStream stream = new FileInputStream(inputFile)) { Workbook wb = WorkbookFactory.create(stream); Sheet sheet = wb.getSheetAt(0); for (Row row : sheet) { for (Integer idxCell = 0; idxCell < row.getLastCellNum(); idxCell++) { Cell cell = row.getCell(idxCell); cell.getCellStyle().getDataFormatString(); if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) { boolean isDate = HSSFDateUtil.isCellDateFormatted(cell); if (idxCell > 0 && isDate) { System.out.println("cell " + idxCell + " is not a date!"); } } } } } catch (Exception e) { e.printStackTrace(); } } } Make another copy of the attached file - "file_2.xls" and run the code to reproduce. Digging around a bit, seems the cause for this is a caching bug in the HSSFCellStyle.getDataFormatString() method. As far as I could understand a simple resolution would be to synchronize access to that method.