Created attachment 26970 [details] Patch against SVN trunk
Opening ~5M XLS files takes about half a second and causes high CPU load. A quick profiling run showed a loop executing 55 million times to get the user style name for my XLS files. This patch against trunk reduces the loop count significantly by fixing the 'iterative search' that caused it. I'll attach the profiling details in a moment. In summary, for my test cases I measured an average speedup = old_time / new_time = ~1.3. It would be great if the patch could be included in the next 3.8 release. Cheers, Marcel
Created attachment 26971 [details] CPU hotspots before the patch. Notice the 55m invocation count on WorkbookRecordList.
Created attachment 26972 [details] CPU hotspots with the patch applied, fixing the previous top hotspots (these were all related, up to the SharedValueManager entry).
The attached profiling screenshots show a single XLS file being opened once: new HSSFWorkbook(new FileInputStream("....")); Comparing before and after the patch, you can see that the top hotspots (the first 7 are related) are fixed. The loop no longer runs 55 million times. A separate microbenchmark with a warmup phase followed by timed loops shows similar results (a speedup of ~1.3, i.e. only ~75% of the original time on average).
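The microbenchmark itself is not attached, so the following is only a minimal sketch of a "warmup, then timed loops" measurement. The class and method names (OpenBenchmark, averageNanos, speedup) are hypothetical, and the workload in main is a stand-in; for the real measurement the task would wrap the HSSFWorkbook constructor call quoted above.

```java
final class OpenBenchmark {

    /**
     * Runs the task warmupRuns times without timing it (so the JIT can
     * compile the hot paths), then returns the average time in nanoseconds
     * over timedRuns further runs.
     */
    static double averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) timedRuns;
    }

    /** Speedup as defined in this report: old_time / new_time. */
    static double speedup(double oldTime, double newTime) {
        return oldTime / newTime;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption). The real task would open the
        // workbook and wrap the checked IOException in a RuntimeException.
        long[] sink = new long[1];
        Runnable task = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            sink[0] = sum; // keep the result so the loop is not optimized away
        };
        double avg = averageNanos(task, 5, 20);
        System.out.printf("average per run: %.3f ms%n", avg / 1e6);
    }
}
```

Running the same harness against a build with and without the patch and passing the two averages to speedup gives the old_time / new_time ratio used in this report. A dedicated harness such as JMH would give more trustworthy numbers than a hand-written loop.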
Fixed in r1103502, but in a different way. The real problem was that HSSFCell.setCellStyle was called for every cell when constructing a workbook. This method is expensive and is designed for assigning styles to individual cells; applying it at workbook scope causes performance issues. It appears that we don't need to call setCellStyle in the HSSFCell constructor at all. This line is a leftover from an old version (POI-3.5 or earlier) and does not make any sense in POI-3.8, so I removed it. This fix should boost the performance of opening .xls files by even more than ~1.3. Yegor
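To illustrate why a per-cell call that scans workbook-level records adds up so quickly, here is a small counting sketch. It does not use POI code: PerCellLookupCost, findRecord and the record names are hypothetical, and the cell and record counts are made-up numbers chosen only to show the multiplication.

```java
import java.util.ArrayList;
import java.util.List;

final class PerCellLookupCost {

    static long comparisons = 0;

    /** Hypothetical linear scan over workbook-level records. */
    static int findRecord(List<String> records, String name) {
        for (int i = 0; i < records.size(); i++) {
            comparisons++;
            if (records.get(i).equals(name)) {
                return i;
            }
        }
        return -1;
    }

    /** Counts comparisons when every cell triggers one full scan. */
    static long comparisonsForLoad(int cellCount, int recordCount) {
        comparisons = 0;
        List<String> records = new ArrayList<>();
        for (int i = 0; i < recordCount; i++) {
            records.add("record" + i);
        }
        // Worst case: the name is absent, so each scan visits every record.
        for (int c = 0; c < cellCount; c++) {
            findRecord(records, "missing");
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // 10,000 cells x 500 records = 5,000,000 comparisons
        System.out.println(comparisonsForLoad(10_000, 500));
    }
}
```

The total grows with cells times records, so removing the per-cell call removes the whole product rather than just shortening each scan.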
(In reply to comment #4)
> Fixed in r1103502, but in a different way.
>
> The real problem was that HSSFCell.setCellStyle was called for every cell when
> constructing a workbook. This method is expensive and designed for assigning
> styles to individual cell and applying it to workbook scope causes performance
> issues. It appears that we don't need to call setCellStyle in the HSSFCell
> constructor at all, this line remained from all times (POI-3.5 or earlier) and
> does not make any sense in POI-3.8. So, I removed it.
>
> This fix should boost performance of opening .xls files even greater than ~1.3.
>
> Yegor

Thanks, Yegor - your fix was even better :-) I profiled again against the latest version, 1127506, and noticed another hotspot in the SharedValueManager class: 'findFormulaGroup' is iterated 8 million times.
Created attachment 27064 [details] Profiled hotspots (opening an XLS file using current trunk v1127506)
Created attachment 27065 [details] Profiled CPU call tree (opening an XLS file using current trunk v1127506)
Created attachment 27067 [details] Improves SharedValueManager.findFormulaGroup
Created attachment 27068 [details] Profiled hotspots after the second patch for SharedValueManager.findFormulaGroup
Created attachment 27069 [details] Profiled call tree after the second patch for SharedValueManager.findFormulaGroup
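The patch is in the attachment and is not reproduced here. Purely as a generic illustration of how a repeated linear search can be avoided, the sketch below indexes groups in a HashMap keyed by their first cell, so each lookup is a single hash probe. GroupIndex, its methods and the String payload are hypothetical and are not the SharedValueManager API; whether the attached patch takes this approach is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class GroupIndex {

    // Hypothetical index: first cell of a group -> group (a String here).
    private final Map<Long, String> groupsByFirstCell = new HashMap<>();

    /** Packs row and column into one long so the pair can serve as a map key. */
    private static long key(int row, int column) {
        return ((long) row << 32) | (column & 0xFFFFFFFFL);
    }

    void add(int firstRow, int firstColumn, String group) {
        groupsByFirstCell.put(key(firstRow, firstColumn), group);
    }

    /** Returns the group starting at the given cell, or null if there is none. */
    String find(int firstRow, int firstColumn) {
        return groupsByFirstCell.get(key(firstRow, firstColumn));
    }

    public static void main(String[] args) {
        GroupIndex index = new GroupIndex();
        index.add(2, 3, "group A");
        System.out.println(index.find(2, 3)); // prints "group A"
        System.out.println(index.find(3, 2)); // prints "null"
    }
}
```

Building the index costs one pass over the groups; after that, the number of lookups no longer multiplies with the number of groups.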