Created attachment 39153 [details] file showing problem In 5.2.3, we used to get "ERROR: #VALUE!" instead of "#VALUE!" and we used to get "TRUE" or "FALSE" instead of "1" or "0". I haven't yet tracked down the cause, but in the debugger, I can see that in XSSFSheetXMLHandler#outputCell the value of this.nextDataType is ERROR for the first and BOOLEAN for the second in 5.2.3. However, the type is FORMULA in 5.2.4 for both, thus bypassing the correct handling for the extracted string.
Again, my apologies for not catching this in the regression tests before the 5.2.4 release.
In XSSFSheetXMLHandler in 5.2.3, there was a check before overwriting nextDataType: if (this.nextDataType == XSSFSheetXMLHandler.xssfDataType.NUMBER) { this.nextDataType = XSSFSheetXMLHandler.xssfDataType.FORMULA; } In 5.2.4, that check is gone, and the nextDataType is set to FORMULA and the containing type (at the cell level) is overwritten. <c r="A1" s="1" t="b"> <f aca="false">(2 > 5)</f> <v>0</v> </c> This change came in here: https://svn.apache.org/viewvc/poi/trunk/poi-ooxml/src/main/java/org/apache/poi/xssf/eventusermodel/XSSFSheetXMLHandler.java?r1=1897331&r2=1905575
Anyone have an idea what this fixes? commit message: "possible issue when streaming xssf cells with formulas"
that day, all the other changes were for https://bz.apache.org/bugzilla/show_bug.cgi?id=66365 - so it probably relates to that In the end of the day, POI has a lot of bugs and low test coverage. Can we start by working out of the 'regression' causes working code to now have a bug or if we have changed the issue because the old behaviour was a bug? I much prefer not reverting code without having a test scenario that can be used to decide if the regression is needed and the test case can protect against future regressions. I just added r1913064 and that seems to be the correct behaviour (to me).
It really seems to me that XSSFSheetXMLHandler data type tracking is deficient. In XSSFCell, we track the cellType and also have an extra CachedFormulaResultType so that we know can know a cell type is formula type but also know what type the cached result has.
I'm trying out r1913067. No guarantee that it won't have other side effects but it reverts in Comment 2 and uses a reworked set of code to support the option to output formulas instead of cached results (for formula cells).
Tim - my changes seem to work ok. Could you have a look at the latest code? I have noticed that the non streaming XSSFExcelExtractor does not have the code for taking formula results for booleans and errors and formatting the results like the streaming version does. I'll raise an issue for that and will change XSSFExcelExtractor if there are no objections. It makes sense that both extractors work similarly.
PJ thank you for diving into this. I only had time yesterday to figure out the source of the change. I would never suggest mindlessly reverting mods without unit tests for both the original mod and the potential reversion or other fix. I should have a chance to take a look later this morning. Thank you, again!