Bug 65562 - SXSSF doesn't update dimension field
Summary: SXSSF doesn't update dimension field
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: SXSSF (show other bugs)
Version: 5.0.0-FINAL
Hardware: All All
: P2 minor (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on: 64536
Blocks:
  Show dependency tree
 
Reported: 2021-09-08 15:43 UTC by handymenny
Modified: 2022-08-15 10:37 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description handymenny 2021-09-08 15:43:30 UTC
I don't think it's a "real" bug, I haven't found the code that should update the dimension field, but rather an expected behaviour that I think could be handled better.
Of course it's an optional field, so it's not fundamental, but if it's populated with an invalid value it can break an application that relies on that field (as happened to me).

Let me suggest some ways to handle this "bug":
1. Add a warning to the documentation, advising to unset the dimension field from default CTWorksheet
2. Don't add the dimension field by default (the code is in org.apache.poi.xssf.usermodel.XSSFSheet)
3. Properly update the dimension field for SXSSFSheet too
Comment 1 PJ Fanning 2021-09-08 16:27:40 UTC
Could you provide some sample code to demo how to reproduce the issue?

This example creates a sheet with <dimension ref="A1"/> - possibly not ideal but doesn't seem to cause any harm.

https://github.com/pjfanning/poi-shared-strings-sample/blob/master/src/main/java/com/github/pjfanning/poi/sample/StreamingWrite.java

We probably don't want to try to produce a dimension with a cell range because that could affect the streaming - since this appears at the start of the sheet XML and we don't really know the last cell at that stage.

If we were to make a change, omitting the dimension element might be best - if that is something that Excel handles ok.
Comment 2 handymenny 2021-09-12 12:30:44 UTC
By "invalid" value, I am referring precisely to "<dimension ref="A1"/> "
I used the wrong word, it's not invalid but rather not "accurate".

What happens is that an application I use (not publicly available unfortunately) only imports cell A1 from the worksheet.
I don't think it's a bug of that application, because that value actually really means that there is only one cell in the worksheet and I expect that there are other applications that rely on that field (maybe to speed up parsing)

I've been using xlsx files without the dimension field for a few days now, I haven't noticed any issues with the latest versions of Excel and LibreOffice Calc.
Comment 3 PJ Fanning 2022-03-21 10:48:06 UTC
What if we add a `setDimension(CellRangeAddress)` on SXSSFSheet? I've recently added a getDimension on XSSFSheet (and we could add that on SXSSFWorkbook too).

This would at least allow users who know the values to set them on SXSSFSheet and we could write the value when we output. https://bz.apache.org/bugzilla/show_bug.cgi?id=64536#c2 describes why this is not an easy problem to solve for SXSSF.
Comment 4 PJ Fanning 2022-07-26 21:46:09 UTC
I added r1903037 - this best effort - since the data is not all kept in memory, the cell positions are used when they are added but if you delete some of them before writing the sheet, this could lead to the values not being correct.

I added a method on SXSSFWorkbook to turn this calculation off. Up until now, the dimension data for each sheet was added as just "A1" and this behaviour is retained if you call setShouldCalculateSheetDimensions(false) on SXSSFWorkbook.