Bug 66586 - unable to convert file to CSV using ConvertExcelToCSVProcessor
Summary: unable to convert file to CSV using ConvertExcelToCSVProcessor
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: HDF
Version: unspecified
Hardware: Other Linux
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-01 21:02 UTC by Nabeel Raza
Modified: 2023-05-06 19:29 UTC



Attachments
ConvertExcelToCSVProcessor Error SS (109.21 KB, image/png)
2023-05-01 21:02 UTC, Nabeel Raza

Description Nabeel Raza 2023-05-01 21:02:57 UTC
Created attachment 38549 [details]
ConvertExcelToCSVProcessor Error SS

I was working on a task where the input Excel file was around ~35 MB, and parsing it with ConvertExcelToCSVProcessor failed.

ConvertExcelToCSVProcessor[id=9434baba-8749-3634-8ee1-7189f6d7831e] Failed to process incoming Excel document. Tried to allocate an array of length 226,239,061, but the maximum length for this record type is 100,000,000.
If the file is not corrupt or large, please open an issue on bugzilla to request 
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride(): org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 226,239,061, but the maximum length for this record type is 100,000,000.
If the file is not corrupt or large, please open an issue on bugzilla to request 
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()




ConvertExcelToCSVProcessor Version: ConvertExcelToCSVProcessor 1.18.0.2.2.6.0-260
Comment 1 PJ Fanning 2023-05-01 21:09:52 UTC
Have you tried this bit in the exception message - 'consider setting a higher override value with IOUtils.setByteArrayMaxOverride()'?

The point of the 'If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type' part of the message is for you to provide us with a file, so that we can decide whether the default limits in POI should be increased.

POI is not released very regularly, so even if we do increase the default limits, it could be quite some time until a new set of official release jars is available.
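For applications that call POI directly, the override suggested above can be set before parsing. A minimal sketch, assuming POI is on the classpath; the 250,000,000-byte value (chosen to exceed the 226,239,061-byte allocation from the error message) and the file name "large.xlsx" are illustrative only. Note this does not apply to NiFi's bundled processor, where the parsing code is not under the user's control:

```java
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.util.IOUtils;

import java.io.File;

public class LargeWorkbookExample {
    public static void main(String[] args) throws Exception {
        // Raise POI's per-record allocation cap above the reported
        // 226,239,061 bytes (the default cap is 100,000,000).
        // This weakens a safeguard against corrupt/malicious files,
        // so only raise it for input you trust.
        IOUtils.setByteArrayMaxOverride(250_000_000);

        // Open the large workbook; without the override above, this
        // would throw org.apache.poi.util.RecordFormatException.
        try (Workbook wb = WorkbookFactory.create(new File("large.xlsx"))) {
            System.out.println("Sheets: " + wb.getNumberOfSheets());
        }
    }
}
```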
Comment 2 Nabeel Raza 2023-05-01 21:35:30 UTC
Hi PJ - thanks for commenting. I am using Apache NiFi in CDP Public Cloud, and I am facing this exception in ConvertExcelToCSVProcessor (v: 1.18.0.2.2.6.0-260).
Comment 3 PJ Fanning 2023-05-01 21:41:38 UTC
You would be best off talking to NiFi users about how to set the POI IOUtils setByteArrayMaxOverride value.
Comment 4 Dominik Stadler 2023-05-06 17:17:48 UTC
The CSV-processor is provided by Apache NiFi ( https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ) which uses Apache POI for reading provided worksheets. 

A 35 MB compressed Excel document is already fairly large, so it would be best if Apache NiFi added some way for users to adjust "ByteArrayMaxOverride" when handling such large documents; if we increased the limit in Apache POI for everyone, we would make this important security safeguard much less useful.

Thus we do not plan to change anything here for now.
Comment 5 Nabeel Raza 2023-05-06 19:29:49 UTC
Hi Team, thanks for commenting. I connected with Cloudera, and they are working on a fix in the latest version of CDP.

Thanks. I totally understand that there should be a limit here, but it should not be hard-coded; the limit should be flexible enough to handle such cases without requiring code changes on the POI side.