Created attachment 38549 ConvertExcelToCSVProcessor Error SS
I was working on a task where the input Excel file was around 35 MB, and ConvertExcelToCSVProcessor failed while parsing it:
ConvertExcelToCSVProcessor[id=9434baba-8749-3634-8ee1-7189f6d7831e] Failed to process incoming Excel document. Tried to allocate an array of length 226,239,061, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride(): org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 226,239,061, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
Version: ConvertExcelToCSVProcessor 1.18.0.2.2.6.0-260
Have you tried the suggestion in the exception message, 'consider setting a higher override value with IOUtils.setByteArrayMaxOverride()'? The point of the 'If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type' part of the message is that you provide us with a file, so that we can decide whether the default limits in POI should be increased. POI is not released very regularly, so even if we do increase the default limits, it could be quite some time until a new set of official release jars is available.
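For a plain Java program with POI on the classpath, the workaround named in the exception would be a single static call such as IOUtils.setByteArrayMaxOverride(300_000_000), made before the workbook is opened. The sketch below is only a simplified, self-contained model of how a global override can take precedence over a per-record default limit. LengthGuard and its methods are hypothetical names used for illustration and are not POI's source code.

```java
// Simplified model of a length guard with a global override.
// LengthGuard, setMaxOverride, effectiveLimit and check are hypothetical names,
// not part of Apache POI.
class LengthGuard {
    // A value <= 0 means "no override; use the per-record default".
    private static int maxOverride = -1;

    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    // Returns the limit that applies to a record whose default limit is defaultMax.
    static int effectiveLimit(int defaultMax) {
        return maxOverride > 0 ? maxOverride : defaultMax;
    }

    // Throws if the requested length exceeds the effective limit.
    static void check(long length, int defaultMax) {
        int limit = effectiveLimit(defaultMax);
        if (length > limit) {
            throw new IllegalStateException(
                "Tried to allocate " + length + " bytes, but the limit is " + limit);
        }
    }

    public static void main(String[] args) {
        long requested = 226_239_061L;   // length reported in this issue
        int defaultMax = 100_000_000;    // default limit reported in this issue

        try {
            check(requested, defaultMax);
        } catch (IllegalStateException e) {
            System.out.println("Without override: " + e.getMessage());
        }

        setMaxOverride(300_000_000);     // illustrative value above the requested length
        check(requested, defaultMax);    // passes once the override is in place
        System.out.println("With override: allocation allowed");
    }
}
```

Any real override value should be chosen with care, since the limit exists to guard against corrupt or malicious files that request very large allocations.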
Hi Pj - thanks for commenting. I am using Apache NiFi in CDP Public Cloud, and I am facing this exception in ConvertExcelToCSVProcessor (v: 1.18.0.2.2.6.0-260).
You would be best off talking to the NiFi users about how to set the POI IOUtils.setByteArrayMaxOverride() value.
The CSV processor is provided by Apache NiFi ( https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-poi-bundle/nifi-poi-processors/src/main/java/org/apache/nifi/processors/poi/ConvertExcelToCSVProcessor.java ), which uses Apache POI to read the provided worksheets. An Excel document of 35 MB compressed is already fairly large, so it would be best if Apache NiFi added some way for users to adjust "ByteArrayMaxOverride" when handling such large documents. If we increased the limit in Apache POI for everyone, we would make this important security safeguard much less useful. Thus we do not plan to change anything here for now.
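As a rough illustration of what "some way of allowing users to adjust" the limit could look like in an application that embeds POI, the sketch below reads an override from a JVM system property and validates it before it would be applied. The property name example.poi.byteArrayMaxOverride and the OverrideConfig class are assumptions made up for this example; this is not how NiFi is configured. The real POI call is left as a comment so the block runs without POI on the classpath.

```java
// Hypothetical configuration helper; not part of NiFi or POI.
class OverrideConfig {
    // Assumed property name, chosen only for this example.
    static final String PROPERTY = "example.poi.byteArrayMaxOverride";

    // Parses the configured override. Returns -1 when the value is missing,
    // not a number, not positive, or too large for an int.
    static int parseOverride(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return -1;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return (value > 0 && value <= Integer.MAX_VALUE) ? (int) value : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        int override = parseOverride(System.getProperty(PROPERTY));
        if (override > 0) {
            // With POI on the classpath, the real call would go here:
            // org.apache.poi.util.IOUtils.setByteArrayMaxOverride(override);
            System.out.println("Would raise the POI limit to " + override);
        } else {
            System.out.println("No valid override configured; keeping POI defaults");
        }
    }
}
```

Keeping the default in place unless a user opts in leaves the safeguard intact for everyone else, which matches the reasoning in the comment above.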
Hi Team, thanks for commenting. I contacted Cloudera, and they are working on a fix for the latest version of CDP. Thanks. I totally understand that there should be a limit for this, but technically it shouldn't be this rigid. It should be flexible enough to handle such cases without changing the code on the POI side.