Created attachment 39457 [details] Crash samples

Recently we discovered a bug in POI (5.2.3). Because we lack contextual knowledge of the POI library, we cannot thoroughly fix some of these bugs ourselves, so we look forward to any plan the developers propose for fixing them.

# Crash Stack

```
('org.apache.poi.hssf.record.aggregates.FormulaRecordAggregate.<init>', 'FormulaRecordAggregate.java:73'),
Exception in thread "main" java.lang.IllegalArgumentException: Unexpected base token id (-64)
    at org.apache.poi.ss.formula.ptg.Ptg.createBasePtg(Ptg.java:170)
    at org.apache.poi.ss.formula.ptg.Ptg.createPtg(Ptg.java:92)
    at org.apache.poi.ss.formula.ptg.Ptg.readTokens(Ptg.java:66)
    at org.apache.poi.ss.formula.Formula.getTokens(Formula.java:89)
    at org.apache.poi.hssf.record.FormulaRecord.getParsedExpression(FormulaRecord.java:213)
    at org.apache.poi.hssf.record.aggregates.FormulaRecordAggregate.handleMissingSharedFormulaRecord(FormulaRecordAggregate.java:94)
    at org.apache.poi.hssf.record.aggregates.FormulaRecordAggregate.<init>(FormulaRecordAggregate.java:73)
    at org.apache.poi.hssf.record.aggregates.ValueRecordsAggregate.construct(ValueRecordsAggregate.java:179)
    at org.apache.poi.hssf.record.aggregates.RowRecordsAggregate.<init>(RowRecordsAggregate.java:113)
    at org.apache.poi.hssf.model.InternalSheet.<init>(InternalSheet.java:189)
    at org.apache.poi.hssf.model.InternalSheet.createSheet(InternalSheet.java:128)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:382)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:431)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:411)
    at com.test.Entry.main(Entry.java:34)
```

```
('org.apache.poi.hssf.record.common.UnicodeString.<init>', 'UnicodeString.java:96'),
Exception in thread "main" java.lang.IllegalArgumentException: Cannot create a ChainLoopDetector with negative size, but had: -2147483648
    at org.apache.poi.poifs.filesystem.BlockStore$ChainLoopDetector.<init>(BlockStore.java:89)
    at org.apache.poi.poifs.filesystem.POIFSMiniStore.getChainLoopDetector(POIFSMiniStore.java:237)
    at org.apache.poi.poifs.filesystem.POIFSStream$StreamBlockByteBufferIterator.<init>(POIFSStream.java:195)
    at org.apache.poi.poifs.filesystem.POIFSStream.getBlockIterator(POIFSStream.java:96)
    at org.apache.poi.poifs.filesystem.POIFSStream.iterator(POIFSStream.java:87)
    at org.apache.poi.poifs.filesystem.POIFSDocument.getBlockIterator(POIFSDocument.java:177)
    at org.apache.poi.poifs.filesystem.DocumentInputStream.<init>(DocumentInputStream.java:92)
    at org.apache.poi.poifs.filesystem.DirectoryNode.createDocumentInputStream(DirectoryNode.java:160)
    at org.apache.poi.poifs.filesystem.DirectoryNode.createDocumentInputStream(DirectoryNode.java:137)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:369)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:431)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:411)
    at com.test.Entry.main(Entry.java:34)
```

# Test Program

```
package com.test;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

public class Entry {
    public static void main(String[] args) throws IOException {
        assert args.length == 1;
        System.out.println("Testing Harness with args[0]: " + args[0]);
        // Both crashes above are thrown from the HSSFWorkbook constructor.
        // try-with-resources closes the stream and workbook exactly once.
        try (FileInputStream fis = new FileInputStream(args[0]);
             Workbook workbook = new HSSFWorkbook(fis)) {
            int numberOfSheets = workbook.getNumberOfSheets();
            for (int i = 0; i < numberOfSheets; i++) {
                Sheet sheet = workbook.getSheetAt(i);
                Iterator<Row> rowIterator = sheet.iterator();
                while (rowIterator.hasNext()) {
                    Row row = rowIterator.next();
                    Iterator<Cell> cellIterator = row.cellIterator();
                    while (cellIterator.hasNext()) {
                        Cell cell = cellIterator.next();
                        if (cell.getCellType() == CellType.STRING) {
                            String name = cell.getStringCellValue().trim();
                            System.out.println("Random data::" + name);
                        } else if (cell.getCellType() == CellType.NUMERIC) {
                            System.out.println("Random data::" + cell.getNumericCellValue());
                        }
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
* POI is a volunteer project and the community is no longer very active.
* The test case and stack trace you provided are not very useful. Please provide an xls file that reproduces the issue.
* The IllegalArgumentException could be caused by a number overflow somewhere, but that could be a sign that you have a very big file.
* I am not going to open your zip file. I have no idea what is in it, but it doesn't appear to be an xls file.

At the end of the day, users will need to get used to the idea that they will need to roll up their sleeves and do a lot of the investigation themselves. The POI code is plain Java. It's fairly complicated, but a motivated developer should be able to make reasonable progress in working out how it works.
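To illustrate the overflow point: below is a minimal Java sketch, not taken from POI, of two ways a size can end up as -2147483648 (`Integer.MIN_VALUE`), the value reported in the ChainLoopDetector message. Whether either of these is what actually happens inside POI for this file is an assumption, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: this is not POI code.
final class NegativeSizeDemo {
    // Plain int addition wraps around silently on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Interprets four little-endian bytes as a signed 32-bit int,
    // the way a size field read from a binary file might be decoded.
    static int readLittleEndianInt(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // Arithmetic overflow: prints -2147483648
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1));
        // A stored field with only the top bit set: prints -2147483648
        System.out.println(readLittleEndianInt(new byte[] {0, 0, 0, (byte) 0x80}));
        // Math.addExact throws ArithmeticException instead of wrapping.
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

The second case does not need a very big file: a four-byte size field with only the high bit set is enough, which is the kind of value a corrupted or fuzzed file can easily contain.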
At this stage, there isn't much interest in maintaining or enhancing the HSSF code. The xls format is prehistoric; xlsx is much better supported. I, for one, will occasionally look at XSSF issues (xlsx files) but have very little interest in HSSF issues.
Created attachment 39459 [details] POC xls file

Sorry for the inconvenience. I have attached the xls file that crashes the test program.
Created attachment 39460 [details] POC xls file2
I tried 'POC xls file' and it won't open in Excel (outlook.com). I also tried 'POC xls file2' and it was 'repaired' by Excel but nothing was left after it was repaired. Those files are corrupt.
See https://bz.apache.org/bugzilla/show_bug.cgi?id=68336 -- this user is just fuzzing files and expecting someone else to deal with them.
Apache POI does not try to handle broken documents without throwing exceptions. It tries to not allocate endless amounts of memory and not run into endless loops/stackoverflow-exceptions. Therefore in this case it seems fine to get this type of exception when the input data is actually a document produced by a fuzzer. See https://github.com/google/oss-fuzz/tree/master/projects/apache-poi/src/main/java/org/apache/poi for some fuzz-targets and which exceptions they currently handle "gracefully".
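As a rough sketch of that distinction, a harness can treat a thrown exception on malformed input as an acceptable rejection and flag only resource exhaustion as a finding. The code below is an assumption-laden illustration, not the oss-fuzz target code: `FuzzOutcome`, `Parser`, `Result` and `classify` are hypothetical names, and the choice of `IOException` and `IllegalArgumentException` as the "graceful" set is only an example based on the exceptions in this report.

```java
import java.io.IOException;

// Hypothetical harness helper; not part of POI or oss-fuzz.
final class FuzzOutcome {
    enum Result { PARSED, REJECTED, FAILURE }

    // Stand-in for whatever parses the input under test.
    @FunctionalInterface
    interface Parser {
        void parse(byte[] input) throws IOException;
    }

    static Result classify(Parser parser, byte[] input) {
        try {
            parser.parse(input);
            return Result.PARSED;
        } catch (IOException | IllegalArgumentException e) {
            // Assumed "graceful" set: the parser refused a broken document.
            return Result.REJECTED;
        } catch (StackOverflowError | OutOfMemoryError e) {
            // Resource exhaustion is the kind of problem worth reporting.
            return Result.FAILURE;
        }
        // Any other exception propagates so that it gets looked at.
    }
}
```

With POI on the classpath, the parser could be something like `in -> new HSSFWorkbook(new ByteArrayInputStream(in)).close()`. Under this classification, both stack traces in this report would count as `REJECTED`.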