Created attachment 34808: this is a report table received by email.
Thanks for the bug report and for including the problematic file. Before I spend time researching this, it would help us if you could answer a few questions.

I'm assuming you get an exception (what exception class?) with the message "Initialisation of record 0x31 left 4 bytes remaining still to be read" when you open the workbook with

    Workbook wb = WorkbookFactory.create(new File("buzhengc.xls"));

If not, please include a stack trace and sample code so that we can reproduce the problem.

What version of POI are you using?
Is anything written to stderr or the POILogger that would suggest why we were 4 bytes short?
Does this file open without errors in Microsoft Excel or another spreadsheet application? If so, which version?

Thanks in advance for the info.
FYI, Bug 57093 sounds similar.
@Javen O'Neal This file opens without any errors in Microsoft Excel and WPS. I tested with poi-3.15.jar. This is my code:

    public static void main(String[] args) throws Exception {
        String fileName = "F:\\Desktop\\buzhengc.xls";
        InputStream inputStream = new FileInputStream(fileName);
        POIFSFileSystem fs = new POIFSFileSystem(inputStream);
        DirectoryEntry root = fs.getRoot();
        System.out.println(root.getEntryNames());
        HSSFWorkbook hssfworkbook = new HSSFWorkbook(fs);
        System.out.println(hssfworkbook.getNameName(0));
    }

I got this error:

    [Workbook]
    Exception in thread "main" org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x31(FontRecord) left 4 bytes remaining still to be read.
        at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:174)
        at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:253)
        at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:494)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:341)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:304)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:251)
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:221)
        at com.test.pppp.main(test.java:26)
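For context on the message: each BIFF record header declares the record's length, and the exception means the FONT parser finished before consuming all of the declared bytes. The following is a minimal sketch, not POI's FontRecord code, of how that leftover count can be measured. It assumes the BIFF8 FONT layout as I understand it from the [MS-XLS] specification (14 bytes of fixed fields, then a 1-byte character count, a 1-byte high-byte flag, and the name). FontLeftover, leftoverBytes and sampleRecord are hypothetical names, and the sample record is synthetic, built only to mimic a record with 4 trailing bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper, not part of POI: measures trailing bytes in a BIFF8 FONT record body. */
public class FontLeftover {

    // Assumed fixed fields before the name: height, option flags, colour index,
    // boldness, super/subscript (5 x 2 bytes) plus underline, family,
    // charset and a reserved byte (4 x 1 byte) = 14 bytes.
    static final int FIXED_FIELDS = 14;

    /**
     * Bytes left after the font name; negative if the body is too short for
     * its declared name. Assumes the body is at least 16 bytes long.
     */
    static int leftoverBytes(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(FIXED_FIELDS);
        int nameChars = buf.get() & 0xFF;            // cch: name length in characters
        boolean utf16 = (buf.get() & 0x01) != 0;     // high-byte flag: 1 = UTF-16LE, 0 = 8-bit
        int nameBytes = utf16 ? 2 * nameChars : nameChars;
        return buf.remaining() - nameBytes;
    }

    /** Builds a synthetic body: a 2-character UTF-16LE name followed by extra zero bytes. */
    static byte[] sampleRecord(int extra) {
        ByteBuffer rec = ByteBuffer.allocate(FIXED_FIELDS + 2 + 4 + extra)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        rec.position(FIXED_FIELDS);                  // fixed fields are left as zeros
        rec.put((byte) 2);                           // cch = 2 characters
        rec.put((byte) 1);                           // high-byte flag = 1 (UTF-16LE)
        rec.putChar('\u9ED1').putChar('\u4F53');     // the name "黑体"
        return rec.array();                          // trailing bytes stay zero
    }

    public static void main(String[] args) {
        System.out.println(leftoverBytes(sampleRecord(4))); // prints 4
    }
}
```

With sampleRecord(0) the same call returns 0, i.e. a record whose declared length exactly covers its name.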
Created attachment 34812: the Excel BIFF output produced by BIFFVIEW.exe.

I found another problem: there is a WINDOW2 record before a ROW record.
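The record order BIFFVIEW reports can also be checked by walking record headers directly: every BIFF record starts with a 2-byte little-endian sid and a 2-byte length, followed by that many data bytes. The sketch below is a hypothetical helper (RecordOrder is not a POI class). It assumes the bytes of a single worksheet substream have already been extracted from the OLE2 "Workbook" stream, and it ignores CONTINUE records and BOF/EOF boundaries. It uses sid 0x0208 for ROW and 0x023E for WINDOW2.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: walks BIFF record headers in one already-extracted substream. */
public class RecordOrder {
    static final int ROW = 0x0208;
    static final int WINDOW2 = 0x023E;

    /** Returns record sids in stream order (2-byte sid, 2-byte length, then data). */
    static List<Integer> sids(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> out = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;
            int len = buf.getShort() & 0xFFFF;
            if (len > buf.remaining()) {
                break;                                 // truncated record: stop walking
            }
            out.add(sid);
            buf.position(buf.position() + len);        // skip the record data
        }
        return out;
    }

    /** True if a WINDOW2 record appears before the first ROW record. */
    static boolean window2BeforeFirstRow(List<Integer> sids) {
        int window2 = sids.indexOf(WINDOW2);
        int row = sids.indexOf(ROW);
        return window2 >= 0 && row >= 0 && window2 < row;
    }

    public static void main(String[] args) {
        // Synthetic stream: WINDOW2 (empty body) followed by ROW (empty body).
        byte[] stream = {0x3E, 0x02, 0x00, 0x00, 0x08, 0x02, 0x00, 0x00};
        System.out.println(window2BeforeFirstRow(sids(stream))); // prints true
    }
}
```

Real ROW and WINDOW2 records have non-empty bodies; empty ones are used here only to keep the synthetic stream short.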
I get the same exception when reading an xls file. The stack trace:

    org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x31(FontRecord) left 4 bytes remaining still to be read.
        at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:177) ~[poi-3.16.jar:3.16]
        at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:234) ~[poi-3.16.jar:3.16]
        at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:494) ~[poi-3.16.jar:3.16]
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:348) ~[poi-3.16.jar:3.16]
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:99) ~[poi-ooxml-3.16.jar:3.16]
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:182) ~[poi-ooxml-3.16.jar:3.16]
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:144) ~[poi-ooxml-3.16.jar:3.16]
The file opens without problems in Microsoft Excel on a Mac.
(In reply to lintongchuan from comment #5)
After opening the file in Microsoft Excel on a Mac and saving it as another file, the new file can be read and processed correctly.
We have 15 stack traces like this in our regression corpus for Tika. From the file attached here (first) and the file attached to Bug 57093 (second), I was hoping that the first byte or two of the extra data specified a length somehow. However, from govdocs1 085890.xls (third), it looks like junk at the end of the font record. By junk, of course, I mean "I don't understand why it's there"... like junk DNA. :) But seriously, when I open 085890.xls in Excel and search for "providing", I don't find anything.

In each dump, the first line is font name : length. The remaining lines are byte index : byte&0xff : char (if above 20).

FONT NAME:黑体 : 2
0 : 0 :
1 : 0 :
2 : 0 :
3 : 0 :

FONT NAME:MS Sans Serif : 13
0 : 19 :
1 : 0 :
2 : 1 :
3 : 0 :
4 : 0 :
5 : 88 : X
6 : 1 :
7 : 0 :
8 : 0 :
9 : 89 : Y
10 : 95 : _
11 : 41 : )
12 : 63 : ?
13 : 95 : _
14 : 41 : )
15 : 59 : ;
16 : 95 : _
17 : 40 : (
18 : 64 : @
19 : 95 : _
20 : 41 : )
21 : 0 :

FONT NAME:MS Sans Serif : 13
0 : 116 : t
1 : 129 :
2 : 84 : T
3 : 73 : I
4 : 84 : T
5 : 85 : U
6 : 84 : T
7 : 73 : I
8 : 79 : O
9 : 78 : N
10 : 95 : _
11 : 80 : P
12 : 82 : R
13 : 79 : O
14 : 86 : V
15 : 73 : I
16 : 68 : D
17 : 73 : I
18 : 78 : N
19 : 71 : G
20 : 95 : _
21 : 68 : D
22 : 65 : A
23 : 84 : T
24 : 65 : A
25 : 95 : _
26 : 73 : I
27 : 68 : D
28 : 10 :
29 : 0 :
30 : 0 :
31 : 67 : C
32 : 79 : O
33 : 78 : N
34 : 84 : T
35 : 65 : A
36 : 67 : C
37 : 84 : T
38 : 95 : _
39 : 73 : I
40 : 68 : D
41 : 20 :
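The dump format used in this comment can be reproduced with a short helper. This is a hypothetical sketch, not the tool that produced those dumps: ByteDump and dump are illustrative names, and printing a character only for bytes 0x21 to 0x7E is an assumption, chosen because it matches every dump line in this thread (the description only says "if above 20").

```java
/** Hypothetical sketch reproducing the "index : value : char" dump format. */
public class ByteDump {

    /** First line "FONT NAME:<name> : <length>", then one "index : value : char" line per byte. */
    static String dump(String fontName, byte[] extra) {
        StringBuilder sb = new StringBuilder();
        sb.append("FONT NAME:").append(fontName)
          .append(" : ").append(fontName.length()).append('\n');
        for (int i = 0; i < extra.length; i++) {
            int v = extra[i] & 0xFF;                 // byte&0xff: unsigned value 0..255
            sb.append(i).append(" : ").append(v).append(" :");
            if (v > 0x20 && v < 0x7F) {              // assumption: printable ASCII only
                sb.append(' ').append((char) v);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump("MS Sans Serif", new byte[] {19, 0, 1, 0, 0, 88}));
    }
}
```

Running main reproduces the first six lines of the MS Sans Serif dump whose first extra byte is 19.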
Whoa... and govdocs1/093/093996.xls has seven font records with an extra 1918 bytes! No intelligible text (on a quick look)...

FONT NAME:MS Sans Serif : 13
0 : 149 :
1 : 129 :
2 : 95 : _
3 : 41 : )
4 : 59 : ;
5 : 95 : _
...
1893 : 0 :
1894 : 115 : s
1895 : 142 :
1896 : 78 : N
1897 : 0 :
1898 : 75 : K
1899 : 161 :
1900 : 78 : N
1901 : 0 :
1902 : 3 :
1903 : 180 :
1904 : 78 : N
1905 : 0 :
1906 : 227 :
1907 : 198 :
1908 : 78 : N
1909 : 0 :
1910 : 145 :
1911 : 214 :
1912 : 78 : N
1913 : 0 :
1914 : 48 : 0
1915 : 65 : A
1916 : 85 : U
1917 : 8 :
Stats from our regression corpus on which Excel records cause LeftoverDataExceptions:

Record                      Number of exceptions
0x850(ChartFRTInfoRecord)   763 (Bug 47247)
0x85(BoundSheetRecord)      95
0x1D(SelectionRecord)       35
0x31(FontRecord)            15
0x203(NumberRecord)         8
0x42(CodepageRecord)        5
0x3C(ContinueRecord)        2
0x868(FeatRecord)           2
0x5B(FileSharingRecord)     1
0x5F(SaveRecalcRecord)      1
0xE(PrecisionRecord)        1
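A per-record tally like this can be produced from exception messages alone. The sketch below assumes only the message text is available and that it follows the "Initialisation of record 0x31(FontRecord) left ..." format seen in the stack traces in this thread. It is not necessarily how these numbers were computed; LeftoverTally is a hypothetical name, and the sample messages in main are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: counts LeftoverDataException messages per record type. */
public class LeftoverTally {

    // Captures e.g. "0x31(FontRecord)" from the message text.
    static final Pattern RECORD =
        Pattern.compile("Initialisation of record (0x[0-9A-Fa-f]+\\(\\w+\\))");

    static Map<String, Integer> tally(List<String> messages) {
        Map<String, Integer> counts = new TreeMap<>();   // sorted keys keep ties deterministic
        for (String message : messages) {
            Matcher m = RECORD.matcher(message);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Tab-separated "record\tcount" lines, most frequent first. */
    static List<String> report(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()); // stable sort
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(   // made-up sample messages
            "Initialisation of record 0x31(FontRecord) left 4 bytes remaining still to be read.",
            "Initialisation of record 0x85(BoundSheetRecord) left 4 bytes remaining still to be read.",
            "Initialisation of record 0x31(FontRecord) left 4 bytes remaining still to be read.");
        report(tally(messages)).forEach(System.out::println);
    }
}
```

Running main prints the 0x31(FontRecord) line with a count of 2 first, then 0x85(BoundSheetRecord) with 1.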
I'm experiencing the same issue while using Tika. The failure is very annoying and prevents us from parsing many Excel files. Thanks.