Created attachment 37957: An example file for this case

Hi everybody,

We use the POI library to consolidate Excel files from different sources. Among other file types, we also receive raw XML files with the extension .xls as a special case. We used to catch these files with a try-catch on NotOLE2FileException and, in the catch block, passed them on to Tika for further processing. Up to version 4.1.2, this worked very well.

Unfortunately, as of POI 5.0.0, WorkbookFactory no longer throws this exception in this error case. Instead, it returns null rather than a workbook.

I noticed a change in the FileMagic class, which seems to be used for this. The constants OOXML_FILE_HEADER and RAW_XML_FILE_HEADER from POIFSConstants no longer exist; their values are now passed directly to the enum constants in FileMagic. However, the type of the value is no longer an array of bytes. This may be causing the error, but I'm not sure. In any case, the explicit byte[] type declaration of the enum constants BIFF2 and BIFF3 has also changed.

Thanks in advance for any help!

Code snippet:

    Workbook myWorkBook;
    File xls = new File("Example.XLS");
    try {
        myWorkBook = WorkbookFactory.create(xls);
    } catch (NotOLE2FileException ex) {
        // Raw XML and other non-OLE2 content is handed over to Tika
        if (ex.getMessage().contains("The supplied data appears to be a raw XML file")) {
            return MyTika.parseHTMLandXMLTable(xls);
        } else if (ex.getMessage().contains("Invalid header signature")) {
            return MyTika.parseHTMLandXMLTable(xls);
        } else {
            throw ex;
        }
    }
An observation: WorkbookFactory.create(InputStream) does fail as expected for the attached file, with an IOException("Can't open workbook - unsupported file type: XML").
Added a fix for POI 5.1.0 - r1894097
PS: with the new change, an IOException is thrown rather than a NotOLE2FileException, to keep the behaviour consistent with the other WorkbookFactory create methods.
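Since the exception type and message differ between create methods and POI versions, a caller who only needs to route raw XML files elsewhere could check the file header before calling WorkbookFactory at all. The following is a minimal sketch using only the JDK, not POI's own detection logic; the class HeaderSniffer, the enum Kind and the method classify are hypothetical names introduced for illustration. It assumes the XML declaration starts at the very first byte, so a file with a leading byte-order mark or whitespace would be reported as OTHER.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of POI: classifies a file by its leading bytes.
class HeaderSniffer {
    enum Kind { OLE2, RAW_XML, OTHER }

    // Signature of the OLE2 compound file format used by binary .xls files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Start of an XML declaration in an ASCII-compatible encoding.
    private static final byte[] XML_MAGIC = "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Classifies the given leading bytes of a file.
    static Kind classify(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, XML_MAGIC)) {
            return Kind.RAW_XML;
        }
        return Kind.OTHER;
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    static Kind classify(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return classify(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```

With such a check in place, files classified as RAW_XML could be passed straight to the Tika fallback, and only the remaining files handed to WorkbookFactory.create, so the fallback no longer depends on matching exception messages.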
The POIDocument factories should behave the same way. Please decide whether this is handled in this entry or in a new one.
This issue was raised about one factory - WorkbookFactory - and its create methods that are not consistent in POI 5.0.0. Would it be possible to track changes for other factories on a separate issue and specify the factories?