Bug 65452

Summary: NotOLE2FileException not thrown in POI 5.0.0 by opening an XML-RAW File with WorkbookFactory.create()
Product: POI
Reporter: johannes.summerer
Component: POIFS
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: regression    
Priority: P2    
Version: 5.0.0-FINAL   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: An Example File for this case

Description johannes.summerer 2021-07-15 14:06:34 UTC
Created attachment 37957
An Example File for this case

Hi everybody,

We use the POI library to consolidate Excel files from different sources. Among other file types, we also receive raw XML files with the extension .xls as a special case.

We caught these files with a try-catch on NotOLE2FileException. In the catch block we passed the file on to Tika for further processing. Up to version 4.1.2, this worked very well.

Unfortunately, as of POI 5.0.0, WorkbookFactory no longer throws this exception in that error case. Instead, it returns null rather than a workbook.

I noticed a change in the FileMagic class, which seems to be used for this.

The constants OOXML_FILE_HEADER and RAW_XML_FILE_HEADER from POIFSConstants no longer exist. Instead, the values are passed directly to the enum constants in the FileMagic class, and the type of the value is no longer an array of bytes.

This may be what is causing the error, but I'm not sure.

In any case, the explicit byte[] declaration for the enum constants BIFF2 and BIFF3 has also changed.

Thanks in advance for any help!

Code snippet:

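import java.io.File;
import org.apache.poi.poifs.filesystem.NotOLE2FileException;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

// Fragment from inside a method that returns the parsed result.
Workbook myWorkBook;
File xls = new File("Example.XLS");

try {
    myWorkBook = WorkbookFactory.create(xls);
} catch (NotOLE2FileException ex) {
    // Raw XML and invalid-header files are handed to Tika instead.
    if (ex.getMessage().contains("The supplied data appears to be a raw XML file")) {
        return MyTika.parseHTMLandXMLTable(xls);
    } else if (ex.getMessage().contains("Invalid header signature")) {
        return MyTika.parseHTMLandXMLTable(xls);
    } else {
        throw ex;
    }
}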
Comment 1 PJ Fanning 2021-10-10 09:47:43 UTC
An observation: WorkbookFactory.create(InputStream) does fail as expected for the attached file, with an IOException("Can't open workbook - unsupported file type: XML").
Comment 2 PJ Fanning 2021-10-10 09:58:50 UTC
Added a fix for POI 5.1.0 - r1894097
Comment 3 PJ Fanning 2021-10-10 10:02:05 UTC
PS: with the new change, an IOException is thrown rather than a NotOLE2FileException, to keep the behaviour consistent with the other WorkbookFactory create methods.
Comment 4 Andreas Beeker 2021-10-10 11:46:14 UTC
The POIDocument factories should behave the same way. Please decide whether this is handled in this entry or in a new one.
Comment 5 PJ Fanning 2021-10-10 12:44:03 UTC
This issue was raised about one factory - WorkbookFactory - and its create methods that are not consistent in POI 5.0.0. 

Would it be possible to track changes for other factories on a separate issue and specify the factories?
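
Note for callers affected by this change: because the exception type differs between POI versions (NotOLE2FileException up to 4.1.2, IOException from 5.1.0), one possible workaround is to inspect the file header before calling WorkbookFactory.create(), so the fallback to Tika does not depend on exception types or messages. The following is only a minimal sketch using the plain JDK, not POI's own detection logic. The names HeaderSniffer and Kind are hypothetical, and treating a leading "<?xml" as "raw XML" is an assumption that ignores byte-order marks and leading whitespace.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of POI: classifies a file by its first bytes.
class HeaderSniffer {
    public enum Kind { OLE2, RAW_XML, UNKNOWN }

    // OLE2 compound document signature: D0 CF 11 E0 A1 B1 1A E1.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Assumption: a raw XML file starts directly with an XML declaration.
    private static final byte[] XML_MAGIC =
        "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Classify an already-read header (only the first 8 bytes matter).
    public static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, XML_MAGIC)) return Kind.RAW_XML;
        return Kind.UNKNOWN;
    }

    // Read up to 8 bytes from the file and classify them (requires Java 11+).
    public static Kind sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

With such a helper, a caller could route files for which HeaderSniffer.sniff(xls.toPath()) returns RAW_XML straight to the Tika path and only hand the remaining files to WorkbookFactory.create(xls).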