Bug 62951

Summary: FileMagic not correctly identified
Product: POI
Reporter: Andreas Beeker <kiwiwings>
Component: POI Overall
Assignee: POI Developers List <dev>
Severity: normal    
Priority: P2    
Version: 4.0.x-dev   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Andreas Beeker 2018-11-25 20:29:35 UTC
Looking at the Common Crawl regression results, we see lots of documents being reported as an "UNKNOWN" file:

java.lang.IllegalArgumentException: The document is really a UNKNOWN file

However, they are probably HTML files.

The following patch covers at least the failure to identify the already known magics.
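
For illustration only, here is a minimal sketch of header-based ("magic") detection with an HTML fallback. It is neither POI's FileMagic code nor the content of the patch; the class MagicSniffer, the enum Kind and the method sniff are hypothetical names, and the HTML check (leading whitespace skipped, case-insensitive "<!doctype" or "<html") is an assumption about what such documents look like.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of POI: classifies a file by its first bytes.
final class MagicSniffer {
    enum Kind { OLE2, ZIP, PDF, HTML, UNKNOWN }

    // OLE2 compound file signature
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature (OOXML packages are ZIP files)
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };
    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with the given magic bytes.
    static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies the leading bytes of a file; head may be shorter than any magic.
    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        if (startsWith(head, PDF_MAGIC)) {
            return Kind.PDF;
        }
        // Assumed HTML heuristic: ignore leading whitespace and letter case.
        // A byte order mark in front of the markup is not handled here.
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .trim()
                .toLowerCase(Locale.ROOT);
        if (text.startsWith("<!doctype") || text.startsWith("<html")) {
            return Kind.HTML;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] head = "  <!DOCTYPE html><html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(head));
    }
}
```

With a check like this, a crawled HTML page would be reported as HTML instead of falling through to UNKNOWN.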
Comment 1 Andreas Beeker 2018-11-25 20:50:42 UTC
Patched via r1847429