Bug 62951 - FileMagic not correctly identified
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 4.0.x-dev
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2018-11-25 20:29 UTC by Andreas Beeker
Modified: 2018-11-25 20:50 UTC
Description Andreas Beeker 2018-11-25 20:29:35 UTC
The Common Crawl regression results show many documents being rejected as "UNKNOWN" files:

java.lang.IllegalArgumentException: The document is really a UNKNOWN file

These documents are probably HTML files, though.

The following patch fixes at least the failure to identify the magics that are already known.
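
For readers unfamiliar with magic-based detection, here is a minimal sketch of the general technique: compare the first bytes of a document against a table of known signatures and fall back to UNKNOWN when nothing matches. It is not POI's FileMagic code and not the patch itself. The class MagicSniffer, the enum Kind, and the choice of HTML prefixes are all hypothetical names and assumptions made for illustration; only the OLE2, ZIP, and PDF signatures are the well-known published values.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is NOT org.apache.poi.poifs.filesystem.FileMagic.
class MagicSniffer {

    /** Hypothetical file kinds, each with the byte prefixes that identify it. */
    enum Kind {
        // Well-known OLE2 compound document signature.
        OLE2(new byte[] {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                         (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1}),
        // ZIP local file header, used by OOXML packages.
        ZIP(new byte[] {0x50, 0x4B, 0x03, 0x04}),
        PDF("%PDF".getBytes(StandardCharsets.US_ASCII)),
        // Assumption: only these two exact, case-sensitive prefixes count as HTML here.
        HTML("<!DOCTYPE".getBytes(StandardCharsets.US_ASCII),
             "<html".getBytes(StandardCharsets.US_ASCII)),
        // No signatures: this is only ever returned as the fallback.
        UNKNOWN();

        final byte[][] signatures;

        Kind(byte[]... signatures) {
            this.signatures = signatures;
        }
    }

    /** Returns the first kind whose signature is a prefix of the header, else UNKNOWN. */
    static Kind detect(byte[] header) {
        for (Kind kind : Kind.values()) {
            for (byte[] signature : kind.signatures) {
                if (startsWith(header, signature)) {
                    return kind;
                }
            }
        }
        return Kind.UNKNOWN;
    }

    /** True if data begins with prefix; a header shorter than the prefix never matches. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>".getBytes(StandardCharsets.US_ASCII);
        byte[] other = {0x00, 0x01, 0x02, 0x03};
        System.out.println(detect(html));   // prints "HTML"
        System.out.println(detect(other));  // prints "UNKNOWN"
    }
}
```

In a scheme like this, a document ends up as UNKNOWN either because its format has no entry in the table or because the comparison fails for a signature that is in the table. The patch is aimed at the second case.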
Comment 1 Andreas Beeker 2018-11-25 20:50:42 UTC
Patched via r1847429