62951 – FileMagic not correctly identified

Status:	RESOLVED FIXED

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	POI Overall
Version:	4.0.x-dev
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-11-25 20:29 UTC by Andreas Beeker
Modified:	2018-11-25 20:50 UTC
CC List:	0 users

Description Andreas Beeker 2018-11-25 20:29:35 UTC

The Common Crawl regression results show many documents being reported as an "UNKNOWN" file:

java.lang.IllegalArgumentException: The document is really a UNKNOWN file

These documents are probably HTML files.

The following patch at least fixes the failure to identify magics that are already known.
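
For illustration, magic detection of this kind can be sketched as below. This is a minimal, self-contained sketch, not the patch itself and not POI's FileMagic implementation. The class MagicSniffer, the enum Kind, and the HTML rule (skip leading whitespace, compare case-insensitively) are assumptions made for the example. Only the OLE2, ZIP and PDF signatures are the well-known byte sequences.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class MagicSniffer {
    /** Hypothetical file kinds for this sketch; not POI's FileMagic enum. */
    public enum Kind { OLE2, ZIP, PDF, HTML, UNKNOWN }

    // Well-known leading signatures.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };
    private static final byte[] PDF_MAGIC = { '%', 'P', 'D', 'F' };

    /** Returns true if data begins with every byte of magic. */
    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Classifies the first bytes of a document; falls back to UNKNOWN. */
    public static Kind identify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        if (startsWith(header, PDF_MAGIC)) {
            return Kind.PDF;
        }
        // Assumption for this sketch: HTML may be preceded by whitespace
        // or line breaks and may use any letter case.
        String text = new String(header, StandardCharsets.ISO_8859_1)
                .trim()
                .toLowerCase(Locale.ROOT);
        if (text.startsWith("<!doctype html") || text.startsWith("<html")) {
            return Kind.HTML;
        }
        return Kind.UNKNOWN;
    }
}
```

With a check like this, a header such as "\r\n<HTML>" would be classified as HTML instead of falling through to UNKNOWN, so the caller could report a more specific error.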

Comment 1 Andreas Beeker 2018-11-25 20:50:42 UTC

Patched via r1847429