Bug 61337 - Downgrade AssertionError to RecordFormatException?
Summary: Downgrade AssertionError to RecordFormatException?
Status: NEW
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall (show other bugs)
Version: 3.17-dev
Hardware: PC All
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
Depends on:
Reported: 2017-07-25 11:12 UTC by Tim Allison
Modified: 2017-07-26 18:46 UTC (History)
0 users

triggering file (57.00 KB, application/msword)
2017-07-25 11:12 UTC, Tim Allison

Note You need to log in before you can comment on or make changes to this bug.
Description Tim Allison 2017-07-25 11:12:54 UTC
Created attachment 35171 [details]
triggering file

A fuzzed version of testException2.doc in Tika's test corpus triggers:

Caused by: java.lang.AssertionError
	at org.apache.poi.hwpf.usermodel.Range.sanityCheck(Range.java:1158)
	at org.apache.poi.hwpf.usermodel.Range.<init>(Range.java:195)
	at org.apache.poi.hwpf.usermodel.HeaderStories.getSubrangeAt(HeaderStories.java:357)
	at org.apache.poi.hwpf.usermodel.HeaderStories.getOddHeaderSubrange(HeaderStories.java:196)
	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:180)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)

We should definitely have a range check, but I think it would be better to throw a RecordFormatException or RuntimeException?

More generally, there is room to make our "broken record" exception handling more consistent.  My preference would be to throw RecordFormatException for this kind of thing.

Other ideas?
Comment 1 Tim Allison 2017-07-25 11:20:19 UTC
I see 73 instances of "assert(", and, yes, several of these are my fault.

In most cases, I'd think we could convert these to a RecordFormatException.  If there were a use case for turning assertions off and hoping for the best, I'd want to leave the asserts in.  However, it looks (on quick review) like turning the assertions off will yield corrupt objects/data.  So, I don't see a use case for assert instead of a RecordFormatException.

I'm happy to make the changes, but given that this will be not a small patch, I'd like to get feedback before I fix this globally.
Comment 2 Tim Allison 2017-07-25 11:32:30 UTC
Of course, there are many, many more without the paren: "assert "
Comment 3 Andreas Beeker 2017-07-25 11:42:44 UTC
+1 for replacing java-assert

I think we need at least two different cases:
asserts based on record format (throws RecordFormatException)
and other asserts (throws ? extends RuntimeException)
Comment 4 Tim Allison 2017-07-25 19:37:05 UTC
I'll probably start on this one class/family at a time unless there are objections.

It looks like there are basically 4 places where we rely on assert.

1) assert that things are or aren't null, as in DrawTextParagraph:

        String buFontStr = bulletStyle.getBulletFont();
        if (buFontStr == null) {
            buFontStr = paragraph.getDefaultFontFamily();
        assert(buFontStr != null);
        FontInfo buFont = new DrawFontInfo(buFontStr);

2) assert instanceof, as in Range:

        assert ( _doc instanceof HWPFDocument );

3) assert x == y to confirm that a record is not wonky, as in HwmfBitmapDib:

        assert(introSize == headerSize);

 or in LittleEndianByteArrayInputStream:
        assert skipped == size : "Buffer overrun";

4) checks on limitations of implementation as in:
      assert false : "hashCode not designed";

There may be other uses as well...

Proposed solutions:
1) RecordFormatException?  Or something else?
2) What exception should we use here?
3) is straightforward, I think: RecordFormatException
4) is straightforward, I think: implement/auto-generate a hashcode
Comment 5 Tim Allison 2017-07-26 18:46:27 UTC
in r1803092, I made some modifications to hwpf.Range.  I left in the asserts in the binary search code.

I added DocumentFormatException as a RuntimeException to handle larger scale problems with parsing the document than RecordFormatException.  I think I'd want to use this for 1) and 2), RecordFormatException for 3), and just implement 4).

I'll wait a bit before making any other changes to give folks a chance to review and offer feedback.