org.apache.poi.hwpf.usermodel.Range.getTable(Paragraph) checks whether a given Paragraph is the first one in the table by comparing it with the previous paragraph. The previous paragraph is created in line 926 by calling

<code>
Paragraph previous = Paragraph.newParagraph( this, _paragraphs.get( r._parStart - 1 ) );
</code>

Unfortunately, the end index into the section list (_sectionEnd) of the previous Paragraph's Range is not initialized correctly. As a result, the comparison in line 931 evaluates to false whenever r._sectionStart is 1 or larger, because previous._sectionEnd is still 0:

<code>
if ( previous.isInTable() && //
        previous.getTableLevel() == tableLevel //
        && previous._sectionEnd >= r._sectionStart )
</code>

Resolution: initialize the previous Paragraph by calling initAll():

<code>
if ( r._parStart != 0 )
{
    Paragraph previous = Paragraph.newParagraph( this,
            _paragraphs.get( r._parStart - 1 ) );
    previous.initAll(); // initialize sections for proper comparison
    if ( previous.isInTable() && //
            previous.getTableLevel() == tableLevel //
            && previous._sectionEnd >= r._sectionStart )
    {
        throw new IllegalArgumentException(
                "This paragraph is not the first one in the table" );
    }
}
</code>
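To illustrate why the missing initialization only bites outside the first section, here is a small self-contained model of the check. It is not POI code: SectionCheckDemo, ParagraphRange and their fields are hypothetical stand-ins for Range's _sectionStart/_sectionEnd, and the assumption that initAll() sets the section end to an exclusive bound (section index + 1) is an illustration, not taken from POI. The demonstration holds for an inclusive end index as well.

```java
// Hypothetical, simplified model of the check in Range.getTable(Paragraph).
// Nothing here is Apache POI code: the names only mimic Range's
// _sectionStart/_sectionEnd fields and its initAll() method.
public class SectionCheckDemo {

    // Stand-in for the Range of a single paragraph.
    public static final class ParagraphRange {
        final int sectionIndex; // section the paragraph actually belongs to
        final boolean inTable;
        final int tableLevel;
        int sectionStart;       // stays 0 until initAll() is called
        int sectionEnd;         // stays 0 until initAll() is called

        public ParagraphRange(int sectionIndex, boolean inTable, int tableLevel) {
            this.sectionIndex = sectionIndex;
            this.inTable = inTable;
            this.tableLevel = tableLevel;
        }

        // Plays the role of initAll(); assumes an exclusive end index.
        public void initAll() {
            sectionStart = sectionIndex;
            sectionEnd = sectionIndex + 1;
        }
    }

    // Mirrors the condition in getTable(Paragraph): true means the previous
    // paragraph belongs to the same table, so getTable(...) should throw.
    public static boolean previousIsSameTable(ParagraphRange previous,
                                              ParagraphRange r, int tableLevel) {
        return previous.inTable
                && previous.tableLevel == tableLevel
                && previous.sectionEnd >= r.sectionStart;
    }

    public static void main(String[] args) {
        for (int section : new int[] {0, 2}) {
            ParagraphRange current = new ParagraphRange(section, true, 1);
            current.initAll();
            ParagraphRange previous = new ParagraphRange(section, true, 1);

            boolean before = previousIsSameTable(previous, current, 1);
            previous.initAll(); // the proposed fix
            boolean after = previousIsSameTable(previous, current, 1);

            System.out.println("section " + section
                    + ": without initAll() = " + before
                    + ", with initAll() = " + after);
        }
    }
}
```

In section 0 the uninitialized check happens to give the right answer (0 >= 0), which would explain why the bug only shows up for tables in later sections; in section 2 it yields false until initAll() is called.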
Are you able to put together a small unit test that shows off the problem, and shows that your proposed fix solves it?
Created attachment 30244: Testcase
Created attachment 30245: Example document for the testcase
Please find attached a test case for the bug.
I finally tried to apply this, but unfortunately the change breaks some existing tests: TestWordExtractorBugs.testProblemMetadata. I could not quickly see why this happens now or how to resolve it...