Bug 60556 - IllegalArgumentException: The end () must not be before the start ()
Status: RESOLVED INVALID
Alias: None
Product: POI
Classification: Unclassified
Component: HWPF
Version: 3.15-FINAL
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-05 16:32 UTC by ismaelgomezs
Modified: 2017-01-31 16:27 UTC

Attachments
File which the code fails (850.00 KB, application/msword)
2017-01-05 16:32 UTC, ismaelgomezs
Description ismaelgomezs 2017-01-05 16:32:45 UTC
Created attachment 34596
File which the code fails

I'm extracting text with the WordExtractor class (Apache POI), but I get an error for some .doc files. Here is the code:

"
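import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;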
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.OfficeXmlFileException;

public class [class name] 
{ 
	public static void main(String... args) throws FileNotFoundException, IOException, NullPointerException, OfficeXmlFileException {
	   
		File[] files = new File("[input path]").listFiles();    
		showFiles(files);
	}
	
	public static void showFiles(File[] files) throws FileNotFoundException, IOException, NullPointerException, OfficeXmlFileException {
		
		File log = new File("[output name]/out.tsv");
		
	    for (File file : files) {
	       	if (file.isDirectory()) {
	           	//System.out.println("Directory/" + file.getName());
	           	showFiles(file.listFiles()); // Calls same method again.
	        } else {
	        	    String N = file.getName();  
	        	    
	        	    // .docx case
	        		if (N.toLowerCase().endsWith(".docx") && !N.toLowerCase().startsWith("~"))
	        		{	
	        			System.out.println(file.getAbsolutePath());
	        			XWPFDocument docx = new XWPFDocument(new FileInputStream(file));
	        			XWPFWordExtractor we = new XWPFWordExtractor(docx);
        				String T = we.getText().replaceAll("\\n", " ").replaceAll("\\r", " ");
        				
            			// Write to the output file
            			try{
//            				if(!log.exists()){
//            					System.out.println("We had to make a new file.");
//            					log.createNewFile();
//            				}

            				FileWriter fileWriter = new FileWriter(log, true);
            				BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
            				bufferedWriter.write(file.getAbsolutePath()+"\t"+T+"\n");
            				bufferedWriter.close();

            			} catch (IOException e) {
            	            System.err.println("Problem writing .DOCX to the file out.tsv " + e.getMessage());
            	        }
	        		} 
	        		else {
	        		
	        			if (N.toLowerCase().endsWith(".doc") && !N.toLowerCase().startsWith("~"))
	        			{
	        				System.out.println(file.getAbsolutePath());
	        			
	        				HWPFDocument doc = new HWPFDocument(new FileInputStream(file));
	        				WordExtractor we = new WordExtractor(doc);
		        			//WordExtractor we = new WordExtractor(new FileInputStream(file));
	        				String T = we.getText().replaceAll("\\n", " ").replaceAll("\\r", " ");
        				
	        				// Write to the output file
	        				try{
//	        					if(!log.exists()){
//	        						log.createNewFile();
//	        					}
	        					
	        					FileWriter fileWriter = new FileWriter(log, true);
	        					BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
	        					bufferedWriter.write(file.getAbsolutePath()+"\t"+T+"\n");
	        					bufferedWriter.close();

	        				} catch (IOException e) {
	        					System.err.println("Problem writing .DOC to the file out.tsv " + e.getMessage());
	        					}
	        			}
	        		}	
	       		}
	    	}
	}
}
"

For most .docx and .doc files it works fine.

The error message is:

Exception in thread "main" java.lang.RuntimeException: 
java.lang.IllegalArgumentException: The end (4958) must not be before the start (4990)

How can I fix it?
Comment 1 ps26oct 2017-01-20 14:39:00 UTC
The problem seems to be with the document and not the code, specifically with the hidden bookmark named "_Toc263095067". Apparently POI infers the end of that bookmark to be located before its start. Deleting that hidden bookmark makes everything work fine.

Let me know of any other corrective measures, or of any actual problem in the POI code that this highlights.
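
One possible corrective measure on the calling side, offered only as a sketch: isolate each file so that a single damaged document is logged and skipped instead of aborting the whole run. The class SafeBatchExtract, its Result type and the extractAll method are hypothetical names introduced for illustration and are not part of POI. The extractor is passed in as a plain function so the example runs without POI on the classpath; in the reporter's program it would wrap the HWPFDocument/WordExtractor calls and convert checked exceptions to unchecked ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper (not a POI class): runs an extractor over many inputs
// and keeps going when one of them throws an unchecked exception.
public class SafeBatchExtract {

    /** Outcome of one batch run: extracted texts plus the inputs that were skipped. */
    public static final class Result {
        public final List<String> texts = new ArrayList<>();
        public final List<String> skipped = new ArrayList<>();
    }

    /**
     * Applies the extractor to every path. A RuntimeException (which includes
     * IllegalArgumentException) marks that path as skipped instead of
     * stopping the loop.
     */
    public static Result extractAll(List<String> paths, Function<String, String> extractor) {
        Result result = new Result();
        for (String path : paths) {
            try {
                result.texts.add(extractor.apply(path));
            } catch (RuntimeException e) {
                System.err.println("Skipping " + path + ": " + e.getMessage());
                result.skipped.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in extractor (assumption): simulates the failure from this report
        // for any path containing "broken".
        Function<String, String> fakeExtractor = path -> {
            if (path.contains("broken")) {
                throw new IllegalArgumentException(
                        "The end (4958) must not be before the start (4990)");
            }
            return "text of " + path;
        };

        Result r = extractAll(Arrays.asList("a.doc", "broken.doc", "b.doc"), fakeExtractor);
        System.out.println(r.texts.size() + " extracted, " + r.skipped.size() + " skipped");
    }
}
```

The skipped list gives a record of which documents need manual repair, for example by deleting the offending hidden bookmark as described above.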
Comment 2 ismaelgomezs 2017-01-31 15:29:54 UTC
Thank you very much. That was the problem. Now it works fine!
Regards.
Comment 3 Dominik Stadler 2017-01-31 16:27:12 UTC
Seems to have been caused by a slightly broken document.