Bug 61104 - new XWPFDocument(fis) is blocked
Summary: new XWPFDocument(fis) is blocked
Status: RESOLVED INVALID
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF
Version: 3.16-FINAL
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-18 13:26 UTC by zxh
Modified: 2017-06-16 18:40 UTC
0 users



Attachments
code file (2.31 KB, text/plain)
2017-05-18 13:26 UTC, zxh
docx file (12.91 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-05-20 05:36 UTC, zxh

Description zxh 2017-05-18 13:26:19 UTC
Created attachment 34997
code file

I have added log output, as follows:
writer = new BufferedWriter(new FileWriter(textFile));
InputStream is = new FileInputStream(file);

LOGGER.info("bytes:{}",is.available());
LOGGER.info("SIGN1");
document = new XWPFDocument(is);
LOGGER.info("SIGN2");
if(null == document){
    LOGGER.info("document is null");
}

extractor = new XWPFWordExtractor(document);
writer.write(extractor.getText());
writer.flush();
LOGGER.info("Extract text from {}, write text to {}", file.getName(), textFile);

The output is as follows:
[INFO ][2017-05-18 10:19:41][io.transwarp.extractor.ExtractorWorker.run(ExtractorWorker.java:27)]pool-1-thread-1 start extracting doc:E:\IDEA\DocumentDemo\document_dir\test.docx
[INFO ][2017-05-18 10:19:41][io.transwarp.docutils.DocxExtractor.extract(DocxExtractor.java:41)]bytes:13331
[INFO ][2017-05-18 10:19:41][io.transwarp.docutils.DocxExtractor.extract(DocxExtractor.java:42)]SIGN1

The code after "document = new XWPFDocument(is);" is never executed: "SIGN2" is not logged, the application stays in the RUNNING state, and no exception or error is reported.
I am also puzzled!!!
Comment 1 Tim Allison 2017-05-19 15:43:55 UTC
Are you able to share the triggering .docx file?
Comment 2 zxh 2017-05-20 05:36:41 UTC
Created attachment 34998
docx file
Comment 3 zxh 2017-05-20 05:37:40 UTC
(In reply to Tim Allison from comment #1)
> Are you able to share the triggering .docx file?

I have shared the triggering docx file
Comment 4 Javen O'Neal 2017-05-31 05:39:23 UTC
I am unable to reproduce your issue with the provided file using the latest POI trunk code, tested locally on my computer. I removed code that appeared to be irrelevant to the demonstrated problem.

    @Test
    public void test61104() throws IOException {
        File file = new File("test-data/document/61104.docx");
        InputStream is = new FileInputStream(file);
        System.out.println(is.available());
        XWPFDocument document = new XWPFDocument(is);
        document.close();
    }

My best guess is the issue you're having is due to ExtractorWorker or DocxExtractor. I am not familiar with the io.transwarp library, so I can't suggest anything more specific. Make sure you aren't writing to the file while you're reading from it (possibly by another thread, given how Worker classes tend to run in a multi-threaded environment). In general, POI is not thread safe.

Usually any task that is run inside some kind of Worker is executed on its own thread, and has its own exception handler stack that will not inform the caller. Another possibility is that `document = new XWPFDocument(is)` is throwing an exception that your thread never catches, causing the thread pool executor to suspend. Add some try/catch print code to make sure this isn't the case.
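
A minimal sketch of that try/catch check, using only the JDK so it runs without POI on the classpath. The class `WorkerFailureDemo` and the always-failing `extract()` method are hypothetical stand-ins for the reporter's worker and for the `new XWPFDocument(is)` call. It assumes the task is handed to an `ExecutorService` via `submit`, in which case a failure is stored in the returned `Future` and stays invisible until `Future.get()` is called.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WorkerFailureDemo {

    // Hypothetical stand-in for "new XWPFDocument(is)"; it always fails.
    static String extract() {
        throw new IllegalStateException("simulated failure while opening the document");
    }

    // Runs extract() on a pool thread; returns the failure message, or null on success.
    static String runAndReport() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> result = pool.submit(new Callable<String>() {
            @Override
            public String call() {
                try {
                    return extract();
                } catch (RuntimeException | Error e) {
                    // Log inside the worker; Errors (not only Exceptions) are caught too.
                    System.err.println("extraction failed: " + e);
                    throw e;
                }
            }
        });
        try {
            result.get(); // a failed task surfaces here, wrapped in ExecutionException
            return null;
        } catch (ExecutionException e) {
            return e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker reported: " + runAndReport());
    }
}
```

Catching `Error` as well as `RuntimeException` is deliberate: a missing class at runtime raises an `Error`, which a plain `catch (Exception e)` would not see. The anonymous `Callable` keeps the sketch compatible with JDK 1.7.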

If you're running your program in some web container where your `file` object lives in some restricted filesystem with restricted I/O, the problem may be with the I/O layer of the container.
You could use POI's `IOUtils.readFully(is)` to see if there's an I/O problem on your platform.
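
The same I/O check can be sketched with plain `java.io`, which separates "can the bytes be read at all" from "can POI parse them". The class name `ReadFullyCheck` and the method `readAll` are hypothetical; POI's own `IOUtils` has helpers for this as well (`toByteArray(InputStream)`, as far as I know), but the version below has no POI dependency.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullyCheck {

    // Drains the stream into memory. If this call hangs or throws,
    // the problem is in the I/O layer rather than in the document parser.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

If `readAll` returns the expected number of bytes (13331 in the log above), the bytes can then be wrapped in a `ByteArrayInputStream` and passed to `new XWPFDocument(...)`, so that parsing no longer touches the file system.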

If none of the above resolves your question, please respond with the following:
Are you using poi-3.16.jar, poi-ooxml-3.16.jar, and poi-ooxml-schemas-3.16.jar?
What happens when you remove the buffered file writer and replace the loggers with System.out.println?
What vendor and version of Java are you running?
What OS are you running this on?
Comment 5 Javen O'Neal 2017-05-31 05:41:32 UTC
From the mailing list:
> OS is Windows 10, JDK version is 1.7
Comment 6 Javen O'Neal 2017-05-31 06:26:52 UTC
Looking at your full code example from attachment 34997, your FileInputStream `is` is never closed. Leaving an open file handle would likely cause problems the next time you try to create a FileInputStream from the same resource. This is another possible cause of your problem.
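
One way to guarantee the stream is closed is try-with-resources, which is available on the reporter's JDK 1.7. The sketch below is JDK-only; `CloseStreamDemo` and `firstByte` are hypothetical names, and the `is.read()` call merely marks the place where `new XWPFDocument(is)` would go in the real code.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;

class CloseStreamDemo {

    // Returns the first byte of the file, or -1 if the file is empty.
    // The stream is closed when the try block exits, even on an exception.
    static int firstByte(Path file) throws IOException {
        try (InputStream is = new FileInputStream(file.toFile())) {
            // In the reporter's code, new XWPFDocument(is) would be called here.
            return is.read();
        }
    }
}
```

Because the handle is released on every exit path, a later attempt to open, rewrite, or delete the same file does not run into a handle left over from an earlier call.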
Comment 7 Dominik Stadler 2017-06-16 18:40:50 UTC
I think we have provided some suggestions as to the nature of the problem, so I am closing this for now. Please reopen if you still think there is a bug in Apache POI and you have sample code that reproduces the problem outside of your application.