Created attachment 27238 [details] Examples of 66048 bytes XLS which generates and infinite loop In IOUtils.readFully() with certain specific file types / size (tested with Microsoft Excel) I had an infinite loop when basically channel.read() infinitely returns 0 (instead of -1 or the number of bytes read). As per JDK 1.6 NIO documentation (http://download.oracle.com/javase/6/docs/api/java/nio/channels/ReadableByteChannel.html), it's "possible" that ReadableByteChannel.read returns 0, quoting: int read() Returns: The number of bytes read, possibly zero, or -1 if the channel has reached end-of-stream Now basically I got an infinite loop on all files which are funnily enough of the same file size (66048 bytes = 65536 content + 512 header), so I'm wondering whether this is related to that or to the specific file. I'm attaching the XLS which generates the infinite loop (broken.xls) and a working one (working.xls). I also created a test case mimicking the way I was reading the file (using Tika OfficeParser) which shows the issue (and the working case). In order to run the JUnit you'll need the two test files on the same folder you run it from. I fine wrote a tentative patch (1 character), which passes the test case and temporarily solved the problem in the production instance I was working on, but I'm not fully sure it's the correct way to do it. I had this on Win 2008 Server (virtualized with VMWare) and my local MacOSX 10.6
Created attachment 27239 [details] Example of XLS which does NOT generate infinite loop Other file needed to run the test case I'm gonna attach
Created attachment 27240 [details] Test case which generates the infinite loop with "broken.xls" This JUnit test case breaks with broken.xls and will work with working.xls using poi-3.8-beta3. The JUnit passes when applying the patch I'm gonna attach.
Created attachment 27241 [details] 1 character patch which makes the test case pass with "broken.xls" The patch basically handles the case that channel.read() returns 0 indefinitely. It only adds a = to the < condition in the if. Test case passes, but not sure why the issue generates (seems related to the file size being multiple of 512 but I tried with a simple 512 header + 512 content xls file and it actually worked without the patch, so a bit puzzled :)
Hmm, pesky. I had hit something like this before, but I thought I'd fixed it already (but checking when we reached the end of the block). It does seem to be something around reading to the end of the last block when it's an exact multiple. What's the proposed fix?
Created attachment 27242 [details] Example of 66048 bytes XLS which generates and infinite loop (clean)
The content of attachment 27238 [details] has been deleted by Tony Stevenson <tony@pc-tony.com> who provided the following reason: Asked for by the user - 13:55:25 < mindthegab> pctony: can you try this link https://issues.apache.org/bugzilla/attachment.cgi?id=27238&action=delete ? The token used to delete this attachment was generated at 2011-07-01 13:04:41 UTC.
Thanks Tony! For the test case purposes please check attachment https://issues.apache.org/bugzilla/attachment.cgi?id=27242 NOTE: You might need to remame it to broken.xls (or change the test case which expects this name), since now it was renamed by bugzilla to broken-1.xls
I can't reproduce the infinite loop using your supplied file - it works fine for me I'm not sure we want to stop reading as soon as we hit zero bytes back from a read call though, as your patch suggests. Think of the case of reading from a slop stream, eg from a network. It's quite possible for that to return zero when there's no data yet, but more still to come. The contract for readFully is that it should keep going until there is no more data available, which we'd break for that situation At the point your file hits an infinite loop, what are the values of: * total * b.capacity() * b.position() ?
There's a rumour that one of the machines at work might be showing up this JVM specific issue too, which'll hopefully let me get the details on what the state is when the loop is triggered if Gabriele isn't able to do so. (Until we know what's going on on these JVM's when they loop, we can't identify if it's our fault or the JVM's, and try to fix/workaround as appropriate)
Hey Nick, I do have the same problem an I was able to reproduce it with the attached file. The values you've requested are: total = 65536 b.capacity = 66048 b.position = 66048 Let me know if you need more data. Regards, Torben
(In reply to comment #10) > The values you've requested are: > total = 65536 > b.capacity = 66048 > b.position = 66048 That should be breaking out on if (total == b.capacity() || b.position() == b.capacity()) { That was fixed as part of issue #51100 on 2011-04-21, so I think the issue is people using an older version of POI without this fix Gabriele or Torben - can you confirm when the version of POI you're using dates from? (I fear it may be earlier in April than this)
Hey Nick, you are right. I'm using the version 20110401. I also had a peek at the svn but I only compared the files "manually" :-). Didn't see (without code highlighting) that there are two methods "readFully" and I only saw the line "if (total == len)" and thought that nothing had changed. Thanks for the hint. Regards, Torben
Looks like this is actually bug #51100, but everyone got confused and were sure the fix from that was present, when alas it wasn't... *** This bug has been marked as a duplicate of bug 51100 ***