I am getting a strange message when I read the paragraphs of a Word file. Here is the message:

    A property claimed to start before zero, at -512! Resetting it to zero, and hoping for the best

Here is my code. It happens on the creation of the new HWPFDocument.

    final POIFSFileSystem fileSystem = new POIFSFileSystem(new FileInputStream(file));
    final HWPFDocument document = new HWPFDocument(fileSystem);

The message is printed in the constructor of org.apache.poi.hwpf.model.PropertyNode:

    protected PropertyNode(int fcStart, int fcEnd, Object buf)
    {
        _cpStart = fcStart;
        _cpEnd = fcEnd;
        _buf = buf;

        if(_cpStart < 0) {
            System.err.println("A property claimed to start before zero, at " + _cpStart + "! Resetting it to zero, and hoping for the best");
            _cpStart = 0;
        }
    }

The -512 originates in a calculation done in org.apache.poi.hwpf.model.CHPFormattedDiskPage, where getStart(x) returns 1536 and fcMin is 2048:

    public CHPFormattedDiskPage(byte[] documentStream, int offset, int fcMin, TextPieceTable tpt)
    {
        super(documentStream, offset);

        for (int x = 0; x < _crun; x++)
        {
            boolean isUnicode = tpt.isUnicodeAtByteOffset( getStart(x) );
            _chpxList.add(new CHPX(getStart(x) - fcMin, getEnd(x) - fcMin, getGrpprl(x), isUnicode));
        }
    }

I am using these jars, the latest version available to download using Maven2:

    poi-3.5-beta4.jar
    poi-scratchpad-3.5-beta4.jar

It still seems to work, but I get a lot of these messages.

NOTE: This message appears in bug 46220, but that bug does not seem to be related.
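To make the arithmetic concrete, here is a minimal standalone sketch using the numbers reported above. It is not POI code: the class and method names are made up for illustration, and it only mirrors the subtraction in CHPFormattedDiskPage and the clamp in PropertyNode.

```java
public class NegativeStartDemo {

    /** Mirrors "getStart(x) - fcMin" from the CHPFormattedDiskPage constructor. */
    static int relativeStart(int absoluteStart, int fcMin) {
        return absoluteStart - fcMin;
    }

    /** Mirrors the "reset to zero" step from the PropertyNode constructor. */
    static int clampToZero(int cpStart) {
        return cpStart < 0 ? 0 : cpStart;
    }

    public static void main(String[] args) {
        // Values observed in the debugger: getStart(x) = 1536, fcMin = 2048.
        int relative = relativeStart(1536, 2048);
        System.out.println(relative);              // -512
        System.out.println(clampToZero(relative)); // 0
    }
}
```

So any run whose start offset lies before fcMin ends up negative and is then silently moved to zero.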
This warning appears because our understanding of how the file's structure should work differs from what was actually found in the file. If you're just reading stuff, you're likely to be ok. However, writing changes back out might break, as other things are likely to be wrong too. (It generally seems that ignoring the negative start and assuming it's zero lets most stuff work, but we're not entirely sure why.) Any help figuring out where one of our assumptions is wrong, so we can handle this cleanly, is greatly appreciated!
*** Bug 46511 has been marked as a duplicate of this bug. ***
Sorry for the duplicate. Is there a good reason to leave this in the final libraries? It can be quite annoying when it occurs frequently. We are using POI for text extraction and get this message quite a lot.
See Comment #1. The warning remains because the problem remains.
Nick, I understand your reasoning for the warning, but in a large-scale, deployed server-side application that is simply extracting text from hundreds of thousands of Word documents for indexing, this message creates a massive amount of noise and obscures other potentially important errors. You refer to comment #1, where you say "If you're just reading stuff, you're likely to be ok", so why does POI force output to System.err when it is not aware of the higher-level use case? Can't you at least make the output conditional on some system property, so that applications for which this is not important can disable it?
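What is being asked for could look roughly like the sketch below. Everything in it is an assumption for illustration: the property name `example.hwpf.quiet` and the `ConditionalWarning` class are hypothetical and do not exist in POI.

```java
import java.io.PrintStream;

public class ConditionalWarning {

    // Hypothetical property name, chosen only for this sketch.
    static final String QUIET_PROPERTY = "example.hwpf.quiet";

    /**
     * Prints the warning unless the JVM was started with
     * -Dexample.hwpf.quiet=true. Returns true if the message was printed.
     */
    static boolean warn(String message, PrintStream out) {
        // Boolean.getBoolean reads a system property and is true only
        // when the property is set to "true" (ignoring case).
        if (Boolean.getBoolean(QUIET_PROPERTY)) {
            return false;
        }
        out.println(message);
        return true;
    }

    public static void main(String[] args) {
        warn("A property claimed to start before zero, at -512!", System.err);
    }
}
```

With this shape the default behaviour stays as it is today, and only applications that explicitly opt in lose the message.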
The same applies to the error in SectionTable.java:

    System.err.println("Your document seemed to be mostly unicode, but the section definition was in bytes! Trying anyway, but things may well go wrong!");

You also say that the read-only case should be fine.
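In the meantime, an application that only extracts text could filter these two messages on its own side. The following is a rough sketch, not part of POI: the `FilteringErr` class is hypothetical, and it only catches messages written through `println(String)`, which is how both warnings quoted in this bug are emitted.

```java
import java.io.PrintStream;

public class FilteringErr extends PrintStream {

    private final String[] suppressed;

    /**
     * @param delegate   the stream that receives everything not filtered out
     * @param suppressed fragments; a line containing any of them is dropped
     */
    public FilteringErr(PrintStream delegate, String... suppressed) {
        super(delegate, true); // autoflush, so output order stays predictable
        this.suppressed = suppressed;
    }

    @Override
    public void println(String line) {
        if (line != null) {
            for (String fragment : suppressed) {
                if (line.contains(fragment)) {
                    return; // drop the known, noisy warning
                }
            }
        }
        super.println(line);
    }
}
```

It would be installed once at startup, for example with `System.setErr(new FilteringErr(System.err, "claimed to start before zero", "section definition was in bytes"));`. Note that this hides the symptom only; the documents are still being parsed with the corrected offsets described above.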
Part of the hope is that with an annoying enough error, someone'll be motivated to get around to fixing the underlying issue... There really is something screwy going on which needs to be fixed, and hiding the error is just papering over the cracks :(
Can you attach the problematic document? Without it we can't do much to figure out what's wrong. Yegor
There has been no update for a long time, and there is not much we can do without a sample document, so I am closing this as WONTFIX for now. Please reopen with a sample document if this is still an issue with a recent version of POI.