46443 – property claimed to start before zero, at -512!

Status:	RESOLVED WONTFIX

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HWPF
Version:	3.5-dev
Hardware:	PC Windows XP

Importance:	P2 normal with 1 vote
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:	ErrorMessage

Duplicates (1):	46511
Depends on:
Blocks:

Reported:	2008-12-29 20:30 UTC by Doug McComsey
Modified:	2016-06-13 19:21 UTC

Description Doug McComsey 2008-12-29 20:30:26 UTC

I am getting a strange message when I read the paragraphs of a Word file. Here is the message:
	
property claimed to start before zero, at -512! Resetting it to zero, and hoping for the best

Here is my code. It happens on the creation of the new HWPFDocument.
        final POIFSFileSystem fileSystem = new POIFSFileSystem(new FileInputStream(file));
        final HWPFDocument document = new HWPFDocument(fileSystem);

The message is printed by the constructor of org.apache.poi.hwpf.model.PropertyNode:

  protected PropertyNode(int fcStart, int fcEnd, Object buf)
  {
      _cpStart = fcStart;
      _cpEnd = fcEnd;
      _buf = buf;

      if(_cpStart < 0) {
          System.err.println("A property claimed to start before zero, at " + _cpStart + "! Resetting it to zero, and hoping for the best");
          _cpStart = 0;
      }
  }

The -512 originates in a calculation done in org.apache.poi.hwpf.model.CHPFormattedDiskPage, where getStart(x) returns 1536 and fcMin is 2048, so getStart(x) - fcMin is -512.

    public CHPFormattedDiskPage(byte[] documentStream, int offset, int fcMin, TextPieceTable tpt)
    {
      super(documentStream, offset);

      for (int x = 0; x < _crun; x++)
      {
        boolean isUnicode = tpt.isUnicodeAtByteOffset( getStart(x) );
        _chpxList.add(new CHPX(getStart(x) - fcMin, getEnd(x) - fcMin, getGrpprl(x), isUnicode));
      }
    }
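
The arithmetic behind the message can be reproduced in isolation. The following is only a sketch: the class OffsetDemo and its two helper methods are hypothetical names that are not part of POI, and they merely mirror the subtraction and the clamp quoted above, using the values reported for this file.

```java
final class OffsetDemo {
    // Hypothetical helper: mirrors "getStart(x) - fcMin" from the
    // CHPFormattedDiskPage constructor quoted above.
    static int relativeStart(int fcStart, int fcMin) {
        return fcStart - fcMin;
    }

    // Hypothetical helper: mirrors the "if (_cpStart < 0)" reset in the
    // PropertyNode constructor quoted above.
    static int clampToZero(int cpStart) {
        return cpStart < 0 ? 0 : cpStart;
    }

    public static void main(String[] args) {
        // Values reported in this bug: getStart(x) = 1536, fcMin = 2048.
        int relative = relativeStart(1536, 2048);
        System.out.println(relative);              // prints -512
        System.out.println(clampToZero(relative)); // prints 0
    }
}
```

Any run whose start offset lies below fcMin ends up negative this way, which would explain why the message can appear many times for a single document.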

I am using these jars, the latest version available to download using Maven2:
poi-3.5-beta4.jar
poi-scratchpad-3.5-beta4.jar

It still seems to work, but I get a lot of these messages.

NOTE: This message appears in bug 46220 but that bug does not seem to be related.

Comment 1 Nick Burch 2009-01-06 11:13:52 UTC

This warning appears because our understanding of how the file's structure should work out differs from what was actually found in the file.

If you're just reading stuff, you're likely to be ok. However, writing changes back out might break, as other things are likely to be wrong too. (It generally seems that ignoring the negative start and assuming it's zero lets most stuff work, but we're not entirely sure why)

Any help figuring out where one of our assumptions is wrong, so we can handle this cleanly, is greatly appreciated!

Comment 2 Nick Burch 2009-01-12 02:38:15 UTC

*** Bug 46511 has been marked as a duplicate of this bug. ***

Comment 3 Antony Bowesman 2009-01-12 02:43:53 UTC

Sorry for the duplicate.  Is there a good reason to leave this in the final libraries? It can be quite annoying when it occurs frequently.  We are using POI for text extraction and get it quite a lot.

Comment 4 Nick Burch 2009-01-12 02:50:30 UTC

See Comment #1. The warning remains because the problem remains.

Comment 5 Antony Bowesman 2009-04-22 16:30:11 UTC

Nick, I understand your reasoning for the warning, but for a large-scale, deployed server-side application that is simply extracting text from hundreds of thousands of Word documents for indexing, this message creates a massive amount of noise and obscures other potentially important errors.  You refer to comment #1, where you say:

"If you're just reading stuff, you're likely to be ok"

So why does POI force output to System.err when it is not aware of the higher-level use case?

Can't you at least make the output conditional on some system property, so that applications for which this is not important can disable it?
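
What is being asked for could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class WarningGate and the property name example.hwpf.suppressWarnings are hypothetical, and POI does not define either of them.

```java
import java.io.PrintStream;

final class WarningGate {
    // Hypothetical property name, chosen for this sketch only.
    static final String SUPPRESS_PROPERTY = "example.hwpf.suppressWarnings";

    // Prints the message unless the property is set to "true".
    // Returns whether the message was actually printed.
    static boolean warn(PrintStream out, String message) {
        if (Boolean.getBoolean(SUPPRESS_PROPERTY)) {
            return false;
        }
        out.println(message);
        return true;
    }

    public static void main(String[] args) {
        warn(System.err, "A property claimed to start before zero, at -512!");
    }
}
```

With a gate like this, an indexing application could start the JVM with -Dexample.hwpf.suppressWarnings=true, while the default behaviour would stay as noisy as it is today.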

Comment 6 Antony Bowesman 2009-04-22 16:38:45 UTC

The same applies to the error in SectionTable.java

System.err.println("Your document seemed to be mostly unicode, but the section definition was in bytes! Trying anyway, but things may well go wrong!");

You also say that the read-only case should be fine.
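
Until the library changes, an application can filter these two messages on its own side. The sketch below is one possible workaround, not anything POI provides: FilteringErr is a hypothetical class that wraps System.err and drops lines containing the fragments it is given. It relies on both messages being written through println(String), as in the code quoted in this report.

```java
import java.io.PrintStream;

final class FilteringErr extends PrintStream {
    private final String[] fragments;

    // target: the stream that receives everything that is not filtered out.
    FilteringErr(PrintStream target, String... fragments) {
        super(target, true);
        this.fragments = fragments;
    }

    @Override
    public void println(String line) {
        if (line != null) {
            for (String fragment : fragments) {
                if (line.contains(fragment)) {
                    return; // drop the known noisy message
                }
            }
        }
        super.println(line);
    }

    public static void main(String[] args) {
        PrintStream original = System.err;
        System.setErr(new FilteringErr(original,
                "claimed to start before zero",
                "section definition was in bytes"));
        try {
            System.err.println("A property claimed to start before zero, at -512!"); // dropped
            System.err.println("some other error"); // still printed
        } finally {
            System.setErr(original); // always restore the real stream
        }
    }
}
```

System.setErr is global to the JVM, so the filter also affects every other thread that writes to System.err while it is installed.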

Comment 7 Nick Burch 2009-04-23 07:15:43 UTC

Part of the hope is that with an annoying enough error, someone'll be motivated to get around to fixing the underlying issue...

There really is something screwy going on which needs to be fixed, and hiding the error is just papering over the cracks :(

Comment 8 Yegor Kozlov 2011-06-24 10:41:58 UTC

Can you attach the problematic document? Without it we can't do much to figure out what's wrong.

Yegor

Comment 9 Dominik Stadler 2016-06-13 19:21:06 UTC

There has been no update for a long time, and there is not much we can do without a sample document, so I am closing this as WONTFIX for now. Please reopen with a sample document if this is still an issue with a recent version of POI.