Bug 19203 - ValueIndexer for Multi-Byte Numeric Types Broken
Summary: ValueIndexer for Multi-Byte Numeric Types Broken
Alias: None
Product: Xindice
Classification: Unclassified
Component: DB Engine (show other bugs)
Version: cvs head (1.1)
Hardware: All All
: P3 normal
Target Milestone: ---
Assignee: Xindice Developers
Depends on:
Reported: 2003-04-21 21:49 UTC by Terry Rosenbaum
Modified: 2007-01-30 17:09 UTC (History)
0 users

Patch (6.31 KB, patch)
2006-11-19 11:44 UTC, Natalia Shilenkova
Details | Diff
Testcase (5.07 KB, text/plain)
2006-11-19 14:31 UTC, Natalia Shilenkova
Indexer patch for numerical values (7.57 KB, patch)
2006-12-27 18:03 UTC, Natalia Shilenkova
Details | Diff
Testcase (5.72 KB, text/plain)
2006-12-27 18:07 UTC, Natalia Shilenkova
Rebuild collection/index utility (6.39 KB, text/plain)
2006-12-27 18:10 UTC, Natalia Shilenkova

Note You need to log in before you can comment on or make changes to this bug.
Description Terry Rosenbaum 2003-04-21 21:49:58 UTC
I've discovered that defining a ValueIndexer with multi-byte
numeric types (e.g. long, int, short, etc.) does not work correctly
when resolving relational comparisons other than equals (e.g.
can produce incorrect results for gretater than, less than, etc.).

The reason is that all data is handled as anonymous byte
arrays using the Value class and the Value class has no
type associated with the data. Thus, the Value.compareTo method
is not able to properly perform comparisons for such data. It
performs byte-by-byte comparisons of the data.

As an example, ValueIndexer encodes the value
1049903940000 as the bytes 0, 0, 0, -12, 115, 38, -63, -96
and encodes the value 1050687000291 as the bytes
0, 0, 0, -12, -95, -45, 78, -29. Thus, when comparing
a Value representing 1049903940000 to a Value
representing 1050687000291  (in pseudocode:
the incorrect result of 5 is returned indicating that
1049903940000 is greater than 1050687000291 (which
is incorrect).

The problem can be encountered during XPath comparisons
if a ValueIndexer is defined as a numeric indexer for an
attribute or element.

e.g. if you define an index to index the Time attribute as type long:

<Bug Time="1050687000291" />

<index name="BugIndex" class="org.apache.xindice.core.indexer.ValueIndexer"
   pattern="Bug@Time" type="long" />

And use an XPath like /Bug[(1049903940000 < @Time) and (1050687000291 > @Time)]
you may see incorrect query results.

Part of the problem is in the code of Value.compareTo(Value):

          short s1 = (short)(b1 >>> 0);
          short s2 = (short)(b2 >>> 0);
          return s1 > s2 ? (i+1)
                          : -(i+1);

This code attempts to shed the sign bits via ">>> 0" which
has no effect. The result is that s1 and s2 are sign-extended
versions of b1 and b2 and the comparison becomes a signed
comparison rather than the desired unsigned comparison.
This can be fixed by using:

           int i1 = (int)b1 & 0xff;
           int i2 = (int)b2 & 0xff;
           return i1 > i2 ? (i+1)
                          : -(i+1);

But, this only fixes the problem for 0 and positive values being
compared. It does not work for the entire domain of numeric

One possible way to fix this would be to add a type field
to the Value object and perform conversion based on type
prior to performing comparisons. A drawback of this approach
would be that exisiting persisted Value objects (e.g. in existing
databases) would be rendered incompatible.
Comment 1 Terry Rosenbaum 2003-06-05 07:39:08 UTC
Another possible way to fix this might
be through the use of a comparator
function. One might alter IndexQuery
to allow the setting of a comparator
for Value objects. The ValueIndexer
could set the comparator to the proper
comparator for the index type (string,
integer, long, boolean, etc.) in its
queryMatches method. This might be a
feasible solution allowing
backwards compatibility with existing
persisted databases.
Comment 2 Kevin Ross 2003-06-17 04:35:35 UTC
As was discussed in a recent thread on the mail list, backwards compatibility 
means an export and import, so we can consider your solution, or that may 
untie your hands as far as the 'best' solution goes.
Comment 3 Natalia Shilenkova 2006-11-19 11:44:52 UTC
Created attachment 19143 [details]

Here is the possible solution - to encode values depending on type of indexer
in the way they can be compared as byte arrays. Please note that it does not
fix existing indexers, so indexes have to be recreated.
Comment 4 Natalia Shilenkova 2006-11-19 11:53:52 UTC
One more thing, in the patch I changed the way float/double are handled, instead
of storing only 4 digits after the dot, it will store full range of values
allowed  for float/double.

Indexer types short, integer, and long will always use 8 bytes to store the
value, and both float and double use 8 bytes. Does it make a sense to change it
so value takes exact amount of space - 2 byte for short, 4 bytes for integer and
so on?
Comment 5 Natalia Shilenkova 2006-11-19 14:31:52 UTC
Created attachment 19145 [details]
Comment 6 Vadim Gritsenko 2006-11-26 14:54:44 UTC
(In reply to comment #3)
> Here is the possible solution - to encode values depending on type of indexer
> in the way they can be compared as byte arrays. Please note that it does not
> fix existing indexers, so indexes have to be recreated. 

This patch changes the way how Values are compared, and as a result all
collections using BTree filer (not only indexes) have to be re-created.

Can existing import/export be a solution for an upgrade, or do we need an
upgrade utility?
Comment 7 Natalia Shilenkova 2006-12-27 18:03:16 UTC
Created attachment 19309 [details]
Indexer patch for numerical values
Comment 8 Natalia Shilenkova 2006-12-27 18:07:41 UTC
Created attachment 19310 [details]
Comment 9 Natalia Shilenkova 2006-12-27 18:10:29 UTC
Created attachment 19311 [details]
Rebuild collection/index utility
Comment 10 Vadim Gritsenko 2007-01-30 17:09:12 UTC
Looks good, patch applied (with minor changes).

However, some more work is still needed on (how it is now called)
DatabaseRebuild utility. Need some shell script so it is more convenient to use.
Would be nice to have single step process, with copy & re-indexing done in one
step. Better usage documentation.

Looking forward to new patches! :)
Comment 11 Vadim Gritsenko 2007-01-30 17:09:57 UTC
(closing this bug as indexing is working now. pls create new issue for any
follow up patches).