Bug 5586 - make parser faster!
Summary: make parser faster!
Status: REOPENED
Alias: None
Product: Xerces-J
Classification: Unclassified
Component: Core
Version: 1.4.4
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: Xerces-J Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2001-12-24 12:48 UTC by Genady
Modified: 2004-11-16 19:05 UTC



Description Genady 2001-12-24 12:48:16 UTC
Parsing simple XML files is much slower than with the older XML4J
versions from IBM (e.g. 2.0.15).
Even with all features turned off, it is still almost twice as slow as
XML4J 2.0.15.
I only tested parsing (very) large files, not parsing many small files.
Comment 1 Elena Litani 2002-01-02 15:21:08 UTC
You don't give any specific details about which parser you are using: do you use DOM
or SAX? Is it a validating parser? Do you use DTDs or XML Schemas?
Since XML4J 2.0.15 we've added several enhancements to the parser, such as a W3C DOM
Level 2 implementation and a W3C XML Schema implementation.
Thus, it is acceptable that the parser has become slower.
We are shifting our development efforts towards Xerces2, and we've stopped
working on Xerces 1 (1.4.4 is probably the last release).
If you provide additional information and patches to the code, we will
gladly accept them.
Thank you!
Comment 2 Genady 2002-01-05 05:06:42 UTC
OK, a few details:
I'm using the SAX parser through the SAX 2.0 framework,
although I don't use any features specific to 2.0.
I don't use validation.
I have a DTD embedded in the file.
I see the same performance in both 1.4.4 and 2.0.0 beta3.

Also, any tips on making parsing faster are welcome!
(I have already applied the ones I found on the web.)

Genady
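A minimal sketch of the setup described above, assuming a JAXP SAXParserFactory that hands out a SAX2 XMLReader (the class and method names are hypothetical, and JAXP picks whichever parser implementation is on the classpath). Validation is switched off, yet the parser still processes the internal DTD subset, which is why the entity &greet; comes back expanded:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: a non-validating SAX2 parse of a document with an internal DTD subset. */
class NonValidatingSaxDemo {

    /** Counts elements and collects character data, so entity expansion is visible. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static TextCollector parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);     // no DTD validation, as in the report
        factory.setNamespaceAware(true);  // SAX2-style namespace processing
        XMLReader reader = factory.newSAXParser().getXMLReader(); // the SAX2 interface
        TextCollector handler = new TextCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // The DTD is embedded in the document (internal subset only).
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE doc [\n"
            + "  <!ELEMENT doc (item*)>\n"
            + "  <!ELEMENT item (#PCDATA)>\n"
            + "  <!ENTITY greet \"hello\">\n"
            + "]>\n"
            + "<doc><item>&greet;</item><item>world</item></doc>";
        TextCollector result = parse(xml);
        System.out.println(result.elements + " elements, text: " + result.text);
    }
}
```

Because a non-validating parser is still required to read the internal subset, an embedded DTD always costs some parsing time, even with validation off.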
Comment 3 Genady 2002-01-05 05:08:22 UTC
I'll also try to benchmark the parser and send you the results.

Genady
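One way to make such a benchmark repeatable is a small timing harness like the hypothetical one below. It generates a large synthetic document in memory (the record layout is invented for illustration), does a few warm-up parses so the JIT has compiled the parser, and then averages several timed runs with a handler that does almost no work. To compare parser versions, the same harness would be run with each parser on the classpath; a parser that is not reachable through JAXP would need its own instantiation code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical timing harness for SAX parsing of a large in-memory document. */
class SaxParseBenchmark {

    /** Builds a synthetic document with `count` record elements under one root. */
    static byte[] buildDocument(int count) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<records>\n");
        for (int i = 0; i < count; i++) {
            sb.append("  <record id=\"").append(i).append("\">value ").append(i).append("</record>\n");
        }
        sb.append("</records>\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Counts start tags only, so the measurement is dominated by the parser itself. */
    static final class CountingHandler extends DefaultHandler {
        long elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses `doc` once with a reused reader and returns the number of elements seen. */
    static long parseOnce(XMLReader reader, byte[] doc) throws Exception {
        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new ByteArrayInputStream(doc)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        byte[] doc = buildDocument(200_000);

        // Warm-up runs let the JIT compile the parser before we measure.
        for (int i = 0; i < 3; i++) {
            parseOnce(reader, doc);
        }

        int runs = 5;
        long elements = 0;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            elements = parseOnce(reader, doc);
        }
        double avgMs = (System.nanoTime() - start) / 1e6 / runs;
        System.out.printf("%d elements, %.1f ms per parse (avg of %d runs)%n", elements, avgMs, runs);
    }
}
```

Timings from a single JVM run are noisy, so it is worth repeating the measurement a few times before drawing a conclusion such as "almost twice as slow".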
Comment 4 Elena Litani 2002-01-07 06:24:36 UTC
Genady, given your requirements you should use Xerces2. In Xerces2 there are
different parser configurations that include different components in the
pipeline. By default, Xerces2b4 parsers are created with
org.apache.xerces.parsers.StandardParserConfiguration, which includes: Scanner,
DTDValidator, DTDScanner, NamespaceBinder. A validating parser must read the DTD if
one is present, even if you don't need validation. If you don't want the external
DTD to be read, set http://apache.org/xml/features/nonvalidating/load-external-dtd
to false [the internal subset will always be read].
If you have more questions about performance, email the xerces-j-dev list.
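As a sketch of the suggestion above, assuming the JDK's built-in JAXP parser (which is derived from Xerces and recognizes the same feature URI), the feature is set on the SAX2 XMLReader before parsing. The system ID missing-external-subset.dtd is a hypothetical file that does not exist; with the feature off, the external subset is skipped, while the entity declared in the internal subset is still expanded:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: non-validating SAX parse that does not load the external DTD subset. */
class SkipExternalDtd {

    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses `xml` with the external DTD disabled and returns all character data. */
    static String parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Skip the external subset; the internal subset is still processed.
        reader.setFeature(LOAD_EXTERNAL_DTD, false);

        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "missing-external-subset.dtd" is hypothetical and does not exist.
        String xml = "<!DOCTYPE doc SYSTEM \"missing-external-subset.dtd\" [\n"
            + "  <!ENTITY greet \"hello\">\n"
            + "]>\n"
            + "<doc>&greet;</doc>";
        System.out.println(parse(xml));
    }
}
```

With a Xerces2 jar on the classpath instead, the same setFeature call applies, since the feature URI is the one given above.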