1541 – TreeWalker with sparse filter works in exponential time

Bug 1541 - TreeWalker with sparse filter works in exponential time

Summary: TreeWalker with sparse filter works in exponential time

Status:	REOPENED

Alias:	None

Product:	Xerces-J
Classification:	Unclassified
Component:	DOM (show other bugs)
Version:	1.3.1
Hardware:	PC All

Importance:	P1 major
Target Milestone:	---
Assignee:	Xerces-J Developers Mailing List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2001-04-26 11:10 UTC by Anthony Roy
Modified:	2004-11-16 19:05 UTC (History)
CC List:	1 user (show)

Attachments
Revised test case showing new failure modes and unresolved failures (5.32 KB, text/plain) 2001-06-22 11:22 UTC, Dean Brettle	Details
Patch to resolve problems illustrated by TestWalker.java (5.51 KB, patch) 2001-06-22 11:27 UTC, Dean Brettle	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Anthony Roy 2001-04-26 11:10:38 UTC

Even in very small documents, a TreeWalkerImpl created to find nodes of a 
specific name can take arbitrarily long amounts of time. Also, in larger 
documents, similar filters can cause StackOverflowError's.

I have included source for a small test file below that can be used to 
reproduce the bug. In it, an xml document is created and passed to the 
TreeWalker. A filter is created that matches only nodes with a given name. 
TreeWalker.nextNode() begins to take arbitrarily long amounts of time for even 
small documents (~60 seconds for a 3/3/2 (~20) node document). This only occurs 
when there is a low number of nodes that match the filter.

In addition, a wider filter (for example two tags) can cause a 
StackOverflowError on large documents (e.g. 14/13/11 (2000+) nodes).

Running the following code will result in the output:
>>
rootheader
null
approximately 68.098 seconds
>>
Changing the size of the document to 14/13/11, and setting the nodefilter to 
use FILTER_TAGS_B will result in the following output:
>>
rootheader
java.lang.StackOverflowError
>>

Code:
>>>>>
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.TreeWalker;
import org.apache.xerces.dom.TreeWalkerImpl;
import org.w3c.dom.traversal.NodeFilter;
import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.InputSource;
import java.io.StringReader;

public class TestWalker {
	// this many children of root node
	private static final int NUM_BRANCH = 3;
	// this many children of each branch node
	private static final int NUM_TWIG = 3;
	// this many children for each twig node
	private static final int NUM_LEAF = 2;
	private static final String[] FILTER_TAGS_A = {"rootheader"};
	private static final String[] FILTER_TAGS_B = {"rootheader",
										
		 "rootfooter"};

	public TestWalker() {
		try {
			runTest(parseXML(createXML()));
		} catch (StackOverflowError e) {
			System.out.println(e);
		}
	} // end constructor()

	// Creates a TreeWalker with a simple name filter and loops
	// through each node returned until the document is finished.
	private void runTest(Document doc) {
		// Create TreeWalker with inline filter.
		// The filter simply looks for a node matching the filter node.
		TreeWalker walker = new 
			TreeWalkerImpl(doc, Node.ELEMENT_NODE,
						   new NodeFilter() {
								   protected 
String tag[] = FILTER_TAGS_A;
								   
								   public short 
acceptNode (Node n) {
									   
String testName = n.getNodeName();
									   for 
(int i=0; i < tag.length; i++) {
									
	   if (testName.equals(tag[i]))
										
	   return FILTER_ACCEPT;
									   }
									   
return FILTER_SKIP;
								   } // end 
acceptNode(Node)
								   
							   }, true);
		// Loop until done.
		long startTime;
		long endTime;
		startTime = System.currentTimeMillis();
		Node node = walker.nextNode();
		while (node != null) {
			System.out.println(node.getNodeName());
			node = walker.nextNode();
		}
		endTime = System.currentTimeMillis();
		System.out.println("null");
		double runTime = (double)(endTime - startTime);
		System.out.println("approximately " + (runTime/1000.0) + 
						   " seconds");
	} // end runTest()

	// Parses given String and returns a Document
	private Document parseXML(String xmlSrc) {
		// Double check accuracy of created xml:
		// System.out.println("Source:\n" + xmlSrc);
		DOMParser parser = new DOMParser();
		InputSource in = new InputSource(new StringReader(xmlSrc));
		try { parser.parse(in);
		} catch (Exception e) {e.printStackTrace();System.exit(1);}
		return parser.getDocument();
	} // end parseXML()

	// Creates a simple Document based on global parameters.
	// The root has NUM_BRANCH children, each has NUM_TWIG children,
	// and each of those has NUM_LEAF children.
	private String createXML() {
		StringBuffer xmlSrc = new StringBuffer();
		xmlSrc.append("<?xml version=\"1.0\"?>\n");
		xmlSrc.append("<root><rootheader>Root</rootheader>\n");
		for (int x=0; x < NUM_BRANCH; x++) {
			xmlSrc.append
("\t<branch><branchheader>Branch"+x+"</branchheader>\n");
			xmlSrc.append
("\t\t<resource>Resource"+x+"</resource>\n");
			for (int y=0; y<NUM_TWIG; y++) {
				xmlSrc.append
("\t\t<twig><twigheader>Twig"+x+"."+y+
							  "</twigheader>\n");
				for (int z=0; z<NUM_LEAF; z++)
					xmlSrc.append
("\t\t\t<leaf>Leaf"+x+"."+y+"."+z+"</leaf>\n");
				xmlSrc.append("\t\t</twig>\n");
			}
			xmlSrc.append("\t</branch>\n");
		}
		xmlSrc.append("<rootfooter>The End</rootfooter>\n");
		xmlSrc.append("</root>");
		return xmlSrc.toString();
	} // end createXML()

	// Usage: java TestWalker
	public static void main(String[] args) {
		TestWalker bob = new TestWalker();
	} // end main
} // end class TestWalker

>>>>>

Comment 1 Elena Litani 2001-06-19 09:55:10 UTC

Fixed in CVS today. Can you pick up latest Xerces and try it out?

Comment 2 Dean Brettle 2001-06-22 11:22:40 UTC

Created attachment 250 [details]
Revised test case showing new failure modes and unresolved failures

Comment 3 Dean Brettle 2001-06-22 11:27:44 UTC

Created attachment 251 [details]
Patch to resolve problems illustrated by TestWalker.java

Comment 4 Dean Brettle 2001-06-22 11:50:11 UTC

Unfortunately, Elena's fix created a new problem.  Specifically,
TreeWalker.firstChild() now returns the nextSibling() of the current node if the
current node is a leaf node.  I attached a test case (id=250) to illustrate.

The test case also takes forever when traversing backwards (eg via
previousNode()).

The cause of the original problem (and the remaining problem with
previousNode()) is that getNextSibling(Node) and getPreviousSibling(Node) can
traverse all the way up the tree, even when they are called from
getFirstChild(Node) and getLastChild(Node).  I've attached a patch (id=251)
which solves the problem by allowing getFirstChild(Node) and getLastChild(Node)
to specify the parent node above which getNextSibling(Node) and
getPreviousSibling(Node) should not traverse.

Even with my patch, there is still a problem with running out of stack space
when traversing very large documents.  I see this if I set NUM_BRANCH = 10000 in
the test case.  To solve this the class would need to be rewritten to not use
recursion.

Comment 5 Elena Litani 2001-06-22 12:05:22 UTC

Dean thank you for testing the code and submitting the patch: I did not realize 
that getFirstChild() is also used in DOM firstChild() method.. I will take a 
look at your patch.

Comment 6 Elena Litani 2001-06-26 07:18:20 UTC

Applied the patch. Might need to revisit in the future.