Bug 3199 - Getting OutOfMemoryError parsing large xml-file (2.4MB) on OS VMS.
Status: NEW
Alias: None
Product: Xerces-J
Classification: Unclassified
Component: DOM
Version: 1.4.2
Hardware: All other
Importance: P3 blocker
Target Milestone: ---
Assignee: Xerces-J Developers Mailing List
Reported: 2001-08-21 05:04 UTC by mats.bjornlund
Modified: 2005-03-20 17:06 UTC
Description mats.bjornlund 2001-08-21 05:04:03 UTC
I get an OutOfMemoryError when I try to parse larger XML files (2.4MB) that are 
validated against one XSD file (10KB):

"Creating DOMParser object ...
Setting Validation ON ...
[Start parsing ...
Exception in thread "main" java.lang.OutOfMemoryError
at java.util.Hashtable.clone(Hashtable.java, Compiled Code)
at org.apache.xerces.validators.common.XMLValidator$ValueStoreCache.startElement(XMLValidator.java, Compiled Code)
at org.apache.xerces.validators.common.XMLValidator.callStartElement(XMLValidator.java, Compiled Code)
at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java, Compiled Code)
at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java, Compiled Code)
at org.apache.xerces.framework.XMLParser.parse(XMLParser.java, Compiled Code)
at org.apache.xerces.framework.XMLParser.parse(XMLParser.java, Compiled Code)
at Dparser.parse(Dparser.java, Compiled Code)
at Dparser.main(Dparser.java, Compiled Code)"


Parsing small XML files works fine, and the parser reports the errors I 
expect, but for larger XML files I get this OutOfMemoryError.


My own Java source file looks like this:
"import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import java.io.IOException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

public class Dparser implements ErrorHandler {
   private String errOK = "OK";

   /** Warning. */
   public void warning(SAXParseException ex) {
      System.err.println("[Warning] "+
                        getLocationString(ex)+": "+
                        ex.getMessage());
      errOK = "";
   }

   /** Error. */
   public void error(SAXParseException ex) {
      System.err.println("[Error] "+
                        getLocationString(ex)+": "+
                        ex.getMessage());
      errOK = "";
   }

   /** Fatal error. */
   public void fatalError(SAXParseException ex) throws SAXException {
      System.err.println("[Fatal Error] "+
                        getLocationString(ex)+": "+
                        ex.getMessage());
      errOK = "";
      throw ex;
   }

   /** Returns a string of the location. */
   private String getLocationString(SAXParseException ex) {
      StringBuffer str = new StringBuffer();
      String systemId = ex.getSystemId();
      if (systemId != null) {
         int index = systemId.lastIndexOf('/');
         if (index != -1) 
            systemId = systemId.substring(index + 1);
         str.append(systemId);
      }
      str.append(':');
      str.append(ex.getLineNumber());
      str.append(':');
      str.append(ex.getColumnNumber());
      return str.toString();
   } // getLocationString(SAXParseException):String

   private static void printUsage() {
      System.out.println("Program:     Dparser");
      System.out.println("Description: Check if xml-file is wellformed and validate the schemas.");
      System.out.println("Syntax:      java \"Dparser\" [-v] <xmlfile>");
      System.out.println("             -v     No validation of the schemas are done");
   }

   /** Doing the parsing job for this class */
   private void parse(String xmlFile, boolean validation) {
      System.out.println("Creating DOMParser object ...");
      DOMParser parser = new DOMParser();
      try {
         parser.setErrorHandler(this); 
         if (validation) {
            System.out.println("Setting Validation ON ...");
            parser.setFeature("http://xml.org/sax/features/validation", true);
         } else {
            System.out.println("No validation will be done.");
         }
         System.out.println("[Start parsing ...");
         parser.parse(xmlFile);
         System.out.println("... parsing Done] "+errOK);
      } catch (SAXNotSupportedException e) {
         System.out.println("ERROR: SAXNotSupportedException thrown !");
      } catch (SAXNotRecognizedException e) {
         System.out.println("ERROR: SAXNotRecognizedException thrown !");
      } catch (SAXException se) {
         System.out.println("Error: NOT WELLFORMED ! \n[");
         se.printStackTrace();
         System.out.println("] ERROR:"+xmlFile+" is NOT WELLFORMED !");
      } catch (IOException ioe) {
         ioe.printStackTrace();
      }
      Document document = parser.getDocument();
   }   

   /** Main program. Validate the xml-file and also check the xsd-files */
   public static void main(String [] args) {
   /*   String xmlFile = "xsdtest.xml"; */
      String xmlFile = "";
      boolean validation = true;
      if (args.length==0) {
         Dparser.printUsage();
         System.exit(0);
      } else if (args.length == 1) {
         xmlFile = args[0];
      } else if (args.length == 2) { // Two parameters
         if (args[0].equals("-v") || args[0].equals("-V")) {
            validation = false;
            xmlFile = args[1];
         } else {
            System.out.println("Error: Wrong parameter \""+args[0]+"\".\n");
            Dparser.printUsage();
            System.exit(0);
         }
      } else {
         System.out.println("Error. Wrong parameters. \n");
         Dparser.printUsage();
         System.exit(0);
      }
      Dparser p = new Dparser(); 
      p.parse(xmlFile, validation);
   }
}"

1. I use the DOM parser.
2. The working set (memory) limits of the process are:
"AXP AF 092%> show work
Working Set (pagelets)  /Limit=10000  /Quota=250000  /Extent=250000
Adjustment enabled      Authorized Quota=250000  Authorized Extent=250000
Working Set (8Kb pages) /Limit=625  /Quota=15625  /Extent=15625
Authorized Quota=15625  Authorized Extent=15625"

Java version:
"AXP AF 092%> java -version
java version "1.2.2-3"
Classic VM (build J2SDK.v.1.2.2-3:10/31/2000-08:52, native threads, jit_122)
AXP AF 092%>"
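
The working-set figures above are operating-system limits. The heap ceiling of the JVM is a separate setting, so it can help to print what the JVM itself reports before and after a parse. The following is only a minimal sketch: the class name HeapReport and the helper toMegabytes are made up for illustration, and Runtime.maxMemory() requires Java 1.4 or later, so it would not run unchanged on the 1.2.2 VM shown above.

```java
/** Hypothetical helper: prints the heap figures the running JVM reports. */
class HeapReport {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the JVM will try to use for the heap.
        System.out.println("max heap   (MB): " + toMegabytes(rt.maxMemory()));
        // Heap currently reserved by the JVM.
        System.out.println("total heap (MB): " + toMegabytes(rt.totalMemory()));
        // Unused part of the currently reserved heap.
        System.out.println("free heap  (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```

If "max heap" is far below the working-set quota, the OutOfMemoryError is more likely bounded by the JVM setting than by the quota.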
Comment 1 mats.bjornlund 2001-08-21 05:12:10 UTC
To be able to use the classes in xerces.jar I had to extract the files; 
otherwise the compiler couldn't find them in the JAR file. I tried setting 
CLASSPATH and also JAVA$CLASSPATH, and I also tried the -classpath option 
when compiling. Nothing worked, so I ended up extracting all the files. The 
compiler could find them once I no longer worked against the JAR file.
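
One way to narrow down a class path problem like this is to let the JVM print the class path it actually received and check whether each entry exists. This is only a sketch for a current Java version (it uses generics, so it would not compile on 1.2.2). The class name ClasspathCheck and both helper methods are hypothetical, and it checks the runtime class path, not what the compiler was given.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical diagnostic: lists class path entries that do not exist on disk. */
class ClasspathCheck {

    /** Returns the non-empty entries of classPath that are not existing files or directories. */
    static List<String> missingEntries(String classPath, String separator) {
        List<String> missing = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            // Wildcard entries such as "lib/*" are reported as missing by this simple check.
            if (!entry.isEmpty() && !new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    /** Returns true if the named class can be loaded by the current class loader. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path", "");
        System.out.println("java.class.path = " + classPath);
        for (String entry : missingEntries(classPath, File.pathSeparator)) {
            System.out.println("missing entry: " + entry);
        }
        System.out.println("DOMParser loadable: "
                + isLoadable("org.apache.xerces.parsers.DOMParser"));
    }
}
```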

I could then compile and parse small XML files without any problems, but when I 
work with larger files (2.4MB) the OutOfMemoryError problem starts. We have 
XML files that are over 20MB in size, so for us it is critical that we can 
process larger files.
Comment 2 mats.bjornlund 2001-08-22 02:25:09 UTC
I can now parse the XML file, but I have to allocate a lot of memory when 
starting the program. To parse a 2.4MB file I need 40-70MB of RAM. That is a 
lot; we have to handle 30MB files, so that could be a problem. I will now try 
the SAX parser instead, hoping that it is faster and uses much less memory. 
Parsing a 600KB file takes 40 seconds, but parsing a 2.4MB file takes 3 
minutes.

/Mats
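
For reference, a SAX-based pass over a document might look like the sketch below. It is written against the standard JAXP API (javax.xml.parsers) of a current Java version, not against the Xerces 1.4.2 classes used in Dparser, and the class name SaxElementCounter is made up for illustration. It only checks well-formedness and counts elements; schema validation is not enabled. Since the stack trace in the description fails inside the validator, this sketch alone does not show that switching to SAX removes the memory problem when validation is on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler: counts elements without building a DOM tree. */
class SaxElementCounter extends DefaultHandler {
    private long elements;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++; // called once per start tag; nothing is retained per element
    }

    /**
     * Parses the stream and returns the number of elements.
     * Parser failures are rethrown as IllegalArgumentException to keep the sketch small.
     */
    static long countElements(InputStream in) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            SaxElementCounter handler = new SaxElementCounter();
            parser.parse(in, handler);
            return handler.elements;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Syntax: java SaxElementCounter <xmlfile>");
            System.exit(1);
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("elements: " + countElements(in));
        }
    }
}
```

Because the handler keeps only a counter, memory use for the handler itself does not grow with document size, unlike a DOM tree that holds every node.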