When the flags RE.MATCH_SINGLELINE and RE.MATCH_CASEINDEPENDENT are both set, the program dies if the scanned file is over a specific length. If I set either of the flags, but NOT BOTH, the file is scanned successfully.

code snippet:

    int iFlags = RE.MATCH_CASEINDEPENDENT | RE.MATCH_SINGLELINE;
    RE re = new RE(strPattern, iFlags);
    Reader reader = new FileReader(strFilePathAndName);
    CharacterIterator in = new ReaderCharacterIterator(reader);
    int iEnd = 0;
    while (re.match(in, iEnd))
    {
        iEnd = re.getParenEnd(0);
        String strFoundTag = re.getParen(0);
        ...
    }
How does the program 'die'? Please describe. Please also provide a sample strPattern and file. Thanks.
die:
The program simply stops executing. I do not receive any errors. I do not see any exceptions in the log.

comments:
The code executes successfully when I do not set the flag to RE.MATCH_SINGLELINE. Unfortunately, but as would be expected, I do not get a match when the content is continued onto the next line. However, when I set the flag to RE.MATCH_SINGLELINE, the program simply stops executing partially through the file. If the file is short, it completes successfully. However, if the file is longer, execution stops. I receive no errors or thrown exceptions.

notes:
I have since changed the pattern to read as follows:
(<A[^>]*>)|(<APPLET[^>]*>)|(<AREA[^>]*>)| etc
It seems to work.

thoughts:
Even if the old pattern and code are stupid, it seems I should still get some type of error. I would think that there would be some type of exception that I could trap or at least see in the log.

problem pattern:
(<A(.)*>)|(<APPLET(.)*>)|(<AREA(.)*>)| etc

code:

    //Search html file for pattern.
    try
    {
        //Construct an RE object
        int flags = RE.MATCH_CASEINDEPENDENT | RE.MATCH_SINGLELINE;
        RE re = new RE(strPattern, flags);

        //Use the object to match to the input.
        Reader r = new FileReader(strFilePathAndName);
        CharacterIterator in = new ReaderCharacterIterator(r);
        int end = 0;
        while (re.match(in, end))
        {
            //Reset starting point in input file
            end = re.getParenEnd(0);

            //Retrieve Tag
            String strFoundTag = re.getParen(0);
            logger.debug("Found Tag:" + strFoundTag);

            //Process tag appropriately.
            //Retrieve urls from tag and add to array of urls.
            Iterator iterator = alResourceElementsList.iterator();
            while (iterator.hasNext())
            {
                //Search for each possible element of the tag.
                String strElement = (String) iterator.next();
                int iBeginUrl = strFoundTag.trim().toUpperCase().indexOf(strElement);

                //If an element is found, retrieve the url from the element.
                char cEndChar = '"';
                if (iBeginUrl >= 0) //Element found
                {
                    int iEndUrl = strFoundTag.trim().toUpperCase().indexOf(cEndChar, iBeginUrl + strElement.length() + 2);
                    if (!((iBeginUrl + 2) <= iEndUrl))
                    {
                        logger.error("Cannot retrieve url from element in scanHtmlFileForResourceTags.");
                        logger.error("FilePathAndName: " + strFilePathAndName);
                        logger.error("Tag: " + strFoundTag);
                        logger.error("Element: " + strElement);
                        return 1;
                    }
                    String strTempUrl = strFoundTag.substring(iBeginUrl + strElement.length() + 2, iEndUrl);

                    //TO DO: Test for CODEBASE/CODE for APPLET!
                    String strUrl;
                    if (strElement.trim().equalsIgnoreCase("CODEBASE"))
                    {
                        //Do not code until determined that this code is necessary.
                        strUrl = strTempUrl;
                        logger.error("APPLET tag contains element CODEBASE.");
                        logger.error("Program does not contain code to process CODEBASE.");
                        logger.error("Base url to resolve relative url is current directory of html file.");
                        logger.error("The corresponding database entry is incorrect.");
                    }//EndProcessCodeBase
                    else
                    {
                        strUrl = strTempUrl;
                    }//EndProcessAllOtherTags
                    logger.debug("Url:" + strUrl);

                    //Save each url that does not start with "#"
                    //(Tags A and FRAME can start w/# - see doc for details)
                    if (strUrl != null)
                    {
                        if (strUrl.startsWith("#")) break;
                    }//EndUrlNotNull

                    //Save url.
                    alUrl.add(strUrl);

                    //Save tag name only, not entire tag.
                    String strSpace = " ";
                    String strTag = strFoundTag.substring(1, strFoundTag.indexOf(strSpace));
                    alLinkTagType.add(strTag);
                }//EndElementFound
            }//EndIterateThroughElements
        }//EndWhileMatchesInHtmlFile
    }//EndTryFindMatchesInHtml
    catch (RESyntaxException e)
    {
        logger.error("Regular Expression syntax expression.");
        logger.error("File Path and Name:" + strFilePathAndName);
        return 1;
    }
    catch (FileNotFoundException e)
    {
        logger.error("FileNotFoundException on scanHtmlFileForResourceTags");
        logger.error("File Not Found:" + strFilePathAndName);
        return 1;
    }
    logger.info("End of Routine");

input file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Index of Pages in AWStats Web Site</title> <!--- link rel="STYLESHEET" type="text/css" href="../styles/w3c_oldstyle.css"--- > <link rel="STYLESHEET" type="text/css" href="../styles/default.css"> </head> <body> <div id="pageContent"> <h1>Documentation for AWStats Pilot Deployment in APHIS</h1> <h2>Current Notices</h2> <p>This weekend, 1-3 July, testing will be preformed to determine is AWStats can process logs for periods within a month which were missed in a previous run. The application will also be tested to determine if logs for a previous month can be run. In other words, can logs for March be processed if the April logs have already be run.</p> <h2>Web Server Reports</h2> <ul class="menuLinks" title="Dynamic WWW Web Server Reports for Current Month"> <ul class="menuLinkItem"> <li class="source"><a href="/awstats/awstats.pl?config=www.aphis.usda.gov" target="_blank">AWStats Report for Web Server WWW for Current Month</a> (Opens new window)</li> <li class="desc">This link connects the viewer to a collection of AWStats reports for the APHIS Internet web server (www.aphis.usda.gov) for the current month. 
As these report are created dynamically, be aware that the report can take up to 10 seconds to appear at high-demand times.</li> <li class="format">(html)</li> </ul> <ul class="menuLinks" title="List of Static WWW Web Server Reports for Current Month"> <ul class="menuLinkItem"> <li class="source"><a href="./AWStats_report_index.html">List of AWStats Reports for Web Server WWW for Current Month< /a></li> <li class="desc">This link connects the viewer to a list of static AWStats reports for the APHIS Internet web server (www.aphis.usda.gov) for the current month. These reports are regenerated every day at 3:00 AM.</li> <li class="format">(html)</li> </ul> </ul> </ul> <h2>AWStats Internal On-Line Resources</h2> <ul class="menuLinks" title="AWStats Internal On-Line Resources"> <ul class="menuLinkItem"> <li class="source"><a href="/pages/how_to_run_web_analytic_reports_using_AWStats.html"> How to Run Web Analytic Reports Using AWStats</a></li> <li class="desc">This document briefly describes how to run standard reports using the AWStats web server log analysis tool as a CGI application from a browser</li> <li class="format">(html)</li> </ul> <ul class="menuLinkItem"> <li class="source"><a href="/pages/example_AWStats_reports.html"> Examples of Creating On-Line reports with AWStats</a></li> <li class="desc">This document provides several examples of running AWstats reports as a CGI script using a browser.</li> <li class="format">(html)</li> </ul> <ul class="menuLinkItem"> <li class="source"><a href="/docs/AWStats_pilot_options.doc"> AWStats Options for Pilot Deployment</a></li> <li class="desc">This document provides a brief overview of the business case for deploying AWStats web analytics application in a pilot mode. 
Also details configuration for AWStats during pilot mode.</li> <li class="format">(MS-Word)</li> </ul> <ul class="menuLinkItem"> <li class="source"><a href="/pages/faq.html"> Frequently Asked Questions</a></li> <li class="desc">This document provides answers to questions frequently asked by AWStats users.</li> <li class="format">(html)</li> </ul> <ul class="menuLinkItem"> <li class="source"><a href="/pages/todo.html"> AWStats Web Site To-Do List</a></li> <li class="desc">This document is a list of items that need to be accomplished to support the AWStats pilot deployment.</li> <li class="format">(html)</li> </ul> </ul> <h2>AWStats External On-Line Resources</h2> <ul class="menuLinks" title="AWStats External On-Line Resources"> <ul class="menuLinkItem"> <li class="source"><a href="http://awstats.sourceforge.net/index.html"> AWStats Project Page</a></li> <li class="desc">Main Sourceforge project web site for AWStats, which bills itself as a free powerful and featureful tool that generates advanced web, streaming, ftp or mail server statistics, graphically.</li> <li class="format">(html)</li> </ul> <ul class="menuLinkItem"> <li class="source"><a href="http://sourceforge.net/forum/forum.php? forum_id=43428"> AWStats Forum (General)</a></li> <li class="desc">AWStats forum for general users hosted by Sourceforge.</li> <li class="format">(php)</li> </ul> <ul class="menuLinkItem"> <li class="source"><a href="http://awstats.sourceforge.net/docs/awstats.pdf"> AWStats Documentation</a></li> <li class="desc">User documentation for AWStats. <li class="format">(pdf)</li> </ul> </ul> </div> </body> </html>
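For anyone reading along, here is a minimal sketch of why the replacement pattern from the notes still finds tags that continue onto the next line: a negated character class such as [^>] also matches line breaks, so the single-line flag is not needed. Note the assumptions: this uses java.util.regex from the JDK rather than org.apache.regexp, and the class name TagScan, the method findTags and the sample input are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; java.util.regex stands in for org.apache.regexp.
public class TagScan {
    // Negated character class: each tag body stops at the first '>',
    // and [^>] matches line breaks, so no single-line/dot-all flag is set.
    private static final Pattern TAGS = Pattern.compile(
            "(<A[^>]*>)|(<APPLET[^>]*>)|(<AREA[^>]*>)",
            Pattern.CASE_INSENSITIVE);

    // Returns every matched tag in document order.
    public static List<String> findTags(CharSequence html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAGS.matcher(html);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Made-up input: an anchor tag split across two lines.
        String html = "<li><a href=\"/pages/faq.html\"\n   target=\"_blank\">FAQ</a></li>";
        for (String tag : findTags(html)) {
            System.out.println("Found Tag:" + tag);
        }
    }
}
```

Because the tag body cannot run past the first '>', each match is short, whereas (.)* with the single-line flag is free to consume the rest of the file before backing up.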
This simple program (adapted from what you have sent) shows what is really happening:

    import org.apache.regexp.CharacterIterator;
    import org.apache.regexp.RE;
    import org.apache.regexp.ReaderCharacterIterator;

    import java.io.FileReader;
    import java.io.IOException;

    public class Test {
        public static void main(String[] args) throws IOException {
            RE re = new RE("(<A(.)*>)|(<APPLET(.)*>)|(<AREA(.)*>)",
                           RE.MATCH_CASEINDEPENDENT | RE.MATCH_SINGLELINE);
            CharacterIterator in = new ReaderCharacterIterator(new FileReader("index.html"));
            int end = 0;
            try {
                while (re.match(in, end)) {
                    System.out.println("Matched " + re.getParen(0));
                    end = re.getParenEnd(0);
                }
                System.out.println("Done");
            } catch (Throwable e) {
                System.out.println("Exception " + e);
            }
        }
    }

If you run it with the input file 'index.html' in the same directory, you'd see:

    Exception java.lang.StackOverflowError

It is a duplicate of bug #764. If you have ideas on how to fix it, please comment in bug #764.

*** This bug has been marked as a duplicate of 764 ***
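A side note on why the original code logged nothing: java.lang.StackOverflowError is an Error, not an Exception, so handlers for RESyntaxException, FileNotFoundException or even Exception never see it. The sketch below is only an illustration of that point and does not use the regexp library at all; the class name OverflowDemo and the methods recurse and probe are made up.

```java
// Hypothetical illustration; unbounded recursion stands in for the deep
// recursion that the matcher reportedly hits on long input.
public class OverflowDemo {
    // Recurses without a base case until the thread stack is exhausted.
    private static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    // Returns a short description of which handler saw the overflow.
    public static String probe() {
        try {
            try {
                recurse(0);
                return "no overflow";
            } catch (Exception e) {
                // Not reached: StackOverflowError is not an Exception.
                return "caught as Exception";
            }
        } catch (Throwable t) {
            return "caught as Throwable: " + t.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

Catching Throwable (or StackOverflowError specifically) around the match loop, as the test program above does, is what makes the failure visible.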