Bug 26826

Summary: Performance enhancements in Utils.escapeXml
Product: Taglibs Reporter: Martin B.E. van Dijken <sunsear>
Component: Standard TaglibAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED DUPLICATE    
Severity: normal CC: kschneider
Priority: P3    
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   

Description Martin B.E. van Dijken 2004-02-10 12:12:28 UTC
David Wall sent a performance enhancing patch to the jakarta-taglibs-user list.
After modding it some and testing it fairly extensively, I've created a patch
from his enhancements. Below the patch we would like to see applied:

Index: Util.java
===================================================================
RCS file:
/home/cvspublic/jakarta-taglibs/standard/src/org/apache/taglibs/standard/tag/common/core/Util.java,v
retrieving revision 1.12
diff -u -r1.12 Util.java
--- Util.java	13 Dec 2003 05:48:07 -0000	1.12
+++ Util.java	10 Feb 2004 12:07:55 -0000
@@ -152,23 +152,47 @@
      * See also OutSupport.out().
      */
     public static String escapeXml(String input) {
-        StringBuffer sb = new StringBuffer();
-        for (int i = 0; i < input.length(); i++) {
-            char c = input.charAt(i);
-            if (c == '&')
-                sb.append("&amp;");
-            else if (c == '<')
-                sb.append("&lt;");
-            else if (c == '>')
-                sb.append("&gt;");
-            else if (c == '"')
-                sb.append("&#034;");
-            else if (c == '\'')
-                sb.append("&#039;");
-            else
-                sb.append(c);
-        }
-        return sb.toString();
+		StringBuffer sb = null;
+		int length = input.length();
+		int currentPos;
+
+		//Check if there are any special characters to escape
+		for (currentPos = 0; currentPos < length; currentPos++) {
+			char c = input.charAt(currentPos);
+			if (c == '&' || c == '<' || c == '>' || c == '"' || c == '\'') {
+				sb = new StringBuffer(length+5);
+				sb.append(input.substring(0,currentPos)); // copy over the string up until
we found this char
+				break;
+			}
+		}
+
+		// If we didn't create a new buffer, then the input string didn't need any
escaping so we win big time
+		// and we can just return the same string they gave us (no object creation,
no copying).
+		if ( sb == null )
+		{
+			System.out.println("Nothing found to escape. Returning.");
+			return input;
+		}
+		
+		System.out.println("Something found to escape. processing.");
+
+		// Oh well, we did have some escaping to do, so let's check the rest to see
if there are any more to do.
+		for (; currentPos < length;++currentPos) {
+			char c = input.charAt(currentPos);
+			if (c == '&')
+				sb.append("&amp;");
+			else if (c == '<')
+				sb.append("&lt;");
+			else if (c == '>')
+				sb.append("&gt;");
+			else if (c == '"')
+				sb.append("&#034;");
+			else if (c == '\'')
+				sb.append("&#039;");
+			else
+				sb.append(c);
+		}
+		return sb.toString();
     }  
     
     /**
Comment 1 Martin B.E. van Dijken 2004-02-10 12:17:17 UTC
Oops, forgot to remove my last printlns. Below a fresh patch:

Index: Util.java
===================================================================
RCS file:
/home/cvspublic/jakarta-taglibs/standard/src/org/apache/taglibs/standard/tag/common/core/Util.java,v
retrieving revision 1.12
diff -u -r1.12 Util.java
--- Util.java	13 Dec 2003 05:48:07 -0000	1.12
+++ Util.java	10 Feb 2004 12:16:08 -0000
@@ -152,23 +152,45 @@
      * See also OutSupport.out().
      */
     public static String escapeXml(String input) {
-        StringBuffer sb = new StringBuffer();
-        for (int i = 0; i < input.length(); i++) {
-            char c = input.charAt(i);
-            if (c == '&')
-                sb.append("&amp;");
-            else if (c == '<')
-                sb.append("&lt;");
-            else if (c == '>')
-                sb.append("&gt;");
-            else if (c == '"')
-                sb.append("&#034;");
-            else if (c == '\'')
-                sb.append("&#039;");
-            else
-                sb.append(c);
-        }
-        return sb.toString();
+		StringBuffer sb = null;
+		int length = input.length();
+		int currentPos;
+
+		//Check if there are any special characters to escape
+		for (currentPos = 0; currentPos < length; currentPos++) {
+			char c = input.charAt(currentPos);
+			if (c == '&' || c == '<' || c == '>' || c == '"' || c == '\'') {
+				sb = new StringBuffer(length+5);
+				sb.append(input.substring(0,currentPos)); // copy over the string up until
we found this char
+				break;
+			}
+		}
+
+		// If we didn't create a new buffer, then the input string didn't need any
escaping so we win big time
+		// and we can just return the same string they gave us (no object creation,
no copying).
+		if ( sb == null )
+		{
+			return input;
+		}
+		
+
+		// Oh well, we did have some escaping to do, so let's check the rest to see
if there are any more to do.
+		for (; currentPos < length;++currentPos) {
+			char c = input.charAt(currentPos);
+			if (c == '&')
+				sb.append("&amp;");
+			else if (c == '<')
+				sb.append("&lt;");
+			else if (c == '>')
+				sb.append("&gt;");
+			else if (c == '"')
+				sb.append("&#034;");
+			else if (c == '\'')
+				sb.append("&#039;");
+			else
+				sb.append(c);
+		}
+		return sb.toString();
     }  
     
     /**


Comment 2 Kris Schneider 2004-02-11 12:34:04 UTC
My feedback from taglibs-dev (http://marc.theaimsgroup.com/?t=107639893000003):

It looks like most of the speedup of the new version can be gained by just
making the old version take the input length into account (based on my own
limited benchmarking, YMMV):

public static String escapeXml(String input) {
  String output = input;
  if (input != null) {
    int length = input.length();
    StringBuffer sb = new StringBuffer(length);
    for (int i = 0; i < length; i++) {
      char c = input.charAt(i);
      if (c == '&') sb.append("&amp;");
      else if (c == '<') sb.append("&lt;");
      else if (c == '>') sb.append("&gt;");
      else if (c == '"') sb.append("&#034;");
      else if (c == '\'') sb.append("&#039;");
      else sb.append(c);
    }
    output = sb.toString();
  }
  return output;
}

I suppose you could also add a test for input.length() == 0. If the time savings
aren't really that significant, I'd opt for simpler, easier to read code. BTW,
any modification should account for input == null. It doesn't make sense to
throw an NPE.
Comment 3 Felipe Leme 2004-05-12 00:26:49 UTC
CC'ing the taglibs-dev address to all Standard bugs. 
Comment 4 Justyna Horwat 2004-06-11 01:44:08 UTC

*** This bug has been marked as a duplicate of 25382 ***