Bug 68558 - Redundant calls to ByteChunk.toString()
Summary: Redundant calls to ByteChunk.toString()
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 9.0.81
Hardware: All All
: P2 normal (vote)
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-29 17:26 UTC by John Engebretson
Modified: 2024-03-06 21:48 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description John Engebretson 2024-01-29 17:26:32 UTC
Our high-volume, latency-sensitive application spends approximately 0.16% of cpu executing org.apache.tomcat.util.buf.ByteChunk.toString. We see only two direct callers: org.apache.tomcat.util.buf.MessageBytes.toString() and org.apache.tomcat.util.http.Parameters.processParameters().  Aggregate cost is split evenly between the two.

Call data shows many different routes trigger both these paths. Examination of the specific strings being looked up shows:
1. 40% Redundant parsing of key fields on org.apache.catalina.connector.Request: method, queryString, requestURI, headerNames, content type.
2. 10% Redundant lookup/iteration of header names
3. 50% Redundant parsing of request parameters, triggered from multiple paths within the JSP (see stack traces below).

Key portions of the stack traces are included below.

This code can be substantially optimized by locally caching the translated fields.  Local tests of modifications to the simplest case (Request.getMethod()) confirm improvement.
 
 
 
org.apache.tomcat.util.buf.StringCache.toString
org.apache.tomcat.util.buf.ByteChunk.toString
org.apache.tomcat.util.buf.MessageBytes.toString
org.apache.catalina.connector.Request.getMethod
org.apache.catalina.connector.RequestFacade.getMethod
javax.servlet.http.HttpServletRequestWrapper.getMethod
javax.servlet.http.HttpServletRequestWrapper.getMethod
org.apache.jsp.WEB_002dINF.<proprietary>.common_jsp._jspService

org.apache.tomcat.util.buf.StringCache.toString
org.apache.tomcat.util.buf.ByteChunk.toString
org.apache.tomcat.util.http.Parameters.processParameters
org.apache.tomcat.util.http.Parameters.processParameters
org.apache.tomcat.util.http.Parameters.handleQueryParameters
org.apache.catalina.core.ApplicationHttpRequest.mergeParameters
org.apache.catalina.core.ApplicationHttpRequest.parseParameters
org.apache.catalina.core.ApplicationHttpRequest.getParameter
javax.servlet.ServletRequestWrapper.getParameter
javax.servlet.ServletRequestWrapper.getParameter
javax.servlet.jsp.el.ImplicitObjectELResolver$ScopeManager$7.getAttribute
javax.servlet.jsp.el.ImplicitObjectELResolver$ScopeManager$7.getAttribute
javax.servlet.jsp.el.ImplicitObjectELResolver$ScopeMap.get
javax.el.MapELResolver.getValue
org.apache.jasper.el.JasperELResolver.getValue
org.apache.el.parser.AstValue.getValue
org.apache.el.ValueExpressionImpl.getValue
org.apache.jasper.el.JspValueExpression.getValue
org.apache.jsp.WEB_002dINF.<proprietary>_jsp._jspx_meth_c_005fset_005f1

Ref: <c:set var = "featureName" value= "${ param.featureName }"/> 


org.apache.tomcat.util.buf.StringCache.toString
org.apache.tomcat.util.buf.ByteChunk.toString
org.apache.tomcat.util.http.Parameters.processParameters
org.apache.tomcat.util.http.Parameters.processParameters
org.apache.tomcat.util.http.Parameters.handleQueryParameters
org.apache.catalina.connector.Request.parseParameters
org.apache.catalina.connector.Request.getParameterNames
org.apache.catalina.connector.Request.getParameterMap
org.apache.catalina.connector.RequestFacade.getParameterMap
<proprietary>Filter.doFilter()
Comment 1 Mark Thomas 2024-02-06 07:47:27 UTC
Are you sure about the request parameters being parsed multiple times? They should only be parsed once per request.

Is org.apache.tomcat.util.buf.StringCache not caching the Strings or is just that local caching has much lower latency (which wouldn't surprise me)?

Caching the String in ByteChunk.toString() is an option. I did look at that when I last worked on this code but it raises all sorts of complications around cache invalidation as the byte array is directly accessible.
Comment 2 John Engebretson 2024-02-06 13:55:55 UTC
> Are you sure about the request parameters being parsed multiple times? They should only be parsed once per request.
Yes - this appears to be triggered when an extra param is added by a JSP include, for example:

<jsp:include page="/myExample.jsp" >
 <jsp:param name="type" value="image" />
 <jsp:param name="index" value="${status.index}" />
</jsp:include>

ApplicationDispatcher.wrapRequest is invoked and creates a new ApplicationHttpRequest with un-parsed parameters.  If the included JSP ever checks the parameters, the full set of parameters is parsed: the original plus any new ones added by the <jsp:include>.  Nested JSP includes can trigger reparsing once per nest.

Relevant stack trace:
org.apache.catalina.core.ApplicationDispatcher.org.apache.catalina.core.ApplicationDispatcher.doInclude
org.apache.catalina.core.ApplicationDispatcher.doInclude
org.apache.catalina.core.ApplicationDispatcher.include
org.apache.jasper.runtime.JspRuntimeLibrary.include


> Is org.apache.tomcat.util.buf.StringCache not caching the Strings or is just that local caching has much lower latency (which wouldn't surprise me)?

Lower latency.  70% of StringCache.toString() is spent in org.apache.tomcat.util.buf.StringCache.findClosest() which does appear to function as intended.

> Caching the String in ByteChunk.toString() is an option. I did look at that when I last worked on this code but it raises all sorts of complications around cache invalidation as the byte array is directly accessible.

Makes sense.  :(  This may be easier for org.apache.coyote.Request, where a reference to MessageBytes could be replaced by a reference to String (possibly lazy-loaded).
Comment 3 Mark Thomas 2024-02-12 15:07:13 UTC
Thanks for the additional info. I'll try and address each part in turn.

1. Caching the key fields is low cost as it is just a reference to an existing String. I think that is a better solution that trying to cache in MessageBytes/ByteChunk/CharChunk due to the cache invalidation complexities. I think we can do this quickly.

2. Could you provide examples. If there is any low-hanging fruit similar to1 that should be easy to address.

3. Ah. Got it. That is going to be a little trickier to address. I suspect some refactoring will be required as we have to take account of the parameter merging  rules. Not sure how easy that will be.

My plan at this stage is to address the low hanging fruit and then switch this to an enhancement (strictly this is all an enhancement) for the parameter refactoring.
Comment 4 Mark Thomas 2024-02-12 18:11:38 UTC
A quick update:

1.
getMethod() is already effectively cached due to use of MessageBytes.toStringType()

getQueryString() could be cached but there is a potential performance issue. If applications parse the query string manually or use getParameter() and friends all is well. If an application uses both, using MessageBytes.toStringType() could trigger bytes -> String -> bytes if getQueryString() is called before getParameter() and friends. I'm minded to leave this one alone for now.

requestURI(), contentType() and headerNames can all switch to MessageBytes.toStringType(). I'll do that shortly.

2. Waiting for examples.

3. Cannot reproduce. I've reviewed the code and manually debugged an included JSP. The original query string is parsed once and the additional query string is parsed once.
Comment 5 Mark Thomas 2024-02-12 20:33:15 UTC
Simple things fixed in:
- 11.0.x for 11.0.0-M17 onwards
- 10.1.x for 10.1.19 onwards
-  9.0.x for  9.0.86 onwards
-  8.5.x for  8.5.99 onwards

Switching to NEEDINFO for the remaining issues.
Comment 6 John Engebretson 2024-02-12 21:05:00 UTC
Thank you for the quick turnaround!  Additional data:

Referring to #1:
> getMethod() is already effectively cached due to use of 
> MessageBytes.toStringType()

Our data is clear that this is re-parsing each time.  The call occurs at the beginning of each JSP at the very beginning of the generated _jspService() method:

final java.lang.String _jspx_method = request.getMethod();


Referring to #2: the code triggering this is below.  I can't clearly determine whether the slow enum comes from request.getHeaders() or request.getHeaderNames().

HttpServletRequest request = context.getRequest();
Enumeration<String> nameEnum = request.getHeaderNames();
while (nameEnum.hasMoreElements()) {
  String name = nameEnum.nextElement();
  List<String> headerValues = new LinkedList<String>();

  Enumeration<String> valueEnum = request.getHeaders(name);
  while (valueEnum.hasMoreElements()) {
    headerValues.add(valueEnum.nextElement());
  }

  headerMap.put(name, Collections.unmodifiableList(headerValues));
}

Referring to #3:
> Cannot reproduce.
I don't immediately see another trigger for this case, but will keep looking.  The behavior occurs throughout our app but I'll also compare with some others.
Comment 7 John Engebretson 2024-02-12 21:05:20 UTC
Updating status
Comment 8 Mark Thomas 2024-02-12 23:32:39 UTC
The caching for getMethod() isn't present in 9.0.81. It is present from 9.0.83 onwards.

I've added caching of the header values as well as the names.
Comment 9 John Engebretson 2024-02-13 17:10:32 UTC
Regarding #1: apologies, I was able to track down an invokeinterface that confused our primary tool.  Specifically, a class calling HttpServletRequest.getMethod() on an instance of HttpServletRequestWrapper that could wrap: a) an instance of org.apache.catalina.connector.Request, b) another HttpServletRequestWrapper, or c) one of multiple custom subclasses.

I agree that the value of Request.getMethod() is cached on MessageBytes and I'll fix the issue for our application by implementing local caching on a custom subclass.

Regarding #3: I suspect the same problem as #1, and am certain that the same custom class can be modified to solve this problem.

No Tomcat issues beyond what you've already fixed. Sorry for the noise!
Comment 10 Mark Thomas 2024-02-13 17:20:14 UTC
Thanks for the update. Marking this as fixed based on that update.
Comment 11 John Engebretson 2024-03-06 21:48:48 UTC
These changes reached prod and eliminated the respective profiles.