Bug 64415

Summary: Error in slow query report - Comparison method violates its general contract!
Product: Tomcat Modules Reporter: spiffy2man
Component: jdbc-poolAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED DUPLICATE    
Severity: normal CC: c.webb
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: HP   
OS: Linux   

Description spiffy2man 2020-05-07 15:56:43 UTC
Java Version: 11
Tomcat is running Spring Boot 2.2.4

I am getting the issue below in our production environment. I am unable to reproduce it on my local machine, most likely due to longevity and the number of calls required per second to reproduce the issue. Production receives millions of requests per day and the DB for this project does have connectivity/performance issues at times, which is causing the slow queries. I have seen other bugs for this issue, but they show that this was already fixed in a previous version; however, it is still showing in my logs.

Here is the stack trace for this issue:

Unable to create statement proxy for slow query report.
java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.base/java.util.TimSort.mergeLo(Unknown Source) ~[na:na]
	at java.base/java.util.TimSort.mergeCollapse(Unknown Source) ~[na:na]
	at org.apache.tomcat.jdbc.pool.interceptor.SlowQueryReport.removeOldest(SlowQueryReport.java:222) ~[tomcat-jdbc-9.0.30.jar:na]
	at org.apache.tomcat.jdbc.pool.interceptor.SlowQueryReport.getQueryStats(SlowQueryReport.java:209) ~[tomcat-jdbc-9.0.30.jar:na]
	at org.apache.tomcat.jdbc.pool.interceptor.SlowQueryReport.prepareStatement(SlowQueryReport.java:152) ~[tomcat-jdbc-9.0.30.jar:na]
	at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:77) ~[tomcat-jdbc-9.0.30.jar:na]
	at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:75) ~[tomcat-jdbc-9.0.30.jar:na]
	at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108) ~[tomcat-jdbc-9.0.30.jar:na]
	at org.apache.tomcat.jdbc.pool.TrapException.invoke(TrapException.java:40) ~[tomcat-jdbc-9.0.30.jar:na]
	at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:108) ~[tomcat-jdbc-9.0.30.jar:na]
	at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:75) ~[tomcat-jdbc-9.0.30.jar:na]
	at com.sun.proxy.$Proxy140.prepareStatement(Unknown Source) ~[na:na]
	at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$5.doPrepare(StatementPreparerImpl.java:149) ~[hibernate-core-5.4.10.Final.jar:5.4.10.Final]
	at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:176) ~[hibernate-core-5.4.10.Final.jar:5.4.10.Final]
	at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:151) ~[hibernate-core-5.4.10.Final.jar:5.4.10.Final]
...
Comment 1 Michael Osipov 2020-05-07 16:15:57 UTC
I have seen this contract violations once in a while too, but where able to reproduce it.
Comment 2 Christopher Schultz 2020-05-07 16:16:51 UTC
Which version of tomcat-pool is in use?

I'll bet the problem is that the QueryStats are changing while the slow-query report is running.

SlowQueryReport.java:222 is a call to Collections.sort() over an ArrayList<QueryStats>. The sort uses a QueryStatsComparator object to sort, and that comparator uses the "last invocation" of each query to provide the ordering.

If that "last invocation" changes during the sorting, weird things can happen.

My guess is that because Java 11 introduced the new TimSort which is a little pickier about changing conditions during the sorting, you are seeing errors which were previously simply ignored by the older merge-sort algorithm.

I'm not super-familiar with the operation of the SlowQueryReport. How often is SlowQueryReport.removeOldest called? If it's "often", then this operation is particularly heavy and probably ought to be optimized a bit.

I don't see an option to select the older merge-sort in the API, so the only quick fix would be to NOT sort, which makes the current behavior (remove the oldest N entries, where N may not = 1) difficult to reproduce.

Another quick fix (in terms of programming effort) would be to clone every QueryStats object in the list before sorting it. Obviously, there would be a performance hit for doing that.
Comment 3 Christopher Schultz 2020-05-07 16:30:07 UTC
Here is a patch that has been only compile-tested.

It introduces a small class to represent the important information used to make decisions during removeOldest() and creates a (fairly) stable collection to sort.

===== CUT =====

diff --git a/modules/jdbc-pool/src/main/java/org/apache/tomcat/jdbc/pool/interceptor/SlowQueryReport.java b/modules/jdbc-pool/src/main/java/org/apache/tomcat/jdbc/pool/interceptor/SlowQueryReport.java
index b6c207d..7657ab5 100644
--- a/modules/jdbc-pool/src/main/java/org/apache/tomcat/jdbc/pool/interceptor/SlowQueryReport.java
+++ b/modules/jdbc-pool/src/main/java/org/apache/tomcat/jdbc/pool/interceptor/SlowQueryReport.java
@@ -213,16 +213,44 @@
         return qs;
     }
 
+    static class MiniQueryStats {
+        public final String query;
+        public final long lastInvocation;
+
+        public MiniQueryStats(String query, long lastInvocation) {
+            this.query = query;
+            this.lastInvocation = lastInvocation;
+        }
+    }
+
+    static class MiniQueryStatsComparator implements Comparator<MiniQueryStats>
+    {
+        @Override
+        public int compare(MiniQueryStats stats1, MiniQueryStats stats2) {
+            return Long.compare(handleZero(stats1.lastInvocation),
+                    handleZero(stats2.lastInvocation));
+        }
+
+        private static long handleZero(long value) {
+            return value == 0 ? Long.MAX_VALUE : value;
+        }
+    }
+
+    private MiniQueryStatsComparator miniQueryStatsComparator = new MiniQueryStatsComparator();
+
     /**
      * Sort QueryStats by last invocation time
      * @param queries The queries map
      */
     protected void removeOldest(ConcurrentHashMap<String,QueryStats> queries) {
-        ArrayList<QueryStats> list = new ArrayList<>(queries.values());
-        Collections.sort(list, queryStatsComparator);
+        ArrayList<MiniQueryStats> list = new ArrayList<>(queries.size());
+        for(QueryStats stats : queries.values())
+            list.add(new MiniQueryStats(stats.query, stats.lastInvocation));
+
+        Collections.sort(list, miniQueryStatsComparator);
         int removeIndex = 0;
         while (queries.size() > maxQueries) {
-            String sql = list.get(removeIndex).getQuery();
+            String sql = list.get(removeIndex).query;
             queries.remove(sql);
             if (log.isDebugEnabled()) log.debug("Removing slow query, capacity reached:"+sql);
             removeIndex++;


===== CUT =====

There are a few race-conditions still present:

1. If the Map.values changes during the execution of the copy-loop, it's possible that QueryStats objects could be counted more than once, or not at all. I'm not an expert at how collections behave under high contention.

2. If the snapshot of the stats-by-query-string schedules the removal of an "old" query, but the query is executed after the snapshot was taken, then the query will be removed from the stats pool which may be surprising. One way to get around this is to check that miniQueryStats.lastInvocation == queryStats.lastInvocation and skip that item if it's been run recently.

Possibly more.
Comment 4 Mark Thomas 2020-05-07 16:30:32 UTC
Correct project
Comment 5 Christopher Schultz 2020-05-07 16:37:19 UTC
An updated patch which avoids one of the race-conditions identified in the previous patch. Comments welcome. Only compile-tested at this point.

diff --git a/modules/jdbc-pool/src/main/java/org/apache/tomcat/jdbc/pool/interceptor/SlowQueryReport.java b/modules/jdbc-pool/src/main/java/org/apache/tomcat/jdbc/pool/interceptor/SlowQueryReport.java
index b6c207d..2bf5c74 100644
--- a/modules/jdbc-pool/src/main/java/org/apache/tomcat/jdbc/pool/interceptor/SlowQueryReport.java
+++ b/modules/jdbc-pool/src/main/java/org/apache/tomcat/jdbc/pool/interceptor/SlowQueryReport.java
@@ -213,18 +213,55 @@
         return qs;
     }
 
+    static class MiniQueryStats {
+        public final QueryStats queryStats;
+        public final long lastInvocation;
+
+        public MiniQueryStats(QueryStats queryStats) {
+            this.queryStats = queryStats;
+            this.lastInvocation = queryStats.lastInvocation;
+        }
+    }
+
+    static class MiniQueryStatsComparator implements Comparator<MiniQueryStats>
+    {
+        @Override
+        public int compare(MiniQueryStats stats1, MiniQueryStats stats2) {
+            return Long.compare(handleZero(stats1.lastInvocation),
+                    handleZero(stats2.lastInvocation));
+        }
+
+        private static long handleZero(long value) {
+            return value == 0 ? Long.MAX_VALUE : value;
+        }
+    }
+
+    private MiniQueryStatsComparator miniQueryStatsComparator = new MiniQueryStatsComparator();
+
     /**
      * Sort QueryStats by last invocation time
      * @param queries The queries map
      */
     protected void removeOldest(ConcurrentHashMap<String,QueryStats> queries) {
-        ArrayList<QueryStats> list = new ArrayList<>(queries.values());
-        Collections.sort(list, queryStatsComparator);
+        // Make a defensive deep-copy of the query stats list to prevent
+        // concurrent changes to the lastModified member during list-sort
+        ArrayList<MiniQueryStats> list = new ArrayList<>(queries.size());
+        for(QueryStats stats : queries.values())
+            list.add(new MiniQueryStats(stats));
+
+        Collections.sort(list, miniQueryStatsComparator);
         int removeIndex = 0;
         while (queries.size() > maxQueries) {
-            String sql = list.get(removeIndex).getQuery();
-            queries.remove(sql);
-            if (log.isDebugEnabled()) log.debug("Removing slow query, capacity reached:"+sql);
+            MiniQueryStats mqs = list.get(removeIndex);
+            // Check to see if the lastInvocation has been updated since we
+            // took our snapshot. If the timestamps disagree, it means
+            // that this item is no longer the oldest (and it likely now
+            // one of the newest).
+            if(mqs.lastInvocation == mqs.queryStats.lastInvocation) {
+                String sql = mqs.queryStats.query;
+                queries.remove(sql);
+                if (log.isDebugEnabled()) log.debug("Removing slow query, capacity reached:"+sql);
+            }
             removeIndex++;
         }
     }
Comment 6 spiffy2man 2020-05-07 18:25:41 UTC
Here are the jdbc interceptor configurations that we are using for the slow query report. I meant to attach them earlier, but I couldn't find them in our code.

SlowQueryReport(threshold=120000,maxQueries=1000,logSlow=true,logFailed=true)
Comment 7 Remy Maucherat 2020-05-07 19:33:51 UTC
Almost certain Chris is right given what the verification code in the JVM does. Isn't the solution a bit costly maybe ?
Comment 8 Christopher Schultz 2020-05-07 21:57:37 UTC
The only other solution I can see is to scan the map values looking for the "oldest" one and evict a maximum of one per scan. Make sure not to evict anything which has changed since the beginning of the scan. Keep scanning until you have a small enough map.

Pseudocode:

while(queries.size() > maxQueries) {
  QueryStats oldest = null;
  long timestamp = Long.MAX_VALUE;
  for(QueryStats qs : queries.values()) {
    if(qs.lastInvocation < timestamp) {
      timestamp = qs.lastInvocation;
      oldest = qs;
    }
  }

  if(null != qs && timestamp == qs.lastInvocation) {
    queries.remove(qs.query);
  }
}

So you replace a List-creation, possible 1000 temporary objects (the MiniQueryStats), and a trip through Collections.sort() with zero object-creations but a bunch of iterations over the collection.

It's probably more efficient to have multiple iterations and not do any of hat object-create stuff.

I'm guessing that the common case is 1-2 evictions. You don't need to actually *sort* the whole 1000-item list just to find the one with the oldest 'lastInvocation'.

If you really wanted to instrument the process and see how many evictions are common, you could even "unroll" the loop a bit by storing not just a single "oldest" and "timestamp" but *N* oldest/last-timestamp pairs and then evict them all at once.
Comment 9 Christopher Schultz 2020-05-07 22:11:37 UTC
Getting down to algorithmic complexity, the TimSort is O(n log n) -- same as merge sort. But again, we don't actually need a completely-sorted array. We just need the oldest N items.

If you do a single scan of the array, that O(n) of course, so N scans of the array will be N * O(n). Note N != n. N is the number of evictions you want to do, and "n" is the number of elements in the list.

So:
sort+evict = O(n log n)
multiple scans = N * O(n)

Given that N should be relatively small, I think multiple scans wins the complexity battle.
Comment 10 Christopher Schultz 2020-05-13 14:45:41 UTC

*** This bug has been marked as a duplicate of bug 58489 ***