Bug 66513 - Primary Key Violation using PersistentManager + PersistentValves +
Summary: Primary Key Violation using PersistentManager + PersistentValves +
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Catalina
Version: 9.0.72
Hardware: PC All
Importance: P2 normal
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-06 18:00 UTC by Vincent Liautaud
Modified: 2023-06-01 17:12 UTC

Description Vincent Liautaud 2023-03-06 18:00:34 UTC
Hi all,
when using PersistentManager + org.apache.catalina.valves.PersistentValve to store session data in a database table (in order to run Tomcat in containers without sticky sessions), we sometimes get a "primary key constraint violation" because the threads involved are badly synchronized.

When looking at the source code, it appears that the usage of PersistentValve assumes we get only one request per session at any one time (and you propose a filter parameter on the PersistentValve component to avoid having several requests at the same time).
However, when this filter is not complete (so several requests with the same sessionId arrive at the same time), the code in org.apache.catalina.valves.PersistentValve as well as in org.apache.catalina.session.DataSourceStore uses a synchronized block to avoid this problem, but that block is ineffective because it locks on a session object that is different for each request.

You could solve this issue by using a static HashMap whose keys and values are the sessionId string, and synchronizing the blocks that store the session in the DB table on that shared sessionId object.
The process could be:
-If the sessionId string is not already stored as a key in the map, store it as both key and value.
-Then synchronize on that map entry:

synchronized (theHashMap.get(sessionId)) {
    // ... store the session in the DB table ...
    // remove the sessionId entry from the map when done
}

That way you would only synchronize threads that use the same sessionId (and no others) and would avoid those "primary key violation" errors.
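
For illustration, here is a minimal sketch of that idea (a sketch only, not actual Tomcat code; it uses a ConcurrentHashMap rather than a plain HashMap so the lookup itself is thread-safe, and the storeAction callback is a placeholder for the DELETE/INSERT code in the Store):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SessionIdLocks {

    // One lock object per session id currently being persisted.
    private static final ConcurrentMap<String, Object> LOCKS = new ConcurrentHashMap<>();

    // storeAction is a placeholder for the code that saves this session to the DB.
    static void saveWithLock(String sessionId, Runnable storeAction) {
        Object lock = LOCKS.computeIfAbsent(sessionId, id -> new Object());
        synchronized (lock) {
            try {
                storeAction.run();
            } finally {
                // Removing the entry keeps the map small, but a thread that already
                // fetched this lock object can still race with a later thread that
                // creates a fresh one; a complete fix would need reference counting.
                LOCKS.remove(sessionId);
            }
        }
    }
}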

Regards
Comment 1 Christopher Schultz 2023-03-07 13:19:53 UTC
The synchronization in PersistentManager and DataSourceStore/JDBCStore appears to be reasonable. The session itself is being used as the monitor and not an arbitrary object stored within the session, which might have been a problem.

I think your problem is not multiple simultaneous requests on a single node (which ought to be safe), but multiple simultaneous requests across more than one node.

When saving the session, DataSourceStore/JDBCStore will DELETE the existing session record and INSERT a new record in its place. This does not happen in a transaction and therefore it's possible for two nodes in the group to execute this series of queries:

Node A: DELETE FROM sessions...
Node B: DELETE FROM sessions...
Node A: INSERT INTO sessions...
Node B: INSERT INTO sessions...

This could be fixed with a transaction which surrounds the DELETE and subsequent INSERT.

It could also be fixed with, as a TODO in the code suggests, using an UPDATE if the record already exists. (IMHO this is much better for performance as fewer indexes, etc. would need to be updated when saving a session.)
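
As a rough illustration only (the tomcat_sessions table and its columns are invented for this sketch and are not the actual DataSourceStore schema), the transactional variant could look something like this:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransactionalSessionSave {

    // Hypothetical schema: tomcat_sessions(session_id PRIMARY KEY, data)
    static void saveSession(Connection conn, String sessionId, byte[] data) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);                   // start the transaction
        try (PreparedStatement del = conn.prepareStatement(
                     "DELETE FROM tomcat_sessions WHERE session_id = ?");
             PreparedStatement ins = conn.prepareStatement(
                     "INSERT INTO tomcat_sessions (session_id, data) VALUES (?, ?)")) {
            del.setString(1, sessionId);
            del.executeUpdate();
            ins.setString(1, sessionId);
            ins.setBytes(2, data);
            ins.executeUpdate();
            conn.commit();                           // DELETE and INSERT become visible together
        } catch (SQLException e) {
            conn.rollback();                         // undo the DELETE if the INSERT fails
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}

The UPDATE-if-exists alternative mentioned above would instead collapse the two statements into a single, database-specific upsert.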

But there is another problem: even if the database synchronization issues are resolved, you may find that you have replaced the problem with a race-condition within your own application. Let's say the following situation occurs (paraphrasing):

Node A: session.setAttribute("counts", session.getAttribute("counts") + 1);
Node B: session.setAttribute("counts", session.getAttribute("counts") + 1);
Node B: BEGIN ; DELETE ; INSERT
Node A: BEGIN ; DELETE ; INSERT

One of the increments has been lost -- the increment from node B. You can abstract this to *any* change to the session, since Tomcat saves the session all at once and not as individual records e.g. for each attribute. The same can be true if Node A adds a new attribute to the session and Node B adds a different one.

If the nodes ever disagree about the contents of the session, some session data will be lost. It is very difficult to completely prevent this kind of thing from happening given the architecture of the DataSourceStore/JDBCStore.

In light of this, do you think that either using a transaction or switching to an INSERT ... ON UPDATE type of behavior would resolve this problem for you in a satisfactory way? If your sessions are colliding in the db itself but you aren't worried about conflicting data, then either of these solutions will probably work for you.
Comment 2 Vincent Liautaud 2023-03-07 20:53:11 UTC
Hi Christopher,
I confirm that this issue happens when using only one node.
In my opinion, the issue comes from the session object used to synchronize the DELETE/INSERT SQL requests in DataSourceStore/JDBCStore. From what I can see, the session object is different for each request (the session attributes can be the same, but the objects themselves are different).

I made a test by replacing the "synchronized(session)" with a "synchronized(lock)", with a lock object defined like this: "private static final Object lock = new Object();"
=> It solves the issue (all "primary key constraint violation" errors disappear).
Of course this is not the right solution, because it synchronizes the code block for every request, even for requests using different sessionIds (which could cause performance problems).
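
For reference, the test change was roughly the following (a sketch of the workaround, not the exact DataSourceStore code):

// Sketch of the workaround tested above: one global lock serializes every
// session save, whatever its sessionId (hence the performance concern).
public class GlobalLockStore {

    private static final Object lock = new Object();

    void save(/* session */) {
        synchronized (lock) {
            // ... DELETE FROM sessions ... ; INSERT INTO sessions ...
        }
    }
}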

That is why I suggested locking on an object that would be shared by all requests using the same sessionId.

Finally, regarding the race condition you mention at the end of your comment: this is not something that should happen in our standard JEE application using sessionIds stored in cookies, as all the servlet requests using a specific sessionId should come from a single client/browser "in sequence" (no AJAX usage or equivalent). So we should not encounter situations where concurrent requests updating the session attributes for the same sessionId are handled in parallel by different nodes. The "primary key violation" issue we face comes from an incomplete filter (where some requests for static resources like gif/css/... are not filtered correctly).

Do you agree with that?

Regards
Comment 3 Christopher Schultz 2023-03-07 23:19:11 UTC
(In reply to Vincent Liautaud from comment #2)
> I confirm that this issue happens when using only one node.

Interesting.

> In my opinion, the issue comes from the session object used to synchronize
> the DELETE/INSERT SQL requests in DataSourceStore/JDBCStore. From what I can
> see, the session object is different for each request (the session attributes
> can be the same, but the objects themselves are different).

Aha, yes. I believe the PersistentValve causes the session to be re-loaded for each request. So that could result in separate Session objects in memory for a single session-id.

I think this would be very difficult to "fix" because the application will get whatever session is returned by the Manager at the time it is requested. Thread A may cause the session to be loaded from storage and handed to Request A only to have Thread B do the same thing immediately afterward.

You are right: they will get separate objects each time, and therefore synchronization is nearly impossible. I'm not sure what the best way to do this is, because it's very easy to make bad mistakes like synchronizing on the whole session store, which would kill performance.

> I made a test by replacing the "synchronized(session)" with a
> "synchronized(lock)", with a lock object defined like this: "private static
> final Object lock = new Object();"
> => It solves the issue (all "primary key constraint violation" errors disappear).
> Of course this is not the right solution, because it synchronizes the code
> block for every request, even for requests using different sessionIds (which
> could cause performance problems).
>
> That is why I suggested locking on an object that would be shared by all
> requests using the same sessionId.

This could be done with a separate map of locks-for-session-ids. Or maybe lock on the String object which holds the session-id as the key of the session-id-to-Session map. Though I don't think you can ask a Map for its key object(s) without iterating through the whole set of Map.Entry values.

String.intern?

> Finally, regarding the race condition you mention at the end of your comment:
> this is not something that should happen in our standard JEE application
> using sessionIds stored in cookies, as all the servlet requests using a
> specific sessionId should come from a single client/browser "in sequence"
> (no AJAX usage or equivalent). So we should not encounter situations where
> concurrent requests updating the session attributes for the same sessionId
> are handled in parallel by different nodes. The "primary key violation" issue
> we face comes from an incomplete filter (where some requests for static
> resources like gif/css/... are not filtered correctly).
>
> Do you agree with that?

Yes, I think so. Thanks for giving more details about your specific use-case. Specifically, you aren't trying to ensure bullet-proof session-management under high-frequency session-contention. This is mostly about the PersistentValve+PersistentManager+DataSourceStore/JDBCStore tripping over its own feet.

I think we can take this to the development mailing list for discussion, then come back here with a solution + fix.
Comment 4 Vincent Liautaud 2023-03-08 08:55:16 UTC
Great thanks...
Comment 5 Vincent Liautaud 2023-03-15 14:48:46 UTC
Hi all,
in addition to the fix, could you ask the development team:
1-To fix the following documentation regarding PersistentValve: https://tomcat.apache.org/tomcat-9.0-doc/config/valve.html
The documentation contains an example filter: filter=".*\.gif|.*\.js|.*\.jpeg|.*\.jpg|.*\.png|.*\.htm|.*\.html| .*\.css|.*\.txt"
 
This filter introduces a stray space character before the .*\.css!
For users (like me) who copy and paste this filter into their configuration file, CSS requests are not filtered at all.
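
For reference, the corrected value, with the stray space removed, would be:

filter=".*\.gif|.*\.js|.*\.jpeg|.*\.jpg|.*\.png|.*\.htm|.*\.html|.*\.css|.*\.txt"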

2-Could you ask the development team to add a debug trace to display all the requests that have not been filtered (see below)? That way we could verify the filtering and detect any gaps in it.

Regards and have a nice week. 


@Override
public void invoke(Request request, Response response) throws IOException, ServletException {

    // request without session
    if (isRequestWithoutSession(request.getDecodedRequestURI())) {
        getNext().invoke(request, response);
        return;
    }

    // TO BE ADDED TO DISPLAY ALL THE REQUEST URIs NOT FILTERED
    if (container.getLogger().isDebugEnabled()) {
        container.getLogger().debug("Request not filtered: " + request.getRequestURI());
    }
    // ... rest of invoke() unchanged ...
Comment 6 Christopher Schultz 2023-03-21 17:18:20 UTC
(In reply to Vincent Liautaud from comment #5)
> in addition to the fix, could you ask the development team:

Let's try to keep the discussion of this enhancement request on-topic with the original request. If you have other fixes, etc. either post them to the development mailing list or file separate bugs for them.

It's not okay for "Random changes Vincent wants" to be a perpetually-open bug in Bugzilla.
Comment 7 Remy Maucherat 2023-05-26 08:28:40 UTC
First, I'd like to add that this limitation is stated in the javadoc of PersistentValve: it requires a persistent manager, *and* "to work correctly it assumes only one request exists per session at any one time". So no big surprise then.

Actually, someone already wrote about this problem (i.e. the session object can change, so syncing on that object may fail to do what is expected): http://illegalargumentexception.blogspot.com/2008/04/java-synchronizing-on-transient-id.html
Unfortunately, it seems the solution proposed is not good since it syncs too much.
Similarly, String.intern should not be used for sync (explanations are available online).

Instead, I think we should use the original suggestion from Vincent: the PersistentManager (or, most likely, the StoreBase) should use a concurrent map to deduplicate the session ids. But there's still a problem: all the session ids that are in a store would now be in memory. Maybe that's not a problem these days, but it was certainly something that was avoided before.

So I cannot find a solution yet that fixes this while keeping everything as it was.
Comment 8 Vincent Liautaud 2023-05-26 09:13:47 UTC
Hi Remy,

thank you for your reply, and I understand the issue regarding the memory usage of this concurrent map.
Anyway, as explained before, we have been in a situation where multiple requests exist at the same time for the same session (which your documentation explicitly says to avoid, as you noticed) because:
1-The documentation contained an example filter: filter=".*\.gif|.*\.js|.*\.jpeg|.*\.jpg|.*\.png|.*\.htm|.*\.html| .*\.css|.*\.txt"
 
This filter introduced a stray space character before the .*\.css!
For users (like me) who had copied and pasted this filter into their configuration file, CSS requests were not filtered at all (so servlet + CSS requests generated concurrent requests for the same session). It seems you have corrected the documentation recently, so thanks, that problem is fixed.

2-There was no debug trace to display all the requests that have not been filtered. This kind of trace would be very useful to identify gaps in the filter (to check that no file type extension has been forgotten).

Regards
Comment 9 Mark Thomas 2023-05-30 16:55:00 UTC
This bug is heading towards being resolved as WONTFIX.

For a long explanation see https://markmail.org/message/5gtpirnouqfraw6w

The short version is that while we can construct a solution to ensure no more than one concurrent request per session on a single Tomcat node, we can't do it across multiple nodes.

Given we can't fix the multiple node issue we have to rely on the clients not issuing more than one concurrent request per session. If the clients break that guarantee, stuff will break.

Is the PersistentValve still useful in these circumstances?

Re-reading this issue, you mention both using this with containers plus non-sticky load-balancing and with a single node. What is the use case for a single node?

Is there any benefit to Tomcat guaranteeing no more than one concurrent request per session for a single node? If so, how and why is that a benefit for your use case?
Comment 10 Mark Thomas 2023-06-01 10:14:13 UTC
I have applied the single Tomcat node fix to:
- 11.0.x for 11.0.0-M7 onwards
- 10.1.x for 10.1.10 onwards
-  9.0.x for  9.0.76 onwards
-  8.5.x for  8.5.90 onwards

A final decision on deprecation & removal is pending answers to the questions in comment #9.
Comment 11 Vincent Liautaud 2023-06-01 13:37:05 UTC
Hi, 

let me try to give more explanation about the context and the issue we faced.
We are in the process of moving to the cloud (on Google Cloud Platform) some internal web applications that currently run on premise, on several Tomcat instances installed on different servers with a sticky-session load balancer in front of them.
Because of this sticky-session load balancer, we don't use PersistentValve on premise (session information is stored in the Tomcat JVM memory).
When moving to GCP, we chose to containerise those applications and run them on a Google managed service named Cloud Run. Cloud Run is a serverless solution that runs and scales containers (up and down) based on the volume of requests.
As with any solution that runs and scales containers, you can't fully control how the underlying platform (Cloud Run, K8s, ...) scales the containers up and down. That said, even if Google offers a solution that "looks like" a sticky-session load balancer, it can't guarantee 100% that all the requests of a given user will always target the same container (Google sometimes destroys containers because the volume of requests decreases, even if some users are still "connected" to them).
So this kind of containerised infrastructure needs a solution to centralise and share session information between different containers, BECAUSE you can't be 100% sure that all requests of a given session will always be handled by the same container.
That's why we used PersistentValve to solve this issue.
Now let's go back to the initial reason for raising this BZ ticket:
In our use case, PersistentValve fully solves our problem, because we have no concurrent requests on the same session (neither on a single node nor across multiple nodes).
The ticket was raised because of another little bug described above (and repeated below): when configuring PersistentValve we copied and pasted the part of the Tomcat documentation explaining how to implement a filter:
filter=".*\.gif|.*\.js|.*\.jpeg|.*\.jpg|.*\.png|.*\.htm|.*\.html| .*\.css|.*\.txt"
This filter introduced a stray space character before the .*\.css, resulting in several concurrent requests (servlet but also CSS, which was badly filtered). In this configuration, the way PersistentValve updates the session information (by sequencing DELETE and INSERT SQL requests) generated a "PRIMARY KEY VIOLATION" in the database. As the synchronized block was ineffective (for the reasons given in the initial comments), with two concurrent requests we were in a situation where:
-Request 1 (Servlet): DELETE session information
-Request 2 (CSS): DELETE session information
-Request 1 (Servlet): INSERT session information
-Request 2 (CSS): INSERT session information => DUPLICATE KEY ERROR
It took us some time to identify the problem with the filter (and the stray space), because no debug log can be used to identify which requests are handled by PersistentValve (servlet, CSS or something else). But since we solved this issue with the filter, it works fine. We don't have any more errors, and PersistentValve works very well, even in a containerised environment.

Conclusion: please don't deprecate this functionality...
And if possible, add some debug information about the kind of requests handled (it would help to identify concurrent requests)...
Thanks for everything.
Regards
Comment 12 Mark Thomas 2023-06-01 13:52:28 UTC
Thanks, that is really helpful. I'll add the debug logging. Given there is a valid use case, I don't plan to remove or deprecate the Valve.
Comment 13 Mark Thomas 2023-06-01 17:12:27 UTC
Debug logging added in:
- 11.0.x for 11.0.0-M7 onwards
- 10.1.x for 10.1.10 onwards
-  9.0.x for  9.0.76 onwards
-  8.5.x for  8.5.90 onwards