Bug 66120 - j_security_check returns 408 if j_security_check request lands on different tomcat server from original server
Summary: j_security_check returns 408 if j_security_check request lands on different tomcat server from original server
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 9
Classification: Unclassified
Component: Catalina
Version: 9.0.30
Hardware: PC All
Importance: P2 normal
Target Milestone: -----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-06-14 19:44 UTC by psakkanan
Modified: 2022-08-22 15:02 UTC

Description psakkanan 2022-06-14 19:44:24 UTC
Setup:
1. Two Tomcat instances, with session back-up via Memcached for failover.
2. FormAuthenticator used for authentication.

Scenario:
1. Render the login form from TC-instance-1.
2. Submit the login form request (j_security_check) to TC-instance-2 (to simulate a Tomcat fail-over, or a load balancer routing the request to the other instance for any reason).

Observation: 
TC-instance-2 returns 408 

Additional information:
From commit https://github.com/apache/tomcat/commit/fd381e94f222831fd2bee697deb6246d417b8f33 onwards, the form authenticator expects the session ID from a session note.
Session notes are transient, so they are not serialized and are not backed up by the backup manager. As a result the expected session data is null / treated as expired, which cascades into the 408 error.

With modern infrastructure failure is expected (e.g. pod/node eviction [Kubernetes HPA thrashing] or a load balancer's consistent hashing algorithm changing stickiness), so failover is more frequent.
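
To make that concrete, a rough Java sketch of the mechanism (not the actual FormAuthenticator code; the note name mirrors Constants.SESSION_ID_NOTE in org.apache.catalina.authenticator):

// Rough sketch only: why the FORM login state is lost when the follow-up
// request lands on a different instance. Uses the real Session note API,
// but is not the actual FormAuthenticator implementation.
import org.apache.catalina.Session;

public class FormAuthFailoverSketch {

    // When the login page is rendered, the authenticator remembers the
    // expected session ID (and the saved request) as session *notes*.
    static void rememberLoginState(Session session) {
        // Notes are kept in a transient map inside StandardSession, so they
        // are skipped when the session is serialized by a backup/clustered
        // Manager and are simply absent after deserialization on the other node.
        session.setNote("org.apache.catalina.authenticator.SESSION_ID", session.getId());
    }

    // On j_security_check, the authenticator reads the note back. On the node
    // that never stored it (TC-instance-2 here), this is null, the saved state
    // is considered gone, and the request ends in the 408 response.
    static boolean loginStateSurvived(Session session) {
        return session.getNote("org.apache.catalina.authenticator.SESSION_ID") != null;
    }
}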
Comment 1 Mark Thomas 2022-06-20 13:39:45 UTC
Do we want to support this? It would mean finding a way to serialize:
- the expected session ID (part of the CSRF protection)
- the saved request 

This looks to be doable, although it would take some effort to ensure that the serialization changes were done in a backwards compatible manner. We would also need to keep in mind that there may be further changes to the serialization format in the future.
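
For illustration only, a generic sketch of one way to add data to a serialized form while staying readable across versions (this is not Tomcat's actual session serialization code, and the class/field names are made up): write the new, optional fields after the existing ones and read them defensively, so an old reader ignores the tail and a new reader tolerates its absence.

import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OptionalDataException;
import java.io.Serializable;

// Hypothetical holder for the data discussed above; names are illustrative.
class SavedLoginState implements Serializable {
    private static final long serialVersionUID = 1L;

    String principalName;               // existing field, serialized as before
    transient String expectedSessionId; // excluded from default serialization,
                                        // written manually below

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // New data goes at the end, guarded by a presence flag.
        out.writeBoolean(expectedSessionId != null);
        if (expectedSessionId != null) {
            out.writeUTF(expectedSessionId);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        try {
            if (in.readBoolean()) {
                expectedSessionId = in.readUTF();
            }
        } catch (OptionalDataException | EOFException e) {
            // Stream written by an older version: the extra fields are absent.
            expectedSessionId = null;
        }
    }
}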
Comment 2 psakkanan 2022-06-20 16:52:57 UTC
Please remember that this issue shows up as random login failures (on Kubernetes or similar cloud infra), mostly in prod/deployed environments.
It took us weeks to narrow this down from random login errors to definitive, reproducible steps. This will be the case for others as well (the last thing anyone suspects is Tomcat).

So, considering the impact on user experience (credentials are stored in the browser) and the difficulty of reproducing/debugging it, I would like this to be fixed, and documented until it is fixed, so Tomcat users are aware of it and can save a bit of time.
Comment 3 Mark Thomas 2022-07-13 17:08:30 UTC
My current thinking is to make this behaviour optional, depending on the setting of the "persistAuthentication" attribute of the Manager.

If we do it that way, the change to the session serialization format can be handled in a backwards compatible manner.

If there are no objections, I intend to implement this in time for the August release round.
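
For reference, a sketch of what opting in could look like. The persistAuthentication attribute already exists on the Manager (today it controls persisting the authenticated Principal); in a normal install it is set on the <Manager> element in context.xml, but embedded Tomcat is used here to keep the example self-contained. The FORM authentication setup itself is omitted.

import java.nio.file.Files;

import org.apache.catalina.Context;
import org.apache.catalina.session.StandardManager;
import org.apache.catalina.startup.Tomcat;

public class PersistAuthConfigSketch {
    public static void main(String[] args) throws Exception {
        Tomcat tomcat = new Tomcat();
        tomcat.setPort(8080);
        tomcat.getConnector(); // create the default connector

        // Empty doc base just for the sketch; a real app would point at its webapp.
        Context ctx = tomcat.addContext("", Files.createTempDirectory("app").toString());

        // Equivalent to <Manager ... persistAuthentication="true"/> in context.xml.
        StandardManager manager = new StandardManager();
        manager.setPersistAuthentication(true);
        ctx.setManager(manager);

        tomcat.start();
        tomcat.getServer().await();
    }
}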
Comment 4 psakkanan 2022-07-14 18:25:49 UTC
I like making it optional.
Comment 5 Mark Thomas 2022-08-21 14:43:04 UTC
Having started work on this, it is more complex than it first appears.

The main reason is needing to make sure a cluster can perform a rolling upgrade. Getting this to work in a Tribes based cluster requires creating a new message type. Currently, receiving an unknown message type triggers an error. This means users need to upgrade in two stages: first to a version that understands (or at least doesn't trigger an error for) the new message type, and second to a version that uses the new message type to transfer the session note.

We may decide to wait more than one release between implementing the first stage and the second.
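
To illustrate the rolling upgrade constraint with a generic sketch (these are not Tomcat/Tribes types or APIs; the names are made up for the example):

// Illustrative only.
enum MsgType { SESSION_DELTA, SESSION_NOTE } // SESSION_NOTE = the hypothetical new type

class StrictReceiver {
    void onMessage(MsgType type) {
        switch (type) {
            case SESSION_DELTA:
                // apply the delta as today
                break;
            default:
                // Current behaviour: anything unknown is an error, so a newer
                // sender in the same cluster breaks this node.
                throw new IllegalArgumentException("Unknown message type: " + type);
        }
    }
}

class TolerantReceiver {
    void onMessage(MsgType type) {
        switch (type) {
            case SESSION_DELTA:
                // apply the delta as today
                break;
            case SESSION_NOTE:
                // Stage two: actually transfer the login state / session note.
                break;
            default:
                // Stage one: at least don't error on the new type, so this node
                // can coexist with newer senders during a rolling upgrade.
                break;
        }
    }
}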

The fix for non-cluster managers (which is what the original request was for) looks to be simpler and should be possible to provide in the next release.
Comment 6 Mark Thomas 2022-08-22 14:02:23 UTC
Slight change of plan.

10.1.x will just fix this (the 10.1.x releases are still milestones).

10.0.x and earlier will also fix this, but sending the new messages will be controlled by configuration and disabled by default. Upgrading from a version before this fix to using this fix would require two stages: 1) upgrade all nodes to a version with the fix; 2) enable the configuration option to send the new messages on each node.
Comment 7 Mark Thomas 2022-08-22 15:02:18 UTC
Fixed in:
- 10.1.x for 10.1.0-M18 onwards
- 10.0.x for 10.0.24 onwards
-  9.0.x for  9.0.66 onwards
-  8.5.x for  8.5.83 onwards