Bug 63550

Summary: LDAP non standard port leads to JNDIRealm erratic behaviour
Product: Tomcat 9 Reporter: Eugène Adell <eugene.adell>
Component: CatalinaAssignee: Tomcat Developers Mailing List <dev>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: 9.0.x   
Target Milestone: -----   
Hardware: All   
OS: All   

Description Eugène Adell 2019-07-07 07:23:04 UTC
When configuring a JNDIRealm with a non default port, and without alternate URL :

   <Realm className="org.apache.catalina.realm.JNDIRealm" debug="10"
    useStartTls="true"
    connectionURL="ldap://X.X.X.X:1389"
    userPattern="cn={0},ou=users,dc=mycorp,dc=com"
    roleBase="ou=groups,dc=mycorp,dc=com"
    roleSubtree="true"
    roleNested="true"
    roleName="cn"
    roleSearch="(uniqueMember={0})" />

We can see Tomcat still trying to open connections to the default port (389), moreover on localhost although the realm is configured with a non localhost IP.

Here is the network capture showing this behaviour, the two last packets are themselves the problem :

01:23:26.672885 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [S], seq 1196755961, win 43690, options [mss 65495,sackOK,TS val 2631880516 ecr 0,nop,wscale 7], length 0
01:23:26.672966 IP X.X.X.X.1389 > X.X.X.X.11486: Flags [S.], seq 4093902768, ack 1196755962, win 43690, options [mss 65495,sackOK,TS val 2631880516 ecr 2631880516,nop,wscale 7], length 0
01:23:26.673035 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [.], ack 1, win 342, options [nop,nop,TS val 2631880516 ecr 2631880516], length 0
01:23:26.680284 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [P.], seq 1:61, ack 1, win 342, options [nop,nop,TS val 2631880524 ecr 2631880516], length 60
01:23:26.680319 IP X.X.X.X.1389 > X.X.X.X.11486: Flags [.], ack 61, win 342, options [nop,nop,TS val 2631880524 ecr 2631880524], length 0
01:23:26.680614 IP X.X.X.X.1389 > X.X.X.X.11486: Flags [P.], seq 1:15, ack 61, win 342, options [nop,nop,TS val 2631880524 ecr 2631880524], length 14
01:23:26.680814 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [.], ack 15, win 342, options [nop,nop,TS val 2631880524 ecr 2631880524], length 0
01:23:26.957182 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [P.], seq 61:233, ack 15, win 342, options [nop,nop,TS val 2631880800 ecr 2631880524], length 172
01:23:26.959576 IP X.X.X.X.1389 > X.X.X.X.11486: Flags [P.], seq 15:1010, ack 233, win 350, options [nop,nop,TS val 2631880803 ecr 2631880800], length 995
01:23:26.959748 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [.], ack 1010, win 357, options [nop,nop,TS val 2631880803 ecr 2631880803], length 0
01:23:27.073508 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [P.], seq 233:500, ack 1010, win 357, options [nop,nop,TS val 2631880917 ecr 2631880803], length 267
01:23:27.113251 IP X.X.X.X.1389 > X.X.X.X.11486: Flags [.], ack 500, win 359, options [nop,nop,TS val 2631880957 ecr 2631880917], length 0
01:23:27.113291 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [P.], seq 500:506, ack 1010, win 357, options [nop,nop,TS val 2631880957 ecr 2631880957], length 6
01:23:27.113305 IP X.X.X.X.1389 > X.X.X.X.11486: Flags [.], ack 506, win 359, options [nop,nop,TS val 2631880957 ecr 2631880957], length 0
01:23:27.128492 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [P.], seq 506:591, ack 1010, win 357, options [nop,nop,TS val 2631880972 ecr 2631880957], length 85
01:23:27.128512 IP X.X.X.X.1389 > X.X.X.X.11486: Flags [.], ack 591, win 359, options [nop,nop,TS val 2631880972 ecr 2631880972], length 0
01:23:27.128927 IP X.X.X.X.1389 > X.X.X.X.11486: Flags [P.], seq 1010:1101, ack 591, win 359, options [nop,nop,TS val 2631880972 ecr 2631880972], length 91
01:23:27.129081 IP X.X.X.X.11486 > X.X.X.X.1389: Flags [.], ack 1101, win 357, options [nop,nop,TS val 2631880972 ecr 2631880972], length 0
01:23:27.141642 IP 127.0.0.1.58499 > 127.0.0.1.389: Flags [S], seq 2317515753, win 43690, options [mss 65495,sackOK,TS val 2631880985 ecr 0,nop,wscale 7], length 0
01:23:27.141675 IP 127.0.0.1.389 > 127.0.0.1.58499: Flags [R.], seq 0, ack 2317515754, win 0, length 0

Tomcat logs the corresponding connection reset like this

Jul 06, 2019 8:27:17 PM org.apache.catalina.realm.JNDIRealm authenticate
SEVERE: Exception performing authentication
javax.naming.CommunicationException: localhost:389 [Root exception is java.net.ConnectException: Connection refused (Connection refused)]
  at com.sun.jndi.ldap.Connection.<init>(Connection.java:216)
  at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:137)
  at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1614)
  at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2746)
  at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:319)
  at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:70)
  at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
  at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313)
  at javax.naming.InitialContext.init(InitialContext.java:244)
  at javax.naming.ldap.InitialLdapContext.<init>(InitialLdapContext.java:154)
  at org.apache.catalina.realm.JNDIRealm.createTlsDirContext(JNDIRealm.java:2585)
  at org.apache.catalina.realm.JNDIRealm.createDirContext(JNDIRealm.java:2487)
  at org.apache.catalina.realm.JNDIRealm.open(JNDIRealm.java:2471)
  at org.apache.catalina.realm.JNDIRealm.authenticate(JNDIRealm.java:1322)
  at org.apache.catalina.authenticator.BasicAuthenticator.authenticate(BasicAuthenticator.java:127)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:566)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:445)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1137)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:589)
  at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at com.sun.jndi.ldap.Connection.createSocket(Connection.java:350)
  at com.sun.jndi.ldap.Connection.<init>(Connection.java:203)
  ... 28 more

None of my config files mention port 389, it's coming from nowhere.

I tested both Tomcat 7 and 9 and got the the same result.
Comment 1 Eugène Adell 2019-07-07 07:38:48 UTC
JNDIRealm.java :

        if (connectionURL != null && connectionAttempt == 0)
            env.put(Context.PROVIDER_URL, connectionURL);
        else if (alternateURL != null && connectionAttempt > 0)
            env.put(Context.PROVIDER_URL, alternateURL);

This code doesn't cover the case when alternateURL == null && connectionAttempt == 1

It doesn't seem difficult to provide a patch but it's necessary to know the expected behaviour in such case. Probably retry a connection with the connectionURL parameters ?
Comment 2 Michael Osipov 2019-07-07 08:17:15 UTC
(In reply to Eugène Adell from comment #1)
> JNDIRealm.java :
> 
>         if (connectionURL != null && connectionAttempt == 0)
>             env.put(Context.PROVIDER_URL, connectionURL);
>         else if (alternateURL != null && connectionAttempt > 0)
>             env.put(Context.PROVIDER_URL, alternateURL);
> 
> This code doesn't cover the case when alternateURL == null &&
> connectionAttempt == 1
> 
> It doesn't seem difficult to provide a patch but it's necessary to know the
> expected behaviour in such case. Probably retry a connection with the
> connectionURL parameters ?

Don't use the altenative URL. Oracle's default implementation for LDAP will handle multiple space-separated URLs properly.

The issue you see is simply a bug in the realm. localhost:389 is queried when no provider URL is given.
Comment 3 Eugène Adell 2019-07-07 10:52:41 UTC
(In reply to Michael Osipov from comment #2)
> Don't use the altenative URL. Oracle's default implementation for LDAP will
> handle multiple space-separated URLs properly.

Is it documented anywhere ? I don't see anything at least here : https://tomcat.apache.org/tomcat-8.5-doc/config/realm.html#JNDI_Directory_Realm_-_org.apache.catalina.realm.JNDIRealm

> The issue you see is simply a bug in the realm. localhost:389 is queried
> when no provider URL is given.

Exactly, this is what I tried to show with the code excerpt.

How to reproduce : by trying an unavailable server in connectionURL, or trying a server which is available but fail for any other reason (typically SSL handshake reasons : certificate not trusted by Tomcat, incompatible Cipher Suites,.. )
Comment 4 Michael Osipov 2019-07-07 13:17:57 UTC
(In reply to Eugène Adell from comment #3)
> (In reply to Michael Osipov from comment #2)
> > Don't use the altenative URL. Oracle's default implementation for LDAP will
> > handle multiple space-separated URLs properly.
> 
> Is it documented anywhere ? I don't see anything at least here :
> https://tomcat.apache.org/tomcat-8.5-doc/config/realm.
> html#JNDI_Directory_Realm_-_org.apache.catalina.realm.JNDIRealm

That's the wrong spot, you need to look here: https://docs.oracle.com/javase/7/docs/technotes/guides/jndi/jndi-ldap-gl.html#url
Comment 5 Eugène Adell 2019-07-07 18:28:53 UTC
(In reply to Michael Osipov from comment #4)
> That's the wrong spot, you need to look here:
> https://docs.oracle.com/javase/7/docs/technotes/guides/jndi/jndi-ldap-gl.
> html#url

Thank you Michael, that doc is interesting.
In fact you're suggesting that alternateURL is redundant with the built-in failover mechanism of JNDI LDAP (correct me if I'm wrong, as you didn't say that explicitely).

I then made some tests to check the behaviour.
With the built-in failover (for example connectionURL="ldap://X.X.X.X:1389 ldap://X.X.X.X:2389"), I still get the very same problem when the first LDAP is up and I introduce handshake failures. It doesn't switch to the 2nd server, instead it is trying on localhost:389. Here's the capture :

20:02:20.775274 IP X.X.X.X.38685 > X.X.X.X.1389: Flags [S], seq 2174431655, win 43690, options [mss 65495,sackOK,TS val 2699014618 ecr 0,nop,wscale 7], length 0
20:02:20.775327 IP X.X.X.X.1389 > X.X.X.X.38685: Flags [S.], seq 989781603, ack 2174431656, win 43690, options [mss 65495,sackOK,TS val 2699014619 ecr 2699014618,nop,wscale 7], length 0
20:02:20.775364 IP X.X.X.X.38685 > X.X.X.X.1389: Flags [.], ack 1, win 342, options [nop,nop,TS val 2699014619 ecr 2699014619], length 0
20:02:20.776028 IP X.X.X.X.38685 > X.X.X.X.1389: Flags [P.], seq 1:61, ack 1, win 342, options [nop,nop,TS val 2699014619 ecr 2699014619], length 60
20:02:20.776095 IP X.X.X.X.1389 > X.X.X.X.38685: Flags [.], ack 61, win 342, options [nop,nop,TS val 2699014619 ecr 2699014619], length 0
20:02:20.776590 IP X.X.X.X.1389 > X.X.X.X.38685: Flags [P.], seq 1:15, ack 61, win 342, options [nop,nop,TS val 2699014620 ecr 2699014619], length 14
20:02:20.776630 IP X.X.X.X.38685 > X.X.X.X.1389: Flags [.], ack 15, win 342, options [nop,nop,TS val 2699014620 ecr 2699014620], length 0
20:02:20.787654 IP X.X.X.X.38685 > X.X.X.X.1389: Flags [P.], seq 61:233, ack 15, win 342, options [nop,nop,TS val 2699014631 ecr 2699014620], length 172
20:02:20.788704 IP X.X.X.X.1389 > X.X.X.X.38685: Flags [P.], seq 15:1010, ack 233, win 350, options [nop,nop,TS val 2699014632 ecr 2699014631], length 995
20:02:20.827471 IP X.X.X.X.38685 > X.X.X.X.1389: Flags [P.], seq 233:240, ack 1010, win 357, options [nop,nop,TS val 2699014671 ecr 2699014632], length 7
20:02:20.828574 IP X.X.X.X.38685 > X.X.X.X.1389: Flags [F.], seq 240, ack 1010, win 357, options [nop,nop,TS val 2699014672 ecr 2699014632], length 0
20:02:20.829042 IP X.X.X.X.1389 > X.X.X.X.38685: Flags [F.], seq 1010, ack 241, win 350, options [nop,nop,TS val 2699014672 ecr 2699014671], length 0
20:02:20.829076 IP X.X.X.X.38685 > X.X.X.X.1389: Flags [.], ack 1011, win 357, options [nop,nop,TS val 2699014672 ecr 2699014672], length 0
20:02:20.833809 IP 127.0.0.1.29197 > 127.0.0.1.389: Flags [S], seq 2615831214, win 43690, options [mss 65495,sackOK,TS val 2699014677 ecr 0,nop,wscale 7], length 0
20:02:20.833857 IP 127.0.0.1.389 > 127.0.0.1.29197: Flags [R.], seq 0, ack 2615831215, win 0, length 0

When using the alternateURL, it's working well. The capture shows traffic on the first port (the one with the handshake failure), then the second :

20:12:52.426094 IP X.X.X.X.38947 > X.X.X.X.1389: Flags [S], seq 1912007028, win 43690, options [mss 65495,sackOK,TS val 2699646269 ecr 0,nop,wscale 7], length 0
20:12:52.426144 IP X.X.X.X.1389 > X.X.X.X.38947: Flags [S.], seq 38313939, ack 1912007029, win 43690, options [mss 65495,sackOK,TS val 2699646269 ecr 2699646269,nop,wscale 7], length 0
20:12:52.426179 IP X.X.X.X.38947 > X.X.X.X.1389: Flags [.], ack 1, win 342, options [nop,nop,TS val 2699646269 ecr 2699646269], length 0
20:12:52.433394 IP X.X.X.X.38947 > X.X.X.X.1389: Flags [P.], seq 1:61, ack 1, win 342, options [nop,nop,TS val 2699646277 ecr 2699646269], length 60
20:12:52.433457 IP X.X.X.X.1389 > X.X.X.X.38947: Flags [.], ack 61, win 342, options [nop,nop,TS val 2699646277 ecr 2699646277], length 0
20:12:52.433743 IP X.X.X.X.1389 > X.X.X.X.38947: Flags [P.], seq 1:15, ack 61, win 342, options [nop,nop,TS val 2699646277 ecr 2699646277], length 14
20:12:52.433856 IP X.X.X.X.38947 > X.X.X.X.1389: Flags [.], ack 15, win 342, options [nop,nop,TS val 2699646277 ecr 2699646277], length 0
20:12:52.682783 IP X.X.X.X.38947 > X.X.X.X.1389: Flags [P.], seq 61:233, ack 15, win 342, options [nop,nop,TS val 2699646526 ecr 2699646277], length 172
20:12:52.683356 IP X.X.X.X.1389 > X.X.X.X.38947: Flags [P.], seq 15:1010, ack 233, win 350, options [nop,nop,TS val 2699646527 ecr 2699646526], length 995
20:12:52.683394 IP X.X.X.X.38947 > X.X.X.X.1389: Flags [.], ack 1010, win 357, options [nop,nop,TS val 2699646527 ecr 2699646527], length 0
20:12:52.756065 IP X.X.X.X.38947 > X.X.X.X.1389: Flags [P.], seq 233:240, ack 1010, win 357, options [nop,nop,TS val 2699646599 ecr 2699646527], length 7
20:12:52.756677 IP X.X.X.X.1389 > X.X.X.X.38947: Flags [F.], seq 1010, ack 240, win 350, options [nop,nop,TS val 2699646600 ecr 2699646599], length 0
20:12:52.757877 IP X.X.X.X.38947 > X.X.X.X.1389: Flags [F.], seq 240, ack 1011, win 357, options [nop,nop,TS val 2699646601 ecr 2699646600], length 0
20:12:52.757907 IP X.X.X.X.1389 > X.X.X.X.38947: Flags [.], ack 241, win 350, options [nop,nop,TS val 2699646601 ecr 2699646601], length 0
20:12:52.765130 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [S], seq 2345339635, win 43690, options [mss 65495,sackOK,TS val 2699646608 ecr 0,nop,wscale 7], length 0
20:12:52.765169 IP X.X.X.X.2389 > X.X.X.X.18452: Flags [S.], seq 2395971082, ack 2345339636, win 43690, options [mss 65495,sackOK,TS val 2699646608 ecr 2699646608,nop,wscale 7], length 0
20:12:52.765278 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [.], ack 1, win 342, options [nop,nop,TS val 2699646609 ecr 2699646608], length 0
20:12:52.766062 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [P.], seq 1:61, ack 1, win 342, options [nop,nop,TS val 2699646609 ecr 2699646608], length 60
20:12:52.766121 IP X.X.X.X.2389 > X.X.X.X.18452: Flags [.], ack 61, win 342, options [nop,nop,TS val 2699646609 ecr 2699646609], length 0
20:12:52.766492 IP X.X.X.X.2389 > X.X.X.X.18452: Flags [P.], seq 1:15, ack 61, win 342, options [nop,nop,TS val 2699646610 ecr 2699646609], length 14
20:12:52.766577 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [.], ack 15, win 342, options [nop,nop,TS val 2699646610 ecr 2699646610], length 0
20:12:52.781002 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [P.], seq 61:233, ack 15, win 342, options [nop,nop,TS val 2699646624 ecr 2699646610], length 172
20:12:52.781815 IP X.X.X.X.2389 > X.X.X.X.18452: Flags [P.], seq 15:1018, ack 233, win 350, options [nop,nop,TS val 2699646625 ecr 2699646624], length 1003
20:12:52.821367 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [.], ack 1018, win 357, options [nop,nop,TS val 2699646665 ecr 2699646625], length 0
20:12:52.948434 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [P.], seq 233:500, ack 1018, win 357, options [nop,nop,TS val 2699646792 ecr 2699646625], length 267
20:12:52.988266 IP X.X.X.X.2389 > X.X.X.X.18452: Flags [.], ack 500, win 359, options [nop,nop,TS val 2699646832 ecr 2699646792], length 0
20:12:52.988316 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [P.], seq 500:506, ack 1018, win 357, options [nop,nop,TS val 2699646832 ecr 2699646832], length 6
20:12:52.988333 IP X.X.X.X.2389 > X.X.X.X.18452: Flags [.], ack 506, win 359, options [nop,nop,TS val 2699646832 ecr 2699646832], length 0
20:12:53.024034 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [P.], seq 506:591, ack 1018, win 357, options [nop,nop,TS val 2699646867 ecr 2699646832], length 85
20:12:53.024100 IP X.X.X.X.2389 > X.X.X.X.18452: Flags [.], ack 591, win 359, options [nop,nop,TS val 2699646867 ecr 2699646867], length 0
20:12:53.024535 IP X.X.X.X.2389 > X.X.X.X.18452: Flags [P.], seq 1018:1109, ack 591, win 359, options [nop,nop,TS val 2699646868 ecr 2699646867], length 91
20:12:53.024565 IP X.X.X.X.18452 > X.X.X.X.2389: Flags [.], ack 1109, win 357, options [nop,nop,TS val 2699646868 ecr 2699646868], length 0

From all this, I'm still thinking the same, I mean that Tomcat shouldn't try to connect on localhost:389 if it's not explicitely asked to (and this is the idea behind connectionURL + alternateURL), instead of that it should try to reconnect to the only server configured. One comment in the code says the same from my understanding :

            // reset it in case the connection times out.
            // the primary may come back.

If I'm correct, maybe the documentation should mention alternateURL is the preferred way for resiliency instead of relying on the built-in method.
Comment 6 Michael Osipov 2019-07-07 18:47:17 UTC
My proposal works if you use JNDI with InitialDirContext directly. I don't use the JNDIRealm, I use a custom ActiveDirectoryRealm which is perfectly coded for AD only. Our connetion URL contains either multiple hostnames or the domain name only wich is used for DNS SRV lookup.

One needs to envestigate why this happens with JNDIRealm.
Comment 7 Mark Thomas 2019-07-26 14:34:28 UTC
I opted to fix this by only using the alternateURL if one was specified.

My thinking regarding retrying the connectionURL was that some users might not want a retry and those that do can use the same value for alternateURL as connectionURL.

Fixed in:
- master for 9.0.23 onwards
- 8.5.x for 8.5.44 onwards
- 7.0.x for 7.0.97 onwards