Summary: Tomcat cluster - "Unable to receive message through TCP Channel"
Product: Tomcat 5
Component: Catalina:Cluster
Reporter: Anabel <aleonben>
Assignee: Tomcat Developers Mailing List <dev>
Description Anabel 2005-04-27 15:25:28 UTC
We have a cluster with two Tomcat servers. When we restart one of the nodes without restarting the other, there seems to be a communication problem between them.

This is the log trace on the restarted node as it starts up:

[main] INFO org.apache.catalina.cluster.session.DeltaManager - Starting clustering manager...:/TEST
[main] WARN org.apache.catalina.cluster.session.DeltaManager - Manager [/TEST], requesting session state from org.apache.catalina.cluster.mcast.McastMember [tcp://XXX.XXX.XXX.XXX:4001,XXX.XXX.XXX.XXXX,4001, alive=14436991]. This operation will timeout if no session state has been received within 60 seconds
[main] ERROR org.apache.catalina.cluster.session.DeltaManager - Manager [/TEST], No session state received, timing out.
org.apache.jk.common.ChannelSocket init

And this is the log trace on the node that stays alive:

org.apache.catalina.cluster.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:org.apache.catalina.cluster.mcast.McastMember [tcp://YYY.YYY.YYY.YYY:4001,YYY.YYY.YYY.YYY,4001, alive=6147693]
org.apache.catalina.cluster.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.cluster.mcast.McastMember [tcp://YYY.YYY.YYY.YYY:4001,YYY.YYY.YYY.YYY,4001, alive=2]
[org.apache.catalina.cluster.tcp.TcpReplicationThread] ERROR org.apache.catalina.cluster.session.DeltaManager - Unable to receive message through TCP channel
java.lang.NullPointerException
    at java.io.ObjectOutputStream$BlockDataOutputStream.getUTFLength(ObjectOutputStream.java:1898)
    at java.io.ObjectOutputStream$BlockDataOutputStream.writeUTF(ObjectOutputStream.java:1769)
    at java.io.ObjectOutputStream.writeUTF(ObjectOutputStream.java:787)
    at org.apache.catalina.cluster.session.SerializablePrincipal.writePrincipal(SerializablePrincipal.java:180)
    at org.apache.catalina.cluster.session.DeltaSession.writeObject(DeltaSession.java:1457)
    at org.apache.catalina.cluster.session.DeltaSession.writeObjectData(DeltaSession.java:930)
    at org.apache.catalina.cluster.session.DeltaManager.doUnload(DeltaManager.java:539)
    at org.apache.catalina.cluster.session.DeltaManager.messageReceived(DeltaManager.java:854)
    at org.apache.catalina.cluster.session.DeltaManager.messageDataReceived(DeltaManager.java:762)
    at org.apache.catalina.cluster.tcp.SimpleTcpCluster.messageDataReceived(SimpleTcpCluster.java:576)
    at org.apache.catalina.cluster.io.ObjectReader.execute(ObjectReader.java:70)
    at org.apache.catalina.cluster.tcp.TcpReplicationThread.drainChannel(TcpReplicationThread.java:129)
    at org.apache.catalina.cluster.tcp.TcpReplicationThread.run(TcpReplicationThread.java:67)

I saw another bug similar to this one, 32280, but it ends without a clear solution.

Thanks in advance.
Comment 1 Filip Hanik 2005-04-27 17:53:53 UTC
The stack trace indicates that you have a principal (you are logged in) but the login name is null. Could you give us a small test case, if you can create one that reproduces the error?
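For readers following the diagnosis above, here is a minimal standalone sketch of the failure mode the stack trace points to: `ObjectOutputStream.writeUTF` throws a `NullPointerException` when handed a null string. This is not Tomcat's code. The class `NullSafeUtfDemo` and the helpers `writeNullableUTF` and `readNullableUTF` are hypothetical names used only for illustration, and the presence-flag approach is just one possible way to guard such a write, not necessarily the fix applied in `SerializablePrincipal`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class NullSafeUtfDemo {

    // Hypothetical helper: write a presence flag, then the string only if it is non-null.
    static void writeNullableUTF(ObjectOutputStream out, String value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeUTF(value);
        }
    }

    // Hypothetical helper: mirror of writeNullableUTF on the reading side.
    static String readNullableUTF(ObjectInputStream in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    // Returns true if a plain writeUTF call with a null argument throws NullPointerException.
    static boolean plainWriteUtfFailsOnNull() {
        String loginName = null; // stands in for a principal whose name is null
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(loginName);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes a possibly-null string with the guarded helper and reads it back.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                writeNullableUTF(out, value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return readNullableUTF(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain writeUTF(null) throws NPE: " + plainWriteUtfFailsOnNull());
        System.out.println("guarded round trip of null: " + roundTrip(null));
        System.out.println("guarded round trip of a name: " + roundTrip("anabel"));
    }
}
```

The flag costs one extra byte per field and requires both nodes to agree on the format, so a change like this would need to be deployed to every cluster member at once.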
Comment 2 Anabel 2005-04-28 08:49:56 UTC
(In reply to comment #1)
> The stack trace indicates that you have a principal (you are logged in) but
> the login name is null. Could you give us a small test case if you can create
> one and reproduce the error?

Hi Filip. I have just run a test. I stopped one of the nodes in the cluster and started it again. Since there were no active sessions at that moment, no problem was reported when the node started, and the two nodes found each other without any problem.

Then I logged in as a user. The logon appeared to work, and I could use the application correctly. When I stopped one of the nodes again, I could continue working with the application because the other node took over. But when I restarted the node that had been down, the situation described in my first post occurred again. At that point, the only way to get the two nodes communicating correctly again is to stop and start both of them.

This is a big problem, because I don't have a real cluster. I only get load balancing and failover the first time: if one node fails, I cannot re-form the cluster unless I restart both nodes.

Best regards.
Comment 3 Anabel 2005-05-04 10:56:28 UTC
(In reply to comment #1)
> The stack trace indicates that you have a principal (you are logged in) but
> the login name is null. Could you give us a small test case if you can create
> one and reproduce the error?

Hi Filip. Do you have any information about the reported bug?

Thanks in advance,
Anabel.