Bug 46935 - Problem with and Patch for Using the Correct Multicast Address in Tomcat 5.5.x
Summary: Problem with and Patch for Using the Correct Multicast Address in Tomcat 5.5.x
Status: RESOLVED DUPLICATE of bug 43641
Alias: None
Product: Tomcat 5
Classification: Unclassified
Component: Catalina:Cluster
Version: 5.5.27
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-30 04:48 UTC by Oliver Hillmann
Modified: 2009-03-30 11:54 UTC



Attachments
Patch for binding the multicast socket to the correct address in 5.5.27 (1010 bytes, patch)
2009-03-30 04:48 UTC, Oliver Hillmann

Description Oliver Hillmann 2009-03-30 04:48:34 UTC
Created attachment 23425
Patch for binding the multicast socket to the correct address in 5.5.27

Hi,

When trying session replication with Apache Tomcat 5.5.27 and earlier, I ran into problems, like other people who have tried to use the cluster multicast membership service and TCP-based replication as suggested in the Clustering/Session Replication HOWTO (http://tomcat.apache.org/tomcat-5.5-doc/cluster-howto.html). I found numerous reports from people having trouble with clustering in 5.5, and although the usual response to their requests for help was to check their configuration, I think there is a bug in the 5.5 clustering code that has survived up to 5.5.27.

Although the 5.5.x Tomcat series is now somewhat obsolete, and this very problem has been successfully addressed in 6.0.x (but obviously never backported to 5.5), I wanted to share what I found, because it might spare some users a headache and might reconcile others with Tomcat session replication altogether. :)

I am aware of the networking prerequisites for TCP Replication, most notably caused by the Multicast Membership Service:

- multicast support both in the operating system's networking stack and in the network infrastructure as a whole (as mentioned in the Clustering FAQ, see http://wiki.apache.org/tomcat/FAQ/Clustering#Q9 , among many other places on the web)

- occasional problems with multicast routing on GNU/Linux (the OS of choice for such setups)

- specific problems with GNU/Linux, Java, multicast and IPv6 support (as partially discussed in http://java.sun.com/j2se/1.5.0/docs/guide/net/ipv6_guide/index.html, although I did not rely on IPv6 at all in my setup)

I tried several configurations in different network environments, always making double-sure that multicast worked (using both Java and non-Java software), and although the "Simple Cluster Configuration" from the Replication HOWTO seemed to work for a while, more sophisticated setups regularly failed. Session replication in Tomcat 6.0.18, however, worked in similar setups that made 5.5.27 break. The usual symptoms were:

- multicast membership packets sent through the network (and also reaching the network interfaces, but apparently not being received by the application)

- no replicated sessions at all

- frequent exceptions in catalina.out like:
        INFO Cluster-MembershipRecovery org.apache.catalina.cluster.mcast.McastService - Membership recovery was successful.
        WARN Cluster-MembershipReceiver org.apache.catalina.cluster.mcast.McastService - Error receiving mcast package (errorCounter=10). Try Recovery!
        java.net.SocketTimeoutException: Receive timed out
                at java.net.PlainDatagramSocketImpl.receive0(Native Method)
                at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
                at java.net.DatagramSocket.receive(DatagramSocket.java:712)
                at org.apache.catalina.cluster.mcast.McastServiceImpl.receive(McastServiceImpl.java:238)
                at org.apache.catalina.cluster.mcast.McastServiceImpl$ReceiverThread.run(McastServiceImpl.java:330)
        INFO Cluster-MembershipRecovery org.apache.catalina.cluster.mcast.McastService - Cluster membership, running recovery thread, multicasting is not functional.
        WARN Cluster-MembershipSender org.apache.catalina.cluster.mcast.McastService - Sender Thread ends with errorCounter=0.
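The repeated "Receive timed out" warnings are what a receiver thread with a socket timeout produces when its socket never gets a single datagram. The sketch below is hypothetical (the class and method names are made up, and the limit of 10 only mirrors "errorCounter=10" in the log, not Tomcat's actual McastServiceImpl logic): it counts such timeouts on a plain DatagramSocket bound to loopback that nobody sends to, which is what a mis-bound multicast socket looks like from the inside.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;

// Hypothetical illustration of a receiver loop with SO_TIMEOUT and an error
// counter; not Tomcat's code.
class TimeoutCounterSketch {

    /**
     * Calls receive() up to maxAttempts times and returns how many calls
     * timed out before a datagram arrived (or the limit was reached).
     */
    static int countTimeouts(DatagramSocket socket, int timeoutMillis, int maxAttempts)
            throws IOException {
        socket.setSoTimeout(timeoutMillis);          // receive() blocks at most this long
        byte[] buf = new byte[512];
        int errorCounter = 0;
        for (int i = 0; i < maxAttempts; i++) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            try {
                socket.receive(packet);
                return errorCounter;                 // a datagram arrived: stop counting
            } catch (SocketTimeoutException e) {
                errorCounter++;                      // same exception as in catalina.out
            }
        }
        return errorCounter;                         // a caller would "Try Recovery!" here
    }

    public static void main(String[] args) throws IOException {
        // Bound to loopback on an ephemeral port that nobody sends to,
        // so every receive() call times out.
        try (DatagramSocket socket = new DatagramSocket(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0))) {
            int timeouts = countTimeouts(socket, 50, 10);
            System.out.println("timeouts before recovery: " + timeouts);
        }
    }
}
```

Recovery (closing and reopening the same socket) cannot help in this situation, because the new socket is bound to the same wrong address and keeps timing out.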

I finally compared the code that sets up the membership service's multicast socket in Tomcat 6.0.18 and 5.5.27 and found this:

in org/apache/catalina/cluster/mcast/McastServiceImpl.java:167ff (Tomcat 5.5.27):
   protected void setupSocket() throws IOException {
        if (mcastBindAddress != null) socket = new MulticastSocket(new java.net.
            InetSocketAddress(mcastBindAddress, port));
        else socket = new MulticastSocket(port);
            socket.setLoopbackMode(false); //hint that we don't need loop back messages

and in org/apache/catalina/tribes/membership/McastServiceImpl.java:185ff (Tomcat 6.0.18):
    protected void setupSocket() throws IOException {
        if (mcastBindAddress != null) {
            try {
                log.info("Attempting to bind the multicast socket to "+address+":"+port);
                socket = new MulticastSocket(new InetSocketAddress(address,port));
            } catch (BindException e) {
                /*
                 * On some plattforms (e.g. Linux) it is not possible to bind
                 * to the multicast address. In this case only bind to the
                 * port.
                 */
                log.info("Binding to multicast address, failed. Binding to port only.");
                socket = new MulticastSocket(port);
            }
        } else {
            socket = new MulticastSocket(port);
        }

So, provided a mcastBindAddress property has been specified, 6.0.18 builds the InetSocketAddress to bind to from the multicast group address (address), while 5.5.27 builds it from mcastBindAddress. A socket bound to that interface address does not see any multicast packets at all, because those packets are addressed to the group rather than to the interface, hence the exceptions about receives timing out.
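A minimal sketch of the corrected binding logic, assuming the field meanings described above: "address" is the multicast group and mcastBindAddress names a local interface. The class and method names are hypothetical, and the sketch further assumes that mcastBindAddress is only meant to select the interface (via setInterface), which is not shown in the excerpts above. Example values such as group 228.0.0.4 and port 45564 are likewise only assumptions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;

// Hypothetical sketch of the corrected 5.5 setupSocket(), modelled on 6.0.18:
// bind to group:port, and use mcastBindAddress only to pick the interface.
class McastBindSketch {

    /** The local address the receiving socket binds to: the group, not the NIC. */
    static InetSocketAddress receiverBindAddress(InetAddress group, int port) {
        if (!group.isMulticastAddress()) {
            throw new IllegalArgumentException(group + " is not a multicast address");
        }
        return new InetSocketAddress(group, port);
    }

    static MulticastSocket setupSocket(InetAddress group, int port,
                                       InetAddress mcastBindAddress) throws IOException {
        MulticastSocket socket = (mcastBindAddress != null)
                ? new MulticastSocket(receiverBindAddress(group, port)) // 5.5.27 used mcastBindAddress here
                : new MulticastSocket(port);                            // wildcard address, port only
        if (mcastBindAddress != null) {
            socket.setInterface(mcastBindAddress); // assumption: selects the NIC for the group
        }
        socket.joinGroup(group);                   // joins on the interface chosen above
        return socket;
    }
}
```

The only functional difference from 5.5.27 is which InetAddress goes into the InetSocketAddress; the port and the join are unchanged.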

Therefore, I suggest the following patch to alter the Tomcat5 multicast binding behaviour to be similar to Tomcat6:


diff -u -r apache-tomcat-5.5.27-src.orig/container/modules/cluster/src/share/org/apache/catalina/cluster/mcast/McastServiceImpl.java apache-tomcat-5.5.27-src/container/modules/cluster/src/share/org/apache/catalina/cluster/mcast/McastServiceImpl.java
--- apache-tomcat-5.5.27-src.orig/container/modules/cluster/src/share/org/apache/catalina/cluster/mcast/McastServiceImpl.java   2008-08-29 05:13:58.000000000 +0200
+++ apache-tomcat-5.5.27-src/container/modules/cluster/src/share/org/apache/catalina/cluster/mcast/McastServiceImpl.java        2008-11-27 01:29:04.905529298 +0100
@@ -166,7 +166,7 @@

     protected void setupSocket() throws IOException {
         if (mcastBindAddress != null) socket = new MulticastSocket(new java.net.
-            InetSocketAddress(mcastBindAddress, port));
+            InetSocketAddress(address, port));
         else socket = new MulticastSocket(port);
            socket.setLoopbackMode(false); //hint that we don't need loop back messages
         if (mcastBindAddress != null) {
         if (mcastBindAddress != null) {


With the above patch, Tomcat 5.5.27 worked for me as expected - and documented.

A comment in the Tomcat6 code mentions that binding to a multicast address might fail on some platforms (e.g. GNU/Linux), but I did not see the corresponding log messages in Tomcat6, nor did I find similar exceptions in the Tomcat5 logs. Either way, the issue above remains; on such platforms it would additionally need to be handled as in Tomcat6, by catching the BindException and falling back to the MulticastSocket constructor that takes only the port.
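The fallback described above can be sketched as follows. This is a minimal, hypothetical helper (the class and method names are made up) that mirrors the try/catch in the quoted 6.0.18 setupSocket(): it first tries to bind to the preferred group:port and, only if the platform refuses with a BindException, binds to the port alone.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;

// Hypothetical sketch of the Tomcat 6 style bind fallback.
class McastFallbackSketch {

    static MulticastSocket openWithFallback(InetSocketAddress preferred) throws IOException {
        try {
            // Preferred: bind to the multicast group address and port.
            return new MulticastSocket(preferred);
        } catch (BindException e) {
            // Platform refused that address: bind to the wildcard address on the
            // same port. The socket still receives datagrams for groups it joins.
            return new MulticastSocket(preferred.getPort());
        }
    }
}
```

Catching only BindException (rather than every IOException) keeps other socket errors visible, which matches the quoted Tomcat 6 code.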

I would welcome any feedback on this. I hope I haven't missed any important information on this topic that would justify a loud RTFM in my face, and I hope this can be my humble contribution to improving the already excellent Apache Tomcat that we all love so much. :)

For the record:

This has been tested on SuSE OpenLinux 10.1 (32-bit) and 10.2 (64-bit) with JDK 1.6.0 (1.6.0_07-b06), in both its 32-bit and 64-bit versions.

Best regards,

Oliver
Comment 1 Rainer Jung 2009-03-30 05:29:03 UTC
Looks like I proposed the patch for backport 2 weeks ago :)

Please have a look at

http://svn.apache.org/viewvc?view=rev&revision=755351
http://svn.apache.org/viewvc?view=rev&revision=759692
http://svn.apache.org/viewvc?view=rev&revision=755310
http://svn.apache.org/viewvc?view=rev&revision=755312

The proposal needs one more vote for getting committed.

If you think that this issue is not a duplicate of BZ43641, please comment.

Greetings to Neofonie ;)

Rainer
Comment 2 Rainer Jung 2009-03-30 05:29:24 UTC

*** This bug has been marked as a duplicate of bug 43641 ***
Comment 3 Oliver Hillmann 2009-03-30 11:54:57 UTC
Too bad I missed both the original bug #43641 and your recent backport proposal. And although it is a pity I wasn't first, I'm glad to see that this is (hopefully) getting into Tomcat 5.5. ;)

I don't want to nitpick too much, but is this technically a duplicate? I see it has been addressed in 6.0, but it isn't really redundant as long as the fix hasn't found its way into 5.5, although I see it is basically the same issue.

Looking forward to seeing the original backport included in 5.5.28.

Regards, and many greetings back! :)

Oliver