Bug 41828 - mod_jk file locking (flock) causes kernel panic
Status: RESOLVED FIXED
Alias: None
Product: Tomcat Connectors
Classification: Unclassified
Component: Common
Version: unspecified
Hardware: Other Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Reported: 2007-03-13 00:53 UTC by Reiko Ohtsuka
Modified: 2008-10-05 03:10 UTC
Description Reiko Ohtsuka 2007-03-13 00:53:35 UTC
We have encountered several kernel panics in our benchmark tests using
Apache bench (ab).

The test environment is:
  RedHat Enterprise Linux 4.0 update 4
  Apache 2.0.52
  mod_jk 1.2.20
  Tomcat 5.5.17

It turned out to be a kernel bug (see
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=230976),
and we also found that a file locking problem in mod_jk's jk_shm.c is a main
cause of the kernel panic.

First, the Apache mod_jk parent process opens the shared memory file
"jk-runtime-status" and its lock file "jk-runtime-status.lock".
Next, the forked child processes inherit the parent's data, including the
file descriptors (fd) of these files.
The child processes don't re-open these files.
As a result, all mod_jk processes lock and unlock the same file through
the same fd.

The problem is that a flock system call with LOCK_EX (exclusive lock)
from process (A) has no effect if another process (B) has already
locked the same file through the same inherited fd.  flock does not block
process A and returns without an error.  This is how flock works: the lock
belongs to the open file description, which the inherited fds share.
And if process (A) calls flock with LOCK_UN (unlock) on the file,
the file is unlocked even though process (B) still holds the lock, which is
a main cause of the kernel panic.
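
The behaviour described above can be reproduced outside mod_jk with a small
test. The following is only a sketch, assuming Linux and a local filesystem;
the function names and the scratch path passed in are hypothetical and do not
come from mod_jk.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, run flock(fd, op) there on the inherited fd, and return
 * 0 if the child's flock() succeeded, 1 if it failed, -1 on fork/wait errors. */
static int flock_in_child(int fd, int op)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(flock(fd, op) == 0 ? 0 : 1);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Returns 1 if a child can "take" LOCK_EX on the inherited fd while the
 * parent already holds it, 0 if it is refused, -1 on setup errors. */
int child_can_relock_inherited_fd(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {          /* parent takes the lock */
        close(fd);
        return -1;
    }
    int rc = flock_in_child(fd, LOCK_EX | LOCK_NB);
    flock(fd, LOCK_UN);
    close(fd);
    return rc < 0 ? -1 : (rc == 0);
}

/* Returns 1 if LOCK_UN issued by a child on the inherited fd releases the
 * lock taken by the parent, 0 if the parent's lock survives, -1 on errors. */
int child_unlock_drops_parent_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0 || flock_in_child(fd, LOCK_UN) != 0) {
        close(fd);
        return -1;
    }
    /* Probe through an independent open(): it can only get the lock
     * if the parent's lock is gone. */
    int probe = open(path, O_RDWR);
    if (probe < 0) {
        close(fd);
        return -1;
    }
    int released = (flock(probe, LOCK_EX | LOCK_NB) == 0);
    close(probe);
    close(fd);
    return released;
}
```

On a system where flock behaves as described, both functions should return 1:
the child is never refused, and its unlock silently drops the parent's lock.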

The child process should re-open the lock file and use a new fd for the
flock system call, so that the file is really locked exclusively.
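
As a rough illustration of why re-opening helps, the sketch below has the
child open the lock file itself before calling flock. It assumes Linux and a
local filesystem; the function name and path are hypothetical, and this is not
the code used in mod_jk.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child that re-opens the lock file is refused the lock
 * (EWOULDBLOCK) while the parent holds it, 0 if the child gets the lock
 * anyway, and -1 on setup errors. */
int reopened_fd_is_excluded(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {          /* parent takes the lock */
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: ignore the inherited fd and open the file again, which
         * creates a separate open file description. */
        int own = open(path, O_RDWR);
        if (own < 0)
            _exit(2);
        int rc = flock(own, LOCK_EX | LOCK_NB);
        int refused = (rc != 0 && errno == EWOULDBLOCK);
        _exit(refused ? 0 : 1);
    }

    int status = 0;
    int waited = waitpid(pid, &status, 0);
    flock(fd, LOCK_UN);
    close(fd);
    if (waited < 0 || !WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

With its own fd the child is refused while the parent holds the lock, which is
the exclusion the shared memory needs. A real child would block on plain
LOCK_EX rather than probe with LOCK_NB.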

I have verified that the following code fragment, which re-opens the lock
file in child processes, helps and does not cause a kernel panic.

jk_shm.c:do_shm_open

    if (jk_shmem.hdr) {
        /* Probably a call from vhost */
        if (JK_IS_DEBUG_LEVEL(l))
            jk_log(l, JK_LOG_DEBUG,
                    "Shared memory is already open");
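        /* start of temporary code */
        if (attached) {
            /* Attached (child) process: re-open the lock file to get its own fd. */
            if ((rc = do_shm_open_lock(fname, attached, l))) {
                /* Re-open failed: detach the shared memory and report the error. */
                munmap((void *)jk_shmem.hdr, jk_shmem.size);
                close(jk_shmem.fd);
                jk_shmem.hdr = NULL;
                jk_shmem.fd  = -1;
                JK_TRACE_EXIT(l);
                return rc;
            }
        }
        /* end of temporary code */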
        return 0;
    }

In this situation, we are afraid that the shared memory is not safe
across multiple processes or threads.

Thank you for your help.
Comment 1 Mladen Turk 2007-03-13 03:43:04 UTC
Fixed.
Can you check the current SVN HEAD?
Comment 2 Rainer Jung 2007-03-13 17:10:55 UTC
I prepared a tarball for you under

http://people.apache.org/~rjung/mod_jk-dev/

which includes all the usual distribution files (configure etc.).

Could you please test?
Comment 3 Reiko Ohtsuka 2007-03-13 22:48:25 UTC
Thank you for your very quick action!

I tested the tarball and found a problem.
[Wed Mar 14 13:01:44 2007] [27149:50880] [error] jk_child_init::mod_jk.c (2593):
 Attaching shm:/etc/httpd/logs/jk-runtime-status errno=13
[Wed Mar 14 13:01:44 2007] [27151:50880] [error] jk_child_init::mod_jk.c (2593):
 Attaching shm:/etc/httpd/logs/jk-runtime-status errno=13

On RedHat or CentOS, /etc/httpd/logs is a symbolic link to /var/log/httpd,
whose mode is:

drwx------  2 root root 4096 Mar 14 13:01 /var/log/httpd

The parent process runs as root, but the child processes run as apache,
which cannot open files under the /var/log/httpd directory. This is why
errno=13 (EACCES) occurs.

I changed the owner of /var/log/httpd to apache and the problem was
resolved.
Comment 4 Mladen Turk 2007-03-13 23:09:07 UTC
Right,
but this is no good. It decreases security.
Let me see if I can come up with something smarter.

Regards.
Comment 5 Mladen Turk 2007-03-16 03:41:08 UTC
Fixed in the SVN.

This is obviously a kernel bug, so if you happen to have such a kernel
you can compile mod_jk with -DJK_SHM_LOCK_REOPEN added to CFLAGS.

This will cause the lock file to be reopened inside each child instead of
being inherited. The bad thing is that it creates the lock file with
-rw-rw-rw- permissions, and that might raise security concerns.
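
To make the permission trade-off concrete, here is a minimal sketch of how a
lock file ends up with -rw-rw-rw- so that unprivileged children can open it.
The function name and path are hypothetical, and clearing the umask is an
assumption made for the illustration; this is not taken from the mod_jk
source.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a fresh lock file that any user may open read/write.
 * Returns the resulting permission bits, or -1 on error. */
int create_world_writable_lock(const char *path)
{
    unlink(path);                   /* start from a clean slate */

    /* A typical umask of 022 would strip the group/other write bits,
     * so it is cleared for the duration of the open() call. */
    mode_t old_mask = umask(0);
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    umask(old_mask);
    if (fd < 0)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    if (rc != 0)
        return -1;
    return (int)(st.st_mode & 0777);
}
```

Any local user can then open such a file and hold a lock on it, which is the
kind of security concern mentioned above.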

I would suggest that anyone affected patch the kernel.
Comment 6 Reiko Ohtsuka 2007-03-17 03:48:37 UTC
Thank you for all your work.

My concern is not so much the kernel panic as the fact that the shared
memory is not locked exclusively across all the processes.

If the shared memory is not locked properly, other processes may overwrite
the shared data while someone is updating it.  Is this OK?

By the way, the kernel panic can also be avoided by using fcntl instead of
flock; the bug is in flock.
This is done by setting HAVE_FLOCK to 0 in portable.h before running make,
though it may not be recommended.

Thanks.
Comment 7 Mladen Turk 2007-03-17 04:20:41 UTC
Hmm,

It might make sense to use fcntl by default if present.
Furthermore, flock doesn't lock on NFS volumes, which is another reason to
prefer fcntl where it is available.
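
For comparison, fcntl record locks are owned by the process rather than by
the open file description, so an inherited fd does not let a child share the
parent's lock. The sketch below illustrates this under the assumption of a
POSIX system with a local filesystem; the helper names and path are
hypothetical and this is not the mod_jk implementation.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-blocking whole-file lock or unlock; type is F_WRLCK or F_UNLCK. */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                   /* 0 means "to end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns 1 if a child using the inherited fd is refused a write lock
 * while the parent holds one, 0 if it is granted, -1 on setup errors. */
int fcntl_excludes_child_on_inherited_fd(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (set_whole_file_lock(fd, F_WRLCK) != 0) {    /* parent locks */
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: same fd number, but the lock belongs to the parent. */
        int rc = set_whole_file_lock(fd, F_WRLCK);
        int refused = (rc != 0 && (errno == EAGAIN || errno == EACCES));
        _exit(refused ? 0 : 1);
    }

    int status = 0;
    int waited = waitpid(pid, &status, 0);
    set_whole_file_lock(fd, F_UNLCK);
    close(fd);
    if (waited < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

One caveat: because fcntl locks are per process, they do not serialize
threads inside a single process, so a threaded setup would still need a
separate in-process lock.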