We have encountered several kernel panics in our benchmark tests using Apache bench (ab).

Test environment:
  RedHat Enterprise Linux 4.0 update 4
  Apache 2.0.52
  mod_jk 1.2.20
  Tomcat 5.5.17

It turned out to be a kernel bug (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=230976), and we also found that a mod_jk file locking problem in jk_shm.c is a main cause of the kernel panic.

First, the Apache mod_jk parent process opens the shared memory file "jk-runtime-status" and its lock file "jk-runtime-status.lock". Next, the forked child processes inherit the parent's data, including the file descriptors (fd) of these files. The child processes don't re-open these files. As a result, all mod_jk processes lock and unlock the same file through the same fd.

The problem is that a flock system call with LOCK_EX (exclusive lock) from a process (A) has no effect if another process (B) has already locked the same file with the same fd. The flock call does not block process A and returns without an error. This is how flock works. And if process (A) calls flock with LOCK_UN (unlock) on the file, the file is unlocked even though process (B) still believes it holds the lock, which is a main cause of the kernel panic.

To lock the file strictly, each child process should re-open the lock file and use a new fd for the flock system call.

I have verified that the following code fragment, which re-opens the lock file for child processes, has some effect and does not cause a kernel panic.

jk_shm.c:do_shm_open

    if (jk_shmem.hdr) {
        /* Probably a call from vhost */
        if (JK_IS_DEBUG_LEVEL(l))
            jk_log(l, JK_LOG_DEBUG, "Shared memory is already open");
        /* start of temporary code */
        if( attached ) {
            if ((rc = do_shm_open_lock(fname, attached, l))) {
                munmap((void *)jk_shmem.hdr, jk_shmem.size);
                close(jk_shmem.fd);
                jk_shmem.hdr = NULL;
                jk_shmem.fd = -1;
                JK_TRACE_EXIT(l);
                return rc;
            }
        }
        /* end of temporary code */
        return 0;
    }

In this situation, we are concerned that the shared memory is not safe across multiple processes or threads.
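The flock behaviour described above can be reproduced outside mod_jk. The following is only a minimal sketch, not code from jk_shm.c: the helper name child_gets_lock and the lock file path are hypothetical, and it assumes a local (non-NFS) filesystem. The parent takes LOCK_EX and forks; the child then requests a non-blocking LOCK_EX either on the inherited fd or on a freshly opened one.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of mod_jk).
 * The parent locks `path` with LOCK_EX and forks. The child asks for a
 * non-blocking LOCK_EX, using the inherited fd when reopen == 0 or a
 * newly opened fd when reopen != 0.
 * Returns 1 if the child was granted the lock, 0 if it was refused
 * with EWOULDBLOCK, and -1 on any other error. */
int child_gets_lock(const char *path, int reopen)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: share the parent's fd, or open the file again. */
        int cfd = reopen ? open(path, O_RDWR) : fd;
        if (cfd < 0)
            _exit(2);
        if (flock(cfd, LOCK_EX | LOCK_NB) == 0)
            _exit(1);                       /* lock granted */
        _exit(errno == EWOULDBLOCK ? 0 : 2); /* refused, or other error */
    }

    int status = 0;
    int waited = (pid > 0 && waitpid(pid, &status, 0) == pid);

    flock(fd, LOCK_UN);
    close(fd);

    if (!waited || !WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

With the inherited fd the child's request succeeds although the parent holds the lock, so there is no mutual exclusion. With a re-opened fd the same request is refused, which is the behaviour the temporary code above relies on.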
Thank you for your help.
Fixed. Can you check the current SVN HEAD?
I prepared a tarball for you under http://people.apache.org/~rjung/mod_jk-dev/ which includes all the usual distribution files (configure etc.). Could you please test?
Thank you for your very quick action! I tested the tarball and found a problem.

[Wed Mar 14 13:01:44 2007] [27149:50880] [error] jk_child_init::mod_jk.c (2593): Attaching shm:/etc/httpd/logs/jk-runtime-status errno=13
[Wed Mar 14 13:01:44 2007] [27151:50880] [error] jk_child_init::mod_jk.c (2593): Attaching shm:/etc/httpd/logs/jk-runtime-status errno=13

On RedHat or CentOS, /etc/httpd/logs is a symbolic link to /var/log/httpd, whose mode is:

drwx------ 2 root root 4096 3月 14 13:01 /var/log/httpd

The parent process runs as root, but the child processes run as apache, which cannot open files under the /var/log/httpd directory. This is why errno=13 (EACCES) occurs. I changed the owner of /var/log/httpd to apache and the problem was resolved.
Right, but this is no good. It decreases security. Let me see if I can come up with something smarter. Regards.
Fixed in SVN. This is obviously a kernel bug, so if you happen to have such a kernel you can compile mod_jk with -DJK_SHM_LOCK_REOPEN added to CFLAGS. This causes the lock file to be reopened inside each child instead of being inherited. The bad thing is that it creates the lock file with -rw-rw-rw- permissions, and that might raise security concerns. I would suggest that anyone affected patch the kernel.
Thank you for all your work. My concern is not so much the kernel panic as the fact that the shared memory is not locked exclusively among all the processes. If the shared memory is not locked properly, other processes may overwrite the shared data while someone is updating it. Is this OK? By the way, the kernel panic can also be avoided by using fcntl instead of flock; it is a flock bug. This is done by setting HAVE_FLOCK to 0 in portable.h before running make, though it may not be recommended. Thanks.
Hmm, it might make sense to use fcntl by default if present. Furthermore, flock doesn't lock on NFS volumes, which is another reason to try fcntl first when it is available.
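For comparison, here is a rough sketch of whole-file locking with fcntl. It is not the mod_jk implementation: whole_file_lock and child_sees_parent_lock are hypothetical names and the lock file path is an assumption. POSIX record locks taken with fcntl belong to the process and are not inherited across fork, so a child that uses the inherited fd still conflicts with the parent's lock.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: apply a whole-file fcntl lock of the given type
 * (F_WRLCK for an exclusive lock, F_UNLCK to release). Blocks until the
 * lock is granted and retries if interrupted by a signal. */
int whole_file_lock(int fd, short type)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file" */

    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Hypothetical check: the parent takes an exclusive fcntl lock and forks.
 * The child probes with F_GETLK on the inherited fd.
 * Returns 1 if the child sees a conflicting lock, 0 if not, -1 on error. */
int child_sees_parent_lock(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (whole_file_lock(fd, F_WRLCK) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        struct flock probe;
        memset(&probe, 0, sizeof(probe));
        probe.l_type = F_WRLCK;  /* "could I take a write lock?" */
        probe.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &probe) != 0)
            _exit(2);
        /* F_GETLK sets l_type to F_UNLCK when nothing conflicts. */
        _exit(probe.l_type == F_UNLCK ? 0 : 1);
    }

    int status = 0;
    int waited = (pid > 0 && waitpid(pid, &status, 0) == pid);

    whole_file_lock(fd, F_UNLCK);
    close(fd);

    if (!waited || !WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

Unlike the flock case, the child does not need to re-open the file to be excluded, so the lock file would not have to be readable by the unprivileged child user.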