Bug 57801 - Tomcat catalina.sh fails to start after the machine is rebooted because another process has the same PID that Tomcat had before the reboot
Status: RESOLVED FIXED
Alias: None
Product: Tomcat 6
Classification: Unclassified
Component: Catalina
Version: 6.0.43
Hardware: All Linux
Importance: P2 enhancement
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
 
Reported: 2015-04-09 08:48 UTC by jiaoyk
Modified: 2016-03-11 19:10 UTC



Description jiaoyk 2015-04-09 08:48:31 UTC
We have set CATALINA_PID in setenv.sh.

After the machine is rebooted, the PID file is still there and Tomcat fails to start.

The error message is: 
"Existing PID file found during start.
Tomcat appears to still be running with PID 3387. Start aborted."

On checking, another process has the same PID: 3387.

Looking at the code in catalina.sh, the following logic is at fault:


  if [ ! -z "$CATALINA_PID" ]; then
    if [ -f "$CATALINA_PID" ]; then
      if [ -s "$CATALINA_PID" ]; then
        echo "Existing PID file found during start."
        if [ -r "$CATALINA_PID" ]; then
          PID=`cat "$CATALINA_PID"`
          ps -p $PID >/dev/null 2>&1
          if [ $? -eq 0 ] ; then
            echo "Tomcat appears to still be running with PID $PID. Start aborted."
            exit 1
          else


The script should not conclude that Tomcat is still alive just because some process has that PID.
It should start anyway.
Comment 1 Rainer Jung 2015-04-09 09:13:51 UTC
Either you should integrate your Tomcat stop (and probably start) into your system shutdown/startup (rc scripts or whatever methodology your system uses) or you rely on doing it by hand.

In the latter case, Tomcat does not get any info about the system shutdown and cannot react to it. Trying to find out whether the process found after reboot actually is a Tomcat process or something else is not the task of the start script. Integrating it would be error-prone and hard to maintain across platforms.

If you start by hand and get the cited error, you need to check the other process (like you did) and if it is something else and Tomcat is not running, purge the old PID file.
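
For illustration, the manual check described above could be sketched as a small helper. This is only a sketch under assumptions: the function name inspect_pid_file is hypothetical and not part of Tomcat, and it relies on the same "ps -p" test that catalina.sh uses. It only reports what it finds; deciding whether the process is Tomcat, and removing the file, is left to the operator.

```shell
#!/bin/sh
# Hypothetical helper, not part of Tomcat: show which process owns the PID
# recorded in a PID file so an operator can judge whether the file is stale.
# Returns 0 if a process with that PID exists, 1 if none exists,
# 2 if the file is missing or empty.
inspect_pid_file() {
  pid_file=$1
  if [ ! -s "$pid_file" ]; then
    echo "No PID recorded in $pid_file"
    return 2
  fi
  pid=$(cat "$pid_file")
  if ps -p "$pid" >/dev/null 2>&1; then
    # Some process has this PID; print it so it can be inspected by hand.
    ps -f -p "$pid"
    return 0
  fi
  echo "No process with PID $pid; $pid_file looks stale"
  return 1
}
```

A call such as inspect_pid_file "$CATALINA_PID" prints the process line when the PID is in use. If that line is not a Tomcat process and Tomcat is not running, the PID file can be removed and the start retried.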

We could probably make the message

"Existing PID file found during start.
Tomcat appears to still be running with PID 3387. Start aborted."

a bit better

"Existing PID file found during start.
Tomcat appears to still be running with PID 3387. Start aborted.
If the process with PID 3387 is not a Tomcat process, remove the PID file NAME_OF_PID_FILE_HERE and try again."
Comment 2 Rainer Jung 2015-04-09 09:43:30 UTC
I added the message

"If this process is not a Tomcat process, remove the PID file and try again."

to the output. The name of the PID file is already being output earlier during the script run.

Added to trunk in r1672272, tc 8 in r1672273 (will be part of 8.0.22), tc 7 in r1672274 (will be part of 7.0.62) and proposed for TC 6.
Comment 3 Rainer Jung 2015-04-09 10:36:18 UTC
Also added "ps" output for the process with the PID in r1672284 (trunk), r1672285 (tc8) and r1672286 (tc7).
Comment 4 jiaoyk 2015-04-10 10:52:46 UTC
Thanks Rainer.

If the machine is shut down by a power-off, the rc script may not get a chance to run.
After using ps to check whether the PID is alive, could the script also extract the path of the process and compare it with Tomcat's home path? If the paths match, the process is Tomcat; otherwise it is some other process, and the script could remove the PID file and continue with the start.

Sometimes the Tomcat process is just hung. Could you provide a force-start option? Even if the Tomcat process is there, just kill it and start anyway.
Comment 5 Rainer Jung 2015-04-10 11:23:13 UTC
(In reply to jiaoyk from comment #4)
> Thanks Rainer.
> 
> If the machine is shut down by a power-off, the rc script may not get a
> chance to run.

OK, but that's a really exceptional case. Then you might have the same problem with lots of unix daemons. Either they ignore the PID file, or they don't start.

> After using ps to check whether the PID is alive, could the script also
> extract the path of the process and compare it with Tomcat's home path? If
> the paths match, the process is Tomcat; otherwise it is some other process,
> and the script could remove the PID file and continue with the start.

I doubt that this is possible in a platform-independent yet maintainable way. The script is used on lots of platforms, like various Linuxes, BSD, Solaris, Cygwin, OS-X, and probably AIX, HP-UX, etc.

Some of these platforms do not provide the full process command using "ps" but they truncate it. IMHO there is no platform independent way to retrieve all process args, e.g. the -Dcatalina.base=... that the script sets. I don't plan to invest more into this, because it happens very rarely and the solution will be fragile. Automatic problem resolution needs to be robust, otherwise it triggers more problems than it solves.
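
To illustrate why such a check tends to end up platform-specific, here is a Linux-only sketch that reads the full, untruncated argument list from /proc/PID/cmdline. It is an illustration under assumptions, not a proposal for catalina.sh: the function name pid_cmdline_contains is hypothetical, and /proc/PID/cmdline is not available on several of the platforms listed above.

```shell
#!/bin/sh
# Linux-only sketch (assumes /proc is mounted); not part of catalina.sh.
# Returns 0 if the command line of process $1 contains the literal string $2,
# 1 if it does not, 2 if the command line cannot be read.
pid_cmdline_contains() {
  pid=$1
  needle=$2
  cmdline_file="/proc/$pid/cmdline"
  [ -r "$cmdline_file" ] || return 2
  # Arguments in /proc/PID/cmdline are NUL-separated; turn NULs into spaces.
  if tr '\0' ' ' < "$cmdline_file" | grep -F -q -- "$needle"; then
    return 0
  fi
  return 1
}
```

On Linux, a call like pid_cmdline_contains "$PID" "-Dcatalina.base=$CATALINA_BASE" would tell a Tomcat instance apart from an unrelated process that happens to reuse the PID. On other platforms the function always returns 2, which is exactly the portability gap described above.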

If anyone would like to tackle this, patches will be welcome, but must be multi-platform.

> Sometimes the Tomcat process is just hung. Could you provide a force-start
> option? Even if the Tomcat process is there, just kill it and start anyway.

We could support the existing "-force" for "start" as well and let -force ignore any PID file problems. An existing other process is only one such problem. There are more cases where the script currently aborts. Do you think all these cases should be ignored with -force? Please have a look at "abort" in bin/catalina.sh.
Comment 6 Christopher Schultz 2015-04-10 15:03:30 UTC
(In reply to Rainer Jung from comment #5)
> (In reply to jiaoyk from comment #4)
> > Thanks Rainer.
> > 
> > If the machine is shut down by a power-off, the rc script may not get a
> > chance to run.
> 
> OK, but that's a really exceptional case. Then you might have the same
> problem with lots of unix daemons. Either they ignore the PID file, or they
> don't start.

Or they

a) Put the PID file in ephemeral storage (ramdisk)
b) Put the PID file in /tmp, which should be emptied on boot
c) Otherwise arrange to have their PID files removed on boot
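
As a sketch of option (b), the PID file location can be set in setenv.sh, which is where the reporter already sets CATALINA_PID. The file name tomcat.pid and the use of TMPDIR as a fallback are assumptions for illustration. Whether /tmp is actually emptied on boot depends on the system configuration.

```shell
# bin/setenv.sh (sketch). CATALINA_PID is the variable read by catalina.sh;
# the directory and file name below are assumptions for illustration.
# For option (a), a tmpfs-backed directory could be used instead; on many
# Linux systems /run is such a directory, but writing there usually needs
# a subdirectory prepared for the Tomcat user.
CATALINA_PID="${TMPDIR:-/tmp}/tomcat.pid"
export CATALINA_PID
```

With the file in a location that does not survive a reboot, a stale PID can no longer be left behind by a power-off.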
Comment 7 jiaoyk 2015-04-13 10:39:11 UTC
Thanks Rainer and Christopher

It's not worth setting up a ramdisk just to store the PID file.
If the PID file is stored in /tmp, it may be removed by someone.

The script should work even in the worst case, however rare. It should not depend on the previous state; it should be stateless.

The key point is that catalina.sh mistakes the wrong process for the Tomcat process.

Do we really need a PID file to save the PID?

If verifying the home path across platforms is the problem, maybe we could first get Tomcat's home path and then use it to grep for the right PID in the output of ps (assuming the key commands in the following function exist on the various Unix/Linux-like OSes).

Maybe we could get the PID of Tomcat with a function like the following?


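# Print the PID(s) of the process whose command line mentions CATALINA_HOME.
function get_tomcat_pid()
{
  # Squeeze repeated slashes, then prefer the symlink-resolved path.
  declare NORMALIZED=$(echo "$CATALINA_HOME" | tr -s / /)
  declare NORMALIZED_PATH=$(readlink -f "$CATALINA_HOME")
  if [ "$NORMALIZED" != "$NORMALIZED_PATH" ]; then
    NORMALIZED=$NORMALIZED_PATH
  fi
  if [ -z "$NORMALIZED" ]; then
    return 1
  fi

  # Match the path literally in the ps listing, excluding the grep itself.
  declare pid=$(ps -ef | grep -F -- "$NORMALIZED" | grep -v grep | awk '{print $2}')
  if [ -z "$pid" ]; then
    return 1
  fi
  # Unquoted on purpose: joins multiple PIDs onto one line.
  echo $pid
}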


Thanks for supporting the force start.

The "abort" cases such as being unable to remove or write the PID file should remain aborts; they look like missing permissions. Maybe the process is being run as the wrong user.

The "abort" cases such as "PID file found but no matching process was found. Stop aborted." and "$CATALINA_PID was set but the specified file does not exist." should be warnings.
Comment 8 Konstantin Kolinko 2015-05-08 14:17:23 UTC
(In reply to Rainer Jung from comment #3)
> Also added "ps" output for the process with the PID in r1672284 (trunk),
> r1672285 (tc8) and r1672286 (tc7).

Backported to Tomcat 6 in r1678326 and will be in 6.0.44 onwards.
Comment 9 Mark Thomas 2016-03-11 19:10:50 UTC
This issue is as fixed as it is going to get.

Using /tmp is the way to go. The OS will set appropriate permissions so only root and the user Tomcat is running as can delete the file.
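
The permissions referred to here come from the sticky bit that is normally set on /tmp: in a sticky directory, a file can be deleted only by its owner, the directory owner, or root. A minimal sketch for checking this on a given system follows; the function name has_sticky_bit is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the given path is a directory with the sticky bit set.
# "test -k" is specified by POSIX.
has_sticky_bit() {
  [ -d "$1" ] && [ -k "$1" ]
}
```

For example, running has_sticky_bit /tmp && echo "PID file in /tmp is protected from other users" confirms the assumption before relying on it.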