Many people run Tomcat from systemd, and will be using 'journalctl' to manage Tomcat's stdout. journalctl has a not-easily-fixed bug that causes the last few stdout/stderr lines to be 'lost'. This is a problem, as the last few lines are usually the most important. Here I suggest a small workaround in bin/catalina.sh that I feel would be safe and worthwhile. A simple example: let's deploy a Tomcat instance running as a dedicated user: # useradd -s /bin/bash -d /tmp testtomcat # su - testtomcat curl -sLO http://www-us.apache.org/dist/tomcat/tomcat-8/v8.5.11/bin/apache-tomcat-8.5.11.tar.gz tar xf apache-tomcat-8.5.11.tar.gz cd apache-tomcat-8.5.11 If we run 'bin/catalina.sh run' it starts normally. testtomcat@jturner-desktop:~/apache-tomcat-8.5.11$ ./bin/catalina.sh run Using CATALINA_BASE: /tmp/apache-tomcat-8.5.11 Using CATALINA_HOME: /tmp/apache-tomcat-8.5.11 Using CATALINA_TMPDIR: /tmp/apache-tomcat-8.5.11/temp Using JRE_HOME: /usr/lib/jvm/java-8-oracle Using CLASSPATH: /tmp/apache-tomcat-8.5.11/bin/bootstrap.jar:/tmp/apache-tomcat-8.5.11/bin/tomcat-juli.jar ... Let's introduce an error. Say we got JAVA_HOME wrong. We get an error message as expected: testtomcat@jturner-desktop:~/apache-tomcat-8.5.11$ JAVA_HOME=/incorrect/path ./bin/catalina.sh run Using CATALINA_BASE: /tmp/apache-tomcat-8.5.11 Using CATALINA_HOME: /tmp/apache-tomcat-8.5.11 Using CATALINA_TMPDIR: /tmp/apache-tomcat-8.5.11/temp Using JRE_HOME: /incorrect/path Using CLASSPATH: /tmp/apache-tomcat-8.5.11/bin/bootstrap.jar:/tmp/apache-tomcat-8.5.11/bin/tomcat-juli.jar ./bin/catalina.sh: 1: exec: /incorrect/path/bin/java: not found testtomcat@jturner-desktop:~/apache-tomcat-8.5.11$ All find and dandy. Now let's run Tomcat via systemd. Create a minimal service file as root and load it: # cat > /etc/systemd/system/testtomcat.service <<EOF [Service] Type=simple User=testtomcat ExecStart=/tmp/apache-tomcat-8.5.11/bin/catalina.sh run EOF # systemctl daemon-reload (an aside: see https://jdebp.eu/FGA/systemd-house-of-horror/tomcat.html for why we call catalina.sh run from systemd instead of startup.sh) Start it, and see the status: # systemctl start testtomcat # systemctl status testtomcat -n5 ● testtomcat.service Loaded: loaded (/etc/systemd/system/testtomcat.service; static; vendor preset: enabled) Active: active (running) since Tue 2017-01-24 12:43:18 AEDT; 47s ago Main PID: 30648 (java) Tasks: 33 Memory: 109.2M CPU: 3.979s CGroup: /system.slice/testtomcat.service └─30648 /usr/bin/java -Djava.util.logging.config.file=/tmp/apache-tomcat-8.5.11/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2 Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.173 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory /tmp/apache-to Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.182 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory /tmp/apach Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.184 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler [http-nio-8180] Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.187 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler [ajp-nio-8109] Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.188 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 649 ms Happiness. All working normally. Now let's introduce an error. Any error that prevents Tomcat startup will do (e.g. incorrect permissions), but let's go with our invalid JAVA_HOME example: cat > /etc/systemd/system/testtomcat.service <<EOF [Service] Type=simple User=testtomcat Environment=JAVA_HOME=/incorrect/path ExecStart=/tmp/apache-tomcat-8.5.11/bin/catalina.sh run EOF # systemctl daemon-reload # systemctl restart testtomcat # systemctl status testtomcat -n5 ● testtomcat.service Loaded: loaded (/etc/systemd/system/testtomcat.service; static; vendor preset: enabled) Active: failed (Result: exit-code) since Tue 2017-01-24 12:44:40 AEDT; 6s ago Process: 30846 ExecStart=/tmp/apache-tomcat-8.5.11/bin/catalina.sh run (code=exited, status=127) Main PID: 30846 (code=exited, status=127) Jan 24 12:44:40 jturner-desktop systemd[1]: Started testtomcat.service. Jan 24 12:44:40 jturner-desktop systemd[1]: testtomcat.service: Main process exited, code=exited, status=127/n/a Jan 24 12:44:40 jturner-desktop systemd[1]: testtomcat.service: Unit entered failed state. Jan 24 12:44:40 jturner-desktop systemd[1]: testtomcat.service: Failed with result 'exit-code'. Unhappiness, as expected, but note one important detail: the actual error "./bin/catalina.sh: 1: exec: /incorrect/path/bin/java: not found" is missing in the logs. Again, the above is just an example. The last log message from ANY failure, generated by scripts or Tomcat itself, will be lost. Here's what's going on. The logs are not actually 'lost', but just not "tagged" as belonging to the testtomcat unit (i.e. you can see them if you 'journalctl -f' before start). This is due to a systemd bug marked cant-fix: https://github.com/systemd/systemd/issues/1347 which I was referred to when questioning this behaviour on stackoverflow: https://superuser.com/questions/1169906/systemd-bug-stdout-not-tagged-as-belonging-to-unit-when-non-root-user-specifie The systemd bug isn't going to be fixed soon, as it relies on kernel additions. Here's what Tomcat can do. A simple workaround is to 'sleep' for a short period after emitting our last output, to give systemd a chance to tag it. Put this near the top of bin/catalina.sh: # If running in systemd, give it a chance to tag the last output. https://github.com/systemd/systemd/issues/1347 trap 'sleep 0.1' INT TERM EXIT and now when we 'systemctl status testtomcat' (or 'journalctl -u testtomcat') we see the final error message: # systemctl restart testtomcat # systemctl status testtomcat ● testtomcat.service Loaded: loaded (/etc/systemd/system/testtomcat.service; static; vendor preset: enabled) Active: failed (Result: exit-code) since Tue 2017-01-24 12:58:39 AEDT; 6s ago Process: 32326 ExecStart=/tmp/apache-tomcat-8.5.11/bin/catalina.sh run (code=exited, status=127) Main PID: 32326 (code=exited, status=127) Jan 24 12:58:39 jturner-desktop systemd[1]: Started testtomcat.service. Jan 24 12:58:39 jturner-desktop catalina.sh[32326]: /tmp/apache-tomcat-8.5.11/bin/catalina.sh: 1: exec: /incorrect/path/bin/java: not found Jan 24 12:58:39 jturner-desktop systemd[1]: testtomcat.service: Main process exited, code=exited, status=127/n/a Jan 24 12:58:39 jturner-desktop systemd[1]: testtomcat.service: Unit entered failed state. Jan 24 12:58:39 jturner-desktop systemd[1]: testtomcat.service: Failed with result 'exit-code'. I'm not a POSIX guru but AFAIK 'trap' and '/bin/sleep' are pretty vanilla. The choice of a 0.1s delay is not very scientific. It fixes the problem on my system.
FYI I can't add an attachment - keep getting 'Permission denied' from attachment.cgi. Here's the suggested patch: --- bin/catalina.sh.orig 2017-01-24 12:54:39.181880382 +1100 +++ bin/catalina.sh 2017-01-24 12:55:56.862144647 +1100 @@ -100,6 +100,9 @@ # case the default is "true" # ----------------------------------------------------------------------------- +# If running in systemd, give it a chance to tag the last output. https://github.com/systemd/systemd/issues/1347 +trap 'sleep 0.1' INT TERM EXIT + # OS specific support. $var _must_ be set to either true or false. cygwin=false darwin=false
I'd prefer to see this enabled with an environment variable (that should also set the length of the sleep). That way the default behaviour doesn't change, those that need to fix it have an option to do so and the delay can be set to an appropriate value for each system.
> That way the default behaviour doesn't change, those that need to fix it have an option to do so and the delay can be set to an appropriate value for each system. Okay, how about: # If started with systemd, add 'Environment=CATALINA_EXIT_DELAY=0.1' to your service file to prevent systemd losing the last few log messages. See https://github.com/systemd/systemd/issues/1347 if [ -n "${CATALINA_EXIT_DELAY:-}" ]; then trap 'sleep ${CATALINA_EXIT_DELAY}' INT EXIT TERM fi
Thanks for the suggestion. The new environment variable will be included in: - trunk for 9.0.0.M18 onwards - 8.5.x for 8.5.12 onwards - 8.0.x for 8.0.42 onwards - 7.0.x for 7.0.76 onwards
Re: r1782805 +if [ -n "${CATALINA_EXIT_DELAY:-}" ]; then + trap 'sleep ${CATALINA_EXIT_DELAY}' INT EXIT TERM +fi 1. I suspect that "trap" won't see the value of CATALINA_EXIT_DELAY variable. I think that you either need double quotes there (so that variable expansion occurs), or add a line to export the variable, e.g. export CATALINA_EXIT_DELAY 2. I think that whatever sets the CATALINA_EXIT_DELAY variable, can also execute a "trap" command. E.g. you can do it in bin/setenv.sh In the systemd example, I suspect that you can write several commands (connected with '&&') in the ExecStart line. The original issue concerns all programs, started with systemd, so you have to patch all other services that you are starting, not only Tomcat? Why a Tomcat-specific patch? Overall, an interesting feature, but - if someone configures a significant delay, it will cause concerns - it is a bit of a hack 3. +# CATALINA_EXIT_DELAY +# (Optional) If started with systemd, add +# 'Environment=CATALINA_EXIT_DELAY=0.1' to your service file +# to prevent systemd losing the last few log messages. +# See https://github.com/systemd/systemd/issues/1347 1) POSIX says that sleep expects an integer number. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sleep.html GNU implementation of sleep (in Linux) allows a floating point number and allows suffixes (s m h d). 2) Maybe reference this bug as well. At least it this bug has an example of how Tomcat can be run with systemd.
Konstantin, thanks for reviewing. You caused me to test and think it over some more. This patch is indeed useless, and should be removed (sorry Mark!). The reasons are: - This patch doesn't help for logs generated by Java/Tomcat/webapps (I thought it would, but never actually tested). That's because 'catalina.sh run' calls 'exec', which replaces the entire bash process, and so the trap is never called. Besides which, it seems that the worry of systemd losing Java output is theoretical, at least on my system. In practice JVM shutdown is sufficiently slow that all output, even from shutdown hooks, is actually caught by systemd [1]. - Making the delay opt-in means most systemd users won't benefit from it automatically. If something must be explicitly done, why not, as Konstantin points out, apply the workaround directly in the systemd service file: ExecStart=/bin/sh -c '/tmp/apache-tomcat-8.5.11/bin/catalina.sh run; sleep 0.1' Regards, Jeff [1] To verify that the last Java stdout is actually caught by systemd: cd /tmp cat > PrintExit.java <<EOF public class PrintExit { public static void main(String[] args) throws Throwable { Runtime.getRuntime().addShutdownHook(new Thread() { public void run() { System.out.println("This is the very last thing printed before shutdown"); } }); java.util.concurrent.TimeUnit.MINUTES.sleep(1000); System.out.println("Application Terminating ..."); } } EOF javac PrintExit.java cat > /etc/systemd/system/printexit.service <<EOF [Service] Type=simple User=root WorkingDirectory=/tmp ExecStart=/usr/bin/java PrintExit EOF systemctl daemon-reload journalctl -u printexit -f & systemctl start printexit # Feb 15 14:32:27 jturner-desktop systemd[1]: Started printexit.service. systemctl stop printexit # Feb 15 14:32:30 jturner-desktop systemd[1]: Stopping printexit.service... # Feb 15 14:32:30 jturner-desktop java[30981]: This is the very last thing printed before shutdown # Feb 15 14:32:30 jturner-desktop systemd[1]: printexit.service: Main process exited, code=exited, status=143/n/a # Feb 15 14:32:30 jturner-desktop systemd[1]: Stopped printexit.service. # Feb 15 14:32:30 jturner-desktop systemd[1]: printexit.service: Unit entered failed state. # Feb 15 14:32:30 jturner-desktop systemd[1]: printexit.service: Failed with result 'exit-code'. (In reply to Konstantin Kolinko from comment #5) > Re: r1782805 >
(In reply to Jeff Turner from comment #6) > This patch is indeed useless, and should be removed (sorry Mark!). No problem. I'll revert those changes and then close this as WONTFIX.
Changes reverted. There were never included in a release.