Bug 60632 - Last logs lost when running Tomcat with systemd and journald
Summary: Last logs lost when running Tomcat with systemd and journald
Status: RESOLVED WONTFIX
Alias: None
Product: Tomcat 8
Classification: Unclassified
Component: Catalina (show other bugs)
Version: 8.5.x-trunk
Hardware: PC Linux
: P2 enhancement (vote)
Target Milestone: ----
Assignee: Tomcat Developers Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-24 02:10 UTC by Jeff Turner
Modified: 2017-02-15 09:43 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jeff Turner 2017-01-24 02:10:36 UTC
Many people run Tomcat from systemd, and will be using 'journalctl' to manage Tomcat's stdout. journalctl has a not-easily-fixed bug that causes the last few stdout/stderr lines to be 'lost'. This is a problem, as the last few lines are usually the most important. Here I suggest a small workaround in bin/catalina.sh that I feel would be safe and worthwhile.

A simple example: let's deploy a Tomcat instance running as a dedicated user:

# useradd -s /bin/bash -d /tmp testtomcat
# su - testtomcat
curl -sLO http://www-us.apache.org/dist/tomcat/tomcat-8/v8.5.11/bin/apache-tomcat-8.5.11.tar.gz
tar xf apache-tomcat-8.5.11.tar.gz
cd apache-tomcat-8.5.11

If we run 'bin/catalina.sh run' it starts normally.

testtomcat@jturner-desktop:~/apache-tomcat-8.5.11$ ./bin/catalina.sh run
Using CATALINA_BASE:   /tmp/apache-tomcat-8.5.11
Using CATALINA_HOME:   /tmp/apache-tomcat-8.5.11
Using CATALINA_TMPDIR: /tmp/apache-tomcat-8.5.11/temp
Using JRE_HOME:        /usr/lib/jvm/java-8-oracle
Using CLASSPATH:       /tmp/apache-tomcat-8.5.11/bin/bootstrap.jar:/tmp/apache-tomcat-8.5.11/bin/tomcat-juli.jar
...

Let's introduce an error. Say we got JAVA_HOME wrong. We get an error message as expected:

testtomcat@jturner-desktop:~/apache-tomcat-8.5.11$ JAVA_HOME=/incorrect/path ./bin/catalina.sh run
Using CATALINA_BASE:   /tmp/apache-tomcat-8.5.11
Using CATALINA_HOME:   /tmp/apache-tomcat-8.5.11
Using CATALINA_TMPDIR: /tmp/apache-tomcat-8.5.11/temp
Using JRE_HOME:        /incorrect/path
Using CLASSPATH:       /tmp/apache-tomcat-8.5.11/bin/bootstrap.jar:/tmp/apache-tomcat-8.5.11/bin/tomcat-juli.jar
./bin/catalina.sh: 1: exec: /incorrect/path/bin/java: not found
testtomcat@jturner-desktop:~/apache-tomcat-8.5.11$ 

All find and dandy.

Now let's run Tomcat via systemd. Create a minimal service file as root and load it:

# cat > /etc/systemd/system/testtomcat.service <<EOF
[Service]
Type=simple
User=testtomcat
ExecStart=/tmp/apache-tomcat-8.5.11/bin/catalina.sh run
EOF
# systemctl daemon-reload

(an aside: see https://jdebp.eu/FGA/systemd-house-of-horror/tomcat.html for why we call catalina.sh run from systemd instead of startup.sh) 

Start it, and see the status:

# systemctl start testtomcat
# systemctl status testtomcat -n5
● testtomcat.service
   Loaded: loaded (/etc/systemd/system/testtomcat.service; static; vendor preset: enabled)
   Active: active (running) since Tue 2017-01-24 12:43:18 AEDT; 47s ago
 Main PID: 30648 (java)
    Tasks: 33
   Memory: 109.2M
      CPU: 3.979s
   CGroup: /system.slice/testtomcat.service
           └─30648 /usr/bin/java -Djava.util.logging.config.file=/tmp/apache-tomcat-8.5.11/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2

Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.173 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory /tmp/apache-to
Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.182 INFO [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory /tmp/apach
Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.184 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler [http-nio-8180]
Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.187 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler [ajp-nio-8109]
Jan 24 12:43:20 jturner-desktop catalina.sh[30648]: 24-Jan-2017 12:43:20.188 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 649 ms


Happiness. All working normally.

Now let's introduce an error. Any error that prevents Tomcat startup will do (e.g. incorrect permissions), but let's go with our invalid JAVA_HOME example:

cat > /etc/systemd/system/testtomcat.service <<EOF
[Service]
Type=simple
User=testtomcat
Environment=JAVA_HOME=/incorrect/path
ExecStart=/tmp/apache-tomcat-8.5.11/bin/catalina.sh run
EOF
# systemctl daemon-reload
# systemctl restart testtomcat
# systemctl status testtomcat -n5
● testtomcat.service
   Loaded: loaded (/etc/systemd/system/testtomcat.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-01-24 12:44:40 AEDT; 6s ago
  Process: 30846 ExecStart=/tmp/apache-tomcat-8.5.11/bin/catalina.sh run (code=exited, status=127)
 Main PID: 30846 (code=exited, status=127)

Jan 24 12:44:40 jturner-desktop systemd[1]: Started testtomcat.service.
Jan 24 12:44:40 jturner-desktop systemd[1]: testtomcat.service: Main process exited, code=exited, status=127/n/a
Jan 24 12:44:40 jturner-desktop systemd[1]: testtomcat.service: Unit entered failed state.
Jan 24 12:44:40 jturner-desktop systemd[1]: testtomcat.service: Failed with result 'exit-code'.


Unhappiness, as expected, but note one important detail: the actual error "./bin/catalina.sh: 1: exec: /incorrect/path/bin/java: not found" is missing in the logs.

Again, the above is just an example. The last log message from ANY failure, generated by scripts or Tomcat itself, will be lost.

Here's what's going on. The logs are not actually 'lost', but just not "tagged" as belonging to the testtomcat unit (i.e. you can see them if you 'journalctl -f' before start). This is due to a systemd bug marked cant-fix:

https://github.com/systemd/systemd/issues/1347

which I was referred to when questioning this behaviour on stackoverflow:

https://superuser.com/questions/1169906/systemd-bug-stdout-not-tagged-as-belonging-to-unit-when-non-root-user-specifie

The systemd bug isn't going to be fixed soon, as it relies on kernel additions.

Here's what Tomcat can do. A simple workaround is to 'sleep' for a short period after emitting our last output, to give systemd a chance to tag it. Put this near the top of bin/catalina.sh:

# If running in systemd, give it a chance to tag the last output. https://github.com/systemd/systemd/issues/1347
trap 'sleep 0.1' INT TERM EXIT

and now when we 'systemctl status testtomcat' (or 'journalctl -u testtomcat') we see the final error message:

# systemctl restart testtomcat
# systemctl status testtomcat
● testtomcat.service
   Loaded: loaded (/etc/systemd/system/testtomcat.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-01-24 12:58:39 AEDT; 6s ago
  Process: 32326 ExecStart=/tmp/apache-tomcat-8.5.11/bin/catalina.sh run (code=exited, status=127)
 Main PID: 32326 (code=exited, status=127)

Jan 24 12:58:39 jturner-desktop systemd[1]: Started testtomcat.service.
Jan 24 12:58:39 jturner-desktop catalina.sh[32326]: /tmp/apache-tomcat-8.5.11/bin/catalina.sh: 1: exec: /incorrect/path/bin/java: not found
Jan 24 12:58:39 jturner-desktop systemd[1]: testtomcat.service: Main process exited, code=exited, status=127/n/a
Jan 24 12:58:39 jturner-desktop systemd[1]: testtomcat.service: Unit entered failed state.
Jan 24 12:58:39 jturner-desktop systemd[1]: testtomcat.service: Failed with result 'exit-code'.

I'm not a POSIX guru but AFAIK 'trap' and '/bin/sleep' are pretty vanilla. The choice of a 0.1s delay is not very scientific. It fixes the problem on my system.
Comment 1 Jeff Turner 2017-01-24 02:13:01 UTC
FYI I can't add an attachment - keep getting 'Permission denied' from attachment.cgi. Here's the suggested patch:


--- bin/catalina.sh.orig	2017-01-24 12:54:39.181880382 +1100
+++ bin/catalina.sh	2017-01-24 12:55:56.862144647 +1100
@@ -100,6 +100,9 @@
 #                   case the default is "true"
 # -----------------------------------------------------------------------------
 
+# If running in systemd, give it a chance to tag the last output. https://github.com/systemd/systemd/issues/1347
+trap 'sleep 0.1' INT TERM EXIT
+
 # OS specific support.  $var _must_ be set to either true or false.
 cygwin=false
 darwin=false
Comment 2 Mark Thomas 2017-01-25 20:39:39 UTC
I'd prefer to see this enabled with an environment variable (that should also set the length of the sleep). That way the default behaviour doesn't change, those that need to fix it have an option to do so and the delay can be set to an appropriate value for each system.
Comment 3 Jeff Turner 2017-01-26 12:09:32 UTC
> That way the default behaviour doesn't change, those that need to fix it have an option to do so and the delay can be set to an appropriate value for each system.

Okay, how about:

# If started with systemd, add 'Environment=CATALINA_EXIT_DELAY=0.1' to your service file to prevent systemd losing the last few log messages.  See https://github.com/systemd/systemd/issues/1347
if [ -n "${CATALINA_EXIT_DELAY:-}" ]; then
        trap 'sleep ${CATALINA_EXIT_DELAY}' INT EXIT TERM
fi
Comment 4 Mark Thomas 2017-02-13 16:41:21 UTC
Thanks for the suggestion.

The new environment variable will be included in:
- trunk for 9.0.0.M18 onwards
- 8.5.x for 8.5.12 onwards
- 8.0.x for 8.0.42 onwards
- 7.0.x for 7.0.76 onwards
Comment 5 Konstantin Kolinko 2017-02-14 11:42:58 UTC
Re: r1782805

+if [ -n "${CATALINA_EXIT_DELAY:-}" ]; then
+    trap 'sleep ${CATALINA_EXIT_DELAY}' INT EXIT TERM
+fi

1. I suspect that "trap" won't see the value of CATALINA_EXIT_DELAY variable.

I think that you either need double quotes there (so that variable expansion occurs), or add a line to export the variable, e.g.

export CATALINA_EXIT_DELAY


2. I think that whatever sets the CATALINA_EXIT_DELAY variable, can also execute a "trap" command.

E.g. you can do it in bin/setenv.sh

In the systemd example, I suspect that you can write several commands (connected with '&&') in the ExecStart line.

The original issue concerns all programs, started with systemd, so you have to patch all other services that you are starting, not only Tomcat?

Why a Tomcat-specific patch?


Overall, an interesting feature, but
- if someone configures a significant delay, it will cause concerns
- it is a bit of a hack

3.
+#   CATALINA_EXIT_DELAY
+#                   (Optional) If started with systemd, add
+#                   'Environment=CATALINA_EXIT_DELAY=0.1' to your service file
+#                   to prevent systemd losing the last few log messages.
+#                   See https://github.com/systemd/systemd/issues/1347

1) POSIX says that sleep expects an integer number.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sleep.html

GNU implementation of sleep (in Linux) allows a floating point number and allows suffixes (s m h d).

2) Maybe reference this bug as well.
At least it this bug has an example of how Tomcat can be run with systemd.
Comment 6 Jeff Turner 2017-02-15 03:43:29 UTC
Konstantin, thanks for reviewing. You caused me to test and think it over some more. This patch is indeed useless, and should be removed (sorry Mark!).

The reasons are:

- This patch doesn't help for logs generated by Java/Tomcat/webapps (I thought it would, but never actually tested). That's because 'catalina.sh run' calls 'exec', which replaces the entire bash process, and so the trap is never called. Besides which, it seems that the worry of systemd losing Java output is theoretical, at least on my system. In practice JVM shutdown is sufficiently slow that all output, even from shutdown hooks, is actually caught by systemd [1].

- Making the delay opt-in means most systemd users won't benefit from it automatically. If something must be explicitly done, why not, as Konstantin points out, apply the workaround directly in the systemd service file:

ExecStart=/bin/sh -c '/tmp/apache-tomcat-8.5.11/bin/catalina.sh run; sleep 0.1'


Regards,
Jeff


[1] To verify that the last Java stdout is actually caught by systemd:

cd /tmp
cat > PrintExit.java <<EOF
public class PrintExit {
	public static void main(String[] args) throws Throwable {

		Runtime.getRuntime().addShutdownHook(new Thread() {
			public void run() {
				System.out.println("This is the very last thing printed before shutdown"); 
			}
		});
		java.util.concurrent.TimeUnit.MINUTES.sleep(1000);
		System.out.println("Application Terminating ...");
	}
}
EOF
javac PrintExit.java
cat > /etc/systemd/system/printexit.service <<EOF
[Service]
Type=simple
User=root
WorkingDirectory=/tmp
ExecStart=/usr/bin/java PrintExit
EOF
systemctl daemon-reload
journalctl -u printexit -f &
systemctl start printexit
# Feb 15 14:32:27 jturner-desktop systemd[1]: Started printexit.service.
systemctl stop printexit
# Feb 15 14:32:30 jturner-desktop systemd[1]: Stopping printexit.service...
# Feb 15 14:32:30 jturner-desktop java[30981]: This is the very last thing printed before shutdown
# Feb 15 14:32:30 jturner-desktop systemd[1]: printexit.service: Main process exited, code=exited, status=143/n/a
# Feb 15 14:32:30 jturner-desktop systemd[1]: Stopped printexit.service.
# Feb 15 14:32:30 jturner-desktop systemd[1]: printexit.service: Unit entered failed state.
# Feb 15 14:32:30 jturner-desktop systemd[1]: printexit.service: Failed with result 'exit-code'.


(In reply to Konstantin Kolinko from comment #5)
> Re: r1782805
>
Comment 7 Mark Thomas 2017-02-15 09:21:48 UTC
(In reply to Jeff Turner from comment #6)
> This patch is indeed useless, and should be removed (sorry Mark!).

No problem. I'll revert those changes and then close this as WONTFIX.
Comment 8 Mark Thomas 2017-02-15 09:43:09 UTC
Changes reverted. There were never included in a release.