Summary: pthread mutexes are leaking?
Component: Core
Assignee: Apache HTTPD Bugs Mailing List <bugs>
Description mark 2018-12-20 09:49:52 UTC
Using the Weblogic Webserver plugin as a proxy module to the Weblogic Webserver plugin, we frequently see the following errors from the module in the logs:

[Mon Dec 17 08:24:13.857204 2018] [weblogic:error] [pid 3241716:tid 140206280066816] [client 18.104.22.168:0] couldn't acquire p_lock
[Mon Dec 17 08:24:13.857238 2018] [weblogic:error] [pid 3241716:tid 140206280066816] [client 22.214.171.124:0] <3241716154503505326814> *******Exception type [NO_RESOURCES] (countn't acquire p_lock) raised at line 2872 of ApacheProxy.cpp
[Mon Dec 17 08:24:13.857314 2018] [weblogic:error] [pid 3241716:tid 140206280066816] [client 126.96.36.199:0] ap_proxy: trying GET /<redacted>/!@8ec1ade37d9b45a9e45d9d0c5a7c3618! at backend host 188.8.131.52/24101; got exception 'NO_RESOURCES: [os error=115, line 2872 of ApacheProxy.cpp]: countn't acquire p_lock'; state: preparing request headers (wrote? N read? N); not failing over

Oracle have fixed this error in their OHS (Oracle HTTP Server), which is a fork of Apache (presumably 2.4), with the following reference:

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=446068623843222&parent=BUG_MATRIX&sourceId=28214157&id=2466345.1&_afrWindowMode=0&_adf.ctrl-state=3o8lctsrg_155

and the following cryptic note:

"There was an issue attaching the process lock to each child thread this has been corrected by the following bug: Bug 28214157 - OHS STOPS PROCESSING WEBSOCKET REQUESTS WITH ERROR (COUNTN'T ACQUIRE P_LOCK)"

This fix was published very recently, on 5 Nov 2018. Now that we are facing this problem, the Oracle support team are suggesting that the Apache maintainers will need to apply the same fix. Unfortunately, I don't know what the fix is in any detail.

This error is characteristic of a pthread mutex leak, that is, another lock being acquired on the same mutex with each thread entry. A little test C program shows that this error is only reached if the same mutex is locked 2^32-1 times and then once more, so there is evidently a 32-bit counter tracking the locks.

I honestly cannot tell if this leak is in the plugin or in the Apache core. It seems more reasonable to assume the leak is in the plugin, but Oracle support have directed me to the Apache maintainers. I will attempt to clarify where they made the fix, but I thought I would raise this issue here as well, for information if nothing else.
Comment 1 mark 2018-12-20 09:53:04 UTC
And to clarify, does anybody on the httpd team understand that comment "There was an issue attaching the process lock to each child thread" and is it obvious where you might review the httpd child code?
Comment 2 Yann Ylavic 2018-12-20 10:15:30 UTC
(In reply to mark from comment #1)
> "There was an issue attaching the process lock to each child thread"
>
> and is it obvious where you might review the httpd child code?

I "think" it relates to a missing apr_proc_mutex_child_init(), presumably in their module.

Possibly (still) Oracle applied r1738793 (from APR), and while previously apr_proc_mutex_child_init() was a no-op for pthread mutexes, it is now required for refcounting to work.

Plenty of assumptions here, I know nothing of weblogic and your report only shows things related to it...
Comment 3 mark 2018-12-20 10:34:38 UTC
Great feedback and fast, thank you very much. I will relay your comments to Oracle.
Comment 4 mark 2019-01-21 10:46:51 UTC
FWIW, we have resolved our issues with excess CPU consumption and mutex leaks by applying Oracle update Patch 27762852: TRACKING BUG FOR BUG 27207688 TO PROVIDE PATCH ON APACHE WLPLUGIN