Bug 54415 - Please tell the root cause of mutex and scoreboard generation failure!
Summary: Please tell the root cause of mutex and scoreboard generation failure!
Status: NEW
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: Core
Version: 2.4.3
Hardware: PC Linux
Importance: P2 enhancement
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2013-01-14 09:16 UTC by Jackie Zhang
Modified: 2018-01-17 14:38 UTC
CC List: 2 users



Attachments
To pinpoint the root cause of the scoreboard generation error (1.94 KB, patch)
2013-01-16 06:20 UTC, Jackie Zhang
Articulate the root cause of mutex generation failure (507 bytes, patch)
2013-01-16 06:29 UTC, Jackie Zhang
Articulate the root cause of mutex generation failure (updated!) (662 bytes, patch)
2013-01-17 18:47 UTC, Jackie Zhang

Description Jackie Zhang 2013-01-14 09:16:30 UTC
Hi, Apache httpd,

I experienced the following mutex generation error:

[core:emerg] [pid 30875:tid 140406146557696] (28)No space left on device: AH00023: Couldn't create the rewrite-map mutex
285 AH00016: Configuration Failed

It's weird because I don't have any "rewrite-map" related configuration in my httpd.conf. I don't even know what a rewrite-map mutex is or what it's used for. Also, I checked my disk space and filesystem quota, and I have plenty of space.

I tried hard to modify my httpd.conf to make it work, but failed. Then I searched the Internet and figured out that the cause was orphaned semaphores left behind by an unclean shutdown.
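
For context, orphaned System V semaphores count against kernel-wide limits. On Linux these limits are exposed as four numbers in /proc/sys/kernel/sem (SEMMSL, SEMMNS, SEMOPM, SEMMNI). Below is a minimal C sketch for reading them; the struct and function names are hypothetical and are not part of httpd.

```c
#include <stdio.h>

/* Linux SysV semaphore limits, in the order used by /proc/sys/kernel/sem.
 * The struct and function names here are hypothetical, for illustration. */
struct sem_limits {
    unsigned long semmsl; /* max semaphores per set */
    unsigned long semmns; /* max semaphores system-wide */
    unsigned long semopm; /* max operations per semop() call */
    unsigned long semmni; /* max semaphore sets system-wide */
};

/* Parses one line such as "250 32000 32 128".
 * Returns 0 on success, -1 if fewer than four numbers were found. */
int parse_sem_limits(const char *line, struct sem_limits *out)
{
    int n = sscanf(line, "%lu %lu %lu %lu",
                   &out->semmsl, &out->semmns, &out->semopm, &out->semmni);
    return n == 4 ? 0 : -1;
}

/* Reads the limits from the running kernel (Linux only).
 * Returns 0 on success, -1 if the file is missing or malformed. */
int read_sem_limits(struct sem_limits *out)
{
    char buf[128];
    FILE *f = fopen("/proc/sys/kernel/sem", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_sem_limits(buf, out);
}
```

When the number of existing semaphore sets reaches SEMMNI, or the total number of semaphores reaches SEMMNS, semget() fails with ENOSPC, which is reported as "No space left on device".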

But I was surprised to see how many people have experienced the same problem. Here are a few, picked at random:

http://comments.gmane.org/gmane.linux.uml.user/14270
http://rackerhacker.com/2007/08/24/apache-no-space-left-on-device-couldnt-create-accept-lock/
http://forum.directadmin.com/showthread.php?t=43938&page=1
http://blog.mohammadzadeh.info/index.php/apache-no-space-left-on
http://linuxwindowsmaster.com/fixing-apache-%E2%80%9Cno-space-left-on-device-couldn%E2%80%99t-create-accept-lock%E2%80%9D-errors/
(and many many others)
 
Some users even suspected this was a bug and filed a Bugzilla report:
https://issues.apache.org/bugzilla/show_bug.cgi?id=26265

Reading these posts, we can see that the message confused and even misled users into checking "disk space", "quota limit", etc. (the same as I did), and that it "took me several hours", left them "completely stumped", etc. (also the same for me).

I strongly suggest making the message more explicit and useful, so that it points users directly at their semaphore limit. Most users, including me, do not know what a mutex failure means.
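
A minimal C sketch of the kind of hint suggested here, assuming the mutex is backed by System V semaphores, where semget() returns ENOSPC once the SEMMNI or SEMMNS limit would be exceeded. The helper name mutex_failure_hint is hypothetical and is not taken from httpd or from the attached patches.

```c
#include <errno.h>

/* Hypothetical helper: returns an extra hint to append to a mutex
 * creation failure message, or "" when no specific hint applies. */
const char *mutex_failure_hint(int err)
{
    switch (err) {
    case ENOSPC:
        /* For semget(2), ENOSPC means a semaphore limit was hit,
         * not that a disk or quota is full. */
        return "system semaphore limit reached; list semaphore sets with "
               "'ipcs -s' and remove orphaned ones with 'ipcrm -s <semid>'";
    default:
        return "";
    }
}
```

The hint would only supplement the existing errno text, so other failure causes keep their current message.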

---------

The same applies to scoreboard creation failures. I used to get the following error message, which took me a lot of time to fix:

[core:crit] [pid 15657:tid 140370438330112] (12)Cannot allocate memory: AH00004: Unable to create or access scoreboard (anonymous shared memory failure)

Similarly, I never configured anything scoreboard-related, and I don't know what a scoreboard is or what it is used for.

(Searching the Internet, you can also find many users with the same problem!)

It's not intuitive to users why setting an upper limit would cause a shared memory problem if they do not understand the scoreboard mechanism used for IPC.

So, I suggest telling users something like "Reduce the ServerLimit and ThreadLimit settings, or clean the shared memory to increase the limit" in addition to the scoreboard creation failure message.

Thank you very much!

Best regards,
Jackie Zhang
Comment 1 Jackie Zhang 2013-01-14 19:07:05 UTC
Hi, Eric, 

I noticed the severity was changed from "major" to "enhancement".

OK, let me see whether I have time to help with these cases. I guess it's easy to modify the log message, but I have to make sure it's correct under all the abnormal cases.

Best,
Jackie
Comment 2 Jackie Zhang 2013-01-16 06:20:26 UTC
Created attachment 29856
To pinpoint the root cause of the scoreboard generation error
Comment 3 Jackie Zhang 2013-01-16 06:21:22 UTC
Hi, 

This is the patch for the "scoreboard" problem.

The basic idea is to remind users to check the two relevant configuration options, i.e., "ServerLimit" and "ThreadLimit". These two directives determine the scoreboard size; cf. "AP_DECLARE(int) ap_calc_scoreboard_size(void)" in "server/scoreboard.c".
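
To illustrate why these two directives matter, here is a rough C sketch of how the scoreboard size grows. It assumes one global record, one per-process record for each ServerLimit slot, and one per-worker record for each ServerLimit x ThreadLimit slot. The record sizes and the function name are hypothetical placeholders; the real sizes come from the httpd structures and vary by build.

```c
#include <stddef.h>

/* Hypothetical record sizes in bytes, for illustration only. */
enum {
    GLOBAL_RECORD_BYTES  = 64,
    PROCESS_RECORD_BYTES = 32,
    WORKER_RECORD_BYTES  = 256
};

/* Estimates the shared memory needed for the scoreboard.
 * The worker term grows with server_limit * thread_limit, so raising
 * either limit multiplies the request. */
size_t estimate_scoreboard_bytes(size_t server_limit, size_t thread_limit)
{
    size_t size = GLOBAL_RECORD_BYTES;
    size += PROCESS_RECORD_BYTES * server_limit;
    size += WORKER_RECORD_BYTES * server_limit * thread_limit;
    return size;
}
```

Because the worker term is a product, a large value for either directive can push the request past what the system allows for a shared memory segment, even if only a few workers ever run.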

Also, I intentionally point out the difference between the anonymous scoreboard, which is in memory, and the file-based scoreboard, which is on disk.

Thanks,
Jackie
Comment 4 Jackie Zhang 2013-01-16 06:29:00 UTC
Created attachment 29857
Articulate the root cause of mutex generation failure
Comment 5 Jackie Zhang 2013-01-16 06:29:49 UTC
Hi, this patch is for the mutex one. Hope it makes sense.

Thanks,
Jackie
Comment 6 Mike Rumph 2013-01-16 22:07:14 UTC
Hello Jackie,

It looks like the patch to "Articulate the root cause of mutex generation failure" will need a small correction:

Because of the added field in line 394, the format string in line 390 will require an additional %s.
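
As a generic illustration of this kind of mismatch (not the patched httpd code), the C sketch below keeps the number of %s conversions equal to the number of string arguments. The function name and message text are hypothetical.

```c
#include <stdio.h>

/* Hypothetical message builder. The format has two %s conversions for
 * the two string arguments. If the hint argument were added without a
 * second %s, printf-style functions would ignore the extra argument and
 * the hint would never appear in the output. */
int format_mutex_error(char *buf, size_t len,
                       const char *type, const char *hint)
{
    return snprintf(buf, len, "Couldn't create the %s mutex (%s)",
                    type, hint);
}
```

Compilers can only flag such mismatches when format checking is enabled (for example GCC's -Wformat, which is part of -Wall) and the called function is known to take a printf-style format.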

Thanks,

Mike
Comment 7 Jackie Zhang 2013-01-17 18:47:06 UTC
Created attachment 29863
Articulate the root cause of mutex generation failure (updated!)

Yes, exactly, Mike!
Thanks a lot for the sharp catch. GCC kept silent this time :-(

Here's the refined one.

Thanks a lot!
Jackie
Comment 8 nada 2018-01-17 14:38:39 UTC
The logs are really misleading. Applying the proposed patches would be highly appreciated.