Bug 55897 - [PATCH]patch with SO_REUSEPORT support
Summary: [PATCH]patch with SO_REUSEPORT support
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: All
Version: 2.5-HEAD
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords: FixedInTrunk
Duplicates: 56279
Depends on:
Blocks:
 
Reported: 2013-12-17 16:01 UTC by Yingqi.Lu
Modified: 2015-09-28 08:27 UTC
CC List: 2 users



Attachments
[PATCH]prefork_mpm patch with SO_REUSEPORT support (25.00 KB, text/plain)
2013-12-17 16:01 UTC, Yingqi.Lu
[PATCH]prefork_mpm patch with SO_REUSEPORT support (25.08 KB, patch)
2013-12-17 16:42 UTC, Yingqi.Lu
[PATCH]prefork_mpm patch with SO_REUSEPORT support (25.17 KB, patch)
2013-12-17 17:39 UTC, Yingqi.Lu
APR patch for SO_REUSEPORT support (2.65 KB, patch)
2013-12-17 17:43 UTC, Yingqi.Lu
[PATCH]prefork_mpm patch with SO_REUSEPORT support (27.68 KB, patch)
2014-01-06 09:59 UTC, Yingqi.Lu
[PATCH]prefork_mpm patch with SO_REUSEPORT support (27.55 KB, patch)
2014-01-24 22:03 UTC, Yingqi.Lu
[PATCH]prefork_mpm patch with SO_REUSEPORT support (18.55 KB, patch)
2014-03-17 20:26 UTC, Yingqi.Lu
patch with SO_REUSEPORT support (51.22 KB, patch)
2014-05-13 19:50 UTC, Yingqi.Lu
patch with SO_REUSEPORT support (51.18 KB, patch)
2014-05-16 18:47 UTC, Yingqi.Lu
[PATCH]patch with SO_REUSEPORT support (committed 2014-06-03, see comment 14) (56.15 KB, patch)
2014-06-02 08:17 UTC, Yingqi.Lu
[PATCH]patch with SO_REUSEPORT support (6.73 KB, patch)
2014-10-05 06:02 UTC, Yingqi.Lu
[PATCH]patch with SO_REUSEPORT support (6.73 KB, patch)
2014-10-05 18:13 UTC, Yingqi.Lu
[PATCH]incremental patch with SO_REUSEPORT support (to be applied on top of attachment 31681) (14.47 KB, patch)
2014-10-05 21:40 UTC, Yingqi.Lu

Description Yingqi.Lu 2013-12-17 16:01:25 UTC
Created attachment 31124 [details]
[PATCH]prefork_mpm patch with SO_REUSEPORT support

Issues we have found:
Our analysis of the Apache httpd 2.4.7 prefork MPM on 32- and 64-thread Intel Xeon 2600 series systems, using an open source three-tier social networking web server workload, revealed performance scaling issues. In the current software, a single listen statement (listen 80) provides better scalability because accept is not serialized. However, when the system is under very high load, this can leave a large number of child processes stuck in D state. On the other hand, the serialized accept approach cannot scale with high load either. In our analysis, a 32-thread system with 2 listen statements specified could scale to just 70% utilization, and a 64-thread system with a single listen statement specified (listen 80, 4 network interfaces) could scale to only 60% utilization.

What we have changed:
Based on those findings, we created a prototype patch for the prefork MPM which improves performance and thread utilization. Linux kernels 3.9 and newer support SO_REUSEPORT. This feature allows multiple sockets to listen on the same IP:port, and the kernel automatically distributes incoming connections among them. We use this feature to create 4 duplicates of the original listener record and partition the child processes into 4 buckets. Each bucket listens on 1 IP:port. A mutex ensures that only 1 child wakes up when a request comes in. For older kernels that do not support SO_REUSEPORT, we modified the "multiple listen statement" case by creating 1 listen record for each listen statement and partitioning the child processes into different buckets. Each bucket listens on 1 IP:port.
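
The partitioning of children into buckets described above can be sketched roughly as follows. This is an illustration only, not code from the attached patch: struct bucket, bucket_for_child() and partition_children() are hypothetical names, and the round-robin assignment of child slots to buckets is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical per-bucket record: one duplicated listener plus the
 * number of children assigned to it. Not taken from the patch. */
struct bucket {
    int listen_fd;     /* descriptor of this bucket's listener, set elsewhere */
    int num_children;  /* how many child processes serve this bucket */
};

/* Assumption: children are spread over buckets round-robin by slot index. */
int bucket_for_child(int child_slot, int num_buckets)
{
    return child_slot % num_buckets;
}

/* Count how many of num_children land in each of the num_buckets buckets. */
void partition_children(struct bucket *buckets, int num_buckets,
                        int num_children)
{
    for (int b = 0; b < num_buckets; b++)
        buckets[b].num_children = 0;

    for (int c = 0; c < num_children; c++)
        buckets[bucket_for_child(c, num_buckets)].num_children++;
}
```

With 4 buckets and 10 children, this assignment puts 3 children in each of the first two buckets and 2 in each of the remaining two, so no bucket is left without a child as long as there are at least as many children as buckets.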

In the current work, we added SO_REUSEPORT support to APR 1.5.0 and filed a small patch for this as Bugzilla bug 55894 (patch for SO_REUSEPORT). To review this patch, please apply the APR patch as well.

Testing results:
Quick tests of the patch, running the same workload, demonstrated a 22% throughput increase with 32 threads and 2 listen statements (Linux kernel 3.10.4). With the older kernel (Linux kernel 3.8.8, without SO_REUSEPORT), a 10% performance gain was measured. With 64 threads, a 60% throughput increase was achieved with just 1 listen statement (listen 80) and 1 active IP [1] (Linux kernel 3.12.5). We also observed a large reduction in response time, in addition to the throughput improvement gained [1] in our tests.

This is our first patch to the Apache community (besides the small APR change). Please help us review it and let us know if there is anything we might revise to improve it. Your feedback is very much appreciated.

Configuration:
<IfModule prefork.c>
    ListenBacklog 105384
    ServerLimit 105000
    MaxClients 1024
    MaxRequestsPerChild 0
    StartServers 64
    MinSpareServers 8
    MaxSpareServers 16
</IfModule>

1. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.  Configurations: Xeon 2-socket 2680 server with 16x8GB DDR3-1333 and 4-socket Xeon 4650 server with 48x4GB DDR3-1066 (running at 1.3GHz), shown with workload as tested by Intel November 2013.
Comment 1 Yingqi.Lu 2013-12-17 16:42:40 UTC
Created attachment 31125 [details]
[PATCH]prefork_mpm patch with SO_REUSEPORT support
Comment 2 Yingqi.Lu 2013-12-17 17:39:55 UTC
Created attachment 31126 [details]
[PATCH]prefork_mpm patch with SO_REUSEPORT support

The patch file is updated as this version.
Comment 3 Yingqi.Lu 2013-12-17 17:43:29 UTC
Created attachment 31127 [details]
APR patch for SO_REUSEPORT support

This is the APR patch adding SO_REUSEPORT to apr_socket_opt_set(). It can also be found at Bug 55894. 

Thanks!
Comment 4 Yingqi.Lu 2014-01-06 09:59:11 UTC
Created attachment 31171 [details]
[PATCH]prefork_mpm patch with SO_REUSEPORT support

This is the most recent version of the patch.

In order to use the httpd patch, please apply the apr patch first.

Thanks!
Comment 5 Jeff Trawick 2014-01-06 12:18:53 UTC
Was this change tested with event or worker MPMs?  Is the prefork MPM desirable in this scenario due to the use of mod_php or some other third-party module, or is there an issue with bundled modules that necessitates the use of prefork, or is there some other reason for prefork?

FWIW, you don't actually need an APR change to make the desired setsockopt call.  Call apr_os_sock_get() to get the file descriptor and call setsockopt directly.  That would make it easier for others to use the patch with existing builds of APR.
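
A minimal sketch of that suggestion, assuming a POSIX system. The helper name enable_reuseport_on_fd() is hypothetical, and the APR step is shown only in a comment so that the example compiles without APR headers; on Unix, apr_os_sock_get() hands back the underlying file descriptor.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: set SO_REUSEPORT on an already-created socket
 * descriptor. Returns 0 on success or an errno value on failure.
 *
 * In httpd the descriptor would first be obtained from the APR socket,
 * roughly like this (shown as a comment, not compiled here):
 *
 *     apr_os_sock_t fd;
 *     apr_os_sock_get(&fd, sock);    sock is an apr_socket_t pointer
 */
int enable_reuseport_on_fd(int fd)
{
#ifdef SO_REUSEPORT
    int one = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0)
        return errno;
    return 0;
#else
    /* The build headers do not know the option at all. */
    (void)fd;
    return ENOPROTOOPT;
#endif
}
```

The option has to be set before bind() for it to affect how the port is shared, so a caller would apply this right after the socket is created.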
Comment 6 Yingqi.Lu 2014-01-06 17:14:48 UTC
Hi Jeff,

Thanks very much for your response! 

Yes, we chose to use the prefork MPM due to the use of libphp5.so (non-zts). We tested the zts version as well, but it showed some performance issues, so we decided to try the patch on the prefork MPM first.

We can surely extend this patch to the worker and event MPMs. We will use fcgi instead of libphp5.so for testing. Also, we will follow your suggestion to call apr_os_sock_get() in the follow-up version of the patch.

We will update this thread soon.

Thanks!

Yingqi



Comment 7 Yingqi.Lu 2014-01-24 22:03:56 UTC
Created attachment 31253 [details]
[PATCH]prefork_mpm patch with SO_REUSEPORT support
Comment 8 Yingqi.Lu 2014-01-24 22:11:06 UTC
In this newer version of the patch, we removed the dependency on the APR change by calling apr_os_sock_get() and setsockopt() directly in listen.c. Now you only need the attached httpd patch file to test the patch. Jeff, thanks very much for your suggestion!

Since the last update, we also spent some time testing this prefork patch, and we observed over 2X performance improvements on modern Intel dual-socket platforms.

We are still actively working on extending this patch to the worker and event MPMs. Meanwhile, we would like to gather your feedback and comments on the current prefork patch. Please take some time to test it and let us know how it works in your environment.

Our configuration is:
<IfModule prefork.c>
    ListenBacklog 105384
    ServerLimit 105000
    MaxClients 1024
    MaxRequestsPerChild 0
    StartServers 64
    MinSpareServers 8
    MaxSpareServers 16
</IfModule>

1. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Thanks,
Yingqi

Comment 9 Yingqi.Lu 2014-03-17 20:25:39 UTC
Dear all,

Based on the feedback we received, we modified this patch. Here is the most recent version. 

Below are the changes we made into this new version:

1. We split the original patch into two separate patches, one with SO_REUSEPORT and one without. The SO_REUSEPORT patch (the current patch) does not change the original listen sockets; it just duplicates the original one into multiple ones. Since the listen sockets are identical, there is no need to change idle_server_maintenance. The bucket patch (without SO_REUSEPORT, bugzilla #56279), on the other hand, breaks the original listen record down into multiple listen record linked lists (if there are multiple listen sockets). In this case, idle_server_maintenance is implemented at bucket level to handle imbalanced traffic among the different listen sockets/child buckets. In the bucket patch, the polling in the child process is removed since each child listens on only 1 socket.

2. We detect SO_REUSEPORT support at run time.

3. The current patch is generated against the httpd-trunk.
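
One way such a run-time check could look is sketched below. This is an assumption about the general technique, not the detection code from the patch, and have_so_reuseport() is a hypothetical name: the probe creates a throwaway socket and asks the running kernel to accept the option, which catches the case where httpd was built against newer headers but runs on an older kernel.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if the running kernel accepts
 * SO_REUSEPORT on a TCP socket, 0 otherwise. */
int have_so_reuseport(void)
{
#ifndef SO_REUSEPORT
    /* Build-time headers do not define the option. */
    return 0;
#else
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    int ok;

    if (fd < 0)
        return 0;

    /* An older kernel rejects the unknown option; a newer one accepts it. */
    ok = (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == 0);
    close(fd);
    return ok;
#endif
}
```

A caller would typically run the probe once at startup and fall back to the non-SO_REUSEPORT code path when it returns 0.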

Again, thanks very much for all the comments and feedback. Please let us know if there are more changes we need to make to get the patches accepted.

Thanks,
Yingqi Lu
Comment 10 Yingqi.Lu 2014-03-17 20:26:44 UTC
Created attachment 31397 [details]
[PATCH]prefork_mpm patch with SO_REUSEPORT support
Comment 11 Yingqi.Lu 2014-05-13 19:50:21 UTC
Created attachment 31616 [details]
patch with SO_REUSEPORT support

This newer version of the patch extends the original prefork MPM patch to all three MPMs on Linux (prefork, worker and event).
Comment 12 Yingqi.Lu 2014-05-16 18:47:43 UTC
Created attachment 31632 [details]
patch with SO_REUSEPORT support

Based on the feedback from the developer community, this version of the patch checks if sysconf(_SC_NPROCESSORS_ONLN) is supported on the system.
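
A check of this kind might look roughly like the sketch below; online_cpu_count() is a hypothetical helper and the fallback value of 1 is an assumption, not necessarily what the patch does. It covers both the case where the _SC_NPROCESSORS_ONLN constant is missing at build time and the case where sysconf() reports an error at run time.

```c
#include <unistd.h>

/* Hypothetical helper: number of online CPU threads, or 1 when the
 * platform cannot report it. */
long online_cpu_count(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    /* sysconf() returns -1 when the name is unsupported or on error. */
    if (n > 0)
        return n;
#endif
    return 1;
}
```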
Comment 13 Yingqi.Lu 2014-06-02 08:17:18 UTC
Created attachment 31681 [details]
[PATCH]patch with SO_REUSEPORT support (committed 2014-06-03, see comment 14)

Based on the feedback received from the developer community, I kept ap_mpm_pod_signal() and ap_mpm_pod_killpg() exactly the same as the originals and modified dummy_connection() instead.
Comment 14 Kaspar Brand 2014-10-05 05:40:35 UTC
Attachment 31681 [details] was committed to trunk with minor tweaks as r1599531 (in early June 2014, see https://mail-archives.apache.org/mod_mbox/httpd-dev/201406.mbox/<D8167D13-996A-40FA-8EF7-281E71507118@jaguNET.com>).
Comment 15 Yingqi.Lu 2014-10-05 06:02:38 UTC
Created attachment 32079 [details]
[PATCH]patch with SO_REUSEPORT support

The attached patch is the fix for the restart/graceful restart issues. Many thanks to Kaspar Brand for the feedback!

The patch is based on httpd trunk r1629441. The changes are:

1. Fix the graceful restart issue for prefork/worker/event MPM. 
2. Fix the "server seems busy" and "scoreboard is full" issue on restart for both worker and event MPM. Prefork does not have this issue.
3. Ensure that ap_daemons_to_start >= num_buckets.
4. Change the CPU thread count check from _SC_NPROCESSORS_ONLN to _SC_NPROCESSORS_CONF. This keeps num_buckets constant for as long as the system is running. This change addresses use cases such as a user taking some of the CPU threads offline and then restarting httpd. In this case, I think we need to make sure num_buckets does not change across the restart.
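
Items 3 and 4 can be illustrated with the sketch below. It is not the patch's code: compute_num_buckets(), clamp_daemons_to_start() and the cpus_per_bucket parameter are hypothetical, and the idea of deriving the bucket count by dividing the CPU count by a fixed ratio is an assumption made for the example. The cpus argument stands for the value that sysconf(_SC_NPROCESSORS_CONF) would return, which stays the same when CPU threads are taken offline.

```c
/* Hypothetical: derive the bucket count from the configured CPU thread
 * count. cpus_per_bucket is an illustrative ratio, not a value taken
 * from the patch. Always returns at least 1. */
int compute_num_buckets(long cpus, int cpus_per_bucket)
{
    long n;

    if (cpus < 1 || cpus_per_bucket < 1)
        return 1;

    n = cpus / cpus_per_bucket;
    return n < 1 ? 1 : (int)n;
}

/* Item 3: never start fewer children than there are buckets, so that
 * every listener has at least one child accepting on it. */
int clamp_daemons_to_start(int daemons_to_start, int num_buckets)
{
    return daemons_to_start < num_buckets ? num_buckets : daemons_to_start;
}
```

For example, with 64 configured CPU threads and an assumed ratio of 8, the sketch yields 8 buckets, and a requested start count of 2 children would be raised to 8.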

Can someone please review the patch and help add it to trunk?

Thanks,
Yingqi Lu
Comment 16 Yingqi.Lu 2014-10-05 18:13:22 UTC
Created attachment 32081 [details]
[PATCH]patch with SO_REUSEPORT support

This is the most recent version of the fix. It fixes a small issue for event mpm in the version I sent out yesterday. Please use this one as the final fix.

Thanks,
Yingqi
Comment 17 Yingqi.Lu 2014-10-05 21:40:27 UTC
Created attachment 32082 [details]
[PATCH]incremental patch with SO_REUSEPORT support (to be applied on top of attachment 31681 [details])

Addressing the comments from Yann Ylavic, here is another update of the code, based on trunk r1629441.

Thanks, 
Yingqi
Comment 18 Kaspar Brand 2014-10-06 04:51:12 UTC
Comment on attachment 31681 [details]
[PATCH]patch with SO_REUSEPORT support (committed 2014-06-03, see comment 14)

Removing obsolete flag from the patch committed in early June (see comment 14).
Comment 19 Kaspar Brand 2014-10-06 04:54:30 UTC
Comment on attachment 32082 [details]
[PATCH]incremental patch with SO_REUSEPORT support (to be applied on top of attachment 31681 [details])

Clarifying patch description.
Comment 20 Yingqi.Lu 2014-10-06 04:57:17 UTC
Hi Kaspar,

Thank you very much for your help here!

Yingqi
Comment 21 Kaspar Brand 2014-10-11 09:31:26 UTC
Comment on attachment 32082 [details]
[PATCH]incremental patch with SO_REUSEPORT support (to be applied on top of attachment 31681 [details])

This incremental patch seems to have been committed with r1629909 (and followups), at least partly. Marking as obsolete.

For further discussion see:

https://mail-archives.apache.org/mod_mbox/httpd-dev/201410.mbox/%3CCAKQ1sVP6uSaMJw7%3Dj0893w24w%3D1%3DRmTSXexRDDLGNnt5Y-%3DiTA%40mail.gmail.com%3E

To reduce duplication/confusion, I suggest no longer attaching patches here and instead posting and discussing new code/patches on the dev mailing list only (as was done with https://mail-archives.apache.org/mod_mbox/httpd-dev/201410.mbox/%3C9ACD5B67AAC5594CB6268234CF29CF9AA37D3409@ORSMSX113.amr.corp.intel.com%3E yesterday).
Comment 22 Yann Ylavic 2015-01-23 10:27:02 UTC
*** Bug 56279 has been marked as a duplicate of this bug. ***
Comment 23 Yann Ylavic 2015-01-23 10:28:37 UTC
Backport to 2.4.x proposed in r1651967.
Comment 24 Yann Ylavic 2015-09-28 08:27:48 UTC
Backported to 2.4.17 in r1705492.