Bug 45464 - WebDav filesystem module is extremely slow
Summary: WebDav filesystem module is extremely slow
Status: RESOLVED FIXED
Alias: None
Product: Apache httpd-2
Classification: Unclassified
Component: mod_dav (show other bugs)
Version: 2.2.9
Hardware: PC Windows XP
: P3 major with 1 vote (vote)
Target Milestone: ---
Assignee: Apache HTTPD Bugs Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-23 00:46 UTC by Vitaly Polonetsky
Modified: 2008-10-15 12:58 UTC (History)
0 users



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vitaly Polonetsky 2008-07-23 00:46:42 UTC
The mod_dav_fs module is very slow when listing 500+ files in a directory.

We were using 2.0.55, after an upgrade to 2.2.8, the dav module became extemely slow. By 10+ times compared to 2.0.55.

I've tried the latest 2.0.63 but the slowness remained. After a bit of checks the first version with this "slowness" is 2.0.61 (on 2.0.59 there is no problem).
This bug remained up to 2.2.8 and 2.2.9 which are the latest stable releases.

Some info about the tests:
I've tested on Windows XP and Windows 2003 Server.
I've tried disabling all the modules which are not needed.
These are 500+ xml files that should be listed.

The dav module makes no change. I suppose there was a faulty change in 2.0.61 or a change that was not synchronized with mod_dav or mod_dav_fs module which causes this problem.

I have network traces for 2.0.55 and 2.2.8 but they look almost the same, only the network speed (number of packets in second) is much slower, as if something blocks apache from giving the fast results.

Please check the changes in 2.0.61 and their affect on mod_dav and mod_dav_fs module.
Comment 1 Jeff Trawick 2008-07-24 08:31:23 UTC
I compared 2059 vs. 2061 on OpenSolaris and didn't find any difference in CPU usage or apparent throughput.  I have a collection "col1" with 5000 entries and had cadaver run "ls col1" 20 times.

In case my testcase is bad, could you try to reproduce the problem with cadaver as I did?

cadaver http://your-dav-url
PROMPT>ls your-collection

I see a Windows-specific change in apr between those levels that might cause more system overhead: http://svn.apache.org/viewvc?view=rev&revision=425624

I didn't examine the code in enough detail to see if that extra system overhead would affect listing a collection.

Comment 2 Jeff Trawick 2008-07-24 11:26:42 UTC
For now, scratch my previous request to try with cadaver and try this patch instead:

Index: modules/dav/fs/repos.c
===================================================================
--- modules/dav/fs/repos.c	(revision 679477)
+++ modules/dav/fs/repos.c	(working copy)
@@ -1480,9 +1480,8 @@
         dav_buffer_place_mem(pool, &fsctx->path1, dirent.name, len + 1, 0);
 
 
-        /* ### Optimize me, dirent can give us what we need! */
         status = apr_stat(&fsctx->info1.finfo, fsctx->path1.buf,
-                          APR_FINFO_NORM | APR_FINFO_LINK, pool);
+                          APR_FINFO_TYPE | APR_FINFO_LINK, pool);
         if (status != APR_SUCCESS && status != APR_INCOMPLETE) {
             /* woah! where'd it go? */
             /* ### should have a better error here */

testing further with 2.0.61, I found a call to apr_*stat which would trigger the extra Windows system overhead in 2.0.61+:

wanted=7582064 in the apr_stat call means retrieve lots of information about the file, even though this code only needs to know the type (dir, regular file, etc.)

for each member of collection!

(dbx) where
current thread: t@1
=>[1] apr_stat(finfo = 0x8047594, fname = 0x82251a0 "/export/home/trawick/davfs/col1/testfile.3148", wanted = 7582065, pool = 0x81eb0b8), line 234 in "filestat.c"
  [2] apr_lstat(finfo = 0x8047594, fname = 0x82251a0 "/export/home/trawick/davfs/col1/testfile.3148", wanted = 7582064, pool = 0x81eb0b8), line 287 in "filestat.c"
  [3] dav_fs_walker(fsctx = 0x8047550, depth = 1), line 1458 in "repos.c"
  [4] dav_fs_internal_walk(params = 0x80476f0, depth = 1, is_move = 0, root_dst = (nil), response = 0x80476ec), line 1729 in "repos.c"
  [5] dav_fs_walk(params = 0x80476f0, depth = 1, response = 0x80476ec), line 1738 in "repos.c"
  [6] dav_method_propfind(r = 0x81eb0f0), line 2045 in "mod_dav.c"
  [7] dav_handler(r = 0x81eb0f0), line 4626 in "mod_dav.c"
  [8] ap_run_handler(0x81eb0f0, 0x0), at 0x8077866 
  [9] ap_invoke_handler(r = 0x81eb0f0), line 364 in "config.c"
  [10] ap_process_request(r = 0x81eb0f0), line 249 in "http_request.c"
  [11] ap_process_http_connection(c = 0x817b358), line 253 in "http_core.c"
  [12] ap_run_process_connection(0x817b358, 0x0), at 0x8084ff6 
  [13] ap_process_connection(c = 0x817b358, csd = 0x817b280), line 176 in "connection.c"
  [14] child_main(child_num_arg = 5), line 610 in "prefork.c"
  [15] make_child(s = 0x80d0168, slot = 5), line 704 in "prefork.c"
  [16] ap_mpm_run(_pconf = 0x80ce388, plog = 0x80fa438, s = 0x80d0168), line 1001 in "prefork.c"
  [17] main(argc = 3, argv = 0x8047978), line 623 in "main.c"

I tested essentially the same patch onto 2.0.61 on Open Solaris, but the extra overhead wasn't required on Unix anyway and performance didn't change noticeably.

There are likely some other apr_*stat() calls in dav/fs/repos.c that can be optimized.  For now please see how much this helps on Windows.
Comment 3 Vitaly Polonetsky 2008-07-25 10:16:20 UTC
First of all I have to say that this bug affects only Windows systems. Or at least my tests on ubuntu with their version of apache 2.2 it doesn't happen.

I've downloaded the source of 2.0.63. That file (modules/dav/fs/repos.c) was slightly older. In any case, I've changed the line:
        /* ### Optimize me, dirent can give us what we need! */
        status = apr_lstat(&fsctx->info1.finfo, fsctx->path1.buf, 
                           APR_FINFO_NORM, pool);

Changed the same APR_FINFO_NORM to APR_FINFO_TYPE.

Compiled twice, once with the original value, tested, and then compiled after the change and tested again.

Now the test and results:
I've set "Dav On" on the icons directory that comes with apache and copied the files there twice to multiply them (Copy of... , Copy of Copy of...).
That gave me a directory with 620 files.

With the original version of mod_dav_fs (comes with 2.0.63): 9sec.
With the original module recompiled (debug mode): 10sec.
With the patched module (debug mode): 3sec.
With the patched module (release mode): 0.5sec.

The tests were run with this command:
echo ls | time cadaver http://path to apache folder with 620 files/
The last result line is:
0.06user 0.01system 0:00.51elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+664minor)pagefaults 0swaps
Notice the 0:00.51 elapsed time.

If the change I've made is equivalent to your suggested change (I suppose so), then this is the fix for my bug.

If you need another tests just let me know.

Thank you.
Comment 4 Jeff Trawick 2008-07-25 11:31:32 UTC
>If the change I've made is equivalent to your suggested change (I suppose so)
Yes, that's the exact change I tested with 2.0.61.

>then this is the fix for my bug.
Great!  Thanks so much for narrowing down the release where the issue was introduced, and then testing the proposed fix on the affected platform.

I'll get the fix committed to trunk in the next couple of days then see how far back folks want to backport.
Comment 5 Jeff Trawick 2008-08-12 07:13:40 UTC
This is fixed in trunk and proposed for backport to 2.2.x.
Comment 6 Ruediger Pluem 2008-10-15 12:58:42 UTC
Backported to 2.2.10.