Apache OpenOffice (AOO) Bugzilla – Issue 32526
some findings about seeks with osl file functions
Last modified: 2010-07-26 19:47:47 UTC
Attaching some findings and a plausible patch for comment
Created attachment 16924 [details] strace -c of normal sal
Created attachment 16925 [details] strace -c of modified sal
Created attachment 16926 [details] what happens if sal uses stdio buffering
The patch (last attachment) modifies osl to use stdio buffering rather than direct file descriptor read/write, which provides an apparent modest improvement when reading the bootstrap files (see the straces from before and after the change, which log the syscalls used from start until the frame is visible on screen). Currently the osl readline implementation has no buffering between calls, resulting in a lot of potentially unnecessary seeking. cmc->sb: Do you think an approach like this provides a benefit? mmeeks/dcbw: Might be of interest to you.
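To make the buffering idea concrete, here is a minimal sketch of a line reader that keeps its buffer between calls, so no seek is needed to "give back" bytes read past the end of a line. It is not the attached patch and not the osl implementation; the names line_reader, lr_init and lr_readline are hypothetical, and it only assumes a POSIX file descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffered reader state; not the real osl/sal structure. */
typedef struct {
    int    fd;
    char   buf[4096];
    size_t pos;   /* index of the next unread byte in buf */
    size_t len;   /* number of valid bytes in buf */
} line_reader;

void lr_init(line_reader *r, int fd) {
    r->fd = fd;
    r->pos = 0;
    r->len = 0;
}

/* Copies one line (without the '\n') into out and NUL-terminates it.
 * Returns the line length, or -1 at end of file or on a read error.
 * Lines longer than cap-1 bytes are returned in pieces.
 * EINTR handling is omitted to keep the sketch short. */
ssize_t lr_readline(line_reader *r, char *out, size_t cap) {
    size_t n = 0;
    assert(cap > 1);
    for (;;) {
        if (r->pos == r->len) {
            /* Buffer exhausted: one read() refills it, no lseek() involved. */
            ssize_t got = read(r->fd, r->buf, sizeof r->buf);
            if (got < 0) return -1;
            if (got == 0) {
                if (n == 0) return -1;  /* EOF and nothing collected */
                break;                  /* last line without trailing '\n' */
            }
            r->pos = 0;
            r->len = (size_t)got;
        }
        char c = r->buf[r->pos];
        if (c == '\n') { r->pos++; break; }
        if (n + 1 == cap) break;        /* out is full; keep c for the next call */
        out[n++] = c;
        r->pos++;
    }
    out[n] = '\0';
    return (ssize_t)n;
}
```

Bytes that were read but not yet consumed stay in buf for the next call, so consecutive calls produce a plain run of read() syscalls.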
Caolan - the strace statistics (perhaps) hide the very common pattern [caused, I think, by daftness in various SvStream type APIs] (wrt. EOS testing) of: read (3, "fooo", 4); lseek (beginning); lseek (end); lseek (old_location); read (4, "fooo", 23); This is really dim, particularly since the kernel will only start doing readahead when it gets 3 consecutive reads [or a similarly dumb algorithm] with no seeks in between - so this blows away any readahead at a potentially large cost.
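As an illustration of the end-of-stream test described above, the sketch below contrasts a seek-based check with one that tracks the offset in user space. It is not code from SvStream or osl; tracked_file, tf_open, tf_read, tf_at_end and at_end_by_seeking are hypothetical names, and the tracked variant assumes a regular file that is not resized while it is open.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Seek-based test, similar in spirit to the pattern in the strace:
 * three lseek() calls for every "are we at the end?" question. */
int at_end_by_seeking(int fd) {
    off_t cur = lseek(fd, 0, SEEK_CUR);
    off_t end = lseek(fd, 0, SEEK_END);
    lseek(fd, cur, SEEK_SET);           /* restore the old location */
    return cur >= end;
}

/* Hypothetical alternative: remember size and position ourselves. */
typedef struct {
    int   fd;
    off_t pos;    /* bytes consumed so far */
    off_t size;   /* file size taken once from fstat() */
} tracked_file;

int tf_open(tracked_file *f, const char *path) {
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    f->fd = fd;
    f->pos = 0;
    f->size = st.st_size;
    return 0;
}

ssize_t tf_read(tracked_file *f, void *buf, size_t n) {
    assert(f != NULL && buf != NULL);
    ssize_t got = read(f->fd, buf, n);
    if (got > 0) f->pos += got;         /* keep the offset in user space */
    return got;
}

/* No syscall at all: compare the tracked offset with the cached size. */
int tf_at_end(const tracked_file *f) {
    return f->pos >= f->size;
}
```

With the tracked variant, a loop of tf_read and tf_at_end issues only read() calls, so the reads stay consecutive from the kernel's point of view.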
sb->hro: Please take care of this.
meeks: yes. Removing that wild seeking is what I had in mind to improve, at least for the case of the osl file read, and the patch attempts to address that.
Reassigned.
While I admit that a reduction of system calls like unnecessary lseeks is desirable, I would like to see real profiler data proving that this is a real bottleneck; everything else is just speculation. Every fix, especially in sal, is very sensitive for the whole office and may cause unpredictable side effects. I'd like to avoid such changes in late phases of the development process unless solid data advises otherwise. I once fixed the osl_readLine code to read ahead bytes from a file, only to find that the startup performance gain was marginal. I'm happy to introduce the patch at the beginning of the development of the next major version. Delayed due to limited resources.
Reassigned for change of responsibilities sake.
You might want to come to a decision how to move forward with the suggested patch. Changing target to "not determined"...
kso->all: hro is currently sick and will not be available before January 2007. That's why there is currently no activity from hro here.
The patch looks good but we need to do deep testing to make sure there are no side effects.
Still planned for 2.3
*** Issue 21792 has been marked as a duplicate of this issue. ***
Changed target.
Added to CWS hro10
cmc: sorry, no time left to integrate into 2.4 codeline.
Patch will be tested within 3.0 timeline.
No time to apply this for 3.0, as the effects on platforms which have problems with buffered IO have to be tested.
At least the patch causes an out of file handles problem on Solaris Sparc 32 Bit (Solaris X86 not tested).
Ping
Postponed
reassigning to myself ...
mhu: Thank you for inviting me to this issue. Here is a wiki page describing what cmc mentioned a few years ago. :-) http://wiki.services.openoffice.org/wiki/Performance/Buffered_File_IO
This is now being worked on (cws mhu20) => accepting. Raising priority to P3, adding keyword "performance" ...
osl file I/O functions are being implemented as buffered file I/O; work in progress on cws mhu20 => targeting to integrate into OOo 3.2
implementation finished on cws mhu20 => resolved fixed.
@cmc: Do you want to take a look at the changes to see if they meet your expectations?
I'm sure it's all good :-)
Set verified
close issue