Apache OpenOffice (AOO) Bugzilla – Issue 56783
cold-start speedup - file touch redux ...
Last modified: 2017-05-20 11:33:13 UTC
So - the configmgr likes to access/stat all files in various directories repeatedly; this patch reduces that and (controversially) uses the directory mtime instead of the individual file mtimes. Not certain how that constraint works on Win32 etc. but - should work well for Unix. With an strace log: strace ... | grep '\.xcu' | sed 's/^.*("//g' | sed 's/".*//' | sort | uniq | wc -l I get: before 208 hits, after: 118 hits Numbers not ~that useful since we go accessing a bunch of files that are not there which should be quick anyway - anyhow this saves an estimated 90 seeks == 900ms of cold-start time. The various traces causing the grief: #0 0x40fcf836 in opendir () from /lib/tls/libc.so.6 #1 0x408e3995 in osl_openDirectory () from ./libuno_sal.so.3 #2 0x455b9e7d in osl::Directory::open () at Reference.h:244 #3 0x455f3c8d in configmgr::localbe::LocalMultiStratum::listLayerIds (this=0x80f6300, aComponent=@0xbfcf0884, aEntity=@0xbfcf079c) at /nld/opt/suse-OpenOffice/src680-m133/configmgr/source/localbe/localmultistratum.cxx:115 #4 0x455e274b in configmgr::backend::MultiStratumBackend::searchSupportingStrata () at Reference.h:244 #5 0x455e2a7a in configmgr::backend::MultiStratumBackend::listLayers () at Reference.h:244 #6 0x455e0ce6 in configmgr::backend::MultiStratumBackend::listOwnLayers () at Reference.h:244 #7 0x455d0e27 in configmgr::backend::BackendAccess::getLayers () at Reference.h:244 #8 0x455d2a59 in configmgr::backend::BackendAccess::getNodeData () at Reference.h:244 #9 0x455a0bca in configmgr::backend::CacheController::loadDirectly () at Reference.h:244 More stats from here: #0 0x408e742e in osl_getFileStatus () from ./libuno_sal.so.3 #1 0x455b9e5c in osl::DirectoryItem::getFileStatus () at Reference.h:244 #2 0x455f4d31 in configmgr::localbe::implNormalizeURL () from ./configmgr2.uno.so #3 0x455f509e in configmgr::localbe::normalizeURL () from ./configmgr2.uno.so #4 0x455f0a26 in configmgr::localbe::LocalSchemaSupplier::initialize () at Reference.h:244 #5 0x4087aa64 in cppu::OSingleFactoryHelper::createInstanceWithArgumentsAndContext () from ./libuno_cppuhelpergcc3.so.3 #0 0x408e742e in osl_getFileStatus () from ./libuno_sal.so.3 #1 0x455b9eac in osl::DirectoryItem::getFileStatus () at Reference.h:244 #2 0x455b9d11 in configmgr::FileHelper::getModifyStatus () at Reference.h:244 #3 0x455e921c in configmgr::localbe::BasicLocalFileLayer::getTimestamp () at Reference.h:244 #4 0x455f15a7 in configmgr::localbe::LocalStratumBase::isMoreRecent () at Reference.h:244 #5 0x455f18fb in configmgr::localbe::LocalStratumBase::getLayer () at Reference.h:244 #6 0x455f22bc in configmgr::localbe::LocalSingleStratumBase::getLayer () at Reference.h:244 #7 0x455e28c7 in configmgr::backend::MultiStratumBackend::searchSupportingStrata () at Reference.h:244 #8 0x455e2aca in configmgr::backend::MultiStratumBackend::listLayers () at Reference.h:244 #9 0x455e0d36 in configmgr::backend::MultiStratumBackend::listOwnLayers () at Reference.h:244 #10 0x455d0e77 in configmgr::backend::BackendAccess::getLayers () at Reference.h:244 #11 0x455d2aa9 in configmgr::backend::BackendAccess::getNodeData () at Reference.h:244 #12 0x455a0c1a in configmgr::backend::CacheController::loadDirectly () at Reference.h:244 #13 0x455a2863 in configmgr::backend::CacheController::loadComponent () at Reference.h:244 #14 0x455a7406 in configmgr::TreeManager::requestSubtree () at Reference.h:244 #15 0x45634645 in configmgr::OProviderImpl::initSession () from ./configmgr2.uno.so #16 0x456317ff in configmgr::OProvider::implConnect () from ./configmgr2.uno.so #17 0x4563610d in configmgr::OConfigurationProvider::connect () from ./configmgr2.uno.so HTH.
Created attachment 30914 [details] patch
In terms of testing, touching/removing files in both ~/.ooo and system prefix had the expected refreshing effect (of the whole directory I guess [a slight down-side ;-]). And of course, changing user settings through the UI works nicely too.
.
comments from jb: > diff -u -p -u -r1.3 localfilehelper.cxx > --- configmgr/source/localbe/localfilehelper.cxx 16 Feb 2005 16:46:51 -0000 1.3 > +++ configmgr/source/localbe/localfilehelper.cxx 27 Oct 2005 11:21:18 -0000 > @@ -126,6 +128,10 @@ namespace configmgr if (_sURL.getLength() == 0) > return false; > > + // This method has no right to be so under-performing to > + // achieve so, so little of any usefulness. > + return true; > + > DirectoryItem aDirItem; This is function normalizeURL(). AFAICT we call it only for directories where we need to compare directory names. It is needed there (for example) when non-case-sensitive filesystems are involved (need to ask the sal/osl crowd what other reasons can cause a different file URL to be reported). I believe we can eliminate some redundant stats here with additional bookkeeping - but we can't get rid of that in general. We might also add a configmgrrc setting to skip this, if it is known that no file systems where osl_getFileStatus can produce a URL that is different from the input are involved, but the default for deploying to unknown environments would have to stay unchanged. > diff -u -p -u -r1.5 localmultistratum.cxx > --- configmgr/source/localbe/localmultistratum.cxx 8 Sep 2005 04:04:46 -0000 1.5 > +++ configmgr/source/localbe/localmultistratum.cxx 27 Oct 2005 11:21:18 -0000 > @@ -107,7 +107,7 @@ uno::Sequence< rtl::OUString > SAL_CALL rtl::OUString const aComponentUrl = aLayerUrl + componentToPath(aComponent); > using namespace osl; > - const sal_uInt32 k_STATUS_FIELDS = FileStatusMask_Type | FileStatusMask_FileName; + const sal_uInt32 k_STATUS_FIELDS = FileStatusMask_FileName; Directory aComponentDirectory(aComponentUrl); > DirectoryItem aItem; > std::vector< rtl::OUString > aResult; > @@ -132,13 +132,12 @@ uno::Sequence< rtl::OUString > SAL_CALL OSL_TRACE("Reading Component Directory - Error (%u) getting status of directory item.\n", unsigned(errcode)); > break; > } > - > - OSL_ENSURE( aItemDescriptor.isValid(FileStatusMask_Type), "Could not get type of directory item"); > - if (aItemDescriptor.getFileType() != FileStatus::Regular) > - continue; > OSL_ENSURE( aItemDescriptor.isValid(FileStatusMask_FileName), "Could not get Name of component found"); > OUString const aFileName = aItemDescriptor.getFileName(); > + > + // It is reasonable to assume a .xcu file is not a directory & => > + // not stat each directory entry at considerable cost. > if (!aFileName.endsWithIgnoreAsciiCaseAsciiL(RTL_CONSTASCII_STRINGPARAM(kLocalDataSuffix))) Fair enough. The .xcu check should be sufficient. This has the added effect that .xcu files may be symlinks - which also sounds reasonable. > diff -u -p -u -r1.16 filehelper.cxx > --- configmgr/source/misc/filehelper.cxx 8 Sep 2005 04:09:44 -0000 1.16 > +++ configmgr/source/misc/filehelper.cxx 27 Oct 2005 11:21:18 -0000 > @@ -143,9 +143,22 @@ namespace configmgr static const TimeValue k_NullTime = {0,0}; > sal_uInt64 aSize = 0; > rModifyTime = k_NullTime; > + rtl::OUString aURL; > > - DirectoryItem aItem; > - if (osl::FileBase::E_None == DirectoryItem::get(_sURL, aItem)) > + DirectoryItem aItem; > + > +#ifndef STAT_LOTS_OF_UNUSED_FILES > + // Statting every file takes way to long - we can use the directory > + // time-stamp & size instead, if we need to provoke an update > + // a simple cat > foo; rm foo will do that. > + // The dir size is (hopefully) non-0 and a good enough proxy value. > + > + aURL = FileHelper::getParentDir(_sURL); > +#else > + aURL = _sURL; > +#endif > + > + if (osl::FileBase::E_None == DirectoryItem::get(aURL, aItem)) > { > FileStatus aStatus(osl_FileStatus_Mask_ModifyTime|osl_FileStatus_Mask_Type|osl_FileStatus_Mask_FileSize); > if (osl::FileBase::E_None == aItem.getFileStatus(aStatus)) I don't think that is acceptable. Firstly, it violates the semantics of that function. If anybody really wanted to do this, the parent directory substitution should be in the caller (i.e. in the getTimestamp() method in localfilelayer.cxx). Secondly it does not work everywhere. The semantics of that 'timestamp' is that it must change, if the file changes. You can't guarantee that. Without actively provoking an update, changes might go completely unnoticed on most filesystems. Generally I'm wary of directory timestamps. Is their behaviour really portable across all kinds of file systems and OSs? For directory sizes that certainly isn't the case. For example a quick check on solaris revealed that there (at least on ufs and nfs) directory sizes are integral multiples of 512. And I can't tell quickly how Windows behaves in that regard ("dir" doesn't even show file size for directories). BTW: The size check was added to make sure that even the case where files are touched to a fixed modification time (done by some installers) is caught. My experience is that requiring explicit actions to invalidate or refresh caches after modifying the source data is error-prone. You can only convey that to users through documentation - which many won't find or read. There is a safe patch that ought to reduce the number of getTimestamp calls significantly (see below). But I don't think it helps as much as one could hope, as it probably won't reduce the number of unique stats significantly. The stats done for isMoreRecent are unnecessary most of the time, but the same files will be stat'ed by the binary cache ... Index: localstratumbase.cxx =================================================================== RCS file: /cvs/util/configmgr/source/localbe/localstratumbase.cxx,v retrieving revision 1.5 diff -u -r1.5 localstratumbase.cxx --- localstratumbase.cxx 8 Sep 2005 04:07:18 -0000 1.5 +++ localstratumbase.cxx 27 Oct 2005 14:37:19 -0000 @@ -149,6 +149,10 @@ sal_Bool LocalStratumBase::isMoreRecent(const rtl::OUString& aFileUrl, const rtl::OUString& aTimestamp) { + // if reference timestamp is empty, every layer needs to be read + if (aTimestamp.getLength() == 0) + return true; + rtl::OUString layerUrl ; rtl::OUString subLayerUrl ;
Joerg - the patch you suggest: + // if reference timestamp is empty, every layer needs to be read + if (aTimestamp.getLength() == 0) + return true; Surely is helpful only for the very-1st start which is so slow as to be painful anyway ? surely when it has been run once already we'll have a timestamp to compare and this will confer no advantage ?
No. This patch will reduce stat calls significantly. That code currently only ever gets called with empty timestamp. The timestamp parameter here would be used for caching remote data for offline use. That was used for some StarOffice7 feature, but never in OOo. Unfortunately the same files will be stat'ed for the binary cache a little later. Thus this patch won't reduce *unique* stats.
One additional remark: The isMoreRecent fix should completely eliminate the need to stat the .xcu files in ~/.ooo/user/registry/data. But that won't help all that much either, because these files will even be opened and read ....
As it seems that the proposed patches in this issue do not yet lead to a clean (robust, for all platforms) fix that significantly reduces startup cost, I postpone this to OOo 3.0.
...more appropriately OOo 2.x, as OOo 3.0 is only for issues that will go into 3.0 exactly.
Changed from PATH to ENHANCEMENT to indicate that the attached patch is a starting point for some more work that is still necessary.
I hope Michael Meeks patch considered to be integrated soon, after some changes necessary. it is not good if a patch which exist since 2005, need more and more works because of time, and code changes. This patch will reduce user complains of slow OOo. just my 2c
according to release status meeting -> target 3.x
Reset assigne to the default "issues@openoffice.apache.org".