Issue 56783 - cold-start speedup - file touch redux ...
Summary: cold-start speedup - file touch redux ...
Status: ACCEPTED
Alias: None
Product: utilities
Classification: Unclassified
Component: code (show other issues)
Version: 680m133
Hardware: All All
: P3 Trivial with 2 votes (vote)
Target Milestone: OOo 3.x
Assignee: AOO issues mailing list
QA Contact: issues@tools
URL:
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2005-10-27 12:45 UTC by mmeeks
Modified: 2017-05-20 11:33 UTC (History)
5 users (show)

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
patch (3.73 KB, patch)
2005-10-27 12:46 UTC, mmeeks
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this issue.
Description mmeeks 2005-10-27 12:45:25 UTC
So - the configmgr likes to access/stat all files in various directories
repeatedly; this patch reduces that and (controversially) uses the directory
mtime instead of the individual file mtimes. Not certain how that constraint
works on Win32 etc. but - should work well for Unix.

With an strace log: strace ... | grep '\.xcu'  | sed 's/^.*("//g' | sed
's/".*//' | sort | uniq | wc -l

I get:
before 208 hits,
after: 118 hits

Numbers not ~that useful since we go accessing a bunch of files that are not
there which should be quick anyway - anyhow this saves an estimated 90 seeks ==
900ms of cold-start time.

The various traces causing the grief:

#0  0x40fcf836 in opendir () from /lib/tls/libc.so.6
#1  0x408e3995 in osl_openDirectory () from ./libuno_sal.so.3
#2  0x455b9e7d in osl::Directory::open () at Reference.h:244
#3  0x455f3c8d in configmgr::localbe::LocalMultiStratum::listLayerIds
(this=0x80f6300, aComponent=@0xbfcf0884, aEntity=@0xbfcf079c)
    at
/nld/opt/suse-OpenOffice/src680-m133/configmgr/source/localbe/localmultistratum.cxx:115
#4  0x455e274b in
configmgr::backend::MultiStratumBackend::searchSupportingStrata () at
Reference.h:244
#5  0x455e2a7a in configmgr::backend::MultiStratumBackend::listLayers () at
Reference.h:244
#6  0x455e0ce6 in configmgr::backend::MultiStratumBackend::listOwnLayers () at
Reference.h:244
#7  0x455d0e27 in configmgr::backend::BackendAccess::getLayers () at Reference.h:244
#8  0x455d2a59 in configmgr::backend::BackendAccess::getNodeData () at
Reference.h:244
#9  0x455a0bca in configmgr::backend::CacheController::loadDirectly () at
Reference.h:244

More stats from here:

#0  0x408e742e in osl_getFileStatus () from ./libuno_sal.so.3
#1  0x455b9e5c in osl::DirectoryItem::getFileStatus () at Reference.h:244
#2  0x455f4d31 in configmgr::localbe::implNormalizeURL () from ./configmgr2.uno.so
#3  0x455f509e in configmgr::localbe::normalizeURL () from ./configmgr2.uno.so
#4  0x455f0a26 in configmgr::localbe::LocalSchemaSupplier::initialize () at
Reference.h:244
#5  0x4087aa64 in
cppu::OSingleFactoryHelper::createInstanceWithArgumentsAndContext () from
./libuno_cppuhelpergcc3.so.3


#0  0x408e742e in osl_getFileStatus () from ./libuno_sal.so.3
#1  0x455b9eac in osl::DirectoryItem::getFileStatus () at Reference.h:244
#2  0x455b9d11 in configmgr::FileHelper::getModifyStatus () at Reference.h:244
#3  0x455e921c in configmgr::localbe::BasicLocalFileLayer::getTimestamp () at
Reference.h:244
#4  0x455f15a7 in configmgr::localbe::LocalStratumBase::isMoreRecent () at
Reference.h:244
#5  0x455f18fb in configmgr::localbe::LocalStratumBase::getLayer () at
Reference.h:244
#6  0x455f22bc in configmgr::localbe::LocalSingleStratumBase::getLayer () at
Reference.h:244
#7  0x455e28c7 in
configmgr::backend::MultiStratumBackend::searchSupportingStrata () at
Reference.h:244
#8  0x455e2aca in configmgr::backend::MultiStratumBackend::listLayers () at
Reference.h:244
#9  0x455e0d36 in configmgr::backend::MultiStratumBackend::listOwnLayers () at
Reference.h:244
#10 0x455d0e77 in configmgr::backend::BackendAccess::getLayers () at Reference.h:244
#11 0x455d2aa9 in configmgr::backend::BackendAccess::getNodeData () at
Reference.h:244
#12 0x455a0c1a in configmgr::backend::CacheController::loadDirectly () at
Reference.h:244
#13 0x455a2863 in configmgr::backend::CacheController::loadComponent () at
Reference.h:244
#14 0x455a7406 in configmgr::TreeManager::requestSubtree () at Reference.h:244
#15 0x45634645 in configmgr::OProviderImpl::initSession () from ./configmgr2.uno.so
#16 0x456317ff in configmgr::OProvider::implConnect () from ./configmgr2.uno.so
#17 0x4563610d in configmgr::OConfigurationProvider::connect () from
./configmgr2.uno.so

HTH.
Comment 1 mmeeks 2005-10-27 12:46:32 UTC
Created attachment 30914 [details]
patch
Comment 2 mmeeks 2005-10-27 13:51:03 UTC
In terms of testing, touching/removing files in both ~/.ooo and system prefix
had the expected refreshing effect (of the whole directory I guess [a slight
down-side ;-]). And of course, changing user settings through the UI works
nicely too.
Comment 3 Stephan Bergmann 2005-10-27 14:56:55 UTC
.
Comment 4 Stephan Bergmann 2005-10-28 08:23:11 UTC
comments from jb:

> diff -u -p -u -r1.3 localfilehelper.cxx
> --- configmgr/source/localbe/localfilehelper.cxx    16 Feb 2005 16:46:51 -0000
   1.3
> +++ configmgr/source/localbe/localfilehelper.cxx    27 Oct 2005 11:21:18 -0000
> @@ -126,6 +128,10 @@ namespace configmgr          if (_sURL.getLength() == 0)
>              return false;
>  
> +        // This method has no right to be so under-performing to
> +        // achieve so, so little of any usefulness.
> +        return true;
> +
>          DirectoryItem aDirItem;

This is function normalizeURL(). AFAICT we call it only for directories where we
need to compare directory names. It is needed there (for example) when
non-case-sensitive filesystems are involved (need to ask the sal/osl crowd what
other reasons can cause a different file URL to be reported).

I believe we can eliminate some redundant stats here with additional bookkeeping
- but we can't get rid of that in general. We might also add a configmgrrc
setting to skip this, if it is known that no file systems where
osl_getFileStatus can produce a URL that is different from the input are
involved, but the default for deploying to unknown environments would have to
stay unchanged.

> diff -u -p -u -r1.5 localmultistratum.cxx
> --- configmgr/source/localbe/localmultistratum.cxx    8 Sep 2005 04:04:46
-0000    1.5
> +++ configmgr/source/localbe/localmultistratum.cxx    27 Oct 2005 11:21:18 -0000
> @@ -107,7 +107,7 @@ uno::Sequence< rtl::OUString > SAL_CALL      rtl::OUString
const aComponentUrl = aLayerUrl + componentToPath(aComponent);
>           using namespace osl;
> -    const sal_uInt32 k_STATUS_FIELDS =  FileStatusMask_Type |
FileStatusMask_FileName; +    const sal_uInt32 k_STATUS_FIELDS =
FileStatusMask_FileName;      Directory aComponentDirectory(aComponentUrl);
>      DirectoryItem aItem;
>      std::vector< rtl::OUString > aResult;
> @@ -132,13 +132,12 @@ uno::Sequence< rtl::OUString > SAL_CALL                
 OSL_TRACE("Reading Component Directory - Error (%u) getting status of directory
item.\n", unsigned(errcode));
>                  break;
>              }
> -
> -            OSL_ENSURE( aItemDescriptor.isValid(FileStatusMask_Type), "Could
not get type of directory item");
> -            if (aItemDescriptor.getFileType() != FileStatus::Regular)
> -                continue;
>                                   OSL_ENSURE(
aItemDescriptor.isValid(FileStatusMask_FileName), "Could not get Name of
component found");
>              OUString const aFileName = aItemDescriptor.getFileName();
> +
> +            // It is reasonable to assume a .xcu file is not a directory & =>
> +            // not stat each directory entry at considerable cost.
>              if
(!aFileName.endsWithIgnoreAsciiCaseAsciiL(RTL_CONSTASCII_STRINGPARAM(kLocalDataSuffix)))

Fair enough. The .xcu check should be sufficient. This has the added effect that
.xcu files may be symlinks - which also sounds reasonable.

> diff -u -p -u -r1.16 filehelper.cxx
> --- configmgr/source/misc/filehelper.cxx    8 Sep 2005 04:09:44 -0000    1.16
> +++ configmgr/source/misc/filehelper.cxx    27 Oct 2005 11:21:18 -0000
> @@ -143,9 +143,22 @@ namespace configmgr          static const TimeValue
k_NullTime = {0,0};       
>          sal_uInt64 aSize = 0;
>          rModifyTime = k_NullTime;
> +        rtl::OUString aURL;
>  
> -        DirectoryItem aItem;       
> -        if (osl::FileBase::E_None == DirectoryItem::get(_sURL, aItem))
> +        DirectoryItem aItem;
> +
> +#ifndef STAT_LOTS_OF_UNUSED_FILES
> +        // Statting every file takes way to long - we can use the directory
> +        // time-stamp & size instead, if we need to provoke an update
> +        // a simple cat > foo; rm foo will do that.
> +        // The dir size is (hopefully) non-0 and a good enough proxy value.
> +
> +        aURL = FileHelper::getParentDir(_sURL);
> +#else
> +        aURL = _sURL;
> +#endif
> +
> +        if (osl::FileBase::E_None == DirectoryItem::get(aURL, aItem))
>          {
>              FileStatus
aStatus(osl_FileStatus_Mask_ModifyTime|osl_FileStatus_Mask_Type|osl_FileStatus_Mask_FileSize);
          
>              if (osl::FileBase::E_None == aItem.getFileStatus(aStatus))

I don't think that is acceptable.

Firstly, it violates the semantics of that function. If anybody really wanted to
do this, the parent directory substitution should be in the caller (i.e. in the
getTimestamp() method in localfilelayer.cxx).

Secondly it does not work everywhere. The semantics of that 'timestamp' is that
it must change, if the file changes. You can't guarantee that. Without actively
provoking an update, changes might go completely unnoticed on most filesystems.

Generally I'm wary of directory timestamps. Is their behaviour really portable
across all kinds of file systems and OSs?

For directory sizes that certainly isn't the case. For example a quick check on
solaris revealed that there (at least on ufs and nfs) directory sizes are
integral multiples of 512. And I can't tell quickly how Windows behaves in that
regard ("dir" doesn't even show file size for directories).

BTW: The size check was added to make sure that even the case where files are
touched to a fixed modification time (done by some installers) is caught.

My experience is that requiring explicit actions to invalidate or refresh caches
after modifying the source data is error-prone. You can only convey that to
users through documentation - which many won't find or read.

There is a safe patch that ought to reduce the number of getTimestamp calls
significantly (see below). But I don't think it helps as much as one could hope,
as it probably won't reduce the number of unique stats significantly. The stats
done for isMoreRecent are unnecessary most of the time, but the same files will
be stat'ed by the binary cache ...

Index: localstratumbase.cxx
===================================================================
RCS file: /cvs/util/configmgr/source/localbe/localstratumbase.cxx,v
retrieving revision 1.5
diff -u -r1.5 localstratumbase.cxx
--- localstratumbase.cxx        8 Sep 2005 04:07:18 -0000       1.5
+++ localstratumbase.cxx        27 Oct 2005 14:37:19 -0000
@@ -149,6 +149,10 @@

 sal_Bool LocalStratumBase::isMoreRecent(const rtl::OUString& aFileUrl,
                                               const rtl::OUString& aTimestamp) {
+    // if reference timestamp is empty, every layer needs to be read
+    if (aTimestamp.getLength() == 0)
+       return true;
+
     rtl::OUString layerUrl ;
     rtl::OUString subLayerUrl ;
Comment 5 mmeeks 2005-10-28 11:13:08 UTC
Joerg - the patch you suggest:

+    // if reference timestamp is empty, every layer needs to be read
+    if (aTimestamp.getLength() == 0)
+       return true;

Surely is helpful only for the very-1st start which is so slow as to be painful
anyway ? surely when it has been run once already we'll have a timestamp to
compare and this will confer no advantage ?
Comment 6 joerg.barfurth 2005-10-28 12:18:58 UTC
No. This patch will reduce stat calls significantly. That code currently only
ever gets called with empty timestamp. The timestamp parameter here would be
used for caching remote data for offline use. That was used for some StarOffice7
feature, but never in OOo.

Unfortunately the same files will be stat'ed for the binary cache a little
later. Thus this patch won't reduce *unique* stats.
Comment 7 joerg.barfurth 2005-10-28 16:02:02 UTC
One additional remark: The isMoreRecent fix should completely eliminate the need
to stat the .xcu files in ~/.ooo/user/registry/data. But that won't help all
that much either, because these files will even be opened and read ....
Comment 8 Stephan Bergmann 2006-02-14 12:57:25 UTC
As it seems that the proposed patches in this issue do not yet lead to a clean
(robust, for all platforms) fix that significantly reduces startup cost, I
postpone this to OOo 3.0.
Comment 9 Stephan Bergmann 2006-06-13 14:08:45 UTC
...more appropriately OOo 2.x, as OOo 3.0 is only for issues that will go into
3.0 exactly.
Comment 10 Stephan Bergmann 2007-03-30 12:44:21 UTC
Changed from PATH to ENHANCEMENT to indicate that the attached patch is a
starting point for some more work that is still necessary.
Comment 11 utomo99 2007-06-10 10:26:59 UTC
I hope Michael Meeks patch considered to be integrated soon, after some 
changes necessary. 
it is not good if a patch which exist since 2005, need more and more works 
because of time, and code changes. 

This patch will reduce user complains of slow OOo. 

just my 2c
 
Comment 12 Mathias_Bauer 2007-12-04 14:47:45 UTC
according to release status meeting -> target 3.x
Comment 13 Marcus 2017-05-20 11:33:13 UTC
Reset assigne to the default "issues@openoffice.apache.org".