Issue 124065 - [Performance] Low performance opening attached .ods
Summary: [Performance] Low performance opening attached .ods
Status: CLOSED FIXED
Alias: None
Product: Calc
Classification: Application
Component: code (show other issues)
Version: 4.1.0-dev
Hardware: All All
: P2 Normal (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-21 08:26 UTC by fanyuzhen
Modified: 2017-05-20 10:35 UTC (History)
5 users (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Sample ODS file (82.47 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-01-21 08:26 UTC, fanyuzhen
no flags Details
Fix (3.22 KB, text/plain)
2014-01-21 08:41 UTC, wujinlong
no flags Details
Fix2 (3.76 KB, patch)
2014-01-24 07:47 UTC, wujinlong
no flags Details | Diff
New Fix (5.02 KB, patch)
2014-01-28 08:50 UTC, wujinlong
awf.aoo: review+
Details | Diff
Final Fix (5.11 KB, patch)
2014-02-10 02:33 UTC, wujinlong
wujinlong: review+
Details | Diff

Note You need to log in before you can comment on or make changes to this issue.
Description fanyuzhen 2014-01-21 08:26:33 UTC
Created attachment 82329 [details]
Sample ODS file

AOO opens the attached file >1 minute time.

It is based on Symphony defect Sym3_5598.
Comment 1 wujinlong 2014-01-21 08:36:05 UTC
[ROOT CAUSE]
When there are too many Name Ranges in the ODF file, it takes a very long time to import the document.

By using Purify, we found most of the time is used by function ScRangeData::Clone(), which is called by function ScCollection::= while constructing a new ScRangeName object.

Function:	ScRangeData::Clone
Calls:	1,511,522
Function time:	140,541.25 (3.08% of Focus)
F+D time:	845,851.27 (18.51% of Focus)

In my test, there is 1004 name ranges. In funtion ScXMLImport::SetNamedRanges(), it will first add 1004 new Name Ranges, then it will loop the 1004 entries to update the content and grammer. You can see that 1004 Name Ranges will lead to 1.5 Million (roughly 1004*1004) calls of ScRangeData::Clone(), which is quite time consuming, thus resulting a slow importing of the document.

Function:	ScXMLImport::SetNamedRanges
Calls:	1

Descendant
ScNamedRangesObj::addNewByName	                calls:1004
ScNamedRangeObj::SetContentWithGrammar	calls:1004

1) create new name ranges
Function:	ScNamedRangesObj::addNewByName
Calls:	1,004

Descendant
ScNamedRangesObj::ImplAddNewByScopeAndName	calls: 1004


2) update the name ranges
Function:	ScNamedRangeObj::SetContentWithGrammar
Calls:	1,004

Descendant
ScNamedRangeObj::Modify_Impl	calls: 1004

[RESOLUTION]
To improve the performance,  we will not create a copy of the ScNameRange object, instead we will just use the existing ScNameRange to update.
Comment 2 wujinlong 2014-01-21 08:41:30 UTC
Created attachment 82330 [details]
Fix
Comment 3 Andre 2014-01-21 13:40:32 UTC
Can you explain why creating the clones is not necessary and why it is safe to share the underlying ScRangeName?
Comment 4 wujinlong 2014-01-22 08:46:03 UTC
(In reply to Andre from comment #3)
> Can you explain why creating the clones is not necessary and why it is safe
> to share the underlying ScRangeName?

Hi Andre,

1. Firstly, the clone is quite time consuming, so we should try to avoid it as possible as we can.

Takes 1000 name ranges for example, there will be 1000*1000 steps to process.

outer loop /* 1000 entries */
  clone (inner loop /* 1000 entries */)


virtual	ScDataObject*     Clone(ScDocument* pDocP) const
{ return new ScRangeName(*this, pDocP); }


ScRangeName::ScRangeName(const ScRangeName& rScRangeName, ScDocument* pDocument) :
				ScSortedCollection ( rScRangeName ),
				pDoc ( pDocument ),
				nSharedMaxIndex (rScRangeName.nSharedMaxIndex)
{
	for (USHORT i = 0; i < nCount; i++)
	{
		((ScRangeData*)At(i))->SetDocument(pDocument);
		((ScRangeData*)At(i))->SetIndex(((ScRangeData*)rScRangeName.At(i))->GetIndex());
	}
}


2. For this case, we have got all the name ranges, and it is one of the document's member variables. We can just manipulate it directly, and there is no need to clone a new ScRangeName object. All the operations done on the name ranges object is within the document object, and there is no other side effect. So I think it is safe to do that.

 ScRangeName* pNames = pDoc->GetRangeName();
Comment 5 Andre 2014-01-23 09:55:27 UTC
I totally agree that avoiding the cloning would be a good idea.
I am just trying to make sure that there are no side effects when the cloning is removed.

The method GetRangeName is defined in sc/inc/document.hxx as
    public:
    ...
    SC_DLLPUBLIC ScRangeName*	GetRangeName();
so it is not only public but also accessible from outside of the sc module.  That means that when the returned ScRangeName object is modified because it is not cloned anymore and local copies are modified, then that can cause subtle side effects in and outside of sc.

If that is just a hypothetical problem (because nobody calls GetRangeName() at the moment, or does not store a pointer to the returned ScRangenName object or does not expect it not to change) then it might be best to avoid future problems in this area by
- Documenting clearly in sc/in/document.hxx that the returned ScRangeName is volatile, ie can change at any time.
- Removing the SC_DLLPUBLIC specifier (if possible)
- Making it private (if possible)
Comment 6 wujinlong 2014-01-24 07:45:35 UTC
Hi Andre,

- Documenting clearly in sc/in/document.hxx that the returned ScRangeName is volatile, ie can change at any time.

I've added comments in document.hxx for this. And the patch is updated.

- Removing the SC_DLLPUBLIC specifier (if possible)
This might be possible. My concerns are there may be some existing extensions of AOO that possibly called GetRangeName(). If we disable the DLL export, they might be broken. So I vote to preserve this in the code.

- Making it private (if possible)
This seems not possible, because there are lots calls of GetRangeName() in other modules already.
Comment 7 wujinlong 2014-01-24 07:47:09 UTC
Created attachment 82381 [details]
Fix2

This is an update of previous patch, adding comments in document.hxx to note that the Name Range get from document is volatile.
Comment 8 Andre 2014-01-24 08:14:48 UTC
(In reply to wujinlong from comment #6)
> Hi Andre,
> 
> - Documenting clearly in sc/in/document.hxx that the returned ScRangeName is
> volatile, ie can change at any time.

Checking that the calling code can handle this volatility is part of the change, right?

> 
> I've added comments in document.hxx for this. And the patch is updated.
> 
> - Removing the SC_DLLPUBLIC specifier (if possible)
> This might be possible. My concerns are there may be some existing
> extensions of AOO that possibly called GetRangeName(). If we disable the DLL
> export, they might be broken. So I vote to preserve this in the code.
> 
> - Making it private (if possible)
> This seems not possible, because there are lots calls of GetRangeName() in
> other modules already.
Comment 9 wujinlong 2014-01-27 07:23:03 UTC
(In reply to Andre from comment #8)
> (In reply to wujinlong from comment #6)
> > Hi Andre,
> > 
> > - Documenting clearly in sc/in/document.hxx that the returned ScRangeName is
> > volatile, ie can change at any time.
> 
> Checking that the calling code can handle this volatility is part of the
> change, right?
> 

I don't see this should be part of this fix. Two reasons,

1. There is such volatility issue existing in the original code for a long time. The original code uses Clone() to make a copy of the ScRangeName object, and at last the cloned ScRangeName object is set back to the Document object.

The key point is there is performance issue in the code, and we are trying to fix that by not using Clone().

2. If we use something like lock to resolve the volatility issue, it might introduce new performance issues. And we need to do more testing to make sure of that.

So in summary, if you think we need to address this volatility issue more decently, I propose to do that in a separate task.
Comment 10 Andre 2014-01-27 08:16:38 UTC
(In reply to wujinlong from comment #9)
> (In reply to Andre from comment #8)
> > (In reply to wujinlong from comment #6)
> > > Hi Andre,
> > > 
> > > - Documenting clearly in sc/in/document.hxx that the returned ScRangeName is
> > > volatile, ie can change at any time.
> > 
> > Checking that the calling code can handle this volatility is part of the
> > change, right?
> > 
> 
> I don't see this should be part of this fix. Two reasons,

It should be if the fix changes existing behavior, and I am thinking that it does.

> 
> 1. There is such volatility issue existing in the original code for a long
> time. The original code uses Clone() to make a copy of the ScRangeName
> object, and at last the cloned ScRangeName object is set back to the
> Document object.

OK, I agree that that takes care of the volatility issue.  But if I am reading the code correctly, the cloned or modified ScRangeName is set via ScDocFunc::SetNewRangeNames().  There an undo object is created.  Without cloning the ScRangeName object, it appears that the undo object references the same ScRangeName twice.  This would make the undo operation useless.
Any idea how to deal with that?

> 
> The key point is there is performance issue in the code, and we are trying
> to fix that by not using Clone().

I want to prevent that we make trade a speed gain for a loss of functionality.

> 
> 2. If we use something like lock to resolve the volatility issue, it might
> introduce new performance issues. And we need to do more testing to make
> sure of that.
> 
> So in summary, if you think we need to address this volatility issue more
> decently, I propose to do that in a separate task.
Comment 11 wujinlong 2014-01-28 08:49:31 UTC
Hi Andre,

Your finding is right. My fix will cause problem with the undo operation.

I thought a new way to handle that. Since this problem only happens when opens an ODS file, I will not clone the ScRangeName object for importing the ODS file only. This is determined by ScDocument::IsImportingXML().

I've updated the patch. Please take a look at it. 

Thanks!
Comment 12 wujinlong 2014-01-28 08:50:35 UTC
Created attachment 82409 [details]
New Fix

Don't do clone for importing of ODS file.
Comment 13 Andre 2014-01-29 10:17:30 UTC
I don't like the introduction of another special case but I can't think of a better solution.  I would just ask you to turn the lines

    if ( !pDoc->IsImportingXML() )

into something like 

    const bool bSupportUndo ( ! pDoc->IsImportingXML());
    if (bSupportUndo)

just to make the reason for the special case explicit.
Comment 14 Andre 2014-01-29 10:19:14 UTC
Comment on attachment 82409 [details]
New Fix

Review granted.
Comment 15 wujinlong 2014-02-10 02:33:42 UTC
Created attachment 82555 [details]
Final Fix
Comment 16 wujinlong 2014-02-10 02:35:58 UTC
Andre,

Thanks for your kind review and comments. I agree this solution is not a decent one, but just have no other better solution.

I have modified the code according to your last comment, and updated the patch.

Thank you!
Comment 17 SVN Robot 2014-02-14 08:44:17 UTC
"steve_y" committed SVN revision 1568219 into trunk:
Bug 124065 - [Performance] Low performance opening attached .ods
Comment 18 Steve Yin 2014-02-14 08:59:12 UTC
About root cause and solution please see comment 1 by Jinlong Wu
Comment 19 liuping 2014-04-02 08:09:57 UTC
Verified on Windows 7 on AOO410ml4(Build:9750)  -  Rev. 1583418