
Bug 39446 - Very slow editing of comments in large XML documents
Summary: Very slow editing of comments in large XML documents
Status: VERIFIED FIXED
Alias: None
Product: xml
Classification: Unclassified
Component: Code
Version: 3.x
Hardware: PC Linux
Priority: P2 blocker
Assignee: issues@xml
URL:
Keywords: PERFORMANCE
Depends on:
Blocks:
 
Reported: 2004-01-31 17:19 UTC by Jesse Glick
Modified: 2005-07-15 12:21 UTC
CC List: 9 users

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
Thread dump (7.71 KB, text/plain)
2004-01-31 17:20 UTC, Jesse Glick
The patch diff (830 bytes, text/plain)
2004-09-20 10:59 UTC, Marek Fukala

Description Jesse Glick 2004-01-31 17:19:50 UTC
I opened

  www/www/updates/alpha/dev_1.6_.xml

as text in the editor in a dev build and right
before the block:

<module codenamebase="org.netbeans.core.execution"
       
distribution="http://www.netbeans.org/download/nbms/alpha/dev/core-execution_fr.nbm"
        license="french_l10n-nbm-license.txt"
        downloadsize="3492"
>
  <l10n 
        langcode="fr"
        module_major_version="1"
        module_spec_version="1.1"
  />
</module>

I began typing a line:

<!-- XXX: -->

It was very slow; typing each character took a
couple of seconds. I took a thread dump, attaching
it. Also, pasting an XML element (a few lines long)
took several seconds, but editing e.g.
attributes in existing elements happened at normal speed.
Comment 1 Jesse Glick 2004-01-31 17:20:36 UTC
Created attachment 13174 [details]
Thread dump
Comment 2 Miloslav Metelka 2004-02-04 15:50:44 UTC
Not sure why this happens - the editor fixes up the lexer states.
Anyway, we should fix this for 3.6.
Comment 3 Jesse Glick 2004-02-04 16:03:29 UTC
Are you able to reproduce it?
Comment 4 Antonin Nebuzelsky 2004-02-23 14:30:27 UTC
Upgrading to P2 to keep track of this as "want-to-fix-to-nb36"
Comment 5 Miloslav Metelka 2004-02-25 12:43:33 UTC
I now know what the bottleneck is and I'm going to fix it.
The problem is that during updating of the lexer state infos, the text
from the last updated line till the beginning of the next line is
retrieved by doc.getText() to be scanned by the Syntax lexer class.
This happens for the line where the modification happened and for every
line that follows it until the lexical states match (which is at the end
of the comment token). In a long comment many lines have to be updated,
which causes many doc.getText() operations. If the text being
retrieved spans the gap in the document's character buffer (which it
does, because the gap is moved to the modification point), the
characters must be copied into the target segment, which leads to the
slowness.
This problem can be reproduced for long Java comments as well, and it
exists in NB3.5 too.
I will fix it by retrieving the text not only till the end of the next
line but doubling the previous size of the text retrieved. This will
fix the problem, as there will be at most log2(doc.getLength())
doc.getText() operations.
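
A minimal sketch of the doubling strategy described above (illustrative
only; the names, the stop condition and the use of PlainDocument are
assumptions, not the actual LineRootElement code):

import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;
import javax.swing.text.Segment;

// Instead of one doc.getText() call per line, each retrieval doubles the
// previous span, so at most ~log2(doc.getLength()) calls (and buffer
// copies) are needed.
public class DoublingRescanSketch {

    public static void main(String[] args) throws BadLocationException {
        // Build a document containing one huge multi-line comment.
        StringBuilder sb = new StringBuilder("<!--\n");
        for (int i = 0; i < 5000; i++) {
            sb.append("comment line ").append(i).append('\n');
        }
        sb.append("-->\n");
        PlainDocument doc = new PlainDocument();
        doc.insertString(0, sb.toString(), null);

        int pos = 0;        // pretend the modification happened at offset 0
        int chunk = 80;     // first retrieval: roughly one line
        int calls = 0;
        Segment seg = new Segment();

        while (pos < doc.getLength()) {
            int len = Math.min(chunk, doc.getLength() - pos);
            doc.getText(pos, len, seg);   // one buffer copy per chunk, not per line
            calls++;
            // A real implementation would feed 'seg' to the Syntax lexer and
            // stop once the lexer states match; here we stop at the comment end.
            if (new String(seg.array, seg.offset, seg.count).contains("-->")) {
                break;
            }
            pos += len;
            chunk *= 2;     // double the next span
        }
        System.out.println("doc.getText() calls: " + calls);
    }
}
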
Comment 6 Miloslav Metelka 2004-03-03 22:51:11 UTC
I have implemented the fix described above. Even with a comment
about 5000 lines long the typing response is about 500ms on my machine,
while before the fix it was more than 5s.

Fixed in trunk:

Checking in libsrc/org/netbeans/editor/LineRootElement.java;
/cvs/editor/libsrc/org/netbeans/editor/LineRootElement.java,v  <-- 
LineRootElement.java
new revision: 1.9; previous revision: 1.8
done

Mato, please approve the patch.
Petre F., please test and verify the fix.
Thanks.
Comment 7 Martin Roskanin 2004-03-04 10:02:31 UTC
I approve the fix.

Adding visual diff also for better tracking:
http://editor.netbeans.org/source/browse/editor/libsrc/org/netbeans/editor/LineRootElement.java.diff?r1=1.8&r2=1.9
Comment 8 Miloslav Metelka 2004-03-05 13:51:04 UTC
We've checked the situation again with Petr F. and although the typing
now has acceptable performance, there is still a problem in two cases:

1) If the opening of the comment "<!--" is typed within the time when
the line spans are recomputed (this starts ~4 secs after opening the
file and usually finishes within 0-3 secs, but may take longer on
slower machines), then it can take even more than a minute, as the
recomputation involves fetching the tokens for each line, which is
slow in this case because the single comment token that gets created
is ~550KB long (till the end of the file).

2) When code completion gets invoked at the beginning of the
comment. Again, the completion fetches the tokens, which leads to
fetching the big comment token.

It's clear that we cannot close this issue, but there is a workaround:
type it like "<-- blahblah -->" and then fill in the "!". Or paste
an initial "<!-- -->" from the clipboard and then edit the contents of
the comment.
Another hypothetical workaround is "Write more comments" :) If there
were another comment then the Syntax lexer would stop the opened
comment at the closing "-->" of the next comment.

The situation will improve dramatically with the introduction of the
lexer module because, besides the fact that it remembers the created
tokens (so no recreating of the tokens is necessary), it also has an
optimization that allows defining "important" characters for a
particular token ('<','>','!','-' in case of the XML comment) together
with "guarded boundaries" at the beginning and end of the token (4 and
3 in case of the XML comment). If the typed character is "unimportant"
and outside of the guarded boundaries, then the token is considered
unchanged (from the lexical point of view) and the language lexer is
not invoked at all (sketched below).

After I integrate the current improvement I will downgrade to P3 if
there are no objections; otherwise we would need to waive this.
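
A hypothetical illustration of the "important characters" and "guarded
boundaries" check described above; the constants match the XML comment
values given in the comment, but the class and method names are
assumptions, not the lexer module's API:

public class GuardedTokenCheck {
    static final String IMPORTANT = "<>!-"; // important characters for an XML comment token
    static final int GUARDED_HEAD = 4;      // length of "<!--"
    static final int GUARDED_TAIL = 3;      // length of "-->"

    // Returns true if typing 'typed' at 'offsetInToken' may change the token
    // lexically and therefore requires invoking the language lexer.
    static boolean needsRelex(char typed, int offsetInToken, int tokenLength) {
        if (IMPORTANT.indexOf(typed) >= 0) {
            return true;                                        // important character
        }
        return offsetInToken < GUARDED_HEAD                     // inside the guarded head
            || offsetInToken >= tokenLength - GUARDED_TAIL;     // or the guarded tail
    }

    public static void main(String[] args) {
        // Typing an ordinary letter deep inside a ~550KB comment token:
        System.out.println(needsRelex('x', 1000, 550_000));  // false - token kept as is
        // Typing '-' anywhere: it could form the closing "-->", so relex:
        System.out.println(needsRelex('-', 1000, 550_000));  // true
    }
}
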
Comment 9 _ ttran 2004-03-05 15:52:22 UTC
> After I integrate the current improvement I will downgrade to P3 if
> there are no objections; otherwise we would need to waive this.

I would rather leave it as P2, perhaps waive it with a commitment from
the team to really fix it in promo-D (whether or not using the lexer API).
This is a serious performance killer.
Comment 10 Miloslav Metelka 2004-03-05 17:20:29 UTC
Integrated into release36:
Checking in libsrc/org/netbeans/editor/LineRootElement.java;
/cvs/editor/libsrc/org/netbeans/editor/LineRootElement.java,v  <-- 
LineRootElement.java
new revision: 1.8.10.1; previous revision: 1.8

Visual diff:
http://www.netbeans.org/source/browse/editor/libsrc/org/netbeans/editor/LineRootElement.java.diff?r1=1.8&r2=1.8.10.1
Comment 11 psuk 2004-03-10 17:03:18 UTC
Requesting waiver for 3.6

Justification
---------------------
The response time has already been improved (from 5s to 500ms). There
are still cases (see IZ 39446, comment from Mila 2004-03-05 05:51 PST)
when the performance is not optimal, and we currently don't have a
short-term solution for that.

User Impact
---------------------
Medium. It's a scalability problem, dependent on the size of the file
and on the position of user comments in the file.


Workarounds
---------------------
1. User can copy/paste the whole comment <!-- --> from the clipboard
2. User can write <-- --> and add the ! later

Long term solution
------------------------------------------
It should be solved by the lexer module or by additional optimization of
the existing token-handling code in the editor.

Comment 12 psuk 2004-03-12 13:28:43 UTC
Waiver Approved
Comment 13 Miloslav Metelka 2004-09-16 10:51:35 UTC
We were unable to address this issue in the promoD timeframe, therefore
I would like to ask for a waiver for promoD.
We should be able to find a fix for promoE, at least one that
would auto-insert the "-->" together with the just-typed "-" after
a previously typed "<!-". The o.n.editor.Abbrev class could be extended
and used for this purpose after some tweaking.
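
A rough sketch of the auto-insert idea using a plain Swing DocumentFilter;
the real NetBeans solution would go through the editor module (e.g. the
mentioned o.n.editor.Abbrev), so everything below is an assumption for
illustration only:

import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DocumentFilter;
import javax.swing.text.PlainDocument;

public class CommentCloserFilter extends DocumentFilter {

    @Override
    public void insertString(FilterBypass fb, int offset, String text,
                             AttributeSet attrs) throws BadLocationException {
        super.insertString(fb, offset, text, attrs);
        // If the user has just completed "<!--", append a matching " -->" so
        // a huge unterminated comment token is never created.
        if ("-".equals(text) && offset >= 3
                && "<!--".equals(fb.getDocument().getText(offset - 3, 4))) {
            super.insertString(fb, offset + 1, " -->", attrs);
        }
    }

    public static void main(String[] args) throws BadLocationException {
        PlainDocument doc = new PlainDocument();
        doc.setDocumentFilter(new CommentCloserFilter());
        doc.insertString(0, "<!-", null);
        doc.insertString(3, "-", null);   // typing the final '-' triggers the auto-close
        System.out.println(doc.getText(0, doc.getLength()));  // prints: <!-- -->
    }
}
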
Comment 14 Miloslav Metelka 2004-09-16 13:19:26 UTC
I have tested the situation again with a recent 4.0 dev build and there
are no longer any hangs, e.g. the completion runs in a separate thread.
So the only problem seems to be decreased responsiveness - each
keystroke takes roughly one second on my machine (on an older machine it
can be a few seconds).
Added the xml team on cc.
Comment 15 Marek Fukala 2004-09-20 10:58:38 UTC
I have created a possible fix for this issue. It's a workaround in which
the XML comment is divided into separate tokens - one token for each
line. I have tested it - it seems to be safe and the performance is
great. See the attached diff.
Comment 16 Marek Fukala 2004-09-20 10:59:26 UTC
Created attachment 17749 [details]
The patch diff
Comment 17 Miloslav Metelka 2004-09-20 11:48:17 UTC
Marku, thanks for the patch, but as I've already said previously, I'm
not a fan of splitting the tokens into pieces just to work around
this problem. IMO the structure of the token should be retained. But
as your team now maintains the xml module, it's generally up to you
whether you decide to integrate this or not. BTW please make sure
that any utilities relying on the structure of the comment token are
updated as well if necessary (the BLOCK_COMMENT token id is used in
XMLSyntaxSupport).
Comment 18 Miloslav Metelka 2004-09-20 14:21:34 UTC
BTW regarding token splitting, we discussed this with Honza Lahoda
(added to cc if he wants to speak up) when this issue was discovered
around the 3.6 time. I'm trying to recollect the arguments that I had
against the splitting, but I'm not sure whether I recall everything.
Generally there can be tools relying on the comment token's structure.
For example, we could write a reformatting tool that wants to e.g.
reformat the comment to have 80 columns per line. For such a tool it
is better to have the whole token available rather than pieces, because
the tool may need to know whether the line being reformatted is inside
the comment token or at its beginning. Checking for "<!--" might be a
solution for XML, where IIRC it's illegal to have a double-dash and
continue the comment, but not e.g. for Java, where you can legally have

/* block-token
/* still the same token
*/ // end-of-block-comment

so checking for "/*" at the begining of the split-comment-token does
not guarantee that it's a real begining of the
multi-line-block-comment-token.

There could be a solution to have separate token ids
BLOCK_COMMENT_START
BLOCK_COMMENT_LINE
BLOCK_COMMENT_END
besides the BLOCK_COMMENT token id that would be used for single-line
cases but having this for all multi-line-capable token ids is IMHO
annoying from the usage point of view.

Also, regarding the future, the lexer module will support language
embedding, i.e. a token (e.g. a javadoc comment token) will support
splitting into tokens of another language (e.g. the javadoc language).
It is easier and more natural to do the language embedding on the whole
javadoc token rather than on its pieces.

So from my point of view I would rather fix it in another way than token
splitting. On the other hand, I'm OK with such a fix for a single
promotion in the xml module that we maintain.
Comment 19 Marek Fukala 2004-09-20 14:37:05 UTC
>So from my point of view I would rather fix it in another way than token
>splitting. On the other hand, I'm OK with such a fix for a single
>promotion in the xml module that we maintain.
This is exactly how it is meant - just a hotfix for one release, or until
it is really and clearly fixed on the editor side.

Currently I am not aware of any tools relying on the 'big XML token',
so I hope this is not a big issue. Moreover, if there are any, I believe
it is less painful to update them than to bother users with such poor
performance.

Comment 20 Jan Lahoda 2004-09-20 15:29:11 UTC
Hi,
   (I think I should comment :-)). I think that the splitting fix is
desirable and quite OK. If Marek has a working patch and no one knows
about anybody that has a tool depending on the structure of the tokens,
I see no reason not to apply it. Maybe asking on nbdev would be
good, just to make sure no one depends on the "whole comment" token.

On the other hand, I am not convinced that the problem of opening a
comment at the beginning of a 550KB file will be completely solved by
the lexer (IMO it will still need to read each character of the file,
so it will take time linear in the length of the file).
Comment 21 Miloslav Metelka 2004-09-20 16:36:01 UTC
As I've already mentioned, I'm OK with the fix.
To cause as little confusion as possible I have filed a separate issue
to track the editor infrastructure's problem with handling typing
within large comment tokens. It's issue 49296.

> On the other hand, I am not convinced that the problem of opening a
> comment at the beginning of a 550KB file will be completely solved by
> the lexer (IMO it will still need to read each character of the file,
> so it will take time linear in the length of the file).

I'm confident that it *will* fix the problem - I mean there will be no
multi-second delays between keystrokes. It should basically be
fixed in two ways:
1) There will be token validators that make it possible to validate the
token without fully rescanning it (sketched after this comment).
2) Even if the token is relexed, it will only be done once per
modification, which should generally take just a fraction of a second
(moreover, there will be no copying of character buffers like there is now).

I have reassigned the issue to xml - it will be fixed there.
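
A hypothetical sketch of the token-validator idea from point 1) above;
the interface and its names are assumptions, not the eventual
lexer-module API:

public class TokenValidatorSketch {

    // A validator cheaply confirms that a modification inside a token leaves
    // it lexically intact, so the lexer does not have to rescan the token.
    interface TokenValidator {
        boolean validateInsertion(CharSequence tokenText, int offsetInToken, CharSequence inserted);
    }

    // An XML comment token stays valid as long as the inserted text cannot
    // contribute to a closing "-->", i.e. contains no '-' or '>'.
    static final TokenValidator XML_COMMENT = (tokenText, offset, inserted) -> {
        for (int i = 0; i < inserted.length(); i++) {
            char c = inserted.charAt(i);
            if (c == '-' || c == '>') {
                return false;   // might form "-->": fall back to a normal relex
            }
        }
        return true;            // token unchanged, no relex and no buffer copying
    };

    public static void main(String[] args) {
        String token = "<!-- a very long comment ... -->";
        System.out.println(XML_COMMENT.validateInsertion(token, 10, "plain text")); // true
        System.out.println(XML_COMMENT.validateInsertion(token, 10, "--"));         // false
    }
}
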
Comment 22 Marek Fukala 2004-09-21 10:41:15 UTC
Fixed as formerly proposed - XML tokens are created for each line of
the comment (see the sketch below). This fix also causes a SyntaxElement
to be created for each line of the comment - see
XMLSyntaxSupport.createElement:277

This fix should be removed after the editor team solves the core of the
problem in their code.

Checking in XMLDefaultSyntax.java;
/cvs/xml/text-edit/src/org/netbeans/modules/xml/text/syntax/XMLDefaultSyntax.java,v
 <--  XMLDefaultSyntax.java
new revision: 1.12; previous revision: 1.11
done
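
An illustrative sketch of the shape of the integrated workaround - one
BLOCK_COMMENT token per comment line. It does not reproduce the real
XMLDefaultSyntax lexer or the attached patch; the class and token
representation are assumptions:

import java.util.ArrayList;
import java.util.List;

public class PerLineCommentTokens {

    static final class Token {
        final String id;
        final int offset;
        final int length;
        Token(String id, int offset, int length) {
            this.id = id;
            this.offset = offset;
            this.length = length;
        }
    }

    // Splits the comment text spanning [start, end) into one token per line
    // instead of a single huge BLOCK_COMMENT token.
    static List<Token> splitComment(String text, int start, int end) {
        List<Token> tokens = new ArrayList<>();
        int lineStart = start;
        for (int i = start; i < end; i++) {
            if (text.charAt(i) == '\n') {
                tokens.add(new Token("BLOCK_COMMENT", lineStart, i + 1 - lineStart));
                lineStart = i + 1;
            }
        }
        if (lineStart < end) {
            tokens.add(new Token("BLOCK_COMMENT", lineStart, end - lineStart));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String xml = "<!-- first line\nsecond line\n-->";
        for (Token t : splitComment(xml, 0, xml.length())) {
            System.out.println(t.id + " offset=" + t.offset + " length=" + t.length);
        }
    }
}
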
Comment 23 Miloslav Metelka 2004-09-21 15:45:15 UTC
Removed 4.0_WAIVER_REQUEST keyword.
Comment 24 Jiri Kovalsky 2005-07-15 12:21:02 UTC
Verified in development build #200507131800 of NetBeans 4.2.