43484 – DocumentLine.Set.getLines grotesquely slow for large files

This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 43484 - DocumentLine.Set.getLines grotesquely slow for large files

Summary: DocumentLine.Set.getLines grotesquely slow for large files

Status:	CLOSED FIXED

Alias:	None

Product:	platform
Classification:	Unclassified
Component:	Text (show other bugs)
Version:	4.x
Hardware:	All All

Importance:	P1 blocker (vote)
Assignee:	Jaroslav Tulach

URL:
Keywords:	API_REVIEW_FAST, PERFORMANCE, REGRESSION, T9Y

Duplicates (2):	42723 43546 (view as bug list)
Depends on:
Blocks:	42270
	Show dependency tree

Reported:	2004-05-18 23:55 UTC by Jesse Glick
Modified:	2008-12-22 20:22 UTC (History)
CC List:	12 users (show)

See Also:
Issue Type:	DEFECT
Exception Reporter:

Attachments
Representative thread dump entry for EQ while IDE is frozen (4.12 KB, text/plain) 2004-05-18 23:56 UTC, Jesse Glick	Details
A patch which appears to correct the problem; note that the important part is the patch to hashCode() and maybe equals(); rest is general perf-related cleanup (8.68 KB, patch) 2004-05-18 23:57 UTC, Jesse Glick	Details \| Diff
DocumentLine.hashcode based on PositionRef (868 bytes, patch) 2004-05-19 23:53 UTC, _ ttran	Details \| Diff
diff b/w 1.10 and 1.8 (3.25 KB, patch) 2004-05-20 00:42 UTC, _ ttran	Details \| Diff
Proposed fix: Better impl of LineSet.getLines().indexOf and also new method int LineSet.getOriginal (Line) (23.61 KB, patch) 2004-06-01 17:24 UTC, Jaroslav Tulach	Details \| Diff
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jesse Glick 2004-05-18 23:55:07 UTC

I feel like I'm dreaming because I can't imagine
how no one else could have noticed this. Am I the
only one?

In a current trunk (or refactoring) build, using
JDK 1.4 or 1.5, opening a large Java source - e.g.
JComponent.java - is so slow as to be ridiculous.
Takes *several minutes* during which the IDE is
frozen.

Thread dumps and OptimizeIt show that
o.n.m.debugger.projects.ContextProviderImpl.getCurrentLineNumber
is calling DocumentLine.Set.getLines - probably
just once, or at least just once in the ~15-second
snapshot I took in OI - and this is calling
DocumentLine.equals thousands of times. I don't
know how slow each call to DL.equals is.

Inspection of DocumentLine source code showed that
its hash code is based only on the hash code of
its CloneableEditorSupport, shared with all other
lines in the same document, yet its equals method
checks the line number. This means that the
WeakHashMap Line.Set.lines has all the lines in
*one bucket* and so retrieving any line (done in
L.S.rL) is linear in the number of lines. Since
each line is registered, I guess that means the
time is *quadratic* in the number of lines.

I can't see any recent changes in this package so
I have to assume that this method has always been
stupidly slow for large files. Why would it only
be noticeable now? Well, maybe no one was calling
it until the debugger started to, fairly recently
(May 3 I think)?

(BTW the debugger's usage of the Line.Set after it
gets it is highly suspect - it seems to be
iterating every line in a linear search looking
for a given one, which is ridiculous. Maybe this
block could be deleted. Certainly it could be made
O(lg n) rather than O(n).)

Fixing the hash code to incorporate the line
number, as equals does, speeds things up by a
couple orders of magnitude or so - down to a few
seconds for JComponent. (Still much too slow, but
probably for other reasons.)

However I am nervous about this, since I am not
sure if the line number could change during the
object lifetime, and what will happen if it does.
Whatever the solution, something has to be done
quickly.

I will attach a representative thread dump excerpt
showing the call sequence; and a patch which seems
to fix things.

BTW the patch also removes the long-obsolete
support for markError etc. which has not been used
in a long time (since Annotation was added) and
just wastes space (12 bytes per line in fields)
and involves adding extra listeners that will
never do anything but slow things down.

Comment 1 Jesse Glick 2004-05-18 23:56:02 UTC

Created attachment 14958 [details]
Representative thread dump entry for EQ while IDE is frozen

Comment 2 Jesse Glick 2004-05-18 23:57:19 UTC

Created attachment 14959 [details]
A patch which appears to correct the problem; note that the important part is the patch to hashCode() and maybe equals(); rest is general perf-related cleanup

Comment 3 _ ttran 2004-05-19 23:23:40 UTC

> However I am nervous about this, since I am not
> sure if the line number could change during the
> object lifetime,

unfortunately it can.  This is from the javadoc of
org.openide.text.Line class

/** Represents one line in a text document.
 * The line number may change
* when the text is modified, but the identity of the line is retained.
It is designed to allow line-dependent
* modules of the IDE (such as the compiler and debugger) to make use
of a line consistently even as the text is modified.

Comment 4 _ ttran 2004-05-19 23:24:47 UTC

*** Issue 43546 has been marked as a duplicate of this issue. ***

Comment 5 _ ttran 2004-05-19 23:26:35 UTC

see also the duplicate issue 43546 for some more detail.

Comment 6 _ ttran 2004-05-19 23:45:18 UTC

Oh mine, it's so slowwwww.  This started with this two commits from Hanz

2004-05-18 10:45:31  jjancura

	*
	src/org/netbeans/modules/debugger/projects/ContextProviderImpl.java
	(1.10): Some JSP debugger related bugfixing: - Java debugger should
	not try to save jsp breakpoints - Java debugger should not try to
	toggle breakpoints to JSP files

2004-05-17 22:15:15  jjancura

	*
	src/org/netbeans/modules/debugger/projects/ContextProviderImpl.java
	(1.9): 42270: Wrong breakpoint placement after copy,paste activity

Comment 7 _ ttran 2004-05-19 23:53:15 UTC

Created attachment 15015 [details]
DocumentLine.hashcode based on PositionRef

Comment 8 _ ttran 2004-05-20 00:04:57 UTC

with my patch (which achieves basically the same thing as Jesse's
patch from perf point of view) opening JComponent.java is not
grotesquely slow, but still very slow.

The refactoring team is under fire trying to improve the performance
in their build.  This bug will pollute all their numbers.

I am going to revert
debuggerjpda/ant/src/org/netbeans/modules/debugger/projects/ContextProviderImpl.java
from rev 1.10 (the current HEAD) to 1.8.

Hanz, I am sorry

In any case we have to fix DocumentLine.hashcode and equals.  It's
incredible how long this defect is there

1.23         (jglick   25-Apr-00):     public int hashCode () {
1.25         (pzavadsk 31-May-00):         return
pos.getCloneableEditorSupport ().hashCode ();
1.2          (Jaroslav 29-Jan-99):     }
1.2          (Jaroslav 29-Jan-99): 
1.23         (jglick   25-Apr-00):     public boolean equals (Object o) {
1.23         (jglick   25-Apr-00):         if (o instanceof
DocumentLine) {
1.23         (jglick   25-Apr-00):             DocumentLine dl =
(DocumentLine)o;
1.25         (pzavadsk 31-May-00):             if
(dl.pos.getCloneableEditorSupport () == pos.getCloneableEditorSupport
()) {
1.23         (jglick   25-Apr-00):                 return
dl.getLineNumber () == getLineNumber ();
1.23         (jglick   25-Apr-00):             }
1.23         (jglick   25-Apr-00):         }
1.23         (jglick   25-Apr-00):         return false;
1.23         (jglick   25-Apr-00):     }
1.23         (jglick   25-Apr-00): 

Yarda, please take care of this issue.  Should I bring my copy of
"Effective Java" to the office? :-)

Comment 9 Jesse Glick 2004-05-20 00:29:50 UTC

Trung I don't believe your patch is correct. There could be multiple
PositionRef's pointing to the same place AFAIK, in which case the hash
code is wrong. Making PR.hC be based on line number just defers the
problem. Also files change their PositionRef.Manager.Kind
representation when they are loaded (or unloaded) as I understand it.
(I'm just trying to wade through all the weird code here, not sure
about it.)

The simplest thing I can think of is to patch Line.Set to store lines
differently. Currently it uses a WeakHashMap<Line,WeakReference<Line>>
(effectively a WeakSet<Line> but able to return the original match).
The root of the problem seems to be that a hash map requires a hash
code and it is very hard to come up with a stable hash code for
something with a field that can change which can affect equality if
there is no guarantee that the field itself is unique for its value.
(The offset can move too.)

Suggest leaving the hash code as it is (i.e. insensitive to line
number) and ceasing to use a hash map. (Hope no one else uses Line's
as map keys too!) Keep all lines in an ArrayList-backed heap, sorted
by line number. (Cannot have two lines per document with the same
current line number so the sort order is safe.) Inserting or
retrieving a given line would then be O(lg n) which is reasonable.
Should also fix DL.S.getLines to use some backdoor communication
channel with L.S.rL to get a direct dump of the heap (inserting
missing lines as required) so as to avoid doing extra comparisons.

Or, similarly but perhaps more easily, you might be able to leave the
hash code constant (per document) and use a
TreeSet<WeakReference<Line> implements Comparator<Line>>. That would
mean the hashCode() method is not actually called. To insert a line
iff it is new, just call s.tailSet(proposedLine); if nonempty, get the
first element; if it .equals(proposedLine), return the existing one;
else s.add(proposedLine).

We need a *lot* of unit tests for this stupid package, I think.
Currently I can't find any.

Also - regardless of how poorly implemented the code in o.o.text is,
the debugger is probably wrong to be using this call to begin with.
(TBD whether this is so - there are no comments in ContextProviderImpl
to explain why it is doing what it is doing.) Whatever bugfix was
needed in the debugger (issue #42270, apparently?) is surely not worth
slowing down every file open in the IDE tremendously. Even if it is
really necessary to call getLineSet, calling getLines and doing a
linear search is very bad - forces every DocumentLine to be
initialized (would otherwise be lazy AFAIK), plus it's a linear search
which is O(n).

Comment 10 Jesse Glick 2004-05-20 00:33:14 UTC

Of course for "s.tailSet(proposedLine)" etc. I meant "s.tailSet(new
LineKey(proposedLine))", where LineKey (ex WR im C) is the element type.

Comment 11 Jesse Glick 2004-05-20 00:35:48 UTC

However using a TreeSet means you have to add the element type to
Utilities.activeReferenceQueue and clean up... and you need to also
deal with the difficulty of implementing compare when the original
line might now be gone. (If you get the comparator wrong the tree set
could be corrupted.) Hmm, maybe better to write the binary heap after
all so you can have more control. (Need to remove elements when the
line is GC'd of course.)

Comment 12 _ ttran 2004-05-20 00:38:30 UTC

[Jesse, you keep mid-air colliding with me.  I was chatting w/ Yarda
lat week, it's likely that the whole originalLineNumbers etc is one
big API flaw]

Hanz, I just reverted ContextProviderImpl.java from rev 1.10 back
to 1.8.  Diff will be attached.

Jesse: I didn't attempt to really know how text package works.  Just a
shot in the dark.  (Please watch the continous build in the next hour
or two, I am going to do a few commits.  Commit validation is passing
on my laptop, but who knows :-)

Comment 13 _ ttran 2004-05-20 00:42:54 UTC

Created attachment 15020 [details]
diff b/w 1.10 and 1.8

Comment 14 Jesse Glick 2004-05-20 00:44:44 UTC

*You* keep mid-air colliding with *me*! :-)

Re. commit validation passing - great. It passed for me with my
original patch, too, and I'm pretty sure that was wrong. Opening files
worked too, but who knows what might be broken? I don't think our
tests cover these awful classes in any depth.

Comment 15 Jaroslav Tulach 2004-05-20 14:25:30 UTC

I'll take care of this, write a test for the behaviour and make sure
that following code is fast: lineSet.getLines().indexOf (line). 

Btw. It is true that lineSet.getLines().toArray() was always slow and
nobody noticed it because nobody used it. I delayed the actual
implementation until it is needed and this delay was five or six
years. The time has come and I am ready to provide more effective impl
(at least of the indexOf method), which is imho simpler and more
correct than trying to introduce some new method or abadoning the
LineSet completely.

Comment 16 Petr Nejedly 2004-06-01 13:34:13 UTC

*** Issue 42723 has been marked as a duplicate of this issue. ***

Comment 17 Jaroslav Tulach 2004-06-01 17:24:07 UTC

Created attachment 15400 [details]
Proposed fix: Better impl of LineSet.getLines().indexOf and also new method int LineSet.getOriginal (Line)

Comment 18 Jaroslav Tulach 2004-06-01 17:26:43 UTC

Hanzi, please verify that the new method 
public int Line.Set.getOriginal (Line)
solves your problem.

Petr N. please check that my changes are acceptable, as you probably
know the package the most.

Reviews, please review.

Comment 19 Jesse Glick 2004-06-01 18:21:19 UTC

Would suggest a distinct name for the new method to avoid confusion.
Perhaps

public int Line.Set.getOriginalLineNumber(Line line);

Comment 20 Jaroslav Tulach 2004-06-07 14:04:39 UTC

Will be integrated tomorrow.

Comment 21 Jaroslav Tulach 2004-06-07 14:05:17 UTC

Name will be changed as Jesse suggests.

Comment 22 Jan Jancura 2004-06-07 14:54:21 UTC

Looks good for my purpose.
Thanks.

Comment 23 Jaroslav Tulach 2004-06-08 08:35:27 UTC

Checking in openide-spec-vers.properties;
/cvs/openide/openide-spec-vers.properties,v  <-- 
openide-spec-vers.properties
new revision: 1.148; previous revision: 1.147
done
Processing log script arguments...
More commits to come...
Checking in api/doc/changes/apichanges.xml;
/cvs/openide/api/doc/changes/apichanges.xml,v  <--  apichanges.xml
new revision: 1.206; previous revision: 1.205
done
Processing log script arguments...
More commits to come...
Checking in src/org/openide/text/DocumentLine.java;
/cvs/openide/src/org/openide/text/DocumentLine.java,v  <-- 
DocumentLine.java
new revision: 1.55; previous revision: 1.54
done
RCS file: /cvs/openide/src/org/openide/text/LazyLines.java,v
done
Checking in src/org/openide/text/LazyLines.java;
/cvs/openide/src/org/openide/text/LazyLines.java,v  <--  LazyLines.java
initial revision: 1.1
done
Checking in src/org/openide/text/Line.java;
/cvs/openide/src/org/openide/text/Line.java,v  <--  Line.java
new revision: 1.28; previous revision: 1.27
done
Checking in src/org/openide/text/LineListener.java;
/cvs/openide/src/org/openide/text/LineListener.java,v  <-- 
LineListener.java
new revision: 1.12; previous revision: 1.11
done
Checking in src/org/openide/text/LineStruct.java;
/cvs/openide/src/org/openide/text/LineStruct.java,v  <--  LineStruct.java
new revision: 1.13; previous revision: 1.12
done
Processing log script arguments...
More commits to come...
Checking in test/unit/src/org/openide/text/LineSetTest.java;
/cvs/openide/test/unit/src/org/openide/text/LineSetTest.java,v  <-- 
LineSetTest.java
new revision: 1.2; previous revision: 1.1
done

Comment 24 Marian Mirilovic 2004-08-10 09:13:31 UTC

verified