This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 86473

Summary:	Create language embedding through Lexer API
Product:	editor	Reporter:	Miloslav Metelka <mmetelka>
Component:	Lexer	Assignee:	apireviews <apireviews>
Status:	RESOLVED FIXED
Severity:	blocker	CC:	mstevens, pjiricka
Priority:	P2	Keywords:	API_REVIEW_FAST
Version:	6.x
Hardware:	All
OS:	All
Issue Type:	ENHANCEMENT	Exception Reporter:
Bug Depends on:
Bug Blocks:	89324
Attachments:	Diff of the change; List of committed files

Description Miloslav Metelka 2006-10-04 15:16:19 UTC

There are use cases that require API clients to create language embeddings
dynamically:
1) A Java string literal may contain e.g. SQL statement text. As there is
already a default embedding that recognizes escaped characters, e.g. "\n", this
requires allowing more than one embedding for a single token (the default
embedding should remain available, as there may be clients relying on its
presence).

2) A <script> HTML tag does not need to specify the type of the scripting
language, and the default language may be overridden by

<META http-equiv="Content-Script-Type" content="type">

Although the lexer could in theory recognize such a declaration and store the
content type in the state object of each token that follows the declaration, it
is impractical, as recognizing the declaration above is a task that should be
reserved for an HTML parser.

3) A script that follows the <script> tag may be written as a comment surrounded
by
 <!--
// -->
and there may be no default embedding for the comment, so the parser must
request explicit embedding creation for the comment token.


Requirements:
1) An API method must be added to TokenSequence for custom embedding creation.

2) The notification model must be extended so that clients (e.g. syntax
coloring) can notice the creation of a new embedding.

3) Clients must also listen for the case when a new token eligible for custom
embedding gets created. Also, if a token with a custom embedding becomes damaged
by the user's typing, the custom embedding will be lost, so the clients must
recreate it.

4) If more than one embedding exists for a single token, then one of the
embeddings must be used for syntax coloring purposes. As there is not yet a use
case with more than one custom embedding, the solution can be that the syntax
coloring uses the custom embedding if one exists and otherwise uses the default
embedding.
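
The selection rule in requirement 4 could be sketched roughly as follows. This
is a self-contained illustration only, not the Lexer API: EmbeddingSketch,
TokenSketch, embeddingForColoring() and the language names used with them are
hypothetical stand-ins invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one embedded language attached to a token. */
final class EmbeddingSketch {
    final String languageName; // name of the embedded language (assumed representation)
    final boolean custom;      // true if a client created it explicitly

    EmbeddingSketch(String languageName, boolean custom) {
        this.languageName = languageName;
        this.custom = custom;
    }
}

/** Hypothetical token that may carry a default embedding and custom ones. */
final class TokenSketch {
    private final List<EmbeddingSketch> embeddings = new ArrayList<>();

    void addEmbedding(EmbeddingSketch embedding) {
        embeddings.add(embedding);
    }

    /**
     * Returns the embedding that coloring should use: the first custom
     * embedding if one exists, otherwise the first (default) embedding,
     * or null if the token has no embedding at all.
     */
    EmbeddingSketch embeddingForColoring() {
        EmbeddingSketch fallback = null;
        for (EmbeddingSketch embedding : embeddings) {
            if (embedding.custom) {
                return embedding; // a custom embedding always wins
            }
            if (fallback == null) {
                fallback = embedding; // remember the default embedding
            }
        }
        return fallback;
    }
}
```

With a string literal token that has only the default escape-character
embedding, coloring would use that one; once a client adds a custom SQL
embedding, coloring would switch to it while the default stays available to
other clients.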

Comment 1 Miloslav Metelka 2006-11-28 14:41:47 UTC

The attached diff contains the implementation of this request. It makes the
following changes:
1. Extracted TokenHierarchyEvent.Type inner enum into TokenHierarchyEventType
top-level enum for better readability.

2. A TokenSequence.createEmbedding() method was added for creation of a
custom embedding. A new TokenHierarchyEventType.EMBEDDING value is fired after
the custom embedding creation.

3. The affected offset area information, affectedStartOffset() and
affectedEndOffset(), was moved from TokenChange to TokenHierarchyEvent because
it is more useful and clearer for the clients of these methods: e.g. the syntax
coloring will just query these offsets without digging into the (possibly
embedded) token change(s).

4. Removed the tokenComplete parameter from LanguageHierarchy.embedding()
because it is currently unused and token incompleteness will be handled in a
different way in the future (see also issue 87014).

5. Swapped the order of the token and languagePath parameters in
LanguageProvider to be in sync with LanguageHierarchy.embedding().

6. LanguageEmbedding is now a final class (instead of an abstract class) with a
private constructor and a static create() method. That allows better control
over the evolution of the class, and it also allows caching the created
embeddings to save memory.

7. LanguageEmbedding is now generified as LanguageEmbedding<T extends
TokenId>, matching the generification of the language that it contains.

8. The TokenHierarchy.languagePaths() set contains all language paths used in
the particular token hierarchy. TokenHierarchyEventType.LANGUAGE_PATHS is
fired after a change of the language paths set.
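
The pattern described in change 6 (final class, private constructor, static
create() that can cache instances) can be illustrated with a minimal sketch.
This is not the code from the attached diff: the class name
EmbeddingDescriptor, the String language name, the two skip-length parameters
and the cache key format are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable descriptor created only through a caching factory. */
final class EmbeddingDescriptor {
    // Cache of already created descriptors, keyed by their field values.
    private static final Map<String, EmbeddingDescriptor> CACHE = new HashMap<>();

    private final String languageName;  // assumed: embedded language identified by name
    private final int startSkipLength;  // assumed: characters skipped at the token start
    private final int endSkipLength;    // assumed: characters skipped at the token end

    // Private constructor: clients cannot subclass or instantiate directly.
    private EmbeddingDescriptor(String languageName, int startSkipLength, int endSkipLength) {
        this.languageName = languageName;
        this.startSkipLength = startSkipLength;
        this.endSkipLength = endSkipLength;
    }

    /** Returns a shared instance for equal arguments, creating it on first use. */
    static synchronized EmbeddingDescriptor create(
            String languageName, int startSkipLength, int endSkipLength) {
        String key = languageName + '/' + startSkipLength + '/' + endSkipLength;
        return CACHE.computeIfAbsent(key,
                k -> new EmbeddingDescriptor(languageName, startSkipLength, endSkipLength));
    }

    String languageName() { return languageName; }
    int startSkipLength() { return startSkipLength; }
    int endSkipLength() { return endSkipLength; }
}
```

Because every instance goes through create(), two requests with the same
arguments return the same object, which is what makes the memory saving
possible, and the factory can later change how instances are produced without
breaking callers.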

Comment 2 Miloslav Metelka 2006-11-28 14:42:36 UTC

Created attachment 36320 [details]
Diff of the change

Comment 3 Miloslav Metelka 2006-11-28 15:21:08 UTC

Marking for fasttrack review.

Comment 4 Jesse Glick 2006-11-28 22:34:18 UTC

BTW "diff -u" is generally more readable than "diff -c", especially in an
enormous patch like this one. Easiest to append "diff -u" to your ~/.cvsrc file.

Comment 5 Miloslav Metelka 2006-12-04 15:51:19 UTC

Created attachment 36454 [details]
List of committed files

Comment 6 Miloslav Metelka 2006-12-04 15:52:09 UTC

Committed into trunk.

Comment 7 Jesse Glick 2006-12-04 17:05:25 UTC

Uh, did you mean M6?

Comment 8 Miloslav Metelka 2006-12-04 19:39:53 UTC

Sorry, I meant M6. Thanks, Jesse.