This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.

Bug 123893 - One-pass lexing of a token and its embedded tokens
Status: NEW
Alias: None
Product: editor
Classification: Unclassified
Component: Lexer
Version: 6.x
Hardware: PC All
Importance: P2 blocker
Assignee: Miloslav Metelka
Depends on:
Reported: 2007-12-12 17:22 UTC by Miloslav Metelka
Modified: 2010-09-23 07:58 UTC (History)

Description Miloslav Metelka 2007-12-12 17:22:25 UTC
This would be useful when a top-level language knows where a concrete embedding starts but does not know exactly where
it ends. For example, there may be an "asm {" section terminated by "}", but the CLexer implementor does not want to
inline the asm-section lexer into the CLexer, nor to scan the contents of "asm { ... }" for the matching '}': the
section may contain comments that themselves contain '}' chars, so finding the end would duplicate a considerable part
of the AsmLexer's work. The best way would be to switch directly to the AsmLexer, which would find the '}' by itself
and then return control to the CLexer.
The solution could be to introduce

 TokenFactory.createBranchToken(TokenId id, String embeddingEndText)

that the outer lexer would call (embeddingEndText would be "}" in the asm case; passing it as a parameter would make
the mechanism more generic). The present LexerInput.readLength() would define the startSkipLength of the embedding and
the embeddingEndText would define the endSkipLength. There could also be a LexerInput.getEmbeddingEndText() to allow
construction of "dual" lexers. Of course, some questions remain open, e.g. how to handle incomplete embeddings where
the embeddingEndText is never reached. Some additional infrastructure would probably be needed as well, because a
stateless lexer could become state-based merely because it has to track the '{' and '}' balance in order to identify
the embeddingEndText properly.
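
To make the idea concrete, here is a minimal, self-contained sketch of one-pass lexing. It does not use the NetBeans
Lexer API: the names OnePassEmbeddingSketch, Token, lexC and lexAsm are hypothetical, the "C lexer" only recognizes the
"asm {" start text, and the asm syntax (';' line comments, whitespace-separated words) is an assumption made for
illustration only. The inner lexer finds the closing '}' by itself, skipping a '}' inside a comment and tracking brace
balance, and an incomplete embedding simply runs to the end of the input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OnePassEmbeddingSketch {

    /** A token; a branch token additionally carries the tokens of its embedded section. */
    static final class Token {
        final String id;
        final String text;
        final List<Token> embedded;

        Token(String id, String text, List<Token> embedded) {
            this.id = id;
            this.text = text;
            this.embedded = embedded;
        }

        Token(String id, String text) {
            this(id, text, Collections.<Token>emptyList());
        }
    }

    /**
     * Hypothetical inner lexer. Lexes from 'start' and returns the offset of the
     * terminating '}' (or input.length() if the embedding is incomplete).
     */
    static int lexAsm(String input, int start, List<Token> out) {
        int n = input.length();
        int i = start;
        int depth = 0; // brace balance: the state a "stateless" lexer would now need
        while (i < n) {
            char c = input.charAt(i);
            int end = i + 1;
            if (c == ';') { // assumed asm line comment: a '}' inside it does not count
                end = input.indexOf('\n', i);
                if (end < 0) end = n;
                out.add(new Token("ASM_COMMENT", input.substring(i, end)));
            } else if (c == '{') {
                depth++;
                out.add(new Token("ASM_LBRACE", "{"));
            } else if (c == '}') {
                if (depth == 0) return i; // the embeddingEndText has been reached
                depth--;
                out.add(new Token("ASM_RBRACE", "}"));
            } else if (Character.isWhitespace(c)) {
                while (end < n && Character.isWhitespace(input.charAt(end))) end++;
                out.add(new Token("ASM_WS", input.substring(i, end)));
            } else {
                while (end < n && !Character.isWhitespace(input.charAt(end))
                        && ";{}".indexOf(input.charAt(end)) < 0) end++;
                out.add(new Token("ASM_WORD", input.substring(i, end)));
            }
            i = end;
        }
        return n; // incomplete embedding: the end text was never reached
    }

    /** Hypothetical outer lexer: everything outside "asm { ... }" is one C_TEXT token. */
    static List<Token> lexC(String input) {
        final String startText = "asm {"; // its length plays the role of startSkipLength
        List<Token> out = new ArrayList<Token>();
        int i = 0;
        while (i < input.length()) {
            int asm = input.indexOf(startText, i);
            if (asm < 0) {
                out.add(new Token("C_TEXT", input.substring(i)));
                break;
            }
            if (asm > i) out.add(new Token("C_TEXT", input.substring(i, asm)));
            // Switch to the inner lexer; it reports where the embedding ends.
            List<Token> embedded = new ArrayList<Token>();
            int end = lexAsm(input, asm + startText.length(), embedded);
            int branchEnd = end < input.length() ? end + 1 : end; // include '}' if present
            out.add(new Token("ASM_BRANCH", input.substring(asm, branchEnd), embedded));
            i = branchEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "int x; asm { mov ax, 1 ; a } here\n } int y;";
        for (Token t : lexC(src)) {
            System.out.println(t.id + " [" + t.text.replace("\n", "\\n")
                    + "] embedded=" + t.embedded.size());
        }
    }
}
```

In this sketch the branch token spans the whole "asm { ... }" text and its embedded tokens are produced in the same
pass, so the outer lexer never scans the section contents itself. A real implementation would also have to make the
inner lexer's brace-balance state restartable for incremental relexing, which the sketch does not attempt.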