118892 – Allow Schlieman lexer to continuously lex embedded language over more tokens of its parent language



Status:	RESOLVED DUPLICATE of bug 117450

Alias:	None

Product:	obsolete
Classification:	Unclassified
Component:	languages
Version:	6.x
Hardware:	All All

Importance:	P2 blocker
Assignee:	Jan Jancura

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-10-15 14:24 UTC by Marek Fukala
Modified:	2007-10-16 16:01 UTC
CC List:	2 users

See Also:
Issue Type:	ENHANCEMENT
Exception Reporter:


Description Marek Fukala 2007-10-15 14:24:39 UTC

Currently the following example is lexed incorrectly by SLexer for the JavaScript language.

<body onload="loadPage('messages=true&restart=true', 8000);">

The problem is the entity reference inside the attribute value, which splits the value into three tokens:

loadPage('messages=true   [VALUE]
&restart                  [CHARACTER]
=true', 8000);            [VALUE]

This token division is very hard to change at the HTML lexer level.

SLexer, however, lexes these three tokens separately (at least the lexical tokens differ from those I get when I put the
attribute value into a new, clean JavaScript file). Due to this problem the text is marked as erroneous though it isn't.

The same problem arises for <style> tag content when there is an entity reference. This problem has been worked around
for the <script> tag in HTMLLexer by introducing new special states for scripts, but this is IMO not the right solution.
The correct solution is to lex over this division as if it were one token. The lexer infrastructure supports this and it
works perfectly in handcoded languages.
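
To illustrate the difference, here is a minimal sketch in Python. It is not SLexer or the NetBeans Lexer API: the toy JavaScript grammar and the names lex_js, lex_separately, lex_joined and error_count are all hypothetical. It only shows that lexing each parent token on its own produces two unterminated-string errors, while lexing the joined text produces none.

```python
import re

# Parent (HTML) tokens for the attribute value, as listed in the report.
PARENT_TOKENS = [
    ("loadPage('messages=true", "VALUE"),
    ("&restart", "CHARACTER"),
    ("=true', 8000);", "VALUE"),
]

# Toy JavaScript token grammar (hypothetical, for illustration only).
JS_TOKEN = re.compile(r"""
    (?P<STRING>'[^']*')        # complete single-quoted string
  | (?P<ERROR>'[^']*$)         # string opened but never closed
  | (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<PUNCT>[^\s\w'])
  | (?P<WS>\s+)
""", re.VERBOSE)


def lex_js(text):
    """Lex text with the toy grammar, dropping whitespace tokens."""
    tokens = []
    for match in JS_TOKEN.finditer(text):
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
    return tokens


def lex_separately(parent_tokens):
    """Restart the embedded lexer for every parent token (the reported behavior)."""
    return [tok for text, _ in parent_tokens for tok in lex_js(text)]


def lex_joined(parent_tokens):
    """Lex the embedded language continuously across all parent tokens."""
    return lex_js("".join(text for text, _ in parent_tokens))


def error_count(tokens):
    """Count tokens the toy lexer flagged as errors."""
    return sum(1 for kind, _ in tokens if kind == "ERROR")


if __name__ == "__main__":
    print("separate:", error_count(lex_separately(PARENT_TOKENS)), "error token(s)")
    print("joined:  ", error_count(lex_joined(PARENT_TOKENS)), "error token(s)")
```

In the joined case the whole string literal 'messages=true&restart=true' comes out as a single token, which is what a clean JavaScript file would give.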

Comment 1 Jan Jancura 2007-10-15 15:54:53 UTC

The Schliemann engine has never supported such a use case.
Moreover, the described behavior looks like a bug in the HTML lexer: as far as I know HTML, &restart does not represent
a character.

Comment 2 Marek Fukala 2007-10-15 16:27:34 UTC

We have never supported embedded JavaScript in HTML and JSP, but we said we would in 6.0. I understand that this issue is
more an enhancement than a bug, but Schliemann has been chosen as the basis of the new features, so from this point of
view its lack of this functionality prevents us from completing the task. Moreover, I believe the change should be
minimal, at least for this case. If not, please elaborate a little.

I understand you do not want to have more P2s, but I need a way to track it. It blocks some other P2s, which is why
I made this a P2.

As for your comment - just replace &restart by &nbsp; please.

Comment 3 Jan Jancura 2007-10-15 17:08:39 UTC

I have to repeat myself:
1) I think it is probably a bug in the HTML lexer. An attribute value should be represented by one token. A second
possibility is to use some preprocessor for special characters, but I do not know how to implement it - it is probably
a task for the HTML module designer...
2) It is definitely not a bug. This functionality has never been supported or planned.
3) I do not know how to implement this functionality correctly. If you have some idea how to do it, feel free to submit
a patch.

So, I am not sure how to resolve this task now. I do not want to play ping-pong...
Sooo - INVALID, and I will discuss it offline with Marek.

Comment 4 Marek Fukala 2007-10-15 17:25:53 UTC

I am quite good at ping-pong :-).

Just a short comment on #1 - this is not about the particular case; it is just an illustration of the problem. The issue
subject is self-explanatory, I believe.

We already discussed the same thing when you complained about the content of the <script> tag not being just one token.
As I already wrote, I worked around it for the script tag by making some not very nice changes in the lexer, but I do not
want to do the same for all other cases if it is not necessary.

So, I admit, this is "just" a lack of functionality I need. If it is not possible or very hard to fix on your side, I'll
try to work around it somehow, but please do not close issues with a "this is your bug" justification.

We will discuss this issue offline tomorrow, but I believe you will find some effective solution, as you always have so
far when resolving my silly requests. Thanks.

Comment 5 Jan Jancura 2007-10-16 13:09:43 UTC


*** This issue has been marked as a duplicate of 117450 ***

Comment 6 Marek Fukala 2007-10-16 16:01:04 UTC

I agree with marking this issue as a duplicate of Mila's task. The current problem is that the CSS lexer always gets EOF
at the end of the higher-level HTML token, so it cannot lex properly, since it requires lookahead overlapping the end of
the section.
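
The EOF problem can be sketched in Python as follows. This is only an illustration under stated assumptions, not the NetBeans Lexer API: the SectionInput reader, its joined flag, the lex_comment function and the example sections are all hypothetical. The reader either reports EOF at the end of the current section (the current behavior described above) or continues into the following sections.

```python
EOF = ""  # sentinel returned by SectionInput.read() when no input is left

# Hypothetical split of a CSS comment around an entity reference.
SECTIONS = ["/* a ", "&amp;", " b */"]


class SectionInput:
    """Character reader over parent-token sections (hypothetical helper).

    With joined=False the reader reports EOF at the end of the first
    section; with joined=True it continues into the following sections.
    """

    def __init__(self, sections, first=0, joined=False):
        self.sections = sections
        self.index = first
        self.pos = 0
        self.joined = joined

    def read(self):
        while self.index < len(self.sections):
            section = self.sections[self.index]
            if self.pos < len(section):
                ch = section[self.pos]
                self.pos += 1
                return ch
            if not self.joined:
                return EOF  # section boundary is treated as end of input
            self.index += 1  # move on to the next section
            self.pos = 0
        return EOF


def lex_comment(inp):
    """Read a /* ... */ comment; assumes the input starts with "/*".

    Returns ("COMMENT", text) when the terminator is found and
    ("ERROR", text) when EOF arrives first.
    """
    text = ""
    while True:
        ch = inp.read()
        if ch == EOF:
            return ("ERROR", text)
        text += ch
        if len(text) >= 4 and text.endswith("*/"):
            return ("COMMENT", text)


if __name__ == "__main__":
    print(lex_comment(SectionInput(SECTIONS)))               # bounded by section
    print(lex_comment(SectionInput(SECTIONS, joined=True)))  # reads across sections
```

With the section-bounded reader the comment never sees its closing */ and is reported as an error; with the joined reader the same lexer code finds the terminator in the third section.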