Apache OpenOffice (AOO) Bugzilla – Issue 10373
TITLE in HTML pages and charset?
Last modified: 2013-08-07 14:38:26 UTC
Hi, the attached HTML page contains a TITLE with characters in the ISO8859-2 charset. The Content-type (META tag) contains a proper definition of the charset. When this page is imported into OOo 643C, the title is displayed using an ISO8859-1 font. When I move the TITLE tag down (below the META tag), everything is OK and the title uses an ISO8859-2 font.
Created attachment 4174 [details] Sample HTML page with TITLE before META
Reassigned to ES
ES->pjanik: please zip your html file before attaching it. IssueZilla destroys html attachments. Thanx.
Created attachment 4314 [details] zipped page
Here it is - just try to compare this page with the same page with only meta above title... Watch for the title of the frame in the window manager.
You can now continue.
from DEV-> It is valid to apply a character encoding only from the tag where it is specified onward. Otherwise, the source code would have to be read twice.
Closed
The HTML specification does not say so. Where did you get the information that the chosen encoding should be used only from the line where it is specified onward and not before?
Reopening to solve this issue properly.
From my point of view, META tags should be sent by the server in the HTTP 1.1 header, so the user expects that all information from META tags is known to the browser before it processes the page at all. In the special case where the .html is not interpreted by an HTTP server but by some other application, the application should process it in the same way as an HTTP server.
I agree with Dan and Pavel. However this is a can of worms. For example Mozilla has (had?) various problems handling encoding in meta tags if they aren't sent by the HTTP server. The easiest solution would probably be to try to find the META tag only in a short beginning part of a file for example 4K and cache this part for the later rereading.
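The prescan idea above could be sketched roughly as follows. This is only an illustration in Python, not OOo's import filter code: the names `PRESCAN_BYTES`, `sniff_meta_charset` and `decode_html`, the ISO-8859-1 fallback and the loose regular expression are all assumptions made for the example. It looks for a charset declaration in the first 4K of the file and then decodes the whole buffer with it, so a TITLE placed before the META tag is decoded the same way as one placed after it.

```python
import re

# Hypothetical limit: the "4K" beginning part of the file suggested above.
PRESCAN_BYTES = 4096

# Deliberately loose pattern for charset=... inside a META tag (an assumption
# for this sketch; a real HTML parser would be stricter).
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(data: bytes, default: str = "iso-8859-1") -> str:
    """Return the charset named by a META tag in the first 4K, else default."""
    head = data[:PRESCAN_BYTES]  # only the cached beginning is scanned
    match = _META_CHARSET.search(head)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


def decode_html(data: bytes) -> str:
    """Decode the whole document, including anything before the META tag."""
    return data.decode(sniff_meta_charset(data))
```

Because the buffer is already in memory, the "second read" is just a second pass over the cached bytes rather than another trip to the file.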
OOo is *not* a browser. It processes files from the top to the bottom. If you don't set the relevant information at the bottom, it won't be parsed.
closed
Yes, OOo processes HTML files and as such should follow the HTML specification. It does not. Please do not close this without real reason to do so. If you do not want to implement "double scan", just say so and set target to OOo later.
See the results of the validator: http://validator.w3.org/check?uri=http%3A%2F%2Fwww.janik.cz%2Ftmp%2Fq.html I am not saying that OOo is a browser, but if it tries to read files in some format, it should follow the format specification.
I don't close issues without good reasons... Now if you want this as an enhancement (for I can't decide by myself what we want or not), no problem. Reassigned to BH
reassign
Still happens in SRC680_m226. See attached screenshot.
Created attachment 47784 [details] See the title bar and compare - only by moving meta below/above title