Bug 24611 - Scrape doesn't support pages encoded in other charsets (like ISO-8859-1)
Summary: Scrape doesn't support pages encoded in other charsets (like ISO-8859-1)
Alias: None
Product: Taglibs
Classification: Unclassified
Component: Scrape Taglib
Version: 1.1
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
Depends on:
Reported: 2003-11-11 16:55 UTC by Ricardo Caetano
Modified: 2004-11-16 19:05 UTC

Zip file with a test case (587 bytes, application/octet-stream)
2004-03-24 04:16 UTC, Felipe Leme

Description Ricardo Caetano 2003-11-11 16:55:50 UTC
If a remote page is encoded in a charset other than the default, some 
characters are changed.

The following line in streamtochararray:
InputStreamReader input = new InputStreamReader(in); 
Could be changed to:
InputStreamReader input = new InputStreamReader(in, <charset>); 

The parameter <charset> could be passed in the <scrape> tag.
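
A minimal sketch of the suggested change, for illustration only. The class name CharsetAwareReader and the method signature are assumptions, not the taglib's actual code; the only point taken from the report is that the charset is passed to InputStreamReader explicitly instead of relying on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the taglib helper; not the real Scrape source.
final class CharsetAwareReader {

    // Reads the whole stream, decoding bytes with the caller-supplied charset.
    static char[] streamToCharArray(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader input = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = input.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString().toCharArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a remote page served as ISO-8859-1 ("informações").
        byte[] page = "informa\u00e7\u00f5es".getBytes(StandardCharsets.ISO_8859_1);
        char[] chars = streamToCharArray(new ByteArrayInputStream(page),
                                         StandardCharsets.ISO_8859_1);
        System.out.println(new String(chars));
    }
}
```

In a tag handler, the charset would presumably come from an attribute on the <scrape> tag and be resolved with Charset.forName; that wiring is not shown here.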
Comment 1 Felipe Leme 2004-02-29 15:05:24 UTC
Hi Ricardo,

Could you please provide a test case for this bug (for example, two JSP pages:
one with the different charset and another that scrapes it)? 


Comment 2 Ricardo Caetano 2004-03-01 15:36:43 UTC
Just try a page with some characters like: 

"informações", which becomes: "informações"
"reunião", which becomes: "reunião"
"reuniões", which becomes: "reuniões"
"próximas", which becomes: "próximas".

This bug is most visible when the code page of the "scraper" machine is 
different from the code page of the "scraped" machine.
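
The code page mismatch can be reproduced without a server. The sketch below is an illustration under assumed charsets (ISO-8859-1 on the scraped side, UTF-8 on the scraping side); the class name MojibakeDemo is hypothetical. It encodes a string with one charset and decodes the same bytes with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the charsets chosen here are assumptions.
final class MojibakeDemo {

    // Decodes raw bytes using the given charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "informa\u00e7\u00f5es"; // "informações"

        // Bytes as the scraped machine would send them.
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset round-trips the text.
        String right = decodeAs(latin1, StandardCharsets.ISO_8859_1);
        // Decoding with a different charset corrupts the accented characters.
        String wrong = decodeAs(latin1, StandardCharsets.UTF_8);

        System.out.println(original.equals(right)); // true
        System.out.println(original.equals(wrong)); // false
    }
}
```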
Comment 3 Felipe Leme 2004-03-24 04:16:37 UTC
Created attachment 10939
Zip file with a test case
Comment 4 Felipe Leme 2004-03-24 04:19:14 UTC
I committed your suggestion - it should be available in the next nightly build.
Comment 5 Felipe Leme 2004-03-25 00:25:39 UTC

Could you please try the new tag on the nightly build below:



Comment 6 Felipe Leme 2004-03-29 15:38:17 UTC
Marking as fixed...