|Summary:||Scrape doesn't support pages encoded in other charsets (like ISO-8859-1)|
|Product:||Taglibs||Reporter:||Ricardo Caetano <ricardo.caetano>|
|Component:||Scrape Taglib||Assignee:||Tomcat Developers Mailing List <dev>|
|Attachments:||Zip file with a test case|
Description Ricardo Caetano 2003-11-11 16:55:50 UTC
If a remote page is encoded in a charset other than the default, some characters are changed. The following line in streamtochararray: InputStreamReader input = new InputStreamReader(in); could be changed to: InputStreamReader input = new InputStreamReader(in, <charset>); The <charset> parameter could be passed in the <scrape> tag.
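A minimal sketch of the suggested change, for illustration only. The class name CharsetAwareReader, the method signature, and the null fallback are assumptions, not the actual taglib source; only the two InputStreamReader constructors come from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the taglib helper; not the real class.
public class CharsetAwareReader {

    // Reads the whole stream into a char array, decoding with the given
    // charset name. A null charset falls back to the platform default,
    // which is the behaviour reported in this bug.
    public static char[] streamToCharArray(InputStream in, String charset) throws IOException {
        Reader input = (charset == null)
                ? new InputStreamReader(in)
                : new InputStreamReader(in, charset);
        CharArrayWriter out = new CharArrayWriter();
        char[] buffer = new char[4096];
        int n;
        while ((n = input.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toCharArray();
    }

    public static void main(String[] args) throws IOException {
        // "informações" encoded as ISO-8859-1 bytes, simulating a remote page.
        byte[] latin1 = "informa\u00e7\u00f5es".getBytes(StandardCharsets.ISO_8859_1);
        char[] chars = streamToCharArray(new ByteArrayInputStream(latin1), "ISO-8859-1");
        System.out.println(new String(chars));
    }
}
```

Passing an unsupported charset name makes the constructor throw UnsupportedEncodingException, so a tag attribute feeding this value would probably need to report that error to the page author.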
Comment 1 Felipe Leme 2004-02-29 15:05:24 UTC
Hi Ricardo, Could you please provide a test case for this bug (like 2 JSP pages, one with the different charset and another that "scrapes" it)? Regards, Felipe
Comment 2 Ricardo Caetano 2004-03-01 15:36:43 UTC
Just try a page with some characters like: "informações", which becomes "informaÃ§Ãµes"; "reunião", which becomes "reuniÃ£o"; "reuniões", which becomes "reuniÃµes"; "próximas", which becomes "prÃ³ximas". This bug is most visible when the code page of the "scraper" machine is different from the code page of the "scraped" machine.
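The garbled strings above appear consistent with UTF-8 bytes being decoded as ISO-8859-1. The sketch below reproduces that pattern in isolation; the class and method names are hypothetical, and it assumes this particular charset pair rather than whatever the reporter's machines actually used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the decoding mismatch.
public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which turns each two-byte UTF-8 sequence into two separate characters.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes with the charset they were written in.
    public static String decodeCorrectly(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "reuni\u00e3o"; // "reunião"
        System.out.println(misdecode(word));
        System.out.println(decodeCorrectly(word));
    }
}
```

What the console shows for the two lines depends on the terminal's own encoding, so comparing the returned strings in code is more reliable than reading the printed output.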
Comment 3 Felipe Leme 2004-03-24 04:16:37 UTC
Created attachment 10939: Zip file with a test case
Comment 4 Felipe Leme 2004-03-24 04:19:14 UTC
I committed your suggestion - it should be available in the next nightly build.
Comment 5 Felipe Leme 2004-03-25 00:25:39 UTC
Ricardo, Could you please try the new tag on the nightly build below: http://cvs.apache.org/builds/jakarta-taglibs/nightly/projects/scrape/jakarta-taglibs-scrape-20040324.zip Thanks, Felipe
Comment 6 Felipe Leme 2004-03-29 15:38:17 UTC
Marking as fixed...