If a remote page is encoded in a charset other than the platform default, some characters are corrupted. The following line in streamtochararray: InputStreamReader input = new InputStreamReader(in); could be changed to: InputStreamReader input = new InputStreamReader(in, <charset>); The parameter <charset> could be passed in the <scrape> tag.
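A minimal sketch of the proposed change, for illustration only. The class name CharsetAwareReader, the method signature, and the charsetName parameter are assumptions and not the taglib's actual code; only the two InputStreamReader constructors come from the report above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharsetAwareReader {

    // Hypothetical stand-in for streamtochararray: reads the whole stream
    // into a char[] using the given charset. A null charsetName falls back
    // to the platform default, which is the behaviour reported as buggy.
    public static char[] streamToCharArray(InputStream in, String charsetName)
            throws IOException {
        try (Reader input = (charsetName == null)
                ? new InputStreamReader(in)
                : new InputStreamReader(in, charsetName)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = input.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString().toCharArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // "informações", written with Unicode escapes so that the source
        // file's own encoding does not affect the demonstration.
        byte[] page = "informa\u00e7\u00f5es".getBytes(StandardCharsets.UTF_8);
        char[] chars = streamToCharArray(new ByteArrayInputStream(page), "UTF-8");
        System.out.println(new String(chars));
    }
}
```

Passing the charset name explicitly means the result no longer depends on the code page of the machine doing the scraping.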
Hi Ricardo, Could you please provide a test case for this bug (for example, 2 JSP pages: one with the different charset and another that "scrapes" it)? Regards, Felipe
Just try a page with some characters like: "informações" which becomes: "informaÃ§Ãµes"; "reunião" which becomes: "reuniÃ£o"; "reuniões" which becomes: "reuniÃµes"; "próximas" which becomes: "prÃ³ximas". This bug is most visible when the code page of the "scraper" machine is different from the code page of the "scraped" machine.
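The corruption above can be reproduced in isolation. This is a sketch under one assumption: the scraped page is served as UTF-8 and the scraping machine decodes it as ISO-8859-1, which is consistent with the garbled strings listed but is not confirmed in the report. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text as UTF-8 (assumed charset of the scraped page) and
    // decodes the bytes as ISO-8859-1 (assumed default of the scraping
    // machine), so each two-byte UTF-8 sequence turns into two characters.
    public static String decodeWrongly(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file
        // encoding: \u00e7 is c-cedilla, \u00f5 is o-tilde.
        String original = "informa\u00e7\u00f5es";
        String garbled = decodeWrongly(original);
        System.out.println(original.length() + " chars in, "
                + garbled.length() + " chars out");
    }
}
```

Each accented letter occupies two bytes in UTF-8, so the wrongly decoded string is two characters longer than the original.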
Created attachment 10939: Zip file with a test case
I committed your suggestion - it should be available in the next nightly build.
Ricardo, Could you please try the new tag on the nightly build below: http://cvs.apache.org/builds/jakarta-taglibs/nightly/projects/scrape/jakarta-taglibs-scrape-20040324.zip Thanks, Felipe
Marking as fixed...