|Summary:||Need to rework any code that iterates over chars|
|Product:||POI||Reporter:||PJ Fanning <fanningpj>|
|Component:||POI Overall||Assignee:||POI Developers List <dev>|
|OS:||Mac OS X 10.1|
Description PJ Fanning 2017-11-20 22:48:24 UTC
If we need to iterate over chars, we should use code points (ints) instead of char primitives. A Unicode supplementary character is encoded as a surrogate pair, so it takes 2 Java chars to represent one Unicode code point. DrawTextParagraph.java has an example where we iterate over the chars of a String. See https://stackoverflow.com/questions/1527856/how-can-i-iterate-through-the-unicode-codepoints-of-a-java-string
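For illustration, a minimal sketch of iterating by code point rather than by char. The class and method names here are hypothetical and are not taken from DrawTextParagraph.java; the sketch only assumes the standard `String.codePointAt(int)` and `Character.charCount(int)` methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI: shows one way to walk a String by code point.
class CodePointIteration {

    /** Collects the code points of a string, treating each surrogate pair as one unit. */
    static List<Integer> codePoints(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // reads one or two chars starting at index i
            result.add(cp);
            i += Character.charCount(cp);   // advance by 2 for a surrogate pair, else by 1
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (written as the surrogate pair D83D DE00), 'b'
        String text = "a\uD83D\uDE00b";
        System.out.println("chars: " + text.length());
        System.out.println("code points: " + codePoints(text).size());
        for (int cp : codePoints(text)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

A char-based loop over the same string would see the two halves of the surrogate pair as separate values, which is the situation this report is about.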
Comment 1 Javen O'Neal 2017-11-20 23:28:05 UTC
Is there any way to add this to forbidden-apis-check to find the issues and make sure it stays fixed?
Comment 2 PJ Fanning 2017-11-20 23:32:11 UTC
We should forbid:
Character toLowerCase() and toUpperCase()
String toLowerCase() and toUpperCase()
We should only use String toLowerCase(Locale) and toUpperCase(Locale).
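A small sketch of why the Locale overloads matter for String case conversion. The class and method names are hypothetical, and the choice of `Locale.ROOT` for locale-neutral conversion is an assumption of this example, not something stated in this report.

```java
import java.util.Locale;

// Hypothetical example, not part of POI: compares case conversion under two locales.
class LocaleCaseExample {

    /** Lower-cases with locale-neutral rules (assumed choice: Locale.ROOT). */
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Lower-cases with Turkish rules, where 'I' maps to dotless 'ı' (U+0131). */
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        String key = "TITLE";
        System.out.println(lowerRoot(key));     // locale-neutral result
        System.out.println(lowerTurkish(key));  // differs: contains U+0131 instead of 'i'
    }
}
```

The no-argument `String.toLowerCase()` uses the JVM's default locale, so on a machine with a Turkish default it would behave like `lowerTurkish` above.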
Comment 3 Javen O'Neal 2017-11-21 05:14:26 UTC
Comment 4 Dominik Stadler 2017-12-26 10:35:08 UTC
This seems to be mostly fixed now. Is there still anything missing?
Comment 5 PJ Fanning 2017-12-26 11:13:18 UTC
Dominik, there are still a lot of places where the POI code iterates over chars. I suspect that it is best not to proceed with refactoring most of this code though. The risk of introducing new bugs needs to be weighed against the likelihood that the code in question needs to be able to process Unicode surrogates correctly.