61792 – Need to rework any code that iterates over chars

Status:	RESOLVED CLOSED

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	POI Overall
Version:	3.17-FINAL
Hardware:	PC Mac OS X 10.1

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2017-11-20 22:48 UTC by PJ Fanning
Modified:	2021-10-18 19:57 UTC
CC List:	0 users

Description PJ Fanning 2017-11-20 22:48:24 UTC

If we need to iterate over the characters of a String, we should use code points (ints) instead of char primitives. Code points above U+FFFF are stored as a surrogate pair, so they need 2 Java chars to represent one Unicode code point.
DrawTextParagraph.java has an example where we iterate over the chars of a String.
See https://stackoverflow.com/questions/1527856/how-can-i-iterate-through-the-unicode-codepoints-of-a-java-string
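For illustration, here is a minimal sketch of the code point iteration described above. It is not taken from DrawTextParagraph.java; the class name CodePointIteration and the helper codePointsOf are hypothetical names chosen for this example. It relies only on String.codePointAt, Character.charCount and String.codePoints (the latter needs Java 8 or later).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of POI.
class CodePointIteration {

    // Walks the string one code point at a time instead of one char at a time.
    static List<Integer> codePointsOf(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            result.add(codePoint);
            // Advance by 2 chars for a surrogate pair, otherwise by 1.
            i += Character.charCount(codePoint);
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as the surrogate pair D83D DE00), 'b'
        String text = "a\uD83D\uDE00b";
        System.out.println(text.length());              // 4 chars
        System.out.println(codePointsOf(text).size());  // 3 code points

        // Stream-based alternative available since Java 8.
        text.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

A char-based loop over the same string would visit the two halves of the surrogate pair separately, which is the failure mode this bug is about.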

Comment 1 Javen O'Neal 2017-11-20 23:28:05 UTC

Is there any way to add this to forbidden-apis-check to find the issues and make sure it stays fixed?

Comment 2 PJ Fanning 2017-11-20 23:32:11 UTC

We should forbid:
Character toLowerCase() and toUpperCase() 
String toLowerCase() and toUpperCase() 

We should only use String toLowerCase(Locale) and toUpperCase(Locale)
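As a rough illustration of why the Locale overloads matter, the sketch below compares a fixed locale with Turkish casing rules. The class name LocaleSafeCase and its helpers are hypothetical, and the choice of Locale.ROOT is an assumption made for this example; the comment above does not say which Locale POI should pass.

```java
import java.util.Locale;

// Hypothetical example class, not part of POI.
class LocaleSafeCase {

    // Assumption for this sketch: Locale.ROOT gives locale-independent results.
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Turkish rules map 'i' to dotted capital I (U+0130).
        System.out.println("title".toUpperCase(turkish));
        // An explicit, fixed locale gives the same answer on every machine.
        System.out.println(upperRoot("title"));
    }
}
```

The no-argument String.toLowerCase() and String.toUpperCase() use the JVM default locale, so the same code can produce different strings depending on where it runs.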

Comment 3 Javen O'Neal 2017-11-21 05:14:26 UTC

https://svn.apache.org/viewvc/poi/trunk/src/resources/devtools/forbidden-signatures.txt?view=log

Comment 4 Dominik Stadler 2017-12-26 10:35:08 UTC

This seems to be mostly fixed now. Is there still anything missing?

Comment 5 PJ Fanning 2017-12-26 11:13:18 UTC

Dominik, there are still a lot of places where the POI code iterates over chars. I suspect that it is best not to proceed with refactoring most of this code, though. The risk of introducing new bugs needs to be weighed against the likelihood that the code in question needs to process Unicode surrogates correctly.