Dear POI users,

I have a .doc document that contains two uncommon symbols, a Greek mu and a registered sign, and I tried to use the characterRun.getSymbolCharacter() method to identify them. However, I noticed that characterRun.getSymbolCharacter() always seems to return the same character, so I could not find a way to tell the two symbols apart.

I looked at CharacterRun.java and tried printing _props.getXchSym(), and found that it does in fact print two different values for the Greek mu and the registered sign, i.e. '-3987' and '3870'. I don't know whether it is correct to use _props.getXchSym() directly instead of characterRun.getSymbolCharacter(), which returns (char)_props.getXchSym().

To make it work, I added a method next to getSymbolCharacter() in the CharacterRun class that returns _props.getXchSym() unchanged. Could you please take a look and see whether this could be added to the CharacterRun class? I enclose a Word document that contains the two symbols, and below is the new method I added to CharacterRun for your reference.

    public int getSymbolCharacterAsitis() {
        if (isSymbol()) {
            return _props.getXchSym();
        } else {
            throw new IllegalStateException("Not a symbol CharacterRun");
        }
    }

Many thanks in advance,
Teresa
Created attachment 35396: greek_mu_and_registered_text
Thank you for opening this and sharing a triggering document. I'm trying to figure out how we're supposed to map from '-3987' to mu. Anyone have an idea?
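One possible reading of '-3987', offered as a sketch rather than a confirmed description of what HWPF does: the value is a signed 16-bit number, so reinterpreting it as an unsigned UTF-16 code unit gives U+F06D. Assuming the run uses a symbol font whose glyphs are stored in the private-use range U+F000..U+F0FF (an assumption, not something stated in this report), subtracting 0xF000 leaves font index 0x6D, which is the slot for mu in the Symbol font encoding. The class and method names below are hypothetical and not part of POI.

```java
public class SymbolCodeDemo {

    // Reinterpret a signed 16-bit value (as printed by getXchSym())
    // as an unsigned UTF-16 code unit.
    static int toCodeUnit(int xchSym) {
        return xchSym & 0xFFFF;
    }

    // Assumption: symbol-font characters are stored in the private-use
    // range U+F000..U+F0FF, so the low byte is the index into the font.
    static int toFontIndex(int codeUnit) {
        if (codeUnit >= 0xF000 && codeUnit <= 0xF0FF) {
            return codeUnit - 0xF000;
        }
        return codeUnit; // outside the assumed range: leave unchanged
    }

    public static void main(String[] args) {
        int raw = -3987; // value reported for the Greek mu
        int unit = toCodeUnit(raw);
        int index = toFontIndex(unit);
        // prints: raw=-3987 unit=U+F06D fontIndex=0x6D
        System.out.printf("raw=%d unit=U+%04X fontIndex=0x%02X%n", raw, unit, index);
    }
}
```

Which glyph the index finally selects depends on the font attached to the run, so the font name would have to be read alongside the character.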
There are some known issues with some Unicode characters that may be solved by using a newer version of XMLBeans; see bug 59268 for details. Can you please quickly try the test version of XMLBeans from http://mvnrepository.com/artifact/com.github.pjfanning/xmlbeans so that we know whether your problem is fixed by it as well or whether it is a different issue?
Thank you, Dominik. This is in HWPF, not XWPF...no beans.
Ah, sorry, my bad. I took a closer look now: it works for me if I use getSymbolCharacter(); see the new unit test that I added via r1811355. Please let us know if this still does not work for you for some reason.
(In reply to Dominik Stadler from comment #5)
> Ah, sorry, my bad. I did take a closer look now: It works for me if I use
> getSymbolCharacter(), see the new unit-test that I added via r1811355.
>
> Please let us know if this still does not work for you for some reason.

Thanks, Dominik. You're right, getSymbolCharacter() does work. My confusion came from using System.out.println(): it printed the same broken character for both of the symbols returned by getSymbolCharacter(). Sorry for the confusion.
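To avoid this kind of confusion, it can help to print the numeric code of a character instead of the character itself, since a console that cannot render a character may show the same placeholder for every such value. The following is a minimal sketch; the class and helper names are hypothetical, and the two sample values are simply the ones quoted in the original report, cast to char the same way getSymbolCharacter() is described as doing.

```java
public class SymbolPrintDemo {

    // Format a char as its UTF-16 code unit, e.g. "U+F06D",
    // so two unprintable characters can still be told apart.
    static String describe(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        // Values quoted in the report, narrowed to char.
        char first = (char) -3987;
        char second = (char) 3870;

        // Printing the chars directly may show the same placeholder
        // glyph on some consoles; the hex form is unambiguous.
        System.out.println(describe(first));  // prints U+F06D
        System.out.println(describe(second)); // prints U+0F1E
    }
}
```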