Bug 61586

Summary: (HWPF) characterRun.getSymbolChar() returns the same char for different symbols
Product: POI
Reporter: teresa.kim
Component: HWPF
Assignee: POI Developers List <dev>
Status: RESOLVED WORKSFORME    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: All   
Attachments: greek_mu_and_registered_text

Description teresa.kim 2017-10-05 12:13:05 UTC
Dear POI users

I have a .doc document that contains two uncommon symbols, a Greek mu and a registered symbol, and I tried to use the characterRun.getSymbolChar() method to tell them apart.
However, characterRun.getSymbolChar() always returns the same character, so I could not find a way to distinguish the two symbols.

I looked at CharacterRun.java and tried printing
_props.getXchSym(), and found that this in fact prints two different values for the Greek mu and the registered symbol, i.e. '-3987' and '3870'.

I am not sure whether it is right to use
_props.getXchSym() directly instead of the
characterRun.getSymbolChar() method, which returns (char)_props.getXchSym().
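
As a side note on the cast mentioned above, here is a minimal standalone sketch (not POI code; the class and method names are hypothetical) of what a narrowing cast to char does to a value such as -3987. It suggests that the cast keeps the low 16 bits, so the signed value can be recovered by casting back to short and distinct stored values should stay distinct.

```java
// Standalone illustration only; SymbolCastDemo is a hypothetical name, not part of POI.
class SymbolCastDemo {

    // Mirrors the cast described in the report: (char) applied to the stored value.
    static char toChar(int xchSym) {
        return (char) xchSym;
    }

    // Casting the char back to short restores the signed 16-bit value.
    static short toSigned(char c) {
        return (short) c;
    }

    public static void main(String[] args) {
        int raw = -3987;                 // value quoted in this report
        char asChar = toChar(raw);       // low 16 bits: 0xF06D
        short back = toSigned(asChar);   // -3987 again

        System.out.printf("raw=%d char=U+%04X back=%d%n", raw, (int) asChar, back);
        // prints: raw=-3987 char=U+F06D back=-3987
    }
}
```

If that reading is right, a char returned this way carries the same information as the raw value for anything that fits in 16 bits.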

To make it work, I added a method next to
characterRun.getSymbolChar() in the CharacterRun class that returns _props.getXchSym().
Could you please take a look and see whether this could be added to the CharacterRun class?

I attach a Word document that contains those two symbols, and below is a snippet of the new method I added to the CharacterRun class, for your reference.



  // Returns the stored symbol value as an int, without the cast to char.
  public int getSymbolCharacterAsitis()
  {
    if (isSymbol()) {
      return _props.getXchSym();
    } else {
      throw new IllegalStateException("Not a symbol CharacterRun");
    }
  }


Many thanks in advance
Teresa
Comment 1 teresa.kim 2017-10-05 12:13:41 UTC
Created attachment 35396 [details]
greek_mu_and_registered_text
Comment 2 Tim Allison 2017-10-05 17:02:05 UTC
Thank you for opening this and sharing a triggering document.  I'm trying to figure out how we're supposed to map from '-3987' to mu.  Anyone have an idea?
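
One possible way to approach that mapping, offered as an untested sketch and not as how HWPF does it: it assumes the run uses the Symbol font and that Word stores symbol-font characters in the private use range U+F000 to U+F0FF, with the low byte being the font's own character code. Under those assumptions -3987 is 0xF06D, and 0x6D is mu in the Adobe Symbol encoding. The class, the method names and the (deliberately partial) table are hypothetical.

```java
// Hypothetical helper, not part of POI. Assumes the run's font is Symbol and that
// the stored value is 0xF000 + the font's own character code.
class SymbolFontLookup {

    // Low byte of the stored value, e.g. -3987 (0xF06D) gives 0x6D.
    static int fontCode(int xchSym) {
        return xchSym & 0xFF;
    }

    // Partial, illustrative table for the Adobe Symbol encoding only.
    // Returns null for codes this sketch does not cover.
    static String symbolFontToUnicode(int fontCode) {
        switch (fontCode) {
            case 0x6D: return "\u03BC"; // Greek small letter mu
            case 0xD2: return "\u00AE"; // registered sign (serif variant)
            case 0xE2: return "\u00AE"; // registered sign (sans serif variant)
            default:   return null;
        }
    }

    public static void main(String[] args) {
        int code = fontCode(-3987);
        System.out.printf("font code 0x%02X%n", code); // prints: font code 0x6D
        System.out.println("mu".equals("mu") && "\u03BC".equals(symbolFontToUnicode(code)));
    }
}
```

Other symbol fonts (Wingdings, for instance) would need their own tables, so the font of the run would have to be checked first.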
Comment 3 Dominik Stadler 2017-10-06 11:50:37 UTC
There are some known issues with some Unicode characters; they may be solved by using a newer version of XMLBeans, see bug 59268 for details.

Can you please quickly try the test version of XMLBeans from http://mvnrepository.com/artifact/com.github.pjfanning/xmlbeans so we know whether your problem is fixed by it as well or whether it is a different issue?
Comment 4 Tim Allison 2017-10-06 11:58:05 UTC
Thank you, Dominik.  This is in HWPF, not XWPF...no beans.
Comment 5 Dominik Stadler 2017-10-06 17:41:56 UTC
Ah, sorry, my bad. I did take a closer look now: It works for me if I use getSymbolCharacter(), see the new unit-test that I added via r1811355. 

Please let us know if this still does not work for you for some reason.
Comment 6 teresa.kim 2017-10-09 07:37:29 UTC
(In reply to Dominik Stadler from comment #5)
> Ah, sorry, my bad. I did take a closer look now: It works for me if I use
> getSymbolCharacter(), see the new unit-test that I added via r1811355. 
> 
> Please let us know if this still does not work for you for some reason.

Thanks Dominik

You're right, getSymbolCharacter() does indeed work. My confusion came from printing its result with System.out.println(), which shows the same broken character for both symbols. Sorry for the confusion.
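
For anyone who hits the same confusion, a small sketch of printing the code point instead of the glyph may help. It is plain Java with no POI dependency; the class and method names are hypothetical, and the two input values are simply the ones quoted in the description. Characters in the private use area usually have no glyph in a console font, which is probably why they look identical when printed directly.

```java
// Hypothetical debugging helper, not part of POI.
class CodePointPrinter {

    // Formats a char as U+XXXX and flags private use characters,
    // which most console fonts cannot render.
    static String describe(char c) {
        String kind = Character.getType(c) == Character.PRIVATE_USE ? " (private use)" : "";
        return String.format("U+%04X%s", (int) c, kind);
    }

    public static void main(String[] args) {
        // In real code these chars would come from getSymbolCharacter().
        char first = (char) -3987;
        char second = (char) 3870;

        System.out.println(describe(first));  // prints: U+F06D (private use)
        System.out.println(describe(second));
    }
}
```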