Symbols in Word documents are stored specially:
- the text of the CharacterRun contains the character 0x28
- CHP.fSpec is true
- CHP.ftcSym contains the index of the symbol font in the font table
- CHP.xchSym contains the character in the symbol font

There are several ways support could be implemented:
1. Add getters for ftcSym and xchSym to CharacterRun, so the user can process symbols directly.
2. Add a helper class that extracts the symbol and its font from a CharacterRun.

With the first variant the user could do this (wd being the document):

    String text = characterRun.text();
    String fontName = characterRun.getFontName();
    // getFtcSym() and getXchSym() are the proposed getters; they do not exist yet
    if (characterRun.isSpecialCharacter() && text.length() == 1 && text.charAt(0) == 0x28) {
        fontName = wd.getFontTable().getMainFont(characterRun.getFtcSym());
        text = new String(new char[] { (char) characterRun.getXchSym() });
    }
    // work on with fontName and text

In my testing, a CharacterRun with a symbol always contained exactly one character, but I cannot confirm that this is a rule.

Please let me know which variant should be incorporated and I will provide a patch.

Viliam
I forgot to state that (as far as I know) the current API does not allow symbols to be processed: there is no way to read ftcSym and xchSym from CharacterRun.
I think ideally we want to hide some of the complexity from the user. That's how we do it for Pictures: you can pass a CharacterRun to the PicturesTable and it'll tell you whether it has a picture in it, return the picture, etc.

So my view is that we should have a helper that returns a boolean (has symbol / no symbol) for a given character run, and also returns the symbol itself (looked up via the font table) if requested. I don't know enough about the uses for symbols to know whether this should go in the existing FontTable, or in a helper fetched from HWPFDocument that has suitable references.

If you could work on a patch, that'd be great! Marking as NEEDINFO to indicate a patch is needed; remove that once it is uploaded.
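To make the helper idea concrete, here is a minimal, self-contained sketch of what such a lookup could look like. It does not use the POI API: RunProps and SymbolHelper are hypothetical names invented for illustration, the run properties are modelled as plain fields, and the font table is reduced to a list of names. The detection rule is the one described in the first comment (fSpec set and the run text being the single character 0x28), which was observed in testing but not confirmed as a rule.

```java
import java.util.List;

/** Hypothetical stand-in for the relevant CHP fields of a character run (not POI's CharacterRun). */
final class RunProps {
    final String text;   // text of the run as stored in the document
    final boolean fSpec; // CHP.fSpec
    final int ftcSym;    // CHP.ftcSym: index of the symbol font in the font table
    final char xchSym;   // CHP.xchSym: character within the symbol font

    RunProps(String text, boolean fSpec, int ftcSym, char xchSym) {
        this.text = text;
        this.fSpec = fSpec;
        this.ftcSym = ftcSym;
        this.xchSym = xchSym;
    }
}

/** Hypothetical helper, in the spirit of PicturesTable, that hides the symbol encoding. */
final class SymbolHelper {
    /** Placeholder character found in the text of a symbol run. */
    static final char SYMBOL_PLACEHOLDER = 0x28;

    private final List<String> fontNames; // simplified font table: index -> font name

    SymbolHelper(List<String> fontNames) {
        this.fontNames = fontNames;
    }

    /** True if the run looks like a symbol: fSpec is set and the text is the lone placeholder. */
    boolean hasSymbol(RunProps run) {
        return run.fSpec
                && run.text.length() == 1
                && run.text.charAt(0) == SYMBOL_PLACEHOLDER;
    }

    /** Returns the character in the symbol font; fails if the run is not a symbol. */
    char symbolCharacter(RunProps run) {
        requireSymbol(run);
        return run.xchSym;
    }

    /** Returns the name of the symbol font, looked up by ftcSym; fails if the run is not a symbol. */
    String symbolFontName(RunProps run) {
        requireSymbol(run);
        return fontNames.get(run.ftcSym);
    }

    private void requireSymbol(RunProps run) {
        if (!hasSymbol(run)) {
            throw new IllegalStateException("Character run does not contain a symbol");
        }
    }
}
```

A caller would first ask hasSymbol(run) and only then request the character and font name, mirroring how PicturesTable is queried for pictures.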
Created attachment 26130
Patch to support symbols (with testcase)

I added the methods to CharacterRun, because processing a symbol is tied directly to the particular character run and does not need any other information (as it does with pictures). The methods are documented.
Created attachment 26131
New files not included in the diff
Applied in r1005443.

Thanks,
Yegor
*** Bug 33227 has been marked as a duplicate of this bug. ***