Bug 49908

Summary: Add API for processing of symbols
Product: POI
Reporter: Viliam Anirud <a6537691>
Component: HWPF
Assignee: POI Developers List <dev>
Status: RESOLVED FIXED
Severity: normal
CC: anandv
Priority: P2    
Version: 3.7-dev   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments:
- Patch to support symbols (with testcase)
- New files not included in the diff

Description Viliam Anirud 2010-09-10 06:01:38 UTC
Symbols in Word documents are stored specially:
- in the CharacterRun there is a character 0x28
- CHP.fSpec is true
- CHP.ftcSym contains index of the symbol font in the font table
- CHP.xchSym contains character in the symbol font
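
The storage rules above can be sketched in isolation as follows. This is only an illustration: SymbolChp and SymbolDecoder are hypothetical stand-ins and not POI classes, the font table is modelled as a plain list of names, and the sample font names and character code are made up. It also assumes that a symbol run holds exactly one placeholder character.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the character properties (CHP) described above.
// This is not a POI class; the field names simply mirror the CHP fields.
final class SymbolChp {
    final boolean fSpec; // CHP.fSpec: the run holds a special character
    final int ftcSym;    // CHP.ftcSym: index of the symbol font in the font table
    final int xchSym;    // CHP.xchSym: character code within the symbol font

    SymbolChp(boolean fSpec, int ftcSym, int xchSym) {
        this.fSpec = fSpec;
        this.ftcSym = ftcSym;
        this.xchSym = xchSym;
    }
}

final class SymbolDecoder {
    // Placeholder character stored in the run text for a symbol.
    static final char PLACEHOLDER = 0x28;

    // Assumption: a symbol run contains exactly the one placeholder character.
    static boolean isSymbol(String runText, SymbolChp chp) {
        return chp.fSpec && runText.length() == 1 && runText.charAt(0) == PLACEHOLDER;
    }

    // Returns {fontName, text} for the run, substituting the symbol font and
    // the symbol character when the run is a symbol.
    static String[] resolve(String runText, String runFont, SymbolChp chp, List<String> fontTable) {
        if (!isSymbol(runText, chp)) {
            return new String[] { runFont, runText };
        }
        return new String[] { fontTable.get(chp.ftcSym), String.valueOf((char) chp.xchSym) };
    }
}

class SymbolDecoderDemo {
    public static void main(String[] args) {
        // Made-up sample data: a two-entry font table and one symbol run.
        List<String> fonts = Arrays.asList("Times New Roman", "Wingdings");
        SymbolChp chp = new SymbolChp(true, 1, 0xF0E0);
        String[] resolved = SymbolDecoder.resolve("(", "Times New Roman", chp, fonts);
        System.out.println(resolved[0] + " U+" + Integer.toHexString(resolved[1].charAt(0)));
    }
}
```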

There are several ways support could be implemented:

1. Add getters for ftcSym and xchSym to CharacterRun, so that the user can process symbols directly.

2. Add a helper class that extracts the symbol and its font from a CharacterRun.

With the first variant the user could do this:

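  String text = characterRun.text();
  String fontName = characterRun.getFontName();
  // A symbol run is flagged as special and holds the single placeholder character 0x28
  if (characterRun.isSpecialCharacter() && text.length() == 1 && text.charAt(0) == 0x28) {
    // wd is assumed to be the HWPFDocument; getFtcSym() and getXchSym() are the proposed getters
    fontName = wd.getFontTable().getMainFont(characterRun.getFtcSym());
    text = String.valueOf((char) characterRun.getXchSym());
  }
  // continue working with fontName and text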

In my testing, a CharacterRun with a symbol always contained exactly one character, but I cannot confirm that this is a rule.

Please let me know which variant you would prefer to incorporate, and I will provide a patch.

Viliam
Comment 1 Viliam Anirud 2010-09-10 06:04:15 UTC
I forgot to state that (as far as I know) the current API does not allow for processing of symbols: there is no way to read ftcSym and xchSym from a CharacterRun.
Comment 2 Nick Burch 2010-09-20 07:25:53 UTC
I think ideally we want to hide some of the complexity from the user. That's how we do it for Pictures: you can pass a CharacterRun to the PicturesTable and it'll tell you if it has a picture in it, return the picture, etc.

So my view is that we should have a helper that returns a boolean (has symbol / no symbol) for a given character run, and that also returns the symbol itself (looked up via the font table) if requested.
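
A rough sketch of what such a helper could look like, purely for illustration. RunView, SymbolInfo, SymbolHelper and FixedRun are hypothetical names, not existing POI classes, and the font table is reduced to a list of font names. The detection rule (fSpec set and a single 0x28 character) is taken from the description above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical, minimal view of a character run; not the POI CharacterRun API.
interface RunView {
    String text();
    boolean isSpecial(); // corresponds to CHP.fSpec
    int ftcSym();        // corresponds to CHP.ftcSym
    int xchSym();        // corresponds to CHP.xchSym
}

// The symbol font name plus the character within that font.
final class SymbolInfo {
    final String fontName;
    final char character;

    SymbolInfo(String fontName, char character) {
        this.fontName = fontName;
        this.character = character;
    }
}

// Hypothetical helper that hides the lookup, in the spirit of PicturesTable.
final class SymbolHelper {
    private final List<String> fontNames;

    SymbolHelper(List<String> fontNames) {
        this.fontNames = fontNames;
    }

    // True when the run is flagged special and holds only the 0x28 placeholder.
    boolean hasSymbol(RunView run) {
        String text = run.text();
        return run.isSpecial() && text.length() == 1 && text.charAt(0) == 0x28;
    }

    // Looks the font up in the font table; empty if the run has no symbol
    // or the font index is out of range.
    Optional<SymbolInfo> getSymbol(RunView run) {
        if (!hasSymbol(run)) {
            return Optional.empty();
        }
        int ftc = run.ftcSym();
        if (ftc < 0 || ftc >= fontNames.size()) {
            return Optional.empty();
        }
        return Optional.of(new SymbolInfo(fontNames.get(ftc), (char) run.xchSym()));
    }
}

// Fixed-value RunView used only to exercise the helper.
final class FixedRun implements RunView {
    private final String text;
    private final boolean special;
    private final int ftcSym;
    private final int xchSym;

    FixedRun(String text, boolean special, int ftcSym, int xchSym) {
        this.text = text;
        this.special = special;
        this.ftcSym = ftcSym;
        this.xchSym = xchSym;
    }

    public String text() { return text; }
    public boolean isSpecial() { return special; }
    public int ftcSym() { return ftcSym; }
    public int xchSym() { return xchSym; }
}

class SymbolHelperDemo {
    public static void main(String[] args) {
        // Made-up font table and run values.
        SymbolHelper helper = new SymbolHelper(Arrays.asList("Arial", "Symbol"));
        RunView run = new FixedRun("(", true, 1, 0x61);
        helper.getSymbol(run).ifPresent(s -> System.out.println(s.fontName + ": " + s.character));
    }
}
```

Returning an Optional keeps the "no symbol" case explicit, so callers do not have to call hasSymbol first.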

I don't know enough about the uses for symbols to know if this should go in the existing FontTable, or in a helper fetched from HWPFDocument that has suitable references.

If you could work on a patch, that'd be great! Marking as NEEDINFO to indicate that a patch is needed; remove that once it is uploaded.
Comment 3 Viliam Anirud 2010-10-07 02:27:24 UTC
Created attachment 26130 [details]
Patch to support symbols (with testcase)

I added the methods to CharacterRun, as processing of symbols is directly associated with a particular character run and does not need other information (unlike pictures). The methods are documented.
Comment 4 Viliam Anirud 2010-10-07 02:27:45 UTC
Created attachment 26131 [details]
New files not included in the diff
Comment 5 Yegor Kozlov 2010-10-07 09:42:38 UTC
Applied in r1005443

Thanks,
Yegor
Comment 6 David Fisher 2010-10-29 18:25:49 UTC
*** Bug 33227 has been marked as a duplicate of this bug. ***