Bug 41062

Summary: Defect reading hyphenation patterns
Product: Fop - Now in Jira
Reporter: J.Pietschmann <j3322ptm>
Component: general
Assignee: fop-dev
Status: NEW ---    
Severity: normal    
Priority: P3    
Version: all   
Target Milestone: ---   
Hardware: Other   
OS: other   

Description J.Pietschmann 2006-11-28 14:57:06 UTC
The PatternParser.characters method, which parses patterns and other content
from hyphenation XML files, can't cope with the parser splitting text into
multiple character events. A pattern that crosses a buffer boundary may
therefore be parsed as two wrong patterns.
Furthermore, the implementation is quite inefficient:
- copying the characters into a StringBuffer is unnecessary
- the tokenizer moves the whole array
The readToken method is declared to return a string, but it always returns null
and stores the token in a class variable (horrible design).
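To illustrate the first point, here is a minimal sketch of a SAX handler that keeps an unfinished token across characters() calls, so a pattern split over two character events is still read as one token. The class name BufferedTokenHandler and its methods are hypothetical and are not taken from FOP's PatternParser. It assumes tokens are separated by whitespace and that the end of the enclosing element also ends a token.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not FOP's PatternParser): collects whitespace-separated
// tokens even when the XML parser delivers the text in several chunks.
class BufferedTokenHandler extends DefaultHandler {
    // Holds only the token that is still incomplete, not the whole text.
    private final StringBuilder pending = new StringBuilder();
    private final List<String> tokens = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Read the parser's buffer in place; a token ends only at whitespace,
        // never merely because this chunk ends.
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (Character.isWhitespace(c)) {
                flush();
            } else {
                pending.append(c);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element boundary terminates whatever token is still pending.
        flush();
    }

    private void flush() {
        if (pending.length() > 0) {
            tokens.add(pending.toString());
            pending.setLength(0);
        }
    }

    List<String> getTokens() {
        return tokens;
    }
}
```

With this approach, feeding ".ab1 c2" and "de. f" as two separate character events yields the tokens ".ab1", "c2de." and "f", rather than splitting "c2de." at the chunk boundary.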
Comment 1 J.Pietschmann 2006-11-28 14:59:48 UTC
Umm, delete the comment about readToken always returning null. The design is
still somewhat horrible.
Comment 2 J.Pietschmann 2006-11-28 15:33:28 UTC
Further points:
- Using a ternary tree for the charclass arrays seems wasteful. Two
  parallel arrays and an array binary search should be sufficient.
- The charclass parser won't check whether a . (dot) represents a class. The
  dot is reserved as the begin/end-of-word marker in patterns; using it as a
  class representation will probably cause problems.
- The pattern parser won't check whether the non-digits in the patterns are
  actually charclass representations.
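The three points above could be sketched roughly as follows. CharClassTable and its methods are hypothetical names, not existing FOP code. The sketch assumes each character class is given as a string whose first character is the class representative, and that a pattern consists only of digits, dots and class representatives.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not FOP code): character classes stored in two
// parallel arrays and looked up with Arrays.binarySearch.
class CharClassTable {
    private final char[] keys; // all member characters, sorted ascending
    private final char[] reps; // reps[i] is the class representative of keys[i]

    // Each argument is one class, e.g. "aA"; its first char is the representative.
    CharClassTable(String... classes) {
        // The TreeMap is used only while building, to get the keys sorted.
        TreeMap<Character, Character> sorted = new TreeMap<>();
        for (String cls : classes) {
            if (cls.isEmpty()) {
                throw new IllegalArgumentException("empty character class");
            }
            char rep = cls.charAt(0);
            if (rep == '.') {
                // The dot is the begin/end-of-word marker in patterns.
                throw new IllegalArgumentException("'.' cannot represent a class");
            }
            for (int i = 0; i < cls.length(); i++) {
                sorted.put(cls.charAt(i), rep);
            }
        }
        keys = new char[sorted.size()];
        reps = new char[sorted.size()];
        int n = 0;
        for (Map.Entry<Character, Character> e : sorted.entrySet()) {
            keys[n] = e.getKey();
            reps[n] = e.getValue();
            n++;
        }
    }

    // Returns the class representative of c, or 0 if c belongs to no class.
    char classOf(char c) {
        int i = Arrays.binarySearch(keys, c);
        return i >= 0 ? reps[i] : 0;
    }

    // True if every character that is neither a digit nor a dot
    // is the representative of some class.
    boolean isValidPattern(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '.' || (c >= '0' && c <= '9')) {
                continue;
            }
            if (classOf(c) != c) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a table built from the classes "aA", "bB" and "cC" would accept the pattern ".a1b" and reject "a1x", since x belongs to no class.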
Comment 3 Glenn Adams 2012-04-07 01:44:35 UTC
resetting P2 open bugs to P3 pending further review