Bug 41062 - Defect reading hyphenation patterns
Status: NEW
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: general
Version: all
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: fop-dev
Depends on:
Reported: 2006-11-28 14:57 UTC by J.Pietschmann
Modified: 2012-04-07 01:52 UTC
Description J.Pietschmann 2006-11-28 14:57:06 UTC
The PatternParser.characters method, which parses patterns and other content
from hyphenation XML files, can't cope with the parser splitting text into
multiple character events. A pattern that crosses a buffer boundary may
therefore be parsed as two wrong patterns.
Furthermore, the implementation is quite inefficient:
- copying the characters into a StringBuffer is unnecessary
- the tokenizer moves the whole array
The readToken method is declared to return a String, but it always returns null
and stores the token in a class variable (horrible design).
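To illustrate the split-event problem, here is a minimal sketch of a SAX handler that keeps an unfinished token across characters() calls and only completes it on whitespace or at the end of the element. The class name PatternCollector and its methods are hypothetical and not taken from FOP; the sketch also assumes that patterns are whitespace-separated inside a <patterns> element.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration only; this is not FOP's PatternParser.
class PatternCollector extends DefaultHandler {
    private final List<String> patterns = new ArrayList<>();
    // Holds the token that is still incomplete when a characters() call ends.
    private final StringBuilder pending = new StringBuilder();
    private boolean inPatterns;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Assumption: pattern text lives in an element named "patterns".
        if ("patterns".equals(qName)) {
            inPatterns = true;
            pending.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!inPatterns) {
            return;
        }
        // Scan the parser's array in place; only the current token is buffered.
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (Character.isWhitespace(c)) {
                flush();
            } else {
                pending.append(c);
            }
        }
        // No flush here: the last token may continue in the next event.
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("patterns".equals(qName)) {
            flush(); // the element end is the only safe place to close the final token
            inPatterns = false;
        }
    }

    private void flush() {
        if (pending.length() > 0) {
            patterns.add(pending.toString());
            pending.setLength(0);
        }
    }

    List<String> getPatterns() {
        return patterns;
    }
}
```

With this shape, a pattern whose characters arrive in two separate events is still stored as one token, because a token is only closed by whitespace or by the end of the element.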
Comment 1 J.Pietschmann 2006-11-28 14:59:48 UTC
Umm, delete the comment about readToken always returning null. The design is
still somewhat horrible.
Comment 2 J.Pietschmann 2006-11-28 15:33:28 UTC
Further points
- Using a ternary tree for the charclass arrays seems to be wasteful. Two
  parallel arrays and a binary search over them should be sufficient.
- The charclass parser won't check whether a . (dot) represents a class. The
  dot is reserved as the begin/end-of-word marker in patterns; using it as a
  class representation will probably cause problems.
- The pattern parser won't check whether the non-digits in the patterns are
  actually charclass representations.
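The three points above could be sketched roughly as follows. CharClassTable and its methods are hypothetical names, not FOP code, and the sketch assumes that each class definition is a string whose first character is the class representative.

```java
import java.util.Arrays;

// Hypothetical sketch for illustration only; not taken from FOP.
final class CharClassTable {
    private final char[] keys;    // all member characters, sorted
    private final char[] classes; // classes[i] is the representative for keys[i]

    // Assumption: the first character of each definition represents the class.
    CharClassTable(String[] classDefs) {
        int n = 0;
        for (String def : classDefs) {
            n += def.length();
        }
        // Pack (member, representative) into one long so a single sort keeps pairs together.
        long[] packed = new long[n];
        int k = 0;
        for (String def : classDefs) {
            if (def.isEmpty()) {
                continue;
            }
            char rep = def.charAt(0);
            if (rep == '.') {
                throw new IllegalArgumentException(
                        "'.' is reserved as the word boundary marker in patterns");
            }
            for (int i = 0; i < def.length(); i++) {
                packed[k++] = ((long) def.charAt(i) << 16) | rep;
            }
        }
        Arrays.sort(packed);
        keys = new char[n];
        classes = new char[n];
        for (int i = 0; i < n; i++) {
            keys[i] = (char) (packed[i] >>> 16);
            classes[i] = (char) (packed[i] & 0xFFFF);
        }
    }

    /** Returns the representative of c's class, or 0 if c belongs to no class. */
    char classOf(char c) {
        int i = Arrays.binarySearch(keys, c);
        return i >= 0 ? classes[i] : 0;
    }

    /** True if every character is a digit, the '.' marker, or a class representative. */
    boolean isValidPattern(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '.' || (c >= '0' && c <= '9')) {
                continue;
            }
            int idx = Arrays.binarySearch(keys, c);
            if (idx < 0 || classes[idx] != c) {
                return false;
            }
        }
        return true;
    }
}
```

A table built from the definitions "aA" and "bB" would map 'A' to 'a', accept the pattern ".a1b", and reject "a1q" because 'q' belongs to no class. A character that appears in more than one class is not handled specially in this sketch.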
Comment 3 Glenn Adams 2012-04-07 01:44:35 UTC
resetting P2 open bugs to P3 pending further review