|Summary:||Defect reading hyphenation patterns|
|Product:||Fop - Now in Jira||Reporter:||J.Pietschmann <j3322ptm>|
Description J.Pietschmann 2006-11-28 14:57:06 UTC
The PatternParser.characters method, which parses patterns and other content from hyphenation XML files, can't cope with the parser splitting text into multiple character events. As a result, a pattern that crosses a buffer boundary may be parsed as two wrong patterns. Furthermore, the implementation is quite inefficient:
- copying the characters into a StringBuffer is unnecessary
- the tokenizer moves the whole array
The readToken method is declared to return a string, but it always returns null and stores the token in a class variable (horrible design).
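A minimal sketch of one way to make token parsing robust against split character events. It assumes whitespace-separated tokens. The class TokenCollector and its methods are hypothetical names for illustration, not the actual PatternParser code. The idea is to keep an unfinished token pending when a characters call ends, and to complete it only at the next whitespace or at the end of the element.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects whitespace-separated tokens across SAX character events. */
class TokenCollector extends DefaultHandler {
    // Holds a token that may continue in the next characters() call.
    private final StringBuilder pending = new StringBuilder();
    private final List<String> tokens = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (Character.isWhitespace(c)) {
                flush(); // whitespace ends the current token
            } else {
                pending.append(c);
            }
        }
        // Deliberately no flush here: the SAX parser may deliver
        // the rest of the token in a later characters() call.
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // the end of the element always ends a token
    }

    private void flush() {
        if (pending.length() > 0) {
            tokens.add(pending.toString());
            pending.setLength(0);
        }
    }

    List<String> getTokens() {
        return tokens;
    }
}
```

Only the characters of the token in progress are copied, and the input array is read in place without being shifted.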
Comment 1 J.Pietschmann 2006-11-28 14:59:48 UTC
Umm, delete the comment about readToken always returning null. The design is still somewhat horrible.
Comment 2 J.Pietschmann 2006-11-28 15:33:28 UTC
Further points:
- Using a ternary tree for the charclass arrays seems wasteful. Two parallel arrays and an array binary search should be sufficient.
- The charclass parser won't check whether a . (dot) represents a class. The dot is reserved as the begin/end-of-word marker in patterns; using it as a class representation will probably cause problems.
- The pattern parser won't check whether the non-digits in the patterns are actually charclass representations.
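A rough sketch of the parallel-array idea from the first point, combined with the dot check from the second. It assumes each class is given as a string whose first character is the class representation. CharClassTable and lookup are hypothetical names, not existing FOP classes.

```java
import java.util.Arrays;

/** Hypothetical table mapping each member character to its class representation. */
final class CharClassTable {
    private final char[] keys; // member characters, sorted ascending
    private final char[] reps; // reps[i] is the representation for keys[i]

    /** Assumption: the first character of each class string is its representation. */
    CharClassTable(String... classes) {
        int n = 0;
        for (String cls : classes) {
            if (cls.isEmpty()) {
                throw new IllegalArgumentException("empty character class");
            }
            if (cls.indexOf('.') >= 0) {
                // The dot is reserved as the word boundary marker in patterns.
                throw new IllegalArgumentException("'.' not allowed in a class: " + cls);
            }
            n += cls.length();
        }
        // Pack (member, representation) into one long so that a single sort orders by member.
        long[] packed = new long[n];
        int k = 0;
        for (String cls : classes) {
            char rep = cls.charAt(0);
            for (int i = 0; i < cls.length(); i++) {
                packed[k++] = ((long) cls.charAt(i) << 16) | rep;
            }
        }
        Arrays.sort(packed);
        keys = new char[n];
        reps = new char[n];
        for (int i = 0; i < n; i++) {
            keys[i] = (char) (packed[i] >>> 16);
            reps[i] = (char) (packed[i] & 0xFFFF);
        }
    }

    /** Returns the class representation of c, or -1 if c belongs to no class. */
    int lookup(char c) {
        int i = Arrays.binarySearch(keys, c);
        return i >= 0 ? reps[i] : -1;
    }
}
```

A pattern parser could call lookup on each non-digit character of a pattern and reject the pattern when the result is -1, which would cover the third point as well.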
Comment 3 Glenn Adams 2012-04-07 01:44:35 UTC
resetting P2 open bugs to P3 pending further review