Discretionary hyphenation
-------------------------

Some languages use discretionary hyphenation, that is, `discretionary'
character changes at hyphenation points. For example,
Catalan: paral·lel -> paral-lel,
Dutch: omaatje -> oma-tje,
German (before the new orthography): Schiffahrt -> Schiff-fahrt,
Hungarian: asszonnyal -> asz-szony-nyal,
Swedish: tillata -> till-lata.

Using this extended library, you can define discretionary hyphenation
patterns. For example:

l·1l/l=l
a1atje./a=t,1,3
.schif1fahrt/ff=f,5,2
.as3szon/sz=sz,2,3
n1nyal./ny=ny,1,3
.til1lata./ll=l,3,2

Syntax of the discretionary hyphenation patterns
------------------------------------------------

pat1tern/change[,start,cut]

If this pattern matches the word, and this pattern wins (see README.hyphen)
in the change region of the pattern, then the substring
pattern[start, start + cut - 1] will be replaced with the "change".

For example, a German ff -> ff-f hyphenation:

f1f/ff=f

or with expansion

f1f/ff=f,1,2

will change every "ff" to "ff=f" at hyphenation.

A more realistic example:

% simple ff -> f-f hyphenation
f1f
% Schiffahrt -> Schiff-fahrt hyphenation
%
schif3fahrt/ff=f,5,2

Specification

- Pattern: matching patterns of Liang's original algorithm
  - a pattern must contain only one hyphenation point in the change region,
    marked with a one-digit odd number (1, 3, 5, 7 or 9).
  - only the greater value guarantees the win (don't mix discretionary and
    non-discretionary patterns with the same value; for example,
    instead of f3f and schif3fahrt/ff=f,5,2 use f3f and schif5fahrt/ff=f,5,2)

- Change: new characters.
  An arbitrary character sequence. The equals sign (=) marks hyphenation
  points for OpenOffice.org (as in the example). (In a possible German LaTeX
  preprocessor, ff could be replaced with "ff, and in a Hungarian one, ssz
  with `ssz, according to the German and Hungarian Babel settings.)

- Start: starting position of the change region.
  - begins with 1 (not 0): schif3fahrt/ff=f,5,2
  - start dot doesn't matter: .schif3fahrt/ff=f,5,2
  - numbers don't matter: .s2c2h2i2f3f2ahrt/ff=f,5,2
  - In UTF-8 encoding, use Unicode character positions: össze/sz=sz,2,3
    ("össze" looks like "Ã¶ssze" in an ISO 8859-1 8-bit editor).

- Cut: length of the removed character sequence in the original word.
  - In UTF-8 encoding, use Unicode character length: paral·1lel/l=l,5,3
    ("paral·1lel" looks like "paralÂ·1lel" in an ISO 8859-1 8-bit editor).

Dictionary development
----------------------

There is no extended PatGen pattern generator for discretionary
hyphenation patterns yet. Fortunately, discretionary hyphenation points
are forbidden in PatGen-generated hyphenation patterns, so discretionary
hyphenation patterns can still be developed with a small patch.

Warning: If you use UTF-8 Unicode encoding in your patterns, call
substrings.pl with the UTF-8 parameter to calculate the right
character positions for discretionary hyphenation:

./substrings.pl input output UTF-8

Programming
-----------

Use hyphenate2() to handle discretionary hyphenation.
See hyphen.h for the documentation of the hyphenate2() function.
See example.c for processing the output of the hyphenate2() function.

Check discretionary hyphenations with the -dd option of
the example program.

Warning: change characters are lower case in the source, so you may need
to convert the case of the change characters based on the detected case
of the input word. For an example, see the OpenOffice.org source
(lingucomponent/source/hyphenator/altlinuxhyph/hyphen/hyphenimp.cxx).

Németh László
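
Example: applying a change region
---------------------------------

The Start and Cut rules from the Specification can be checked by hand
with a short script. The following is only a sketch in Python, not part
of the library: parse_pattern() and apply_change() are hypothetical
helper names, the default region (start 1, cut equal to the number of
letters) is inferred from the f1f/ff=f example, and a "change" that
itself contains commas is not handled. It applies the change to the
letters of the pattern only; it does not perform pattern matching or
decide which pattern wins.

```python
def parse_pattern(line):
    """Split 'pat1tern/change[,start,cut]' into (letters, change, start, cut).

    Hypothetical helper for illustration; not part of the hyphen library.
    """
    pattern, _, rest = line.partition("/")
    # Digits and the word-boundary dot do not count toward positions.
    letters = "".join(ch for ch in pattern if ch not in "0123456789.")
    parts = rest.split(",")
    change = parts[0]
    if len(parts) == 3:
        start, cut = int(parts[1]), int(parts[2])
    else:
        # Assumption from the README: f1f/ff=f equals f1f/ff=f,1,2,
        # so the default region covers all letters of the pattern.
        start, cut = 1, len(letters)
    return letters, change, start, cut


def apply_change(line):
    """Replace letters[start .. start + cut - 1] (1-based) with the change."""
    letters, change, start, cut = parse_pattern(line)
    # Python strings are indexed by Unicode character, which matches the
    # "Unicode character positions" rule for UTF-8 pattern files.
    return letters[:start - 1] + change + letters[start - 1 + cut:]
```

With these helpers, apply_change("schif3fahrt/ff=f,5,2") gives
"schiff=fahrt", apply_change("össze/sz=sz,2,3") gives "ösz=sze", and
apply_change("paral·1lel/l=l,5,3") gives "paral=lel". Counting bytes
instead of characters would put the start of the "össze" region inside
the two-byte "ö".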