Bug 53089

Summary: Hyphenation of Uppercase Words, Combined with Underlines
Product: Fop - Now in Jira
Reporter: Thomas Schraitle <tom_schr>
Component: general
Assignee: fop-dev
Status: NEW ---    
Severity: enhancement    
Priority: P3    
Version: 1.0   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: FO file showing hyphenation issue with uppercase word(s)
PDF output from FO file of attachment#28621

Description Thomas Schraitle 2012-04-17 07:39:29 UTC
Created attachment 28621
FO file showing hyphenation issue with uppercase word(s)

Consider the attached FO file which combines words of lowercase and uppercase letters.

As expected, the word "expected" is hyphenated correctly (example 2), and so is the uppercase word "SUCCESS", even when it is combined with underscores ("_") before and after the word (see examples 4 and 5).


However, if another word precedes it (as in "OCF_SUCCESS"), the word is no longer hyphenated at all. I don't know whether this is expected behaviour or an issue in the hyphenation patterns. Interestingly, XEP from RenderX hyphenates it as "OCF_SUC-CESS". As far as I know, XEP also uses the TeX hyphenation patterns, as FOP does.
Comment 1 Glenn Adams 2012-04-17 16:26:59 UTC
please provide a PDF output file that shows the results you are seeing
Comment 2 Glenn Adams 2012-04-17 16:54:32 UTC
the problem is in o.a.f.hyphenation.HyphenationTree#hyphenate, specifically in:

                if (!bEndOfLetters) {
                    word[i - iIgnoreAtBeginning] = (char)nc;
                } else {
                    return null;
                }

when '_' is encountered after a letter (as opposed to beginning of word), bEndOfLetters is set to true, which causes the hyphenate algorithm to bail out;

a better approach would be to divide the input word into segments separated by non-letter characters, hyphenate each segment separately, then collect and return the union of hyphenation points from these segments
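
The segment-based approach described above could look roughly like the following. This is only a minimal sketch, not a patch against FOP: the class name SegmentedHyphenation, the Function-based segmentHyphenator callback, and the use of Character.isLetter in place of FOP's own character class map are all assumptions made for illustration. Break points are reported as offsets into the full input word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of FOP.
public class SegmentedHyphenation {

    /**
     * Splits word into maximal runs of letters, hyphenates each run with
     * segmentHyphenator, and returns all break points as offsets into word.
     * A break point p means a hyphen may be inserted before word.charAt(p).
     */
    public static List<Integer> hyphenate(String word,
                                          Function<String, int[]> segmentHyphenator) {
        List<Integer> points = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            // Assumption: Character.isLetter stands in for the pattern file's class map.
            if (!Character.isLetter(word.charAt(i))) {
                i++;            // skip separators such as '_'
                continue;
            }
            int start = i;
            while (i < word.length() && Character.isLetter(word.charAt(i))) {
                i++;
            }
            int[] local = segmentHyphenator.apply(word.substring(start, i));
            if (local != null) {
                for (int p : local) {
                    points.add(start + p);   // shift to an offset in the full word
                }
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Toy stand-in for the pattern lookup: it only knows SUC-CESS.
        Function<String, int[]> toy =
            s -> s.equals("SUCCESS") ? new int[] {3} : new int[0];
        System.out.println(hyphenate("OCF_SUCCESS", toy)); // prints [7], i.e. OCF_SUC-CESS
    }
}
```

Because each segment is hyphenated independently, a non-letter character in the middle of the input no longer discards the break points found in the other segments.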

would anyone like to submit a patch?
Comment 3 Thomas Schraitle 2012-04-18 13:53:06 UTC
Created attachment 28635
PDF output from FO file of attachment#28621
Comment 4 Thomas Schraitle 2012-04-18 13:54:00 UTC
(In reply to comment #1)
> please provide a PDF output file that shows the results you are seeing

See Attachment#28635
https://issues.apache.org/bugzilla/attachment.cgi?id=28635
Comment 5 Glenn Adams 2012-04-21 04:53:15 UTC
marking this as an enhancement rather than a bug, since XSL-FO does not prescribe hyphenation behavior