Bug 38264

Summary: Hyphenation does not play well with preserved linefeed-treatment or white-space-treatment
Product: Fop - Now in Jira
Reporter: Franck Schmidlin <franck.schmidlin>
Component: general
Assignee: fop-dev
Status: CLOSED FIXED
Severity: normal
CC: arjen, jmt4b04d4v, WZiegler, xuanngolist
Priority: P2
Version: 0.90   
Target Milestone: ---   
Hardware: Other   
OS: other   
Attachments:
- an FO file that demonstrates the problem.
- the PDF output of the hyphen2.fo
- Hyphens do not show up when white-space-treatment="preserve"

Description Franck Schmidlin 2006-01-13 19:30:57 UTC
When combining the attributes linefeed-treatment="preserve" and 
hyphenate="true", I get some really strange results.

In fact, it seems that both attributes are applied to the text in turn, which 
DUPLICATES the text in the output. And both outputs are wrong...


I attach an example FO and the resulting PDF.
Comment 1 Franck Schmidlin 2006-01-13 19:32:58 UTC
Created attachment 17419
an FO file that demonstrates the problem.
Comment 2 Franck Schmidlin 2006-01-13 19:34:58 UTC
Created attachment 17420
the PDF output of the hyphen2.fo
Comment 3 Simon Pepping 2006-01-13 21:21:31 UTC
This problem is also present in Subversion HEAD, rev. 367760.
Comment 4 Andreas L. Delmelle 2007-02-19 08:09:59 UTC
Also interesting to note: if one encloses the content of the second block in testcase 
'block_hyphenation_linefeed-preserve.xml' with an fo:inline, then 
LineLayoutManager.findHyphenationPoints() throws a NullPointerException (line 1486), due to an 
Update being added earlier which has null for an inlineLM...

Looking closer, I'm wondering whether the strange effect of duplication may have something to do with:
a) a block containing preserved linefeeds generates a Paragraph of Paragraphs
b) findOptimalBreakingPoints() is called in a loop that iterates /backwards/ over the sub-paragraphs, while
c) findHyphenationPoints() iterates /forwards/ over each sub-paragraph individually

This opens up the possibility that findHyphenationPoints() adds Updates to the updateList with indices 
that refer to the last sub-paragraph, and those indices are later, in the outer loop, interpreted as 
positions in the first sub-paragraph --or worse, in the super-paragraph?
Comment 5 Vincent Hennebert 2007-12-13 08:41:16 UTC
Another problem related to hyphenation and preserved white-space: when
white-space-treatment is set to "preserve", words are hyphenated correctly but
the hyphen does not show up.
Comment 6 Vincent Hennebert 2007-12-13 08:43:29 UTC
Created attachment 21274
Hyphens do not show up when white-space-treatment="preserve"
Comment 7 Andreas L. Delmelle 2007-12-24 01:30:33 UTC
*** Bug 44124 has been marked as a duplicate of this bug. ***
Comment 8 Andreas L. Delmelle 2008-05-04 10:54:00 UTC
In the meantime, I managed to track down the source of the problem with linefeed-treatment="preserve".
There is nothing inherently wrong with the hyphenation loop itself. It goes wrong after the hyphenation points have been determined, when the updates are processed.

See LineLayoutManager.findHyphenationPoints(), second main loop. For each Paragraph, the corresponding TextLayoutManager.applyChanges() and .getChangedKnuthElements() are used.
Checking the implementations of those two methods reveals that they do not take into account that they can be called multiple times for the same instance. The former always sets the 'returnedIndex' member to 0, which leads to the duplication if the latter is called more than once. Each subparagraph in the main paragraph is replaced by a copy of the main paragraph...

Now still looking for a solution :/
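
To make the mechanism above concrete, here is a minimal toy model of it. This is only a sketch: ToyTextLM and replaceAll() are hypothetical names, and nothing here is FOP's actual code. It only assumes what the comment describes, namely that applyChanges() resets returnedIndex to 0 and that the getter then returns everything the LM holds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; ToyTextLM is a hypothetical stand-in, not FOP's TextLayoutManager. */
class DuplicationSketch {

    static class ToyTextLM {
        private final List<String> allElements; // content of ALL sub-paragraphs
        private int returnedIndex;

        ToyTextLM(List<String> allElements) {
            this.allElements = allElements;
        }

        /** As described in the comment: always rewinds to the start. */
        void applyChanges() {
            returnedIndex = 0;
        }

        /** Ignores oldList and returns everything from returnedIndex onwards. */
        List<String> getChangedElements(List<String> oldList) {
            List<String> result =
                    new ArrayList<>(allElements.subList(returnedIndex, allElements.size()));
            returnedIndex = allElements.size();
            return result;
        }
    }

    /** Replaces each sub-paragraph by whatever the toy LM returns for it. */
    static List<List<String>> replaceAll(List<List<String>> paragraphs) {
        List<String> flat = new ArrayList<>();
        for (List<String> p : paragraphs) {
            flat.addAll(p);
        }
        ToyTextLM lm = new ToyTextLM(flat);
        List<List<String>> out = new ArrayList<>();
        for (List<String> p : paragraphs) {
            lm.applyChanges();                 // rewinds to 0 on every call
            out.add(lm.getChangedElements(p)); // so the full content comes back each time
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> paragraphs = Arrays.asList(
                Arrays.asList("first line"),
                Arrays.asList("second line"));
        // Each of the two entries printed here contains both lines.
        System.out.println(replaceAll(paragraphs));
    }
}
```

With two sub-paragraphs, both end up holding the complete text, which matches the duplication seen in the attached PDF.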
Comment 9 Andreas L. Delmelle 2008-05-05 07:27:25 UTC
Trying to gain more understanding of this issue, and as I see it, the full story wrt linefeed-treatment='preserve' and hyphenate='true' is:

1) for blocks of text containing preserved linefeeds, the TextLayoutManager actually generates multiple Paragraphs (see TextLM.getNextKnuthElements() -> in case of an explicit break, the 'current' sequence is ended, and a new one is added to the returnList)
2) the optimal line-breaks are determined by the LineLayoutManager per Paragraph ( see LineLM.createLineBreaks() )
3) the hyphenation-points are determined for each Paragraph in the same loop ( see LineLM.findOptimalBreakingPoints() )
4) BUT: the integration of hyphenation-points (applyChanges() and getChangedKnuthElements()) operates on the TextLayoutManager instance as a whole.

=> the entire content generated by the TextLM in question is copied as many times as there are paragraphs/preserved linefeeds in the source

Mainly TextLM.getChangedKnuthElements() is a bit problematic in this regard: every time this is called, it generates an element-list based on the complete set of AreaInfos for the LM. In LineLM.findHyphenationPoints(), each of the original paragraphs is replaced by that list.

I already tried to change that method to take into account the position-indices of the first and last element in the parameter oldList. This already gets me somewhat further, but still far from committable...
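
A rough sketch of that idea, under simplifying assumptions: the LM's content is modelled as a plain list of strings, and the elements of oldList are reduced to the position indices they carry. The class and method names are made up for illustration and do not correspond to the real TextLM signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; names and types are hypothetical, not FOP's API. */
class RangeRestrictedSketch {

    /**
     * chunks: everything the (toy) layout manager holds.
     * oldPositions: position indices carried by the elements of oldList, in order.
     * Returns only the chunks between the first and last position of oldList.
     */
    static List<String> changedElements(List<String> chunks, List<Integer> oldPositions) {
        if (oldPositions.isEmpty()) {
            return new ArrayList<>();
        }
        int first = oldPositions.get(0);
        int last = oldPositions.get(oldPositions.size() - 1);
        return new ArrayList<>(chunks.subList(first, last + 1));
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList("first line", "second", "line", "third line");
        // oldList of the middle sub-paragraph covers positions 1..2 only.
        System.out.println(changedElements(chunks, Arrays.asList(1, 2)));
    }
}
```

This only returns the slice belonging to the paragraph being replaced. It does not yet deal with positions that move once hyphenation adds or removes areas.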
Comment 10 Andreas L. Delmelle 2008-05-06 08:06:37 UTC
Status update:

The main difficulty seems to be that the principal iteration in LineLM.createLineBreaks() iterates in reverse order. As a result, applyChanges() is called first for the last Paragraph if the TextLM generates multiple paragraphs.
Now, while we can keep track of the changed position indices and limit both applyChanges() and getChangedKnuthElements() to operate only on the portion corresponding to oldList, by the time the next-to-last paragraph is processed, the changed positions for the last one should again be modified to take into account added/removed areas for the changes to the preceding one.

I made such changes locally, and this does avoid the duplication. However, keeping track of the bounding indices is turning out to be quite a pain. As soon as the first paragraph has hyphenation points, the positions pointing into the later paragraphs will be wrong...
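
The bookkeeping in question looks roughly like the following sketch. It assumes each sub-paragraph is tracked as an inclusive [first, last] range of position indices, and that hyphenating a paragraph adds some number of areas to it. The recordChange() helper is hypothetical and not part of FOP.

```java
import java.util.Arrays;

/** Toy model only; recordChange() is a hypothetical helper, not FOP code. */
class IndexShiftSketch {

    /**
     * ranges[i] = {first, last}: inclusive position range of sub-paragraph i.
     * After paragraph 'changed' gained (or lost, if negative) 'delta' areas,
     * its own end moves, and every later paragraph shifts as a whole.
     */
    static void recordChange(int[][] ranges, int changed, int delta) {
        ranges[changed][1] += delta;
        for (int i = changed + 1; i < ranges.length; i++) {
            ranges[i][0] += delta;
            ranges[i][1] += delta;
        }
    }

    public static void main(String[] args) {
        int[][] ranges = {{0, 1}, {2, 3}, {4, 4}};
        // Hyphenation adds two areas to the first sub-paragraph.
        recordChange(ranges, 0, 2);
        System.out.println(Arrays.deepToString(ranges));
    }
}
```

When the paragraphs are processed last-to-first, every change to an earlier paragraph forces this correction on ranges that were already handled, which is what makes the reverse iteration awkward.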
Comment 11 Andreas L. Delmelle 2009-01-29 13:28:15 UTC
*** Bug 10374 has been marked as a duplicate of this bug. ***
Comment 12 Andreas L. Delmelle 2010-06-16 14:20:08 UTC
*** Bug 49411 has been marked as a duplicate of this bug. ***
Comment 13 Andreas L. Delmelle 2010-11-25 16:31:26 UTC
Both issues fixed in r1039188:
- combination of linefeed-preserve and hyphenation failed for the reasons described in earlier comments. After having inverted the main loop in LineLM.createLineBreaks() (see r956271), the fix was to modify TextLM.applyChanges() and TextLM.getChangedKnuthElements() to account for the fact that they can be called multiple times for the same instance.
Additionally, needed to make sure LineLM.hyphenationPerformed is only set if the last paragraph has been hyphenated. Otherwise, hyphenation would be bypassed for all paragraphs following the first preserved linefeed in a block. After modification, hyphenation is only bypassed in case of a re-entry due to changing page-ipd.
- combination of white-space-treatment="preserve" and hyphenation failed due to an oversight that has probably been present for a while. See LineLM.addInlineArea(), around line 1515: lastLM was only set if white-space-treatment was not "preserve". If white-space was preserved, this caused the call to LayoutContext.setFlags() some 70-75 lines further down to set LAST_AREA to false (the result of childLM == lastLM), which in turn caused TextLM to ignore the hyphenation character when building the area.
Fix was to make sure that lastLM always points to the LM of the last KnuthElement in the sequence to be processed.
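
The white-space case can be boiled down to the following toy model. It is a deliberately reduced sketch, not the code in LineLM.addInlineArea(): lineFragment() and its parameters are invented for illustration, and the only thing taken from the description above is that LAST_AREA is derived from childLM == lastLM and that the hyphen is dropped when that flag is false.

```java
/** Toy model only; lineFragment() is hypothetical and not part of FOP. */
class LastAreaSketch {

    /**
     * Returns the text of a hyphenated fragment at the end of a line.
     * fixApplied = false models the old behaviour (lastLM unset when
     * white-space is preserved); fixApplied = true models lastLM always set.
     */
    static String lineFragment(String text, boolean whiteSpacePreserved, boolean fixApplied) {
        Object childLM = new Object(); // stands in for the LM of the last element
        Object lastLM = null;
        if (fixApplied || !whiteSpacePreserved) {
            lastLM = childLM;
        }
        boolean lastArea = (childLM == lastLM); // the LAST_AREA flag in this model
        // Without LAST_AREA the hyphenation character is not added.
        return lastArea ? text + "-" : text;
    }

    public static void main(String[] args) {
        System.out.println(lineFragment("hyphen", true, false)); // old behaviour, preserved white-space
        System.out.println(lineFragment("hyphen", true, true));  // with lastLM always set
    }
}
```

In this model the word is still broken at the right place in both cases. Only the visible hyphen depends on lastLM, which is consistent with the symptom reported in comment 5.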
Comment 14 Glenn Adams 2012-04-01 06:18:48 UTC
batch transition to closed; if someone wishes to restore one of these to resolved in order to perform a verification step, then feel free to do so