Bug 16870 - Hyphenation bug including bugfix : sporadic mutilation of hyphenated word
Summary: Hyphenation bug including bugfix : sporadic mutilation of hyphenated word
Status: CLOSED DUPLICATE of bug 2106
Alias: None
Product: Fop - Now in Jira
Classification: Unclassified
Component: general
Version: 0.20.4
Hardware: PC All
Importance: P3 normal
Target Milestone: ---
Assignee: fop-dev
Keywords: PatchAvailable
Depends on:
Reported: 2003-02-07 10:12 UTC by Chris Wewerka
Modified: 2012-04-01 13:51 UTC
CC List: 0 users

patch (56.32 KB, patch)
2003-02-07 11:37 UTC, Chris Wewerka

Description Chris Wewerka 2003-02-07 10:12:00 UTC
Explanation of bug:
Under some circumstances (see below) some hyphenated words are mutilated.

E.g. the German word "Altersvorsorge" was SOMETIMES (but not very often)
hyphenated as "rsvor-Altesorge" ("rsvor-" at the end of one line, "Altesorge"
at the start of the next).

Xerces passes text to FOP through calls to characters(); each call supplies a
character buffer that is a 'view window' onto the current document. A single
word (like "Altersvorsorge") can be split across two calls of characters(). In
the given example: "Alte" and "rsvorsorge".
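A minimal sketch of this fragmentation in Java: the SAX contract for ContentHandler.characters() allows a parser to deliver contiguous character data in several chunks, so a handler cannot assume one word per call. The class name FragmentRecorder is hypothetical, and the two calls are driven by hand to reproduce the "Alte" / "rsvorsorge" split rather than coming from a real Xerces run.

```java
import org.xml.sax.helpers.DefaultHandler;

import java.util.ArrayList;
import java.util.List;

// Hypothetical handler that records each characters() call separately,
// to make visible that one word may arrive in more than one chunk.
class FragmentRecorder extends DefaultHandler {
    final List<String> fragments = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start+length-1] belongs to this call, so copy it out.
        fragments.add(new String(ch, start, length));
    }

    public static void main(String[] args) {
        char[] buffer = "Altersvorsorge".toCharArray();
        FragmentRecorder handler = new FragmentRecorder();
        // Simulate a parser that splits the word after four characters.
        handler.characters(buffer, 0, 4);
        handler.characters(buffer, 4, buffer.length - 4);
        System.out.println(handler.fragments);                  // [Alte, rsvorsorge]
        System.out.println(String.join("", handler.fragments)); // Altersvorsorge
    }
}
```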

FOP adds the first part of the word to the "pending areas". This happens in
org\apache\fop\layout\LineArea.java in the method addText(). Xerces delivers
the rest of the word in its second characters() call, which results in a second
call to addText().

In this second call (if hyphenation is set to true) the method doHyphenation()
(also in class LineArea) is called, and it completely ignores the pending areas!
So the word fragment "rsvorsorge" alone is handed over to the hyphenation
engine, which hyphenates this fragment correctly.

The Hyphenator then determines that "rsvor-" is added to the current line area.

The next call to addText() checks whether there are any pending areas ("Alte" in
our example), prints them on the next line, and continues with the rest of the
current buffer ("sorge [...]" in the example).
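The sequence above can be modelled with a small, hypothetical Java sketch (not FOP code). It assumes the hyphenation point inside the fragment is already known (after "rsvor", as in the example) and only shows how ignoring the pending fragment produces the mutilated output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the buggy sequence; class and method names are invented.
class BuggyHyphenationDemo {

    // pending:   fragments left over from the previous characters()/addText() call
    // fragment:  text delivered by the current call
    // hyphenPos: where the hyphenator breaks `fragment` (assumed, passed in here)
    static List<String> layout(List<String> pending, String fragment, int hyphenPos) {
        List<String> lines = new ArrayList<>();
        // Bug: only the current fragment is hyphenated; pending text is ignored.
        lines.add(fragment.substring(0, hyphenPos) + "-");
        // Afterwards the pending areas are flushed onto the new line first,
        // followed by the rest of the current buffer.
        StringBuilder next = new StringBuilder();
        for (String p : pending) {
            next.append(p);
        }
        next.append(fragment.substring(hyphenPos));
        lines.add(next.toString());
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(layout(List.of("Alte"), "rsvorsorge", 5)); // [rsvor-, Altesorge]
    }
}
```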

So this bug occurs only in very few situations because it depends on
1) how often, and with which buffer size, the XML parser calls the characters()
method, so I think it definitely depends on the version of the XML parser;
2) what the XML document looks like: an additional character or newline
somewhere BEFORE the mutilated word can change how the text is split across
characters() calls.
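To illustrate points 1) and 2), here is a hypothetical Java model that pretends the parser hands out character data in fixed-size chunks. Real parsers buffer differently (the chunk size of 26 used here is arbitrary), but the model shows why one extra character before a word can decide whether that word is split across two characters() calls.

```java
// Hypothetical model: the parser is assumed to deliver text in fixed-size chunks.
class ChunkBoundaryDemo {

    // True if the word [wordStart, wordStart + wordLength) spans two chunks.
    static boolean straddlesBoundary(int wordStart, int wordLength, int chunkSize) {
        int firstChunk = wordStart / chunkSize;
        int lastChunk = (wordStart + wordLength - 1) / chunkSize;
        return firstChunk != lastChunk;
    }

    static boolean wordIsSplit(String text, String word, int chunkSize) {
        int start = text.indexOf(word);
        if (start < 0) {
            throw new IllegalArgumentException("word not found: " + word);
        }
        return straddlesBoundary(start, word.length(), chunkSize);
    }

    public static void main(String[] args) {
        String word = "Altersvorsorge";
        // Same text, but the second one has one extra space before the word.
        System.out.println(wordIsSplit("Die private Altersvorsorge", word, 26));  // false
        System.out.println(wordIsSplit("Die  private Altersvorsorge", word, 26)); // true
    }
}
```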

I changed the internals of the method doHyphenation(). It now takes into 
account any pending areas which may contain word fragments. 

New approach in doHyphenation():
1) Scan the pending areas vector for pending text fragments and remove them
from the pending areas vector.
2) Concatenate the result of 1) with the current word to be hyphenated in the
current char buffer.
3) Call the Hyphenator.
4) Use addWord() to add the pre-hyphen word fragment to the current line area.
5) Decide: is the final hyphenation point somewhere in the pending area or in
the current char buffer?

5a) The hyphenation point is somewhere in the pending area:
--> add the remaining characters of the pending text fragments back to the
pending areas vector (they will be printed on a new line by addText(), together
with the rest of the word, which is in the current buffer). For this task I used
the existing addSpacedWord() method with the pending parameter set to true.

5b) The hyphenation point is somewhere in the current char buffer:
--> just return the new position in the current char buffer.
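The steps above can be sketched in Java as follows. This is a hypothetical model, not the attached patch: pending areas are plain strings, the Hyphenator is replaced by a given hyphenation index hyphenPos into the concatenated word, and Result, hyphenate() and its parameters are invented names. Step 5 becomes a comparison of hyphenPos with the total length of the pending text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the new doHyphenation() steps; not the attached patch.
class FixedHyphenationSketch {

    static final class Result {
        final String preHyphen;        // text added to the current line (step 4)
        final String pendingRemainder; // re-queued pending text (case 5a), else null
        final int bufferPosition;      // where processing of the char buffer resumes

        Result(String preHyphen, String pendingRemainder, int bufferPosition) {
            this.preHyphen = preHyphen;
            this.pendingRemainder = pendingRemainder;
            this.bufferPosition = bufferPosition;
        }
    }

    // hyphenPos stands in for the Hyphenator's answer (step 3): a break index
    // into the concatenated word, with 0 < hyphenPos < word length.
    static Result hyphenate(List<String> pendingAreas, char[] buffer,
                            int wordStart, int wordEnd, int hyphenPos) {
        // 1) Collect the pending text fragments and remove them from the list.
        StringBuilder pendingText = new StringBuilder();
        for (String fragment : pendingAreas) {
            pendingText.append(fragment);
        }
        pendingAreas.clear();
        int pendingLength = pendingText.length();

        // 2) Concatenate them with the current word from the char buffer.
        String word = pendingText.toString()
                + new String(buffer, wordStart, wordEnd - wordStart);

        // 4) The part before the hyphenation point goes on the current line.
        String preHyphen = word.substring(0, hyphenPos) + "-";

        // 5) Does the break fall inside the pending text or inside the buffer?
        if (hyphenPos < pendingLength) {
            // 5a) Re-queue the rest of the pending text; the buffer is untouched.
            String remainder = pendingText.substring(hyphenPos);
            pendingAreas.add(remainder);
            return new Result(preHyphen, remainder, wordStart);
        }
        // 5b) Skip the part of the buffer that was already placed on this line.
        return new Result(preHyphen, null, wordStart + (hyphenPos - pendingLength));
    }

    public static void main(String[] args) {
        char[] buffer = "rsvorsorge".toCharArray();

        List<String> pending = new ArrayList<>(List.of("Alte"));
        Result r = hyphenate(pending, buffer, 0, buffer.length, 6); // Alters-vorsorge
        System.out.println(r.preHyphen + " " + r.bufferPosition + " " + pending); // Alters- 2 []

        List<String> pending2 = new ArrayList<>(List.of("Alte"));
        Result r2 = hyphenate(pending2, buffer, 0, buffer.length, 2); // Al-tersvorsorge
        System.out.println(r2.preHyphen + " " + r2.bufferPosition + " " + pending2); // Al- 0 [te]
    }
}
```

In case 5a the sketch leaves the buffer position at wordStart, so the re-queued pending text and the whole buffered part of the word end up together on the new line, as described in 5a.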

I also changed the signature of doHyphenation():
a TextState parameter was added, because the addSpacedWord() method (used in
5a) needs the current textState.

The call to doHyphenation() in LineArea.addText() is modified:
the remaining-width argument is no longer reduced by pendingWidth, because
doHyphenation() now looks at the pending areas itself:

ret = this.doHyphenation(dataCopy, i, wordStart,
 - (finalWidth
 + spaceWidth
 /*+ pendingWidth*/), textState);
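A worked example of why pendingWidth should no longer be subtracted, as a hypothetical Java sketch in arbitrary width units. The name lineWidth stands in for the total width available on the line, which the call above does not show, and all widths are made up. Because the fixed doHyphenation() measures the pre-hyphen fragment of the whole word, including the pending "Alte", subtracting pendingWidth beforehand would count that fragment twice.

```java
// Hypothetical model in arbitrary width units; not FOP code.
class RemainingWidthDemo {

    // Width left on the line; `lineWidth` is an assumed name for the full line width.
    static int remainingWidth(int lineWidth, int finalWidth, int spaceWidth,
                              int pendingWidth, boolean subtractPending) {
        int used = finalWidth + spaceWidth + (subtractPending ? pendingWidth : 0);
        return lineWidth - used;
    }

    public static void main(String[] args) {
        int lineWidth = 100, finalWidth = 60, spaceWidth = 5;
        int pendingWidth = 20;   // width of the pending fragment "Alte"
        int preHyphenWidth = 30; // width of "Alters-", which already includes "Alte"

        int oldRemaining = remainingWidth(lineWidth, finalWidth, spaceWidth, pendingWidth, true);
        int newRemaining = remainingWidth(lineWidth, finalWidth, spaceWidth, pendingWidth, false);

        System.out.println(oldRemaining + " " + (preHyphenWidth <= oldRemaining)); // 15 false
        System.out.println(newRemaining + " " + (preHyphenWidth <= newRemaining)); // 35 true
    }
}
```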

I don't think it makes sense to include our XSL-FO documents to reproduce the
error: we use custom fonts, which will likely lead to a different layout on your
system, so the error will probably not occur.

Chris Wewerka
Munich, Germany
Comment 1 J.Pietschmann 2003-02-07 11:25:04 UTC

*** This bug has been marked as a duplicate of 2106 ***
Comment 2 Chris Wewerka 2003-02-07 11:37:19 UTC
Created attachment 4773
Comment 3 Glenn Adams 2012-04-01 13:51:17 UTC
batch transition to closed remaining pre-FOP1.0 resolved bugs