Bug 43474

Summary: [PATCH] wrap-option="wrap" doesn't work
Product: Fop - Now in Jira Reporter: Wolfgang Flohr <wolfgang>
Component: pdfAssignee: fop-dev
Status: NEW ---    
Severity: normal CC: maximilian.aster, vlsergey
Priority: P2    
Version: trunk   
Target Milestone: ---   
Hardware: Other   
OS: other   
Attachments: wrap-option testcase
Patch correcting the behavior (?)
Updated testcase showing the difference between wrap and no-wrap
Corrected testcase; avoid indent-inheritance...
PDF result after applying the patch
More extensive patch
Updated PDF result
Patch proposal
Another test fo
Result for the last fo

Description Wolfgang Flohr 2007-09-25 11:59:06 UTC
Running the following FO example with version 0.93 and 0.94 will produce a PDF
document containing a table with some text. The text will not break as expected
and violates the table boundary.

<?xml version="1.0" encoding="ISO-8859-1"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
	<fo:layout-master-set>
		<fo:simple-page-master page-width="21cm" page-height="29.7cm" master-name="spm0">
			<fo:region-body region-name="region-body" margin-bottom="1cm"
margin-top="1.5cm"/>
			<fo:region-before region-name="header" precedence="false" extent="4cm"/>
			<fo:region-after region-name="footer" extent="1cm"/>
		</fo:simple-page-master>
	</fo:layout-master-set>
	<fo:page-sequence master-reference="spm0">
		<fo:flow flow-name="region-body">
			<fo:block wrap-option="wrap"
				padding-end="50pt"
				padding-start="2pt"
				space-before="6.0pt"
				margin-right="60pt"
				margin-left="28.35pt"
				font-size="10.0pt"
				font-family="Courier New, Courier"
				color="black"
				border-top-width="0.75pt"
				border-top-style="double"
				border-top-color="#D3D3D3"
				border-right-width="0.75pt"
				border-right-style="double"
				border-right-color="#D3D3D3"
				border-left-width="0.75pt"
				border-left-style="double"
				border-left-color="#D3D3D3"
				border-bottom-width="0.75pt"
				border-bottom-style="double"
				border-bottom-color="#D3D3D3"
				background-color="#D5DEE3">&lt;object
classid="clsid:8AD9C840-044E-11D1-B3E9-00805F499D93" width="1" height="1"
align="baseline"
codebase="http://java.sun.comproductsplugin13textToMakeTheLineEvenBiggerThanItWasBeforejinstall13win32cab#Version=1,3,0,0"
name="xformsApplet"&gt;
			</fo:block>
		</fo:flow>
	</fo:page-sequence>
</fo:root>
Comment 1 Wolfgang Flohr 2007-09-25 12:00:55 UTC
Created attachment 20878 [details]
wrap-option testcase
Comment 2 Andreas L. Delmelle 2008-05-02 09:38:21 UTC
FWIW: I've been looking closer at this one, and it seems to be related to the support for Unicode-linebreaking.

The problem is that this algorithm, implemented in TextLayoutManager.getNextKnuthElements(), does not find a feasible break-possibility for the word-fragment in question. By itself, this is not an error. However, there is no form of correction in case the total word-fragment IPD exceeds the LayoutContext's reference-IPD, and so the entire fragment is presented to the LineLayoutManager as a monolithic box.
Comment 3 Andreas L. Delmelle 2008-05-03 05:34:01 UTC
Created attachment 21910 [details]
Patch correcting the behavior (?)

First try: If keep-together is inactive, and wrap-option is "wrap", 'prohibited by Unicode' should not be considered binding.
This results in the effect that, if Unicode line-breaking does not find a feasible break for a part of text that exceeds the available line-width, the LineLayoutManager can still break anywhere it sees fit, thus creating a wrapping effect.
Warning: 
* as the initial value of wrap-option is "wrap", this small patch currently breaks quite a few testcases. 
* also (probably suboptimal) the behavior is not really restricted to the longer word-fragments, which could generate quite some overhead, IIC...
Comment 4 Andreas L. Delmelle 2008-05-03 05:37:16 UTC
Created attachment 21911 [details]
Updated testcase showing the difference between wrap and no-wrap

This test-file adds two additional block-containers to the original to show the difference between the two values for wrap-option.
Comment 5 Andreas L. Delmelle 2008-05-03 05:39:59 UTC
Created attachment 21912 [details]
Corrected testcase; avoid indent-inheritance...
Comment 6 Andreas L. Delmelle 2008-05-03 05:41:02 UTC
Created attachment 21913 [details]
PDF result after applying the patch
Comment 7 Andreas L. Delmelle 2008-05-03 05:53:45 UTC
(In reply to comment #3)
> Warning: 
> * as the initial value of wrap-option is "wrap", this small patch currently
> breaks quite a few testcases. 
> * also (probably suboptimal) the behavior is not really restricted to the
> longer word-fragments, which could generate quite some overhead, IIC...
> 

This also shows in the fact that the PDF contains suboptimal breaks in the first block.
The words that do fit on one line should be kept together as much as possible.
Comment 8 Andreas L. Delmelle 2008-05-04 06:50:26 UTC
Created attachment 21917 [details]
More extensive patch

The updated patch takes a different approach: first let the default text-breaking loop do its work, check if the gathered word-fragment exceeds the available IPD, and if wrapping should be forced, process each character as a separate word-fragment to let the LineLayoutManager sort it out.

This effectively limits the effects only to those 'words' that would not fit in the line as a whole. For those words that do fit in the line, only one box is generated as before.
Comment 9 Andreas L. Delmelle 2008-05-04 06:53:47 UTC
Created attachment 21918 [details]
Updated PDF result


The PDF result after applying the second patch shows the effects: the resulting layout will keep words together if they fit, and for the other words, the characters are distributed nicely over as few lines as possible.
Comment 10 Andreas L. Delmelle 2008-05-08 02:33:54 UTC
Did some more testing with the last patch, and there are still some issues to sort out. Before the change, when a word would exceed the available line-width, the soft-hyphen would always be the last part of a word-fragment, preceded by other characters.
After the change we get a separate word-fragment for the soft-hyphens, which is incorrect, and leads to trailing explicit soft-hyphens (last character in a block of text) not being suppressed.

An interesting side-effect, is that after the change, we would have a nice feature.
This will render all characters below each other (already works with the modifications in the patch):

<fo:block-container inline-progression-dimension="0em">
  <fo:block>This</fo:block>
  <fo:block space-before="1em">text</fo:block>
  <fo:block space-before="1em">will</fo:block>
  <fo:block space-before="1em">be</fo:block>
  <fo:block space-before="1em">wrapped</fo:block>
</fo:block-container>

TBC
Comment 11 Max Aster 2008-11-25 05:04:01 UTC
Created attachment 22938 [details]
Patch proposal

I created a patch working with the latest trunk version. It's similar to the last patch. But there were some further problems which are fixed, list-item-labels were always wrapped, and the algorithm for choosing the correct break position was not correct inside tables. I think this functionality is very important for a lot of users as there is even a faq entry.
Comment 12 Max Aster 2008-11-25 05:05:00 UTC
Created attachment 22939 [details]
Another test fo
Comment 13 Max Aster 2008-11-25 05:05:53 UTC
Created attachment 22940 [details]
Result for the last fo
Comment 14 Andreas L. Delmelle 2008-11-25 10:56:12 UTC
(In reply to comment #11)
> I created a patch working with the latest trunk version. It's similar to the
> last patch. But there were some further problems which are fixed,
> list-item-labels were always wrapped, and the algorithm for choosing the
> correct break position was not correct inside tables. I think this
> functionality is very important for a lot of users as there is even a faq
> entry.
> 

Good to see this being picked up.
 
I'm in the process of reviewing the patch, and already have some questions/considerations:
I'm not sure I like the additional reference to ListItemLabel in TextLayoutManager. The TextLM should deal with text (FOText), and if necessary, the ancestor ListItemContentLM should pass the necessary info (a flag in the LayoutContext?) down to prevent the label from being broken. 
Right now, a dev would have to look at the TextLM to find out why a list-item-label is never broken...? I'd rather see an accompanying change to the list-related LMs to account for this. I'll see if I can come up with a polished alternative, to show what I mean.
As to the cause for the fact that the label is always broken: this is probably because the area gets its inline-progression-dimension from the content, so when the element-list is first constructed, the reference-ipd (line-width) will still be zero... I agree that not breaking would be the preferred behavior. If a user really needs this effect, he/she can always resort to a two-column table.

The changes to LineLM: Are they necessary due to the changes to TextLM, or is that a separate issue? Just asking, since the javadoc for the first added private method seems to indicate that the original expression could return a wrong result regardless of the value of wrap-option (?)

The same style issues as Chris mentioned in #42374 put aside, your patch has given me some ideas, which I'll start trying out real soon. 

Thanks for your input so far!
Comment 15 Vincent Hennebert 2008-11-26 03:59:32 UTC
Hi,

I'm sorry to chime in so late but I don't think there is anything to do on FOP's side WRT this issue. If there is no opportunity to break inside the word then there's nothing FOP can do. The wrap-option property is meant to trigger the line-breaking mechanism in its usual accepted meaning; that is, break between words or at hyphenation points inside them, not at every letter if available space is abnormally low, or the word abnormally big. At least, that's my understanding of this property.

In my opinion, this issue must be handled upstream. That is, zero-width space characters (U+200B) must be introduced at applicable places inside the word to provide FOP with more break opportunities. But I don't think this is FOP's responsibility to do that. See also the following FAQ entry:
http://xmlgraphics.apache.org/fop/faq.html#cells-overflow

Vincent


(In reply to comment #14)
> (In reply to comment #11)
> > I created a patch working with the latest trunk version. It's similar to the
> > last patch. But there were some further problems which are fixed,
> > list-item-labels were always wrapped, and the algorithm for choosing the
> > correct break position was not correct inside tables. I think this
> > functionality is very important for a lot of users as there is even a faq
> > entry.
> > 
> 
> Good to see this being picked up.
> 
> I'm in the process of reviewing the patch, and already have some
> questions/considerations:
> I'm not sure I like the additional reference to ListItemLabel in
> TextLayoutManager. The TextLM should deal with text (FOText), and if necessary,
> the ancestor ListItemContentLM should pass the necessary info (a flag in the
> LayoutContext?) down to prevent the label from being broken. 
> Right now, a dev would have to look at the TextLM to find out why a
> list-item-label is never broken...? I'd rather see an accompanying change to
> the list-related LMs to account for this. I'll see if I can come up with a
> polished alternative, to show what I mean.
> As to the cause for the fact that the label is always broken: this is probably
> because the area gets its inline-progression-dimension from the content, so
> when the element-list is first constructed, the reference-ipd (line-width) will
> still be zero... I agree that not breaking would be the preferred behavior. If
> a user really needs this effect, he/she can always resort to a two-column
> table.
> 
> The changes to LineLM: Are they necessary due to the changes to TextLM, or is
> that a separate issue? Just asking, since the javadoc for the first added
> private method seems to indicate that the original expression could return a
> wrong result regardless of the value of wrap-option (?)
> 
> The same style issues as Chris mentioned in #42374 put aside, your patch has
> given me some ideas, which I'll start trying out real soon. 
> 
> Thanks for your input so far!
> 

Comment 16 Andreas L. Delmelle 2008-11-26 11:30:42 UTC
(In reply to comment #15)
> I'm sorry to chime in so late but I don't think there is anything to do on
> FOP's side WRT this issue. 

On the one hand, I agree. I have mentioned this already in the past: wrap-option="wrap" does not include an /obligation/ to perform line-wrapping.

> If there is no opportunity to break inside the word
> then there's nothing FOP can do. 

Wrong. At least, slightly unfortunate choice of words: FOP /can/ definitely do /something/ in this case.

> The wrap-option property is meant to trigger
> the line-breaking mechanism in its usual accepted meaning; that is, break
> between words or at hyphenation points inside them, not at every letter if
> available space is abnormally low, or the word abnormally big. At least, that's
> my understanding of this property.

IIC, there's a subtle difference between line-WRAPPING and line-BREAKING. 
Line-wrapping being the more 'blind' of the two, if you will (simply wrap to fit the text into the available space). The Recommendation remains vague in this respect (and purposely so, I think). It does not prescribe /how/ the text should be wrapped or broken. We have decided to follow Unicode, but ultimately, we always have the last word. That is: if we choose to provide an additional fallback mechanism, there is no relevant specification that makes this wrong.

> In my opinion, this issue must be handled upstream. That is, zero-width space
> characters (U+200B) must be introduced at applicable places inside the word to
> provide FOP with more break opportunities. 

A legitimate option, but not always as easily done as it said. To get the effect of real dynamic content-wrapping (fit as many characters on the line as you can), you would force the user to insert a ZWSP in between every two characters (either that or they should make a choice of every so-many characters).

> But I don't think this is FOP's responsibility to do that. See also the following FAQ entry:
> http://xmlgraphics.apache.org/fop/faq.html#cells-overflow

Maybe not, but it would mean a big relief for many users, I think, if FOP would take this responsibility, even if it is not mandated...
Comment 17 Vincent Hennebert 2008-11-27 03:33:03 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > I'm sorry to chime in so late but I don't think there is anything to do on
> > FOP's side WRT this issue. 
> 
> On the one hand, I agree. I have mentioned this already in the past:
> wrap-option="wrap" does not include an /obligation/ to perform line-wrapping.
> 
> > If there is no opportunity to break inside the word
> > then there's nothing FOP can do. 
> 
> Wrong. At least, slightly unfortunate choice of words: FOP /can/ definitely do
> /something/ in this case.
> 
> > The wrap-option property is meant to trigger
> > the line-breaking mechanism in its usual accepted meaning; that is, break
> > between words or at hyphenation points inside them, not at every letter if
> > available space is abnormally low, or the word abnormally big. At least, that's
> > my understanding of this property.
> 
> IIC, there's a subtle difference between line-WRAPPING and line-BREAKING. 
> Line-wrapping being the more 'blind' of the two, if you will (simply wrap to
> fit the text into the available space). The Recommendation remains vague in
> this respect (and purposely so, I think).

I don't think so. Section 4.7.2, “Line-building”, states that “The partitioning [must occur] at legal line-breaks. [...] the rules of the language, script and hyphenation constraints in effect must permit a line-break between [two areas].”
http://www.w3.org/TR/xsl11/#area-linebuild

It /might/ be acceptable to relax the line-breaking algorithm somehow when the 'script' property is set to 'none', but frankly I'm not too keen on implementing a special treatment just to cope with pathological cases. The code is already complicated enough.


> It does not prescribe /how/ the text
> should be wrapped or broken. We have decided to follow Unicode, but ultimately,
> we always have the last word. That is: if we choose to provide an additional
> fallback mechanism, there is no relevant specification that makes this wrong.
> 
> > In my opinion, this issue must be handled upstream. That is, zero-width space
> > characters (U+200B) must be introduced at applicable places inside the word to
> > provide FOP with more break opportunities. 
> 
> A legitimate option, but not always as easily done as it said. To get the
> effect of real dynamic content-wrapping (fit as many characters on the line as
> you can), you would force the user to insert a ZWSP in between every two
> characters (either that or they should make a choice of every so-many
> characters).

Exactly. And I bet it's less complicated to implement some XSLT function or pre-processing step to do that than a dedicated extension in FOP's layout engine.


> > But I don't think this is FOP's responsibility to do that. See also the following FAQ entry:
> > http://xmlgraphics.apache.org/fop/faq.html#cells-overflow
> 
> Maybe not, but it would mean a big relief for many users, I think, if FOP would
> take this responsibility, even if it is not mandated...

Yes, but if we reason like this FOP would soon become like those Swiss knives with ridiculous numbers of blades ;-) (Note that I have nothing against Swiss knives, I've been owning one myself for years and it serves me very well! But it has only 6 functions.)

More seriously, you could take it the other way around, and find users who wouldn't be happy at all to see FOP suddenly break their texts at arbitrary places, and would rather be warned when such situations are occuring, so that they can re-work their contents.


Vincent
Comment 18 Pascal Sancho 2008-11-27 05:06:36 UTC
(In reply to comment #17)

wrap-option is cited 3 times in REC 1.1:
[1] Section 1.2.1 Paging and Scrolling
[2] Section 4.7.2 Line-building
[3] Section 7.16.13 wrap-option

IMHO, there is inconsistency in recommendation, especially between [2] and [3]:

in [3], when wrap-option="wrap", REC says "line-breaking will occur if the line overflows the available block width"

but in [2], there is no tool to do what is required in [3], as Vincent said.

So, I wonder whether FOP should not enforce a break if there is no break opportunity other than between 2 normal chars, regarding the wrap-option.

[1] http://www.w3.org/TR/xsl11/#d0e297
[2] http://www.w3.org/TR/xsl11/#area-linebuild
[3] http://www.w3.org/TR/xsl11/#wrap-option

Pascal
Comment 19 Chris Bowditch 2008-11-28 01:07:46 UTC
Hi Vincent,

(In reply to comment #17)
> (In reply to comment #16)
> > (In reply to comment #15)
> > IIC, there's a subtle difference between line-WRAPPING and line-BREAKING. 
> > Line-wrapping being the more 'blind' of the two, if you will (simply wrap to
> > fit the text into the available space). The Recommendation remains vague in
> > this respect (and purposely so, I think).
> I don't think so. Section 4.7.2, “Line-building”, states that “The
> partitioning [must occur] at legal line-breaks. [...] the rules of the
> language, script and hyphenation constraints in effect must permit a line-break
> between [two areas].”
> http://www.w3.org/TR/xsl11/#area-linebuild
> It /might/ be acceptable to relax the line-breaking algorithm somehow when the
> 'script' property is set to 'none', but frankly I'm not too keen on
> implementing a special treatment just to cope with pathological cases. The code
> is already complicated enough.

Agreed that the code is complex, but I don't believe we are dealing with extreme or pathological cases here. Most users run into the problem of content overflowing a table cell at some point and this is shown by the number of questions on the subject that occuir on fop-user. Far too many for my liking. 

As already indicated by Pascal there are some inconsistencies in the spec in this area.

> > A legitimate option, but not always as easily done as it said. To get the
> > effect of real dynamic content-wrapping (fit as many characters on the line as
> > you can), you would force the user to insert a ZWSP in between every two
> > characters (either that or they should make a choice of every so-many
> > characters).
> Exactly. And I bet it's less complicated to implement some XSLT function or
> pre-processing step to do that than a dedicated extension in FOP's layout
> engine.
> > > But I don't think this is FOP's responsibility to do that. See also the following FAQ entry:
> > > http://xmlgraphics.apache.org/fop/faq.html#cells-overflow
> > 
> > Maybe not, but it would mean a big relief for many users, I think, if FOP would
> > take this responsibility, even if it is not mandated...
> Yes, but if we reason like this FOP would soon become like those Swiss knives
> with ridiculous numbers of blades ;-) (Note that I have nothing against Swiss
> knives, I've been owning one myself for years and it serves me very well! But
> it has only 6 functions.)
> More seriously, you could take it the other way around, and find users who
> wouldn't be happy at all to see FOP suddenly break their texts at arbitrary
> places, and would rather be warned when such situations are occuring, so that
> they can re-work their contents.

Well I agree that FOP shouldn't agree to a code change for every possible use-case, but we should try to adjust the code to assist the most common use cases. I do believe that text with few break possibilities in narrow table columns is a common use case.

The other thing to bear in mind is that no one is saying this feature has to be implemented right now, but I don't think it should be dismissed either.

> Vincent

Thanks,

Chris
Comment 20 Vincent Hennebert 2008-11-28 02:37:30 UTC
Hi Chris,

(In reply to comment #19)
> Hi Vincent,
> 
> (In reply to comment #17)
> > (In reply to comment #16)
> > > (In reply to comment #15)
> > > IIC, there's a subtle difference between line-WRAPPING and line-BREAKING. 
> > > Line-wrapping being the more 'blind' of the two, if you will (simply wrap to
> > > fit the text into the available space). The Recommendation remains vague in
> > > this respect (and purposely so, I think).
> > I don't think so. Section 4.7.2, “Line-building”, states that “The
> > partitioning [must occur] at legal line-breaks. [...] the rules of the
> > language, script and hyphenation constraints in effect must permit a line-break
> > between [two areas].”
> > http://www.w3.org/TR/xsl11/#area-linebuild
> > It /might/ be acceptable to relax the line-breaking algorithm somehow when the
> > 'script' property is set to 'none', but frankly I'm not too keen on
> > implementing a special treatment just to cope with pathological cases. The code
> > is already complicated enough.
> 
> Agreed that the code is complex, but I don't believe we are dealing with
> extreme or pathological cases here. Most users run into the problem of content
> overflowing a table cell at some point and this is shown by the number of
> questions on the subject that occuir on fop-user. Far too many for my liking. 

Well, in most cases this was because they were using the keep-together shorthand, that from version 0.95 on started to be applied to lines as well. Usually they were happy with the result once they switched to keep-together.within-column.


> As already indicated by Pascal there are some inconsistencies in the spec in
> this area.

I guess the best way to have a definitive answer is to ask for clarification on xsl-editors@


> > > A legitimate option, but not always as easily done as it said. To get the
> > > effect of real dynamic content-wrapping (fit as many characters on the line as
> > > you can), you would force the user to insert a ZWSP in between every two
> > > characters (either that or they should make a choice of every so-many
> > > characters).
> > Exactly. And I bet it's less complicated to implement some XSLT function or
> > pre-processing step to do that than a dedicated extension in FOP's layout
> > engine.
> > > > But I don't think this is FOP's responsibility to do that. See also the following FAQ entry:
> > > > http://xmlgraphics.apache.org/fop/faq.html#cells-overflow
> > > 
> > > Maybe not, but it would mean a big relief for many users, I think, if FOP would
> > > take this responsibility, even if it is not mandated...
> > Yes, but if we reason like this FOP would soon become like those Swiss knives
> > with ridiculous numbers of blades ;-) (Note that I have nothing against Swiss
> > knives, I've been owning one myself for years and it serves me very well! But
> > it has only 6 functions.)
> > More seriously, you could take it the other way around, and find users who
> > wouldn't be happy at all to see FOP suddenly break their texts at arbitrary
> > places, and would rather be warned when such situations are occuring, so that
> > they can re-work their contents.
> 
> Well I agree that FOP shouldn't agree to a code change for every possible
> use-case, but we should try to adjust the code to assist the most common use
> cases. I do believe that text with few break possibilities in narrow table
> columns is a common use case.
> 
> The other thing to bear in mind is that no one is saying this feature has to be
> implemented right now, but I don't think it should be dismissed either.

Sure ;-) I guess something could be done when the 'script' property is implemented, but I keep thinking that the issue is better solved upstream.


> Thanks,
> 
> Chris

Vincent
Comment 21 stefan.hundhammer 2010-10-13 06:20:39 UTC
The inconvenience of overflowing lines in table-cells when there is no break opportunity instead of generally breaking/wrapping a word to next line (as done by version 0.20.5 !) is the reason why we don't use version 0.95 and 1.0

Overwriting information in the next column is so critical in our eyes, that we will never use a tool doing so.
Comment 22 Sergey Vladimirov 2011-07-27 15:50:16 UTC
Andreas,

Is there an updated patch for 1.0 version? I would like to use it, if possible.

Sergey
Comment 23 Andreas L. Delmelle 2011-07-29 16:22:39 UTC
(In reply to comment #22)
> Is there an updated patch for 1.0 version? I would like to use it, if possible.

Unfortunately not, and I do not immediately have any time available to dedicate to porting it. Not that it would take _that_ much time, but still it's not a matter of minutes...

That said, I do recall one peculiar effect that I think is not yet mentioned in this report: 
if you would have a mixture of words that do fit within the line with words that don't, then the wrapping effect would only be applied to the latter. That is, for the remainder you will get usual line-breaks, only those words that do not fit on the line in their entirety would ultimately get in-word breaks. That might lead to unexpected output, like:

short short ver
yveryverylong
short short
short

where one might expect something like (true wrapping):

short short ver
yveryverylong s
hort short shor
t
Comment 24 Glenn Adams 2012-04-07 01:45:17 UTC
resetting P2 open bugs to P3 pending further review
Comment 25 Glenn Adams 2012-04-11 03:22:33 UTC
increase priority for bugs with a patch
Comment 26 Glenn Adams 2012-04-11 06:18:05 UTC
change status from ASSIGNED to NEW for consistency