Summary: | [PATCH] wrap-option="wrap" doesn't work | ||
---|---|---|---|
Product: | Fop - Now in Jira | Reporter: | Wolfgang Flohr <wolfgang> |
Component: | Assignee: | fop-dev | |
Status: | NEW --- | ||
Severity: | normal | CC: | maximilian.aster, vlsergey |
Priority: | P2 | ||
Version: | trunk | ||
Target Milestone: | --- | ||
Hardware: | Other | ||
OS: | other | ||
Attachments: |
wrap-option testcase
Patch correcting the behavior (?) Updated testcase showing the difference between wrap and no-wrap Corrected testcase; avoid indent-inheritance... PDF result after applying the patch More extensive patch Updated PDF result Patch proposal Another test fo Result for the last fo |
Description
Wolfgang Flohr
2007-09-25 11:59:06 UTC
Created attachment 20878 [details]
wrap-option testcase
FWIW: I've been looking closer at this one, and it seems to be related to the support for Unicode-linebreaking. The problem is that this algorithm, implemented in TextLayoutManager.getNextKnuthElements(), does not find a feasible break-possibility for the word-fragment in question. By itself, this is not an error. However, there is no form of correction in case the total word-fragment IPD exceeds the LayoutContext's reference-IPD, and so the entire fragment is presented to the LineLayoutManager as a monolithic box. Created attachment 21910 [details]
Patch correcting the behavior (?)
First try: If keep-together is inactive, and wrap-option is "wrap", 'prohibited by Unicode' should not be considered binding.
This results in the effect that, if Unicode line-breaking does not find a feasible break for a part of text that exceeds the available line-width, the LineLayoutManager can still break anywhere it sees fit, thus creating a wrapping effect.
Warning:
* as the initial value of wrap-option is "wrap", this small patch currently breaks quite a few testcases.
* also (probably suboptimal) the behavior is not really restricted to the longer word-fragments, which could generate quite some overhead, IIC...
Created attachment 21911 [details]
Updated testcase showing the difference between wrap and no-wrap
This test-file adds two additional block-containers to the original to show the difference between the two values for wrap-option.
Created attachment 21912 [details]
Corrected testcase; avoid indent-inheritance...
Created attachment 21913 [details]
PDF result after applying the patch
(In reply to comment #3) > Warning: > * as the initial value of wrap-option is "wrap", this small patch currently > breaks quite a few testcases. > * also (probably suboptimal) the behavior is not really restricted to the > longer word-fragments, which could generate quite some overhead, IIC... > This also shows in the fact that the PDF contains suboptimal breaks in the first block. The words that do fit on one line should be kept together as much as possible. Created attachment 21917 [details]
More extensive patch
The updated patch takes a different approach: first let the default text-breaking loop do its work, check if the gathered word-fragment exceeds the available IPD, and if wrapping should be forced, process each character as a separate word-fragment to let the LineLayoutManager sort it out.
This effectively limits the effects only to those 'words' that would not fit in the line as a whole. For those words that do fit in the line, only one box is generated as before.
Created attachment 21918 [details]
Updated PDF result
The PDF result after applying the second patch shows the effects: the resulting layout will keep words together if they fit, and for the other words, the characters are distributed nicely over as few lines as possible.
Did some more testing with the last patch, and there are still some issues to sort out. Before the change, when a word would exceed the available line-width, the soft-hyphen would always be the last part of a word-fragment, preceded by other characters. After the change we get a separate word-fragment for the soft-hyphens, which is incorrect, and leads to trailing explicit soft-hyphens (last character in a block of text) not being suppressed. An interesting side-effect, is that after the change, we would have a nice feature. This will render all characters below each other (already works with the modifications in the patch): <fo:block-container inline-progression-dimension="0em"> <fo:block>This</fo:block> <fo:block space-before="1em">text</fo:block> <fo:block space-before="1em">will</fo:block> <fo:block space-before="1em">be</fo:block> <fo:block space-before="1em">wrapped</fo:block> </fo:block-container> TBC Created attachment 22938 [details]
Patch proposal
I created a patch working with the latest trunk version. It's similar to the last patch. But there were some further problems which are fixed, list-item-labels were always wrapped, and the algorithm for choosing the correct break position was not correct inside tables. I think this functionality is very important for a lot of users as there is even a faq entry.
Created attachment 22939 [details]
Another test fo
Created attachment 22940 [details]
Result for the last fo
(In reply to comment #11) > I created a patch working with the latest trunk version. It's similar to the > last patch. But there were some further problems which are fixed, > list-item-labels were always wrapped, and the algorithm for choosing the > correct break position was not correct inside tables. I think this > functionality is very important for a lot of users as there is even a faq > entry. > Good to see this being picked up. I'm in the process of reviewing the patch, and already have some questions/considerations: I'm not sure I like the additional reference to ListItemLabel in TextLayoutManager. The TextLM should deal with text (FOText), and if necessary, the ancestor ListItemContentLM should pass the necessary info (a flag in the LayoutContext?) down to prevent the label from being broken. Right now, a dev would have to look at the TextLM to find out why a list-item-label is never broken...? I'd rather see an accompanying change to the list-related LMs to account for this. I'll see if I can come up with a polished alternative, to show what I mean. As to the cause for the fact that the label is always broken: this is probably because the area gets its inline-progression-dimension from the content, so when the element-list is first constructed, the reference-ipd (line-width) will still be zero... I agree that not breaking would be the preferred behavior. If a user really needs this effect, he/she can always resort to a two-column table. The changes to LineLM: Are they necessary due to the changes to TextLM, or is that a separate issue? Just asking, since the javadoc for the first added private method seems to indicate that the original expression could return a wrong result regardless of the value of wrap-option (?) The same style issues as Chris mentioned in #42374 put aside, your patch has given me some ideas, which I'll start trying out real soon. Thanks for your input so far! Hi, I'm sorry to chime in so late but I don't think there is anything to do on FOP's side WRT this issue. If there is no opportunity to break inside the word then there's nothing FOP can do. The wrap-option property is meant to trigger the line-breaking mechanism in its usual accepted meaning; that is, break between words or at hyphenation points inside them, not at every letter if available space is abnormally low, or the word abnormally big. At least, that's my understanding of this property. In my opinion, this issue must be handled upstream. That is, zero-width space characters (U+200B) must be introduced at applicable places inside the word to provide FOP with more break opportunities. But I don't think this is FOP's responsibility to do that. See also the following FAQ entry: http://xmlgraphics.apache.org/fop/faq.html#cells-overflow Vincent (In reply to comment #14) > (In reply to comment #11) > > I created a patch working with the latest trunk version. It's similar to the > > last patch. But there were some further problems which are fixed, > > list-item-labels were always wrapped, and the algorithm for choosing the > > correct break position was not correct inside tables. I think this > > functionality is very important for a lot of users as there is even a faq > > entry. > > > > Good to see this being picked up. > > I'm in the process of reviewing the patch, and already have some > questions/considerations: > I'm not sure I like the additional reference to ListItemLabel in > TextLayoutManager. The TextLM should deal with text (FOText), and if necessary, > the ancestor ListItemContentLM should pass the necessary info (a flag in the > LayoutContext?) down to prevent the label from being broken. > Right now, a dev would have to look at the TextLM to find out why a > list-item-label is never broken...? I'd rather see an accompanying change to > the list-related LMs to account for this. I'll see if I can come up with a > polished alternative, to show what I mean. > As to the cause for the fact that the label is always broken: this is probably > because the area gets its inline-progression-dimension from the content, so > when the element-list is first constructed, the reference-ipd (line-width) will > still be zero... I agree that not breaking would be the preferred behavior. If > a user really needs this effect, he/she can always resort to a two-column > table. > > The changes to LineLM: Are they necessary due to the changes to TextLM, or is > that a separate issue? Just asking, since the javadoc for the first added > private method seems to indicate that the original expression could return a > wrong result regardless of the value of wrap-option (?) > > The same style issues as Chris mentioned in #42374 put aside, your patch has > given me some ideas, which I'll start trying out real soon. > > Thanks for your input so far! > (In reply to comment #15) > I'm sorry to chime in so late but I don't think there is anything to do on > FOP's side WRT this issue. On the one hand, I agree. I have mentioned this already in the past: wrap-option="wrap" does not include an /obligation/ to perform line-wrapping. > If there is no opportunity to break inside the word > then there's nothing FOP can do. Wrong. At least, slightly unfortunate choice of words: FOP /can/ definitely do /something/ in this case. > The wrap-option property is meant to trigger > the line-breaking mechanism in its usual accepted meaning; that is, break > between words or at hyphenation points inside them, not at every letter if > available space is abnormally low, or the word abnormally big. At least, that's > my understanding of this property. IIC, there's a subtle difference between line-WRAPPING and line-BREAKING. Line-wrapping being the more 'blind' of the two, if you will (simply wrap to fit the text into the available space). The Recommendation remains vague in this respect (and purposely so, I think). It does not prescribe /how/ the text should be wrapped or broken. We have decided to follow Unicode, but ultimately, we always have the last word. That is: if we choose to provide an additional fallback mechanism, there is no relevant specification that makes this wrong. > In my opinion, this issue must be handled upstream. That is, zero-width space > characters (U+200B) must be introduced at applicable places inside the word to > provide FOP with more break opportunities. A legitimate option, but not always as easily done as it said. To get the effect of real dynamic content-wrapping (fit as many characters on the line as you can), you would force the user to insert a ZWSP in between every two characters (either that or they should make a choice of every so-many characters). > But I don't think this is FOP's responsibility to do that. See also the following FAQ entry: > http://xmlgraphics.apache.org/fop/faq.html#cells-overflow Maybe not, but it would mean a big relief for many users, I think, if FOP would take this responsibility, even if it is not mandated... (In reply to comment #16) > (In reply to comment #15) > > I'm sorry to chime in so late but I don't think there is anything to do on > > FOP's side WRT this issue. > > On the one hand, I agree. I have mentioned this already in the past: > wrap-option="wrap" does not include an /obligation/ to perform line-wrapping. > > > If there is no opportunity to break inside the word > > then there's nothing FOP can do. > > Wrong. At least, slightly unfortunate choice of words: FOP /can/ definitely do > /something/ in this case. > > > The wrap-option property is meant to trigger > > the line-breaking mechanism in its usual accepted meaning; that is, break > > between words or at hyphenation points inside them, not at every letter if > > available space is abnormally low, or the word abnormally big. At least, that's > > my understanding of this property. > > IIC, there's a subtle difference between line-WRAPPING and line-BREAKING. > Line-wrapping being the more 'blind' of the two, if you will (simply wrap to > fit the text into the available space). The Recommendation remains vague in > this respect (and purposely so, I think). I don't think so. Section 4.7.2, “Line-building”, states that “The partitioning [must occur] at legal line-breaks. [...] the rules of the language, script and hyphenation constraints in effect must permit a line-break between [two areas].” http://www.w3.org/TR/xsl11/#area-linebuild It /might/ be acceptable to relax the line-breaking algorithm somehow when the 'script' property is set to 'none', but frankly I'm not too keen on implementing a special treatment just to cope with pathological cases. The code is already complicated enough. > It does not prescribe /how/ the text > should be wrapped or broken. We have decided to follow Unicode, but ultimately, > we always have the last word. That is: if we choose to provide an additional > fallback mechanism, there is no relevant specification that makes this wrong. > > > In my opinion, this issue must be handled upstream. That is, zero-width space > > characters (U+200B) must be introduced at applicable places inside the word to > > provide FOP with more break opportunities. > > A legitimate option, but not always as easily done as it said. To get the > effect of real dynamic content-wrapping (fit as many characters on the line as > you can), you would force the user to insert a ZWSP in between every two > characters (either that or they should make a choice of every so-many > characters). Exactly. And I bet it's less complicated to implement some XSLT function or pre-processing step to do that than a dedicated extension in FOP's layout engine. > > But I don't think this is FOP's responsibility to do that. See also the following FAQ entry: > > http://xmlgraphics.apache.org/fop/faq.html#cells-overflow > > Maybe not, but it would mean a big relief for many users, I think, if FOP would > take this responsibility, even if it is not mandated... Yes, but if we reason like this FOP would soon become like those Swiss knives with ridiculous numbers of blades ;-) (Note that I have nothing against Swiss knives, I've been owning one myself for years and it serves me very well! But it has only 6 functions.) More seriously, you could take it the other way around, and find users who wouldn't be happy at all to see FOP suddenly break their texts at arbitrary places, and would rather be warned when such situations are occuring, so that they can re-work their contents. Vincent (In reply to comment #17) wrap-option is cited 3 times in REC 1.1: [1] Section 1.2.1 Paging and Scrolling [2] Section 4.7.2 Line-building [3] Section 7.16.13 wrap-option IMHO, there is inconsistency in recommendation, especially between [2] and [3]: in [3], when wrap-option="wrap", REC says "line-breaking will occur if the line overflows the available block width" but in [2], there is no tool to do what is required in [3], as Vincent said. So, I wonder whether FOP should not enforce a break if there is no break opportunity other than between 2 normal chars, regarding the wrap-option. [1] http://www.w3.org/TR/xsl11/#d0e297 [2] http://www.w3.org/TR/xsl11/#area-linebuild [3] http://www.w3.org/TR/xsl11/#wrap-option Pascal Hi Vincent, (In reply to comment #17) > (In reply to comment #16) > > (In reply to comment #15) > > IIC, there's a subtle difference between line-WRAPPING and line-BREAKING. > > Line-wrapping being the more 'blind' of the two, if you will (simply wrap to > > fit the text into the available space). The Recommendation remains vague in > > this respect (and purposely so, I think). > I don't think so. Section 4.7.2, “Line-building”, states that “The > partitioning [must occur] at legal line-breaks. [...] the rules of the > language, script and hyphenation constraints in effect must permit a line-break > between [two areas].” > http://www.w3.org/TR/xsl11/#area-linebuild > It /might/ be acceptable to relax the line-breaking algorithm somehow when the > 'script' property is set to 'none', but frankly I'm not too keen on > implementing a special treatment just to cope with pathological cases. The code > is already complicated enough. Agreed that the code is complex, but I don't believe we are dealing with extreme or pathological cases here. Most users run into the problem of content overflowing a table cell at some point and this is shown by the number of questions on the subject that occuir on fop-user. Far too many for my liking. As already indicated by Pascal there are some inconsistencies in the spec in this area. > > A legitimate option, but not always as easily done as it said. To get the > > effect of real dynamic content-wrapping (fit as many characters on the line as > > you can), you would force the user to insert a ZWSP in between every two > > characters (either that or they should make a choice of every so-many > > characters). > Exactly. And I bet it's less complicated to implement some XSLT function or > pre-processing step to do that than a dedicated extension in FOP's layout > engine. > > > But I don't think this is FOP's responsibility to do that. See also the following FAQ entry: > > > http://xmlgraphics.apache.org/fop/faq.html#cells-overflow > > > > Maybe not, but it would mean a big relief for many users, I think, if FOP would > > take this responsibility, even if it is not mandated... > Yes, but if we reason like this FOP would soon become like those Swiss knives > with ridiculous numbers of blades ;-) (Note that I have nothing against Swiss > knives, I've been owning one myself for years and it serves me very well! But > it has only 6 functions.) > More seriously, you could take it the other way around, and find users who > wouldn't be happy at all to see FOP suddenly break their texts at arbitrary > places, and would rather be warned when such situations are occuring, so that > they can re-work their contents. Well I agree that FOP shouldn't agree to a code change for every possible use-case, but we should try to adjust the code to assist the most common use cases. I do believe that text with few break possibilities in narrow table columns is a common use case. The other thing to bear in mind is that no one is saying this feature has to be implemented right now, but I don't think it should be dismissed either. > Vincent Thanks, Chris Hi Chris, (In reply to comment #19) > Hi Vincent, > > (In reply to comment #17) > > (In reply to comment #16) > > > (In reply to comment #15) > > > IIC, there's a subtle difference between line-WRAPPING and line-BREAKING. > > > Line-wrapping being the more 'blind' of the two, if you will (simply wrap to > > > fit the text into the available space). The Recommendation remains vague in > > > this respect (and purposely so, I think). > > I don't think so. Section 4.7.2, “Line-building”, states that “The > > partitioning [must occur] at legal line-breaks. [...] the rules of the > > language, script and hyphenation constraints in effect must permit a line-break > > between [two areas].” > > http://www.w3.org/TR/xsl11/#area-linebuild > > It /might/ be acceptable to relax the line-breaking algorithm somehow when the > > 'script' property is set to 'none', but frankly I'm not too keen on > > implementing a special treatment just to cope with pathological cases. The code > > is already complicated enough. > > Agreed that the code is complex, but I don't believe we are dealing with > extreme or pathological cases here. Most users run into the problem of content > overflowing a table cell at some point and this is shown by the number of > questions on the subject that occuir on fop-user. Far too many for my liking. Well, in most cases this was because they were using the keep-together shorthand, that from version 0.95 on started to be applied to lines as well. Usually they were happy with the result once they switched to keep-together.within-column. > As already indicated by Pascal there are some inconsistencies in the spec in > this area. I guess the best way to have a definitive answer is to ask for clarification on xsl-editors@ > > > A legitimate option, but not always as easily done as it said. To get the > > > effect of real dynamic content-wrapping (fit as many characters on the line as > > > you can), you would force the user to insert a ZWSP in between every two > > > characters (either that or they should make a choice of every so-many > > > characters). > > Exactly. And I bet it's less complicated to implement some XSLT function or > > pre-processing step to do that than a dedicated extension in FOP's layout > > engine. > > > > But I don't think this is FOP's responsibility to do that. See also the following FAQ entry: > > > > http://xmlgraphics.apache.org/fop/faq.html#cells-overflow > > > > > > Maybe not, but it would mean a big relief for many users, I think, if FOP would > > > take this responsibility, even if it is not mandated... > > Yes, but if we reason like this FOP would soon become like those Swiss knives > > with ridiculous numbers of blades ;-) (Note that I have nothing against Swiss > > knives, I've been owning one myself for years and it serves me very well! But > > it has only 6 functions.) > > More seriously, you could take it the other way around, and find users who > > wouldn't be happy at all to see FOP suddenly break their texts at arbitrary > > places, and would rather be warned when such situations are occuring, so that > > they can re-work their contents. > > Well I agree that FOP shouldn't agree to a code change for every possible > use-case, but we should try to adjust the code to assist the most common use > cases. I do believe that text with few break possibilities in narrow table > columns is a common use case. > > The other thing to bear in mind is that no one is saying this feature has to be > implemented right now, but I don't think it should be dismissed either. Sure ;-) I guess something could be done when the 'script' property is implemented, but I keep thinking that the issue is better solved upstream. > Thanks, > > Chris Vincent The inconvenience of overflowing lines in table-cells when there is no break opportunity instead of generally breaking/wrapping a word to next line (as done by version 0.20.5 !) is the reason why we don't use version 0.95 and 1.0 Overwriting information in the next column is so critical in our eyes, that we will never use a tool doing so. Andreas, Is there an updated patch for 1.0 version? I would like to use it, if possible. Sergey (In reply to comment #22) > Is there an updated patch for 1.0 version? I would like to use it, if possible. Unfortunately not, and I do not immediately have any time available to dedicate to porting it. Not that it would take _that_ much time, but still it's not a matter of minutes... That said, I do recall one peculiar effect that I think is not yet mentioned in this report: if you would have a mixture of words that do fit within the line with words that don't, then the wrapping effect would only be applied to the latter. That is, for the remainder you will get usual line-breaks, only those words that do not fit on the line in their entirety would ultimately get in-word breaks. That might lead to unexpected output, like: short short ver yveryverylong short short short where one might expect something like (true wrapping): short short ver yveryverylong s hort short shor t resetting P2 open bugs to P3 pending further review increase priority for bugs with a patch change status from ASSIGNED to NEW for consistency |