Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing

Summary: | Linebreaking does not work properly with Japanese punctuation | | |
---|---|---|---|
Product: | Writer | Reporter: | larsko <lars> |
Component: | formatting | Assignee: | stefan.baltzer |
Status: | CLOSED FIXED | QA Contact: | issues@sw <issues> |
Severity: | Trivial | | |
Priority: | P3 | CC: | curvirgo, issues, kamataki, karl.hong, masaya.k, ooo, tora3, y-catch |
Version: | OOo 3.0 Beta 2 | | |
Target Milestone: | --- | | |
Hardware: | PC | | |
OS: | All | | |
Issue Type: | DEFECT | Latest Confirmation in: | --- |
Developer Difficulty: | --- | | |
Description
larsko
2008-08-08 07:41:26 UTC
MRU->ES: Please evaluate; maybe this is already fixed in 3.0...

Still reproducible in DEV300m29.

Created attachment 55654: a sample file

fme->larsko: This is a feature called 'hanging punctuation'. If you have the Asian features enabled (Tools - Options - Language settings - Languages), the Format - Paragraph dialog has a tab page 'Asian typography' containing a setting 'allow hanging punctuation'.

fme: Thanks for the pointer. I've read up on hanging punctuation, but it only seems to include 。 and 、, not any of the other characters I've seen this occur with, which aren't really punctuation (cf. http://www.w3.org/TR/jlreq/). Can you point me to the reference used when deciding that those characters should be included in hanging punctuation?

fme->larsko: I think it's the characters listed in Tools - Options - Language Settings - Asian Layout - Not at start of line.

fme: So all those characters are allowed as hanging punctuation? That seems a bit much, since the w3 recommendation only specifies dot and comma. The Japanese Wikipedia article on this topic [1] also only specifies real punctuation. Should the small kana etc. really be part of hanging punctuation?
[1] http://ja.wikipedia.org/wiki/ぶら下げ組み

fme->larsko: Thanks for the pointer. It looks like making the hanging punctuation depend on the forbidden characters is not correct.

fme->khong: Please have a look and take over. It looks like Word does not allow all the not-at-start characters to be hanging punctuation either.

fme->tora: Any input from your side?

tora->fme: Thank you for giving me a chance to comment.
The current implementation of OOo has two lists of characters:
(1) Not at start of line
(2) Not at end of line
A theoretical implementation might have three lists of characters:
(1) Not at start of line
(2) Not at end of line
(3) Punctuation
The current implementation of OOo treats (1) as (3), while Word seems to use three lists.
In Word, both (1) and (2) can be tweaked, in a similar way to OOo, through either of the following:
- the tab Asian Typography of the menu Tools > Options
- the button Options in the tab Asian Typography of the menu Format > Paragraph
Word names (1) "Cannot start line:" and (2) "Cannot end line:". List (3) of Word could be specified somewhere or be hard-coded. I am not sure, but the Properties dialog of IME 2003 has a combo box listing the following combinations:
(a) 、。 u3001 and u3002 (widely used for several purposes)
(b) ,. uFF0C and uFF0E (sometimes used in a thesis, similar paper, book, ...)
(c) 、. u3001 and uFF0E (sometimes can be seen in a book or magazine)
(d) ,。 uFF0C and u3002 (sometimes can be seen in a book or magazine)
http://www.unicode.org/charts/PDF/U3000.pdf
http://www.unicode.org/charts/PDF/UFF00.pdf
Some of the punctuation characters described above and some special characters such as 「 and 」 (u300C and u300D) can be pushed within the margin. Word offers this feature while current OOo does not. The concept of this feature is illustrated in http://www.openoffice.org/nonav/issues/showattachment.cgi/18738/concept01.png, attached to issue 36313. http://www.openoffice.org/nonav/issues/showattachment.cgi/18786/Japanese_Justification_0.1.sxw, attached to issue 36408, also tries to describe the concept, but it has not been finished yet.
In sum, it would be better if OOo had a third list (3) containing solely the punctuation characters that can hang beyond the margin; the first list (1) should not be used for the hanging characters.
In addition to the third list (3), a new feature could also be incorporated. The feature would compress the total width of a line to fit the margin by slightly shrinking the space between the characters in the line, if the line ends with a hanging punctuation character or a combination of one or more hanging characters.

fme->tora: Thank you for your detailed analysis of this issue.
While I agree that having a third list would be the perfect solution, I'm tempted to ask whether we can start with a hard-coded list. It looks like this list is hard-coded in Word as well. Currently the LineBreakUserOptions that are passed to the break iterator cannot hold a third list, so introducing a third list means:
1. Changing the UI
2. Changing the API
3. Changing settings.xml in the ODF files
That looks like a lot of work for this not-too-much-requested issue. Another point I'd like to address: changing the line break algorithm means that existing documents might change their layout. Can we cope with this, or should we introduce some kind of (hidden) compatibility option so that only new documents make use of the changed line break behavior?

Forbidden rule characters are editable by end users, so they need to be passed from Writer to the break iterator. If the third list, for hanging punctuation, is hidden from end users, we can keep it in the locale data, which is known only inside the i18npool module, and no API or UI changes are required. If we don't need the compatibility option fme mentioned, I will implement the third list in the locale data in i18npool in the next release.

tora->khong: I agree with you.

khong->tora: I added a third list
<LineBreakHangingCharacters>!,.:;?、。!,.:;?</LineBreakHangingCharacters>
to the locale data for CJK languages. Please let me know if the list is sufficient.

Fixed in cws i18n45.

tora->khong: Thank you for your implementation. The list for Japanese might be either (a) or (b).
(a) <LineBreakHangingCharacters>、。,.</LineBreakHangingCharacters>
(b) <LineBreakHangingCharacters>、。</LineBreakHangingCharacters>
I am asking for comments on the mailing list of the Japanese community and will let you know.

Ready for QA.

tora->khong: Could you revise the locale data for Japanese?
<LineBreakHangingCharacters>、。,.</LineBreakHangingCharacters>
Notes:
- 、 u3001 IDEOGRAPHIC COMMA
- 。 u3002 IDEOGRAPHIC FULL STOP
- , uFF0C FULLWIDTH COMMA
- . uFF0E FULLWIDTH FULL STOP
References:
http://www.unicode.org/charts/PDF/U3000.pdf
http://www.unicode.org/charts/PDF/UFF00.pdf
http://www.unicode.org/Public/UNIDATA/NamesList.txt
Discussion:
http://www.freeml.com/openoffice/11243/latest (Japanese)

Added khong on c/c.

Stefan -> Karl: Please note tora's last question and comment. Thank you.

khong->tora: Yes, that is the list currently implemented in cws i18n45.

tora->khong: Thanks a lot.

SBA: Verified in CWS i18n45. I will attach a bugdoc that has all the example characters given in the initial description at line ends. To see the effect, add and remove "i" letters to shift the text of the respective line.

Created attachment 56602: Bugdoc with all given example characters at line ends

Correcting target to OOo 3.1. CWS i18n45 is already integrated. OK in OOO310_m3. Closing issue.
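
For readers who want to see the difference between the old and the fixed behavior, here is a minimal sketch in Python. It is not the i18npool break iterator: `break_lines`, the three sets and their contents are hypothetical names and sample data chosen for illustration, and every character is assumed to be exactly one cell wide. Passing the not-at-start list as the hanging list mimics what tora describes as OOo treating (1) as (3); the default uses the four-character Japanese list requested in this issue.

```python
# Hypothetical sample lists for illustration; not the actual OOo locale data.
NOT_AT_START = set("、。,.」ゃゅょっ")  # list (1): forbidden at line start
NOT_AT_END = set("「")                  # list (2): forbidden at line end
HANGING = set("、。,.")                # list (3): may hang beyond the margin


def break_lines(text, width, hanging=HANGING):
    """Greedy line breaking, counting every character as one cell wide."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width  # text[end] would start the next line
        next_ok = end + 1 == len(text) or text[end + 1] not in NOT_AT_START
        if text[end] in hanging and next_ok:
            # Let the punctuation hang one cell beyond the margin.
            end += 1
        else:
            # Move the break left until neither forbidden rule is violated.
            while end > start + 1 and (
                text[end] in NOT_AT_START or text[end - 1] in NOT_AT_END
            ):
                end -= 1
        lines.append(text[start:end])
        start = end
    if start < len(text):
        lines.append(text[start:])
    return lines


# Real punctuation hangs beyond a 5-cell margin.
print(break_lines("あいうえお。かきくけこ", 5))
# A small kana is not punctuation, so the break moves left instead.
print(break_lines("あいうえおゃかきく", 5))
# Old behavior: using list (1) as the hanging list lets the small kana hang.
print(break_lines("あいうえおゃかきく", 5, hanging=NOT_AT_START))
```

Keeping the hanging list as a separate parameter mirrors the design khong chose: the user-editable forbidden lists stay as they are, and the hanging list is supplied independently.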