Apache OpenOffice (AOO) Bugzilla – Issue 97602
New locale data for Asturian Language ast_ES
Last modified: 2013-08-07 15:02:13 UTC
Hi, I've created new locale data for the Asturian language, ast_ES. The file is 1230386981_ast_ES.xml. I would like to know what the next step is. Also, is the file correct, or does it contain any errors? Thanks. astur@softastur.org
Created attachment 59020 [details] new locale data for Asturian language ast_ES
confirm
Sorry, it seems I overlooked this earlier. Thank you for your contribution.
This locale data file looks as if it was generated using the locale data generator at http://www.it46.se/localegen/, so if you do not plan further contributions to the OOo code base that would need a signed SCA (see http://wiki.services.openoffice.org/wiki/Sun_Contributor_Agreement ) you may as well make use of the joint copyright agreement as outlined in http://www.it46.se/localegen/copyright.php
@it46: For this I formally ask Alberto Escudero-Pascual to contribute the attached locale data file to the OOo code base under the JCA/SCA he signed.
Thanks
Eike
I had a quick glance at the data and spotted the following:
<DecimalSeparator>,</DecimalSeparator>
but
<Time100SecSeparator>.</Time100SecSeparator>
Usually the Time100SecSeparator is the same as the DecimalSeparator, not necessarily, but ... Note that in the format codes MM:SS.00 and [HH]:MM:SS.00 the dot is used; this would have to be corrected as well if the Time100SecSeparator was changed.
<QuotationStart>“</QuotationStart>
<QuotationEnd>”</QuotationEnd>
<DoubleQuotationStart>‘</DoubleQuotationStart>
<DoubleQuotationEnd>’</DoubleQuotationEnd>
Quotation denotes single quotes and DoubleQuotation, well, double quotes; here it is vice versa.
<TimeAM>AM</TimeAM>
<TimePM>PM</TimePM>
These are the untranslated English terms, I guess that was missed?
Currency format codes contain ([CURRENCY]#.##0,00). Note the parentheses without a minus sign: usually this is only correct for US-derived format codes and needs to be replaced by -[CURRENCY]#.##0,00 or - [CURRENCY]#.##0,00 or similar, depending on where the minus sign is positioned.
<FollowPageWord>s/páx</FollowPageWord>
<FollowPageWord>s/páx</FollowPageWord>
Twice the same? In the Writer's index the first instance is used for "page x and the following page" (singular) and the second instance for "page x and following pages" (plural).
I think that's all.
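The review points about separators and quotation marks can be sketched as a small check. This is only an illustration in Python, not the tooling OOo uses: the `<fragment>` wrapper element, the `review` function, and the two character sets are assumptions made up for this example, and the element values are the ones quoted in the comment above.

```python
import xml.etree.ElementTree as ET

# Values as quoted in the review above. <fragment> is a hypothetical wrapper
# element, used only so the snippet can be parsed on its own.
FRAGMENT = """
<fragment>
  <DecimalSeparator>,</DecimalSeparator>
  <Time100SecSeparator>.</Time100SecSeparator>
  <QuotationStart>“</QuotationStart>
  <QuotationEnd>”</QuotationEnd>
  <DoubleQuotationStart>‘</DoubleQuotationStart>
  <DoubleQuotationEnd>’</DoubleQuotationEnd>
</fragment>
"""

# Assumed (incomplete) sets of single and double quote characters.
SINGLE_QUOTES = set("'\u2018\u2019\u201a")
DOUBLE_QUOTES = set('"\u201c\u201d\u201e')


def review(fragment_xml: str) -> list[str]:
    """Return human-readable notes for the issues raised in the review."""
    root = ET.fromstring(fragment_xml.strip())
    value = {child.tag: (child.text or "") for child in root}
    notes = []
    # Usually (not necessarily) both separators are the same character.
    if value["Time100SecSeparator"] != value["DecimalSeparator"]:
        notes.append("Time100SecSeparator differs from DecimalSeparator")
    # Quotation is meant for single quotes, DoubleQuotation for double quotes.
    if {value["QuotationStart"], value["QuotationEnd"]} & DOUBLE_QUOTES:
        notes.append("Quotation holds double quote characters")
    if {value["DoubleQuotationStart"], value["DoubleQuotationEnd"]} & SINGLE_QUOTES:
        notes.append("DoubleQuotation holds single quote characters")
    return notes


for note in review(FRAGMENT):
    print(note)
```

A differing separator is only a hint, since the two values are usually but not necessarily identical; the quote checks flag the swapped single and double quotation entries.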
Hi! These strings were imported from the Gnome locale, so I think they are the same :O
We didn't understand whether we need to accept the copyright agreement. Could you explain it again, please? :) Thanks.
In general, the locale is the same as the Spanish locale:
Correct: <DecimalSeparator>,</DecimalSeparator>
Correct: <Time100SecSeparator>.</Time100SecSeparator>
Correct: <QuotationStart>“</QuotationStart>
Correct: <QuotationEnd>”</QuotationEnd>
Correct: <DoubleQuotationStart>‘</DoubleQuotationStart>
Correct: <DoubleQuotationEnd>’</DoubleQuotationEnd>
Correct: <TimeAM>AM</TimeAM> (same as English)
Correct: <TimePM>PM</TimePM> (same as English)
The untranslated English terms, I guess that's missed? Yes ;)
Currency format codes contain ([CURRENCY]#.##0,00). Our currency is the Euro, so I think this is correct :O
Correct: <FollowPageWord>s/páx</FollowPageWord>
Correct: <FollowPageWord>s/páx</FollowPageWord>
These are correct because the singular is "páxina" and the plural is "páxines", so the abbreviation is "páx" in both cases ;)
It would be important for us to be able to see the translations in the OpenOffice application, because working with the strings alone is complicated.
Thanks very much!
I have been thinking about this, and I think I was wrong. I think it must be:
<Time100SecSeparator>:</Time100SecSeparator>
<QuotationStart>'</QuotationStart>
<QuotationEnd>'</QuotationEnd>
<DoubleQuotationStart>"</DoubleQuotationStart>
<DoubleQuotationEnd>"</DoubleQuotationEnd>
<TimeAM>AM</TimeAM>
<TimePM>PM</TimePM>
-#.##0,00 [CURRENCY]
<FollowPageWord>s/páx</FollowPageWord>
<FollowPageWord>s/páxs</FollowPageWord>
I'm very sorry, but maybe "astur" can add the Gnome locale for more information. astur, can you confirm this? Thanks very much.
> We didn't undertand if we need accept the copyright, can you say us again,
> please :) Thanks.
For locale data generated at it46.se there are two possibilities. The first, as outlined above: by using the generator you accepted that Alberto contributes the data based on your work to other projects, such as OOo and the CLDR. Alberto signed the JCA with OOo, so you would not need to sign the SCA. The other possibility is that you sign the SCA with OOo yourself, which would be needed anyway if you plan to contribute anything other than locale data to the code repository, such as patches or localization / translation of UI elements.
> I was thinking in this, I think I was wrong: I think must be:
> <Time100SecSeparator>:</Time100SecSeparator>
No, that would definitely be wrong, as it would be identical to the TimeSeparator used to separate hours, minutes and seconds. Time100SecSeparator is used to separate seconds from 100th seconds or milliseconds.
> <QuotationStart>'</QuotationStart>
> <QuotationEnd>'</QuotationEnd>
> <DoubleQuotationStart>"</DoubleQuotationStart>
> <DoubleQuotationEnd>"</DoubleQuotationEnd>
These are now the ASCII quote characters instead of the typographic quote characters, which looks wrong.
> -#.##0,00 [CURRENCY]
So: no space between minus sign and amount, and the currency symbol follows the amount, separated by a blank.
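To make the two points above concrete, here is a minimal Python sketch. It is not how OOo formats numbers; `format_amount` and `format_elapsed` are hypothetical helpers, and the comma used as the default 100th-seconds separator is an assumption based on the earlier remark that it usually equals the DecimalSeparator.

```python
def format_amount(value: float, symbol: str = "€") -> str:
    """Format a value following the pattern -#.##0,00 [CURRENCY]."""
    sign = "-" if value < 0 else ""
    # Python groups with "," and uses "." for decimals; swap the two characters.
    english = f"{abs(value):,.2f}"  # e.g. "1,234.50"
    swapped = english.translate(str.maketrans(",.", ".,"))
    # No space between minus sign and amount; a blank before the symbol.
    return f"{sign}{swapped} {symbol}"


def format_elapsed(seconds: float, time_sep: str = ":", hundredths_sep: str = ",") -> str:
    """Format a duration as MM<time_sep>SS<hundredths_sep>00."""
    total_hundredths = round(seconds * 100)
    minutes, rest = divmod(total_hundredths, 6000)
    secs, hundredths = divmod(rest, 100)
    return f"{minutes:02d}{time_sep}{secs:02d}{hundredths_sep}{hundredths:02d}"


print(format_amount(-1234.5))
print(format_elapsed(75.25))
# With ":" as the 100th-seconds separator the result looks like HH:MM:SS.
print(format_elapsed(75.25, hundredths_sep=":"))
```

The last call shows why ":" cannot serve as Time100SecSeparator: the output can no longer be told apart from a time written as hours, minutes and seconds.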
Hi, I've changed several things, following the suggestions of er. I hope they are ok now. I attach the new version of the file 1230386981_ast_ES.xml
Created attachment 60961 [details] new version of 1230386981_ast_ES.xml
I understand that the correct format for currency is: -[CURRENCY]#.##0,00 but I'm not really sure, and also, I don't know where to change it inside the .xml file. Sorry. er, can you change it for me?
it46->er
Long time no hear :)
it46->marquinos,astur
Do you think you can fax me a simple letter stating that you used localegen to create the locale and that you allow me to submit it on your behalf?
Alberto
@it46:
> it46->er long time no hear :)
Yay, good that we're both still alive ;-)
Btw, that currency format section definitely needs some hint that the default format usually is not the format a locale uses, and that people should think about and select the correct format.
@astur: Thanks, the data looks better now.
> I understand that the correct format for currency is:
> -[CURRENCY]#.##0,00
>
> but I'm not really sure, and also, I don't know where to change it inside the
> .xml file. Sorry.
The format codes are generated from the input at localegen's step 3, section "I. Currency", "I6. Currency format for positive values" and "I7. Currency format for negative values". If you choose something other than the default, you get different formats.
> er, can you change it for me?
Sure, I can do this manually for OOo. Just note that if the generated locale data file were used for other purposes, e.g. the CLDR repository, the data would still be wrong.
Regarding the LC_INDEX data:
<IndexKey phonetic="false" default="true" unoid="alphanumeric">A-Z</IndexKey>
<UnicodeScript>0</UnicodeScript>
<UnicodeScript>1</UnicodeScript>
<UnicodeScript>2</UnicodeScript>
<UnicodeScript>3</UnicodeScript>
<UnicodeScript>4</UnicodeScript>
As the language uses accented characters, the IndexKey data may need rework. Currently, as "A-Z" is specified, in a Writer's index section the sort order would be A-Z, with all accented characters after Z in the order of the Unicode collation. You may want to have accented characters listed after their ASCII equivalent, for example the letter Á between A and B: "A Á B-Z". The Spanish locale data defines IndexKey "A-N Ñ O Ó P-Z". See also http://www.it46.se/localegen/docs/Creating_locale_OOo_LocaleGEN_v2.1.php section 3.7, LC_INDEX Section.
Are you sure that Unicode scripts 2-4 are needed to write Asturian? I doubt it. Compare with the doc's table in the LC_INDEX section and the character charts available at http://www.unicode.org/charts/
astur-->er ("the IndexKey data may need rework")
1) Well, this is our alphabet:
A Á B C D E É F G H Ḥ I Í J K L Ḷ M N Ñ O Ó P Q R S T U Ú Ü V W X Y Z
a á b c d e é f g h ḥ i í j k l ḷ m n ñ o ó p q r s t u ú ü v w x y z
These are all the letters that we use. It's the same as Spanish, but we also need Ḥ and Ḷ (lowercase ḥ and ḷ) in UTF-8. We never use ḷ on its own, only as a digraph in the form ḷḷ or ḶḶ (examples: Ḷḷaciana, butieḷḷo). Ḥ and ḥ are used as single letters (examples: guaḥe, ḥoguera), so I think the correct key is:
<IndexKey phonetic="false" default="true" unoid="alphanumeric">A-C {CH} D-H Ḥ I-K L {LL} {ḶḶ} M-N Ñ O-Z</IndexKey>
2) It seems that Unicode scripts 2-4 aren't needed to write Asturian.
3) astur-->it46, alberto ("Do you think you can fax me a simple letter stating that you used localegen to create the locale and allows myself to submit on your behalf?")
I'm confused, but I think that's important. I, astur, used localegen to create the locale, and I allow you to submit it in our name (astur and marquinos). If you need a document with my signature or similar, please send it to me and I will reply to confirm.
Hi all,
Well, I'm really lost now, and I don't know where we stand. I would like to know what the next step is, and when we can begin to translate strings in OpenOffice. The Asturian team is ready to start, but we don't know how to proceed.
@astur: You can start to translate at any time, independent of the inclusion of the locale data; just use the 'ast' language code. Assuming you want to use the Pootle service, please ask on the dev@l10n mailing list how to proceed. For other details see
http://wiki.services.openoffice.org/wiki/NLC:New_Translators_Start_here
http://wiki.services.openoffice.org/wiki/Category:Localization
However, to include the translated data we will need a signed SCA, please see http://wiki.services.openoffice.org/wiki/Sun_Contributor_Agreement
If there's anything left to clarify, please discuss it on the dev@l10n mailing list.
Regarding the characters used and the IndexKey: if your alphabet includes Ü and ü, these probably should also be included in the IndexKey at their proper position, and the same goes for Á, É, Í, Ó, Ú. I guess this would then be
A Á B-C {CH} D-E É F-H Ḥ I Í J-K L {LL} {ḶḶ} M-N Ñ O Ó P-U Ú Ü V-Z
Btw, is the double entry L {LL} without dot really intended?
The characters with dot below, Ḥ and Ḷ, actually need the Unicode block "Latin Extended Additional" (1E00-1EFF), which corresponds to our UnicodeScript value 37.
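As an illustration of how such an IndexKey string reads, the following Python sketch expands the proposed key into individual index entries and checks which letters fall in the Latin Extended Additional block. It assumes that a range like "B-C" expands by code point and that braces mark a multi-letter key; `expand_index_key` and `in_latin_extended_additional` are hypothetical helpers, not OOo functions.

```python
import re
import unicodedata

PROPOSED_KEY = "A Á B-C {CH} D-E É F-H Ḥ I Í J-K L {LL} {ḶḶ} M-N Ñ O Ó P-U Ú Ü V-Z"


def expand_index_key(spec: str) -> list[str]:
    """Expand an IndexKey string into a list of index entries."""
    # Normalize to precomposed characters so that each letter is one code point.
    spec = unicodedata.normalize("NFC", spec)
    keys = []
    for token in spec.split():
        if token.startswith("{") and token.endswith("}"):
            keys.append(token[1:-1])  # multi-letter key such as {CH}
        elif re.fullmatch(r".-.", token):
            first, last = ord(token[0]), ord(token[2])
            keys.extend(chr(c) for c in range(first, last + 1))
        else:
            keys.append(token)
    return keys


def in_latin_extended_additional(ch: str) -> bool:
    """True if ch lies in the Unicode block U+1E00..U+1EFF."""
    return 0x1E00 <= ord(ch) <= 0x1EFF


keys = expand_index_key(PROPOSED_KEY)
print(keys)
print([k for k in keys if any(in_latin_extended_additional(c) for c in k)])
```

Only the entries containing Ḥ or Ḷ are reported by the second print, which is why the extra Unicode block is needed in addition to the basic Latin ones.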
Reassigning to spare time account.
Yes, you are right. It is better without the double entry L {LL} (I included it because I was not sure). That's ok:
A Á B-C {CH} D-E É F-H Ḥ I Í J-K L {ḶḶ} M-N Ñ O Ó P-U Ú Ü V-Z
Thank you very much
erack, Pinging about this issue to see if you need anything further from the bug submitter before committing the change.
Test compiling the data of the second attachment gave:
Warning: FormatCode formatindex="12" for currency uses parentheses for negative amounts, which probably is not correct for locales not based on en_US.
Warning: FormatCode formatindex="13" for currency uses parentheses for negative amounts, which probably is not correct for locales not based on en_US.
Warning: FormatCode formatindex="14" for currency uses parentheses for negative amounts, which probably is not correct for locales not based on en_US.
Warning: FormatCode formatindex="15" for currency uses parentheses for negative amounts, which probably is not correct for locales not based on en_US.
Warning: FormatCode formatindex="17" for currency uses parentheses for negative amounts, which probably is not correct for locales not based on en_US.
Error: Time100SecSeparator not present in FormatCode formatindex="44".
Error: Time100SecSeparator+00 not present in FormatCode formatindex="44".
Error: Ordering of Time100SecSeparator and TimeSeparator not correct in formatindex="44".
Warning: formatindex="4","44","45" are the only FormatCode elements checked for separator usage, there may be others that have errors.
Error: Time100SecSeparator not present in FormatCode formatindex="45".
Error: Time100SecSeparator+00 not present in FormatCode formatindex="45".
Error: Ordering of Time100SecSeparator and TimeSeparator not correct in formatindex="45".
Warning: formatindex="4","44","45" are the only FormatCode elements checked for separator usage, there may be others that have errors.
I'll correct the errors on the fly, as I now have enough information on what it should look like.
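The kind of checks behind these messages can be approximated as follows. This is a rough Python sketch and not the actual locale data compiler: `check_time_code` and `uses_parentheses_for_negatives` are hypothetical, the example currency code with a ";" separating the positive and negative parts is an assumed layout, and the comma passed as Time100SecSeparator is assumed to be the value that no longer matches the dot in the time format codes.

```python
def check_time_code(code: str, time_sep: str, hundredths_sep: str) -> list[str]:
    """Return problems found in a time format code such as MM:SS.00."""
    problems = []
    if hundredths_sep not in code:
        problems.append("Time100SecSeparator not present")
    if hundredths_sep + "00" not in code:
        problems.append("Time100SecSeparator+00 not present")
    # The 100th-seconds separator has to come after the last time separator.
    if code.rfind(hundredths_sep) < code.rfind(time_sep):
        problems.append("Ordering of Time100SecSeparator and TimeSeparator not correct")
    return problems


def uses_parentheses_for_negatives(currency_code: str) -> bool:
    """True if the negative part of a currency code is wrapped in parentheses."""
    # Assumption: the negative subformat follows the first ";" if there is one.
    negative = currency_code.split(";", 1)[-1]
    return "(" in negative and ")" in negative


# Format codes still use "." while the separator is assumed to be ",".
for code in ("MM:SS.00", "[HH]:MM:SS.00"):
    print(code, check_time_code(code, time_sep=":", hundredths_sep=","))

# Hypothetical full currency code built from the pattern quoted in this issue.
print(uses_parentheses_for_negatives("[CURRENCY]#.##0,00;([CURRENCY]#.##0,00)"))
```

Changing the codes to MM:SS,00 and [HH]:MM:SS,00 makes all three time checks pass under these assumptions.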
In cws locales32:
revision 273444
i18npool/inc/i18npool/lang.h
i18npool/source/isolang/isolang.cxx
i18npool/source/localedata/data/ast_ES.xml
i18npool/source/localedata/data/localedata_euro.map
i18npool/source/localedata/data/makefile.mk
i18npool/source/localedata/localedata.cxx
solenv/inc/postset.mk
svtools/source/misc/langtab.src
Corrected also IndexKey and UnicodeScript values as indicated above.
Created attachment 63239 [details] corrected locale data as committed, for reference
Reassigning to QA for verification.
Verified in CWS locales32.
Hi, I would like to know if any further step is necessary to see the results in OpenOffice.org. As you can see, we have made progress with the translations: http://www.sunvirtuallab.com:32300/ast/openoffice_org/ ...and we wonder when we will be able to see these results, or whether anything else is required. Thank you!!
OK in DEV300_m60. Closed.
Note: This issue is (was) about localedata. Integrated since build DEV300_m60, thus visible in OOo 3.2 Beta and release candidates and final. See download.openoffice.org for developer snapshots. Translation goes via other channels.