97602 – New locale data for Asturian Language ast_ES

Summary: New locale data for Asturian Language ast_ES

Status:	CLOSED FIXED

Alias:	None

Product:	Internationalization
Classification:	Code
Component:	localedata
Version:	current
Hardware:	All All

Importance:	P3 Trivial
Target Milestone:	---
Assignee:	stefan.baltzer
QA Contact:	issues@l10n

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-12-27 14:51 UTC by astur
Modified:	2013-08-07 15:02 UTC
CC List:	2 users

See Also:
Issue Type:	ENHANCEMENT
Latest Confirmation in:	---
Developer Difficulty:	---

Attachments
new locale data for Asturian language ast_ES (13.13 KB, text/xml) 2008-12-27 14:55 UTC, astur
new version of 1230386981_ast_ES.xml (13.13 KB, text/plain) 2009-03-16 20:40 UTC, astur
corrected locale data as committed, for reference (15.19 KB, text/xml) 2009-06-26 21:59 UTC, erack

Description astur 2008-12-27 14:51:46 UTC

Hi
I've created new locale data for the Asturian language, ast_ES.

The file is 1230386981_ast_ES.xml

I would like to know what the next step is, and whether the file is
correct or contains any errors. Thanks.

astur@softastur.org

Comment 1 astur 2008-12-27 14:55:34 UTC

Created attachment 59020
new locale data for Asturian language ast_ES

Comment 2 ccheney 2009-03-14 06:59:34 UTC

confirm

Comment 3 ooo 2009-03-16 14:01:39 UTC

Sorry, seems I overlooked this earlier.

Thank you for your contribution.
This locale data file looks as if it was generated using the locale
data generator at http://www.it46.se/localegen/, so if you do not plan
further contributions to the OOo code base that would need a signed SCA
(see http://wiki.services.openoffice.org/wiki/Sun_Contributor_Agreement )
you may as well make use of the joint copyright agreement as outlined
in http://www.it46.se/localegen/copyright.php

@it46: For this I formally ask Alberto Escudero-Pascual to contribute
the attached locale data file to the OOo code base under the JCA/SCA he
signed.

Thanks
  Eike


I had a quick glance at the data and spotted the following:

<DecimalSeparator>,</DecimalSeparator>
but
<Time100SecSeparator>.</Time100SecSeparator>

Usually the Time100SecSeparator is the same as DecimalSeparator, not
necessarily, but ...
Note that in the format codes
MM:SS.00
and
[HH]:MM:SS.00
the dot is used, this would have to be corrected as well if the
Time100SecSeparator was changed.
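
To illustrate the point above: if the Time100SecSeparator were changed to
match the DecimalSeparator, both format codes would need the same
substitution. A minimal Python sketch, not part of any OOo tooling; the
function name switch_time100_separator is made up for this example.

```python
import re


def switch_time100_separator(format_code: str, old: str, new: str) -> str:
    """Replace the separator between seconds and their fraction digits."""
    # Only touch the separator that sits between "SS" and the run of zeros,
    # so the ":" between hours, minutes and seconds is left alone.
    pattern = re.compile(r"(SS)" + re.escape(old) + r"(0+)")
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), format_code)


codes = ["MM:SS.00", "[HH]:MM:SS.00"]
updated = [switch_time100_separator(code, ".", ",") for code in codes]
print(updated)  # prints ['MM:SS,00', '[HH]:MM:SS,00']
```

A code without a fraction part, such as HH:MM:SS, passes through unchanged.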

<QuotationStart>“</QuotationStart>
<QuotationEnd>”</QuotationEnd>
<DoubleQuotationStart>‘</DoubleQuotationStart>
<DoubleQuotationEnd>’</DoubleQuotationEnd>

Quotation denotes single quotes and DoubleQuotation, well, double quotes; here
it is the other way around.


<TimeAM>AM</TimeAM>
<TimePM>PM</TimePM>

The untranslated English terms, I guess that's missed?

Currency format codes contain ([CURRENCY]#.##0,00), note the parentheses without
minus sign, usually this is only correct for US-derived format codes and needs
to be replaced by
-[CURRENCY]#.##0,00
or
- [CURRENCY]#.##0,00
or similar, depending on where the minus sign is positioned.
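
As a rough illustration of the replacement described above, the following
Python sketch rewrites a parenthesized negative code into one with a leading
minus sign. It handles a single code string only, and the helper name
negative_with_minus is hypothetical; where the minus sign belongs is a
decision for the locale's maintainers.

```python
def negative_with_minus(code: str, space_after_minus: bool = False) -> str:
    """Turn "(X)" into "-X" (or "- X"); leave any other code unchanged."""
    if code.startswith("(") and code.endswith(")"):
        inner = code[1:-1]
        return ("- " if space_after_minus else "-") + inner
    return code


print(negative_with_minus("([CURRENCY]#.##0,00)"))
# prints -[CURRENCY]#.##0,00
print(negative_with_minus("([CURRENCY]#.##0,00)", space_after_minus=True))
# prints - [CURRENCY]#.##0,00
```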


<FollowPageWord>s/páx</FollowPageWord>
<FollowPageWord>s/páx</FollowPageWord>

Twice the same? In the Writer's index the first instance is used for "page x and
the following page" (singular) and the second instance for "page x and following
pages" (plural).

I think that's all.

Comment 4 maacub 2009-03-16 15:30:31 UTC

Hi!
These strings were imported from the GNOME locale. I think they are the same :O

We didn't understand whether we need to accept the copyright agreement. Can you
explain it again, please? :) Thanks.

In general the locale is the same as the Spanish locale:
Correct <DecimalSeparator>,</DecimalSeparator>
Correct <Time100SecSeparator>.</Time100SecSeparator>
Correct <QuotationStart>“</QuotationStart>
Correct <QuotationEnd>”</QuotationEnd>
Correct <DoubleQuotationStart>‘</DoubleQuotationStart>
Correct <DoubleQuotationEnd>’</DoubleQuotationEnd>
Correct <TimeAM>AM</TimeAM> (same as English)
Correct <TimePM>PM</TimePM> (same as English)

The untranslated English terms, I guess that's missed?
Yes ;)

Currency format codes contain ([CURRENCY]#.##0,00).
Our currency is the euro, so I think this is correct :O

Correct <FollowPageWord>s/páx</FollowPageWord>
Correct <FollowPageWord>s/páx</FollowPageWord>
Correct, because the singular is "páxina" and the plural is "páxines", so the
abbreviation is "páx" in both cases ;)

It would be important for us to see the translations in the OpenOffice
application, because it is complicated to judge with only the strings.

Thanks very much!

Comment 5 maacub 2009-03-16 17:16:54 UTC

I was thinking about this, and I think I was wrong. I think it must be:
<Time100SecSeparator>:</Time100SecSeparator>

<QuotationStart>'</QuotationStart>
<QuotationEnd>'</QuotationEnd>
<DoubleQuotationStart>"</DoubleQuotationStart>
<DoubleQuotationEnd>"</DoubleQuotationEnd>

<TimeAM>AM</TimeAM>
<TimePM>PM</TimePM>

-#.##0,00 [CURRENCY]

<FollowPageWord>s/páx</FollowPageWord>
<FollowPageWord>s/páxs</FollowPageWord>

Very sorry, but maybe "astur" can add the GNOME locale for more information.
astur, can you confirm this?
Thanks very much.

Comment 6 ooo 2009-03-16 18:12:51 UTC

> We didn't understand whether we need to accept the copyright agreement. Can you
> explain it again, please? :) Thanks.

For locale data generated at it46.se there are two possibilities. The
first is as outlined above: by using the generator you accepted that
Alberto contributes the data based on your work to other projects, such
as OOo and the CLDR. Alberto signed the JCA with OOo. You would not need
to sign the SCA.

The other possibility is that you sign the SCA with OOo yourself, which
would be needed anyway if you plan to contribute anything other than
locale data to the code repository, such as patches or localization
/ translation of UI elements.


> I was thinking about this, and I think I was wrong. I think it must be:
> <Time100SecSeparator>:</Time100SecSeparator>

No, that would be definitely wrong, as it would be identical to the
TimeSeparator used to separate hours, minutes and seconds.
Time100SecSeparator is used to separate seconds from 100th seconds or
milliseconds.

> <QuotationStart>'</QuotationStart>
> <QuotationEnd>'</QuotationEnd>
> <DoubleQuotationStart>"</DoubleQuotationStart>
> <DoubleQuotationEnd>"</DoubleQuotationEnd>

These are now the ASCII quote characters instead of the typographic
quote characters, which looks wrong.

> -#.##0,00 [CURRENCY]

So: no space between minus sign and amount, and the currency symbol
follows the amount, separated by a blank.

Comment 7 astur 2009-03-16 20:38:37 UTC

Hi, I've changed several things, following the suggestions of er. I hope they
are ok now. I attach the new version of the file 1230386981_ast_ES.xml

Comment 8 astur 2009-03-16 20:40:52 UTC

Created attachment 60961
new version of 1230386981_ast_ES.xml

Comment 9 astur 2009-03-16 20:48:23 UTC

I understand that the correct format for currency is:
-[CURRENCY]#.##0,00

but I'm not really sure, and also, I don't know where to change it inside the
.xml file.  Sorry.

er, can you change it for me?

Comment 10 it46 2009-03-17 00:10:16 UTC

it46->er long time no hear :)

it46->marquinos,astur
Do you think you can fax me a simple letter stating that you used localegen to
create the locale and allowing me to submit it on your behalf?

Alberto

Comment 11 ooo 2009-03-17 12:33:39 UTC

@it46:
> it46->er long time no hear :)
Yay, good that we're both still alive ;-)

Btw, that currency format section definitely needs some hint that the
default format usually is not the format a locale uses and people should
think about and select the correct format.


@astur:
Thanks, data looks better now.

> I understand that the correct format for currency is:
> -[CURRENCY]#.##0,00
> 
> but I'm not really sure, and also, I don't know where to change it inside the
> .xml file.  Sorry.

The format codes are generated from input at localegen's step 3, section
"I. Currency", "I6. Currency format for positive values" and "I7.
Currency format for negative values". If you chose other than the
default you get different formats.

> er, can you change it for me?

Sure, I can do this manually for OOo. However, if the generated locale
data file were used for other purposes, e.g. the CLDR repository,
the data would still be wrong.


Regarding the LC_INDEX data:
<IndexKey phonetic="false" default="true" unoid="alphanumeric">A-Z</IndexKey>
<UnicodeScript>0</UnicodeScript>
<UnicodeScript>1</UnicodeScript>
<UnicodeScript>2</UnicodeScript>
<UnicodeScript>3</UnicodeScript>
<UnicodeScript>4</UnicodeScript>

As the language uses accented characters, the IndexKey data may need
rework. Currently, as "A-Z" is specified, in a Writer's index section
the sort order would be A-Z and all accented characters after Z in order
of the Unicode collation. You may want to have accented characters
listed after their ASCII equivalent, for example the letter Á between
A and B: "A Á B-Z". The Spanish locale data defines IndexKey
"A-N Ñ O Ó P-Z". See also
http://www.it46.se/localegen/docs/Creating_locale_OOo_LocaleGEN_v2.1.php
3.7 LC_INDEX Section

Are you sure that Unicode scripts 2-4 are needed to write Asturian?
I doubt it. Compare with the doc's table at LC_INDEX section and
character charts available at http://www.unicode.org/charts/
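
To make the IndexKey notation above more concrete, here is a small Python
sketch that expands such a string into its ordered list of index keys. It
only understands single letters and X-Y ranges, and it is an illustration of
the notation rather than the parser OOo actually uses; the function name
expand_index_key is hypothetical.

```python
def expand_index_key(index_key: str) -> list[str]:
    """Expand an IndexKey string such as "A-N Ñ O Ó P-Z" into ordered keys."""
    keys = []
    for token in index_key.split():
        if len(token) == 3 and token[1] == "-":
            # A range like "A-N": every code point from A through N.
            start, end = ord(token[0]), ord(token[2])
            keys.extend(chr(cp) for cp in range(start, end + 1))
        else:
            # A single entry such as "Ñ" keeps the position it is written at.
            keys.append(token)
    return keys


plain = expand_index_key("A-Z")
spanish = expand_index_key("A-N Ñ O Ó P-Z")
print(len(plain), len(spanish))  # prints 26 28
print(spanish[13:17])            # prints ['N', 'Ñ', 'O', 'Ó']
```

With plain "A-Z" the accented letters have no key of their own, which is why
they end up after Z; listing them explicitly places them next to their base
letter.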

Comment 12 astur 2009-03-17 15:27:07 UTC

astur-->er ("the IndexKey data may need rework")

1) Well this is our alphabet:
A Á B C D E É F G H Ḥ I Í J K L Ḷ M N Ñ O Ó P Q R S T U Ú Ü V W X Y Z

a á b c d e é f g h ḥ i í j k l ḷ m n ñ o ó p q r s t u ú ü v w x y z

These are all the letters that we use. It's the same as Spanish, but we also
need Ḥ and Ḷ (lowercase ḥ and ḷ), in UTF-8. We never use ḷ on its own, only as
a digraph in the form ḷḷ or ḶḶ (examples: Ḷḷaciana, butieḷḷo).

Ḥ and ḥ are used as a single letter (examples: guaḥe, ḥoguera).

so I think the correct value is:
<IndexKey phonetic="false" default="true" unoid="alphanumeric">A-C {CH}
D-H Ḥ I-K L {LL} {ḶḶ} M-N Ñ O-Z</IndexKey>

2) It seems that Unicode scripts 2-4 aren't needed to write Asturian.

3) astur-->it46, alberto ("Do you think you can fax me a simple letter stating
that you used localegen to create the locale and allowing me to submit it on
your behalf?")

I'm confused, but I think this is important.

I, astur, used localegen to create the locale, and I allow you to submit it in
our name (astur and marquinos). If you need a document with my signature or
similar, please send it to me and I will reply to confirm.

Comment 13 astur 2009-03-21 15:28:51 UTC

Hi to all
Well, I'm really lost now, and I don't know where we stand.

I would like to know what the next step is, and when we could begin to translate
strings in OpenOffice. The Asturian team is ready to start, but we don't know
how to proceed.

Comment 14 erack 2009-03-23 21:58:51 UTC

@astur:
You can start to translate at any time, independent of inclusion of the
locale data, just use the 'ast' language code. Assuming you want to use
the Pootle service, please ask on the dev@l10n mailing list for how to
proceed. For other details see
http://wiki.services.openoffice.org/wiki/NLC:New_Translators_Start_here
http://wiki.services.openoffice.org/wiki/Category:Localization

However, to include the translated data we will need a signed SCA,
please see
http://wiki.services.openoffice.org/wiki/Sun_Contributor_Agreement

If there's anything left to clarify, please discuss on the dev@l10n
mailing list.


Regarding characters used and the IndexKey:
If your alphabet includes Ü and ü, these probably should also be
included in the IndexKey at their proper position, same for Á,É,Í,Ó,Ú
I guess this then would be

A Á B-C {CH} D-E É F-H Ḥ I Í J-K L {LL} {ḶḶ} M-N Ñ O Ó P-U Ú Ü V-Z

Btw, is the double entry  L {LL}  without dot really intended?


The characters with dot below Ḥ and Ḷ actually need the Unicode block
"Latin Extended Additional" (1E00-1EFF), which corresponds to our
UnicodeScript value 37.
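
The block membership mentioned above is easy to confirm. A short Python
sketch, using the block range 1E00-1EFF quoted in this comment; the helper
name in_latin_extended_additional is made up for this example.

```python
import unicodedata

# Range of the "Latin Extended Additional" block as quoted above.
LATIN_EXTENDED_ADDITIONAL = range(0x1E00, 0x1F00)


def in_latin_extended_additional(ch: str) -> bool:
    """Return True if the single character lies in 1E00-1EFF."""
    return ord(ch) in LATIN_EXTENDED_ADDITIONAL


# Precomposed forms of the four dot-below letters, written as escapes so
# the check does not depend on how this text was normalized.
for ch in "\u1e24\u1e25\u1e36\u1e37":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), in_latin_extended_additional(ch))

# Ñ (U+00D1) sits in a lower block, so the check is False for it.
print(in_latin_extended_additional("\u00d1"))
```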

Comment 15 erack 2009-03-23 22:10:12 UTC

Reassigning to spare time account.

Comment 16 astur 2009-03-23 23:03:22 UTC

Yes, you are right. It is better without the double entry L {LL} (I included it
because I was not sure).

That's ok:

A Á B-C {CH} D-E É F-H Ḥ I Í J-K L {ḶḶ} M-N Ñ O Ó P-U Ú Ü V-Z

Thank you very much

Comment 17 ccheney 2009-06-05 16:14:00 UTC

erack,

Pinging about this issue to see if you need anything further from the bug
submitter before committing the change.

Comment 18 erack 2009-06-26 21:23:34 UTC

Test compiling the data of the second attachment gave:

Warning: FormatCode formatindex="12" for currency uses parentheses for negative
amounts, which probably is not correct for locales not based on en_US.
Warning: FormatCode formatindex="13" for currency uses parentheses for negative
amounts, which probably is not correct for locales not based on en_US.
Warning: FormatCode formatindex="14" for currency uses parentheses for negative
amounts, which probably is not correct for locales not based on en_US.
Warning: FormatCode formatindex="15" for currency uses parentheses for negative
amounts, which probably is not correct for locales not based on en_US.
Warning: FormatCode formatindex="17" for currency uses parentheses for negative
amounts, which probably is not correct for locales not based on en_US.
Error: Time100SecSeparator not present in FormatCode formatindex="44".
Error: Time100SecSeparator+00 not present in FormatCode formatindex="44".
Error: Ordering of Time100SecSeparator and TimeSeparator not correct in
formatindex="44".
Warning: formatindex="4","44","45" are the only FormatCode elements checked for
separator usage, there may be others that have errors.
Error: Time100SecSeparator not present in FormatCode formatindex="45".
Error: Time100SecSeparator+00 not present in FormatCode formatindex="45".
Error: Ordering of Time100SecSeparator and TimeSeparator not correct in
formatindex="45".
Warning: formatindex="4","44","45" are the only FormatCode elements checked for
separator usage, there may be others that have errors.

I'll correct the errors on the fly, as I now have enough information on how it
should look.
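
For readers who want to spot the parentheses problem before test compiling, a
check along the following lines may help. This is only a sketch in Python and
not the checker that produced the output above; the sample XML, its element
layout (FormatElement with usage and formatindex attributes and a FormatCode
child) and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; a real locale data file has many more elements.
SAMPLE = """
<LC_FORMAT>
  <FormatElement usage="CURRENCY" formatindex="12">
    <FormatCode>[CURRENCY]#.##0,00;([CURRENCY]#.##0,00)</FormatCode>
  </FormatElement>
  <FormatElement usage="CURRENCY" formatindex="13">
    <FormatCode>#.##0,00 [CURRENCY];-#.##0,00 [CURRENCY]</FormatCode>
  </FormatElement>
</LC_FORMAT>
"""


def currency_codes_with_parentheses(xml_text: str) -> list[str]:
    """Return the formatindex of currency codes that wrap a part in ()."""
    root = ET.fromstring(xml_text.strip())
    flagged = []
    for element in root.iter("FormatElement"):
        if element.get("usage") != "CURRENCY":
            continue
        code = element.findtext("FormatCode", default="")
        # Inspect each ";"-separated subformat; "[...]" brackets do not count.
        parts = [part.strip() for part in code.split(";")]
        if any(p.startswith("(") and p.endswith(")") for p in parts):
            flagged.append(element.get("formatindex"))
    return flagged


print(currency_codes_with_parentheses(SAMPLE))  # prints ['12']
```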

Comment 19 erack 2009-06-26 21:57:21 UTC

In cws locales32:

revision 273444
i18npool/inc/i18npool/lang.h
i18npool/source/isolang/isolang.cxx
i18npool/source/localedata/data/ast_ES.xml
i18npool/source/localedata/data/localedata_euro.map
i18npool/source/localedata/data/makefile.mk
i18npool/source/localedata/localedata.cxx
solenv/inc/postset.mk
svtools/source/misc/langtab.src

Also corrected IndexKey and UnicodeScript values as indicated above.

Comment 20 erack 2009-06-26 21:59:02 UTC

Created attachment 63239
corrected locale data as committed, for reference

Comment 21 ooo 2009-09-04 17:45:25 UTC

Reassigning to QA for verification.

Comment 22 stefan.baltzer 2009-09-11 14:42:43 UTC

Verified in CWS locales32.

Comment 23 astur 2009-09-22 13:53:09 UTC

Hi, I would like to know if any further step is necessary to see the
results in OpenOffice.org.

As you can see, we have made progress with the translations:
http://www.sunvirtuallab.com:32300/ast/openoffice_org/ 

...and we wonder when we will be able to see these results, or whether there is
another requirement.

Thank you!!

Comment 24 stefan.baltzer 2009-09-29 11:51:25 UTC

OK in DEV300_m60. Closed.

Comment 25 stefan.baltzer 2009-09-29 12:31:37 UTC

Note: This issue is (was) about localedata. Integrated since build DEV300_m60,
thus visible in OOo 3.2 Beta and release candidates and final. See
download.openoffice.org for developer snapshots.

Translation goes via other channels.