Issue 87999

Summary: In "standard" data type csv import should never convert items with one dot to dates
Product: Calc Reporter: mfrasca <mfrasca>
Component: open-importAssignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: cno, discoleo, higuita, issues, jbf.faure, kyoshida, ooo, rb.henschel, tuharsky
Version: OOo 2.4.0Keywords: oooqa
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---

Description mfrasca 2008-04-07 10:47:33 UTC
I have a CSV file holding this line:
20-12-2007,22:44:00,007,24.08,0.0,24.85

the types are, respectively:
date, time, string, float, float, float

I cannot tell the csv importer that the last three columns are floats, so I
leave that as 'standard'.  the importer produces these types:
date, time, string, date, float, float

I am aware that this issue is a duplicate of 23024, but I really have a problem
accepting this as a "works for us!"...  it's true that the locale of most
European languages dictate that ',' is the separator for decimal, but it's only
in Germany where the locale dictates that the '.' separates dates!  when scalc
parses a '24.08' and sees it does not conform to 'decimal' (expecting comma) nor
to 'date' (unless in Germany, but then they would write '24.8.', with TWO
points, not just one), why don't you just make it choose for decimal?  I can
assure you that this is the least surprising behaviour for most Europeans!!!

thanks a lot for this otherwise beautiful piece of software, running also on my
linux-powerpc!
Comment 1 Regina Henschel 2008-04-07 11:00:54 UTC
When you are in the import dialog, click on the column head. This enables a
drop-down list "column type" above. In that list choose "US English". Then the
values will be imported as float in English notation. They are shown in your
local later on in the spreadsheet.
Using this list you can control the date format too, where the order of year,
month and day differs, and you can force a content to be handled as text.

It is really "worksforme".
Comment 2 mfrasca 2008-04-07 11:35:37 UTC
I understand that it is possible to select each of the columns containing
numbers like 17.03 and 6.12, inform the importer that the column contains "US
English" data, but the problem is:
rule for date: dd-mm-yyyy
rule for number: x,d (but on the same system M$Excel uses x.d)
input 17.03 does not respect either.
having this interpreted as a date is surprising...

I tried with this option:
Tools->Options->Language Settings->Languages->Decimal separator key
even if I leave it unchecked, it still does not work for me.

moreover, this behaviour is not only in the importer but also for typed data.

again: really no European would expect 1.13 to mean January 13th...
Comment 3 Regina Henschel 2008-04-07 11:55:05 UTC
What Operating system do you use?
What is the local setting of your OS?
What GUI-language has your OOo?
What is the local setting in OOo?
Is in "Tools>...>Decimal seperator key"  the option "Same as local setting" checked?

From where do you call that csv-import dialog? File>Open, Insert>Sheet,
clipboard or...?

Please attach such a file, which imports wrong.

I cannot reproduce your problem. All files import correct for me. Do you really
chose the correct column type in the import dialog? From your example it should
be "Date (DMY)", "Time", "Text", "US English", "US English", and "US English".

The way the float values are shown in the spreadsheet does not depend on the
import, but on the language you use for the cells in your spreadsheet.
Comment 4 mfrasca 2008-04-07 12:52:37 UTC
no, I do not instruct OpenOffice to import my data (see original post) as US
English and the whole point is that I would like not needing that.  the data is
not US English data and my colleagues here see this behaviour in OOo as an extra
non obvious step as compared to M$Excel.

at home I have a linux-powerpc machine, with locale set to 'en_DK.UTF-8' and I
don't experience this defect there.  here at office they gave me a windows
machine (XP), Dutch localized.  I installed the English version of OOo 2.4 (same
as at home).

I don't know where to look, in windows, for the answer to
"What is the local setting of your OS?"
but I assume it's Dutch (context menus in explorer are in Dutch).
the OOo locale is set to Default

Is in "Tools>...>Decimal seperator key"  the option "Same as local setting" checked?
setting or unsetting this option does not alter the behaviour "type/import 1.13
and get 13-01-08"

as I see it, my locale settings say that I should type "13-1" to mean 13th of
January of the current year.  the same setting also say that I should type 13,1
if I mean 13+1/10.  so actually the string 13.1 should either be left
uninterpreted as a string (does not respect any known format) or if the program
wants to be kind and helpful, it could be interpreted as a floating point
number, which I believe would be the case in most cases.  interpreting it as a
date is based on the German format for dates, where the Germans use the '.' to
make an ordinal number out of the preceding cardinal number, but the fact is that:
1) both day and month should be followed by that dot in the German notation for
dates
2) Dutch, Italian, French, Spanish, are all not German and we all don't
understand why 13.1 should mean January 13th at all...

I notice that the English (Eire) locale comes quite close to what I expect and I
think I can live with this.  but I insist that you reconsider the fallback to
parsing as date (using a sloppy rule from a different locale) as a default
behaviour for unparseable input (which would be parseable as float in the most
widespread locale)
Comment 5 mfrasca 2008-04-07 12:58:15 UTC
also check this:
http://www.cl.cam.ac.uk/~mgk25/iso-time.html

where it is stated that the German format dd.mm.yy(yy) has been obsoleted in May
1995.
Comment 6 Regina Henschel 2008-04-07 19:30:04 UTC
I have changed the issue type to an enhancement and changed the summary to
reflect the problem. I hope, that I got you right.
Comment 7 discoleo 2008-04-07 20:01:43 UTC
A similar discussion took place in December on the OOo Sc-mailing list:
http://sc.openoffice.org/servlets/ReadMsg?listName=dev&msgNo=2751

It was my experience that the automatic date-recognition mechanism will never
reach an acceptable level of reliability in true life. It annoys me over and
over again when OOo Calc tries to figure out what date it is, when there is
really *NO* date. And "outsmarting" the user is a sure way to earn you a lot of
anger.

It was therefore my belief, that OOo needs a mechanism to set the
date-interpretation algorithm (aka which numbers are dates):
 - e.g. 'dd.mm' should be interpreted as date
 - others might choose only 'dd/mm'
 - and still others only 'mm/dd'

Indeed, this would give complete control back to the users. Especially users in
multi-locale environments would benefit from this feature, but many others would
be grateful, too.
Comment 8 mfrasca 2008-04-08 15:17:18 UTC
well, this would be a clear enhancement _for us_, but if a user is using DIN
5008 data prior to the 1995 correction, he might have reasons to complain...  as
for Leonard, it's the "never" which worries me a bit.  

I would say "never if the locale does not say date:==dd.mm.yyyy", i.e.: "only if
the locale explicitly says so".

maybe "in case on non match with locale, prefer converting to number above
converting to date"

I agree with Leonard when he says that it would be nice if we could specify our
own date format(s) -overruling the locale- and that input data (imported or
typed) is recognized as date if and only if it matches and absolutely not otherwise.

if you offered this option, I would define these formats and be happy:
dd/mm
dd/mm/yyyy
yyyy-mm-dd
Comment 9 discoleo 2008-04-08 17:44:30 UTC
The automatic date-issue pops up over and over again, but these issues get
either closed as WORKSFORME, or for many other reasons, so I will keep a list on
this (still open) issue:
http://www.openoffice.org/issues/show_bug.cgi?id=13509
http://www.openoffice.org/issues/show_bug.cgi?id=13929
http://www.openoffice.org/issues/show_bug.cgi?id=20804
http://www.openoffice.org/issues/show_bug.cgi?id=22725
http://www.openoffice.org/issues/show_bug.cgi?id=23024
http://www.openoffice.org/issues/show_bug.cgi?id=23328
http://www.openoffice.org/issues/show_bug.cgi?id=23945
http://www.openoffice.org/issues/show_bug.cgi?id=25201
[...]
and many more will follow when I have more time
[...]
http://www.openoffice.org/issues/show_bug.cgi?id=88020

http://www.openoffice.org/issues/show_bug.cgi?id=1145
[closed only for Data-Pilot]
http://www.openoffice.org/issues/show_bug.cgi?id=7905
[touches on locale - but same underlying problem]


[Partial Matches]
http://www.openoffice.org/issues/show_bug.cgi?id=71997
http://www.openoffice.org/issues/show_bug.cgi?id=17146

In the meantime, I have re-opened the discussion on the mailing-lists using the
previous date-enhancement proposal:
see e.g.
http://ux.openoffice.org/servlets/ReadMsg?list=discuss&msgNo=1573
[for the UX-mailing list]

As I side-note, the enhancement I propose is called in computer-jargon: a data 
*input mask*. ;-)
Comment 10 discoleo 2008-04-15 11:39:23 UTC
As said, this issue pops up at least a dozen times on Bugzilla, but the real
number is probably more like over 100 times. I'll investigate further when I
have some spare time.

Some new additions (as always closed as WORKSFORME):
http://www.openoffice.org/issues/show_bug.cgi?id=88243
http://www.openoffice.org/issues/show_bug.cgi?id=88271

[...]
definitely more to come
Comment 11 mfrasca 2008-04-15 12:12:39 UTC
if you are affected by this issue, please vote for it!
Comment 12 discoleo 2008-04-23 19:51:03 UTC
One more member to the list of similar issues:

http://www.openoffice.org/issues/show_bug.cgi?id=77973
[I forgot about this one. It contains also a detailed discussion about automatic
date-conversions.]
Comment 13 mfrasca 2008-04-24 06:58:41 UTC
regina, would you agree marking this back as a 'defect' rather than
'enhancement'?  'defect', in the sense of removing/modifying a surprising
behaviour, not 'enhancement' in the sense of adding a feature.  friendly
regards.  Mario.
Comment 14 discoleo 2008-04-29 18:31:19 UTC
Issue 33723 (http://www.openoffice.org/issues/show_bug.cgi?id=33723) covers the
same problem: automatic number recognition (more specifically, the automatic
date conversion is the culprit).

That issue has gathered some 31 votes.
Comment 15 cno 2008-05-05 19:09:12 UTC
+ 2
Comment 16 ensonic 2009-04-23 06:59:41 UTC
I also had the problem:
http://www.openoffice.org/issues/show_bug.cgi?id=101286
Comment 17 sos 2009-04-23 10:03:51 UTC
+ 2
same problems for Belgium wher we hav to cope with 3 locals -:)
Comment 18 pointbreak 2009-06-04 06:25:41 UTC
Just to spite the bigots who think everyone should be forced to use their poorly
thought out features, I've submitted an issue.

Issue 102488 http://www.openoffice.org/issues/show_bug.cgi?id=102488
Comment 19 kyoshida 2009-06-04 17:23:05 UTC
*** Issue 102488 has been marked as a duplicate of this issue. ***
Comment 20 kyoshida 2009-06-04 17:25:22 UTC
BTW, people, try not to swear ok?  We are all humans.  No one likes to be called
names.
Comment 21 mfrasca 2009-06-04 20:29:52 UTC
no-one likes to be called names, that's clear and should be avoided...  I assume
on the other hand that most of us reporting bugs here are working as advocates
of openoffice.org, each of us inside of our respective companies, and we feel
the duty to defend something we sometimes don't like and (as in this case) don't
understand.  

please note that issues like this one have been opened and quickly closed as a
"works for me" since April 2003 (check
http://www.openoffice.org/issues/show_bug.cgi?id=13929), this is over 6 years
ago.  you maybe can understand some impatience on users' side.

any plan to schedule a solution to this for any specific future release?

thanks a lot for your attention,
Mario Frasca
Comment 22 kyoshida 2009-06-04 21:08:35 UTC
Sure.  I understand the impatience of the users.  On the other hand, there are
currently thousands of issues like this with very many frustrating users, and
there are only a few developers writing code.  So, there are also equal amount
(if not greater) of frustrations coming from the developers side as well.  All
we can do is fix the issues one at a time, but there are still 999 more issues
to go.  It's a very painfully slow process, and sometimes it feels endless.
Comment 23 bbouwens 2009-06-04 21:27:33 UTC
It is quite understandable that it can take quite a while before issues are
fixed as there are so many. Personally I sometimes find it rewarding to look for
some low hanging fruit, which brings down the number of issues down quickly,
even if it is intellectually utterly boring. In this case: I really think
someone who knows what he's doing can fix this within an hour: it's just
deleting support for a totally obsolete German-only format.
Oops, now I may be offending some senior Germans ....
Comment 24 mfrasca 2009-06-08 08:32:18 UTC
a possibly relevant link:
http://specs.openoffice.org/calc/ease-of-use/enhance_number_recognition.sxw
Comment 25 tuharsky 2009-09-04 11:06:54 UTC
The behaviour is illogical more than that. The way the data are parsed is
applied BY CELL! So that it ends up like a useless mess of (bogus) dates and
numbers.

CSV is usually used as simple data transfer format, not intended for any
formatting or so. We can quite reliably assume, that the format of any column,
excepting the first line, should be of same type.

So, if the program dosen't know for sure, how to eat the values, it should at
least keep up with the same resolution for the whole column (excepting the first
line). The most secure way to do is simply interpret them as text and let the
user decide later, how to deal with them (he can always select them and say
"they are all numbers").

That seems for me quite reasonable approach: "Hmm, there are some strange values
in the column C, that don't hold together with other values in that column.. I
will better mark them as text and let the user decide".

At least, there is no dataloss using this approach. This is much more favorable
than generating dates from everything that remotely resembles a date, thus
destroying original values.
Comment 26 kyoshida 2009-09-04 14:20:03 UTC
@tuharsky: perhaps this feature of mine

http://wiki.services.openoffice.org/wiki/Calc/Features/Numbers_import_for_plain_text_files

will allow you to do that once integrated.  The implementation is done, but due
to insufficient time to perform proper QA & other testing, chances of getting
that into 3.2 is becoming very very slim.
Comment 27 soerenb 2009-09-05 17:37:13 UTC
@kohei:

thank you!!! and thank God!!! something is happening along these lines finally.
Being a citizen of one of the concerned countries (Germany), I have been waiting
for this from the ages of StaOffice 5.2. I hope I get this right: downloading &
installing the file
/~kohei/builds/cws/koheicsvimport/OOo_3.2.0_090807_Win32Intel_install.exe gives
me an en-US installation pack to install in addition to my german one? But even
if it kills that: I am going to put up with quite a lot of trouble for this
feature, which, as other folks have stated here ..and there.., is the only
logical way to do this in a world of wildly different conventions as far as
special numbers are concerned. THANK YOU AGAIN!!!   
Comment 28 soerenb 2009-09-05 18:05:42 UTC
again, @kohei:
install failed, error: 2etup -that's what it read!- not found
Comment 29 Regina Henschel 2009-09-10 18:44:45 UTC
*** Issue 104913 has been marked as a duplicate of this issue. ***
Comment 30 Mechtilde 2009-09-17 19:22:20 UTC
*** Issue 105137 has been marked as a duplicate of this issue. ***
Comment 31 Rainer Bielefeld 2010-03-20 12:15:13 UTC
*** Issue 110263 has been marked as a duplicate of this issue. ***
Comment 32 gregorystec 2010-03-22 11:29:29 UTC
It's time to resolve this issue (defect). Thats completely crazy if the column
with row numbering or us-US style numbers with decimal point (imported to cal
from hardware devices - any type of meters) are converted to data type
automatically. Date from electronic meters usually are complex and huge data
file, and manually correcting this is unacceptable. I think that giving the
option to mark "do not convert to data" on the import form is the good way. The
better is to automatically convert us-US style numbers with period (.) to comma
(,) style.