Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||In "standard" data type csv import should never convert items with one dot to dates|
|Component:||open-import||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P3||CC:||cno, discoleo, higuita, issues, jbf.faure, kyoshida, ooo, rb.henschel, tuharsky|
|Issue Type:||DEFECT||Latest Confirmation in:||---|
Description mfrasca 2008-04-07 10:47:33 UTC
I have a CSV file holding this line: 20-12-2007,22:44:00,007,24.08,0.0,24.85 the types are, respectively: date, time, string, float, float, float I cannot tell the csv importer that the last three columns are floats, so I leave that as 'standard'. the importer produces these types: date, time, string, date, float, float I am aware that this issue is a duplicate of 23024, but I really have a problem accepting this as a "works for us!"... it's true that the locale of most European languages dictate that ',' is the separator for decimal, but it's only in Germany where the locale dictates that the '.' separates dates! when scalc parses a '24.08' and sees it does not conform to 'decimal' (expecting comma) nor to 'date' (unless in Germany, but then they would write '24.8.', with TWO points, not just one), why don't you just make it choose for decimal? I can assure you that this is the least surprising behaviour for most Europeans!!! thanks a lot for this otherwise beautiful piece of software, running also on my linux-powerpc!
Comment 1 Regina Henschel 2008-04-07 11:00:54 UTC
When you are in the import dialog, click on the column head. This enables a drop-down list "column type" above. In that list choose "US English". Then the values will be imported as float in English notation. They are shown in your local later on in the spreadsheet. Using this list you can control the date format too, where the order of year, month and day differs, and you can force a content to be handled as text. It is really "worksforme".
Comment 2 mfrasca 2008-04-07 11:35:37 UTC
I understand that it is possible to select each of the columns containing numbers like 17.03 and 6.12, inform the importer that the column contains "US English" data, but the problem is: rule for date: dd-mm-yyyy rule for number: x,d (but on the same system M$Excel uses x.d) input 17.03 does not respect either. having this interpreted as a date is surprising... I tried with this option: Tools->Options->Language Settings->Languages->Decimal separator key even if I leave it unchecked, it still does not work for me. moreover, this behaviour is not only in the importer but also for typed data. again: really no European would expect 1.13 to mean January 13th...
Comment 3 Regina Henschel 2008-04-07 11:55:05 UTC
What Operating system do you use? What is the local setting of your OS? What GUI-language has your OOo? What is the local setting in OOo? Is in "Tools>...>Decimal seperator key" the option "Same as local setting" checked? From where do you call that csv-import dialog? File>Open, Insert>Sheet, clipboard or...? Please attach such a file, which imports wrong. I cannot reproduce your problem. All files import correct for me. Do you really chose the correct column type in the import dialog? From your example it should be "Date (DMY)", "Time", "Text", "US English", "US English", and "US English". The way the float values are shown in the spreadsheet does not depend on the import, but on the language you use for the cells in your spreadsheet.
Comment 4 mfrasca 2008-04-07 12:52:37 UTC
no, I do not instruct OpenOffice to import my data (see original post) as US English and the whole point is that I would like not needing that. the data is not US English data and my colleagues here see this behaviour in OOo as an extra non obvious step as compared to M$Excel. at home I have a linux-powerpc machine, with locale set to 'en_DK.UTF-8' and I don't experience this defect there. here at office they gave me a windows machine (XP), Dutch localized. I installed the English version of OOo 2.4 (same as at home). I don't know where to look, in windows, for the answer to "What is the local setting of your OS?" but I assume it's Dutch (context menus in explorer are in Dutch). the OOo locale is set to Default Is in "Tools>...>Decimal seperator key" the option "Same as local setting" checked? setting or unsetting this option does not alter the behaviour "type/import 1.13 and get 13-01-08" as I see it, my locale settings say that I should type "13-1" to mean 13th of January of the current year. the same setting also say that I should type 13,1 if I mean 13+1/10. so actually the string 13.1 should either be left uninterpreted as a string (does not respect any known format) or if the program wants to be kind and helpful, it could be interpreted as a floating point number, which I believe would be the case in most cases. interpreting it as a date is based on the German format for dates, where the Germans use the '.' to make an ordinal number out of the preceding cardinal number, but the fact is that: 1) both day and month should be followed by that dot in the German notation for dates 2) Dutch, Italian, French, Spanish, are all not German and we all don't understand why 13.1 should mean January 13th at all... I notice that the English (Eire) locale comes quite close to what I expect and I think I can live with this. but I insist that you reconsider the fallback to parsing as date (using a sloppy rule from a different locale) as a default behaviour for unparseable input (which would be parseable as float in the most widespread locale)
Comment 5 mfrasca 2008-04-07 12:58:15 UTC
also check this: http://www.cl.cam.ac.uk/~mgk25/iso-time.html where it is stated that the German format dd.mm.yy(yy) has been obsoleted in May 1995.
Comment 6 Regina Henschel 2008-04-07 19:30:04 UTC
I have changed the issue type to an enhancement and changed the summary to reflect the problem. I hope, that I got you right.
Comment 7 discoleo 2008-04-07 20:01:43 UTC
A similar discussion took place in December on the OOo Sc-mailing list: http://sc.openoffice.org/servlets/ReadMsg?listName=dev&msgNo=2751 It was my experience that the automatic date-recognition mechanism will never reach an acceptable level of reliability in true life. It annoys me over and over again when OOo Calc tries to figure out what date it is, when there is really *NO* date. And "outsmarting" the user is a sure way to earn you a lot of anger. It was therefore my belief, that OOo needs a mechanism to set the date-interpretation algorithm (aka which numbers are dates): - e.g. 'dd.mm' should be interpreted as date - others might choose only 'dd/mm' - and still others only 'mm/dd' Indeed, this would give complete control back to the users. Especially users in multi-locale environments would benefit from this feature, but many others would be grateful, too.
Comment 8 mfrasca 2008-04-08 15:17:18 UTC
well, this would be a clear enhancement _for us_, but if a user is using DIN 5008 data prior to the 1995 correction, he might have reasons to complain... as for Leonard, it's the "never" which worries me a bit. I would say "never if the locale does not say date:==dd.mm.yyyy", i.e.: "only if the locale explicitly says so". maybe "in case on non match with locale, prefer converting to number above converting to date" I agree with Leonard when he says that it would be nice if we could specify our own date format(s) -overruling the locale- and that input data (imported or typed) is recognized as date if and only if it matches and absolutely not otherwise. if you offered this option, I would define these formats and be happy: dd/mm dd/mm/yyyy yyyy-mm-dd
Comment 9 discoleo 2008-04-08 17:44:30 UTC
The automatic date-issue pops up over and over again, but these issues get either closed as WORKSFORME, or for many other reasons, so I will keep a list on this (still open) issue: http://www.openoffice.org/issues/show_bug.cgi?id=13509 http://www.openoffice.org/issues/show_bug.cgi?id=13929 http://www.openoffice.org/issues/show_bug.cgi?id=20804 http://www.openoffice.org/issues/show_bug.cgi?id=22725 http://www.openoffice.org/issues/show_bug.cgi?id=23024 http://www.openoffice.org/issues/show_bug.cgi?id=23328 http://www.openoffice.org/issues/show_bug.cgi?id=23945 http://www.openoffice.org/issues/show_bug.cgi?id=25201 [...] and many more will follow when I have more time [...] http://www.openoffice.org/issues/show_bug.cgi?id=88020 http://www.openoffice.org/issues/show_bug.cgi?id=1145 [closed only for Data-Pilot] http://www.openoffice.org/issues/show_bug.cgi?id=7905 [touches on locale - but same underlying problem] [Partial Matches] http://www.openoffice.org/issues/show_bug.cgi?id=71997 http://www.openoffice.org/issues/show_bug.cgi?id=17146 In the meantime, I have re-opened the discussion on the mailing-lists using the previous date-enhancement proposal: see e.g. http://ux.openoffice.org/servlets/ReadMsg?list=discuss&msgNo=1573 [for the UX-mailing list] As I side-note, the enhancement I propose is called in computer-jargon: a data *input mask*. ;-)
Comment 10 discoleo 2008-04-15 11:39:23 UTC
As said, this issue pops up at least a dozen times on Bugzilla, but the real number is probably more like over 100 times. I'll investigate further when I have some spare time. Some new additions (as always closed as WORKSFORME): http://www.openoffice.org/issues/show_bug.cgi?id=88243 http://www.openoffice.org/issues/show_bug.cgi?id=88271 [...] definitely more to come
Comment 11 mfrasca 2008-04-15 12:12:39 UTC
if you are affected by this issue, please vote for it!
Comment 12 discoleo 2008-04-23 19:51:03 UTC
One more member to the list of similar issues: http://www.openoffice.org/issues/show_bug.cgi?id=77973 [I forgot about this one. It contains also a detailed discussion about automatic date-conversions.]
Comment 13 mfrasca 2008-04-24 06:58:41 UTC
regina, would you agree marking this back as a 'defect' rather than 'enhancement'? 'defect', in the sense of removing/modifying a surprising behaviour, not 'enhancement' in the sense of adding a feature. friendly regards. Mario.
Comment 14 discoleo 2008-04-29 18:31:19 UTC
Issue 33723 (http://www.openoffice.org/issues/show_bug.cgi?id=33723) covers the same problem: automatic number recognition (more specifically, the automatic date conversion is the culprit). That issue has gathered some 31 votes.
Comment 15 cno 2008-05-05 19:09:12 UTC
Comment 16 ensonic 2009-04-23 06:59:41 UTC
I also had the problem: http://www.openoffice.org/issues/show_bug.cgi?id=101286
Comment 17 sos 2009-04-23 10:03:51 UTC
+ 2 same problems for Belgium wher we hav to cope with 3 locals -:)
Comment 18 pointbreak 2009-06-04 06:25:41 UTC
Just to spite the bigots who think everyone should be forced to use their poorly thought out features, I've submitted an issue. Issue 102488 http://www.openoffice.org/issues/show_bug.cgi?id=102488
Comment 19 kyoshida 2009-06-04 17:23:05 UTC
*** Issue 102488 has been marked as a duplicate of this issue. ***
Comment 20 kyoshida 2009-06-04 17:25:22 UTC
BTW, people, try not to swear ok? We are all humans. No one likes to be called names.
Comment 21 mfrasca 2009-06-04 20:29:52 UTC
no-one likes to be called names, that's clear and should be avoided... I assume on the other hand that most of us reporting bugs here are working as advocates of openoffice.org, each of us inside of our respective companies, and we feel the duty to defend something we sometimes don't like and (as in this case) don't understand. please note that issues like this one have been opened and quickly closed as a "works for me" since April 2003 (check http://www.openoffice.org/issues/show_bug.cgi?id=13929), this is over 6 years ago. you maybe can understand some impatience on users' side. any plan to schedule a solution to this for any specific future release? thanks a lot for your attention, Mario Frasca
Comment 22 kyoshida 2009-06-04 21:08:35 UTC
Sure. I understand the impatience of the users. On the other hand, there are currently thousands of issues like this with very many frustrating users, and there are only a few developers writing code. So, there are also equal amount (if not greater) of frustrations coming from the developers side as well. All we can do is fix the issues one at a time, but there are still 999 more issues to go. It's a very painfully slow process, and sometimes it feels endless.
Comment 23 bbouwens 2009-06-04 21:27:33 UTC
It is quite understandable that it can take quite a while before issues are fixed as there are so many. Personally I sometimes find it rewarding to look for some low hanging fruit, which brings down the number of issues down quickly, even if it is intellectually utterly boring. In this case: I really think someone who knows what he's doing can fix this within an hour: it's just deleting support for a totally obsolete German-only format. Oops, now I may be offending some senior Germans ....
Comment 24 mfrasca 2009-06-08 08:32:18 UTC
a possibly relevant link: http://specs.openoffice.org/calc/ease-of-use/enhance_number_recognition.sxw
Comment 25 tuharsky 2009-09-04 11:06:54 UTC
The behaviour is illogical more than that. The way the data are parsed is applied BY CELL! So that it ends up like a useless mess of (bogus) dates and numbers. CSV is usually used as simple data transfer format, not intended for any formatting or so. We can quite reliably assume, that the format of any column, excepting the first line, should be of same type. So, if the program dosen't know for sure, how to eat the values, it should at least keep up with the same resolution for the whole column (excepting the first line). The most secure way to do is simply interpret them as text and let the user decide later, how to deal with them (he can always select them and say "they are all numbers"). That seems for me quite reasonable approach: "Hmm, there are some strange values in the column C, that don't hold together with other values in that column.. I will better mark them as text and let the user decide". At least, there is no dataloss using this approach. This is much more favorable than generating dates from everything that remotely resembles a date, thus destroying original values.
Comment 26 kyoshida 2009-09-04 14:20:03 UTC
@tuharsky: perhaps this feature of mine http://wiki.services.openoffice.org/wiki/Calc/Features/Numbers_import_for_plain_text_files will allow you to do that once integrated. The implementation is done, but due to insufficient time to perform proper QA & other testing, chances of getting that into 3.2 is becoming very very slim.
Comment 27 soerenb 2009-09-05 17:37:13 UTC
@kohei: thank you!!! and thank God!!! something is happening along these lines finally. Being a citizen of one of the concerned countries (Germany), I have been waiting for this from the ages of StaOffice 5.2. I hope I get this right: downloading & installing the file /~kohei/builds/cws/koheicsvimport/OOo_3.2.0_090807_Win32Intel_install.exe gives me an en-US installation pack to install in addition to my german one? But even if it kills that: I am going to put up with quite a lot of trouble for this feature, which, as other folks have stated here ..and there.., is the only logical way to do this in a world of wildly different conventions as far as special numbers are concerned. THANK YOU AGAIN!!!
Comment 28 soerenb 2009-09-05 18:05:42 UTC
again, @kohei: install failed, error: 2etup -that's what it read!- not found
Comment 29 Regina Henschel 2009-09-10 18:44:45 UTC
*** Issue 104913 has been marked as a duplicate of this issue. ***
Comment 30 Mechtilde 2009-09-17 19:22:20 UTC
*** Issue 105137 has been marked as a duplicate of this issue. ***
Comment 31 Rainer Bielefeld 2010-03-20 12:15:13 UTC
*** Issue 110263 has been marked as a duplicate of this issue. ***
Comment 32 gregorystec 2010-03-22 11:29:29 UTC
It's time to resolve this issue (defect). Thats completely crazy if the column with row numbering or us-US style numbers with decimal point (imported to cal from hardware devices - any type of meters) are converted to data type automatically. Date from electronic meters usually are complex and huge data file, and manually correcting this is unacceptable. I think that giving the option to mark "do not convert to data" on the import form is the good way. The better is to automatically convert us-US style numbers with period (.) to comma (,) style.