Apache OpenOffice (AOO) Bugzilla – Issue 4925
Enhancement to increase consistency of CSV file data
Last modified: 2013-08-07 15:13:21 UTC
I do reports that are generated for our web site based on data in a .CSV file saved by Microsoft Works or Excel. I have a program I wrote that parses the data and generates HTML tables. The CSV data fields from Works are always quoted and appear in the same format as they appear on the screen: e.g: date fields as M/D are saved as "M/D" and a currency field displayed as $910.00 is saved as "$910.00". Not that I think MS Works should be considered the defining standard, but what happens when OOo saves a CSV file causes data to become ambiguous. e.g.: a date field as 2/25 is saved unquoted as 02/25/2002 (not a terrible tragedy) and a currency value, $910.00 is saved as simply 910 (more serious; it has become a simple number/integer.) Is there a specification for how data in CSV files should be handled? I'd be happy to change my parser to do this then. But, it would be real nifty if money data and any specifically formatted data were saved the way they are displayed on screen. Thanks.
Created attachment 1667 [details] Converted from same .XLS spreadsheet: 2002checking.csv by Works. 2002checking2.csv by OOo.
Hi Falko, 14U Regards, Peter
*** Issue 15566 has been marked as a duplicate of this issue. ***
Sorry I can help you with this issue since you did not provide the initial Excel file. It all depends on how the formatting went there.
The issue isn't about converting from a .XLS to .CSV. The issue is about how data formatted in a spreadsheet (any spreadsheet) is represented by OOo in .CSV data. The author of issue 15566 actually explained it better. Formatted Date and Money values are not written to .CSV files the way they are displayed on screen. The Money representation is more serious for my purposes. The author of #15566 has a bigger nit to pick with date representation. I will attach a .tgz containing a .sxc file using formatted dates and money values, and the resulting .csv file. (Note that this is not an OS specific issue. I've stopped using WinME and I'm using linux 100% now. The .csv that is in the [soon to be attached] archive was created with OOo 1.0.3 on linux.) My preference for .CSV file creation would be to have every column quoted (even empty ones) and all data represented exactly as OOo displayed it formatted on screen. Perhaps a switch to toggle between my preferred "waste space" option vs the existing "terse and optimal" version :-) ??
Created attachment 9318 [details] tgz archive. 2002checking.sxc with formatted data. 2002checkingOoo.csv is the data converted to text.
*** Issue 20502 has been marked as a duplicate of this issue. ***
*** Issue 20642 has been marked as a duplicate of this issue. ***
FT->ER: I am not a full expert on CSV but from what I know CSV in its 100%-compatibility mode cannot carry any additional formattings. Can you please state your opinion? Thx.
Falko, what is a CSV 100%-compatibility mode? A CSV file may contain anything you like, there is no restriction except that field values that contain the field separator or the quote character or a line feed have to be enquoted (and a quote character within the field value being doubled), but other field values may be enquoted as well. During import it is up to the application to interpret the strings as values or not (which currently isn't handled correctly by Calc, see issue 13331 and the URLs mentioned below). Exporting all values as they are displayed using their number formats may not always be desired, since it can easily create data that can't be interpreted as values by any application, resulting in textual strings only. An export option would be necessary where the user can specify the behavior. Btw: enquoting everything including empty columns is really a waste of space and is not necessary from the file format definition's view. See also: http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm http://www.python.org/peps/pep-0305.html
FT->ER: Thank you for the links. I fully agree with the statement of the first link's article. I do not think we should invest much more work into our (legacy) CSV filter but rather choose to "re-route" special wishes through an XSLT sheet. Such filters a way easier to write (by the community). For legacy reasons though we should stick to MS Excel's interpretation of CSV as our standard, too. So, if there are any enhancement to make we should consider this if not I'd like to close this issue.
Falko, Using XSLT for further textual export formats would probably be the way to go in future. However, having an "export field values as displayed, including number formats" option may be desirable inside the Calc CSV export, as this is what a competetive product always does, it doesn't know anything else. I suggest to add that option to the CSV export dialog, implementation of the functionality isn't too much effort.
I thought I did reassign this.. well, doing it now.
See also issue 3516 that discusses CSV export options too.
*** Issue 24983 has been marked as a duplicate of this issue. ***
*** Issue 25136 has been marked as a duplicate of this issue. ***
*** Issue 26949 has been marked as a duplicate of this issue. ***
*** Issue 23252 has been marked as a duplicate of this issue. ***
This is very annoying. Fields should be exported as displayed intstead of standardized format.
*** Issue 33363 has been marked as a duplicate of this issue. ***
Target set.
*** Issue 34958 has been marked as a duplicate of this issue. ***
As far as what to display in the CSV for a field, to my knowledge, there is only one reliable way to get what you want in a CSV file and that is to write in it just what is displayed on the screen. This always gives the user the ability to format the CSV file as they want. I would like to see OpenOffice.org at least offer the option of writing to CSV just what is seen on the screen. Loving OpenOffice.org as much as I do, it pains me when I have to use another product just for the purpose of writing out a CSV file. It's the only reason I have to use another product at times. Any other options are great and I'm sure helpful to those who need to use them but at least with the "write what is displayed" paradigm, every user can control what goes in the CSV file with some formatting. Without it, I believe there are always cases where someone can't get what they want in the CSV.
*** Issue 41743 has been marked as a duplicate of this issue. ***
*** Issue 41082 has been marked as a duplicate of this issue. ***
*** Issue 43750 has been marked as a duplicate of this issue. ***
*** Issue 46641 has been marked as a duplicate of this issue. ***
This is a VERY BIG Issue. Any developer trying to manipulate data and provide an export will most likely use CSV. Seems that the reluctance to correct this issue, is one more reason to stay with MS Excel. Keep in mind most users don't want to build an XLST or learn formula to format a column for exporting. Every Spreadsheet user I know needs this functionality. What is even more mind numbing is that this issue has not received any real attention in THREE YEARS. This blows my mind.
Abeelias, Yeah, There are over a dozen other issues marked as duplicates. I'm thinking that soon I'll become famous for having the oldest, most duplicated issue in OOo. Perhaps it would help if everyone who reported an issue marked as a duplicate of this one voted to have this issue addressed. The system only let me put two votes on this.
Why is this an enhancement? It's a bug! No one would say its an enhancement if the csv exported the formula behind a cell (ie =A1+B1). Its a bug, and it frustrating to see that ft, er, jpbonn, and thehl just keep disregarding duplicate new requests to get this FIXED. Guys, its been 3 years!!! Can this item get some attention?
I hope no one thinks I'm fixing this. I use Microsoft Office. I have been for the past three years ever since the "Bug Report" I submitted got switched to a "Feature Enhancement". It's interesting that they ask you to participate in open software development by submitting bugs only to sit on them for three years.
Abeelias, > Why is this an enhancement? It's a bug! No one would say its an enhancement if > the csv exported the formula behind a cell (ie =A1+B1). Its a bug, Just because something doesn't work like you think it should doesn't mean that it is a bug. It was a design decision years ago to export values such that you can read in the file again and get identical value data. So this works as designed and therefor is not a defect. Implementing another second behavior _is_ an enhancement, especially if it needs UI. > and it frustrating to see that ft, er, jpbonn, and thehl just keep > disregarding duplicate new requests to get this FIXED. It wasn't 'ft' but 'fst' who added the duplicate references, and that is exctly what he's supposed to do because he is a QA person. Don't know why you include me (er) in that list. I added my comment that I think we should simply have an option in the export dialog where the user can specify how to write the data. Then I reassigned the issue to 'ft' who is responsible for deciding about festure designs. > Guys, its been 3 years!!! Can this item get some attention? Seconded. But bear in mind that there are several hundreds or thousands of feature requests and requests for enhancement, which somehow have to be prioritized. Eike
I find it hard to believe that anyone can say this is not a priority. Every devloper I speak to says that this is a major issue with Ooo. In addition, Since when does this affect the UI? Maybe if you want all these crazy save options, but we don't need that. Use the format of the column like MS Excel does it. If the column is formated text, put quotes around it. If its a decimal with four decimal points, export it with four decimal points. If the date is formated YYYY-MM-DD export it that way. If the column is formated "General" don't put quotes around it. This is pretty basic stuff. Why hasn't anyone addressed this in 3 YEARS!?! Whatever I'm done bitching.
If you look at the bug I filed (Issue 23252) the problem I had went away if the locale English (Eire) was used instead of English (USA). OpenOffice is saving the CSV format differently depending on the locale. So the notion (expressed by er) that OO has some sort of consistent external CSV formatting is not true either.
*** Issue 47769 has been marked as a duplicate of this issue. ***
This is completely unreal. Another issue has been duplicated. Is there a way to escalate this? FT you have to delegate this to someone who cares. THIS IS A BUG!! Anyone out there that can help us?
FT and ER. by looking at the comments from user and also by many duplicet issue. I also feel that we need to consider this for OOo 2.0.1 if possible. Thanks
utomo99, I am thankful that you too feel that this is important. What bothers me is that FT has shown no interest in resolving this. Is there someone else that has the skillset to tackle this issue?
Mr. Rathke is it possible that you can get your other Sun Micro buddy Niklas Nebel(nn) to take a look in to this? Maybe someone can schedule this issue for 2.0.1?
I am fully aware of this issue, and what Eike wrote is right. We don't want to silently change the output format of CSV export, possibly breaking other people's use of it. So this really needs an extra setting in the CSV export filter options dialog.
nn, If you actually think anyone (who is logical) is currently using the CSV export the way OOo is currently handling it - you are fooling yourself and doing an injustice to the community which requires this functionality. Futhermore, per my previous post, should someone want to mimic the "defective" functionality that is available to them currently, they could just use the COLUMN FORMAT to get their desired results. To hold back this development, or worse, to over design its resolution is hurting OOo users everywhere. There is no need to further complicate the CSV export with "save options." We just need a plain and simple solution. (read my previous post on how MS Excel handles it.) PLEASE HELP US ITS BEEN THREE YEARS!
Got me thinking.. We can have a ".CSV (cell values)" and a new ".CSV (WYSIWYG)" when saving. This way you satisfy the handful of users that like it this way, and you please the rest of us. Is this a solution we can live with for 2.0.1???
I think that sounds fine. Thanks.
FT NN, is this acceptable?
Any update on this? It would be nice to get rid of Excel once and for all. Just can't seem to do it - solely due to this is bug. Pretty sad don't you all think?
FT & NN please think about abeelias comments. if OOo 2.01 is too short maybe 2.1 still OK. but we need to prepare it so it can go there and not disturbing the help and other documentations too. Thanks ------- Additional comments from abeelias Thu Apr 28 10:07:04 -0700 2005 ------- Got me thinking.. We can have a ".CSV (cell values)" and a new ".CSV (WYSIWYG)" when saving. This way you satisfy the handful of users that like it this way, and you please the rest of us. Is this a solution we can live with for 2.0.1???
*** Issue 52529 has been marked as a duplicate of this issue. ***
This is important to resolve because the facility it provides is necessary for some users (myself included) and also because of compatibility with Excel.
Created attachment 28679 [details] Proposed patch (yet to receive comments from mmeeks/jody)
So - the UI weenies would prolly like a screenshot of the new dialog to ensure it looks as ugly as before :-) The patch looks fine - although currently it seems 'bSaveAsRendered' is an exact synonym for 'bFixedWidth', which surely can't be right (?) - what if you don't want fixed width but do want it as it's rendered ? :-)
Its a bit confusing in the patch (i.e. in .diff file). bFixedWidth and bSaveAsRendered are similar only in getting the 'string' from the value cell but they are different otherwise. So it should work fine when you want .csv (',' separated), with rendered option and without fixed width option.
Created attachment 28704 [details] Screen Shot 1
Created attachment 28705 [details] Screen Shot 2
Created attachment 28706 [details] A small UI change to remove overlapping of 'fixed column width' and 'save as rendered' checkboxes
Please check if now, where space of the original checkbox text is of half width, it is sufficient for other localizations, for example Finnish (long strings in Latin script) or Chinese respectively Japanese (wide display strings due to wide glyphs). Furthermore, as this is an UI change, the User Experience team should take a look at the screenshots and give their ok.
*** Issue 53972 has been marked as a duplicate of this issue. ***
Created attachment 29216 [details] Updated patch - has the option in a new line.
Created attachment 29217 [details] Screenshot of the updated patch
*** Issue 54341 has been marked as a duplicate of this issue. ***
*** Issue 54899 has been marked as a duplicate of this issue. ***
*** Issue 54050 has been marked as a duplicate of this issue. ***
*** Issue 55031 has been marked as a duplicate of this issue. ***
There is still no set target milestone for this.. any comments when that "save as rendered" update is coming? Many thanks for this enhancement/bugfix/whatever, since this finally allows me to move 100% to OO (when the update is available..). It seems I was not the only one..
FT: Re-assigned to requirement default user
Look, I am sorry to be a pain about this, but this feature is absolutely essential for our office. UK businesses now have to move to BACSIP, an internet version of the old Bank Automated Credit System software. For that I need CSV files which show £3.00 (three UK pounds) as 3.00, not ust as 3. Without this facility, we can't prepare files for transmitting payments to our bank. So I am just about to reinstall Excel to our office computers because otherwise I won't be able to process payments. I'll happily give a bounty of 300 Euros for someone to solve this.
markellse, Please see the attachments and screenshots. I have already solved this (great! i can get that 300 euros bounty, i guess ;) ). Please wait until someone commits those to the build - only then will you be able to get the update. The update is available on linux (ooo-build) thou. btw, please read the previous comments before posting one. Muthu
OK, so the problem is solved. But it isn't, it is still there. Still happy to pay the bounty, but it seems to me that a proportion of it is for solving the problem and the remaining proportion is for incorporating the solution in the code. When the latter is done, the problem is solved. Still happy to pay the bounty once the code is out.
Hi Christian, Cc'ing you on this issue. Would the User Experience team please kindly take a look at the proposed solution, regarding the dialog layout and wording, and if satisfied also provide a German translation of "Save as ~Rendered", which is needed for the resource file. Please also provide the formal specification document if needed Thanks Eike
markellse, I, as well, made a commitment to Muthu, however its not in a release that I can test it. Its quite upseting. I'd really don't know why SUN takes so long to implement a rather simple fix to a very agrivating problem.
So - there is something a completely non-technical end-user with no experience can do to help accelerate this - actually it is often preferable to have precisely no experience and understanding [ so you're already way over qualified here ;-]. It is necessary to write a *specifiction* for every feature. IMHO it helps if each specification is contested - and considerable discussion is had over the minutia of each spec. eg. whether strings can be re-phrased; the correct positioning of ','s in various languages etc. This is important so that the specification is of the highest possible quality to ensure ultimate underlying product quality. Of course - none of this adds any functionality whatsoever - and can drastically delay anything actually getting done :-) but there is hope for a new specification process. Head over to http://specs.openoffice.org/ - if SourceCast is responding for you, grab the template & start typing - the more text the better - this feature is important: it needs at least 5000 words to get approved ;->
Hi Eike, it is nice to hear, that the patch can get integrated now. So I've set this issue to OO.o 2.0.3. Please take over. Thank you.
Accepted. Specification is on its way. @Michael: you're right that writing a spec should not be the task of the User Experience team. This process hurdle has definitely to be solved somehow, but I doubt that it helps to add fuel to the flames by writing sarcastic remarks in issues.. Anyway, thanks Eike
heh ;-) well - the new spec. process is somewhat better I guess; but IMHO filling out any of the specs I've involved with could trivially have been written by an end-user - prolly producing a better result too.
Hi, the spec can be found here http://specs.openoffice.org/calc/filters/csv/save-to-csv.odt It counts much less then 5000 words and it took about 1 hour to create it. Reviews included ;-) I guess that is acceptable. Regarding the spec: Until issue 58335 is not integrated into OO.o 2.02 click "Disable" in the Macro Warning. So, I that's all folks -- we'll see this enhancement in 2.03.
Michael (mmeeks), I hope that your comment wasn't directed at me. Since you asked me to go spin my wheels finding other annoying bugs on 8/1/05 and my immediate response on 8/2/05 here is an update for you: bug #4925 - no resolution bug #6732 - no resolution bug #1127 - no resolution So i might be inexperienced and dumb... maybe even stupid. But at least I'm not incompetent when it comes to releasing a quality product. Furthermore, the fact is this -- THREE YEARS!!!!!!! So have fun with your specs, red tape, and cronyism at SUN. I'm going to spend my time on a project that responds timely to the community. 'Say good night to the bad guy' Muthu, when this gets implemented I'll send you a 1000.00 (USD) for your talented efforts. (I have your contact info)
Hi abeelias, Thanks. How did you get my contact info!? BTW, mmeeks doesn't work for SUN and I work on OOo on weekends and during free time - you know, some little things that I can do for opensource. [ This patch(attachment: sc-csvfix-ui.diff) was available on the novell's ooo-build long time back ] Atleast, lets celebrate that this will be available in SUN's 2.0.3 build! - when is it planned to release? - I can't wait to see my patch! Muthu
In CWS dr46: sc/source/ui/dbgui/imoptdlg.cxx 1.11.106.1 sc/source/ui/dbgui/imoptdlg.hrc 1.5.106.1 sc/source/ui/dbgui/imoptdlg.src 1.32.106.1 sc/source/ui/dbgui/scuiimoptdlg.cxx 1.4.106.1 sc/source/ui/docshell/docsh.cxx 1.83.10.1 sc/source/ui/inc/imoptdlg.hxx 1.11.106.1 sc/source/ui/inc/scuiimoptdlg.hxx 1.4.106.1 Note that I tweaked the logic in ScDocShell::AsciiSave() since not only value cells can have number formats applied, but also string cells and formula cells of course. Furthermore, introduced checks to properly enquote the resulting string if necessary as required by the CSV file format.
Reassigning to QA. re-open issue and reassign to oc@openoffice.org
reassign to oc@openoffice.org
reset resolution to FIXED
verified in internal build cws_dr46
closed because fix available in OpenOffice.org Developer Snapshot Build src680_m167
OK chaps, this now works. Many thanks indeed. I offered 300 euros as a bounty. Am I right that Muthu did most of the work? If so, Muthu, please send me your contact details so that I can send you the bounty. Of if you want me to make the donation elsewhere, let me know. If anyone else contributed, please let me know and I will have to make a decision about distribution. Thanks again, Mark
Mark, I'll be sending muthusuba a 1000(USD). I hope this doesn't adversely affect your descision.
abeelias Yes, I'd notice that as well. Don't think it should affect what I do, so I'll send anyway. Do you have contact details?
Thanks abeelias & markellse. My e-mail contact: muthusuba_AT_gmail_DOT_com abeelias: [in reply to 'Say good night to the bad guy'] I hope you will still continue to work on openoffice.org to use, report and improve openoffice - this is a small request from my side. Thanks again.
I received 300euros from markellse - Thanks. I am yet to receive $1000 from abeelias.