Apache OpenOffice (AOO) Bugzilla – Issue 107867
Plain CSV import erroneous because of incorrect row termination
Last modified: 2010-01-07 15:53:28 UTC
A commonly reported problem is a CSV import that fails with a "too many rows to import" error even though the number of rows is far short of the actual limit. I created a large CSV file: very simple, straight CSV, comma separated, every text field delimited with ", and with a "-quoted header line. EOL is CRLF; the result is the same if I use just LF. No strangeness, no special character encoding, old-fashioned upper-case plain ASCII. I tried various import character encodings, no effect. Excluding the header row, no effect. I have discovered a major clue about what is going on. I did a CSV import and it read maybe 5,000 out of 7,500 rows. I then exported the result as CSV to try to discover what it thought the CSV ought to look like. Looking with a binary-capable editor, I discovered it thought there was a massive number of **columns**: the exported data is fine, but appended to each row are many ,,,,,,,,,,,,,,,,,,,,,,,,,,,, I then also saw that it had adjusted many column widths on CSV import. As a hint, the imported CSV was 675k, mostly imported. Exporting that to a new CSV produced a 6.92M file, mostly commas. Gnumeric imports the original CSV without even showing an import dialog. Various other software imports it without problem. Even if it is bad CSV, it ought to be handled gracefully. If a developer wants a test file, please email me (I don't want the file public).
@tim_c: Please attach sample documents. You can also send me the document by personal mail if it's too big to be attached!
Created attachment 66794: zip of csv file
Test CSV file sent. 7281 rows including the header row, 13 columns (he says superstitiously). If this does not induce the problem, I expect I can produce variants, because it was programmatically created here.
Reproducible with the sample document and "Ooo 3.2.0 RC1 WIN XP DE-multilingual version German UI activated [OOo320m8 (Build 9472)]", and also with "2.4.1 Multilingual version English UI WIN XP: [680m17(Build9310)]"! I checked the sample document and I believe the message is caused by Issue 75199. When I scroll down in the open CSV dialog, I see a problem in row 1347, where column M seems to contain an incorrect line feed ('<cntrl>+<enter>'?, which repeats several times), so that we have an incorrect row termination here, and this might be only a "damaged document" problem. But: 1.1.4 imports the sample document without problems. This looks like Issue 834, but shouldn't that one have been fixed?
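A suspect row such as 1347 can also be located without scrolling through the import dialog, by feeding the file to a strict parser and noting where it gives up. The following is only a sketch using Python's standard csv module; the helper name first_bad_row is hypothetical, and Python's strict mode follows its own dialect rules, which need not match what the OOo importer does.

```python
import csv
import io

def first_bad_row(text):
    """Return (line_number, message) for the first row that a strict
    CSV parser rejects, or None if the whole text parses cleanly."""
    # newline="" keeps the original line endings so the csv module sees them.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        for _ in reader:
            pass
    except csv.Error as exc:
        # line_num counts physical lines read so far, starting at 1.
        return reader.line_num, str(exc)
    return None

sample = (
    '"H1","H2"\r\n'
    '"ok","CMS"VICE.DO.M"\r\n'   # stray quote inside a quoted field
    '"x","y"\r\n'
)
print(first_bad_row(sample))
```

For a real file, read it with open(path, newline="") and pass the contents in; the reported number is a physical line number, so it only matches the spreadsheet row number when no field contains a line break.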
It is not the line ends, but it looks like you have found a problem line, and this raises some CSV parser issues. Row 1347 shows a normal CR/LF pair. I previously tried a single-LF version of the file, same effect (it is actually being created by Lua f:write(blah, '\n')). What does exist in row 1347 is a quoted field containing text with periods and an embedded quote: ,"CMS"VICE.DO.M", so this spins off into the problem of defining what exactly CSV is, including if and how escapes are done. The original fixed-field-width data being translated into CSV contains both ' and " within text fields. I was unaware of the " within text fields, sorry. The solution for OO is likely to be firming up the design of the CSV parser.
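For reference, the most common escaping convention (the one described in RFC 4180) is to double any " that occurs inside a "-quoted field. The sketch below shows what the problem field would look like if written that way, using Python's standard csv module; write_row and read_row are hypothetical helper names, and this says nothing about how the file in this issue was actually generated.

```python
import csv
import io

def write_row(fields):
    """Serialize one row, quoting every field and doubling embedded quotes."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        quoting=csv.QUOTE_ALL,   # wrap every field in quotes
        doublequote=True,        # write an embedded " as ""
        lineterminator="\r\n",
    )
    writer.writerow(fields)
    return buf.getvalue()

def read_row(line):
    """Parse a single CSV line back into a list of fields."""
    return next(csv.reader(io.StringIO(line)))

row = ["A", 'CMS"VICE.DO.M', "B"]
line = write_row(row)
print(line, end="")          # "A","CMS""VICE.DO.M","B"
print(read_row(line) == row)
```

Written this way, the embedded quote survives a round trip and the field boundaries stay unambiguous for any parser that follows the same convention.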
Confirmed that the data is the cause by replacing each embedded " with ' inside the "-quoted CSV fields. The import then works correctly.
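A substitution of this kind can be sketched as below. This is a heuristic under a stated assumption, not a general CSV repair: a " is treated as a field delimiter only when it sits at the start or end of the line or directly next to a comma, and every other " is replaced with '. The function name replace_inner_quotes is hypothetical, the line is expected without its EOL, and text that legitimately contains a " next to a comma would be misread.

```python
def replace_inner_quotes(line, sep=",", quote='"', repl="'"):
    """Replace quote characters that are not adjacent to a field boundary.

    A quote counts as a delimiter if it opens a field (start of line or
    right after sep) or closes one (end of line or right before sep).
    """
    out = []
    last = len(line) - 1
    for i, ch in enumerate(line):
        if ch == quote:
            opens = i == 0 or line[i - 1] == sep
            closes = i == last or line[i + 1] == sep
            if not (opens or closes):
                ch = repl  # embedded quote: swap it out
        out.append(ch)
    return "".join(out)

print(replace_inner_quotes('X,"CMS"VICE.DO.M",Y'))
```

Lines that contain no embedded quotes pass through unchanged, so the function can be applied to every line of the file.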
Data contains unescaped " delimiters. *** This issue has been marked as a duplicate of 78926 ***
Closing dup.