Apache OpenOffice (AOO) Bugzilla – Issue 31385
Quoted strings are not handled in a delimited text database.
Last modified: 2008-01-27 13:22:43 UTC
Using a tab delimited database, strings with beginning and ending quotes, which contain line ending characters terminate the record. The parser should ignore any such characters occuring within/between quotes, as the record end will occur outside of the quoted string. Example: the following quoted string should parse into one field/column. "This is a big dog. He likes his fleas." For reference, MSWord 2000, exhibits this very useful behavior.
Created attachment 16423 [details] This file contains a single tab delimited record including quoted strings.
mine
clu->patrickstanley: 1. may you add a step by step description 2. what settings do you have putted in (seperator etc.)? have you uses " for any seperator? in principal it works fine for me - i cannot see a bug
The attachment includes two tab-delimited records. One header and one data. When I open it, the headers are parsed properly, the remainder is parsed very strangely into 11 records. I had assumed that the cause was simply that the quotes were not matched before validating a line end character, as the attachedment includes multiple paragraphs between quotes, and the multiple records seemed to line up that way. But there certainly other parseing problems as record 11 is the record that does not have the data from a single column duplicated in every column. This appears to be because there are paragraph/line-ending marks between quotes in these cases. While it makes no difference here, an unquoted character, [`], is included as the last in each of the two records in the attachment as described above. The settings are for a tab delimited text database containing text headers, containing quoted text and us numeric seperators. The file extension is text and the character set is set for Western European. Please try the attachment.
Melissa Benua, 9/24/04: I did reproduce the erroneous import of the .csv file provided above. However, when I went into Excel 2003 to verify how the data was supposed to appear, it did create the spreadsheet differently, but it didn't seem any more correct. I did the following procedure in Excel 2003 to create a spreadsheet: 1) Open a new, blank spreadsheet 2) Data->Import External Data->Import Data 3) Select the file ReportData.csv 4) Import the file as Delimited, and type as Western European, and select next 5) User a tab delimiter, and set the text qualifier to ' instead of " and hit next 6) Hit finish and save file as .xls This will create the following .xls file that I have attached (ReportDataExcel.xls). To import the file into OOo, I did the following: 1) Open ReportData.csv in Calc 2) Select the Western European character set 3) Set as tab delimitated, with the text delimitator set to ' instead of " and hit okay 4) Save file as .sxc This created the following .sxc file that I have attached (ReportDataOOo.sxc). Neither one of these, and the .sxc file in particular, do not properly delimitate based on tabs. Instead, they also delimitate based on carriage returns, resulting in a strange and improper spreadsheet. My system is as follows: English Windows XP Pro SP1 (US) OOo 1.1.2 Microsoft Office 2003 Athlon XP 2700+ Processor 1 GB RAM
Created attachment 17917 [details] What Excel produces from the tab-delimited text file
Created attachment 17918 [details] What OOo produces from the tab-delimited text file
Please note that the Report data.txt cannot be expected to produce the proper results in either spreadsheet because it contains over 400 fields/columns. They are limited to a width of 255 fields/columns. (I use Excel 2000, and have not seen XP.)
I have seen that the last entry was in September and that this issue is OOo-1.1.2 related. Have you tried a newer version of OOo like 1.1.3, 1.1.4 or some 1.9.xx build? Does this problem occur there also?
There is no improvement in 1.1.4
But have you tried an 1.9.xx build? Does this problem occur there too? I have heard that the DB module was really improved there (although I have not tested it yet) ...
I believe I was able to reproduce mbenua's steps for this issue, but I'm not sure what I confirmed. The sample txt file is simply too big. It has a line in it 1,105 characters long. patrickstanley, I believe this is a valid issue, but I think it would go a long way to getting this solved if a file that was only as big as necessary to prove the issue was supplied. That means pretty small. Thanks.
can anybody acknowledge this issue in a current version? http://download.openoffice.org/680/index.html thx
no respond - no repro - seems to be fixed in between - do not hestitate to reopen if you have anything to add and/or still occure in current version thx
close
Yes, It still does not work in version 1.9.125. It appears that the headers all go in but it bails at the first carriage return in the data and starts a new record. All remains the same.
i can repro that - short description: 1. create a text database like: aaa bbb "lala lulu" abc expected output => "lalalulu" abc (in one row) effective output => (in two rows) "lala lulu" abc clu: works fine in spreadsheet with spreadsheet filter 'Text CSV' - should also work in database (like i agreed with oj)
change owner
The expected output as proposed in the previous post "expected output => "lalalulu" abc (in one row)" Implies that the <CRLF>, or <EOL> if you prefer, is filtered out. The basis of the issue is that what is between the quotes should not be used as codes, but it also should not be filtered. Therefore, I believe the output from the test case should be "expected output => "lala<CRLF>lulu" abc (in one row)"
*** Issue 53459 has been marked as a duplicate of this issue. ***
*** Issue 62957 has been marked as a duplicate of this issue. ***
It looks like this is a limitation in HSQLDB. I can't find neither specific documentation as to whether newlines are (or are not) allowed in a TEXT table, nor any bug report/feature request, and I'm not going spelunking in the HSQLDB source ;-), but : Q: http://sourceforge.net/forum/message.php?msg_id=2048917 A: http://sourceforge.net/forum/message.php?msg_id=2050639
Not sure if this is still the case with HSQLDB. I am attaching an embedded database file. Open this file, then open the SQL window. Copy and paste these commands into it. CREATE TEXT TABLE "DISCIPLINARY_ACTION_TXT"("ActionID" INTEGER NOT NULL PRIMARY KEY,"StudentID" INTEGER,"StaffID" INTEGER,"ActionType" VARCHAR(50),"ActionDate" DATE,"ActionComments" LONGVARCHAR); SET TABLE DISCIPLINARY_ACTION_TXT SOURCE "DISCIPLINARY_ACTION_TXT.CSV;all_quoted = true"; INSERT into DISCIPLINARY_ACTION_TXT ( SELECT "ActionID", "StudentID", "StaffID", "ActionType", "ActionDate", "ActionComments" FROM "DISCIPLINARY_ACTION" AS "DISCIPLINARY_ACTION" ); Close the database and open the file DISCIPLINARY_ACTION_TXT.CSV with any text editor. The second record includes an embedded lineend character.
Created attachment 38958 [details] test file for embedded cr/lf in text table export
The "normal" CSV handling when using the Text Driver from OOo Base doesn't use the hsql database. So it's a issue I have to fix ;-)
Fixed in cws dba24d
Please verify. Thanks.
verified in cws
Tested with 680m4, XP Altered reportData.txt file by adding CR after every pair of data items. Created TEXT type Base file - all data imports to table properly. Closing
Created attachment 51185 [details] Bug doc changed to be 2 columns of data