Issue 31385 - Quoted strings are not handled in a delimited text database.
Summary: Quoted strings are not handled in a delimited text database.
Status: CLOSED FIXED
Alias: None
Product: Base
Classification: Application
Component: code (show other issues)
Version: OOo 1.1.2
Hardware: PC Windows 2000
: P3 Trivial with 4 votes (vote)
Target Milestone: OOo 2.4
Assignee: christoph.lukasiak
QA Contact: issues@dba
URL:
Keywords:
: 53459 62957 (view as issue list)
Depends on:
Blocks:
 
Reported: 2004-07-12 19:50 UTC by patrickstanley
Modified: 2008-01-27 13:22 UTC (History)
2 users (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
This file contains a single tab delimited record including quoted strings. (6.91 KB, text/txt)
2004-07-12 19:58 UTC, patrickstanley
no flags Details
What Excel produces from the tab-delimited text file (24.00 KB, application/vnd.ms-excel)
2004-09-25 03:54 UTC, mbenua
no flags Details
What OOo produces from the tab-delimited text file (9.22 KB, application/vnd.sun.xml.calc)
2004-09-25 03:56 UTC, mbenua
no flags Details
test file for embedded cr/lf in text table export (17.96 KB, application/vnd.sun.xml.base)
2006-09-05 02:07 UTC, drewjensen.inbox
no flags Details
Bug doc changed to be 2 columns of data (6.26 KB, text/plain)
2008-01-27 13:22 UTC, drewjensen.inbox
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description patrickstanley 2004-07-12 19:50:24 UTC
Using a tab delimited database, strings with beginning and ending quotes, which
contain line ending characters terminate the record.  The parser should ignore
any such characters occuring within/between quotes, as the record end will occur
outside of the quoted string.

Example: the following quoted string should parse into one field/column.

"This is a big dog.
He likes his fleas."

For reference, MSWord 2000, exhibits this very useful behavior.
Comment 1 patrickstanley 2004-07-12 19:58:10 UTC
Created attachment 16423 [details]
This file contains a single tab delimited record including quoted strings.
Comment 2 christoph.lukasiak 2004-09-14 09:56:00 UTC
mine
Comment 3 christoph.lukasiak 2004-09-14 10:03:21 UTC
clu->patrickstanley: 1. may you add a step by step description 2. what settings
do you have putted in (seperator etc.)? have you uses " for any seperator? 

in principal it works fine for me - i cannot see a bug
Comment 4 patrickstanley 2004-09-14 18:34:56 UTC
The attachment includes two tab-delimited records. One header and one data. 
When I open it, the headers are parsed properly, the remainder is parsed very
strangely into 11 records. I had assumed that the cause was simply that the
quotes were not matched before validating a line end character, as the
attachedment includes multiple paragraphs between quotes, and the multiple
records seemed to line up that way. But there certainly other parseing problems
as record 11 is the record that does not have the data from a single column
duplicated in every column. This appears to be because there are
paragraph/line-ending marks between quotes in these cases.
While it makes no difference here, an unquoted character, [`], is included as
the last in each of the two records in the attachment as described above.  
The settings are for a tab delimited text database containing text headers,
containing quoted text and us numeric seperators.  The file extension is text
and the character set is set for Western European.
Please try the attachment.
Comment 5 mbenua 2004-09-25 03:39:13 UTC
Melissa Benua, 9/24/04:

I did reproduce the erroneous import of the .csv file provided above. However,
when I went 

into Excel 2003 to verify how the data was supposed to appear, it did create the 

spreadsheet differently, but it didn't seem any more correct. I did the following 

procedure in Excel 2003 to create a spreadsheet:

1) Open a new, blank spreadsheet
2) Data->Import External Data->Import Data
3) Select the file ReportData.csv
4) Import the file as Delimited, and type as Western European, and select next
5) User a tab delimiter, and set the text qualifier to ' instead of " and hit next
6) Hit finish and save file as .xls

This will create the following .xls file that I have attached (ReportDataExcel.xls).

To import the file into OOo, I did  the following:

1) Open ReportData.csv in Calc
2) Select the Western European character set
3) Set as tab delimitated, with the text delimitator set to ' instead of " and
hit okay
4) Save file as .sxc

This created the following .sxc file that I have attached (ReportDataOOo.sxc).

Neither one of these, and the .sxc file in particular, do not properly
delimitate based on 

tabs. Instead, they also delimitate based on carriage returns, resulting in a
strange and 

improper spreadsheet.

My system is as follows:
English Windows XP Pro SP1 (US)
OOo 1.1.2
Microsoft Office 2003
Athlon XP 2700+ Processor
1 GB RAM
Comment 6 mbenua 2004-09-25 03:54:12 UTC
Created attachment 17917 [details]
What Excel produces from the tab-delimited text file
Comment 7 mbenua 2004-09-25 03:56:50 UTC
Created attachment 17918 [details]
What OOo produces from the tab-delimited text file
Comment 8 patrickstanley 2004-09-26 16:52:03 UTC
Please note that the Report data.txt cannot be expected to produce the proper
results in either spreadsheet because it contains over 400 fields/columns.  They
are limited to a width of 255 fields/columns. (I use Excel 2000, and have not
seen XP.)
Comment 9 thackert 2005-01-02 14:18:54 UTC
I have seen that the last entry was in September and that this issue is OOo-1.1.2 related. Have you tried a newer version 
of OOo like 1.1.3, 1.1.4 or some 1.9.xx build? Does this problem occur there also?
Comment 10 patrickstanley 2005-01-07 17:37:45 UTC
There is no improvement in 1.1.4
Comment 11 thackert 2005-01-16 16:45:08 UTC
But have you tried an 1.9.xx build? Does this problem occur there too? I have
heard that the DB module was really improved there (although I have not tested
it yet) ...
Comment 12 haxwell 2005-02-11 19:23:46 UTC
I believe I was able to reproduce  mbenua's steps for this issue, but I'm not
sure what I confirmed. The sample txt file is simply too big. It has a line in
it 1,105 characters long. 

patrickstanley, I believe this is a valid issue, but I think it would go a long
way to getting this solved if a file that was only as big as necessary to prove
the issue was supplied. That means pretty small.

Thanks.
Comment 13 christoph.lukasiak 2005-04-26 14:26:58 UTC
can anybody acknowledge this issue in a current version?
http://download.openoffice.org/680/index.html

thx
Comment 14 christoph.lukasiak 2005-06-01 12:49:27 UTC
no respond - no repro - seems to be fixed in between - do not hestitate to
reopen if you have anything to add and/or still occure in current version

thx
Comment 15 christoph.lukasiak 2005-06-01 12:49:49 UTC
close
Comment 16 patrickstanley 2005-10-22 01:33:37 UTC
Yes, It still does not work in version 1.9.125.

It appears that the headers all go in but it bails at the first carriage return
in the data and starts a new record.

All remains the same.
Comment 17 christoph.lukasiak 2005-11-30 14:15:51 UTC
i can repro that - short description: 

1. create a text database like:

aaa	bbb
"lala
lulu"	abc

expected output => "lalalulu"	abc (in one row)

effective output => (in two rows)
"lala
lulu"	abc

clu: works fine in spreadsheet with spreadsheet filter 'Text CSV'  - should also
work in database (like i agreed with oj)
Comment 18 christoph.lukasiak 2005-11-30 14:18:47 UTC
change owner
Comment 19 patrickstanley 2005-11-30 18:41:32 UTC
The expected output as proposed in the previous post

"expected output => "lalalulu"	abc (in one row)"

Implies that the <CRLF>, or <EOL> if you prefer, is filtered out. 

The basis of the issue is that what is between the quotes should not be used as
codes, but it also should not be filtered.  Therefore, I believe the output from
the test case should be

"expected output => "lala<CRLF>lulu"	abc (in one row)"
Comment 20 christoph.lukasiak 2006-02-08 12:07:33 UTC
*** Issue 53459 has been marked as a duplicate of this issue. ***
Comment 21 marc.neumann 2006-03-09 09:24:32 UTC
*** Issue 62957 has been marked as a duplicate of this issue. ***
Comment 22 Joe Smith 2006-09-04 16:50:01 UTC
It looks like this is a limitation in HSQLDB.

I can't find neither specific documentation as to whether newlines are (or are
not) allowed in a TEXT table, nor any bug report/feature request, and I'm not 
going spelunking in the HSQLDB source ;-), but :

Q: http://sourceforge.net/forum/message.php?msg_id=2048917
A: http://sourceforge.net/forum/message.php?msg_id=2050639
Comment 23 drewjensen.inbox 2006-09-05 02:05:57 UTC
Not sure if this is still the case with HSQLDB.

I am attaching an embedded database file.

Open this file, then open the SQL window.

Copy and paste these commands into it.

CREATE TEXT TABLE "DISCIPLINARY_ACTION_TXT"("ActionID" INTEGER NOT NULL PRIMARY
KEY,"StudentID" INTEGER,"StaffID" INTEGER,"ActionType" VARCHAR(50),"ActionDate"
DATE,"ActionComments" LONGVARCHAR);

SET TABLE DISCIPLINARY_ACTION_TXT SOURCE "DISCIPLINARY_ACTION_TXT.CSV;all_quoted
= true";

INSERT into DISCIPLINARY_ACTION_TXT ( SELECT "ActionID", "StudentID", "StaffID",
"ActionType", "ActionDate", "ActionComments" FROM "DISCIPLINARY_ACTION" AS
"DISCIPLINARY_ACTION" );

Close the database and open the file DISCIPLINARY_ACTION_TXT.CSV with any text
editor.
The second record includes an embedded lineend character.


Comment 24 drewjensen.inbox 2006-09-05 02:07:04 UTC
Created attachment 38958 [details]
test file for embedded cr/lf in text table export
Comment 25 ocke.janssen 2006-09-05 06:37:24 UTC
The "normal" CSV handling when using the Text Driver from OOo Base doesn't use
the hsql database. So it's a issue I have to fix ;-)
Comment 26 ocke.janssen 2007-11-26 07:51:21 UTC
Fixed in cws dba24d
Comment 27 ocke.janssen 2007-12-03 08:28:31 UTC
Please verify. Thanks.
Comment 28 christoph.lukasiak 2007-12-12 12:11:03 UTC
verified in cws
Comment 29 drewjensen.inbox 2008-01-27 13:21:05 UTC
Tested with 680m4, XP
Altered reportData.txt file by adding CR after every pair of data items.
Created TEXT type Base file - all data imports to table properly.

Closing
Comment 30 drewjensen.inbox 2008-01-27 13:22:43 UTC
Created attachment 51185 [details]
Bug doc changed to be 2 columns of data