Issue 113658 - CSV Imports limits row to 65535 characters
Summary: CSV Imports limits row to 65535 characters
Status: CONFIRMED
Alias: None
Product: Calc
Classification: Application
Component: open-import (show other issues)
Version: OOo 3.2.1
Hardware: PC All
: P3 Trivial (vote)
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: needhelp, oooqa
Depends on:
Blocks:
 
Reported: 2010-08-04 20:02 UTC by adamsei
Modified: 2013-01-29 21:42 UTC (History)
2 users (show)

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Subset CSV (185.26 KB, text/plain)
2010-08-04 20:03 UTC, adamsei
no flags Details
python3 script to count characters in a line (252 bytes, text/plain)
2010-08-29 05:40 UTC, adamsei
no flags Details
CSV file that illustrates; Cell "OX2" is truncated (64.00 KB, text/plain)
2010-08-29 05:41 UTC, adamsei
no flags Details
python3 script to generate psudo-random CSV files (521 bytes, text/plain)
2010-08-29 05:42 UTC, adamsei
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description adamsei 2010-08-04 20:02:08 UTC
When importing a CSV file OpenOffice will truncate an entry after 65536
characters. Discovered after random survey responses were found to be missing
data.  I selectivly modified records to determine 65536 was the limit.

The included file gives a sample of the of our dataset which was exported from a
third-party survey software.  You will notice Harvard's entry truncates
prematurely and actually causes the next entry to begin on the same line.
Comment 1 adamsei 2010-08-04 20:03:08 UTC
Created attachment 70961 [details]
Subset CSV
Comment 2 adamsei 2010-08-04 20:06:16 UTC
I failed to mention the problem has presented itself on Mac OS X (unsure
version), Windows (XP through 7), and Arch Linux.  In all cases OOo was 3.2.1.
Comment 3 Rainer Bielefeld 2010-08-05 07:47:50 UTC
Reproducible with "Ooo-Dev 3.3.0 multilingual version English UI WIN XP:
[DEV330m1 (Build 9514)]"!
Opening sample.csv wich separator "comma" in cell 'W4' of new CALC spreadsheet
text ends with string "systems and to learning through advanced research". When
I copy cell contents to a new WRITER document, I count 466 characters in the cell.
Opening sample.csv wich with "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]" in
cell 'W4' of new CALC spreadsheet text ends with string "systems and to learning
through advanced research". When I copy cell contents to a new WRITER document,
I count 3220 characters in the cell.

Additionally I see that in 3.3 document only shows 4 rows "ResponseID", "1879",
"1881", "1886", With 3.1.1: ResponseID, 1879, 1881, 1886, 1889, 1892
what might be related, but also that might be a different problem.


adamsei:
How did you check that 65536 characters limitation?

Please always contribute a clear instruction how to find the problem! At least
you have to mention your csv import settings and an explicit description what
exactly your problem is.
May I ask you to read our guidelines on
<http://qa.openoffice.org/issue_handling/pre_submission.html> and  
<http://www.openoffice.org/bugs/bug_writing_guidelines.html> before you file
further issues or post again here? Then please contribute a clear step by step
instruction containing all observations (error
messages ...), EVERY key press and EVERY mouse click how to reproduce the
problem, and explain why you believe that your results are unexpected.
That means (for example): 
 do not write something like "I am not able to ...", but
  6. left mouse click on …
     expected: …, color of … changes, … 
     actual: no …., color remains white, no …
Comment 4 Rainer Bielefeld 2010-08-05 09:45:10 UTC
I'm sorry, there is a mistake concerning final string in my latest comment.
Correct comment must be:
"Opening sample.csv wich with "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]" in
cell 'W4' of new CALC spreadsheet text ends with string
"/oe/HU_Temp_Policy_July09.pdf". 

Some of the differences I saw between 3.1.1 and 3.3. were caused by using
different languages.

The truncating problem seems to be caused by text delimiter '
Using this one with 3.1.1, I get message "The maximum number of rows has been
exceeded" (Issue 75199?), and Text in 'W4' will be truncated. 

3.3 will show 'W4' contents until end "/oe/HU_Temp_Policy_July09.pdf" if I use "
as delimiter.

Currently I can't see any bug.
Comment 5 adamsei 2010-08-29 05:39:48 UTC
It appears I was wrong, and 65535 is the number one is limited to, 65536 is the
number that breaks it.  This is not a particulary surpising number since it is
the maximum value of an unsigned short int.  I have updated the summary accordingly.

To be clear the issue is not the number of characters in a cell but rather the
number of characters in a line of the CSV file.

A script similar to count_public.py was used to determine character count of a
line, recognize it does count non-printing characters (CR, LF).

I had used python to generate psudo-random characters to flush one of our survey
responses that was below the 65536 limit to the limit, and then manually added
and removed characters using a text editor (mousepad and wordpad) to confirm
that 65536 was the problem number.

CSV files were imported with what I understand to be the settings for standard
CSV file according to RFC 4180. Namely, Seperated by Comma, Text delimiter "
nothing else selected.  All intelligent encoding setting for the file preset the
same problem, though UTF-8 has been most thoroughly tested.

Since I am unable to make availabe much of our working data set and the
complexity of our survey resposes seem to create some confusion, I wrote a
python script to generate a suitable CSV file to illustrate the issue.

These latest files were only tested on OpenOffice.org 3.2.1 OOO329m19
(Build:9505) on ArchLinux (x86_64)

To reproduce open dummy_data.csv with the standard CSV settings and UTF-8
encoding.  Note cell "OX1" reads "IF" where if should read "IFNOT"
dummy_data.py can be used to produce additional testing files.

I also noticed this limitation affects the Text database option in Base.  Should
a seperate Issue be opened for that?
Comment 6 adamsei 2010-08-29 05:40:46 UTC
Created attachment 71457 [details]
python3 script to count characters in a line
Comment 7 adamsei 2010-08-29 05:41:45 UTC
Created attachment 71458 [details]
CSV file that illustrates; Cell "OX2" is truncated
Comment 8 adamsei 2010-08-29 05:42:26 UTC
Created attachment 71459 [details]
python3 script to generate psudo-random CSV files
Comment 9 Rainer Bielefeld 2010-08-29 09:06:52 UTC
Reproducible with "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]" and with
"Ooo-Dev 3.3.0 multilingual version English UI WIN XP: [OOo330m1 (Build 9514)]",
not in all details (because of my simplyfied test method), but for the general
effect. 

I see a limitation opening "dummy_data.csv" with 3.1.1.
In final cells 'OW' / 'OX' I see contents "TQLXHAIMWFCYKRBQWQREKAOAM" / "IF"
opening sample "dummy_data.csv" Latest cell should contain "IFNOT", as I see
with Editpadlite.

I did an additional test with "dummy_data.csv":
1. opened document
2. marked 'OW' with contents "TQLXHAIMWFCYKRBQWQREKAOAM"
3. go to 'OY'
4. <ntrl>+<v>, contents was pasted as expected
5. marked cell moving mouse pointer with pressed left button to 'TG'
  (exact position does not matter)
7. Menu 'Edit -> Fill -> right' (cells will be filled with contents)
8. save as "dummy_data_new.csv" and "dummy_data_new.ods"
9. close "dummy_data_new.ods"
10. reopen both documents with CALC
    expected: both with new additional contents
    actual: "dummy_data_new.csv" adds some new contents until 'PB', further 
            contents is truncated
            only "dummy_data_new.ods" with new contents until 'TG'

It's an "open-import" problem, I see added cell contents in "dummy_data_new.csv"
with Editpadlite.

Because the limitation does not exist for .ods documents, I believe it's a bug.