Issue 115308

Summary: Base: Auto-type recognition reads extra line on import
Product: Base Reporter: nrs <nrsi.cheuk>
Component: codeAssignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Trivial    
Priority: P3 CC: george.gentili, issues, r4zoli
Version: OOO330m12   
Target Milestone: ---   
Hardware: PC   
OS: All   
Issue Type: DEFECT Latest Confirmation in: 3.4.1
Developer Difficulty: ---
Issue Depends on:    
Issue Blocks: 115420    
Attachments:
Description Flags
Test .ods file
none
Test .xls file
none
Test .ods file #2
none
Test .xls file #2 none

Description nrs 2010-10-29 08:51:11 UTC
Base: Auto-type recognition reads extra line on import

When importing into Base a spreadsheet with a header row, if the max lines for
auto-type recognition is /x/, OOo actually auto-recognises /x/+1 data rows. 
This results in incorrect field lengths being set.

This problem is attested in the following builds and OS:
1. OOO300m9 on 32-bit Simplified Chinese Windows XP Professional SP3
   The error occurs when importing from Calc but not from Excel 2003 SP3.
2. OOO320m12 on Ubuntu 10.04 in Oracle VM VirtualBox 3.2.10 r66523 on 64-bit
English Windows Vista Business SP2
   The error occurs when importing from Calc.  Excel is not available for
testing in this OS.
3. OOO320m18 on 64-bit English Windows Vista Business SP2
   The error occurs when importing from either Calc or Excel 2007 SP2.  
4. OOO330m12 on 32-bit English Windows XP Professional SP3
   The error occurs when importing from either Calc or Excel 2003 SP3.

This is a regression from a previous build, for although importing from Calc has
always been incorrect, at least importing from Excel worked correctly before
(OOO300m9) but now doesn't.  This may be due to the introduction of the option
'Use first line as column names' in the later builds unmasking an existing bug.

Test case:
1. Open the attached .ods file, which contains 4 rows of data of various lengths.
2. Copy all data:
   a. Place <ctrl-home>.
   b. Press <ctrl-shift-end>.
   c. Press <ctrl-c>.
3. Create new database:
   a. Go to File|New|Database.
   b. Select 'Create a new database'.
   c. Click 'Next'.
   d. Select 'Yes, register...'.
   e. Check 'Open the database for editing'.
   f. Uncheck 'Create tables...'.
   g. Click 'Finish'.
   h. Type an arbitrary file name.
   i. Click 'Save'.
4. Set up table:
   a. Select 'Tables' in the Database panel.
   b. Right-click in the 'Tables' pane.
   c. Select 'Paste'.
   d. Give the table an arbitrary name (or use default).
   e. Select 'Definition and data'.
   f. Check or uncheck 'Use first line...' (makes no difference) if this option
is available.
   g. Check or uncheck 'Create primary key'. (Doesn't matter.)
   h. Click 'Next'.
   i. Click double right arrows.
   j. Click 'Next'.
5. Set field types:
   a. Set max lines for auto-type recognition to 1.
   b. Click 'Auto'.  One would expect 10 is enough for the field length since
data 1 only has 3 chars, but the resulting length is actually 70, the length of
data 2 rounded up.
   c. Repeat steps a-b with max lines = 2.  Length remains 70.
   d. Repeat steps a-b with max lines = 3.  One would expect the length to be
the same (70) since data 3 is even shorter than data 2, but the resulting length
is now 130, the rounded up length of data 4.
   e. Repeat steps a-b with max lines = 4.  Length remains 130.
   f. Repeat steps a-b with max lines = 0.  This should not have been allowed as
a valid value since it doesn't make sense to 'auto-recognise' anything with zero
lines of input.  However, Base accepts it and the resulting length is now 10,
the length of data 1 rounded up.

Repeating the test case with the attached .xls file, which has the same content
as the .ods file, yields the same erroneous result in OOO320m18 and OOO330m12
but not in OOO300m9.

The above confirms that auto-recognition is reading /x/+1 data rows vs. the
expected /x/ and thereby setting incorrect field lengths.
Comment 1 nrs 2010-10-29 08:55:09 UTC
Created attachment 72777 [details]
Test .ods file
Comment 2 nrs 2010-10-29 08:56:02 UTC
Created attachment 72778 [details]
Test .xls file
Comment 3 r4zoli 2010-10-29 10:59:19 UTC
I followed your description, but it is not too clear to me, why this is a problem.

When you not change the automatic type recognition number and click on auto, you
get the 130 value what is good for you data selected to insert, no data loss
during copy. 

And field length same as the your longest data. No longer what you needs.  

I tested only in OOo 3.3 RC2 on win7 (OOO330m10).

 

In what scenario, cause to you the data loss, and not warned about it?

Or created longer fields what you needs?
Comment 4 nrs 2010-11-05 05:15:40 UTC
This bug per se is not about data loss but *wrong behaviour*.  As long as it can
be reproduced, it is a confirmed defect & should be fixed.  Note that this bug
is likely a cause to another bug which can result in data loss.  See Issue
#115420 for more details.  If the current bug is fixed, we can better isolate
the cause to Issue #115420 &, if it is really the cause to the latter, even fix
2 bugs in one shot.  I'm not sure whether it's right to specify a likely (but
not certain) dependency above.  I'll set it here & in Issue #115420 for now but
if you think it's not right, pls. feel free to remove it.  But in any case,
*pls.* fix it at your earliest convenience, for either or both of these issues
is/are preventing me from importing my data from Calc; if they are not fixed,
I'll but have to use Excel!
Comment 5 r4zoli 2010-11-05 08:31:00 UTC
I think it is not wrong behavior.
If you go through table copy wizard without changing anything, it creates table
with 255 character long, and no data loss, it is an extra places in text fields.

If you change default settings, you needs to be aware of consequence of changes.

  
Comment 6 nrs 2010-11-10 04:10:31 UTC
I'm afraid you've missed my point.  This defect is about *auto-type
recognition*.  If you don't click 'Auto', of course you won't trigger this bug!

If you click 'Auto', then you'll encounter *wrong* behaviour.  If you specify 10
max lines for auto-type recognition & OOo actually reads 11 lines, how correct
can it be?  This behaviour clearly violates the function's spec & needs to be
fixed.  Pls. be well aware that this defect did not exist in 300m9 if importing
from Excel, i.e., if you specify 10 max lines in 300m9 & import from Excel, OOo
will only take in 10 max lines for auto-type recognition.  In later builds,
however, it overshoots.  More importantly, as mentioned before, this defect
affects the more serious Issue #115420 which *does* result in data loss in
certain cases.  Pls. fix the current defect ASAP & see how it will change the
results of Issue #115420.

To help you better understand the scenario, I'm attaching here the test files
used in Issue #115420, which incorrectly recognises text to have a date type. 
Data rows 5 & 10 contain an empty cell in the 'Date' column.  You may repeat the
current test with whichever sheet in those files with a modified step 5:
5. Set field types:
   a. Set max lines for auto-type recognition to 10.
   b. Click 'Auto'.
   c. Select the 'Desc' field.  Note the field type has been incorrectly
recognised as 'Date [ DATE ]', which matches the misbehaviour triggered by an
empty date cell in the preceding column as described in Issue #115420.
   d. Repeat steps a-c with max lines = 9.  One would expect the 'Desc' column
to receive a correct type (Text) this time since data row 9 does contain a valid
date in the 'Date' column (& therefore should NOT trigger wrong type
recognition).  However, we get the same wrong result with 'Desc' set to a date type.
   e. Repeat steps a-c with max lines = 8.  This time 'Desc' has a correct type
(Text).
   f. Repeat steps a-c with max lines = 5.  One would expect the 'Desc' column
to receive a wrong type (Date) since data row 5 contains an empty cell in the
'Date' column (& therefore should trigger wrong type recognition).  However, we
get a correct result with 'Desc' set to a Text type.
   g. Repeat steps a-c with max lines = 4.  One would expect the 'Desc' column
to receive a correct type (Text) this time since data row 4 does contain a valid
date in the 'Date' column (& therefore should NOT trigger wrong type
recognition).  However, we get a wrong result with 'Desc' set to a date type.
   h. Repeat steps a-c with max lines = 3.  This time 'Desc' has a correct type
(Text).

The above clearly confirms that auto-type recognition is reading /x/+1 lines of
data instead of the specified /x/.  If for some reason (e.g., my poor English)
this explanation still doesn't seem quite clear to you, then pls. kindly ask
someone else to look at it.  Collective intelligence is always a good thing :)
Comment 7 nrs 2010-11-10 04:11:50 UTC
Created attachment 72921 [details]
Test .ods file #2
Comment 8 nrs 2010-11-10 04:12:20 UTC
Created attachment 72922 [details]
Test .xls file #2
Comment 9 nrs 2010-11-10 05:27:07 UTC
BTW, this defect also occurs in 330m13 on 32-bit English Windows XP Pro SP3. 
I've changed the Version no. above accordingly.
Comment 10 r4zoli 2010-11-10 06:24:05 UTC
Please not change version it is needs to be set to version where it was find first.
It helps to find regressions easier. 
I set back to OOO330m12.
Comment 11 Rob Weir 2013-02-02 02:55:25 UTC
This Issue requires more information ('needmoreinfo'), but has not been updated
within the last year. Please provide feedback as requested and re-test with the the latest version of OpenOffice - the problem(s) may already be addressed. 

You can download Apache OpenOffice 3.4.1 from http://www.openoffice.org/download

Please report back the outcome of your testing, so this Issue may be closed or
progressed as necessary - otherwise the issue may be Resolved as Invalid in the
future.
Comment 12 gg43 2013-04-03 00:11:01 UTC
Platform: 
Operating System: Windows 7 Home Premium Version 6.1.7601 Service Pack 1 Build 7601 (x64)
Apache Open Office 3.4.1 AOO341m1 Build 9593 Rev 1372282


Function Copy Table
  - Map Format Type
    - Section Automatic detection of

As indicated in the operative notes (help button) and under the normal operating behaviour field "Rows (max)" should contain the maximum number of rows of data to be examined to determine the type of the field.
Therefore it should not be accepted a value less than 1, 
In my test if typed negative values ​​currently set 0 and accepts the value 0, in addition of course to the values​​> 0.
In my opinion in the case of values ​​<= 0 the field should be set to 1.
In addition, the examination of input lines should start from line 1 if not active the "Use first row as column name" and line 2 if active this field.
Currently however, the examination of input lines always starts from line 2 for a number of rows corresponding to the number of rows present in the field "Rows (max)" + 1.

Confirm the bug.