Apache OpenOffice (AOO) Bugzilla – Issue 115308
Base: Auto-type recognition reads extra line on import
Last modified: 2014-08-25 17:11:53 UTC
Base: Auto-type recognition reads extra line on import When importing into Base a spreadsheet with a header row, if the max lines for auto-type recognition is /x/, OOo actually auto-recognises /x/+1 data rows. This results in incorrect field lengths being set. This problem is attested in the following builds and OS: 1. OOO300m9 on 32-bit Simplified Chinese Windows XP Professional SP3 The error occurs when importing from Calc but not from Excel 2003 SP3. 2. OOO320m12 on Ubuntu 10.04 in Oracle VM VirtualBox 3.2.10 r66523 on 64-bit English Windows Vista Business SP2 The error occurs when importing from Calc. Excel is not available for testing in this OS. 3. OOO320m18 on 64-bit English Windows Vista Business SP2 The error occurs when importing from either Calc or Excel 2007 SP2. 4. OOO330m12 on 32-bit English Windows XP Professional SP3 The error occurs when importing from either Calc or Excel 2003 SP3. This is a regression from a previous build, for although importing from Calc has always been incorrect, at least importing from Excel worked correctly before (OOO300m9) but now doesn't. This may be due to the introduction of the option 'Use first line as column names' in the later builds unmasking an existing bug. Test case: 1. Open the attached .ods file, which contains 4 rows of data of various lengths. 2. Copy all data: a. Place <ctrl-home>. b. Press <ctrl-shift-end>. c. Press <ctrl-c>. 3. Create new database: a. Go to File|New|Database. b. Select 'Create a new database'. c. Click 'Next'. d. Select 'Yes, register...'. e. Check 'Open the database for editing'. f. Uncheck 'Create tables...'. g. Click 'Finish'. h. Type an arbitrary file name. i. Click 'Save'. 4. Set up table: a. Select 'Tables' in the Database panel. b. Right-click in the 'Tables' pane. c. Select 'Paste'. d. Give the table an arbitrary name (or use default). e. Select 'Definition and data'. f. Check or uncheck 'Use first line...' (makes no difference) if this option is available. g. Check or uncheck 'Create primary key'. (Doesn't matter.) h. Click 'Next'. i. Click double right arrows. j. Click 'Next'. 5. Set field types: a. Set max lines for auto-type recognition to 1. b. Click 'Auto'. One would expect 10 is enough for the field length since data 1 only has 3 chars, but the resulting length is actually 70, the length of data 2 rounded up. c. Repeat steps a-b with max lines = 2. Length remains 70. d. Repeat steps a-b with max lines = 3. One would expect the length to be the same (70) since data 3 is even shorter than data 2, but the resulting length is now 130, the rounded up length of data 4. e. Repeat steps a-b with max lines = 4. Length remains 130. f. Repeat steps a-b with max lines = 0. This should not have been allowed as a valid value since it doesn't make sense to 'auto-recognise' anything with zero lines of input. However, Base accepts it and the resulting length is now 10, the length of data 1 rounded up. Repeating the test case with the attached .xls file, which has the same content as the .ods file, yields the same erroneous result in OOO320m18 and OOO330m12 but not in OOO300m9. The above confirms that auto-recognition is reading /x/+1 data rows vs. the expected /x/ and thereby setting incorrect field lengths.
Created attachment 72777 [details] Test .ods file
Created attachment 72778 [details] Test .xls file
I followed your description, but it is not too clear to me, why this is a problem. When you not change the automatic type recognition number and click on auto, you get the 130 value what is good for you data selected to insert, no data loss during copy. And field length same as the your longest data. No longer what you needs. I tested only in OOo 3.3 RC2 on win7 (OOO330m10). In what scenario, cause to you the data loss, and not warned about it? Or created longer fields what you needs?
This bug per se is not about data loss but *wrong behaviour*. As long as it can be reproduced, it is a confirmed defect & should be fixed. Note that this bug is likely a cause to another bug which can result in data loss. See Issue #115420 for more details. If the current bug is fixed, we can better isolate the cause to Issue #115420 &, if it is really the cause to the latter, even fix 2 bugs in one shot. I'm not sure whether it's right to specify a likely (but not certain) dependency above. I'll set it here & in Issue #115420 for now but if you think it's not right, pls. feel free to remove it. But in any case, *pls.* fix it at your earliest convenience, for either or both of these issues is/are preventing me from importing my data from Calc; if they are not fixed, I'll but have to use Excel!
I think it is not wrong behavior. If you go through table copy wizard without changing anything, it creates table with 255 character long, and no data loss, it is an extra places in text fields. If you change default settings, you needs to be aware of consequence of changes.
I'm afraid you've missed my point. This defect is about *auto-type recognition*. If you don't click 'Auto', of course you won't trigger this bug! If you click 'Auto', then you'll encounter *wrong* behaviour. If you specify 10 max lines for auto-type recognition & OOo actually reads 11 lines, how correct can it be? This behaviour clearly violates the function's spec & needs to be fixed. Pls. be well aware that this defect did not exist in 300m9 if importing from Excel, i.e., if you specify 10 max lines in 300m9 & import from Excel, OOo will only take in 10 max lines for auto-type recognition. In later builds, however, it overshoots. More importantly, as mentioned before, this defect affects the more serious Issue #115420 which *does* result in data loss in certain cases. Pls. fix the current defect ASAP & see how it will change the results of Issue #115420. To help you better understand the scenario, I'm attaching here the test files used in Issue #115420, which incorrectly recognises text to have a date type. Data rows 5 & 10 contain an empty cell in the 'Date' column. You may repeat the current test with whichever sheet in those files with a modified step 5: 5. Set field types: a. Set max lines for auto-type recognition to 10. b. Click 'Auto'. c. Select the 'Desc' field. Note the field type has been incorrectly recognised as 'Date [ DATE ]', which matches the misbehaviour triggered by an empty date cell in the preceding column as described in Issue #115420. d. Repeat steps a-c with max lines = 9. One would expect the 'Desc' column to receive a correct type (Text) this time since data row 9 does contain a valid date in the 'Date' column (& therefore should NOT trigger wrong type recognition). However, we get the same wrong result with 'Desc' set to a date type. e. Repeat steps a-c with max lines = 8. This time 'Desc' has a correct type (Text). f. Repeat steps a-c with max lines = 5. One would expect the 'Desc' column to receive a wrong type (Date) since data row 5 contains an empty cell in the 'Date' column (& therefore should trigger wrong type recognition). However, we get a correct result with 'Desc' set to a Text type. g. Repeat steps a-c with max lines = 4. One would expect the 'Desc' column to receive a correct type (Text) this time since data row 4 does contain a valid date in the 'Date' column (& therefore should NOT trigger wrong type recognition). However, we get a wrong result with 'Desc' set to a date type. h. Repeat steps a-c with max lines = 3. This time 'Desc' has a correct type (Text). The above clearly confirms that auto-type recognition is reading /x/+1 lines of data instead of the specified /x/. If for some reason (e.g., my poor English) this explanation still doesn't seem quite clear to you, then pls. kindly ask someone else to look at it. Collective intelligence is always a good thing :)
Created attachment 72921 [details] Test .ods file #2
Created attachment 72922 [details] Test .xls file #2
BTW, this defect also occurs in 330m13 on 32-bit English Windows XP Pro SP3. I've changed the Version no. above accordingly.
Please not change version it is needs to be set to version where it was find first. It helps to find regressions easier. I set back to OOO330m12.
This Issue requires more information ('needmoreinfo'), but has not been updated within the last year. Please provide feedback as requested and re-test with the the latest version of OpenOffice - the problem(s) may already be addressed. You can download Apache OpenOffice 3.4.1 from http://www.openoffice.org/download Please report back the outcome of your testing, so this Issue may be closed or progressed as necessary - otherwise the issue may be Resolved as Invalid in the future.
Platform: Operating System: Windows 7 Home Premium Version 6.1.7601 Service Pack 1 Build 7601 (x64) Apache Open Office 3.4.1 AOO341m1 Build 9593 Rev 1372282 Function Copy Table - Map Format Type - Section Automatic detection of As indicated in the operative notes (help button) and under the normal operating behaviour field "Rows (max)" should contain the maximum number of rows of data to be examined to determine the type of the field. Therefore it should not be accepted a value less than 1, In my test if typed negative values currently set 0 and accepts the value 0, in addition of course to the values> 0. In my opinion in the case of values <= 0 the field should be set to 1. In addition, the examination of input lines should start from line 1 if not active the "Use first row as column name" and line 2 if active this field. Currently however, the examination of input lines always starts from line 2 for a number of rows corresponding to the number of rows present in the field "Rows (max)" + 1. Confirm the bug.