Apache OpenOffice (AOO) Bugzilla – Issue 62492
Incorrect opening of XLS files using Excel 2003 XML format
Last modified: 2013-08-07 15:13:10 UTC
If you try to open some XLS files created in Excel 2003 XML format (the files are plain XML with 3 extra bytes before the first tag: EF BB BF), the suite will either open them in Writer or open them in Calc with the tab-separated filter dialog. The problem seems to be inside the component that decides which application will open the document, because opening the same file with scalc.exe -o filename.xls works without problems (only if you have a JRE installed; it seems that this format is parsed by a filter written in Java). I would put a P5 on it but some will yell. I think this is maybe the most important bug of the 2.0 release. I just wonder how hot the support lines are with users calling because they received an XLS document that no longer opens, while the previous version 1.x just worked. I think this should be fixed before the 2.0.2 release. One workaround with no visible side effects is to change the association for xls files from soffice.exe to scalc.exe, as below:
[code]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\OpenOffice.org.xls\shell\open\command]
@="\"C:\\Program Files\\OpenOffice.org 2.0\\program\\scalc.exe\" -o \"%1\""
[/code]
I don't know if this is a side effect or another bug, but the file is still locked even after the XML document is closed. Quitting the Quickstarter unlocks the file, but this should not be normal behaviour. More information about the submitted bug can be found at http://www.oooforum.org/forum/viewtopic.phtml?p=129135#129135
Created attachment 34417 [details] Sample XLS 2003 file that is not opening well by default
Something for you? Please have a look.
Hi, the described behaviour is not a bug. The file attached to the issue isn't a valid Excel binary file, as the file extension suggests. So Calc looks into the file, finds that it is a text file, and switches over to Writer, which imports the file correctly. Using the correct filter or extension solves the problem. The problem with the file locking has to be examined by the framework team. Frank
File locking: issue 21747 *** This issue has been marked as a duplicate of 21747 ***
dupe
This is clearly not a duplicate of issue 21747, as anyone can see by reading its description, and it's not resolved. Microsoft Office writes XLS files in both formats, binary and XML, and if OpenOffice.org is supposed to load XLS files it must be able to open them. This is why I think this issue must have a high priority: it makes opening some XLS files impossible for a normal user (not talking about the guru kind). A normal user expects an XLS file to be a spreadsheet and doesn't have to know what kind of spreadsheet it contains; all they want to do is double-click on the XLS to open the file in their spreadsheet editor. The number of files of this kind (XLS in XML format) keeps growing, and we give support to hundreds of users.
Please read carefully: the file locking is a duplicate of issue 21747. The attached file is invalid, as fst has described. For upcoming issues, please describe only one problem per issue. *** This issue has been marked as a duplicate of 21747 ***
reclosed
I think I did at least read what I wrote: the description of the issue has nothing to do with the file-locking issue. If you read more carefully you'll see that only the first comment referred to the file-locking bug, so THIS ISSUE IS NOT A DUPLICATE of the file-locking bug. Sorry for this open-close... maybe somebody is putting pressure on closing issues :)
So if this issue has nothing to do with the file locking problem, someone on the spreadsheet team should decide whether it would make sense to have this registry key:
[code]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\OpenOffice.org.xls\shell\open\command]
@="\"C:\\Program Files\\OpenOffice.org 2.0\\program\\scalc.exe\" -o \"%1\""
[/code]
This was my suggested workaround, but I don't know if changing the installer to associate XLS directly with scalc.exe is the correct solution. I think this bug must be treated with special care: it can be hidden by including my workaround in the installer OR by changing the way the suite detects the file type. I think the best solution would be to do both. That way we'll be sure that any .XLS file will be opened by Calc.
So I close this issue as invalid. The file extension XLS is for Excel binary files, and therefore an Excel XML file isn't a valid xls file. Changing the registry key will work in this case but not in all cases. The filter detection looks into the file and does not find an Excel binary header, so it looks further and detects a text file. So it opens it as text in Writer. If you use the correct file extension .xml for Excel 2003 XML files, the document will always open in Calc. So this issue isn't a bug and is therefore closed as invalid. Frank
closed invalid
Let me be more specific. Here are the real facts: the submitted file is a 100% valid Microsoft Office spreadsheet in XML format WITH an additional 3-byte header before the first XML tag. We know that this file was generated by MS Office. Microsoft Office not only opens this file but also keeps its header (0xEF 0xBB 0xBF). This works with ANY EXTENSION, whether you use XLS or XML. Now let's investigate OpenOffice.org's behavior:
Case 1: File > Open > test.XML in SCALC.EXE works.
Case 2: File > Open > test.XLS in SCALC.EXE does not work.
So I think there is a bug in the filter detection algorithm. It should work like this:
[code]
if (recognize_binary_format()) {
    ... open with binary filter
} else { /* not a known binary format */
    if (recognize_XML_format()) {
        ... open using detected filter
    } else if (is_text_only()) {
        ... open as text in writer
    } else {
        ... message: unrecognized binary format
    }
}
[/code]
The XML file extension has no meaning for a normal user: the file can be an HTML page, a spreadsheet, a configuration file, or any other kind of file. People need to know which application is going to open their files in order to act. Normal people do not open formatted files (like XML) in order to edit them as plain text. They expect to edit the data in them, not the file. So OOo must (not should) detect the file format correctly. As any power user who has blamed MS for hiding file extensions knows, an application must be able to open all supported files without requiring a specific extension for them. The extension is only informative and can't guarantee that the data inside is valid. Currently hundreds (maybe more) of developers are generating spreadsheet reports in the Microsoft XML format because it's very simple and it's just plain uncompressed XML. They generate those files with the XLS extension so that users open them in a spreadsheet application, BUT, how nice, OpenOffice.org is smarter: it refuses to open a spreadsheet file (in XML format) stored in a file with the XLS extension. If all file formats in use were XML, how would you expect a user to make their computer open those files with a specific application? File extension association is a user-configurable option, but recognizing the file format is the job of the application. The issue should stay open and must be assigned to the filter detection.
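The detection order proposed above can be sketched in Python. This is only an illustration under stated assumptions, not the actual OOo filter detection code: the function names detect_format and is_text_only and the returned labels are hypothetical, binary XLS is recognized by the OLE2 compound file signature, and Excel 2003 XML is recognized by its spreadsheet namespace. The file extension is never consulted.

```python
import codecs

# Signature of an OLE2 compound file, the container used by binary .xls files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Namespace that identifies an Excel 2003 XML workbook.
EXCEL_XML_NS = b"urn:schemas-microsoft-com:office:spreadsheet"


def is_text_only(chunk: bytes) -> bool:
    """Crude text heuristic (an assumption): no NUL bytes and decodable as UTF-8."""
    if b"\x00" in chunk:
        return False
    try:
        # The incremental decoder tolerates a multi-byte character cut off
        # at the end of the sampled chunk.
        codecs.getincrementaldecoder("utf-8")().decode(chunk)
    except UnicodeDecodeError:
        return False
    return True


def detect_format(data: bytes) -> str:
    """Classify content by its bytes only: binary, then XML, then text."""
    if data.startswith(OLE2_MAGIC):
        return "binary"
    # Skip an optional UTF-8 byte order mark (EF BB BF) before sniffing.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    head = data[:4096].lstrip()
    if head.startswith(b"<?xml"):
        return "excel-xml" if EXCEL_XML_NS in head else "xml"
    if is_text_only(head):
        return "text"
    return "unknown"
```

With this order, the same bytes get the same answer whether the file is named test.xls or test.xml, and the plain-text fallback is only reached after the XML check has failed.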
I have to agree. If it is a major objective of OOo to provide file exchange with Microsoft formats, then no matter how stupid and cavalier Microsoft may be in the allocation of extensions, OOo should do what the user expects.
Hi, Excel is a single application and therefore handles such files by opening them in itself. OOo Calc is part of an integrated office suite sharing a lot of code, especially the file open dialogs. So the decision about how a file is opened is made by the filter detection. The first thing it looks at is the extension and, based on that, the header of the file. So the filter detection has no choice but to classify the XML file named as XLS as text. This is how it works. Please have a look at Issue 8967, which is basically a duplicate of this one. Frank *** This issue has been marked as a duplicate of 8967 ***
closed double
I agree that it's a good thing that OOo shares the open dialog code across all applications. But you are wrong about this: "...filterdetection has no choice as to define the XML file named as XLS as text. This is how it works." If the filter detection parses the header, it can parse an XML header too in order to detect the correct file type. Because the filter detection can receive any kind of file, it must be able to make a good decision. The case of Excel XML saved with an XLS extension is one that can be resolved without breaking current behavior for existing files. The "linked to" issue is a very close relative of this one but has a big difference: it's about CSV (tab separated) versus plain TXT -> it can't be resolved in a way that satisfies everyone. Should I reopen this issue?
Based on the comments on issue 18228, I conclude that this is a bug in the filter detection, so the issue must be reopened. In issue 18228 it's clearly written that the filter detection identifies a file by reading its header and that the file extension is not important. A new, better description of this bug is: the filter detection does not correctly detect the Excel XML file format if the extension of the file is XLS or another extension.
Sorry, but these are the problems with this file: a) It's not XLS, because it's not a binary file. b) It's not XML, because it contains 3 binary characters. c) If MS really generates such files, we might have to react and try to work around this "bug" of MS. The only bug I would accept here is this: if these 3 bytes are removed from the file and the extension is not set correctly, the file is loaded into Writer as plain text every time. That's because our txt filter has no real detection and can't differentiate between ASCII, all the Unicode-formatted files, and even binaries. This can't really be solved, because nobody can write such a detection: it would have to check all 64K code pages existing in this world. Which, by the way, would affect detection time and increase it up to "never ending" .-) The only workaround for this: ask the text filter explicitly at the end of the detection process so it can overrule any other detection service. That would help to solve other detection problems as well. I will file such an issue to myself and try to fix it for OOo 2.0.3. (The number of this new issue will be set as a dependency of this issue later.)
Hello intersol, let us gather some facts concerning the file format of your example document.
1) Although it has the suffix XLS, it is clearly not a binary XLS.
2) When you take a look at the W3C XML specification http://www.w3.org/TR/2004/REC-xml-20040204/#sec-well-formed you see that a well-formed XML document starts with a prolog:
[1] document ::= prolog element Misc*
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
As you see, a well-formed XML document has no arbitrary prefix before the prolog. A document that has these bytes is therefore not well-formed and thereby cannot be valid at all, although aside from these three bytes the document would be well-formed and most likely valid. Therefore you have brought up an interesting bug of Microsoft Office that we should take into account to ease the life of StarOffice / OpenOffice.org users.
Finally some really positive feedback. Here is what I discovered just now: the header is something very common. It's a byte order mark, so this file is valid XML. Take a look at http://en.wikipedia.org/wiki/Byte_Order_Mark From what I know, this mark is supported by the XML specification. PS. I've tested with and without the mark, and Excel accepts both files. The result of my investigation with OOo (same file with and without the 3-byte mark):
test_with_mask.xml -> opens correctly in OOo 2.0 (opens Calc)
test_without_mask.xml -> opens correctly in OOo 2.0 (opens Calc)
test_with_mask.xls -> opens the ASCII Filter Options dialog (BAD)
test_without_mask.xls -> opens as plain text in Writer (BAD)
The correct behaviour would be the same in all four cases (as in the first). The mark is optional and tells the text encoding of the file (so the file is text). I think the bug is that, after detecting that the file is text, the detection doesn't test for the XML format. One thing is clear: OOo behaves differently for different file extensions even if the content is the same.
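To illustrate how the optional mark announces the text encoding, here is a minimal Python sketch that separates a leading byte order mark from the content. The helper name split_bom and the table BOMS are hypothetical; the mark constants come from Python's standard codecs module.

```python
import codecs

# Longest marks first: the UTF-32 LE mark begins with the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def split_bom(data: bytes):
    """Return (encoding named by the mark, or None; content without the mark)."""
    for mark, encoding in BOMS:
        if data.startswith(mark):
            return encoding, data[len(mark):]
    return None, data
```

For a file that starts with EF BB BF, split_bom reports "utf-8" and hands back content that begins directly at the first XML tag, so a later format check sees the same bytes with or without the mark.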
Here is something else interesting related to encodings and XML: http://www.opentag.com/xfaq_enc.htm
Hi, regarding the three bytes: I've saved a file from within Excel 2003 as Excel XML, and this file clearly does not have these bytes. So the question is: how did these bytes get into your file? Frank
I'm testing on Microsoft Office Professional 2003, Romanian localized version, so this could vary by locale. Excel accepts both kinds of files and keeps the mark on save.
I must admit I was wrong and he is completely right. The so-called Byte Order Mark (BOM) is part of the Unicode specification and might occur: http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G31703 "The BOM is not considered part of the content of the text." Therefore it is still well-formed XML. His link gave a good summary of the possible entities. Thanks for giving us this hint.
So there are two different issues here: one related to the opening of Excel XML files when they do not have the XML extension (the current issue), and a new one, "Incorrect recognizing of XML files with BOM (Byte-order-mask)", which I've filed as issue #63077. This issue must stay focused on Excel XML opening, which will not work even if the BOM problem is resolved.
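A quick way to see this in practice is to feed both variants to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree (backed by expat); the helper name root_tag and the tiny sample document are made up for illustration and say nothing about how the OOo filters behave.

```python
import codecs
import xml.etree.ElementTree as ET


def root_tag(document: bytes) -> str:
    """Parse raw bytes and return the tag of the root element."""
    return ET.fromstring(document).tag


plain = b'<?xml version="1.0"?><Workbook/>'
with_bom = codecs.BOM_UTF8 + plain   # EF BB BF in front of the prolog
with_junk = b"abc" + plain           # arbitrary bytes in front of the prolog

print(root_tag(plain))
print(root_tag(with_bom))
try:
    root_tag(with_junk)
except ET.ParseError as err:
    print("rejected:", err)
```

The parser treats the mark as an encoding signature rather than content, so both of the first two calls see the same root element, while a genuinely arbitrary prefix is rejected as not well-formed.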
Hi Andreas, please have a look at this Issue again. Proceed as needed and close if nothing can be done. Frank
The current bug is very important because it's blocking the transition to OOo 2.0 for many users. The number of files received by email in the new Microsoft formats is growing. Just to make it more complicated, a few days ago I received another .doc generated with Microsoft Office 2003 that will not open in OOo in any way. Looking at the file, it seems to be an archived version of the XML format. I will attach it too.
Created attachment 36401 [details] DOC saved from Office 2003 - file format it's a sort of archived xml
AS->SUS: Please make sure that such a BOM does not disturb detection/loading of such Excel 2003 XML files. THX.
JA: the attached Excel 2003 file can be loaded in Calc if you manually select the filter "Microsoft Excel 2003 (*.xml)". I don't see an issue that needs to be fixed.
IBTD. Though I agree that the technical arguments of fst and others are correct, this is not a technical issue but a usability issue. If people want to replace MSO with OOo, they have some expectations that we need to take into consideration. IMHO the format problems explained here are even more annoying than the UI differences we tried to remove in OOo 2.0. Especially in the current case, where it should be easy to fix: if it is correct that "scalc.exe filename.xls" opens the file correctly, the only thing we need to do is register scalc.exe as the application for "xls" files instead of soffice.exe, as we do now. This works with OOo 2.0.2; IIRC it doesn't work in earlier versions. So this could be solved as a registration issue. @intersol: please try to verify this. Does it help to change the registration of "xls" files from soffice.exe to scalc.exe?
I confirm that setting the association of XLS to scalc.exe solves the reported problem. I will also copy here an email I received from Hoffmann Gisbert. Maybe some of you already received a copy. He has a point! Personally, I'm about to lose another client because of this issue.
-- begin cite
I write to you because I do not want to create an issue for the same problem again (and because I would need training before I could create an issue). The fact is that certain xls files are opened correctly in Excel, while with OO 2.x and SO 8 they are opened in Writer. Details are described in the issues. Obviously this problem has been discussed for years (Issue 8967). The consequences seem not to be clear to you. The argument of the developers is that MS does not show the correct behavior, but OO 2.x and SO 8 do. OO/SO should show the same behavior as MS, even if it's wrong, no matter what the W3C XML specifications say. You will not force MS and the providers of reporting tools (see below) to act as you want and as would be correct in your opinion. We want to replace MS (and you should want that too). All reporting tools in the BI market (Crystal Reports, Cognos ReportNet, Information Builders, etc.) provide xls output in the format described in the issues. This output opens in Excel automatically and correctly, so the user can further manipulate the output as needed. We can arrange for these tools to start OO/SO instead of Excel. But then the output is opened in Writer or Writer/Web respectively. The user can try to cut the output in Writer, start a Calc document and paste it into Calc, headers and footers separately from the main page, and adjust all cells. No user will accept this crippled way of working. It worked correctly up to SO 6/OO 1.x. For your information: I have fought for more than 2 years now to replace MS with OO/SO. The result of your useless discussions over the years is that I lost the race and our company has to update more than 500 MS licences. That's it. Good night OO/SO.
-- end of cite
Created attachment 37016 [details] sample document generated by common web reporting tools.
The new file I've attached was generated using one of the well-known reporting tools available. In this case the document will not open correctly with scalc.exe, but it will open with soffice.exe as HTML. Opening it with scalc.exe gives an empty spreadsheet.
If you get an empty document from an HTML document by loading it in scalc.exe it is a bug in Calc. IIRC Calc by default uses the "WebQuery" filter, but that should make only minor differences. We should create a separate issue for this as it needs to be fixed in Calc. I will work on the other necessary changes.
As we have meanwhile changed the "xls" registration from soffice.exe to scalc.exe, this issue is fixed. Can you please verify this in a recent version of OOo?