Issue 62492 - Incorrect opening of XLS files using Excel 2003 XML format
Summary: Incorrect opening of XLS files using Excel 2003 XML format
Status: RESOLVED FIXED
Alias: None
Product: Calc
Classification: Application
Component: code
Version: OOo 2.0.2
Hardware: PC Windows, all
Importance: P4 Trivial with 3 votes
Target Milestone: ---
Assignee: Mathias_Bauer
QA Contact: issues@sc
URL:
Keywords:
Depends on: 63077
Blocks:
Reported: 2006-02-23 17:46 UTC by intersol
Modified: 2013-08-07 15:13 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Sample XLS 2003 file that is not opening well by default (3.15 KB, application/vnd.ms-excel)
2006-02-23 17:58 UTC, intersol
DOC saved from Office 2003 - file format it's a sort of archived xml (13.52 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2006-05-11 13:47 UTC, intersol
sample document generated by common web reporting tools. (21.75 KB, application/vnd.ms-excel)
2006-06-08 22:30 UTC, intersol

Description intersol 2006-02-23 17:46:46 UTC
If you try to open some XLS files created in Excel 2003 format (the files are
plain XML files with 3 extra bytes, EF BB BF, before the first tag), the suite
opens them in Writer, or opens them in Calc with the tab-separated filter dialog.

The problem seems to be inside the component that decides which application
opens the document, because opening the same file with scalc.exe -o filename.xls
opens the XLS file without problems (only if you have a JRE installed; it
seems that this format is parsed by a filter written in Java).

I would put a P5 on it, but some will yell. I think this may be the most
important bug of the 2.0 release. I just wonder how busy the support lines are
with users calling to say that they received an XLS document that no longer
opens, although the previous version 1.x handled it just fine.

I think this should be fixed before the 2.0.2 release. One workaround with no
visible side effects is to change the association for xls files from
soffice.exe to scalc.exe, as below:

[code]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\OpenOffice.org.xls\shell\open\command]
@="\"C:\\Program Files\\OpenOffice.org 2.0\\program\\scalc.exe\" -o \"%1\""
[/code]
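In a .reg file, backslashes and double quotes inside a string value have to be escaped, which is why the path above contains doubled backslashes. A minimal Python sketch that generates an importable .reg file for the same association, assuming the default OOo 2.0 install path and the OpenOffice.org.xls ProgID from the key above; the helper names are hypothetical, and the "Windows Registry Editor Version 5.00" header line is added so that regedit can import the result:

```python
def reg_escape(value: str) -> str:
    """Escape a value for use inside a double-quoted .reg string."""
    # Double the backslashes first, so the backslashes added for quotes
    # are not doubled again.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_reg_file(exe_path: str, prog_id: str = "OpenOffice.org.xls") -> str:
    """Return .reg text that makes `"exe_path" -o "%1"` the open command."""
    key = ("HKEY_LOCAL_MACHINE\\SOFTWARE\\Classes\\" + prog_id
           + "\\shell\\open\\command")
    command = '"' + exe_path + '" -o "%1"'
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        '@="' + reg_escape(command) + '"',
        "",
    ]
    # Windows-style line endings, like the files regedit exports.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(r"C:\Program Files\OpenOffice.org 2.0\program\scalc.exe"))
```

The order inside reg_escape matters: escaping quotes first would add backslashes that the backslash step then doubles a second time.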
Comment 1 intersol 2006-02-23 17:51:37 UTC
I don't know if this is a side effect or another bug, but the file stays
locked even after the XML document is closed. Quitting the Quickstarter unlocks
the file, but this should not be the normal behaviour.

More information about the submitted bug can be found at
http://www.oooforum.org/forum/viewtopic.phtml?p=129135#129135
Comment 2 intersol 2006-02-23 17:58:49 UTC
Created attachment 34417
Sample XLS 2003 file that is not opening well by default
Comment 3 Olaf Felka 2006-03-07 08:47:31 UTC
Something for you? Please have a look.
Comment 4 frank 2006-03-07 09:17:23 UTC
Hi,

the described behaviour is not a bug. The file attached to the issue isn't a
valid Excel binary file, as the file extension suggests. So Calc looks into
the file, finds that it is a text file and switches over to Writer, which
imports the file correctly. Using the correct filter or extension solves the problem.
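Binary Excel 97-2003 .xls files are OLE2 compound documents, which always start with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1; the attached file starts with EF BB BF instead, so it fails this first check. A minimal Python sketch of that check, assuming the signature alone is used to tell "binary" from "not binary" (a real detector would also look for the workbook stream inside the container); the function names are hypothetical:

```python
# Signature of an OLE2 compound document (used by Excel 97-2003 .xls files).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_head(path: str, size: int = 512) -> bytes:
    """Read the first `size` bytes of a file for sniffing."""
    with open(path, "rb") as f:
        return f.read(size)


def looks_like_binary_xls(head: bytes) -> bool:
    """True if the data starts with the OLE2 signature.

    Word .doc and PowerPoint .ppt files share this signature, so a match
    only means "OLE2 container", not "spreadsheet".
    """
    return head.startswith(OLE2_SIGNATURE)
```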

The problem with the file locking has to be examined by the framework team.

Frank
Comment 5 Olaf Felka 2006-03-07 09:25:54 UTC
File locking: issue 21747

*** This issue has been marked as a duplicate of 21747 ***
Comment 6 Olaf Felka 2006-03-07 09:26:07 UTC
dupe
Comment 7 intersol 2006-03-07 09:46:26 UTC
This is clearly not a duplicate of issue 21747, as anyone can see from its
description, and it is not resolved.

Microsoft Office writes XLS files in both formats, binary and XML, and if
OpenOffice.org is supposed to load XLS files, it must be able to open both.

This is why I think this issue must have a high priority: it makes opening
some XLS files impossible for normal users (I'm not talking about the guru
kind).

A normal user expects an XLS file to be a spreadsheet and shouldn't have to
know what kind of spreadsheet format it contains; all they want is to
double-click the XLS file to open it in their spreadsheet editor.

The number of files of this kind (XLS in XML format) keeps growing, and we
support hundreds of users.
Comment 8 Olaf Felka 2006-03-07 09:56:44 UTC
Please read carefully: the file locking problem is a duplicate of issue 21747.
The attached file is invalid, as fst has described.
For future issues, please remember: describe only one problem per
issue.

*** This issue has been marked as a duplicate of 21747 ***
Comment 9 Olaf Felka 2006-03-07 09:56:58 UTC
reclosed
Comment 10 intersol 2006-03-07 10:22:55 UTC
I think I did at least read what I wrote: the description of this issue has
nothing to do with the file-locking issue.

If you read more carefully, you'll see that only the first comment referred
to the file-locking bug, so THIS ISSUE IS NOT A DUPLICATE of the file-locking bug.

Sorry for this open-close... maybe somebody is putting pressure on closing issues :)
Comment 11 Olaf Felka 2006-03-07 10:33:46 UTC
So if this issue has nothing to do with the file locking problem, someone from
the spreadsheet team should decide whether it would make sense to have this registry key:
[code]
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\OpenOffice.org.xls\shell\open\command]
@="\"C:\\Program Files\\OpenOffice.org 2.0\\program\\scalc.exe\" -o \"%1\""
[/code]
Comment 12 intersol 2006-03-07 10:44:34 UTC
This was my suggested workaround, but I don't know if changing the installer to
associate XLS directly with scalc.exe is the correct solution.

I think this bug must be treated with special care: it can be hidden by
including my workaround in the installer OR fixed by changing the way the
suite detects the file type.

I think the best solution would be to do both. This way we'll be sure
that any .XLS file is opened by Calc.
Comment 13 frank 2006-03-07 11:05:03 UTC
So I close this Issue as invalid.

The file extension XLS is for Excel binary files, and therefore an Excel XML
file isn't a valid xls file. Changing the registry key will work in this case
but not in all cases. The filter detection looks into the file and does not
find an Excel binary header, so it looks further and detects a text file.
So it opens it as text in Writer. If you use the correct file extension,
.xml, for Excel 2003 XML files, the document will always open in Calc.

So this issue isn't a bug and is therefore closed as invalid.

Frank
Comment 14 frank 2006-03-07 11:05:17 UTC
closed invalid
Comment 15 intersol 2006-03-07 13:13:41 UTC
Let me be more specific: here are the real facts:

The submitted file is a 100% valid Microsoft Office spreadsheet in XML format,
WITH an additional 3-byte header before the first XML tag. We know that this
file was generated by MS Office.

Microsoft Office not only opens this file but also keeps its header (0xEF 0xBB
0xBF). This works with ANY EXTENSION, whether you use XLS or XML.

Now let's investigate OpenOffice.org behavior:
Case 1: File > Open > test.XML in SCALC.EXE works.
Case 2: File > Open > test.XLS in SCALC.EXE does not work.

So I think there is a bug in the filter detection algorithm. It should work like this:

if (recognize_binary_format())
{
   ... open with binary filter
}
else /* not a known binary format */
{
   if (recognize_XML_format())
   {
      ... open using detected filter
   }
   else if (is_text_only())
   {
      ... open as text in Writer
   }
   else
   {
      ... message: unrecognized binary format
   }
}
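The proposed order can be turned into a runnable Python sketch. It reuses the three predicate names from the pseudocode (in snake_case), but their bodies are assumptions, not OOo code: the binary check looks for the OLE2 signature, the XML check skips an optional UTF-8 BOM and looks for the SpreadsheetML namespace urn:schemas-microsoft-com:office:spreadsheet, and the text check only accepts UTF-8 without NUL bytes.

```python
from enum import Enum

OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
UTF8_BOM = b"\xef\xbb\xbf"
SPREADSHEETML_NS = b"urn:schemas-microsoft-com:office:spreadsheet"


class Target(Enum):
    CALC_BINARY = "open with binary filter"
    CALC_XML = "open using detected filter"
    WRITER_TEXT = "open as text in Writer"
    UNKNOWN = "message: unrecognized binary format"


def recognize_binary_format(data: bytes) -> bool:
    # Assumption: any OLE2 container is treated as binary Excel.
    return data.startswith(OLE2_SIGNATURE)


def recognize_xml_format(data: bytes) -> bool:
    # Skip an optional UTF-8 BOM and leading whitespace, then require an XML
    # declaration plus the SpreadsheetML namespace in the first 4 KB.
    body = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    body = body.lstrip()
    return body.startswith(b"<?xml") and SPREADSHEETML_NS in body[:4096]


def is_text_only(data: bytes) -> bool:
    # Assumption: "text" means valid UTF-8 without NUL bytes.
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def detect(data: bytes) -> Target:
    """Content-first detection; the file name is never consulted."""
    if recognize_binary_format(data):
        return Target.CALC_BINARY
    if recognize_xml_format(data):
        return Target.CALC_XML
    if is_text_only(data):
        return Target.WRITER_TEXT
    return Target.UNKNOWN
```

Because the extension plays no part, the same bytes are routed the same way whether the file is called .xls or .xml.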


The XML file extension has no meaning for the normal user: the file can be an
HTML page, a spreadsheet, a configuration file or any kind of file. People need
to know which application is going to open their files in order to act.

Normal people don't open formatted files (like XML) to edit them as
plain text. They expect to edit the data in them, not the file. So OOo must
(not should) detect the file format correctly.

As any power user who has blamed MS for hiding file extensions will agree, an
application must be able to open all supported files without requiring a
specific extension for them. The extension is only informative and can't
guarantee that the data inside is valid.

Currently hundreds (maybe more) of developers generate spreadsheet reports in
Microsoft's XML format because it's very simple: just plain, uncompressed
XML. They give those files the XLS extension so that users open them in a
spreadsheet application, BUT OpenOffice.org, being so much smarter, refuses
to open a spreadsheet file (in XML format) stored in a file with the XLS
extension.

If all file formats in use were XML, how would you expect a user to make his
computer open those files with a specific application? File extension
association is a user-configurable option, but recognizing the file format is
the job of the application.

The problem should stay open and must be assigned to the filter detection.

Comment 16 bobharvey 2006-03-08 10:56:37 UTC
I have to agree.
If it is a major objective of OOo to provide file exchange with Microsoft
formats, then no matter how stupid and cavalier Microsoft may be in the
allocation of extensions, OOo should do what the user expects.
Comment 17 frank 2006-03-09 10:55:33 UTC
Hi,

Excel is a single application and therefore handles such files by opening them
itself. OOo Calc is part of an integrated office suite that shares a lot of
code, especially the file open dialogs. So the filter detection decides how a
file is opened. The first thing it looks at is the extension and, based on
that, the header of the file. So the filter detection has no choice but to
classify the XML file named as XLS as text. This is how it works.
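A simplified Python model of the extension-first order described here, to show why an Excel XML file named .xls ends up as text. This is an illustration under assumptions, not OOo's actual detection code; the function name and result labels are hypothetical:

```python
from pathlib import PurePath

OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
SPREADSHEETML_NS = b"urn:schemas-microsoft-com:office:spreadsheet"


def extension_first(filename: str, data: bytes) -> str:
    """Ask only the filter that matches the extension, then fall back to text."""
    ext = PurePath(filename).suffix.lower()
    if ext == ".xls" and data.startswith(OLE2_SIGNATURE):
        return "Calc (Excel binary)"
    if ext == ".xml" and SPREADSHEETML_NS in data[:4096]:
        return "Calc (Excel 2003 XML)"
    # Whatever the extension's own filter rejects falls through to plain text.
    return "Writer (plain text)"
```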

Please have a look at issue 8967, which is basically a duplicate of this one.

Frank

*** This issue has been marked as a duplicate of 8967 ***
Comment 18 frank 2006-03-09 10:55:50 UTC
closed double
Comment 19 intersol 2006-03-09 11:28:56 UTC
I agree that it's a good thing that OOo shares the open dialog code for all
applications.

You are wrong about this: "...filterdetection has no choice as to define the XML
file named as XLS as text. This is how it works.". If the filter detection parses
the header, it can parse the XML header too in order to detect the correct file type.
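Office 2003 marks its XML files with a processing instruction such as <?mso-application progid="Excel.Sheet"?> near the top, so reading the XML header can tell a spreadsheet from a Word 2003 document without relying on the extension. A minimal Python sketch, assuming the first 4 KB contain that instruction (or, as a fallback, the SpreadsheetML namespace); the function name and the progid-to-application table are illustrative, not OOo's detection service:

```python
import re
from typing import Optional

# Processing instruction Office 2003 writes into its XML files,
# e.g. <?mso-application progid="Excel.Sheet"?>
_MSO_APPLICATION = re.compile(rb'<\?mso-application\s+progid="([^"]+)"\s*\?>')

# Illustrative mapping from Office progid to the OOo application to use.
_PROGID_TO_APP = {
    b"Excel.Sheet": "Calc",
    b"Word.Document": "Writer",
}

SPREADSHEETML_NS = b"urn:schemas-microsoft-com:office:spreadsheet"


def office_xml_target(data: bytes, limit: int = 4096) -> Optional[str]:
    """Return "Calc" or "Writer" for an Office 2003 XML file, else None."""
    head = data[:limit]
    match = _MSO_APPLICATION.search(head)
    if match:
        return _PROGID_TO_APP.get(match.group(1))
    # Fallback: the SpreadsheetML namespace on its own also marks Excel XML.
    if SPREADSHEETML_NS in head:
        return "Calc"
    return None
```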

Because the filter detection can receive any kind of file, it must be able to
make a good decision.

The case of Excel XML saved with the XLS extension is one of the cases that can
be resolved without breaking the current behavior for existing files.

The linked issue is a very close relative of this one, but with a big
difference: it's about CSV (tab-separated) and plain TXT files, so it can't be
resolved in a way that satisfies everyone.

Should I reopen this issue?
Comment 20 intersol 2006-03-10 10:40:25 UTC
Based on the comments in issue 18228, I conclude that this is a bug in the
filter detection. So the issue must be reopened.

Issue 18228 clearly states that the filter detection identifies a file by
reading its header, and that the file extension is not important.

A new, better description of this bug is: the filter detection does not
correctly detect the Excel XML file format if the file extension is XLS or
anything other than XML.
Comment 21 andreas.schluens 2006-03-10 12:25:39 UTC
Sorry - but these are the problems of this file:
a) It's not xls, because it's not a binary file.
b) It's not XML, because it contains 3 binary bytes.
c) If MS really generates such files, we might have to react and try to work
around this "bug" of MS.

The only bug I would accept here is this: if these 3 bytes are removed from the
file and the extension is not set correctly, the file is loaded into Writer as
plain text every time. That's because our txt filter has no real detection and
can't distinguish between ASCII, all Unicode-formatted files and even binaries.
This can't really be solved, because nobody can write such a detection: it
would have to check all 64K code pages existing in this world, which by the way
would increase detection time to "never ending" ;-)
The only workaround for this: ask the text filter explicitly at the end of the
detection process, so that any other detection service can overrule it. That
would help to solve other detection problems as well.
I will file such an issue for myself and try to fix it for OOo 2.0.3. (The
number of this new issue will be set as a dependency of this issue later.)
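One way to read the proposed workaround is a chain of specific detectors that the catch-all text detector can never pre-empt, because it is consulted only after all of them have declined. A minimal Python sketch, assuming each detector is a function that returns a label or None; the detector names and labels are hypothetical:

```python
from typing import Callable, List, Optional

# A detector inspects the leading bytes of a file and returns a label,
# or None to decline.
Detector = Callable[[bytes], Optional[str]]

UTF8_BOM = b"\xef\xbb\xbf"
SPREADSHEETML_NS = b"urn:schemas-microsoft-com:office:spreadsheet"


def excel_binary(data: bytes) -> Optional[str]:
    if data.startswith(bytes.fromhex("D0CF11E0A1B11AE1")):
        return "Calc: Excel binary"
    return None


def excel_xml(data: bytes) -> Optional[str]:
    body = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    if (body.lstrip().startswith(b"<?xml")
            and SPREADSHEETML_NS in body[:4096]):
        return "Calc: Excel 2003 XML"
    return None


def plain_text(data: bytes) -> Optional[str]:
    # Like the txt filter described above, this accepts anything it is given.
    return "Writer: plain text"


def run_detection(data: bytes, specific: List[Detector],
                  fallback: Detector) -> Optional[str]:
    """Ask every specific detector in order; only then ask the fallback."""
    for detector in specific:
        label = detector(data)
        if label is not None:
            return label
    return fallback(data)
```

Putting a catch-all detector ahead of the specific ones shows the problem the workaround avoids: it claims every file, including valid Excel XML.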
Comment 22 svante.schubert 2006-03-10 12:34:28 UTC
Hello intersol,

let us gather some facts concerning this file format of your example document.

1) Although it has the suffix XLS, it is clearly not a binary XLS file.
2) When you take a look into the W3C XML specification
http://www.w3.org/TR/2004/REC-xml-20040204/#sec-well-formed

You see that a well-formed XML document starts with a prolog
[1]   	document ::=  prolog element Misc*
[22]   	prolog	   ::=   XMLDecl? Misc* (doctypedecl Misc*)?
[23]   	XMLDecl	   ::=   '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

As you see, for a well-formed XML document there is no arbitrary prefix before
the prolog. A document that has these bytes is therefore not well-formed and
thereby cannot be valid at all.

Apart from these three bytes, though, the document would be well-formed and
most likely valid.
So you have brought up an interesting bug of Microsoft Office that we
should take into account to ease the life of StarOffice / OpenOffice.org users.
Comment 23 intersol 2006-03-10 12:46:23 UTC
Finally some really positive feedback. Here is what I discovered just now:
the header is something very common, a byte order mark (BOM), so this file is
valid XML.

Take a look at http://en.wikipedia.org/wiki/Byte_Order_Mark

As far as I know, this mark is supported by the XML specification.

PS. I've tested with and without the mark, and Excel accepts both files.
The result of my investigation with OOo (same file with and without the 3-byte mark):

test_with_mask.xml    -> opens correctly in OOo 2.0 (opens Calc)
test_without_mask.xml -> opens correctly in OOo 2.0 (opens Calc)
test_with_mask.xls    -> opens the ASCII Filter Options dialog (BAD)
test_without_mask.xls -> opens as plain text in Writer (BAD)

The correct behaviour would be the same for all four files (as in the first
case). The mark is optional and indicates the text encoding of the file (so the
file is text). I think the bug is that after detecting that the file is text,
the detection doesn't test for the XML format.

One thing is clear: OOo behaves differently for different file extensions,
even if the content is the same.
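To reproduce the table above, the four files can be generated from one SpreadsheetML payload, so that they differ only in the BOM and the extension. A minimal Python sketch; the function names are hypothetical and the file names simply mirror the ones in the table:

```python
from pathlib import Path
from typing import Dict

UTF8_BOM = b"\xef\xbb\xbf"


def make_variants(xml_payload: bytes, stem: str = "test") -> Dict[str, bytes]:
    """Return {file name: content} for with/without BOM and .xml/.xls."""
    # Start from a BOM-free payload so the "without" files really have none.
    if xml_payload.startswith(UTF8_BOM):
        xml_payload = xml_payload[len(UTF8_BOM):]
    variants = {}
    for ext in (".xml", ".xls"):
        variants[f"{stem}_with_mask{ext}"] = UTF8_BOM + xml_payload
        variants[f"{stem}_without_mask{ext}"] = xml_payload
    return variants


def write_variants(variants: Dict[str, bytes], directory: str) -> None:
    """Write each variant into `directory`, which must already exist."""
    for name, content in variants.items():
        (Path(directory) / name).write_bytes(content)
```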



Comment 24 intersol 2006-03-10 12:51:30 UTC
Here is something else interesting, related to encodings and XML:
http://www.opentag.com/xfaq_enc.htm
Comment 25 frank 2006-03-10 13:06:48 UTC
Hi,

regarding the three bytes, I've saved a file from within Excel 2003 as Excel
XML and this file clearly does not have these bytes. So the question is: how
did these bytes get into your file???

Frank
Comment 26 intersol 2006-03-10 13:44:21 UTC
I'm testing on Microsoft Office Professional 2003, Romanian localized version,
so this could vary by locale.

Excel accepts both kinds of files and keeps the mark on save.
Comment 27 svante.schubert 2006-03-10 17:47:17 UTC
I must admit I was wrong and he is completely right.
The so-called Byte Order Mark (BOM) is part of the Unicode specification and
may occur:
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G31703

"The BOM is not considered part of the content of the text."

Therefore it is still well-formed XML.
His link gave a good summary of the possible entities.
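Because the BOM is not part of the text, a reader can simply drop it before looking at the prolog; in Python the "utf-8-sig" codec does exactly that when decoding. A minimal sketch covering UTF-8 only (a UTF-16 file carries FF FE or FE FF instead and needs a different codec); the function names are hypothetical:

```python
import codecs


def decode_xml_bytes(data: bytes) -> str:
    """Decode UTF-8 XML, dropping a leading BOM if there is one."""
    # 'utf-8-sig' strips a leading EF BB BF; without a BOM it acts like 'utf-8'.
    return data.decode("utf-8-sig")


def has_xml_declaration(data: bytes) -> bool:
    """True if the document, ignoring a UTF-8 BOM, starts with '<?xml'."""
    return decode_xml_bytes(data).startswith("<?xml")


if __name__ == "__main__":
    doc = b'<?xml version="1.0"?><Workbook/>'
    print(has_xml_declaration(doc), has_xml_declaration(codecs.BOM_UTF8 + doc))
```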

Thanks for giving us this hint.
Comment 28 intersol 2006-03-12 12:37:38 UTC
So there are two different issues here: one related to opening Excel XML files
when they don't have the XML extension (this issue), and a new one, "Incorrect
recognizing of XML files with BOM (Byte-order-mask)", issue #63077.

I've created the new issue #63077 for the BOM problem. This issue must stay
focused on opening Excel XML files, which will still not work even if the BOM
problem is resolved.
Comment 29 frank 2006-05-11 12:32:17 UTC
Hi Andreas,

please have a look at this Issue again. Proceed as needed and close if nothing
can be done.

Frank
Comment 30 frank 2006-05-11 12:32:50 UTC
.
Comment 31 intersol 2006-05-11 13:42:33 UTC
This bug is very important because it is blocking the transition to OOo
2.0 for many users. The number of files received by email in the new Microsoft
formats is growing.

Just to make it more complicated, a few days ago I received another .doc
generated with Microsoft Office 2003 that will not open in OOo in any way.
Looking at the file, it seems to be an archived version of the XML format.
I will attach it too.
it too.
Comment 32 intersol 2006-05-11 13:47:26 UTC
Created attachment 36401
DOC saved from Office 2003 - file format it's a sort of archived xml
Comment 33 andreas.schluens 2006-05-18 14:16:31 UTC
AS->SUS: Please make sure that such a BOM does not disturb detection/loading of
such Excel 2003 XML files. THX.
Comment 34 Joost Andrae 2006-06-08 15:50:07 UTC
JA: the attached Excel 2003 file can be loaded in Calc if you manually select
the filter "Microsoft Excel 2003 (*.xml)". I don't see an issue that needs to be
fixed.
Comment 35 Mathias_Bauer 2006-06-08 16:30:45 UTC
IBTD (I beg to differ).
Though I agree that the technical arguments of fst and others are correct, this
is not a technical issue but a usability issue.

If people want to replace MSO by OOo they have some expectations we need to take
into consideration. IMHO the format problems explained here are even more
annoying than the UI differences we tried to remove in OOo2.0.

Especially in the current case, where it should be easy to fix:

If it is correct that "scalc.exe filename.xls" opens the file correctly, the
only thing we need to do is register scalc.exe as the application for "xls"
files instead of soffice.exe, which is what we register now. This works with
OOo2.0.2; IIRC it doesn't work in earlier versions.

So this could be solved as a registration issue.

@intersol: please try to verify this. Does it help to change the registration of
"xls" files from soffice.exe to scalc.exe? 
Comment 36 intersol 2006-06-08 21:31:01 UTC
I confirm that setting associations of XLS to scalc.exe solve the problem reported.

I will also copy here an email I received from Hoffmann Gisbert. Maybe some
of you already received a copy. He has a point! Personally, I'm about to
lose another client because of this issue.

-- begin cite
I write to you because I do not want to create an issue for the same problem
again (and because I would need training before I could create an issue). The
fact is that certain xls files are correctly opened in Excel, while with OO 2.x
and SO8 they are opened with Writer. Details are described in the issues.
Obviously this is a problem that has been discussed for years (Issue 8967). The
consequences seem not to be clear to you.

The argument of the developers is that MS does not show the correct behavior,
but OO 2.x or SO 8 do. OO/SO should show the same behavior as MS, even if it's
wrong, no matter what the W3C XML specifications say. You will not force MS and
the providers of reporting tools (see below) to act as you want and as would
be correct in your opinion. We (and you should too) want to replace MS.

All reporting tools in the BI market (Crystal Reports, Cognos ReportNet,
Information Builders etc.) provide xls output in the format described in the
issues. This output opens Excel automatically and correctly, so the user can
further manipulate the output as needed. We can make these tools start
OO/SO instead of Excel. But then it's opened in Writer or Writer/Web
respectively. The user can try to cut the output in Writer, start a
Calc document and paste it into Calc, headers and footers separately from the
main page, and adjust all cells. No user will accept this crippled way of
working. It worked correctly up to SO 6/OO 1.x.

For your information: I fought for more than 2 years to replace MS with OO/SO.
The result of your useless discussions for years now is that I lost the race and
our company has to update more than 500 MS licences. That's it. Good night OO/SO.
--end of cite
Comment 37 intersol 2006-06-08 22:30:59 UTC
Created attachment 37016
sample document generated by common web reporting tools.
Comment 38 intersol 2006-06-08 22:35:21 UTC
The new file I've attached was generated using one of the well-known reporting
tools available. In this case the document will not open correctly with
scalc.exe, but it will open with soffice.exe as HTML.

Opening it with scalc.exe opens an empty spreadsheet.
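According to the comment above, this file opens as HTML, i.e. it is HTML content saved with an .xls extension, so content sniffing can recognize it as HTML before any extension-based guess. A minimal Python sketch, assuming it is enough to look for a doctype, <html> or <table> tag at the start of the file after an optional BOM and whitespace (files that begin with a comment would need more work); the function name is hypothetical, and which filter should then load the file, Calc's HTML import or Writer/Web, is a separate decision:

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Case-insensitive match for a typical HTML start, after optional whitespace.
_HTML_START = re.compile(rb"\s*(<!doctype\s+html|<html|<table)", re.IGNORECASE)


def looks_like_html(data: bytes, limit: int = 1024) -> bool:
    """True if the first `limit` bytes start like an HTML document or table."""
    head = data[:limit]
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    return _HTML_START.match(head) is not None
```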
Comment 39 Mathias_Bauer 2006-06-29 16:41:56 UTC
If you get an empty document from an HTML document by loading it in scalc.exe it
is a bug in Calc. IIRC Calc by default uses the "WebQuery" filter, but that
should make only minor differences.

We should create a separate issue for this as it needs to be fixed in Calc. I
will work on the other necessary changes.
Comment 40 Mathias_Bauer 2008-01-11 10:40:05 UTC
As we meanwhile changed the "xls" registration from soffice.exe to scalc.exe
this issue is fixed. Can you please verify this in a recent version of OOo?