Issue 33331 - excel to html conversion
Summary: excel to html conversion
Status: CLOSED NOT_AN_OOO_ISSUE
Alias: None
Product: App Dev
Classification: Unclassified
Component: api
Version: 3.3.0 or older (OOo)
Hardware: All Linux, all
Importance: P4 Trivial
Target Milestone: ---
Assignee: stephan.wunderlich
QA Contact: issues@api
URL:
Keywords:
Duplicates: 33282
Depends on:
Blocks:
 
Reported: 2004-08-24 06:15 UTC by pmakde
Modified: 2013-02-24 21:09 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
MS Excel spreadsheet with single sheet (46.50 KB, application/vnd.ms-excel)
2004-08-24 06:19 UTC, pmakde
The html output for excel spreadsheet(appenda.xls) (25.22 KB, text/html)
2004-08-24 06:20 UTC, pmakde
html file as it is produced by excel (31.43 KB, text/html)
2004-08-24 13:00 UTC, stephan.wunderlich
XHTML export filter (state before SO8 EA) (70.71 KB, application/x-compressed)
2004-08-24 16:04 UTC, svante.schubert

Description pmakde 2004-08-24 06:15:43 UTC
I am loading an MS Excel spreadsheet (containing only one sheet) and converting it to
HTML using the OpenOffice SDK API.
But when OpenOffice saves it as HTML, the sheet name is missing (the HTML does not
have an H1 tag with the sheet name).
This happens when the Excel spreadsheet has only one sheet. If a spreadsheet has
multiple sheets, all sheet names are generated properly (in the overview).

Please find attached the sheet and its HTML output.
Comment 1 pmakde 2004-08-24 06:19:02 UTC
Created attachment 17303 [details]
MS Excel spreadsheet with single sheet
Comment 2 pmakde 2004-08-24 06:20:03 UTC
Created attachment 17304 [details]
The html output for excel spreadsheet(appenda.xls)
Comment 3 stephan.wunderlich 2004-08-24 09:14:01 UTC
SW->pmakde: first of all, this is not an API issue, since the same happens when you
use the Save As entry in the File menu. Second, it is supposed to be that way: the
headers are meant for navigating through the saved sheets ... you
wouldn't create a table of contents when you only have one chapter, would you?
... and last but not least, even if this were an issue, it would certainly be no
P1 ;-) ... I set it to invalid because I don't think that it is an issue at all.
Comment 4 pmakde 2004-08-24 11:57:13 UTC
I think you didn't get the point. A sheet name in a spreadsheet (even one with
only a single sheet) carries information: it says something about the
contents. I think we just can't ignore that part. I think losing any data from
the source document should be a big issue.

I checked the same with Microsoft Excel (Save as Web Page) and the HTML
output had the sheet name information.

Comment 5 stephan.wunderlich 2004-08-24 12:59:55 UTC
SW->pmakde: mmm I just opened your document in excel and saved it as webpage ...
no sheet name visible when I open it in a browser :-( ... I attached the saved file.
Comment 6 stephan.wunderlich 2004-08-24 13:00:52 UTC
Created attachment 17313 [details]
html file as it is produced by excel
Comment 7 pmakde 2004-08-24 13:25:44 UTC
I can see the sheet name in your HTML output file. If you open it with Notepad or
something similar, you can see that the HTML source has the sheet name.
The sheet name data is there in the HTML source, so it is not lost by converting
the xls file to HTML.

I mean we could generate the sheet name as part of the output (as an <H1> tag or
something like that).

Comment 8 svante.schubert 2004-08-24 14:25:37 UTC
In our OOo XML document the spreadsheet name is saved as an
attribute of the table:

/office:document/office:body/table:table/@table:name

How do you suppose we should map it to XHTML tables?

The XHTML table is described in 'xhtml1-strict.dtd' as:

<!ELEMENT table (caption?, (col* | colgroup*), thead?, tfoot?, (tbody+ | tr+))>
<!ATTLIST table
   %attrs;
   summary %Text; #IMPLIED
   width %Length; #IMPLIED
   border %Pixels; #IMPLIED
   frame %TFrame; #IMPLIED
   rules %TRules; #IMPLIED
   cellspacing %Length; #IMPLIED
   cellpadding %Length; #IMPLIED
>

The name is NOT the summary, which is meant for other information.
Remember that the OpenOffice.org XML format in general contains more
information than (X)HTML.
Every transformation from OOo XML to (X)HTML is therefore a
lossy transformation (filter).
We could write it out as a comment in the XHTML export filter, which has been
usable since StarOffice7.

Remember that the first rule of a filter is to keep the exported document as
close to the original as possible to make a round trip (reloading) possible.
Inserting additional elements such as headings (e.g. <h1>) would break this rule.
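
For readers who want to see where the name lives, the table:name attribute named in the XPath above can be read with the JDK's own XML APIs. This is only a minimal sketch: the class name, the sample document, and the two namespace URIs (assumed here to be the OOo 1.x ones) are illustrative assumptions and are not taken from this issue or its attachments.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: reads the sheet name from an OOo XML document.
final class SheetNameXPath {
    // Assumption: namespace URIs of the OOo 1.x XML format.
    static final String OFFICE_NS = "http://openoffice.org/2000/office";
    static final String TABLE_NS = "http://openoffice.org/2000/table";

    // Made-up one-sheet document, just enough to exercise the XPath.
    static final String SAMPLE =
        "<office:document xmlns:office='" + OFFICE_NS + "'"
      + " xmlns:table='" + TABLE_NS + "'>"
      + "<office:body><table:table table:name='SIGNATUR_SHEET'/></office:body>"
      + "</office:document>";

    static String sheetName(String xml) throws Exception {
        // The parser must be namespace-aware, otherwise prefixed names cannot match.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Map the prefixes used in the expression to their namespace URIs.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                if ("office".equals(prefix)) return OFFICE_NS;
                if ("table".equals(prefix)) return TABLE_NS;
                return XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) { return null; }
        });
        // String evaluation returns the value of the first matching attribute.
        return xpath.evaluate(
            "/office:document/office:body/table:table/@table:name", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sheetName(SAMPLE));
    }
}
```

With several sheets, the string result is the name of the first table in document order; a node-set evaluation would be needed to list them all.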
Comment 9 pmakde 2004-08-24 15:24:50 UTC
If we have the sheet name data in some XML file, we can just as well generate it
in the HTML file. We can always put it in such a way that reverting back is
not a problem. Anyway, we are already doing that for a document with multiple sheets.
In my application I need the sheet name (it is important for me) as part of the
HTML output.
Thanks again.

Comment 10 svante.schubert 2004-08-24 16:04:03 UTC
Created attachment 17318 [details]
XHTML export filter (state before SO8 EA)
Comment 11 svante.schubert 2004-08-24 16:05:27 UTC
> If we have sheet name data in some xml file.. we can as well generate it
> in the html file.
We could, but there is no adequate HTML node to put it into. A spreadsheet
is a simple table, and the name of a table is not shown in HTML.

> We can always put it in such way that reverting back should
> not be an problem.
It is a problem, as there is no element/attribute (in general, no node) for it in
HTML. This is what I tried to explain to you earlier.

> Anyway we are doing that for a document with multiple sheets.
Yes, and I personally dislike it very much, as it alters the original document by
adding information that did not exist before.

> In my application I need the sheet name ( its important for me) as part of the
> html output.
I can only offer you a compromise. Install a new version of OpenOffice
(e.g. 1.1.2). Under Tools -> XML Filter Settings -> New, add a new XSLT
transformation, which you will find attached as a ZIP.
- Name might be "SO8 EA XHTML export"
- Choose Application "Calc"
- Name of Filetype "XHTML1.0"
- File Extension "xhtml"

In the field labeled "XSLT for export", enter the path of the unzipped
ooo2xhtml.xsl file.

Via File -> Export you can now export as XHTML, with the table:name written in
a comment.

But you are going to have problems if you try to import it again. Don't forget it is
an export option, not a save option.
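
The idea of writing table:name into a comment can be sketched with the XSLT processor that ships with the JDK. The stylesheet below is NOT the attached ooo2xhtml.xsl; it is a deliberately tiny stand-in, and the class name, the sample input, and the namespace URIs (assumed to be the OOo 1.x ones) are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: emits each table:name as a comment inside a <table>.
final class SheetNameCommentXslt {
    // Minimal illustrative stylesheet, not the real export filter.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:office='http://openoffice.org/2000/office'"
      + " xmlns:table='http://openoffice.org/2000/table'"
      + " exclude-result-prefixes='office table'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<body><xsl:apply-templates select='//table:table'/></body>"
      + "</xsl:template>"
      + "<xsl:template match='table:table'>"
      + "<table>"
      + "<xsl:comment>@table:name=<xsl:value-of select='@table:name'/></xsl:comment>"
      + "</table>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Made-up one-sheet input document.
    static final String SAMPLE =
        "<office:document xmlns:office='http://openoffice.org/2000/office'"
      + " xmlns:table='http://openoffice.org/2000/table'>"
      + "<office:body><table:table table:name='SIGNATUR_SHEET'/></office:body>"
      + "</office:document>";

    static String transform(String xml) throws Exception {
        // Compile the stylesheet, then run it over the input document.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE));
    }
}
```

Because a comment is not rendered by browsers, this keeps the visible page unchanged while still carrying the sheet name in the markup.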
Comment 12 svante.schubert 2004-08-24 16:11:38 UTC
Last note:
1) You need to install a JRE/JDK 1.4, as this optional feature uses its XSLT engine.
2) If you install the new OOo 1.1.2, don't forget to enable the optional "XSLT
Sample Filter" during setup.


Comment 13 svante.schubert 2004-08-24 18:24:57 UTC
Just put myself on CC
Comment 14 bettina.haberer 2004-08-25 09:29:59 UTC
*** Issue 33282 has been marked as a duplicate of this issue. ***
Comment 15 pmakde 2004-08-25 10:16:41 UTC
Thanks sus. I could get an XML file based on your filter settings. But
I need to generate HTML output (with the sheet name). I am using Java OO SDK API
calls to do this. Is there any solution to this? Thanks again for your help.
Comment 16 svante.schubert 2004-08-25 11:42:52 UTC
So far so good, you receive XML.
But you should receive XHTML with the sheet name included as a comment (I
added it for you in the attached stylesheets).

Anyway, we shouldn't discuss this in a RESOLVED INVALID issue (or in an issue
at all). Please switch to a newsgroup such as openoffice.xml.dev if you have problems
using the XSLT transformation via OOo 1.1.2, or to an API-related group if you do
not know how to access this filter via the API.
That way others can benefit from your questions as well.

I will close this issue, as there won't be any further fix for the
reasons mentioned above.
Comment 17 pmakde 2004-08-25 15:31:17 UTC
I don't see any comment included in the XHTML file. Am I missing something here?
Thanks
Comment 18 svante.schubert 2004-08-25 15:50:42 UTC
Yes, you missed something somehow.
I installed the XSLT stylesheets from the zip on another computer with
StarOffice7 pp3, following the procedure I described, and it works fine again.

It creates valid strict XHTML1.0 containing the following:

<table border="0" cellspacing="0" cellpadding="0" class="ta1">
<!--@table:name=SIGNATUR_SHEET-->
<colgroup>
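
An application that needs the sheet name could read it back from the exported comment. The following is a minimal sketch using the JDK's DOM parser; the class name and the shortened sample markup are assumptions for illustration. It also assumes input without a DOCTYPE, since a real strict XHTML file declares one and the parser may then try to load the DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: collects sheet names from "@table:name=" comments.
final class SheetNameFromComment {
    static final String MARKER = "@table:name=";

    // Shortened sample modelled on the exported table start shown above.
    static final String SAMPLE =
        "<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\" class=\"ta1\">"
      + "<!--@table:name=SIGNATUR_SHEET-->"
      + "<colgroup/></table>";

    static List<String> sheetNames(String xhtml) throws Exception {
        // The default DOM builder keeps comment nodes in the tree.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        List<String> names = new ArrayList<>();
        collect(doc, names);
        return names;
    }

    // Depth-first walk that picks up every comment starting with the marker.
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.COMMENT_NODE) {
            String text = node.getNodeValue();
            if (text.startsWith(MARKER)) {
                names.add(text.substring(MARKER.length()));
            }
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sheetNames(SAMPLE));
    }
}
```

Once the name is recovered this way, the application is free to render it however it likes (for example as a heading) without changing the export filter itself.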
Comment 19 pmakde 2004-08-25 16:03:40 UTC
Sorry, I didn't put my question properly. I wanted to know how I can get an HTML
file from the XHTML file (generated using the new filter).
Comment 20 svante.schubert 2004-08-26 15:07:31 UTC
The question is still ambiguous. Do you want to know how to reimport the XHTML
into OOo?
This would be done with an XHTML import filter, which is currently not provided.
But in my opinion it would be the wrong approach anyway, as you should
work top-down towards the XHTML. That means you would only edit the Office document
and export it to XHTML when it is ready. Otherwise you might lose information in HTML.

If you still have questions, please post them on the appropriate mailing list:
http://www.openoffice.org/mail_list.html