Issue 107201 - loading large html file in scalc hangs
Summary: loading large html file in scalc hangs
Status: CLOSED FIXED
Alias: None
Product: Calc
Classification: Application
Component: open-import
Version: OOO310m9
Hardware: PC Windows Vista
Importance: P3 Trivial with 1 vote
Target Milestone: 3.4.0
Assignee: kla
QA Contact: issues@sc
URL:
Keywords: oooqa, regression
Depends on:
Blocks:
 
Reported: 2009-11-25 12:20 UTC by b_ambrosius
Modified: 2017-05-20 10:30 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
large html file that doesn't load in scalc (7.83 MB, text/html)
2009-11-25 12:26 UTC, b_ambrosius
html test file (2.00 MB, text/html)
2011-02-25 10:06 UTC, tkramer3

Description b_ambrosius 2009-11-25 12:20:01 UTC
When I load a large html file in scalc, the program always crashes eventually.

The bug is not related to 7553 because my file doesn't contain linked graphics,
and I'm loading in scalc, not in swriter.

The large html file itself is all right; it loads in a few seconds in MS Excel
2002 (yes, 2002). I would have attached my file but it appears that I cannot do
that here.

I'm using OOO310m11 (build:9399) which is not in the pick list.
Comment 1 b_ambrosius 2009-11-25 12:26:56 UTC
Created attachment 66337
large html file that doesn't load in scalc
Comment 2 tkramer3 2009-11-26 14:31:30 UTC
Loading somewhat bigger HTML files in scalc is slow. Probably the file was very
big and the patience of the user ran out or some OS timeout finished the
loading? There are more reports on slow loading of HTML files in scalc. Imho
some conversion process is very slow/clumsy; it would be a great advantage if
this was resolved. Also, copying HTML and using "Paste Special" to insert it
into an scalc table is slow, and if the file is big enough the result looks
broken: all data from the end of the table is dropped into one cell.
Comment 3 b_ambrosius 2009-12-01 03:21:16 UTC
tkramer asks: "Probably the file was very
big and the patience of the user ran out or  some OS timeout finished the
loading?"
That is possible. Please try to load the attached large html file, then you can
confirm the issue.
Comment 4 tomwb 2009-12-02 01:26:57 UTC
b_ambrosius:

I could not get the file to open in Calc. OOO310m19 (build:9420) on Vista SP2.
I started the process to open the file in Calc at about 5:30 am. When I arrived
home at 8 pm, the file still had not opened (14.5 hours!). I killed the process,
which was showing memory usage of over 300 MB on a 4 GB machine. OO
Web/Writer will open it, but it takes a minute or so.  Also Firefox 3.5.5 needed
a minute or better to open it.

TomW
Comment 5 b_ambrosius 2009-12-03 08:25:32 UTC
Thanks TomW. You have confirmed the issue. Now, if you have MS Excel you can
also confirm that the file loads in 5 seconds or less, at least it does in my
Excel 2002(!) version.
Comment 6 Regina Henschel 2009-12-03 12:27:16 UTC
I cannot confirm it for DEV300 m65. What filter do you use?
Comment 7 b_ambrosius 2009-12-04 15:57:06 UTC
I'm not aware of any filter. I just load the html file with File.. Open (in
scalc), in the ooo version I mentioned in my first comment.
Comment 8 Regina Henschel 2009-12-04 20:05:20 UTC
If you do not use a filter, the html page will not open in Calc but in Writer/Web.
You find the filter in the open dialog. It is the drop-down list beneath the
file name field. Open it and scroll down to the filter "Web Page Query
(OpenOffice.org Calc) (*.html;*.htm)". It is the last one in the group of
spreadsheet filters. The filter "HTML Document (OpenOffice.org Calc)
(*.html;*.htm)" might work too.

Opening your attached file in Writer/Web takes <1 minute here.
Comment 9 b_ambrosius 2010-09-18 11:08:16 UTC
Regina, 
Thank you for your comments. A long time has passed since your last comment,
which has its reasons. I'm back now.
I now use version 3.2.1, OOO320m18 (Build:9502). Dutch language. In that version
your solution does indeed load the html file in < 1 minute. However I cannot
load it in calc. While within scalc, I load the document with the filter
Webpagina's (*.html; *.htm, etc.), and it opens with Writer/Web. With the filter
HTML Document (OpenOffice.org Writer) it opens in Writer in < 1 minute, but it
takes several minutes before the application responds to a menu mouse click. In
both cases the markup (color) is lost.

I can only see these two filters for html documents. Perhaps I've misread your
instructions. If so, please advise.

So the issue still stands. I'm sorry to be using my old Excel 2002 because of this.

Comment 10 Regina Henschel 2010-09-18 12:15:30 UTC
Please look carefully at the list of file types. You will see some dashed lines,
which group the filters according to module. You have to scroll down to the
section which contains the spreadsheet filters, beginning with "ODF
spreadsheet". In that section you will find the filter "Web Page Query
(OpenOffice.org Calc) (*.html;*.htm)". Please contact a mailing list or forum to
get help using OOo.
Comment 11 b_ambrosius 2010-09-20 12:46:01 UTC
Regina, thank you for your patience with this stupid user who does not read the
documentation carefully, and for the promptness of your response. I had not
noticed that the box was scrollable. Now I've found "Web Page Query
(OpenOffice.org Calc) (*.html;*.htm)" [no dashed lines for grouping modules
however in my version: version 3.2.1, OOO320m18 (Build:9502). Dutch language.]
and the file IS loaded with a reasonable rendering in < 1 minute. Hurray for
ooo. I've also tried "HTML Document (OpenOffice.org Calc) (*.html;*.htm)" but
the results are worse (small columns).
One issue remains: I have some markup (font, different colored lines) in my html
file which is not rendered. Perhaps scalc cannot understand my css class
attribute for the <tr> element?
Comment 12 Regina Henschel 2010-09-20 19:32:33 UTC
The import of css is not implemented. The feature request is in issue 70981.
Perhaps you want to vote for it?

I have tested the filter "Web Page Query (OpenOffice.org Calc) (*.html;*.htm)"
now again. In OOo3.2.0 (OOo320m12) the file opens in about 25 seconds. In
DEV300m87 it needs nearly 4 minutes to open. Set keyword regression.

The problems might be connected to issues 111579 and 96958?
Comment 13 b_ambrosius 2010-09-21 04:40:57 UTC
Regina, thanks. I have voted 70981. For me, the issue is closed now.
Comment 14 niklas.nebel 2011-02-11 18:14:51 UTC
reassigning
Comment 15 niklas.nebel 2011-02-11 18:16:14 UTC
The original problem in 3.1 was because of issue 86650.

The performance regression after 3.2 was caused by the fix for issue 48303. It's
now fixed again in CWS "dr78".
Comment 16 tkramer3 2011-02-25 10:06:11 UTC
Created attachment 75912
html test file

I tested this again in scalc with the attached file, using OOo version 3.3 under Debian Linux AMD64. Loading is still slow (it took about 45 seconds), and after the import the data from the end of the file ends up in a single cell at the end of the scalc document. Under Windows, with MSO, this file loads in a few seconds.
Comment 17 niklas.nebel 2011-02-28 18:01:02 UTC
Reassigning to QA for verification.
tkramer3, this is all in a CWS build so far, not in 3.3. See http://wiki.services.openoffice.org/wiki/ChildWorkSpace for general information on child workspaces.
Comment 18 kla 2011-03-11 10:18:50 UTC
First, no crash occurs anymore, and the "loading time" is more than acceptable for me -> verified
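
For anyone who wants to repeat the load-time comparisons from this issue without the attachments, the following is a minimal sketch in Python that generates a large single-table HTML file. It is not derived from the attached files: the layout of those files is not described here, so the row and column counts, the cell contents, the output file name, and the helper names build_html_table and write_test_file are all assumptions chosen for illustration.

```python
import html
import os
import tempfile


def build_html_table(rows: int, cols: int) -> str:
    """Return an HTML document containing one rows x cols table.

    Cell text is a hypothetical placeholder ("r<row>c<col>"); the real
    attachments in this issue may look quite different.
    """
    parts = ["<html><body><table>"]
    for r in range(rows):
        cells = "".join(
            f"<td>{html.escape(f'r{r}c{c}')}</td>" for c in range(cols)
        )
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table></body></html>")
    return "\n".join(parts)


def write_test_file(path: str, rows: int = 50_000, cols: int = 10) -> int:
    """Write the generated table to path and return the file size in bytes."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_html_table(rows, cols))
    return os.path.getsize(path)


if __name__ == "__main__":
    # Assumed output location; adjust rows/cols to reach the desired size.
    target = os.path.join(tempfile.gettempdir(), "large_table_test.html")
    size = write_test_file(target)
    print(f"wrote {target} ({size / 1_000_000:.2f} MB)")
```

The generated file can then be opened via the "Web Page Query (OpenOffice.org Calc) (*.html;*.htm)" filter mentioned in comment 8 and the load time compared between builds.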