Apache OpenOffice (AOO) Bugzilla – Issue 107201
loading large html file in scalc hangs
Last modified: 2017-05-20 10:30:42 UTC
When I load a large html file in scalc, the program always crashes eventually. The bug is not related to issue 7553 because my file doesn't contain linked graphics, and I'm loading in scalc, not in swriter. The large html file itself is all right; it loads in a few seconds in MS Excel 2002 (yes, 2002). I would have attached my file but it appears that I cannot do that here. I'm using OOO310m11 (build:9399) which is not in the pick list.
Created attachment 66337 [details] large html file that doesn't load in scalc
Loading somewhat bigger HTML files in scalc is slow. Probably the file was very big and the patience of the user ran out or some OS timeout finished the loading? There are more reports on slow loading of HTML files in scalc. IMHO some conversion process is very slow/clumsy; it would be a great advantage if this was resolved. Also, cutting and "pasting special" HTML into a scalc table is slow, and if the file is big enough, it produces a stupid-looking result: all the data at the end of the table is dropped into one cell.
tkramer asks: "Probably the file was very big and the patience of the user ran out or some OS timeout finished the loading?" That is possible. Please try to load the attached large html file, then you can confirm the issue.
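For anyone who wants to reproduce this without the attachment, a large single-table HTML file can be generated with a short script. This is only a sketch: the function names, the default row and column counts, and the cell contents are assumptions for illustration and do not reflect the size or structure of attachment 66337.

```python
import io


def write_large_html_table(out, rows=50000, cols=10):
    """Write one HTML table with rows x cols plain-text cells to a text stream.

    The defaults are arbitrary; they are not taken from the attached file.
    """
    out.write("<html><head><title>large table test</title></head><body>\n")
    out.write("<table>\n")
    for r in range(rows):
        # Each cell gets a unique label so truncated or merged data is easy to spot.
        cells = "".join(f"<td>r{r}c{c}</td>" for c in range(cols))
        out.write(f"<tr>{cells}</tr>\n")
    out.write("</table>\n</body></html>\n")


def build_large_html(rows=50000, cols=10):
    """Return the generated document as a string (handy for small test sizes)."""
    buf = io.StringIO()
    write_large_html_table(buf, rows, cols)
    return buf.getvalue()
```

To produce a file on disk, open one with open("large_test.html", "w", encoding="utf-8") and pass the handle to write_large_html_table, then load the result in scalc via File.. Open and compare loading times between builds.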
b_ambrosius: I could not get the file to open in Calc. OOO310m19 (build:9420) on Vista SP2. I started the process to open the file in Calc at about 5:30 am. When I arrived home at 8 pm, the file still had not opened (14.5 hours!). I killed the process, which was showing memory usage of over 300 MB on a 4 GB machine. OO Web/Writer will open it, but it takes a minute or so. Also Firefox 3.5.5 needed a minute or better to open it. TomW
Thanks TomW. You have confirmed the issue. Now, if you have MS Excel you can also confirm that the file loads in 5 seconds or less, at least it does in my Excel 2002(!) version.
I cannot confirm it for DEV300 m65. What filter do you use?
I'm not aware of any filter. I just load the html file with File.. Open (in scalc), in the ooo version I mentioned in my first comment.
If you do not use a filter, the html page will not open in Calc but in Writer/Web. You find the filter in the open dialog. It is the drop-down list beneath the file name field. Open it and scroll down to the filter "Web Page Query (OpenOffice.org Calc) (*.html;*.htm)". It is the last one in the group of spreadsheet filters. The filter "HTML Document (OpenOffice.org Calc) (*.html;*.htm)" might work too. Opening your attached file in Writer/Web takes <1 minute here.
Regina, Thank you for your comments. A long time has passed since your last comment, which has its reasons. I'm back now. I now use version 3.2.1, OOO320m18 (Build:9502). Dutch language. In that version your solution does indeed load the html file in < 1 minute. However I cannot load it in calc. While within scalc, I load the document with the filter Webpagina's (*.html; *.htm, etc.), and it opens with Writer/Web. With the filter HTML Document (OpenOffice.org Writer) it opens in Writer in < 1 minute, but it takes several minutes before the application responds to a menu mouse click. In both cases the markup (color) is lost. I can only see these two filters for html documents. Perhaps I've misread your instructions. If so, please advise. So the issue still stands. I'm sorry to be using my old Excel 2002 because of this.
Please look carefully at the list of file types. You will see some dashed lines, which group the filters according to module. You have to scroll down to the section which contains the spreadsheet filters, beginning with "ODF spreadsheet". In that section you will find the filter "Web Page Query (OpenOffice.org Calc) (*.html;*.htm)". Please contact a mailing list or forum to get help using OOo.
Regina, thank you for your patience with this stupid user who does not read the documentation carefully, and for the promptness of your response. I had not noticed that the box was scrollable. Now I've found "Web Page Query (OpenOffice.org Calc) (*.html;*.htm)" [no dashed lines for grouping modules however in my version: version 3.2.1, OOO320m18 (Build:9502). Dutch language.] and the file IS loaded with a reasonable rendering in < 1 minute. Hurray for ooo. I've also tried "HTML Document (OpenOffice.org Calc) (*.html;*.htm)" but the results are worse (small columns). One issue remains: I have some markup (font, different colored lines) in my html file which is not rendered. Perhaps scalc cannot understand my css class attribute for the <tr> element?
The import of css is not implemented. The feature request is in issue 70981. Perhaps you want to vote for it? I have tested the filter "Web Page Query (OpenOffice.org Calc) (*.html;*.htm)" now again. In OOo3.2.0 (OOo320m12) the file opens in about 25 seconds. In DEV300m87 it needs nearly 4 minutes to open. Set keyword regression. The problems might be connected to issues 111579 and 96958?
Regina, thanks. I have voted 70981. For me, the issue is closed now.
reassigning
The original problem in 3.1 was because of issue 86650. The performance regression after 3.2 was caused by the fix for issue 48303. It's now fixed again in CWS "dr78".
Created attachment 75912 [details] html test file
I tested this again in scalc with the attached file, with OOo version 3.3, under Debian Linux AMD64. The loading is still slow (it took about 45 seconds), and at the end of the scalc document, after import, you can see that the data at the end of the file is all loaded into one cell. Under Windows, with MSO, this file loads in a few seconds.
Reassigning to QA for verification. tkramer3, this is all in a CWS build so far, not in 3.3. See http://wiki.services.openoffice.org/wiki/ChildWorkSpace for general information on child workspaces.
First, no crash occurs anymore, and the "loading time" is for me more than acceptable -> verified