Issue 96958 - Problem in importing big HTML files in Writer 3.0
Summary: Problem in importing big HTML files in Writer 3.0
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: OOo 3.0
Hardware: PC Windows Vista
Importance: P3 Trivial with 7 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
Keywords: needmoreinfo, oooqa
Depends on:
Reported: 2008-12-05 16:06 UTC by rbattistoni
Modified: 2013-08-07 14:44 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---

HTML file with table (3.27 KB, text/plain)
2010-03-04 09:08 UTC, pawo509

Description rbattistoni 2008-12-05 16:06:39 UTC
Writer in OpenOffice 3.0 on Windows Vista has many problems opening big HTML
files (> ~10 MB). Sometimes Writer hangs, and the opening time is far too long!

Writer in OpenOffice 2.4.1 doesn't have these kinds of problems.
Comment 1 eric.savary 2008-12-05 16:53:42 UTC
10 Mb of pure (text) HTML????!!!

Please upload your document to a free Internet storage site and post the link here.
Comment 2 rbattistoni 2008-12-05 17:20:23 UTC
OK. This is a rebuilt HTML file of 15 MB.


Note: with this test file, Writer 3.0 doesn't hang, but it takes a long time to
open it. Once the file is open, navigating through it is very difficult. With
another HTML test file Writer hangs, but I cannot give you that file because it
contains confidential information.

Comment 3 eric.savary 2008-12-05 23:36:28 UTC
Well, indeed it's a pure HTML document that is bigger than 16 MB and has 338,339
lines! No wonder it takes time to load!

My tests show that Firefox also takes very long to load it (I interrupted the
load after 9 min 30 s), and MS Word displays an error about a *.css reference
and loads only 6 pages of text.

I wanted to close this as WONTFIX because it's an extreme case that simply
reaches the limits of the software's capacity. But, yes, OOo 2.4.1 loads it at
least as "fast" as Firefox.

@AMA: what have we changed between 2.4.1 and 3.0 in this area?
Comment 4 rbattistoni 2008-12-06 07:28:34 UTC
The reliability of OOo 3.0 in opening this kind of file could be a serious
problem when you want to use OOo as a conversion server (via UNO). It's not only
a problem of loading time, because in some cases OOo 3.0 halts and kills itself
(only in 3.0; in 2.4.1 and 2.4.2 it works).
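For a conversion server like the one described here, one defensive measure is to run each conversion as a separate child process with a deadline, so that a hang costs a bounded amount of time instead of blocking the service. The following is a minimal, generic sketch in Python under that assumption: `run_with_deadline` is a hypothetical helper, and `cmd` stands for whatever command line starts the conversion; no OpenOffice-specific options are assumed. It does not fix the import problem itself.

```python
import subprocess
import sys


def run_with_deadline(cmd: list, timeout_s: float) -> str:
    """Run one conversion job as a child process (hypothetical helper).

    Returns "ok", "failed" (non-zero exit status) or "timeout".
    `cmd` is a placeholder for whatever starts the conversion; no
    OpenOffice-specific command-line options are assumed here.
    """
    try:
        # On timeout, subprocess.run kills the child, waits for it,
        # and then re-raises TimeoutExpired.
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


# Stand-in for a hung conversion: a child that sleeps past the deadline.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_deadline(hung, timeout_s=1.0))  # prints timeout
```

Note that `subprocess.run` only kills the process it started; if the office launcher hands the work to a separate child process, that child may survive and would need to be terminated separately.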

My test file has some problems in its HTML because it's not well formed, and it
doesn't perfectly reproduce the problem I had. I'll fix the file and resend it
ASAP.

I think it's normal that Firefox, as a browser, has some difficulty opening a
very big file, but this shouldn't happen with a text editor. The former is a
browser and has timeout limits; the latter should be able to load big files too.

The message Word 2007 shows about the missing CSS is not a problem in a
conversion process using the Word API. I don't want to start a Word vs. OOo
battle, but in this case Word 2007 works perfectly as a conversion server, and
the previous version of OOo loads big HTML files very fast (but I'd like to use
OSS).

It seems that older versions of OOo (< 3.0) load the file step by step, because
after a few seconds you can see the file loaded in Writer. Writer 3.0, instead,
seems to load the *entire* file into memory before showing it to you.
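The difference described above, processing the input piece by piece versus reading all of it before showing anything, can be illustrated with Python's standard-library `html.parser`, whose `HTMLParser.feed()` accepts partial input. This is only an analogy under stated assumptions: it is not how the OOo HTML import filter works, and `TableCounter` and `count_incrementally` are hypothetical names. The sketch counts table rows and cells while reading the input in fixed-size chunks.

```python
import io
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts table rows and cells as start tags arrive, without building a tree."""

    def __init__(self):
        super().__init__()
        self.rows = 0
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already converted to lower case
        if tag == "tr":
            self.rows += 1
        elif tag in ("td", "th"):
            self.cells += 1


def count_incrementally(stream, chunk_size=64 * 1024):
    """Read `stream` in fixed-size chunks and feed each one to the parser.

    Apart from small pieces the parser buffers internally (such as a tag
    split across a chunk boundary), only the current chunk is held.
    """
    parser = TableCounter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    parser.close()
    return parser.rows, parser.cells


# Tiny chunks on purpose, so tags are split across feed() calls.
sample = io.StringIO("<table><tr><td>a</td><TD>b</TD></tr></table>")
print(count_incrementally(sample, chunk_size=4))  # prints (1, 2)
```

For a real file, the same function can be given an open text file, e.g. `with open(path, encoding="utf-8") as f: count_incrementally(f)`.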

Comment 5 pawo509 2010-03-04 09:07:26 UTC
I confirm the problem with importing large HTML files. I use OpenOffice 3.2
under Windows XP as a conversion service to PDF.

My file isn't as big as the file mentioned above. It has about 10,000 lines and
is about 388 KB in size. The file contains a table with 97 rows and 33 columns.
1. When I try to load this HTML file using UNO, OpenOffice hangs.
2. When I try to load this HTML file in Writer using File -> Open, OpenOffice
also hangs.

By "hangs" I mean it uses about 50% CPU and approximately 100 MB of memory
(memory usage keeps growing while it works) for several minutes; after about 7
minutes I killed the OpenOffice process.
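Because the file that makes Writer hang in comment 2 contains confidential information, a synthetic file with a table of the dimensions given in this comment (97 rows by 33 columns) may help others reproduce the behavior. The following is a sketch under assumptions: `make_table_html` is a hypothetical helper, and the page skeleton and filler text are invented; only the table dimensions come from this report. A longer `cell_text` makes the file bigger without changing the table shape, which allows testing around the size limits reported in this thread.

```python
import html


def make_table_html(rows: int, cols: int, cell_text: str = "lorem ipsum") -> str:
    """Return a synthetic HTML page containing one rows-by-cols table.

    The page skeleton and filler text are illustrative assumptions;
    only the table dimensions are taken from this report.
    """
    cell = "<td>" + html.escape(cell_text) + "</td>"
    row = "<tr>" + cell * cols + "</tr>\n"
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>table test</title></head>\n'
        '<body><table border="1">\n'
        + row * rows
        + "</table></body></html>\n"
    )


# Same dimensions as the file described above; the filler length
# controls the resulting file size.
page = make_table_html(97, 33, cell_text="sample text " * 10)
print(len(page.encode("utf-8")), "bytes")
```

To test, write `page` to a UTF-8 `.html` file and open it in Writer (or load it via UNO), then compare the behavior across OOo versions.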
Comment 6 pawo509 2010-03-04 09:08:23 UTC
Created attachment 68141
HTML file with table
Comment 7 pyrix 2010-06-25 00:19:35 UTC
I have experimented with this problem (using OOo 3.2.0 Build 9483 on a Windows
XP SP3 box) and found that HTML files with large tables up to around 200 KB seem
to be OK. Above this size they may load, but when the Writer window is resized
or maximized, Writer hangs using up to 100% CPU. Large HTML tables seem to be
the problem. (If the file does not have a large table, it seems to be