Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing
|Summary:||Opening a file (except office files) as encoded text, crashes OOo|
|Component:||editing||Assignee:||AOO issues mailing list <issues>|
|Status:||CONFIRMED ---||QA Contact:|
|Priority:||P2||CC:||amy2008, andreas.martens, frank.meies, issues, kpalagin, orw|
|Issue Type:||DEFECT||Latest Confirmation in:||---|
Description bstqc_caozy 2006-11-20 03:25:16 UTC
1. Open a new Writer document.
2. Insert a music file, such as a *.wma file, via "Insert|File...".
3. In the "ASCII Filter Options" dialog box, select "Unicode" for the Character Set item, "Arial" for the Default fonts item, and "chinese(simplified)" for the Language item.
4. Click the "OK" button.
5. => OpenOffice.org crashes.
The same thing happens when inserting music files in other formats, such as .mp3, .ram, .rm or .swf.
Comment 1 michael.ruess 2006-11-20 15:41:06 UTC
MRU->HBRINKM: open a wmv or wma as explained with mentioned options in "Encoded text" dialog -> OO will end without an error message.
Comment 2 Mathias_Bauer 2008-01-11 14:45:36 UTC
Comment 3 Mathias_Bauer 2008-04-25 17:29:56 UTC
Comment 4 Mathias_Bauer 2008-06-23 15:04:29 UTC
Seems to be heap corruption caused by endless loop (at least in my try with an mp3 file). I get a lot of assertions "What a guess!".
Comment 5 Regina Henschel 2008-12-25 22:26:23 UTC
*** Issue 97575 has been marked as a duplicate of this issue. ***
Comment 6 kpalagin 2008-12-26 12:37:56 UTC
Andreas, are we on track for 3.1 with this issue? Regards, KP.
Comment 7 amy2008 2008-12-30 05:30:34 UTC
Inserting a PDF file can also induce the same problem. Because opening a file (except office files) can make it
Comment 8 oleghitekschool 2009-01-15 06:44:15 UTC
The issue has been reproduced on a PC running WIN XP with version DEV300m37.
Comment 9 andreas.martens 2009-01-19 10:30:19 UTC
ama->KP: no, we are not on track for OOo3.1. Due to our workload and resources (development as well as QA) I have to retarget this issue to OOo3.2.
Comment 10 Mathias_Bauer 2009-02-02 13:15:39 UTC
Comment 11 Mathias_Bauer 2009-05-07 18:40:56 UTC
I think that you can always bring OOo to crash or loop by inserting useless content as "text". The question is how to deal with it. How can we detect whether something is text or just a bunch of bytes, especially if we don't know the encoding? Some options:
- repeat type detection after the user has entered an encoding and then reject all files that still contain zero bytes or have lines with more than n characters
- try language guessing and reject all files whose language can't be detected; this way files could be rejected just because we don't check for their language, though we check for a lot of them
- the assertions show us that at least deep below, in the text formatter, we detect that something is wrong, so at least here we could stop. But it seems that this is very late (insertion already done) and it might be tricky to recover from the detected error without creating new problems.
The first two options have the advantage that they try to reject files before they are actually inserted. So far I think the first option is the best one to start with.
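The first option could be sketched roughly as follows. This is only an illustration in Python, not OOo code: the function name `looks_like_text`, the default limit of 10,000 characters per line, and the reading of "zero bytes" as NUL characters that remain after decoding with the chosen encoding are all assumptions made for the example.

```python
def looks_like_text(data: bytes, encoding: str, max_line_len: int = 10_000) -> bool:
    """Hypothetical pre-insertion check: decode with the encoding the user
    chose, then reject data that still contains NUL characters or has a
    line longer than max_line_len characters (the "n" of the proposal)."""
    try:
        text = data.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Not decodable with this encoding (or unknown encoding): reject.
        return False
    if "\x00" in text:
        # NUL characters survived decoding, so this is unlikely to be text.
        return False
    # Reject if any single line exceeds the assumed length limit.
    return all(len(line) <= max_line_len for line in text.splitlines())
```

Checking the decoded characters rather than the raw bytes matters for encodings such as UTF-16, where ordinary text legitimately contains zero bytes in its raw form.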
Comment 12 kpalagin 2009-05-08 06:51:13 UTC
IMHO, fixing the code that crashes is the most viable option. Trying to avoid passing "bad" data to code that would crash on it is a band-aid approach.
Comment 13 Mathias_Bauer 2009-05-08 07:19:56 UTC
Andreas, Frank, Oliver, do you see any chance to recover from the situation? It seems that the TextGuess is confused and OOo fails to recognize portions in the text. The result is unpredictable, and basically it is impossible to format such "text" at all. IMHO the only way to prevent that is detecting the error as early as possible and discarding this "document". If it isn't possible before the "text" is inserted, we need a health check or something similar that can find out fast and easily if the "text" can be formatted at all.
Comment 14 Mathias_Bauer 2009-05-08 15:36:07 UTC
So here's a proposal. We will extend the filter dialog with a preview, like in Calc where you always can see how the first few rows will look with the current settings. We will have a text engine in that dialog that displays the first few thousand characters of the document using the current settings. This will enable us to test the "text" in a "sandbox". Even if the user presses "OK" though he just sees garbage, we can still run a detection over the previewed text and reject it in case it does not match our criteria. Once the text is in the Writer core it is close to impossible to handle all the problems that might appear deep inside the text formatting or the VCL text output caused by text garbage.
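A detection pass over the previewed text might look something like the sketch below. The comment does not say what "our criteria" are, so everything specific here is an assumption for illustration: the function name `preview_is_plausible`, the 4000-character preview size, and the rule of rejecting a preview in which more than 5% of the characters are control, unassigned, private-use, surrogate, or replacement characters.

```python
import unicodedata


def preview_is_plausible(data: bytes, encoding: str,
                         preview_chars: int = 4000,
                         max_bad_ratio: float = 0.05) -> bool:
    """Hypothetical sandbox check: decode as a preview would, look only at
    the first preview_chars characters, and reject if too many of them are
    not displayable text."""
    try:
        # errors="replace" turns undecodable bytes into U+FFFD, roughly
        # what a preview pane would show as garbage.
        text = data.decode(encoding, errors="replace")
    except LookupError:
        return False  # unknown encoding name
    preview = text[:preview_chars]
    if not preview:
        return True  # empty input: nothing to reject
    bad = 0
    for ch in preview:
        if ch in "\t\n\r":
            continue  # ordinary whitespace control characters are fine
        if ch == "\ufffd" or unicodedata.category(ch) in ("Cc", "Cn", "Co", "Cs"):
            bad += 1
    return bad / len(preview) <= max_bad_ratio
```

Because only the preview is examined, a check of this kind stays cheap even for large files, at the cost of missing garbage that appears only later in the file.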