Issue 71757

Summary: Opening a file (except office files) as encoded text, crashes OOo
Product: Writer
Reporter: bstqc_caozy <caozy>
Component: editing
Assignee: AOO issues mailing list <issues>
Status: CONFIRMED ---
QA Contact:
Severity: Trivial
Priority: P2
CC: amy2008, andreas.martens, frank.meies, issues, kpalagin, orw
Version: OOo 2.0.4
Keywords: crash
Target Milestone: ---
Hardware: PC
OS: Windows XP
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---

Description bstqc_caozy 2006-11-20 03:25:16 UTC
1. Open a new Writer document.
2. Insert a music file, such as a *.wma file, via "Insert|File...".
3. In the "ASCII Filter Options" dialog, select "Unicode" for the Character
Set item, "Arial" for the Default fonts item, and "Chinese (simplified)" for
the Language item.
4. Click the "OK" button.
5. => OpenOffice.org crashes.

The same thing happens when inserting music files in other formats, such
as .mp3, .ram, .rm or .swf.
Comment 1 michael.ruess 2006-11-20 15:41:06 UTC
MRU->HBRINKM: open a wmv or wma as explained with mentioned options in "Encoded
text" dialog -> OO will end without an error message.
Comment 2 Mathias_Bauer 2008-01-11 14:45:36 UTC
target 3.0
Comment 3 Mathias_Bauer 2008-04-25 17:29:56 UTC
taking over
Comment 4 Mathias_Bauer 2008-06-23 15:04:29 UTC
Seems to be heap corruption caused by endless loop (at least in my try with an
mp3 file). I get a lot of assertions "What a guess!".
Comment 5 Regina Henschel 2008-12-25 22:26:23 UTC
*** Issue 97575 has been marked as a duplicate of this issue. ***
Comment 6 kpalagin 2008-12-26 12:37:56 UTC
Andreas,
are we on track for 3.1 with this issue?
Regards,
KP.
Comment 7 amy2008 2008-12-30 05:30:34 UTC
Inserting a PDF file can also induce the same problem.
Because opening a file (except office files) can make it
Comment 8 oleghitekschool 2009-01-15 06:44:15 UTC
The issue has been reproduced on PC, Windows XP, with version DEV300m37.
Comment 9 andreas.martens 2009-01-19 10:30:19 UTC
ama->KP: no, we are not on track for OOo3.1

Due to our workload and resources (development as well as QA), I have to
retarget this issue to OOo3.2.
Comment 10 Mathias_Bauer 2009-02-02 13:15:39 UTC
Taking over
Comment 11 Mathias_Bauer 2009-05-07 18:40:56 UTC
I think that you can always bring OOo to crash or loop by inserting useless
content as "text". The question is how to deal with it. How can we detect
whether something is text or just a bunch of bytes, especially if we don't
know the encoding? Some options:

- repeat type detection after the user has entered an encoding and then reject
all files that still contain zero bytes or have lines with more than n characters

- try language guessing and reject all files whose language can't be detected;
this way files could be rejected just because we don't check for their language,
though we check for a lot of them

- the assertions show us that at least deep below, in the text formatter, we
detect that something is wrong, so at least here we could stop. But it seems
that this is very late (insertion already done) and it might be tricky to
recover from the detected error without creating new problems. The first two
options have the advantage that they try to reject files before they are
actually inserted

So far I think the first option is the best one to start with.
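
For illustration only, the first option could be sketched roughly as follows.
This is Python rather than OOo code, and the function name, the default line
limit and the reading of "zero bytes" as NUL characters that remain after
decoding are all assumptions, not anything taken from the Writer sources.

```python
def looks_like_text(data: bytes, encoding: str, max_line_len: int = 10_000) -> bool:
    """Hypothetical re-check run after the user has chosen an encoding.

    Rejects input that cannot be decoded, that still contains NUL
    characters after decoding, or that has a line longer than
    max_line_len characters (the "n" from the comment above; the
    default value here is an arbitrary assumption).
    """
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # The bytes are not valid in the chosen encoding at all.
        return False
    if "\x00" in text:
        # Real text should not contain NUL once it has been decoded.
        return False
    return all(len(line) <= max_line_len for line in text.splitlines())
```

A call such as looks_like_text(raw_bytes, "utf-16") would then be made before
anything is handed to the text core, so a rejected file never gets inserted.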
Comment 12 kpalagin 2009-05-08 06:51:13 UTC
IMHO, fixing the code that crashes is the most viable option. 

Trying to avoid passing "bad" data to code that would crash on it is a
band-aid approach.
Comment 13 Mathias_Bauer 2009-05-08 07:19:56 UTC
Andreas, Frank, Oliver, do you see any chance to recover from the situation? It
seems that the TextGuess is confused and OOo fails to recognize portions in the
text. The result is unpredictable, and basically it is impossible to format such
"text" at all. IMHO the only way to prevent that is detecting the error as early
as possible and discard this "document". If it isn't possible before the "text"
is inserted, we need a health check or something similar that can find out fast
and easily if the "text" can be formatted at all.
Comment 14 Mathias_Bauer 2009-05-08 15:36:07 UTC
So here's a proposal.

We will extend the filter dialog with a preview, like in Calc where you always
can see how the first few rows will look with the current settings.

We will have a text engine in that dialog that displays the first few thousand
characters of the document using the current settings. This will enable us to
test the "text" in a "sandbox". Even if the user presses "OK" although the
preview shows only garbage, we can still run a detection over the previewed
text and reject it if it does not match our criteria.

Once the text is in the Writer core it is close to impossible to handle all the
problems that might appear deep inside the text formatting or the VCL text
output caused by text garbage. 
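
As a rough sketch of the "sandbox" idea above: decode a bounded preview with
the chosen settings, then run a cheap detection over it before anything
reaches the core. The proposal does not say what the criteria are, so the
garbage ratio, the 5% threshold, the 4000-character limit and all names below
are assumptions made purely for illustration (Python, not OOo code).

```python
import unicodedata

PREVIEW_CHARS = 4000  # assumption: stands in for "the first few thousand characters"


def preview(data: bytes, encoding: str, limit: int = PREVIEW_CHARS) -> str:
    """Decode leniently so a preview can always be shown.

    Undecodable bytes become U+FFFD. For simplicity the whole input is
    decoded; a real dialog would read only a prefix of the file.
    """
    return data.decode(encoding, errors="replace")[:limit]


def garbage_ratio(text: str) -> float:
    """Fraction of characters that are replacement, control, unassigned,
    private-use or surrogate code points (tab, LF and CR are allowed)."""
    if not text:
        return 0.0
    bad = 0
    for ch in text:
        if ch in "\t\n\r":
            continue
        if ch == "\ufffd" or unicodedata.category(ch) in ("Cc", "Cn", "Co", "Cs"):
            bad += 1
    return bad / len(text)


def accept(data: bytes, encoding: str, max_garbage: float = 0.05) -> bool:
    """Hypothetical gate run when the user presses "OK" in the dialog."""
    return garbage_ratio(preview(data, encoding)) <= max_garbage
```

The point of the design is that the check only ever sees the bounded preview
string, so even pathological input cannot reach the formatter.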
Comment 15 Marcus 2017-05-20 11:13:18 UTC
Reset assignee to the default "issues@openoffice.apache.org".