Issue 126339 - Corrupted Files
Summary: Corrupted Files
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: save-export
Version: 4.1.1
Hardware: All OS X 10.10
Importance: P5 (lowest) Normal with 1 vote
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-28 10:41 UTC by Steven Lott
Modified: 2016-03-12 00:16 UTC
CC List: 5 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: 4.1.2
Developer Difficulty: ---


Attachments
Sample Corrupted file. (95.98 KB, application/vnd.oasis.opendocument.text)
2015-05-28 10:41 UTC, Steven Lott
Another Sample Corrupted File (95.98 KB, application/vnd.oasis.opendocument.text)
2015-05-28 10:42 UTC, Steven Lott
Another Corrupted File (87.96 KB, application/vnd.oasis.opendocument.text)
2015-05-28 10:44 UTC, Steven Lott
Sample file before editing and saving. (81.20 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-05-29 11:18 UTC, Steven Lott
Your docx has been converted without problem as ODT (47.43 KB, application/vnd.oasis.opendocument.text)
2015-05-29 13:55 UTC, oooforum (fr)

Description Steven Lott 2015-05-28 10:41:31 UTC
Created attachment 84759 [details]
Sample Corrupted file.

Corrupted file. This started as a .docx file, which was then saved as an .odt file.
Comment 1 Steven Lott 2015-05-28 10:42:40 UTC
Created attachment 84760 [details]
Another Sample Corrupted File

This is another corrupted file.
Comment 2 Steven Lott 2015-05-28 10:44:52 UTC
Created attachment 84761 [details]
Another Corrupted File

A second example of a corrupted file.
Comment 3 oooforum (fr) 2015-05-28 16:09:21 UTC
Providing a corrupted ODT document doesn't help.
Please attach the original .docx file.
Comment 4 Steven Lott 2015-05-29 11:18:07 UTC
Created attachment 84767 [details]
Sample file before editing and saving.

Sample file before corruption. Note that it was heavily edited, so there's almost no relationship between this file and the corrupted file. But it was requested.
Comment 5 oooforum (fr) 2015-05-29 13:55:05 UTC
Created attachment 84768 [details]
Your docx has been converted without problem as ODT

I cannot reproduce this.
a) Open the .docx 
b) Save as .odt
c) Close document
d) Open the .odt
Result: no corruption
Comment 6 Steven Lott 2015-05-30 11:22:22 UTC
That's not what I did. I'm not surprised that a simple, no-edit conversion happened to work.

Please actually look at the corrupt files. They don't load.

My sequence was this.

1) open the .docx
2) edit heavily.
3) save as .odt.

This leads to corruption. Please look at the corrupt files to see that they are corrupted.
Comment 7 oooforum (fr) 2015-06-01 06:30:30 UTC
Please provide more details about step 2.
What kind of "heavy" editing did you do?
Comment 8 Steven Lott 2015-06-01 11:31:39 UTC
Sorry. Didn't take notes. Change tracking was on, so the details are readily available. If only the file wasn't corrupt.
Comment 9 oooforum (fr) 2015-06-01 14:59:10 UTC
(In reply to Steven Lott from comment #6)
> My sequence was this.
> 1) open the .docx
> 2) edit heavily.
> 3) save as .odt.

That sequence is not a good one.
Try the steps in my comment 5 instead.
Comment 10 Steven Lott 2015-06-02 11:15:53 UTC
Good advice. However, it doesn't uncorrupt the file. It may be a good way of avoiding the corruption bug, but it doesn't fix the bug that corrupted the file.
Comment 11 oooforum (fr) 2015-06-02 12:13:10 UTC
DOCX format is proprietary and undocumented, and some advanced features may not be available in Writer. Also, some users have experienced crashes with large (hundreds of pages) or complex documents. So the best way to avoid problems is to convert to ODF first, before editing.
Comment 12 Steven Lott 2015-06-03 00:51:04 UTC
This advice doesn't fix the problem with these documents and doesn't correct the bug that led to the corrupted documents.
Comment 13 Steven Lott 2015-06-03 01:47:03 UTC
This is neither resolved nor fixed. The files are corrupt. The procedure given does not fix the corruption. The files remain unreadable. No bug has been identified in the software, so this problem will continue to occur.  It is not resolved. It is not fixed.
Comment 14 Regina Henschel 2015-06-03 19:38:06 UTC
There are different problems.

File B03671_06_1d_SFL copy.odt (Sample Corrupted file) has a real problem: a duplicate attribute at position 5262 in line 2 of the part content.xml. You can open the package (e.g. with 7-Zip) and remove the duplicate attribute; the file will then be readable. Such errors happen from time to time, but there is no reproducible scenario so far, so it has not been possible to pin the error down to a place in the code that needs fixing. Searching the internet will turn up instructions for repairing such files. I can send you a repaired version of the attached file; simply drop me a note.
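
The manual repair described above (open the package, remove the duplicate attribute from content.xml, repack) could be scripted roughly as follows. This is only a minimal sketch, not an AOO tool: the names `drop_duplicate_attributes` and `repair_odt` are hypothetical, it assumes content.xml is UTF-8, and it assumes the first occurrence of a repeated attribute is the one to keep, which the report does not establish. The regular expressions are a deliberate shortcut, because a real XML parser refuses to read the broken part at all; they could misfire on tag-like text inside comments or CDATA sections.

```python
import re
import zipfile

# One start tag or empty-element tag with at least one attribute.
# Quoted values are matched as units, so a ">" inside a value is safe.
START_TAG = re.compile(
    r'<[A-Za-z_][^\s<>/]*'
    r'(?:\s+[^\s=<>/]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))+'
    r'\s*/?>'
)
# One attribute, including the whitespace in front of it.
ATTRIBUTE = re.compile(r'\s+([^\s=<>/]+)\s*=\s*(?:"[^"]*"|\'[^\']*\')')


def drop_duplicate_attributes(xml_text):
    """Remove repeated attribute names from every start tag.

    Assumption: the first occurrence of each attribute is kept.
    """
    def fix_tag(tag_match):
        seen = set()

        def keep_first(attr_match):
            name = attr_match.group(1)
            if name in seen:
                return ""  # drop the repeated attribute
            seen.add(name)
            return attr_match.group(0)

        return ATTRIBUTE.sub(keep_first, tag_match.group(0))

    return START_TAG.sub(fix_tag, xml_text)


def repair_odt(src, dst, member="content.xml"):
    """Copy the package src to dst, repairing only the named member.

    Entry order and per-entry compression are preserved, so the
    'mimetype' entry stays first and uncompressed if it was so in src.
    """
    with zipfile.ZipFile(src) as package_in, \
            zipfile.ZipFile(dst, "w") as package_out:
        for info in package_in.infolist():
            data = package_in.read(info.filename)
            if info.filename == member:
                text = data.decode("utf-8")  # assumption: UTF-8 content
                data = drop_duplicate_attributes(text).encode("utf-8")
            package_out.writestr(info, data)
```

A call such as `repair_odt("broken.odt", "repaired.odt")` writes a new file and leaves the original untouched, which matters here because the dropped attribute value is lost for good.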

The file B03671_02_2d_SFL copy.odt (Another Corrupted File) has a different problem, and I will take this report for it. You can open the file with an older version such as AOO 4.0 or OOo 3.2 and resave it with that version. The resaved file is then readable in the current AOO too. You can install the older version "administratively" (ask on the forum if necessary) so that you can use it in parallel with the current version.
Comment 15 Armin Le Grand 2015-06-04 08:51:45 UTC
Could you mention which attribute is duplicated? Is it always the same one, or does it change from case to case? This might give hints for tracing where exactly that attribute is created.
Comment 16 Regina Henschel 2015-06-04 09:56:59 UTC
It is the attribute office:name in this node:
<style:style office:name="__Annotation__1853_16733545811" office:name="__Annotation__1855_16733545811" style:name="P1" style:family="paragraph" style:parent-style-name="Chapter_20_Number_20__5b_PACKT_5d_" style:master-page-name="First_20_Page">
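
For anyone who wants to confirm this kind of error without opening the document in AOO, a minimal Python sketch using the standard-library expat bindings is shown below. The helper name `check_well_formed` is hypothetical. The node is the one quoted above, except that it is closed with `/>` here so that the duplicate attribute is the only well-formedness error in the sample. The parser is created without namespace processing, so the undeclared `style:` and `office:` prefixes in the isolated node are not themselves reported.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def check_well_formed(xml_bytes):
    """Return None if xml_bytes parses, else (message, line, column).

    line is 1-based and column is 0-based, as reported by expat.
    """
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing
    try:
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError as err:
        return (errors.messages[err.code], err.lineno, err.offset)
    return None


# The node from this comment, made self-closing for the demonstration.
node = (
    b'<style:style office:name="__Annotation__1853_16733545811" '
    b'office:name="__Annotation__1855_16733545811" style:name="P1" '
    b'style:family="paragraph" '
    b'style:parent-style-name="Chapter_20_Number_20__5b_PACKT_5d_" '
    b'style:master-page-name="First_20_Page"/>'
)

result = check_well_formed(node)
if result is not None:
    message, line, column = result
    print(f"{message} at line {line}, column {column}")
```

The reported message is expat's "duplicate attribute"; the exact column it points at may differ by a character or so from the position AOO reports, since the two parsers need not count from the same place.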
Comment 17 Joe Smith 2015-10-07 21:36:42 UTC
This looks like the same XML error that appears in Issue 126219.
Comment 18 orcmid 2016-03-11 23:01:47 UTC
(In reply to Steven Lott from comment #0)
> Created attachment 84759 [details]
> Sample Corrupted file.
> 
> Corrupted file. This started as a .docx file, which was then saved as an
> .odt file.

(In reply to Steven Lott from comment #2)
> Created attachment 84761 [details]
> Another Corrupted File
> 
> A second example of a corrupted file.

I confirm that Attachment 84761 (B03671_02_2d_SFL copy.odt) fails to open with Apache OpenOffice 4.1.2 running on Windows 10. However, the file does open in Microsoft Word 2016 without complaint. That is how I can determine that it is Chapter 2 of what appears to be the text of a programming book.

This appears to be a peculiar case of AOO writing a document that it cannot itself read.  I cannot determine how much of the document is presented by Word 2016.

The .odt file has a complex structure involving embedded subdocuments. It deserves further exploration.

The workaround for these situations is to save the .docx as an .odt *before* any editing, while one has the opportunity to see whether there is any loss of fidelity before going farther.  

That does not alter the fact that there is a confirmed problem here.
Comment 19 orcmid 2016-03-11 23:30:56 UTC
(In reply to Steven Lott from comment #0)
> Created attachment 84759 [details]
> Sample Corrupted file.
> 
> Corrupted file. This started as a .docx file, which was then saved as an
> .odt file.

This is rather different than the second attachment (B03671_02_2d ...).

File B03671_06_1d_SFL copy.odt fails to open in both Microsoft Word 2016 and Apache OpenOffice 4.1.2 on Microsoft Windows 10. In both cases there is claimed to be a defect in one of the data streams in the .odt package.

The AOO error message is the more informative of the two, though still not that great:

  Read-error.
  Format error discovered in the file in sub-document content.xml at 2,5261 (row,col).

There are 11 sub-documents and I notice that the top-level content.xml (the only one big enough) has extensive tracked changes, many involving annotations.

The Zip package checks as complete and without Zip-level defects, so this appears to be a more serious case in that the document was completely saved, but with a defect in its formatting.
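
The Zip-level check and the count of sub-documents can be reproduced with a short Python sketch like the one below. It is an illustration under assumptions, not the procedure actually used for this comment: `inspect_package` is a hypothetical helper, and it treats every entry whose name ends in `content.xml` as a sub-document.

```python
import zipfile


def inspect_package(path_or_file):
    """Return (first_bad_member, content_parts) for an ODF package.

    first_bad_member is None when every entry passes its CRC check.
    content_parts maps each */content.xml entry to its uncompressed size.
    """
    with zipfile.ZipFile(path_or_file) as package:
        # testzip() reads every entry and verifies CRCs and headers.
        first_bad = package.testzip()
        parts = {
            info.filename: info.file_size
            for info in package.infolist()
            if info.filename.endswith("content.xml")
        }
    return first_bad, parts
```

A result of `None` for the first element only says the container is intact. It says nothing about whether the XML inside is well-formed, which is exactly the situation described here.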

Although of no consolation, this is enough to exclude these two cases from Issue 126869.
Comment 20 orcmid 2016-03-11 23:54:26 UTC
(In reply to orcmid from comment #19)
> This is rather different than the second attachment (B03671_02_2d ...).

I failed to notice that Regina had already analyzed this one in Comment 16 and the error is an element with a duplicate attribute, making the XML invalid.

This is a very different bug that deserves its own issue.
Comment 21 orcmid 2016-03-12 00:14:30 UTC
(In reply to orcmid from comment #20)
> I failed to notice that Regina had already analyzed this one in Comment 16
> and the error is an element with a duplicate attribute, making the XML
> invalid.
> 
> This is a very different bug that deserves its own issue.

In the case of the second attachment only, this is the same issue that Regina has described in detail in Issue 126339.  I am not certain that should be treated as irreproducible, since it is a confirmed and identified error in the file that is produced.  There is no requirement that we be able to cause it with external testing.  We know there's an error.
Comment 22 orcmid 2016-03-12 00:16:18 UTC
(In reply to orcmid from comment #21)
> In the case of the second attachment only, this is the same issue that
> Regina has described in detail in Issue 126339.  I am not certain that
> should be treated as irreproducible, since it is a confirmed and identified
> error in the file that is produced.  There is no requirement that we be able
> to cause it with external testing.  We know there's an error.

I am getting the cross-reference backwards.  Another issue on this duplicate attribute error is Issue 126479.