Apache OpenOffice (AOO) Bugzilla – Issue 126339
Corrupted Files
Last modified: 2016-03-12 00:16:18 UTC
Created attachment 84759 [details] Sample Corrupted file. Corrupted file. This started as a .docx file, which was then saved as an .odt file.
Created attachment 84760 [details] Another Sample Corrupted File This is another corrupted file.
Created attachment 84761 [details] Another Corrupted File A second example of a corrupted file.
Providing a corrupted ODT document doesn't help. Please attach the original .docx file.
Created attachment 84767 [details] Sample file before editing and saving. Sample file before corruption. Note that it was heavily edited, so there's almost no relationship between this file and the corrupted file. But it was requested.
Created attachment 84768 [details] Your .docx has been converted to ODT without problems. I can't reproduce the issue: a) Open the .docx b) Save as .odt c) Close the document d) Open the .odt. Result: no corruption.
That's not what I did. I'm not surprised that a simple, no-edit conversion happened to work. Please actually look at the corrupt files. They don't load. My sequence was this. 1) open the .docx 2) edit heavily. 3) save as .odt. This leads to corruption. Please look at the corrupt files to see that they are corrupted.
Please provide more details about step 2. What kind of "heavy" editing did you do?
Sorry. Didn't take notes. Change tracking was on, so the details are readily available. If only the file wasn't corrupt.
(In reply to Steven Lott from comment #6) > My sequence was this. > 1) open the .docx > 2) edit heavily. > 3) save as .odt. Not good. Try the sequence from my comment 5.
Good advice. However, it doesn't uncorrupt the file. Nor does it fix the bug that corrupted the file. It may be a good way of avoiding the corruption bug, but it doesn't fix the bug that corrupted the file.
The DOCX format is proprietary and undocumented, and some advanced features may not be available in Writer. Also, some users have experienced crashes with large (hundreds of pages) or complex documents. So the best way to avoid problems is to convert to ODF first.
This advice doesn't fix the problem with these documents and doesn't correct the bug that led to the corrupted documents.
This is neither resolved nor fixed. The files are corrupt. The procedure given does not fix the corruption. The files remain unreadable. No bug has been identified in the software, so this problem will continue to occur. It is not resolved. It is not fixed.
There are different problems. The file B03671_06_1d_SFL copy.odt (Sample Corrupted file) has a real problem: a duplicate attribute at position 5262 in line 2 of the part content.xml. You can open the package (e.g. with 7-Zip) and remove the duplicate attribute; the file will then be readable. Such errors happen from time to time, but so far there is no reproducible scenario, and therefore it is not possible to pin it down to a place in the code that needs to be fixed. Searching the internet will turn up instructions on how to repair such files. For the attached one I can send you a repaired version; simply drop me a note. The file B03671_02_2d_SFL copy.odt (Another Corrupted File) has a different problem, and I will use this report for it. You can open that file with an older version such as AOO 4.0 or OOo 3.2 and resave it with that version. The resaved version is then readable in the current AOO too. You can install the older AOO version as an "administrative" installation (if necessary, ask on the forum) so that you can use it in parallel with the current version.
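The manual repair above (open the package, fix content.xml, put it back) can be sketched in Python with the standard `zipfile` module. This is a minimal illustration under stated assumptions, not an AOO tool: `replace_part` is a hypothetical helper that copies an .odt package and substitutes corrected bytes for one part. The ODF packaging rules expect `mimetype` to be the first entry and stored uncompressed, so the sketch enforces that explicitly rather than relying on an archiver to preserve it. It does not fix anything by itself; the corrected `content.xml` bytes must come from elsewhere, e.g. editing the extracted part by hand.

```python
import zipfile


def replace_part(src_path, dst_path, part_name, new_bytes):
    """Write a copy of an ODF package with one part's bytes replaced.

    Hypothetical helper. 'mimetype' is written first and uncompressed
    (ZIP_STORED), as ODF packaging requires; every other entry keeps its
    name, timestamp and compression method. dst_path must differ from
    src_path.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w") as dst:
        # sorted() is stable: mimetype (key False) first, others keep order.
        infos = sorted(src.infolist(), key=lambda i: i.filename != "mimetype")
        for info in infos:
            if info.filename == part_name:
                data = new_bytes  # the hand-corrected part
            else:
                data = src.read(info.filename)
            out = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            out.compress_type = (zipfile.ZIP_STORED
                                 if info.filename == "mimetype"
                                 else info.compress_type)
            dst.writestr(out, data)
```

For example, `replace_part("B03671_06_1d_SFL copy.odt", "repaired.odt", "content.xml", fixed_bytes)`, where `fixed_bytes` is the edited content.xml, would write a repaired copy and leave the original file untouched.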
Could you mention which attribute is duplicated? Is it the same one each time, or does it change from case to case? This might give hints for tracing where exactly that attribute is created.
It is the attribute office:name in the element <style:style office:name="__Annotation__1853_16733545811" office:name="__Annotation__1855_16733545811" style:name="P1" style:family="paragraph" style:parent-style-name="Chapter_20_Number_20__5b_PACKT_5d_" style:master-page-name="First_20_Page">
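A repeated attribute name makes the XML not well-formed, so a conforming parser rejects the whole part instead of skipping the one bad element; the fix therefore has to be made at the text level. The sketch below uses only Python's `re` module, and `drop_duplicate_attributes` is a hypothetical helper. It deletes later repeats of an attribute name inside a start tag. Which of the two `office:name` values is the "right" one is not established in this thread, so keeping the first occurrence is an assumption; the function returns what it removed so that choice can be reviewed. It also assumes attribute values are quoted, as in the element above.

```python
import re

# A start tag: '<' + element name + quoted attributes + optional '/' + '>'.
# Quoted values may contain '>' because the value patterns consume them whole.
TAG_RE = re.compile(
    r"<([A-Za-z_][\w.:-]*)"
    r"((?:\s+[\w.:-]+\s*=\s*(?:\"[^\"]*\"|'[^']*'))*)"
    r"(\s*/?>)"
)
ATTR_RE = re.compile(r"\s+([\w.:-]+)\s*=\s*(\"[^\"]*\"|'[^']*')")


def drop_duplicate_attributes(xml_text):
    """Return (fixed_text, removed).

    removed lists (element, attribute, raw_value) for every repeat that
    was dropped. Assumption: the first occurrence of a name wins.
    """
    removed = []

    def fix_tag(match):
        seen = set()
        kept = []
        for attr in ATTR_RE.finditer(match.group(2)):
            name = attr.group(1)
            if name in seen:
                removed.append((match.group(1), name, attr.group(2)))
                continue
            seen.add(name)
            kept.append(attr.group(0))  # keeps the leading whitespace
        return "<" + match.group(1) + "".join(kept) + match.group(3)

    return TAG_RE.sub(fix_tag, xml_text), removed
```

Applied to the element quoted above, it keeps `office:name="__Annotation__1853_16733545811"` and reports the `__Annotation__1855_16733545811` value as removed. The fixed text would still have to be written back into the package, for example with 7-Zip as suggested earlier in this thread.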
This looks like the same xml error that appears in Issue 126219
(In reply to Steven Lott from comment #0) > Created attachment 84759 [details] > Sample Corrupted file. > > Corrupted file. This started as a .docx file, which was then saved as an > .odt file. (In reply to Steven Lott from comment #2) > Created attachment 84761 [details] > Another Corrupted File > > A second example of a corrupted file. I confirm that Attachment 84761 [details] (B03671_02_2d_SFL copy.odt) fails to open with Apache OpenOffice 4.1.2 running on Windows 10. However, the file does open in Microsoft Word 2016 without complaint. That is how I can determine that it is Chapter 2 of what appears to be the text of a programming book. This appears to be a peculiar case of AOO writing a document that it cannot itself read. I cannot determine how much of the document is presented by Word 2016. The .odt file has a complex structure involving embedded subdocuments. It deserves further exploration. The workaround for these situations is to save the .docx as an .odt *before* any editing, while one has the opportunity to see whether there is any loss of fidelity before going further. That does not alter the fact that there is a confirmed problem here.
(In reply to Steven Lott from comment #0) > Created attachment 84759 [details] > Sample Corrupted file. > > Corrupted file. This started as a .docx file, which was then saved as an > .odt file. This is rather different than the second attachment (B03671_02_2d ...). The file B03671_06_1d_SFL copy.odt fails to open in both Microsoft Word 2016 and Apache OpenOffice 4.1.2 on Microsoft Windows 10. In both cases a defect is reported in one of the data streams in the .odt package. The AOO error message is the more informative, though not that great: Read-error. Format error discovered in the file in sub-document content.xml at 2,5261 (row,col). There are 11 sub-documents, and I notice that the top-level content.xml (the only one big enough) has extensive tracked changes, many involving annotations. The Zip package checks as complete and without Zip-level defects, so this appears to be a more serious case: the document was completely saved, but with a defect in its formatting. Although of no consolation, this is enough to exclude these two cases from Issue 126869.
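The two checks described above (Zip-level integrity first, then the XML inside each part) can be reproduced with Python's standard library. This is a sketch, not the procedure used in this comment: `check_odf_package` is a hypothetical helper that runs `ZipFile.testzip()` to verify every member's CRC and then feeds each `*.xml` part to an expat parser, recording the error message and position. Expat counts lines from 1 and columns from 0, so its column can differ by one from what other tools report.

```python
import zipfile
import xml.parsers.expat


def check_odf_package(path):
    """List problems in an ODF package as (part, message, line, column).

    Hypothetical helper. line/column are None for Zip-level problems.
    """
    problems = []
    with zipfile.ZipFile(path) as z:
        # testzip() reads every member and returns the name of the first
        # one with a bad CRC or header, or None if the archive is intact.
        bad = z.testzip()
        if bad is not None:
            problems.append((bad, "Zip-level error (bad CRC or header)", None, None))
            return problems  # XML checks are unreliable on a damaged archive
        for name in z.namelist():
            if not name.endswith(".xml"):
                continue
            parser = xml.parsers.expat.ParserCreate()
            try:
                parser.Parse(z.read(name), True)
            except xml.parsers.expat.ExpatError as err:
                problems.append((name, xml.parsers.expat.ErrorString(err.code),
                                 err.lineno, err.offset))
    return problems
```

An empty list means the archive and all of its XML parts parsed cleanly; an entry for `content.xml` with the message "duplicate attribute" would correspond to the kind of defect Regina identified in the first attachment.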
(In reply to orcmid from comment #19) > This is rather different than the second attachment (B03671_02_2d ...). I failed to notice that Regina had already analyzed this one in Comment 16 and the error is an element with a duplicate attribute, making the XML invalid. This is a very different bug that deserves its own issue.
(In reply to orcmid from comment #20) > I failed to notice that Regina had already analyzed this one in Comment 16 > and the error is an element with a duplicate attribute, making the XML > invalid. > > This is a very different bug that deserves its own issue. In the case of the second attachment only, this is the same issue that Regina has described in detail in Issue 126339. I am not certain that should be treated as irreproducible, since it is a confirmed and identified error in the file that is produced. There is no requirement that we be able to cause it with external testing. We know there's an error.
(In reply to orcmid from comment #21) > In the case of the second attachment only, this is the same issue that > Regina has described in detail in Issue 126339. I am getting the cross-reference backwards. Another issue on this duplicate attribute error is Issue 126479.