Issue 126479

Summary: Read error, Format error discovered in the file in sub-document content.xml at 2,2992(row,col)
Product: Writer
Reporter: lloydthorndyke
Component: formatting
Assignee: AOO issues mailing list <issues>
Status: CLOSED DUPLICATE
QA Contact:
Severity: Normal
Priority: P5 (lowest)
CC: ardovm, jes, orcmid, rb.henschel
Version: 4.1.1   
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description                  Flags
The corrupted file.          none
double attribute removed     none

Description lloydthorndyke 2015-08-19 13:22:55 UTC
Created attachment 84887 [details]
The corrupted file.

This error happened previously on the same file.  I divided the file into sub-files, concerned that the length was causing the problem.  That time, the error was resolved by attaching the file and sending it here.  Is there a way to open and fix this file myself?  Attached is the file with the error.  What causes such an error?  Would deleting and reinstalling OpenOffice fix this?
Comment 1 Regina Henschel 2015-08-19 14:01:56 UTC
It often helps to start with a new, empty document and then copy and paste the old content as pure text without formatting, to be on the safe side.

It also helps not to drag along recorded changes, but to resolve them before making new changes. The same goes for comments: keep a version without comments as an additional backup.

The length of the document should be no problem, unless you are working with thousands of equations.

If you are not using the official release, then uninstalling it and installing the official release might help. But if you already have the official release, reinstalling will not solve the problem.

Yes, you can repair it yourself. You need a text editor. I unpack the file and open the part content.xml in SeaMonkey, which is my browser. It tells me not only the place of the error but also the reason. Then I open the file in an editor and correct the problem. At the end I pack it again and test the result. The file is actually a zip container. If your packer does not recognize this, you can rename the file to have the extension .zip and later rename it back to .odt.
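
For anyone who prefers a script to a browser, the check described above can be sketched in Python. This is only a minimal sketch, assuming Python 3 is available; the function names `check_odt_part` and `check_xml_bytes` are made up for illustration. It reads content.xml straight out of the zip container and asks the expat parser whether it is well-formed.

```python
import zipfile
import xml.parsers.expat
from xml.parsers.expat import ExpatError


def check_xml_bytes(data):
    """Return None if data is well-formed XML, else (line, column, reason)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except ExpatError as exc:
        # expat counts lines from 1 and columns from 0
        reason = xml.parsers.expat.ErrorString(exc.code)
        return exc.lineno, exc.offset, reason
    return None


def check_odt_part(odt_path, part="content.xml"):
    """Read one part out of the .odt zip container and check it."""
    with zipfile.ZipFile(odt_path) as package:
        data = package.read(part)
    return check_xml_bytes(data)
```

Calling `check_odt_part("broken.odt")` (the file name is a placeholder) returns either `None` or a tuple with the position and the reason. Column numbers may not agree between tools, as the different positions reported in this thread show, so treat the column as a hint rather than an exact address.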
Comment 2 lloydthorndyke 2015-08-19 14:37:54 UTC
Would Microsoft note book work as the text editor?
Comment 3 Regina Henschel 2015-08-19 15:01:09 UTC
It is better to use a tool that preserves the encoding and line endings and provides syntax highlighting. Give Notepad++ a try. You can get it from https://notepad-plus-plus.org/download/v6.8.1.html, for example. You do not need to install it. Take the zip or 7z variant, unpack it and run it from that folder.

When you zip your repaired document later on, make sure that you are inside the folder and pack all of its contents. Do not zip the folder itself.
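
The packing step can also be scripted. The following is a rough sketch, not an official tool; `pack_odt` is a hypothetical helper name. It stores every file with a path relative to the unpacked folder, so the folder itself never ends up inside the archive. It also writes the `mimetype` file first and uncompressed, which is what the OpenDocument package format expects.

```python
import os
import zipfile


def pack_odt(folder, odt_path):
    """Zip the contents of folder (not the folder itself) into odt_path.

    odt_path should lie outside folder, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(odt_path, "w", zipfile.ZIP_DEFLATED) as package:
        mimetype = os.path.join(folder, "mimetype")
        if os.path.exists(mimetype):
            # 'mimetype' goes first and is stored without compression
            package.write(mimetype, "mimetype",
                          compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full = os.path.join(root, name)
                # archive name relative to the folder, with forward slashes
                arcname = os.path.relpath(full, folder).replace(os.sep, "/")
                if arcname != "mimetype":
                    package.write(full, arcname)
```

A call such as `pack_odt("unpacked", "repaired.odt")` (both names are placeholders) then produces a file that can be tested by opening it in Writer.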

Do you remember what the last actions on the document were before you got a read error on opening it?
Comment 4 lloydthorndyke 2015-08-19 15:31:19 UTC
(In reply to Regina Henschel from comment #3)
> It is better to use a tool, that keeps encoding and line ends and provides
> syntax highlighting. Give Notepad++ a try. You can get it from
> https://notepad-plus-plus.org/download/v6.8.1.html, for example. You need
> not install it. Take the zip or 7z variant, unpack it and run it from that
> folder.
> 
> When you later on zip your repaired document, make sure, that you are inside
> the folder and pack all of its content. Do not zip the folder itself.
> 
> Do you remember what the last actions on the document were, before you get a
> read error in opening it?

It was a read error upon opening after I'd saved my previous work.
Comment 5 lloydthorndyke 2015-08-19 15:40:52 UTC
How do I locate the trouble with Notepad++?  I'm not certain what I'm looking for to fix.
Comment 6 Regina Henschel 2015-08-19 16:48:19 UTC
As I said, I use my browser to find the problem. Here is the output from Chrome and SeaMonkey. Firefox might work too, but I have not installed it and therefore cannot test it.

Chrome:
error on line 2 at column 1357: Attribute office:name redefined

SeaMonkey (sorry, the message is in German):
XML-Verarbeitungsfehler: Doppeltes Attribut (in English: XML parsing error: duplicate attribute)

And SeaMonkey draws a long underline up to
office:name="__Annotation__5371_1603160509111111111111111" office:name="__Annotation__5492_160316050911111"

In the output of SeaMonkey the error is obvious. The part office:name should appear only once. But since the place is so early in the document, I guess that both are wrong there and should be deleted. Such an attribute should only directly follow the part <office:annotation or the part <office:annotation-end .
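
Because an XML parser stops at the first error, a plain text scan can help to list every start tag that repeats an attribute. The following is only a heuristic sketch; `find_duplicate_attributes` is a made-up name, and the regular expressions assume simple, quoted attributes and ignore comments and CDATA sections.

```python
import re
from collections import Counter

# A start tag: '<', an element name, then a run of name="value" pairs.
TAG = re.compile(
    r"""<([A-Za-z_][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*/?>"""
)
# One name="value" pair; the group captures the attribute name.
ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(?:"[^"]*"|'[^']*')""")


def find_duplicate_attributes(xml_text):
    """Yield (offset, element_name, attribute_name) for repeated attributes."""
    for tag in TAG.finditer(xml_text):
        counts = Counter(ATTR.findall(tag.group(2)))
        for attr, count in counts.items():
            if count > 1:
                yield tag.start(), tag.group(1), attr
```

Run over the text of content.xml, it reports where each offending tag starts, which element it is, and which attribute is doubled. Whether to delete one copy or both still has to be decided by looking at the element, as explained above.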

Set the cursor into the text. Notepad++ shows the line and column number in the status bar, so you should be able to locate Ln 2, Col 2992.
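
If scrolling to a column that far to the right is awkward, a few lines of Python can print the text around a reported position instead. This is a small sketch under the assumption that line and column are both counted from 1 and in characters; `context_at` is a hypothetical name, and tools differ in how they count columns.

```python
def context_at(xml_text, line, column, width=60):
    """Return the text around a 1-based line and column position."""
    row = xml_text.split("\n")[line - 1]
    index = column - 1                 # convert to a 0-based index
    start = max(index - width, 0)      # do not run past the line start
    return row[start:index + width]
```

For the error in this report, `context_at(text, 2, 2992)` would show up to 60 characters on either side of the reported column, where `text` is the decoded content.xml.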

(In reply to lloydthorndyke from comment #4)
> It was a read error upon opening after I'd saved my previous work.

What did you do before you saved the work? Are you using only Apache OpenOffice 4.1.1, or is the document also worked on in another application?
Comment 7 Regina Henschel 2015-08-19 16:54:59 UTC
The management of annotations (comments) seems to be broken in the document. I would no longer work with that document, but copy the pure text to a new document as soon as you can read it or from a previously stored backup.
Comment 8 Regina Henschel 2015-08-19 17:23:29 UTC
Please have a look at https://bugs.documentfoundation.org/show_bug.cgi?id=90330. That report looks very similar to yours, only for LibreOffice.
Comment 9 lloydthorndyke 2015-08-19 20:30:13 UTC
(In reply to Regina Henschel from comment #8)
> Please have a look at
> https://bugs.documentfoundation.org/show_bug.cgi?id=90330. That report looks
> very similar to yours, only for LibreOffice.

Tried fixing it myself and just got confused.  Might I impose upon you to create the fix?  Then I can retrieve this file and cut and paste to a new file.
Comment 10 Regina Henschel 2015-08-20 13:16:31 UTC
Created attachment 84888 [details]
double attribute removed

You should be able to open the document.

I think you should switch off change tracking before you delete or insert an annotation, so that the annotations are not recorded in the change tracking information. It seems that the program gets confused if annotations are inside the change records.

We are still looking for a step-by-step description of how to reproduce the problem. I was not able to produce such an erroneous document. Without that it is not possible to fix the bug. Therefore I'll close the issue for now.
Comment 11 Regina Henschel 2015-08-20 13:17:50 UTC
Feel free to reopen, if you can give a scenario to reproduce the problem.
Comment 12 lloydthorndyke 2015-08-20 13:55:15 UTC
Thank you.  I appreciate all your time and effort in helping me recover the file.  I really have no grasp of the intricacies for fixing the XML, so your help has been invaluable.
Comment 13 orcmid 2016-03-12 00:22:11 UTC
(In reply to Regina Henschel from comment #10)
> Created attachment 84888 [details]
> double attribute removed
> 
> You should be able to open the document.
> 
> I think, you should switch off change tracking before you delete or insert
> an annotation, so that the annotations not recorded in the change tracking
> infos. It seems, that the program get confused, if annotations are inside
> the change records.
> 
> We are still looking for a step-by-step description how to produce the
> problem. I was not able to produce such an erroneous document. Without that
> it is not possible to fix the bug. Therefore I'll close the issue for now.

Thank you for the analysis.  The recommendation about change-tracking and annotations should be helpful.  

Part of Issue 126339, that you also analyzed, also exhibits this bug.  

(I would think of this as confirmed. We do fix bugs that can't be triggered by a reproducible test document so long as the defect is clear in documents we are provided.)
Comment 14 orcmid 2016-03-12 17:18:06 UTC
(In reply to orcmid from comment #13)
[ ... ]
> Part of Issue 126339, that you also analyzed, also exhibits this bug.  
> 
> (I would think of this as confirmed. We do fix bugs that can't be triggered
> by a reproducible test document so long as the defect is clear in documents
> we are provided.)

I am reopening this defect report.  We do not require a reproducible proof-of-concept in the case of an isolated and confirmed problem.  (If we did, the cases of failed saves that write trash and ones that write incorrect files would all be treated as resolved because they are irreproducible.)  I think irreproducible applies more when there is a lack of confirmation, including insufficient information and follow-up from the original reporter.
Comment 15 orcmid 2016-03-12 17:18:27 UTC
Marking as confirmed
Comment 16 orcmid 2016-03-12 19:55:22 UTC
(In reply to Regina Henschel from comment #10)
> I think, you should switch off change tracking before you delete or insert
> an annotation, so that the annotations not recorded in the change tracking
> infos. It seems, that the program get confused, if annotations are inside
> the change records.

In general, reliance on change tracking can be a problem in large or complex documents, and this issue may be related to that.  See Issue 121571 (which does need to be reconfirmed).  Although production of invalid XML (as in this case) is uncommon, there are other cases where a reopened document does not preserve the tracked changes properly, and this might not be noticed immediately.
Comment 17 orcmid 2016-05-31 15:52:02 UTC
*** Issue 126219 has been marked as a duplicate of this issue. ***
Comment 18 Arrigo Marchiori 2021-02-13 14:39:34 UTC
The proposed fix to the data corruption problem (duplicated office:name attribute) is discussed in the report for bug #128356

*** This issue has been marked as a duplicate of issue 128356 ***