Issue 127745 - Read Error: Format error discovered ... at n,nnnn (row,col)
Summary: Read Error: Format error discovered ... at n,nnnn (row,col)
Status: CLOSED DUPLICATE of issue 128356
Alias: None
Product: Writer
Classification: Application
Component: ui
Version: 4.1.5
Hardware: PC Windows 7
Importance: P5 (lowest) Normal
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-01 18:02 UTC by John
Modified: 2021-02-15 17:59 UTC
CC List: 3 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
File giving Read Error (34.29 KB, application/vnd.oasis.opendocument.text)
2018-04-01 18:02 UTC, John
File with Read Error. (63.72 KB, application/vnd.oasis.opendocument.text)
2018-04-01 18:03 UTC, John
Sammy Russel 1draft - CORRECTED.odt file (35.53 KB, application/vnd.oasis.opendocument.text)
2018-04-15 15:05 UTC, John
Location of problem.GIF - image of content.xml (83.17 KB, image/gif)
2018-04-16 16:25 UTC, John

Description John 2018-04-01 18:02:38 UTC
Created attachment 86380
File giving Read Error

See uploaded files Sammy Russel 1draft.odt and ARTTRANNIE WITH NOTES Ruth 29 03.odt.

Both files fail to open giving "Read Error:  Format error discovered ... at n,nnnn (row,col)"

Analysis of content.xml shows that the first style definition in each file has been corrupted with multiple redundant office:name definitions.  In one file the first style definition was for P1; in the other the first was for Table1.

The fix to repair the files is to delete this redundant data.
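
A minimal Python sketch of that manual repair, assuming the stray office:name attributes only ever need to be removed from style:style start tags, that content.xml is UTF-8, and that attribute values contain no ">" character. The names strip_stray_names and repair_odt are hypothetical helpers, not part of AOO. The sketch works on raw text because the damaged content.xml is no longer well-formed XML and a normal parser would reject it.

```python
import re
import zipfile

# Start tag of a style definition, e.g. <style:style style:name="P1" ...>
STYLE_TAG = re.compile(r"<style:style\b[^>]*>")
# One office:name attribute, including the whitespace in front of it
STRAY_NAME = re.compile(r'\s+office:name="[^"]*"')


def strip_stray_names(content_xml):
    """Remove every office:name attribute found inside style:style start tags."""
    return STYLE_TAG.sub(lambda m: STRAY_NAME.sub("", m.group(0)), content_xml)


def repair_odt(src, dst):
    """Copy the .odt archive src to dst, cleaning content.xml on the way."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                data = strip_stray_names(text).encode("utf-8")
            # Reusing the original ZipInfo keeps the member order and the
            # compression method of each entry (mimetype stays first).
            zout.writestr(info, data)
```

Annotations in the document body keep their own office:name attributes, since only style:style tags are touched. Always work on a copy of the damaged file.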

Notes:

1  We often get posts of this problem in the forum
2  They always seem to be files where comments have been added to a range of characters.
3  We suspect but have not confirmed that the problem is caused by MS Word being used to edit the file.  Record Changes may be switched on, and a comment is attached to a range of characters.
4  See [Solved] Read-Error at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=93024#p442216 for first file.
5  See Format error discovered at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=93036 for second file.

Question:  Is it possible that AOO does the corruption?

Why is this important?  

Students exchange .odt files written with AOO with supervisors who use MS Word, where the supervisor adds comments and records changes.  If MS Word is corrupting the file we need to get Microsoft to fix it.

AOO 4.1.5 Windows 7
Comment 1 John 2018-04-01 18:03:20 UTC
Created attachment 86381
File with Read Error.
Comment 2 oooforum (fr) 2018-04-05 16:35:30 UTC
(In reply to John from comment #0)
> Question:  Is it possible that AOO does the corruption?
It would be best to prevent this error from happening.

So, to investigate in this direction, we need a step-by-step to reproduce.
Comment 3 John 2018-04-05 16:57:59 UTC
Unfortunately we are only sent the files to repair, so it is very difficult to get a full history.  I am attempting to get access to MS Word so I can do some tests.

Note that the annotation number seems to have multiple "1" digits added to it - the values in the Sammy Russel file are:

Annotation__414_24419901911
Annotation__401_244199019111
Annotation__158_2441990191111
Annotation__248_244199019111111
Annotation__153_24419901911111111

We have noticed this problem of "added 1" in other files which are edited by MS Word so we are wondering:  Is the problem caused by MS Word or is it caused by AOO?

Is it possible for you to answer that question by saying "It is not possible for AOO to add the "1" digits as shown above"?
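
To make the "added 1" pattern easier to compare across files, here is a small Python sketch that splits an annotation name into its parts and counts the trailing "1" digits. It assumes the name has the shape Annotation__<number>_<id> and that the underlying id does not itself end in 1; the function name split_annotation_name is hypothetical, and what the two numbers mean inside AOO or MS Word is not something this report establishes.

```python
import re

# Optional leading underscores, then Annotation__<number>_<id>
NAME_RE = re.compile(r"^_*Annotation__(\d+)_(\d+)$")


def split_annotation_name(name):
    """Return (number, id without trailing 1s, count of trailing 1s)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected annotation name: {name!r}")
    number, ident = match.groups()
    base = ident.rstrip("1")  # assumption: the real id does not end in 1
    return int(number), base, len(ident) - len(base)


if __name__ == "__main__":
    names = [
        "Annotation__414_24419901911",
        "Annotation__401_244199019111",
        "Annotation__158_2441990191111",
        "Annotation__248_244199019111111",
        "Annotation__153_24419901911111111",
    ]
    for name in names:
        print(name, split_annotation_name(name))
```

If every name in a file reduces to the same base id, the differing counts of trailing digits would be consistent with a digit being appended on repeated saves, but this alone does not show which application appends it.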
Comment 4 oooforum (fr) 2018-04-09 07:54:15 UTC
You mention MS Word being used to edit the ODT file.
But remember that Microsoft uses the ODF 1.1 format, whereas OpenOffice uses ODF 1.2.
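
One way to see what a particular file declares is to read its meta.xml. The Python sketch below is an assumption-laden illustration: it expects the archive to contain a well-formed meta.xml, reads the office:version attribute of the root element and the text of meta:generator, and returns None for whichever is missing. The function name describe_odt is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def describe_odt(path):
    """Return (declared ODF version, generator string); either may be None."""
    with zipfile.ZipFile(path) as odt:
        root = ET.fromstring(odt.read("meta.xml"))
    # office:version is an attribute of the root element
    version = root.get(f"{{{OFFICE_NS}}}version")
    # meta:generator names the producer that wrote the file
    generator = root.find(f".//{{{META_NS}}}generator")
    text = generator.text if generator is not None else None
    return version, text
```

Running this on a file before and after each editing step could help show which application last saved it, which is the history that is usually missing from files posted to the forum.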
Comment 5 John 2018-04-15 15:05:17 UTC
Created attachment 86388
Sammy Russel 1draft - CORRECTED.odt file

I can confirm that Writer is adding this corruption to the .odt file.  It is repeatable - see the attached Sammy Russel 1draft - CORRECTED.odt file.

Steps to cause Writer to corrupt the .odt file.

1  Download Sammy Russel 1draft.odt from [Solved] Read-Error at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=93024#p442216 or it is the first file attached to this report (36kB).

2  Extract content.xml.  Note that the P1 Style definition has been corrupted: the redundant and incorrect attributes office:name="__Annotation__153_24419901911111111" office:name="__Annotation__158_2441990191111" office:name="__Annotation__248_244199019111111" office:name="__Annotation__401_244199019111" office:name="__Annotation__414_24419901911" have been inserted into it.

3  Delete these redundant items and re-insert content.xml to get the attached file Sammy Russel 1draft - CORRECTED.odt.  At this stage it is thought that the .odt file is OK.

4 Open Sammy Russel 1draft - CORRECTED.odt.  The file opens without problem.

5  Make a trivial edit (add a space in front of Case Summary) and save the file.

Expected result:  File should not be corrupted when saved.

Actual result:  Writer corrupts the P1 Style definition by inserting one or more office:name definitions into the P1 Style definition.
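
To check a saved file for this kind of damage without opening it in Writer, the content.xml text can be scanned for start tags that repeat an attribute. The following Python sketch assumes double-quoted attribute values (as in the files here) and uses a hypothetical helper name, duplicate_attributes. It scans raw text on purpose: a repeated attribute makes the XML not well-formed, so an XML parser would stop at the first one instead of listing them all.

```python
import re
from collections import Counter

# A start tag: element name, then zero or more name="value" attributes
START_TAG = re.compile(r'<([\w:.-]+)((?:\s+[\w:.-]+="[^"]*")*)\s*/?>')
# One attribute; the group captures its name
ATTRIBUTE = re.compile(r'([\w:.-]+)="[^"]*"')


def duplicate_attributes(xml_text):
    """Yield (line, tag, attribute, count) for attributes repeated in one tag."""
    for match in START_TAG.finditer(xml_text):
        counts = Counter(ATTRIBUTE.findall(match.group(2)))
        line = xml_text.count("\n", 0, match.start()) + 1
        for attribute, count in counts.items():
            if count > 1:
                yield line, match.group(1), attribute, count
```

An empty result after step 5 would mean the save did not introduce repeated attributes; any entry for style:style with office:name matches the corruption described above.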

Notes:

1.  It appears that the file was created by author SN using AOO Writer.  The file was sent to reviewer SD who used MS Word and recorded changes on 20 Mar 2018.  Some changes were "Comments attached to a range of characters" and it is these Comments which use the office:name definitions.

2.  Author SN then recorded more changes to the file using AOO on 22 Mar.  Record Changes is still ON. 

3.  At some stage, the file became corrupted.  This probably happened when author SN edited and saved the file after it had been edited with MS Word (and as described in Step 5 above).

4.  Analysis of the time stamps of the edits shows that each change is timed at nn:nn:00.0n seconds.  It seems strange to me that the time is always set to 00.0n seconds.  The times are shown below where 20 = date 20th. 

The first five office:name ... appear in the file, and also corrupt the P1 Style definition.  The sixth, seventh and eighth appear in the file but do NOT corrupt the P1 Style definition.  The sixth was the earliest, recorded at 09:51:00.02.

The other twenty times are recorded changes which were not Comments added to a range of characters.  Note that the same 12:18:00.06 time is recorded for two different changes.

Note the multiple adding of digits "111...".  

Note how the decimal component of the seconds increments throughout - I would expect it to be more random.

The times below are in the order in which they appear, from start to end, in content.xml.

office:name="__Annotation__153_24419901911111111" line  200  20  9:56:00.04  SD
office:name="__Annotation__158_2441990191111"     line  220  20  9:57:00.04  SD
office:name="__Annotation__248_244199019111111"   line  351  20 10:39:00.04  SD  
office:name="__Annotation__401_244199019111"      line  859  20 12:18:00.06  SD 
office:name="__Annotation__414_24419901911"       line  958  20 12:20:00.06  SD

office:name="__Annotation__3_244199019"           line 1260  20  9:51:00.02  SD
office:name="__Annotation__396_244199019"         line 1522  20 12:18:00.06  SD 
office:name="__Annotation__551_244199019"         line 1636  20 12:50:00.08  SD 

09:54:00.04
11:50:00.04 
10:43:00.05 
10:41:00.05
12:21:00.05 
11:40:00.05 
11:52:00.05
11:56:00.06
12:43:00.06 
12:18:00.06  line 816 
12:27:00.06 
12:29:00.07 
12:28:00.07  
12:39:00.07 
12:40:00.07 
12:42:00.08
12:42:00.08 
12:44:00.08 
12:46:00.08
12:50:00.08
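
The pattern in these time stamps can be checked mechanically. This is only a sketch, assuming each stamp is written as hh:mm:ss.ff exactly as listed in this comment; the helper name parse_stamp is hypothetical, and the sample list holds just a few of the values above.

```python
import re
from collections import Counter

# hours:minutes:seconds.hundredths, as listed in this comment
STAMP_RE = re.compile(r"^(\d{1,2}):(\d{2}):(\d{2})\.(\d{2})$")


def parse_stamp(text):
    """Return (hour, minute, second, hundredths) as integers."""
    match = STAMP_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected time stamp: {text!r}")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    stamps = ["9:56:00.04", "10:39:00.04", "12:18:00.06", "12:50:00.08"]
    parsed = [parse_stamp(s) for s in stamps]
    # True when the whole-second field is zero in every stamp
    print(all(second == 0 for _, _, second, _ in parsed))
    # How often each hundredths value occurs
    print(Counter(hundredths for *_, hundredths in parsed))
```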
Comment 6 John 2018-04-16 16:25:40 UTC
Created attachment 86389
Location of problem.GIF - image of content.xml

If Recorded change 72 is accepted the problem disappears and Writer does not corrupt content.xml.

See Location of problem.GIF - Recorded change 72 is the deletion of the commented text "(report 2)".

Steps to reproduce removal of problem:

1  Open Sammy Russel 1draft.odt.
2  Un-tick Edit > Changes.
3  Edit > Accept/Reject changes.  Scroll to change 72 and accept it.  See image - change 72 is the second 15:12 change, and is the deletion of the Commented text "(report 2)".
4  Add a space before Case summary at top
5  File > Save.

Expected result (based on the steps in comment 5):  Making a trivial change by adding a space causes the Read Error when the saved file is next opened.

Actual result:  The file now opens successfully without a Read Error message.

Conclusion.  Accepting Recorded change 72 has "removed" the problem causing the Read Error.
Comment 7 John 2020-12-08 23:39:17 UTC
See also Issue 128356 - Track Changes and Annotations on text range can cause corruption. Applies to 4.x (all versions?) which appears to be very similar.

https://bz.apache.org/ooo/show_bug.cgi?id=128356
Comment 8 Arrigo Marchiori 2021-02-05 21:57:06 UTC
Following up from bug #128356.

On bug #128356 we seem to have fixed the corruption of ODT documents containing a certain type of comments or annotations.

Apparently, this also fixes the data _corruption_ you report here: the repeated XML attribute office:name is not added any more, and so it is never repeated. Editing "Sammy Russel 1draft - CORRECTED.odt" and saving it gives a properly constructed ODT document that can be reopened with no problems.

You are also reporting something else here: the "office:name" of the annotation entries are being _changed_ every time the document is re-saved.

This could be considered a bug... or not, depending on its effect.
In any case, I confirm it as I am also seeing it in test cases for bug #128356.

As a '1' seems to be always appended, we could argue that after many times the document is edited and saved, the name will eventually become "too long". How long is "too long" is hard to tell.

I am not considering this issue urgent -- but this is only my humble opinion. I am open to discussion on this topic.
Comment 9 Arrigo Marchiori 2021-02-13 14:34:07 UTC
I am flagging this bug as a duplicate of bug #128356 because the proposed solution to the data corruption problem is there.

John, if you believe that the '1' being added to the office:name attribute is a bug, please open a new report.

*** This issue has been marked as a duplicate of issue 128356 ***
Comment 10 John 2021-02-15 17:59:52 UTC
(In reply to Arrigo Marchiori from comment #9)

> John, if you believe that the '1' being added to the office:name attribute
> is a bug, please open a new report.
> 
Arrigo

Thanks.  

I have no evidence that AOO adds the multiple "1111" sequences to the office:name attribute as in "<office:annotation office:name="__Annotation__401_2441990191111">".

In fact, I have an unproven hunch that MS Word adds these "1111" when MS Word edits a .odt file.

My evidence was solely that, when a trivial edit was made, AOO corrupted the file by copying existing office:name attribute definitions into the P1 Style definition, something now fixed in bug #128356.

I concur that this bug report should be closed.