Issue 126869

Summary: Analysis Task: Lost/Corrupted Documents after Save/Shutdown
Product: General
Reporter: orcmid <orcmid>
Component: code
Assignee: AOO issues mailing list <issues>
Status: CONFIRMED ---
QA Contact:
Severity: Normal
Priority: P5 (lowest)
CC: ajith.titus, alex, ardovm, czeslaw.wolanski, john.ha24, mseidel, ofarrwrk, oooforum, pamelaaploof, pedlino, petko, rodlockwood, villeroy, william
Version: 3.4.0
Target Milestone: ---
Hardware: All
OS: All
Issue Type: TASK
Latest Confirmation in: ---
Developer Difficulty: ---
Attachments (Description, Flags):
Lost images - even though the image temporary files were still in the \Temp folder (Flags: none)
Broken .odt file - note garbage before the proper PK header (Flags: none)
File written when PC is powered off during the Save (Flags: none)

Description orcmid 2016-03-11 22:25:47 UTC
This meta-task is a subtask of Issue 126846 on Major Recurring Data/Operation Loss/Corruption situations.  See that issue for general context and an attachment that provides details about the range of cases.

FOCUS

This task focuses on everything that is known and can be determined about incidents in which thought-to-be-saved documents are found to be corrupted and unusable on subsequent opening.  

This is about cases where AOO saves a file that is not valid, in contrast with cases where a valid document is saved successfully but its content has been altered in some manner.  (The consequences can be just as awful, but the causes are presumed to be different.)

Although there may be occasional connections, this task does not address application freezing, auto-recovery problems, decryption problems, locking issues and user-profile damage of the kind where certain functions cease working.

With regard to application freezing, this may result in there being no usable document when shutdown has to be forced.  There may be some relief for those cases with measures that reduce document loss/corruption incidents.
 
There are likely to be very simple mitigations and remedies that can be introduced in stages toward more complete solutions.  That is to avoid destabilizing the software with changes we introduce.  Changes must be reversible until we have confidence in them.

RELATED MATERIAL

Comment 3 of Issue 126846 applies to this task: https://bz.apache.org/ooo/show_bug.cgi?id=126846#c3

Item (2) of Attachment 85292 [details] of Issue 126846 applies to this task.  

Item (1) of Attachment 85292 [details], about all text being replaced by "#" characters, is likely to be a bug of different origin.  This occurs in a different manner in which there is no corruption in the saved file -- it is complete and valid.  The "#" characters are actually there.  This is an extreme case where what appears to have been saved is different from what was actually saved, which is only discovered on subsequent opening of the saved file.  The file is valid and completely produced, although not what was thought to be saved.  Such cases need to be considered separately.

EXISTING REPORTS

Attachment 85292 [details] lists several Community Forum threads.

There are also bugzilla issues on incidents of damaged files that cannot be opened because they are not a valid saved document.

Some of those will be marked as duplicates of this issue simply as a way to provide cross-referencing.
Comment 1 orcmid 2016-03-11 22:32:52 UTC
(In reply to orcmid from comment #0)
> Item (1) of Attachment 85292 [details], about all text being replaced by "#"
> characters, is likely to be a bug of different origin.  This occurs in a
> different manner in which there is no corruption in the saved file -- it is
> complete and valid.  The "#" characters are actually there.

To be clear, I am not referring to cases where Apache OpenOffice does not recognize what the file is and offers to filter it as text.  The "#" case I am referencing is when the document is in a recognized format and the content is actually nothing but paragraphs having runs of "#".

When a filtering as text is offered, it might be a case of a corrupted/failed save or it may have a different origin.  Individual analysis of available information is required for those cases.
Comment 2 orcmid 2016-03-11 23:43:19 UTC
*** Issue 126743 has been marked as a duplicate of this issue. ***
Comment 3 orcmid 2016-03-12 17:39:39 UTC
Issue 126479 identifies a *particular case* of "read error" where Apache OpenOffice writes a file having XML that is invalid in a specific way.

This is a qualifying case of writing something that cannot be read.  Such documents can be recovered with some data loss by specialists.  There is no redress for non-technical users.
Comment 4 orcmid 2016-03-12 20:22:54 UTC
*** Issue 111290 has been marked as a duplicate of this issue. ***
Comment 5 orcmid 2016-03-12 20:30:14 UTC
*** Issue 107972 has been marked as a duplicate of this issue. ***
Comment 6 orcmid 2016-03-12 20:35:44 UTC
*** Issue 106865 has been marked as a duplicate of this issue. ***
Comment 7 John 2016-04-14 19:02:08 UTC
I think that Issue 107558 - A hidden step while writing OOo files? may be relevant.

The poster shows that AOO apparently silently continues to do saving operations AFTER the moving blue bar has finished moving.  Users therefore may think that the save has completed whereas it has not been completed.  Forum member RoryOF has long suspected that over hasty "slamming the laptop lid shut" could be a cause of problems like these.
Comment 8 John 2016-04-15 14:33:40 UTC
Why is there a difference between File > Save ..., and copying everything to a new document and File > Save ..., the new document?

There are numerous cases where a corrupted .odt file is magically "un-corrupted" by Edit > Select All > Copy > Paste into a new document > Save ....  Two such forum posts (there are more - copying to a new document is often suggested as a fix) are:

a) Re: Compressing size of an odt containing large image (https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=81238&p=377733&hilit=tangled#p377950) which says "I'm amazed at the improvement of the second file after I saved the contents of the original into a new file ('removing the tangles')"

b) Re: Changing Automatic default footnote anchor symbol [Solved] (https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=82900#p384321) where a corrupted (wrong font?) footnote anchor was reset by pasting the entire .odt contents to a new .odt file.

As copying the entire contents seems to fix problems - possibly because redundant tags and their arguments are discarded??? - could this "clean up" stage be incorporated as a normal part of the File > Save ... process?
Comment 9 orcmid 2016-04-15 15:59:51 UTC
(In reply to John from comment #8)

> There are numerous cases where a corrupted .odt file is magically
> "un-corrupted" by Edit > Select All > Copy > Paste into a new document >
> Save ....  Two such forum posts (there are more - copying to a new document
> is often suggested as a fix) are:

I had been thinking that "corrupted" meant that it wouldn't open and that the offer to repair also failed.  That is, the document can't be used at all.

In the example where the document does open (enough that it can be copied into a fresh document), what is meant by corruption in this case?  Does a Save As provide the same miracle cure?  

I am asking because it strikes me that the document is not known to be corrupted once it is opened, so it is not clear that the user-perceived defects seen would be recognized as something to be cleaned up by File > Save.

Please say more.
Comment 10 orcmid 2016-04-15 16:10:30 UTC
(In reply to John from comment #7)
> I think that Issue 107558 - A hidden step while writing OOo files? may be
> relevant.
> 
> The poster shows that AOO apparently silently continues to do saving
> operations AFTER the moving blue bar has finished moving.  Users therefore
> may think that the save has completed whereas it has not been completed. 
> Forum member RoryOF has long suspected that over hasty "slamming the laptop
> lid shut" could be a cause of problems like these.

The problem could be that Apache OpenOffice indicates it has completed its part of the Saves, but the file system is caching/buffering writes and they are not actually completed yet.  This may be difficult/impossible to detect from within the application.  There may be OS/BIOS settings that apply, but AOO does not have access to those as far as I know.

This will be something to check on though.  This might also be a factor when the application is signalled that the computer is hibernating/shutting-down and there is no opportunity for a Save-Unsaved-Work? dialog.
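
A minimal sketch of the application-side half of this concern, written in Python rather than AOO's own C++ and not a description of how AOO actually saves. It assumes the usual pattern of writing to a temporary file, flushing and fsync-ing it, and then renaming it over the target. The name durable_save is hypothetical. Even with fsync, a drive's own write cache may still hold data, which is consistent with the point above that part of this is outside the application's control.

```python
import os
import tempfile


def durable_save(path, data):
    """Write data to path so that a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push its cache to the device
        os.replace(tmp_path, path)  # rename over the previous version
    except BaseException:
        # Leave the previous version untouched if anything failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern an interrupted save would be expected to leave the previous document in place rather than a partly written one, at the cost of needing free space for a second copy during the save.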
Comment 11 John 2016-04-15 17:04:10 UTC
(In reply to orcmid from comment #9)
> (In reply to John from comment #8)
> 
> Please say more.

Let me refer you to some forum posts (I searched for my replies using the word "tangled").  Some posts uploaded files which can be examined.

Incidentally, in the first (footnotes), the problem was resolved by the copy/paste, so perhaps "discarding the garbage" takes place at the decision of "what to copy" or the decision of "what to paste".

[Solved] Changing Automatic default footnote anchor symbol at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=82900&hilit=+tangled.  This post uploaded a .odt file with Chinese characters (thought to be the Chinese equivalent of 1, 2, 3...) for footnote anchors.  Saving the document keeps the Chinese characters.  But copying the contents to a new document changes the Chinese characters to Roman numerals 1, 2, 3 etc.  Note the "correction" appears after the paste - and the corrected file is saved with Roman numerals.

Yet another book layout question - sorry! at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=76882&hilit=+tangled says "Somewhere around 80-100 pages of included document, formatting starts crashing, behaving randomly, losing graphics, losing frames, etc., all the individual ills mentioned in so many questions here. " One responder suggested "Although I'm new to OO, having been an assembler (machine code) programmer for a typesetting company in my younger days, I was immediately struck by your symptoms. My view is that the developers of OO didn't expect such a large and complex work to be generated using OO. From experience, it was usually that my internal storage/buffers were too small, (and that I think is your problem); even when I thought they could never get bigger - they did! Writing programs to process words and their associated typesetting data, in one program I had to expand my buffer for single word to around 500 bytes" The advice given was to remove all formatting and re-format from scratch and the user reported "I did take the advice to start over, and building things from the first page to the last seems much more stable ... I'm over 100 pages, and past any prior problem lengths."

Backspace causes a change in the font at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=81329&hilit=+tangled.  The user uploaded his file. I posted "Your file has 59 Paragraph styles, 24 Text styles and 50 List styles. content.xml, where all the text and styling is managed, is 104 kBytes.

My "untangled" (ie paste into a new document) version of your file has 1 Paragraph style. content.xml is 49 kBytes."

Cannot save large file at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=81788&hilit=+tangled.  This user was advised to paste to a new document.  He did not come back ...

[Solved] Compressing size of an odt containing large images at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=81238&hilit=+tangled.  A second poster came into the thread with "I have the same problem with file size.
Mine is a Writer file where I've copied & pasted interesting technical information, including large images, from websites over many many years. Now it has come to the point where it takes 5 minutes to save and is extremely slow to move from page to page. I don't want this document to be a .pdf file or in a zip folder or divided into parts, as I'm adding to it almost daily".  After copying the content to a new file the user reported "I'm amazed at the improvement of the second file after I saved the contents of the original into a new file ('removing the tangles')"
Comment 12 John 2016-04-20 09:41:33 UTC
Created attachment 85459 [details]
Lost images - even though the image temporary files were still in the \Temp folder
Comment 13 John 2016-04-20 09:57:05 UTC
Sorry - it lost my text.

I have just lost the images in a 1.5MB .odt file I was editing.  I wrote the analysis which led to this issue being opened where I suggested that images might be being lost because they were unprotected in the \Temp folder and were somehow being deleted.

This case is different BECAUSE THE TEMPORARY FILES WERE STILL IN THE TEMPORARY FOLDER EVEN THOUGH I WAS GETTING THE READ ERROR ERROR MESSAGE.

It seems therefore as though Writer itself "lost contact" with the images while I was editing the document.  Then, when I saved the document, even though the images were still in the temporary folder, Writer did not save them.

I opened the 1.5MB .odt file and switched on Edit > Changes.  I had been working on it for about 5 minutes, and made about 10 changes when I scrolled and noticed the READ ERROR (see uploaded attachment) where there should have been an image.  I scrolled through the entire document and every image was showing the error.  

Note that the attachment shows the THREE error messages I was getting for the ONE missing image.  Other READ ERROR messages seemed to have bits of the document text in the brown frames.

I immediately went to C:\Users\John\AppData\Local\Temp\ and located the temporary folder in use.  I was editing just the one document and there was only one temporary folder which was a Writer file being edited.  When I opened the folder, I could see the "large" first file - which I believe is the .odt text file itself - and many smaller files which are normally, and I took them to be, the image files. 

Stupidly, I did not make a copy of the folder and its contents - sorry :-(.

I then saved the .odt file.  When I re-opened it, it was only 54kB and all the images were lost.  I unzipped the .odt and there was no Pictures folder.
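
The check described above (unzip the .odt and look for a Pictures folder) can be sketched in Python as follows. The function name image_report is hypothetical, and the regular expression assumes that embedded images are referenced from content.xml as xlink:href="Pictures/...", which is the usual ODF package layout but is not verified here for every AOO version.

```python
import re
import zipfile


def image_report(odt):
    """Return (embedded, referenced) sets of image names for an ODF package.

    odt may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(odt) as package:
        names = package.namelist()
        # Files actually stored in the package under Pictures/.
        embedded = {n for n in names
                    if n.startswith("Pictures/") and not n.endswith("/")}
        content = package.read("content.xml").decode("utf-8", errors="replace")
    # Image references that content.xml makes into the package.
    referenced = set(re.findall(r'xlink:href="(Pictures/[^"]+)"', content))
    return embedded, referenced
```

For the file described in this comment, both sets would be expected to be empty. A reference without a matching embedded file would point to a different failure, where the reference survived the save but the image data did not.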
Comment 14 John 2016-04-24 17:04:17 UTC
Created attachment 85497 [details]
Broken .odt file - note garbage before the proper PK header

See forum post https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=83063 where a .odt file was corrupted.

This is the file uploaded by the poster.  

Note how the first part of the file is garbage, and the PK header appears close to the end of the file.  The .odt file opens with 7-ZIP (presumably it ignores the garbage?) and shows the names and sizes of the internal files, but the internal files themselves cannot be extracted.
Comment 15 orcmid 2016-04-24 18:22:15 UTC
(In reply to John from comment #14)
> Created attachment 85497 [details]
> Broken .odt file - note garbage before the proper PK header
> 
> See forum post
> https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=83063 where a .odt
> file was corrupted.
> 
> This is the file uploaded by the poster.  
> 
> Note how the first part of the file is garbage, and the PK header appears
> close to the end of the file.  The .odt file opens with 7-ZIP (presumably it
> ignores the garbage?) and shows the names and sizes of the internal files,
> but the internal files themselves cannot be extracted.

This suggests that the beginning of the file, not the end, has been lost or mangled.  7-ZIP finds the "central" directory, which is on the end, that says what all the parts are and what their offsets are to find them in the main part of the Zip.

Because the file has been beheaded (or otherwise mangled), the parts can't actually be found and extracted by 7-zip (or WinZip) and a test operation will reveal that.  The WinZip report is more to the point:

Errors were detected -- see below for details
 
Testing ...
Error in file #1:  bad Zip file offset (Error local header signature not found):  disk #1  offset: 0
Error in file #2:  bad Zip file offset (Error local header signature not found):  disk #1  offset: 77
Error in file #3:  bad Zip file offset (Error local header signature not found):  disk #1  offset: 543
Error in file #4:  bad Zip file offset (Error local header signature not found):  disk #1  offset: 811
Error in file #5:  bad Zip file offset (Error local header signature not found):  disk #1  offset: 1341
testing: styles.xml               OK
At least one error was detected in D:\orcmid\docs\associazione\standards\OASIS\ODF\Development\Forensics\AOO-FileLoss-Corruption\Corruption-i126869-ForumUpload.odt.

Note that the last component of the Zip (file #6), styles.xml, is there and checks out. (7-zip produces a stranger error report, treating blocks as if the signatures and data are there and then reporting that the data is incorrect.)  

I inspected the file using a hex editor.  I see that the material preceding the intact styles.xml part appears to be what is left of the compressed content.xml part (File #5).  

There is no possible way to recover anything from this Zip.
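
The kind of test that WinZip reports above can be approximated with a short Python sketch. It reads the central directory at the end of the file and checks whether each recorded offset really points at a local header signature (PK 03 04). The function name check_local_headers is hypothetical. The sketch assumes a single-disk archive without Zip64 extensions and does not verify the compressed data itself.

```python
import struct


def check_local_headers(data):
    """Return a list of (name, offset, ok) for each central directory entry.

    ok is True when the bytes at the recorded offset start a local header.
    """
    eocd = data.rfind(b"PK\x05\x06")  # end-of-central-directory record
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # Total entry count, directory size and directory offset from the record.
    total, cd_size, _cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    # Locate the directory relative to the end record, so it is still found
    # when the front of the file has been damaged.
    pos = eocd - cd_size
    results = []
    for _ in range(total):
        if pos < 0 or data[pos:pos + 4] != b"PK\x01\x02":
            break  # directory itself is damaged; stop here
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        (offset,) = struct.unpack_from("<I", data, pos + 42)
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        ok = data[offset:offset + 4] == b"PK\x03\x04"
        results.append((name, offset, ok))
        pos += 46 + name_len + extra_len + comment_len
    return results
```

Calling check_local_headers(open(path, "rb").read()) on a file damaged like the attachment would be expected to report the early entries as not ok and the last entry as ok, in line with the WinZip output.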

 - - - - - - - - - - -

By the way, we are loading too many different cases into this meta-task.  It would be great to figure out how to differentiate them, use separate issues as well as we can, and link to them from this issue.  I suspect that ones like this one are very difficult to replicate, but it is definitely in a class by itself.
Comment 16 Andreas Säger 2016-04-24 18:49:41 UTC
Linux zip program can restore the styles.xml

$ zip -F inputfile.odt --out outputfile.odt
 produces an out file with intact styles.xml
Comment 17 orcmid 2016-04-24 19:28:46 UTC
(In reply to Andreas Säger from comment #16)
> Linux zip program can restore the styles.xml
> 
> $ zip -F inputfile.odt --out outputfile.odt
>  produces an out file with intact styles.xml

Yes, the styles.xml component of the .odt is intact and verifies.  7-zip and WinZip will recover that file.

That does not help in recovery of the content, since content.xml is completely unrecoverable.  If the end instead of the beginning of the content.xml part were preserved, it would be possible to make a partial recovery.  Not having the beginning of a compressed file makes it extremely difficult to recover anything at all because of the way that the compression works.
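
The asymmetry between losing the end and losing the beginning of a compressed part can be illustrated with Python's zlib module. This is a sketch under stated assumptions: the test data is a synthetic stand-in for content.xml, inflate_what_we_can is a hypothetical helper, and the exact behaviour on the tail fragment (an error or unusable output) depends on the bytes involved.

```python
import hashlib
import zlib


def inflate_what_we_can(fragment):
    """Inflate a raw deflate fragment; return (output, error message or None)."""
    inflater = zlib.decompressobj(-15)  # -15 selects raw deflate, as used inside Zip
    try:
        return inflater.decompress(fragment), None
    except zlib.error as exc:
        return b"", str(exc)


# Synthetic stand-in for content.xml: compressible, but not trivially so.
original = "\n".join(
    "<text:p>%d %s</text:p>" % (i, hashlib.sha256(str(i).encode()).hexdigest())
    for i in range(500)
).encode()

deflater = zlib.compressobj(9, zlib.DEFLATED, -15)
compressed = deflater.compress(original) + deflater.flush()
half = len(compressed) // 2

# First half only: the decoder starts from a known state and yields a prefix.
head_out, head_err = inflate_what_we_can(compressed[:half])
# Second half only: the decoder has no block header or history to start from.
tail_out, tail_err = inflate_what_we_can(compressed[half:])

print("head:", len(head_out), "bytes recovered, error:", head_err)
print("tail:", len(tail_out), "bytes recovered, error:", tail_err)
```

The head fragment yields a usable prefix of the original text. The tail fragment lacks the block header and the preceding history that the back-references rely on, so nothing trustworthy comes out of it.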
Comment 18 John 2016-04-25 11:11:03 UTC
(In reply to John from comment #13)
> 
> I have just lost the images in a 1.5MB .odt file I was editing.  I wrote the
> analysis which led to this issue being opened where I suggested that images
> might be being lost because they were unprotected in the \Temp folder and
> were somehow being deleted.
> 
> This case is different BECAUSE THE TEMPORARY FILES WERE STILL IN THE
> TEMPORARY FOLDER EVEN THOUGH I WAS GETTING THE READ ERROR ERROR MESSAGE.
> 

Amazingly, it has happened to me again today.  Never before in 15 years - now twice in a week.  

I was editing a 3MB .odt file which is the A4 front cover of a parish magazine.  The cover has less than 20 words and a 3MB JPG photo covering most of the page.  I was doing a test layout and I reduced the image to about 3" x 2" and dragged it to top left and set wrap to Optimal.  It was and still is Anchored to a paragraph.  I made no other change.

I then went on the internet to find a Queen's Birthday logo to try it out as the front cover.  I returned to the document about 10 minutes later ... and the image was gone and I have the READ ERROR message.  I quickly took a copy of the temporary folder and error messages and uploaded it at https://www.dropbox.com/s/dc3p8d0pca75dyh/Lost%20image%20files.ZIP?dl=0

I still have the Writer file open if anyone wants me to do any more tests ...
Comment 19 Andreas Säger 2016-04-25 13:59:17 UTC
> (In reply to John from comment #13)
> https://www.dropbox.com/s/dc3p8d0pca75dyh/Lost%20image%20files.ZIP?dl=0
> 
> I still have the Writer file open if anyone wants me to do any more tests ...

sv3vku7z.tmp is a png picture
sv3vbkgv.tmp is a Writer document with that picture embedded
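
Identifying what the .tmp files are, as done here, can be sketched by reading the first few bytes of each file and comparing them with well-known signatures. The names MAGIC_NUMBERS, sniff and sniff_folder are hypothetical, and the signature table is deliberately short. A Zip signature only shows that a file is a Zip package, so confirming a Writer document still needs a look inside.

```python
import os

# Leading byte signatures of a few common formats.
MAGIC_NUMBERS = [
    (b"PK\x03\x04", "Zip package (for example ODF)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF8", "GIF image"),
]


def sniff(header):
    """Return a label for the given leading bytes of a file."""
    for magic, label in MAGIC_NUMBERS:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_folder(folder):
    """Return {file name: label} for every regular file in folder."""
    report = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                report[name] = sniff(handle.read(8))
    return report
```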
Comment 20 John 2016-04-25 14:14:32 UTC
(In reply to Andreas Säger from comment #19)
> 
> sv3vku7z.tmp is a png picture
> sv3vbkgv.tmp is a Writer document with that picture embedded

Andreas

I posted the temporary folder and its files as an aid to diagnosis.  As far as I can see, when I unzip sv3vbkgv.tmp and extract content.xml, the calls to the images are still there; and the image is in the pictures folder.  I think this is to be expected, because I think sv3vbkgv.tmp was written when I opened the .odt file, and sv3vku7z.tmp was created when the image was flushed from memory.

So it looks to me as though the image is being lost from "the document as held in memory".

I think that has serious implications as the loss of the image must therefore be being CAUSED by Writer.  This image loss is not due to an external factor - it is Writer which has lost the image.

Is there a way in which I can do a memory dump of the Writer file?  I still have the file open.  A memory dump would show what the "in memory file" contains.
Comment 21 John 2016-04-25 15:24:28 UTC
I still have the document open and I have taken memory dumps of the soffice.exe (53MB) and the soffice.bin (310MB) processes if these are any use.  TaskManager > r-click process > Create dump file.

I cannot do a normal memory dump of the PC as I need to reboot the PC after making the settings changes necessary to create a dump file.
Comment 22 orcmid 2016-04-25 17:22:37 UTC
(In reply to John from comment #21)
> I still have the document open and I have taken memory dumps of the
> soffice.exe (53MB) and the soffice.bin (310MB) processes if these are any
> use.  TaskManager > r-click process > Create dump file.
> 
> I cannot do a normal memory dump of the PC as I need to reboot the PC after
> making the settings changes necessary to create a dump file.

It is not necessary to have a full dump of the PC.  

The task memory from one of those processes may be very difficult to work with.

CONFIRMATION OF UNDERSTANDING

I want to confirm my understanding of the situation.

The files in the temporary directory still have whatever their file names would have been when embedded in the ODT file?

But on Save, the images are not in the new ODT and there are no references to them, not even unsatisfiable references?  That is, they are completely gone?

This could be a situation where defensive logic is silently dealing with some sort of lost or corrupted internal information, even an error indication.  The "Read Error" is what you get in the end.  (We also don't know how reliable that message is -- it could be an incorrect reporting of an underlying failure.)

One way to narrow this down is to obtain a "debug build" of Apache OpenOffice 4.1.2.  There may be some sort of invariant that fails that is caught by debug-mode checks that is being silently handled in the production release.  

There are no guarantees.  I suspect isolation of what is happening may be non-trivial.  There is no prediction on how simple or complex a remedy might be.
Comment 23 orcmid 2016-04-25 17:26:39 UTC
(In reply to orcmid from comment #15)

>  - - - - - - - - - - -
> 
> By the way, we are loading too many different cases into this meta-task.  It
> would be great to figure out how to differentiate them, use separate issues
> as well as we can, and link to them from this issue.  I suspect that ones
> like this one are very difficult to replicate, but it is definitely in a
> class by itself.

While it is convenient to tailgate specific cases onto this issue, it becomes extremely difficult to intermingle the different analyses and prospective resolutions.  We should look at introducing separate issues for specific cases (such as the disappearing images) which are clearly different than, e.g., finding that a saved file is not even a valid Zip (and there are different cases for those too).
Comment 24 roryof 2016-04-25 18:44:38 UTC
A similar image problem is being reported on LibreOffice
https://bz.apache.org/ooo/show_bug.cgi?id=126869
Comment 25 John 2016-04-25 21:46:25 UTC
(In reply to orcmid from comment #22)

> CONFIRMATION OF UNDERSTANDING
> 
> I want to confirm my understanding of the situation.
> 
> The files in the temporary directory still have whatever their file names
> would have been when embedded in the ODT file?
> 
> But on Save, the images are not in the new ODT and there are no references
> to them, not even unsatisfiable references?  That is, they are completely
> gone?

Not quite correct.  The process was:

1  I opened cover.odt. [I assume] This created the folder C:\Users\John\AppData\Local\Temp\sv3vbk93.tmp.
  
2  [I assume] This created sv3vbkgv.tmp (3,056 kB) which is the complete cover.odt file just renamed to sv3vbkgv.tmp.  I can unzip this file and see the .odt file folders etc.

3  [I assume] Writer then flushed the 3MB image from memory and wrote sv3vku7z.tmp (3,033 kB).  This is the image - I can open it with an image editor. 

4  I shrunk the image, moved it, deleted a few words and then went on to the internet

5  I returned 10 minutes later and maximised the Writer window.  I saw the error message  and the image was gone.

6  I looked in the temporary folder and I saw the two files from (1) and (2) above, namely  sv3vbkgv.tmp and sv3vku7z.tmp.  

7  I still have the file open and I have not saved it yet.  But Writer is not displaying the image.

8  Because the image temporary file sv3vku7z.tmp is still in C:\Users\John\AppData\Local\Temp\sv3vbk93.tmp I think this means that the Writer "information held in memory" has already "lost contact" with the image.  The error message is not because Writer cannot see the image in the temp folder (which is what I postulated was the problem in my analysis which began this task).  The error message is because "Writer in memory has already lost the image in memory".

9  If I now save the file, I predict (based on what happened last time) that Writer will save just the text, will not save any information  calling the image into the document, and therefore the saved .odt will have no reference to the image.  When the saved document is opened, Writer will therefore not display an error message because Writer cannot see any information relating to the image - there is no statement in the saved content.xml asking for the image.  Nor will there be a Pictures folder with the image file inside it in the .odt file.

I can be emailed directly on john.ha24-at-yahoo.co.uk and can talk by phone if that will help.

I apologise for reporting this on the metatask.  Has a separate issue been raised for image loss?  If so I will continue there and perhaps we can move these reports there too.
Comment 26 John 2016-04-25 21:59:15 UTC
One more point.  I opened cover.odt at about 11:15am on 25 April.  The directory listing when I first looked in the folder at about 11:30am read:

sv3vbkgv.tmp  17/04/2016 21:43  3,056 kB
sv3vku7z.tmp  25/04/2016 11:24  3,033 kB

The directory listing at 22:52 now reads

sv3vbkgv.tmp  17/04/2016 21:43  3,056 kB
sv4azthv.tmp  25/04/2016 18:35  3,033 kB 

I have done nothing more to the file apart from scroll it occasionally.  Yet Writer seems to have updated the image file and changed the image file name at 18:35.

So, if "Writer in memory" has "lost contact with the image file", how can Writer write it out to the temporary file at 18:35? Or what else was Writer doing at 18:35 to cause the temporary image file to be renamed and given a new time?

I still have not saved the file - it is still open.
Comment 27 orcmid 2016-05-15 17:19:00 UTC
(In reply to John from comment #18)
> (In reply to John from comment #13)
> > 
> > I have just lost the images in a 1.5MB .odt file I was editing.  I wrote the
> > analysis which led to this issue being opened where I suggested that images
> > might be being lost because they were unprotected in the \Temp folder and
> > were somehow being deleted.
> > 
> > This case is different BECAUSE THE TEMPORARY FILES WERE STILL IN THE
> > TEMPORARY FOLDER EVEN THOUGH I WAS GETTING THE READ ERROR ERROR MESSAGE.
> > 
[ ... ]

The extensive analysis of the specific case: images lost while working on an opened document has been reconstructed in Issue 126970 and should be continued there.

This is a very specific case different from the others around full-corruption/-loss of a document.  The deciding factor for the current Issue 126869 is that the document is completely unusable; that is, the document cannot be opened at all or if opened, there is no remainder of the original document whatsoever.  There are different flavors of this also worth distinguishing in separate issues.
Comment 28 orcmid 2016-05-30 01:25:24 UTC
Issue 126990 brings up the special case of a saved document that is found to contain only paragraphs of "#" characters.

It can be addressed separately there.  This is entirely different than cases where the file is damaged or otherwise unreadable.  The file of all "#" content in one or more paragraphs is completely valid and readable.  It just isn't what the user saw as the document being saved.
Comment 29 John 2016-05-30 04:38:05 UTC
(In reply to orcmid from comment #28)
> Issue 126990 brings up the special case of a saved document that is found to
> contain only paragraphs of "#" characters.
> 
> It can be addressed separately there.  This is entirely different than cases
> where the file is damaged or otherwise unreadable.  The file of all "#"
> content in one or more paragraphs is completely valid and readable.  It just
> isn't what the user saw as the document being saved.

A slight correction ...

A .odt file of all "#" does not have any structure and cannot be unzipped.  

When the .odt is opened with a Hex editor, each and every character in the file is 00. I will upload such a .odt file of all "#" at Issue 126990.

When the file is opened with Writer, Writer assumes it must be a flat, ASCII TEXT file, and the opened document therefore consists of page after page of "#" characters.

The problem has the characteristics of "Writer reserves some space, names it as fred.odt, sets the space to zeros to delete existing data .... and Writer does not then write the correct data into the file.  The .odt file is then saved while being full of zeros."
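
The hex-editor observation (every byte is 00) can be checked mechanically before a file is uploaded. A minimal Python sketch, with is_all_zero as a hypothetical helper name:

```python
def is_all_zero(path, chunk_size=65536):
    """Return True if the file is non-empty and every byte in it is 0x00."""
    seen_data = False
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # End of file: all chunks so far were zeros (or the file is empty).
                return seen_data
            seen_data = True
            if chunk.count(b"\x00") != len(chunk):
                return False
```

A file for which this returns True contains no Zip structure at all, so no recovery tool has anything to work with. That would set it apart from a valid package whose paragraphs contain "#" characters.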
Comment 30 orcmid 2016-05-30 16:09:50 UTC
(In reply to John from comment #29)
> (In reply to orcmid from comment #28)
> > Issue 126990 brings up the special case of a saved document that is found to
> > contain only paragraphs of "#" characters.
> > 
> > It can be addressed separately there.  This is entirely different than cases
> > where the file is damaged or otherwise unreadable.  The file of all "#"
> > content in one or more paragraphs is completely valid and readable.  It just
> > isn't what the user saw as the document being saved.
> 
> A slight correction ...
> 
> A .odt file of all "#" does not have any structure and cannot be unzipped.  
> 
> When the .odt is opened with a Hex editor, each and every character in the
> file is 00. I will upload such a .odt file of all "#" at Issue 126990.
> 
[ ... ]

I have personally verified files that open without difficulty and yet present paragraphs of all "#".  I inspected the .odt, which was a valid Zip, and valid ODF, and the content of each paragraph element was a lengthy run of "#" characters.
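
The inspection described here (a valid Zip, valid ODF, and paragraphs that are runs of "#") can be sketched in Python. The function name hash_only_paragraphs is hypothetical. The sketch looks only at text:p elements in content.xml and ignores headings, tables of contents and other containers.

```python
import xml.etree.ElementTree as ET
import zipfile

# ODF text namespace, used to find text:p elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def hash_only_paragraphs(odt):
    """Return (hash_only_count, non_empty_count) for paragraphs in content.xml.

    odt may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(odt) as package:
        root = ET.fromstring(package.read("content.xml"))
    # Flatten each paragraph, including text inside spans, to a plain string.
    paragraphs = ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]
    non_empty = [text for text in paragraphs if text.strip()]
    hash_only = [text for text in non_empty if set(text.strip()) == {"#"}]
    return len(hash_only), len(non_empty)
```

If both numbers are equal and non-zero, the package is structurally sound and the "#" characters are really stored in it. That is the situation described in this comment, as opposed to a file that is not a Zip at all.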

I have not, on Windows, ever seen a corrupted .odt that does not look like a Zip in any way shape or form open at all.  There might be something that triggers the ASCII filter (what kind of text file is this?) sort of thing, but I have not personally witnessed that with a file of all 00 bytes.  I am very interested in seeing such an attachment at Issue 126990.  Thank you.

I am not disputing the reported observation.  I am suggesting that there may be two different situations that have similar symptoms but are quite different.

It could even be the case that the file I inspected at the binary level was a fabrication or an incorrect save of a file that opened badly, so it was not the actual defective file.  

Only by gaining more information from users, preferably with the defective file attached, can we provide certainty.
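
To help tell the two situations apart when a defective file is attached, here is a minimal sketch in Python (not AOO code). The function name classify_odt and its labels are invented for illustration. It only looks at the raw bytes and the Zip structure and makes no attempt to validate the ODF content.

```python
import io
import zipfile
import zlib


def classify_odt(data: bytes) -> str:
    """Return a rough label for the raw bytes of a supposed .odt file."""
    if not data:
        return "empty"
    if data.count(0) == len(data):
        return "all-nul"            # every byte is 00, no Zip structure at all
    if not data.startswith(b"PK\x03\x04"):
        return "not-a-zip"          # e.g. garbage before the PK header
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            if "content.xml" not in archive.namelist():
                return "zip-without-content"
            content = archive.read("content.xml")
    except (zipfile.BadZipFile, zlib.error, EOFError):
        return "damaged-zip"        # truncated or otherwise unreadable
    # A valid Zip whose paragraphs hold runs of "#" still ends up here.
    return "valid-zip" if content else "zip-with-empty-content"
```

A file that reports "all-nul" would match the hex editor observation in comment #29, while a file that reports "valid-zip" but displays "#" paragraphs would match the case described in this comment.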
Comment 31 oooforum (fr) 2017-03-02 08:57:23 UTC
*** Issue 127343 has been marked as a duplicate of this issue. ***
Comment 32 Keith N. McKenna 2019-06-05 22:52:11 UTC
*** Issue 128124 has been marked as a duplicate of this issue. ***
Comment 33 Arrigo Marchiori 2021-02-22 18:20:34 UTC
I am trying to understand what is happening "behind the scenes" of a save operation.

First interesting fact: if we "Save as" a text document, the green line disappears in method SwXMLWriter::_Write() at main/sw/source/filter/xml/wrtxml.cxx:468

At this point, the file is not yet created.

That method is called (two levels above) by method StgWriter::Write() in file main/sw/source/filter/writer/writer.cxx:617
But nothing happens there after the write ends.

The file is finally generated by SfxObjectShell::SaveTo_Imp() in main/sfx2/source/doc/objstor.cxx, after line 1470 (several levels above in the stack trace).

If the file is "big enough", some time may pass between the moment in which the green line disappears, and the above method writes the actual file.
I am using a test document with a big image, resulting in an ODT file that is 12 megabytes big. This "hidden" step takes about 3 seconds.

The user is somewhat warned that the save operation is not completed yet, by the fact that the Writer window remains disabled. But this is not acceptable, is it?
Comment 34 John 2021-02-23 00:46:00 UTC
(In reply to Arrigo Marchiori from comment #33)
> 
> The user is somewhat warned that the save operation is not completed yet, by
> the fact that the Writer window remains disabled. 
A test shows that when a file is saved, Writer updates registrymodifications.xcu in the User Profile with the name of the saved file for the Recent documents list.

Does this writing to the profile occur while the Writer window remains disabled?  Or does this writing to the profile occur after the disable has been removed?  

If it occurs after the disable has been removed, a user could initiate a close Writer (because the window is now enabled) but before this data (and other data?) has been written.
Comment 35 Arrigo Marchiori 2021-02-23 18:17:39 UTC
(In reply to John from comment #34)
> (In reply to Arrigo Marchiori from comment #33)
> > 
> > The user is somewhat warned that the save operation is not completed yet, by
> > the fact that the Writer window remains disabled. 
> A test shows that when a file is saved, Writer updates
> registrymodifications.xcu in the User Profile with the name of the saved
> file for the Recent documents list.
> 
> Does this writing to the profile occur while the Writer window remains
> disabled?  Or does this writing to the profile occur after the disable has
> been removed?  

File registrymodifications.xcu is updated before the Writer window is re-enabled.
Comment 36 Peter 2021-02-24 19:17:17 UTC
> But this is not acceptable, is it?
I agree this is not acceptable. There should not be a hidden step.
Comment 37 Arrigo Marchiori 2021-02-25 20:12:11 UTC
I switched back to AOO419 because it's closer to what reporters used for... their reports.

If we "Save as" a Writer document, everything seems to get written into temporary files.

The method responsible for creating the destination file is
SfxMedium::TransactedTransferForFS_Impl() in file main/sfx2/source/doc/docfile.cxx:1794

Looking at trace messages emitted during its full invocation, it does:

 1- close a fd linked to a temporary file (probably containing the final document)

 2- reopen the same file in read-only mode

 3- open the destination file in write-mode

 [copy must happen here]

 4- close the destination file

 5- close the source file

The above algorithm seems to be fair and not prone to data corruption. Either something is done wrong inside it, or I am searching in the wrong place.
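
For readers who want to reason about the sequence, the five steps can be sketched as follows. This is Python for illustration only, not the C++ in docfile.cxx; the function name save_as is hypothetical and the copy in the middle is assumed from the trace.

```python
import os
import shutil
import tempfile


def save_as(document_bytes: bytes, destination: str) -> None:
    """Illustrative model of the traced "Save as" sequence."""
    # The finished document is assumed to sit in a temporary file first.
    fd, temp_path = tempfile.mkstemp(suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as temp:
            temp.write(document_bytes)
        # 1- leaving the block above closes the fd linked to the temporary file
        # 2- reopen the same file in read-only mode
        # 3- open the destination file in write mode
        with open(temp_path, "rb") as source, open(destination, "wb") as target:
            shutil.copyfileobj(source, target)   # the copy happens here
        # 4- the destination is closed first, 5- then the source
    finally:
        os.remove(temp_path)
```

In this model the destination only ever receives bytes from a complete temporary file, which is why the sequence looks safe as long as every step reports its errors.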
Comment 38 Arrigo Marchiori 2021-02-25 20:21:00 UTC
When saving a Writer document ("Save" instead of "Save as") the sequence seems to be a bit different:

 1- close a fd linked to a temporary file (probably containing the final document)

 2- reopen the same file in read-only mode

 3- open the destination file in write-mode

 3b- close the destination file

 3c- reopen the destination file

 4- close the destination file

 5- close the source file

It's strange that the destination file is opened, then closed, then reopened.
Comment 39 Arrigo Marchiori 2021-02-27 14:44:40 UTC
(still talking to myself ;-) from comment #38)
> When saving a Writer document ("Save" instead of "Save as") the sequence
> seems to be a bit different:
> 
>  1- close a fd linked to a temporary file (probably containing the final
> document)
> 
>  2- reopen the same file in read-only mode
> 
>  3- open the destination file in write-mode
> 
>  3b- close the destination file

This is a truncation!
It's probably done at main/sfx2/source/doc/docfile.cxx:1794
that is basically:
	aOriginalContent.setPropertyValue("Size", 0);

>  3c- reopen the destination file

File contents are (re)written here.

>  4- close the destination file
> 
>  5- close the source file
> 
> It's strange that the destination file is opened, then closed, then reopened.

This has now been solved: the first open-close cycle (steps 3-3b) is a truncation. But I found out something else: file is reopened _later_ for writing! It is a bit hard to debug what's going on; it will take some more time.
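
A minimal Python sketch of the "truncate, then rewrite" pattern (illustrative only; the function name and the fail_after_truncate switch are invented for the example). It shows the window in which the destination exists but is 0 bytes long.

```python
import os
import tempfile


def truncate_then_rewrite(destination: str, new_bytes: bytes,
                          fail_after_truncate: bool = False) -> None:
    """Model of steps 3 to 4: truncate the destination, then write it again."""
    # Steps 3-3b: opening in "wb" mode and closing immediately leaves a
    # zero-length file, comparable to setting the "Size" property to 0.
    with open(destination, "wb"):
        pass
    if fail_after_truncate:
        # Stand-in for anything that interrupts the save at this point.
        raise OSError("simulated failure between truncation and rewrite")
    # Steps 3c-4: reopen the destination and write the new contents.
    with open(destination, "wb") as target:
        target.write(new_bytes)
```

If anything interrupts the operation between the two opens, the old contents are already gone and the new contents have not arrived yet.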
Comment 40 Arrigo Marchiori 2021-02-27 14:54:06 UTC
I did some tests on a Windows machine.

I opened a "heavy" (~16 MB) file on a USB pen drive.

I tried to "save as", "save" and "quit then reply yes when asked to save".

The outcome was always the same:

 - if I tried to close the window using the top-right "X" button, nothing happened until the _full_ save operation was completed;

 - if I tried to put the computer in sleep mode after the progress bar disappeared, the save operation still completed successfully;

 - if I unplugged the pen drive right after the progress bar disappeared, before the window was re-enabled, then the file was truncated (0 bytes). This is explained by my previous comment.

I was unable to find a cause for the data corruption, but I now have no doubt that the progress bar should follow the whole saving operation (in accordance with Peter's comment #36).
Comment 41 Arrigo Marchiori 2021-02-27 14:56:13 UTC
(In reply to myself from comment #40)
>  - if I unplugged the pen drive right after the progress bar disappeared,
> before the window was re-enabled, then the file was truncated (0 bytes).
> This is explained by my previous comment.

I forgot to mention that AOO complained loudly with error messages and finally did not consider the document saved. This is the expected behavior in such occasions, at least.
Comment 42 Peter 2021-02-27 21:20:53 UTC
Related to comment 38: is line 1858 maybe a trick to confirm that the file is really writable? I find it unfortunate to abort the writing. Maybe it would be clearer to write a stream of one character.
And honestly, IMHO this is the wrong place to ensure that the file is writable.
We do not want to worry here about whether the file is writable or not.

I think the code mixes a lot of layers.
Comment 43 John 2021-02-28 18:40:31 UTC
Created attachment 87007 [details]
File written when PC is powered off during the Save

I can confirm AOO does not prevent a shutdown taking place while AOO is writing a file.  

I can confirm that AOO does not issue a warning that it has not finished writing a file it is saving. (4.1.9, Windows 10).

I used a USB attached diskette drive and a 1.4MB diskette so I could slow down the write process. I needed to format the diskette with a full, not quick, format before the tests would work.  I then saved a 700kB fred.odt (text only, no images) to the diskette. 

1.  The writing to the diskette only seems to start after the green bar has finished crossing the screen.  The diskette carries on clicking for tens of seconds until the entire file is written. During this time I was locked out of AOO and I could not close AOO with the X at top right in the Title Bar.

Conclusion:  This is safe as the file cannot be damaged but counter-intuitive that the green bar has stopped.

2.  I started a save and, while the file was being written, I went Start > Power > Shutdown.  I was expecting AOO to prevent the shutdown, or at least to give an error message that AOO was writing and asking if the shutdown should continue, but I got nothing and the PC powered OFF. 

When I powered ON, the file was showing in Explorer to be 400kB, not 700kB showing the save had not completed.  I opened it with 7-ZIP and it reported

Name                Size    Packed size

Configurations2      0         2
Thumbnails       5,021     5,021
content.xml          0         0
mimetype            49        39

Note that the total is only just over 5kB despite being reported as 400kB.  I was able to extract each file so the ZIP container was well formed.  I have uploaded this fred.odt file - note it reports as 400kB but is only about 5kB.

Conclusions: 

1.  Should AOO prevent the shutdown power off?  Should AOO give a warning to the user and prevent the shutdown?

2.  I do not know at what stage in the Save process I issued the shutdown command.  Note how a well formed ZIP file was created so this is not the "My file is full of #####" problem.  It suggests that I need to do more testing where I issue shutdown earlier to see if I can catch it.

Visual observation of File Explorer never showed the file to be 700kB in size suggesting that when a file is full of null characters it has some other cause. 

I will repeat the test with a laptop and investigate hibernate and sleep and earlier shutdowns, and possible differences between Save and Save As.
Comment 44 Arrigo Marchiori 2021-03-01 19:29:45 UTC
(In reply to John from comment #43)
> I used a USB attached diskette drive and a 1.4MB diskette so I could slow
> down the write process. 

That's a clever test!

> 2.  I started a save and, while the file was being written, I went Start >
> Power > Shutdown.  I was expecting AOO to prevent the shutdown, or at least
> to give an error message that AOO was writing and asking if the shutdown
> should continue, but I got nothing and the PC powered OFF. 
> 
> When I powered ON, the file was showing in Explorer to be 400kB, not 700kB
> showing the save had not completed.  I opened it with 7-ZIP and it reported
> 
> Name                Size    Packed size
> 
> Configurations2      0         2
> Thumbnails       5,021     5,021
> content.xml          0         0
> mimetype            49        39
> 
> Note that the total is only just over 5kB despite being reported as 400kB. 
> I was able to extract each file so the ZIP container was well formed.  I
> have uploaded this fred.odt file - note it reports as 400kB but is only
> about 5kB.

On my Linux system, the tools I have (including a command line version of 7-Zip) cannot open the file. They say it's truncated.

I have a question: when you restarted AOO, did it offer to recover the file?

> Conclusions: 
> 
> 1.  Should AOO prevent the shutdown power off?  Should AOO give a warning to
> the user and prevent the shutdown?

I think so.

It is also very important to know if AOO recorded that the file was "not saved successfully" and offered to recover it.

> 2.  I do not know at what stage in the Save process I issued the shutdown
> command.  Note how a well formed ZIP file was created so this is not the "My
> file is full of #####" problem.  It suggests that I need to do more testing
> where I issue shutdown earlier to see if I can catch it.

It was not well-formed according to:
 - UnZip 6.00 of 20 April 2009, by Info-ZIP
 - p7zip Version 16.02

I could have fiddled with the options to try recovering the data but I do not think it would lead to any interesting results.

> Visual observation of File Explorer never showed the file to be 700kB in
> size suggesting that when a file is full of null characters it has some
> other cause.

I also looked at its contents and they do not seem to be all NULL's.

Fun fact: it is 491520 bytes, i.e. _exactly_ 480 * 1024 bytes. The transfer was probably halted on a kilobyte boundary or something like that.
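
The arithmetic can be checked quickly. The snippet below is plain Python; the helper name whole_blocks is made up, and the block sizes other than 1024 are only common candidates, not something known about this diskette.

```python
def whole_blocks(size_in_bytes: int, block_size: int) -> tuple:
    """Return (number of whole blocks, leftover bytes)."""
    return divmod(size_in_bytes, block_size)


for block_size in (512, 1024, 4096):
    blocks, leftover = whole_blocks(491_520, block_size)
    print(f"{block_size:>5}-byte blocks: {blocks} whole, {leftover} bytes left over")
```

The size is a whole number of blocks for all three sizes, so the file length alone does not reveal which layer (file system, driver or application buffer) stopped the transfer.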

> I will repeat the test with a laptop and investigate hibernate and sleep and
> earlier shutdowns, and possible differences between Save and Save As.

Thank you!
Comment 45 Arrigo Marchiori 2021-03-01 20:58:11 UTC
(In reply to Peter from comment #42)
> related to comment 38: In line 1858 is maybe a trick to confirm if the file
> is really write able? 

I think it's quite possible!

Or, alternatively, the "overwrite" parameter is not "trusted".
In fact, output data is copied by calling:

   aOriginalContent.writeStream(aTempInput, bOverWrite);

where bOverWrite is true when saving and false when "saving as". 
Setting it to true is useless if the file was already truncated.

> I find it unlucky to abort the writing. Maybe it would
> be more clear to write a stream of one character.
> And honest to ensure that the file is write able Is imho here is a wrong
> place.
> We do not want to worry here if the file is write able or not.

Well, there is an interesting detail.

Before the truncation, variable bTransactStarted is set to true.
If the truncation fails, then probably an exception will be raised and caught below.
Then, the file will be restored from the backup.
But if the truncation failed, then restoring from the backup is also likely to fail...

Maybe it is not a check for write access, but rather a way to make sure the contents are overwritten and not appended?
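
To make the control flow easier to discuss, here is a rough Python model of it. It is a sketch under assumptions: the names transacted_overwrite, truncate_and_write and backup_path are hypothetical, and the real code works through UCB content objects rather than plain files.

```python
import os
import shutil


def truncate_and_write(path: str, data: bytes) -> None:
    """Truncate the destination, then write the new contents."""
    with open(path, "wb"):
        pass
    with open(path, "wb") as target:
        target.write(data)


def transacted_overwrite(destination: str, new_bytes: bytes, backup_path: str,
                         writer=truncate_and_write) -> None:
    """Overwrite destination, restoring it from a backup if writing fails."""
    shutil.copy2(destination, backup_path)      # keep the old contents
    transact_started = False
    try:
        transact_started = True                 # set before the truncation
        writer(destination, new_bytes)
    except OSError:
        if transact_started:
            # If the medium itself is gone, this restore is likely to fail too.
            shutil.copy2(backup_path, destination)
        raise
    os.remove(backup_path)                      # success: backup not needed
```

The restore only helps when the failure is transient or specific to the new data. When the destination has become unreachable, both the write and the restore fail, and the user's only copy is whatever survives elsewhere.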

> I think the code mixes a lot of layers.

I agree.
Comment 46 John 2021-03-02 15:08:32 UTC
(In reply to Arrigo Marchiori from comment #44)

I carefully repeated the test as below.

All tests use fred.odt.  Properties are

Size:  467kB (478,676 bytes)
Size on disk: 468kB (479,232 bytes)

Test_1

1.  Open fred.odt and make no edits
2.  Start > Power > Shutdown

Result:  PC shuts down without AOO giving a "fred.odt is open" warning message.  This is as expected: no data is lost because the file has not been edited.

Test_2

3.  Open fred.odt and make an edit so it is not safe to shut down the PC as the edit is held only in memory
4.  Start > Power > Shutdown

Result:  PC displays message saying fred.odt is open in AOO and I can prevent the shutdown.  When I prevent the shutdown AOO has the "Do you want to save your changes?" pop up displayed.

Conclusion.  This is expected because AOO needs to protect the user data.  

Test_3

1.  Place fred.odt on A: diskette drive
2.  Double click fred.odt and make a few edits
3.  File > Save.
4.  Wait till green bar has stopped crossing screen.  
5.  Wait 2 or 3 seconds more, then Start > Power > Shutdown while file is still being written.

Result: PC shuts down without AOO giving a "fred.odt is open" warning message.

Conclusion.  AOO should have prevented the shutdown but AOO did not.

6.  Power ON and go to diskette drive.  fred.odt is present but only 96kB.  .~lock.fred.odt~ is present.
7.  Double-click fred.odt.

Result: AOO opens with the Document recovery screen saying fred.odt is not recovered yet.

8.  Choose Start recovery

Result:  AOO gives error message "General Error.  General input/output error".

9.  Choose OK.  Gives Recovery failed.
10. Next. 

Result:  AOO says fred.odt is corrupt and offers to repair the file.  

11. Choose NO.

Result:  The file fred.odt could not be repaired and could not be opened.  

12.  Exit from answering questions.  .~lock.fred.odt~ is no longer present on diskette.

13.  Start AOO from desktop icon.  

Result:  Offered Document recovery pop-up window for fred.odt.

14.  Choose Cancel.

Result:  AOO opens cleanly to an empty document.

15.  Open fred.odt with 7-ZIP.  

Result:  fred.odt is a well formed ZIP container and I can extract Configurations2 and its sub-folders (0 bytes), Thumbnails and its png thumbnail (15,289 bytes), content.xml (0 bytes) and mimetype (39 bytes).  content.xml has no content. Properties says Size and Size on disk are both 96.0kB (98,304 bytes).  

Conclusion:

1  I waited 2 or 3 seconds after the green bar stopped crossing the screen, which was enough for the writing of the file to have started.

2  If AOO prevents the shutdown in Test_2 why does AOO not prevent the shutdown in Test_3?

A USB diskette drive is only 10 or 15 euros on ebay - it may be worth investing in one!
Comment 47 John 2021-03-02 15:36:05 UTC
A further point.  I did this earlier test (reported in my Four user problems paper) which suggests a two stage process in writing a file: first to a temporary location, then copy to the proper location.

1. File > Open > choose vanity_2.odt.  This is a 4MB .odt file containing only text (6 copies of Vanity Fair, 1.8 million words) which takes many seconds to save.

2. Type some text into the document

3. File > Save

4.  Wait a few seconds and kill the soffice.exe and soffice.bin processes (ie Writer) using TaskManager before Writer has finished writing the file.

5. Start Writer. Writer offers to recover the file vanity_2.odt. When it does so, vanity_2.odt does not have the new edits – it is as it was when opened at 1 above.

So, it appears that Writer writes the output file in a temporary store, and only if the file is written completely, does Writer delete the old one, and replace it with the new one.
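
The "write everything to a temporary file, replace the old file only at the very end" idea can be sketched as below. This is an illustration in Python, not a description of what AOO actually does; the function name atomic_save is hypothetical, and the temporary file is placed next to the destination as an assumption so that the final rename stays on one file system.

```python
import os
import tempfile


def atomic_save(destination: str, new_bytes: bytes) -> None:
    """Write to a temporary file, then swap it in place of the old file."""
    directory = os.path.dirname(os.path.abspath(destination))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as temp:
            temp.write(new_bytes)
            temp.flush()
            os.fsync(temp.fileno())      # ask the OS to push the data to disk
        os.replace(temp_path, destination)   # the old file is replaced last
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

With this ordering, killing the process before os.replace leaves the previous version untouched, which is consistent with what step 5 of the test observed.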
Comment 48 John 2021-03-02 17:39:08 UTC
I have managed to get AOO to prevent shutdown by issuing the shutdown earlier.  

1.  Copy the text into a new, empty document.
2.  Click the Save icon (ie to simulate someone creating a new file and saving it)
3.  Wait until the green bar had finished crossing and then, as soon as possible after, issue the shutdown.  Previously I waited 2 or 3 seconds.

Result:  I was offered the "fred.odt is open in AOO - do you want to cancel?" popup and I was able to prevent shutdown.
Comment 49 Peter 2021-03-02 20:35:33 UTC
I find John's finding interesting. This indicates that the shutdown prevention is maybe linked with the file modification indicator (or uses the same flag).
I speculate that the indicator is reset in the process of the save, probably at the beginning.
Maybe we should move it to the end, and have it query if a save is going on.
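
A toy model of this speculation, in Python. Nothing here is taken from the AOO sources: the class and attribute names are invented, and the assumption that the modified flag is cleared at the start of the save is exactly the guess made above.

```python
class Document:
    """Toy model of a document with a 'modified' and a 'saving' state."""

    def __init__(self) -> None:
        self.modified = False
        self.saving = False

    def edit(self) -> None:
        self.modified = True

    def begin_save(self) -> None:
        self.saving = True
        self.modified = False    # assumed: the flag is reset early

    def end_save(self) -> None:
        self.saving = False

    def may_shut_down_flag_only(self) -> bool:
        # Shutdown is vetoed only while the document counts as modified.
        return not self.modified

    def may_shut_down_with_save_check(self) -> bool:
        # Proposed variant: also veto while a save is still running.
        return not self.modified and not self.saving
```

Under these assumptions the flag-only check vetoes a shutdown before the save starts but allows it while the file is still being written, which would match the difference between John's early and late shutdown tests.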
Comment 50 John 2021-03-05 12:44:06 UTC
See Comment 7 and especially Comment 8 in Issue 107558 - A hidden step while writing OOo files? 

It seems we need to be looking in \temp as well because files are first written to \temp before being written to hard disk.
Comment 51 John 2021-03-05 15:04:33 UTC
I monitored what is happening in \temp during a file open, an edit, and a Save.  

I set \temp to a 32GB USB memory stick to slow things down.  AOO would not open when I set \temp to the diskette drive nor to a very old, slower 128 MB USB memory stick.  I recorded with FreeCam.

1.  Open MyDocuments\fred.odt 468kB.

2.  AOO writes many temporary files to \temp, each deleted before the next is written, as below:
svc8appm.tmp   2kB
svc8asg5.tmp   1kB
svc8uaog.tmp   2kB
svc8awg9.tmp   2kB
svc8aypu.tmp   2kB
svc8b0wu.tmp 468kB timestamped 14:26

From previous testing I know svc8b0wu.tmp is an exact copy of fred.odt. 

3.  Make a few edits to fred.odt.  Nothing changes in \temp as expected.

4.  Click Save icon.

\temp has only svc8b0wu.tmp 468kB timestamped 14:26

AOO adds svc8bhyz.tmp   0kB

AOO adds svc8bk4o.tmp  36kB and then increases it to 1,805kB.  

AOO deletes svc8bhyz.tmp and svc8bk4o.tmp

AOO adds svc8bvu4.tmp 468kB timestamped 14:28

AOO deletes svc8b0wu.tmp.

\temp now has only svc8bvu4.tmp 468kB timestamped 14:28.  The file is deleted and AOO closes.  MyDocuments\fred.odt 468kB is now timestamped 14:28 suggesting it is a copy of svc8bvu4.tmp. 

I may have missed some files being created and deleted.  I tried to copy the 36kB file but I could not do it quickly enough and Windows said it had gone.

I wondered if svc8bk4o.tmp, which gets to be 3.5x bigger than fred.odt, could be where the nulls are written in cases where .odt files are full of null characters.
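
For anyone repeating this observation without a screen recorder, a small polling helper might do. The sketch below is Python and the function names are invented. Like watching Explorer, polling can miss files that live for less time than the polling interval.

```python
import os


def snapshot(directory: str) -> dict:
    """Map each file name in the directory to its current size in bytes."""
    return {entry.name: entry.stat().st_size
            for entry in os.scandir(directory) if entry.is_file()}


def diff_snapshots(before: dict, after: dict) -> tuple:
    """Return sorted lists of added, removed and resized file names."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    resized = sorted(name for name in before.keys() & after.keys()
                     if before[name] != after[name])
    return added, removed, resized
```

Calling snapshot() in a loop with a short sleep and printing diff_snapshots() of consecutive results gives a timestamped log similar to the list above.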
Comment 52 Arrigo Marchiori 2021-03-05 21:51:07 UTC
I started a Git branch in my own clone of the OpenOffice repository and I will work on that. Eventually, it will become a pull request.

For anyone interested:
https://github.com/ardovm/openoffice/tree/bettersave

So far I fixed the detection of problems arising from ZIP compression. This is the very first step... hopefully in the right direction.
Comment 53 Arrigo Marchiori 2021-03-19 19:36:13 UTC
First Windows build is out!
Please download your installer of choice from:
https://home.apache.org/~ardovm/openoffice/bettersave-2021-03-16/

This version is based on trunk, therefore you may see quite some differences with respect to 4.1.9.
Please concentrate on saving documents. This version should display another progress bar during the "disabled-only" phase. The text in the lower-left corner of the window is "..." while this additional progress bar is displayed.

 - can you see this progress bar?

 - does this version inhibit Windows from shutting down while saving?

Thank you in advance to whoever helps test it!
Comment 54 Pedro 2021-03-19 22:15:25 UTC
(In reply to Arrigo Marchiori from comment #53)

> Please concentrate on saving documents. This version should display another
> progress bar during the "disabled-only" phase. The text in the lower-left
> corner of the window is "..." while this additional progress bar is
> displayed.
> 
>  - can you see this progress bar?
> 
>  - does this version inhibit Windows from shutting down while saving?

Can you share as Attachments to this issue the heavy files you are using?

Are these files large enough to show the issues you mentioned, or is a USB floppy drive really necessary?
Comment 55 John 2021-03-20 09:40:13 UTC
I'll test it on my diskette drive today.

Another way of slowing I/O is to connect a 4-way USB hub to a USB port.  Put 3 x USB memory sticks in 3 slots and transfer large files to them to use I/O bandwidth.

Let AOO use a memory stick in the 4th slot.
Comment 56 John 2021-03-20 10:20:25 UTC
You can download Vanity Fair from Project Gutenberg at https://www.gutenberg.org/ebooks/599.  A single copy is 700,000 words and gives a 700.odt file.
Comment 57 John 2021-03-20 11:07:44 UTC
Test 4.5_1

1.  Open fred.odt on the diskette drive - it is 468kB
2.  Make an edit.
3.  Click the Save icon.

Result

1.  Saving document appears and green bar crosses screen
2.  Green bar stops.  Everything disappears from bottom bar except the three layout icons (single page/double page/book)
3.  File starts being written to diskette and AOO is unresponsive
4.  After many seconds, file stops being written.
5.  Everything re-appears in the bottom bar (Page 1/346 / Default/ English (UK) / INSRT / STD / layout icons / zoom bar / zoom percentage

Conclusion:  No " ... " progress bar is displayed

Test 4.5_2

1.  Open fred.odt on the diskette drive - it is 468kB
2.  Make an edit.
3.  Click the Save icon.
4.  Issue Shutdown i) before; or ii) while green bar is crossing screen

Result

1.  Pop-up window says fred.odt is open in AOO and I can cancel shutdown
2.  When Windows comes back fred.odt has a "Do you want to save?" pop-up window.

Conclusion:  AOO prevents shutdown when shutdown is initiated early in save cycle

Test 4.5_3

1.  Open fred.odt on the diskette drive - it is 468kB
2.  Make an edit.
3.  Click the Save icon.
4.  Wait until the green bar has finished crossing screen
5.  Diskette clicks showing file is being written to disk.
6.  Wait 2-3 seconds more
7.  Issue Shutdown

Result

1.  AOO does not prevent shutdown.
2.  ~lock file is present on diskette drive
3.  fred.odt is shown as 224kB.
4.  Open fred.odt with 7-ZIP.  It is a properly formed ZIP file which reports as:
 
Name                Size    Packed size

Configurations2      0         2
Thumbnails       15,002     15,002
content.xml          0         0
mimetype            39        39

Note the displayed file size 224kB is much larger than the ZIP shows.  
5.  Opening fred.odt in a hex editor shows it is 229,360 bytes in length (00037FF0).  This suggests that content.xml is being written as expected and, because it has not completed, 7-ZIP reports content.xml as being zero length.  Note the document fred contains only flat text with no images, tables etc.
6.  Double-click fred.odt to open it
7.  AOO brings up the recovery screen and offers recovery.  It fails.

Conclusion.  AOO does not prevent shutdown if shutdown is initiated later in the save cycle.
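
To look inside such a partial file without relying on the central directory at the end of the Zip (which is missing when the write was cut short), one can scan for local file headers directly. The following Python sketch does that. The function name is invented, the 30-byte layout is the standard Zip local file header, and a signature that happens to occur inside compressed data would produce a bogus entry.

```python
import struct

# Fixed part of a Zip local file header: signature, version, flags, method,
# time, date, CRC-32, packed size, unpacked size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
SIGNATURE = b"PK\x03\x04"


def scan_local_headers(data: bytes) -> list:
    """List (name, size, packed size, sizes_deferred) for each local header."""
    entries = []
    position = data.find(SIGNATURE)
    while position != -1 and position + LOCAL_HEADER.size <= len(data):
        (_sig, _version, flags, _method, _time, _date, _crc,
         packed, unpacked, name_length, _extra) = LOCAL_HEADER.unpack_from(
            data, position)
        start = position + LOCAL_HEADER.size
        name = data[start:start + name_length].decode("utf-8", "replace")
        # Flag bit 3 means the real sizes follow the data in a descriptor,
        # so the header itself carries zeros.
        entries.append((name, unpacked, packed, bool(flags & 0x08)))
        position = data.find(SIGNATURE, position + 4)
    return entries
```

An entry whose header shows size 0 with the deferred flag set would be one possible reason for a tool to report content.xml as zero length while the file on disk is much larger. Whether that applies to files written by AOO has not been verified here.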
Comment 58 Pedro 2021-03-20 11:47:14 UTC
Thank you for the link @JohnHa

I think your extremely thorough testing and reporting is the ultimate answer needed for this bug at this point!
It is notable that you are available to contribute to fix this bug nearly 7 years after your first answer!
Kudos!
Comment 59 Arrigo Marchiori 2021-03-20 18:26:58 UTC
In my humble opinion, one of the reasons that make working for Apache OpenOffice so rewarding is that there are lots of people supporting, directly or indirectly, your work. Bug reports such as this are a good example of how free software is supposed to work, community-wise.

I add my thank you to Pedro's one, and... acknowledge that there is still work to do. I will keep you updated.
Comment 60 John 2021-03-22 10:29:21 UTC
This is a screen video of saving a 4MB file to SSD.  It is six copies of Vanity Fair (1.83 million words, 4MB .odt file).

Test_4.5_4 - Saving 4MB to SSD.

1.  Open file and make an edit.  
2.  Click Save icon.

The first time this is done there is a long delay ~ 10 seconds before the green bar starts crossing.  

If I then make a second edit and click the Save icon, the delay is much smaller - see video at https://www.dropbox.com/s/nkku2bfkl7qzrb2/Saving%204MB%20file.wmv?dl=0

I wondered if the delay was due to pagination so I repeated the test with delays as below.  The file has only text - no images, footnotes or endnotes.

1.  Open file, make an edit
2.  Page Preview.  I get Pagination at bottom left and the green bar.  It takes several seconds to paginate.
3.  Wait 30 minutes until AOO CPU has dropped to approx 0%
4.  Click Save icon.

There is still the long delay before the first save starts.  If I then make a second edit/save, the save begins immediately.
Comment 61 John 2021-03-25 10:42:08 UTC
We have another "my file is full of ####" on the forum - see Another HashTagged Document at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=104869.  I have asked him to give me full details of how it happened. 

We cannot explain any scenario in which these "files full of nulls" can be created but they do get created so our understanding is incomplete. 

A normal file write is something like:

1 AOO clears some memory and fills it full of nulls in preparation for creating and writing fred.odt (line 50 in deflater?).

2 AOO opens fred.odt on the disk ready to receive the file.  At this stage fred.odt is zero bytes long.

3 AOO uses the cleared memory as working space and creates the zipped content in the memory

4 AOO copies the created zipped content from memory into fred.odt.

Now, say the user issues a shutdown after 2 but before 3. We now know AOO does not prevent the shutdown so the PC starts shutting down.

What happens during the shutdown?

Could the OS kill an AOO process and so cause AOO to miss step 3 and jump straight from 2 to 4 and so write the nulls to fred.odt before shutdown actually happens?   

Could the OS somehow take over and try to protect fred.odt which the OS knows is open for writing? Does the OS quickly write the contents of memory - all nulls - into fred.odt thinking it will save the file?

Could AOO have a routine which steps in and tries to protect the file and quickly writes the content of memory - all nulls - to fred.odt?
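
One thing that can be shown independently of AOO is that a file whose length has been set, but whose data never arrived, reads back as NUL bytes on most systems. The Python sketch below demonstrates only that; the function name is invented, and whether anything comparable happens inside AOO or the file system during a shutdown is an open assumption.

```python
import os
import tempfile


def reserve_without_writing(path: str, size: int) -> None:
    """Create a file of the given length without writing any data to it."""
    with open(path, "wb") as handle:
        # Extending a file this way normally yields zero-filled content.
        handle.truncate(size)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    target = os.path.join(folder, "fred.odt")
    reserve_without_writing(target, 4096)
    with open(target, "rb") as handle:
        data = handle.read()
    print(len(data), data.count(0))
```

Such a file has a plausible size in Explorer, contains no PK header, and would be shown as "#" characters if an application treated it as plain text.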
Comment 62 John 2021-03-25 14:00:04 UTC
(In reply to John from comment #61)

I have done a quick search of the forum and I am now convinced that "files full of ####" result from AOO or the PC hanging or crashing.

See Issue 107847 - File content changed to hashes at crash or power loss which suggests it was introduced at v3.0.

I collected these where users described how it occurred:

https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=92880

My computer crashed due to no electricity (without battery) but I thought saving the file was completed before the crash. I saved the edited odt more than once during the writing process but I could not find any previous versions. As I restarted the computer and tried to open the odt it was corrupted.

https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=104676

I opened it and had been working on it for few hours, saving it every 10 minutes or so, when my computer froze and showed a grey screen. As this hadn't shifted despite my best efforts I had to do a forced shut down after about half an hour. When I restarted the computer it was all fine apart from the document I had open on the screen where the text had been replaced by ######
If it helps then the file name was just Writing

https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=100754

while I was playing a video my lap just outright wnent nuts, even though the battery was fully charged, and my screen froze and I was unable to coninue playing the video. However, at the moment I was just saving this document of around 40 pages to my OneDrive cloud and I had to reboot my entire computer.
As I turned my laptop on again I openede immediatly Open Office, and weirdly enough it asked me to register my program online and choose a language for my program, and as I searched thourg recently opened files I found nothing, and now I have even trouble to efficiently run Word or Open Office.
And my document as I open it, it just reads "########", there are no words on it. 

https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=90475

... three OO Writer documents ... created and saved on C: disk. ... I saved the updates (waited till savings are finished) but unfortunately have not closed all documents. ... I put my notebook in Standby status [overnight] to continue early morning finishing the documents. I just selected Standby and closed Laptop.  Today early morning I opened laptop (screen) but could not start the notebook. 

[ie PC shut down overnight]

So I had to shut down and newly start. ... as I tried to open the first document (I worked for 2 days on) it again opened with window message "ASCII Filter Options" and as I clicked further the documents was full of hashtags. Other 2 documents from yesterday I could open and they are Ok. Only the one document which is most important is not ok.
Comment 63 oooforum (fr) 2021-03-25 16:03:53 UTC
(In reply to John from comment #62)
> I have done a quick search of the forum and I am now convinced that "files
> full of ####" result from AOO or the PC hanging or crashing.
When the document is in this state, it's too late. 

To prevent these cases, I had submitted a process for a better save (see report #111290 )
Comment 64 Peter 2021-10-07 21:20:35 UTC
Just to pick the case up again.
How about we look at the issue from the other end?

What happens when AOO is issued a shutdown? -> I searched for the word and found something that looks like it could be the entry point.
-> http://opengrok.openoffice.org/xref/aoo42x/main/vcl/unx/generic/app/sm.cxx?r=c82f2877
When a Shutdown is issued it looks like each document instance gets a SalSessionSaveRequestEvent. If no Instance is left it is moving into savedone.

This seems to trigger callSaveRequested in the Session.
Then all listeners are checked to see whether they have reported back that they are done.
A SessionListener is established by addSessionManagerListener.
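
To make the flow described above easier to follow, here is a simplified, self-contained model of it: a save request goes to every registered listener, each listener reports back when it is done, and the session counts as "save done" once nobody is pending. This is only a sketch of my reading of the description. The class SessionSketch and its members are hypothetical and are not the actual classes in sm.cxx or the session service.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model, not the AOO classes.
class SessionSketch
{
public:
    // A listener receives a callback it must invoke once its save is finished.
    using Listener = std::function<void(const std::function<void()>& reportDone)>;

    void addListener(Listener l) { m_listeners.push_back(std::move(l)); }

    // Models the save request that arrives before shutdown.
    void callSaveRequested()
    {
        m_pending = m_listeners.size();
        m_saveDone = (m_pending == 0);      // no listeners: nothing to wait for
        for (auto& l : m_listeners)
            l([this] { listenerDone(); });
    }

    bool saveDone() const { return m_saveDone; }

private:
    void listenerDone()
    {
        // The last listener to report back completes the save phase.
        if (m_pending > 0 && --m_pending == 0)
            m_saveDone = true;
    }

    std::vector<Listener> m_listeners;
    std::size_t m_pending = 0;
    bool m_saveDone = false;
};
```

In this model a listener may report back immediately or keep a copy of the callback and call it later. The session only reaches "save done" after the slowest listener has answered, which is the point where a shutdown cut short could leave a document half written.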

--
I hope to find the events on document level at some point. See https://wiki.openoffice.org/wiki/Documentation/DevGuide/OfficeDev/Document_Events

When I was looking for those events it was not obvious where they are set or called.

Let's see if this leads somewhere. Since my free time is very unpredictable, please feel free to continue these thoughts. I don't mind. In the meantime I will try to find more time to venture further.
Comment 65 Arrigo Marchiori 2021-10-08 19:23:29 UTC
(In reply to Peter from comment #64)
> Just to pick the case up again.

Wonderful! Thank you for your help!

> How about we look at the issue from the other end?
> 
> What happens when AOO is issued a shutdown? -> I searched for the word and
> found something that looks like it could be the entry point.
> ->
> http://opengrok.openoffice.org/xref/aoo42x/main/vcl/unx/generic/app/sm.cxx?r=c82f2877
> When a shutdown is issued, it looks like each document instance gets a
> SalSessionSaveRequestEvent. If no instance is left, it moves on to
> savedone.

I am afraid the file you posted may only refer to the Unix world because its path contains "unx".

If we want to keep looking at the same end (instead of the other one :-) we can look for "WM_ENDSESSION" and "WM_QUERYENDSESSION", which are the messages Windows sends to programs at shutdown time.

OpenGrok leads us here:
http://opengrok.openoffice.org/openoffice/xref/aoo41x/main/vcl/win/source/window/salframe.cxx#6113
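
For readers unfamiliar with these two messages, here is a minimal, platform-independent sketch of how a window procedure typically reacts to them. WM_QUERYENDSESSION asks whether the session may end (returning TRUE allows it), and WM_ENDSESSION announces that it is ending, after which the process may be terminated at any time. The numeric values are the ones documented for windows.h. The AppState structure and the handleSessionMessage function are hypothetical and are not taken from salframe.cxx.

```cpp
#include <cassert>
#include <cstdint>

// Values as documented for <windows.h>, redefined under different names
// so that this sketch compiles on any platform.
constexpr std::uint32_t kWmQueryEndSession = 0x0011; // WM_QUERYENDSESSION
constexpr std::uint32_t kWmEndSession      = 0x0016; // WM_ENDSESSION

// Hypothetical application state, for illustration only.
struct AppState
{
    bool unsavedChanges = false;
    bool saveFinished = false;
};

// Hypothetical handler. Returns the value a window procedure would return
// and sets `handled` to false for messages it does not care about.
long handleSessionMessage(AppState& app, std::uint32_t msg,
                          bool sessionEnding, bool& handled)
{
    handled = true;
    switch (msg)
    {
    case kWmQueryEndSession:
        return 1;   // TRUE: do not veto the shutdown
    case kWmEndSession:
        if (sessionEnding && app.unsavedChanges)
        {
            // Any emergency save has to complete before returning,
            // because the process may be killed afterwards.
            app.saveFinished = true;
            app.unsavedChanges = false;
        }
        return 0;
    default:
        handled = false;
        return 0;
    }
}
```

The sketch only marks the save as finished. In a real application this is where the time-critical work happens, so anything still being written when the handler returns is at risk.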

> I hope to find the events on document level at some point. See
> https://wiki.openoffice.org/wiki/Documentation/DevGuide/OfficeDev/Document_Events

Uh, that is interesting!

> When I was looking for those events it was not obvious where they are set or
> called.

I agree.

> Let's see if this leads somewhere. Since my free time is very unpredictable,
> please feel free to continue these thoughts. I don't mind. In the meantime
> I will try to find more time to venture further.

I hope the above helps!
Comment 66 Peter 2021-10-10 11:23:04 UTC
Thanks for the information on Windows. I had not considered the OS dependency.
At first glance we should arrive at the same place fairly quickly. The Windows code issues a SALCALLBACK_SHUTDOWN, which I suspect ends up in the same chain that the unx code follows.

However I am not sure on Windows behavior.
What does this code do? Does it stop the main thread for a while before moving on?

6105  			else
6106  			{
6107  				ImplSalYieldMutexAcquireWithWait();
6108  				ImplSalYieldMutexRelease();
6109  				rDef = TRUE;
6110  			}

I am still looking at: http://opengrok.openoffice.org/openoffice/xref/aoo41x/main/vcl/win/source/window/salframe.cxx#6113
Comment 67 Arrigo Marchiori 2021-10-10 13:51:55 UTC
(In reply to Peter from comment #66)
> Thanks for the information on Windows. I had not considered the OS dependency.
> At first glance we should arrive at the same place fairly quickly. The
> Windows code issues a SALCALLBACK_SHUTDOWN, which I suspect ends up in the
> same chain that the unx code follows.

Hopefully... sorry I never looked into this.

> However I am not sure on Windows behavior.
> What does this code do? Does it stop the main thread for a while before
> moving on?
> 
> 6105  			else
> 6106  			{
> 6107  				ImplSalYieldMutexAcquireWithWait();
> 6108  				ImplSalYieldMutexRelease();
> 6109  				rDef = TRUE;
> 6110  			}
> 
> I am still looking at:
> http://opengrok.openoffice.org/openoffice/xref/aoo41x/main/vcl/win/source/window/salframe.cxx#6113

Those two function calls are the same ones found at the beginning and end of the function ImplHandleShutDownMsg(), which is called a few lines above the snippet.
As I understand it, they acquire and release a mutex, so the snippet only waits if someone else holds the mutex at that time.
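
The acquire-then-release pattern can be illustrated with a small standard C++ sketch. It uses std::mutex rather than the AOO yield mutex, and the function names acquireAndRelease and waitsForHolder are hypothetical. The behaviour of ImplSalYieldMutexAcquireWithWait() itself is assumed to be analogous here, not verified.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the acquire/release pair in the snippet.
void acquireAndRelease(std::mutex& m)
{
    m.lock();     // blocks only while another thread holds m
    m.unlock();   // nothing is protected; the only effect was the wait
}

// Returns true if acquireAndRelease() returned only after the
// other thread had finished its work and released the mutex.
bool waitsForHolder()
{
    std::mutex m;
    std::atomic<bool> held{false};
    std::atomic<bool> holderFinished{false};

    std::thread holder([&] {
        std::lock_guard<std::mutex> guard(m);
        held = true;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        holderFinished = true;   // set while the mutex is still held
    });

    while (!held)                // make sure the holder owns the mutex first
        std::this_thread::yield();

    acquireAndRelease(m);        // must wait for the holder to release
    bool result = holderFinished;
    holder.join();
    return result;
}
```

If the mutex is free, acquireAndRelease() returns immediately. If another thread holds it, the caller is parked until that thread is finished, which matches the reading that the snippet is a way of waiting for whoever currently owns the mutex.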

I hope this helps.