Issue 126990

Summary: File saved normally then opened and filled with #
Product: Writer Reporter: tinaconroy0718
Component: save-export Assignee: AOO issues mailing list <issues>
Status: CONFIRMED --- QA Contact:
Severity: Critical    
Priority: P5 (lowest) CC: john.ha24, oooforum, petko, tl.valladares53
Version: 4.1.2   
Target Milestone: ---   
Hardware: PC   
OS: Windows 8, 8.1   
Issue Type: DEFECT Latest Confirmation in: ---
Developer Difficulty: ---
Attachments:
Description Flags
An example .odt file which opens as "full of ######" none
Some examples of damaged files - all zeros, garbage and a mixture none

Description tinaconroy0718 2016-05-29 22:40:41 UTC
I saved a file not 2 hours ago and when I opened it again the format was wrong and all my text was #. The whole document just #########. This has happened before. I do not want to rewrite it all again. Is there a way to recover the text I had before?
Comment 1 orcmid 2016-05-30 01:23:10 UTC
(In reply to tinaconroy0718 from comment #0)
> I saved a file not 2 hours ago and when I opened it again the format was
> wrong and all my text was #. The whole document just #########. This has
> happened before. I do not want to rewrite it all again. Is there a way to
> recover the text I had before?

Generally, no.

When this happens, however it happens, that is really the content of the file and that is all there is.

The best precaution is to not save over the previous copy but save with a new name (put a date in the name or use a sequence number).  Then you at least can fall back to the one you made the failed one from.  This precaution works for a number of other problems as well.

If you want, you can upload the file as an attachment here, and we can inspect it to confirm whether there is recoverable content.

This is the first of the cases identified in Issue 126846.

I am extracting the essential information here so we have an identified issue for this individual case.  I failed to find an existing separate issue about it.

"Hagar Delest has carefully listed the posts where users have lost data at 22 pages term paper replaced with pound signs, where he has collected over two hundred (224 to date) cases."  That is at https://forum.openoffice.org/en/forum/viewtopic.php?f=6&t=17677

The forum topic includes some cases beside the "#" case.  This issue is for tracking the "#" issue only.
Comment 2 John 2016-05-30 05:06:35 UTC
Created attachment 85563 [details]
An example .odt file which opens as "full of ######"

This file is taken from https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532&start=420#p372812.

Notes:

1  It is a .odt file, but it is not a zip file, and it has no internal structure (no content.xml, manifest.rdf etc).

2  When the file is opened with a Hex editor, it is 27,605 Bytes, and each byte is zero.

3  When the file is opened by Writer, Writer assumes it must be a flat, ASCII TEXT file.  Writer brings up the ASCII Filter Options pop-up.  The document then appears with 9,999 x "#" as word 1, a paragraph return; 9,999 x "#" as word 2, a paragraph return, and the remaining "#" as word 3. Presumably Writer has a 9,999 character limit on a word and adds the paragraph return.

4  The fault seems to have the characteristics of Writer reserving some space, naming that space postcol literature II.odt, setting the space to all zeros ... and then failing to write the correct data to the file.  The file content is therefore all zeros.  Does this occur because Writer was somehow prevented from completing the write?  Could shutting a laptop lid too quickly cause this?

5  There are numerous issues relating to saving files across networks, where the slow speed of the network highlights problems.  See Issue 107558 - A hidden step while writing OOo files? which reports that AOO continues to do saving AFTER the bar stops moving across the bottom of the screen.  Could it be that users think that the save is completed when the bar stops moving, and slam the laptop lid shut, whereas the save has not completed?
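The checks in notes 1 and 2 can be scripted so that uploaded files are triaged the same way each time. This is a minimal sketch in Python using only the standard library; the function name `classify_document` and its labels are made up for illustration and are not part of AOO.

```python
import zipfile
from pathlib import Path

def classify_document(path):
    """Return a rough label for a suspect .odt/.ods file (labels are illustrative)."""
    data = Path(path).read_bytes()
    if not data:
        return "empty"
    if data.count(0) == len(data):
        # Every byte is hex 00, as in the 27,605-byte example file.
        return "all zeros"
    if zipfile.is_zipfile(str(path)):
        # The zip end-of-central-directory record was found.
        return "zip container"
    if data[:2] == b"PK":
        # Starts like a zip but cannot be read as a complete one.
        return "truncated or damaged zip"
    return "not a zip"
```

A file reported as "zip container" can still be damaged inside; the label only says that the zip directory is readable.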

See also Issue 104661 - Saving to file should take place in a process independent of the GUI 

Some form of atomic save is needed where the save can be guaranteed.
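
To illustrate what "atomic save" usually means, here is a generic write-to-temporary-file-then-rename sketch in Python. It is not how AOO saves documents, and `atomic_save` is a hypothetical name; the point is only that the original file is never opened for writing, so an interrupted save leaves the old copy in place.

```python
import os
import tempfile

def atomic_save(path, data):
    """Write data so that path holds either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final
    # rename does not cross file systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # ask the OS to push the bytes to disk
        os.replace(tmp_path, path)   # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The rename is atomic on POSIX systems; behaviour on Windows and on network shares should be treated as an assumption to verify, not a guarantee.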
Comment 3 orcmid 2016-05-30 16:26:29 UTC
(In reply to John from comment #2)
> Created attachment 85563 [details]
> An example .odt file which opens as "full of ######"
> 
> This file is taken from
> https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532&start=420#p372812.
> 
> Notes:
> 
> 1  It is a .odt file, but it is not a zip file, and it has no internal
> structure (no content.xml, manifest.rdf etc).
> 
> 2  When the file is opened with a Hex editor, it is 27,605 Bytes, and each
> byte is zero.
> 
> 3  When the file is opened by Writer, Writer assumes it must be a flat,
> ASCII TEXT file.  Writer brings up the ASCII Filter Options pop-up.  The
> document then appears with 9,999 x "#" as word 1, a paragraph return; 9,999
> x "#" as word 2, a paragraph return, and the remaining "#" as word 3.
> Presumably Writer has a 9,999 character limit on a word and adds the
> paragraph return.
[ ... ]

I confirm the behavior with the example file.  The particular file triggers the plaintext filter.  If the file is opened, it will be presented as paragraphs having runs of "#" characters.  (I assume, in this case, the hex 00 bytes are interpreted as unknown or inadmissible characters and "#" is used to indicate them.)

I confirm that the file consists of 27,605 null (hex 00) bytes.
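
As a worked check of the numbers, the sketch below assumes (as John presumes in comment 2) that each null byte is shown as one "#" and that a run is broken after 9,999 characters. The function `render_nulls` is a hypothetical model of that behaviour, not Writer's actual import code.

```python
def render_nulls(byte_count, run_limit=9999):
    """Split byte_count '#' characters into runs of at most run_limit."""
    runs = []
    remaining = byte_count
    while remaining > 0:
        size = min(remaining, run_limit)
        runs.append("#" * size)
        remaining -= size
    return runs

runs = render_nulls(27605)
print([len(r) for r in runs])  # [9999, 9999, 7607]
```

Under those assumptions the 27,605-byte file gives two full runs and a third run of 7,607 "#", which matches the three "words" described for the example file.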

What we need to know from Tina, who has had this experience more than once, is 

 1. When a previously-saved file was opened for further work, and it showed as all "####", did the plaintext filter show up first?  Did she click OK and then see the all "#" document?

 2. What can Tina report about the conditions under which the document was saved and later failed to open correctly?

 3. Can Tina upload an attachment of the file that opened that way for her, exactly as it was when she tried to open it (not after accepting the plaintext filter)?

Either way, Tina's document will not be recoverable.  But the evidence she can provide may help us to identify the cause and eliminate or mitigate it.
Comment 4 John 2016-05-31 10:49:26 UTC
Created attachment 85564 [details]
Some examples of damaged files - all zeros, garbage and a mixture

See the forum post 22 page term paper replaced with pound signs which is at https://forum.openoffice.org/en/forum/viewtopic.php?f=6&t=17677#p81363.  You will see well over 200 cases of "My document is all #####" reports, many with uploaded damaged files.  I have identified a few below:

These reports have each uploaded files which are full of zeros:

1.  Retrieving document at https://forum.openoffice.org/en/forum/viewtopic.php?f=6&t=17690
2.  File now contains nothing but # characters at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=18573
3.  Character Set issue opening an .ODT document at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=23463
4.  Problem with file at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=25534
5.  .odt corrupted at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=25838
There are many more ...

These reports have each uploaded a damaged file:

1.  Re: ASCII Filter? Help me save my doc!!! at https://forum.openoffice.org/en/forum/viewtopic.php?f=6&t=25503#p265546  The file appears to be a valid .odt when you unzip it, but the file is damaged.

2. Re: 22 pages term paper replaced with pound signs at https://forum.openoffice.org/en/forum/viewtopic.php?f=6&t=17677&start=30#p275921 points to a file at http://www.mediafire.com/download/x75pb ... s_copy.ods which is partially full of zeros and partially full of garbage.
There are many more ...

See the forum post [Hint] How did I fix my ODT file at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532 (viewed 140,000 times) which has many, many corrupted files which have been uploaded.  Many have been analysed by forum posters.  

A small selection includes:

1 Re: [Hint] How did I fix my ODT file at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532&start=390#p372337 uploaded a file clinical opthalmolog1.odt which appears to be a valid odt file, but is full of zeros after FF0 (4,080) bytes.  Is the 2^n significant?

2 Re: [Hint] How did I fix my ODT file at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532&start=420#p372836 has uploaded $R67BQ9D.odt which starts with readable text which looks like a Firefox crash report, then is full of zeros, then has binary data and then ends with zeros.

3 Re: [Hint] How did I fix my ODT file at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532&start=330#p354771 Both dz.odt and mc.t.odt start off looking like valid PK zip files, but then just end - the files are corrupted.

4 Re: [Hint] How did I fix my ODT file at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532&start=330#p357120 uploaded the water door.odt.  The file is full of garbage - it looks like a dump of memory.
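
For files that are a mixture of data and zeros, a map of where the zero runs start can make patterns such as the FF0 offset easier to compare across samples. Note that hex FF0 is 4,080, which is 16 bytes short of 4,096 (2^12) rather than a power of two itself. The sketch below is illustrative only; `zero_map` and the 16-byte threshold are arbitrary choices.

```python
from itertools import groupby

def zero_map(data, min_run=16):
    """Return (offset, length, kind) segments; kind is 'zeros' or 'data'."""
    segments = []
    offset = 0
    for is_zero, group in groupby(data, key=lambda b: b == 0):
        length = sum(1 for _ in group)
        # Short zero runs are normal inside binary data, so only long
        # runs are reported as 'zeros'.
        kind = "zeros" if is_zero and length >= min_run else "data"
        if segments and segments[-1][2] == kind:
            start, prev_len, _ = segments[-1]
            segments[-1] = (start, prev_len + length, kind)
        else:
            segments.append((offset, length, kind))
        offset += length
    return segments
```

For example, 4,080 bytes of data followed by zeros would be reported as a 'data' segment at offset 0 and a 'zeros' segment at offset 4080.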

I have uploaded a ZIP file with examples of these damaged files.

There are also many reports, with uploaded files, where the file is otherwise intact but the XML tags in content.xml are incorrect.  acknak is often able to correct the XML errors manually and thus recover the file.  For example, see https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532&start=360#p357769 for corrupted tags, which refers to Issue 126219: invalid xml on saving document with comment/annotation
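
Where the zip container opens but content.xml is suspect, a quick well-formedness check can point to the line and column of the first XML error before anyone attempts a manual repair. This is a sketch using Python's standard library; `check_content_xml` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

def check_content_xml(path):
    """Return None if content.xml parses, else a short error description."""
    try:
        with zipfile.ZipFile(path) as archive:
            xml_bytes = archive.read("content.xml")
    except (zipfile.BadZipFile, KeyError) as exc:
        # Not a zip at all, or the zip has no content.xml entry.
        return f"cannot read content.xml: {exc}"
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        line, column = exc.position
        return f"not well-formed at line {line}, column {column}: {exc}"
    return None
```

This only tests that the XML is well-formed; it does not validate the document against the OpenDocument schema.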
Comment 5 Tania Valladares 2017-03-06 00:03:18 UTC
I am trying to recover a document in which the text has been replaced with #. It is extremely important. Can you look at the document? As I understand it, there's nothing I can do at this point. Can anyone try to retrieve it if I bring it in somewhere? Thank you. You should really put a warning on the Apache OpenOffice website. I've lost a lot of material that is very valuable to me.
Comment 6 John 2017-03-06 08:01:31 UTC
I am sorry but there is nothing which can be done because the file is full of zeros - there is (literally) nothing in it.  For some reason the file was not saved. Search the forum with #### and you will find a number of posts.

See [Tutorial] How to find and un-delete Writer temporary files at https://forum.openoffice.org/en/forum/viewtopic.php?f=71&t=85038 for instructions on how to identify and un-delete the temporary files Writer wrote while you were editing the file, and then deleted.  You should be able to recover all or most of the file.
Comment 7 oooforum (fr) 2019-06-07 06:59:21 UTC
Must this issue be kept open?
We know that #### content means a lost document.
The problem is reproducing the process which corrupts a file.
Comment 8 John 2019-06-30 22:25:45 UTC
Yes.  There are hundreds and hundreds of cases on the forum.

I strongly suspect that the problem arises when AOO mishandles a hibernate or sleep interrupt when it is in the process of saving a file.

I further suspect that the time at which the interrupt arrives is critical - at some times it is handled, at others it is not.

I did some testing for Patricia about two years ago which seemed to support this suggestion.
Comment 9 John 2020-12-08 23:52:39 UTC
As

a) this bug is confirmed, 
b) there are hundreds of reports of it in the forum,
c) it causes complete data loss and nothing can be recovered from the file (the # are displayed because the file is full of NULL characters)

I think it should be classified as CRITICAL.

See https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=1532&start=660#p479545 where I suggest the NULLs are written in line 52 of Deflater (excerpt below).  If the PC issues a SHUTDOWN after line 52 and AOO does not prevent the shutdown, then the file would presumably be full of NULLs.

44    Deflater::~Deflater(void)
45    {
46            end(); 
47    }
48    void Deflater::init (sal_Int32 nLevelArg, sal_Int32 nStrategyArg, sal_Bool bNowrap)
49    {
50            pStream = new z_stream;
51            /* Memset it to 0...sets zalloc/zfree/opaque to NULL */
52            memset (pStream, 0, sizeof(*pStream));
53    
54            switch (deflateInit2(pStream, nLevelArg, Z_DEFLATED, bNowrap? -MAX_WBITS : MAX_WBITS,

See Why is my file full of #####? at https://forum.openoffice.org/en/forum/viewtopic.php?f=71&t=85038#p493247 for a discussion.
Comment 10 Peter 2020-12-09 09:12:54 UTC
I set the Importance to critical, since we have to look into this.
This issue has been the "Most Valued Bug" for some time now. And we should not forget to solve it.
Comment 11 oooforum (fr) 2020-12-09 11:44:14 UTC
(In reply to John from comment #9)
> where I suggest the NULLs are written in
> Line 52 of DEFLATOR.  
If you have programming skill, you can submit a PR on Github with this fix:
https://github.com/apache/openoffice
Comment 12 John 2020-12-09 14:51:45 UTC
Unfortunately I don't have sufficient programming skills :-(

I recently assisted a user with a corrupted .ods file which I think resulted from the same "shut down before writing is completed" cause.  

In his case (see https://forum.openoffice.org/en/forum/viewtopic.php?f=9&t=103810#p502547) he had lots of graphs and they were all missing.  Unzipping the .ods showed that the Object 1 through Object 189 folders were present, and so were content.xml, manifest.rdf, meta.xml, mimetype and styles.xml.  However, folders Configurations-2, META-INF, ObjectReplacements and Thumbnails were missing.

It would support my theory if the missing folders are normally written after the data which was successfully written.
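
One way to look at this is to list the entries in the order they appear in the zip directory, which normally reflects the order in which they were added, and to report which expected entries are absent. The sketch below assumes the archive still opens; `package_report` and its default checklist are illustrative, and the list of expected entries should be adjusted to the names being investigated.

```python
import zipfile

def package_report(path, expected=("mimetype", "content.xml", "styles.xml",
                                   "meta.xml", "META-INF/manifest.xml")):
    """Return (entry names in archive order, expected entries that are missing)."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    missing = []
    for item in expected:
        # An item such as "Thumbnails/" counts as present if any entry
        # sits inside that folder.
        prefix = item.rstrip("/") + "/"
        if not any(name == item or name.startswith(prefix) for name in names):
            missing.append(item)
    return names, missing
```

Comparing the order from a healthy file saved by the same AOO version with the entries present in the damaged file would show whether the missing folders come last.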

The file is 2MB so I cannot upload it here but it is still available in the forum thread.
Comment 13 John 2020-12-09 15:00:56 UTC
Also, for someone with programming skills, a simple test would probably be

1.  Add an infinite loop a few lines after line 52 so that AOO loops during the file write

2.  Issue a shutdown by closing the laptop lid.

Expected behaviour.  AOO will prevent the shutdown because the file has not been written

Probable behaviour.  AOO will not attempt to prevent the shutdown
Comment 14 John 2020-12-09 15:02:48 UTC
Also, for someone with programming skills, a simple test would probably be

1.  Add an infinite loop a few lines after line 52 so that AOO loops during the file write

2.  Wait until AOO is looping during the file write

3.  Issue a shutdown by closing the laptop lid.

Expected behaviour.  AOO will prevent the shutdown because the file has not been written

Probable behaviour.  AOO will not attempt to prevent the shutdown
Comment 15 John 2021-03-05 12:31:32 UTC
See Comment 46 in Issue 126869 - Analysis Task: Lost/Corrupted Documents after Save/Shutdown where I shut down the PC (Start > Power > Shutdown) a few seconds after the green bar had finished crossing the screen.  The file was still being written (I used a slow diskette drive) and AOO did not prevent the shutdown as expected.

See Comment 48 in Issue 126869 where I issued the shutdown as soon as possible after the green bar had finished crossing the screen (ie a few seconds earlier).  AOO now prevented the shutdown taking place and the shutdown screen offered a pop-up with "fred.odt is open in AOO - do you want to cancel?" and I was able to prevent shutdown. When I cancelled shutdown AOO was displaying a "Do you want to save your changes?" pop-up.    

Conclusion:  AOO mishandles, or ignores, a Windows interrupt saying the PC is being shut down.
Comment 16 John 2021-03-06 14:06:57 UTC
See "Text in document transformed to #####" at https://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=104676#p507553 where a user describes exactly what happened to cause a .odt file to become full of #####.

> The piece of writing that I have lost is an OpenDocument Text [ie a .odt file].
>
> I opened it and had been working on it for few hours, saving it 
> every 10 minutes or so, when my computer froze and showed a grey screen.
> As this hadn't shifted despite my best efforts I had to do a forced shut
> down after about half an hour.
> 
> When I restarted the computer it was all fine apart from the document
> I had open on the screen where the text had been replaced by ######
> 
> [ie - when AOO opened ...\fred.odt, the file displayed as #####
> which means ...\fred.odt was a flat file (not a ZIP container) full
> of null characters.  Inspection of ...\fred.odt uploaded to the forum
> shows fred.odt is full of null characters]

As I understand it, when AOO edits fred.odt:

1.  AOO copies ...\fred.odt to a temporary file in ...\Temp.  

2.  AOO marks ...\fred.odt as "in use".  If I send ...\fred.odt to 7-ZIP I get a 7-ZIP error message "The process cannot access the file it is being used by another process".  However, I can copy the file and I can send the file to Notepad++ where it opens.

3. All user changes are held in memory until the file is saved.  ...\fred.odt is thus never touched until a Save is done.

4.  When a Save is done, ...\fred.odt is saved as a proper .odt file.

As the user saved the document I would expect ...\fred.odt to be a proper .odt file containing the document exactly as it was when the document being edited was last saved. 

So why is ...\fred.odt a flat file full of nulls when the PC is restarted? 

Could it be that AOO was writing a Save when the PC froze (indeed, AOO probably caused the freeze)?  In this case, I would expect ...\fred.odt to be as it was when the PC froze and this is why it is full of nulls.

So, is there a stage during the file write process when ...\fred.odt is set to be full of nulls?  Or some Windows process that kicks in as a freeze happens which fills the file full of nulls?
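
One general way a file can end up this way, offered only as an illustration and not as a diagnosis: if a file's length is set before its contents are written, and the contents never arrive, reading it back typically gives zeros. The Python sketch below shows that effect; `reserve_without_writing` is a hypothetical name, and whether AOO or Windows does anything like this during a save is not established in this issue.

```python
def reserve_without_writing(path, size):
    """Extend a new file to `size` bytes without writing any content."""
    with open(path, "wb") as handle:
        # Sets the file length only; no document data is written.
        handle.truncate(size)
    with open(path, "rb") as handle:
        return handle.read()
```

On most systems the bytes read back are all hex 00, so a file left in this state would look like the 27,605-byte example attached here.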