Apache OpenOffice (AOO) Bugzilla – Issue 54586
Cannot open files via NFS on mis-configured Linux machines
Last modified: 2011-02-13 14:56:47 UTC
Since issue 29425 is fixed, file locking is enabled in OOo. On some Linux machines, the attempt to lock a file accessed via NFS results in an error (ENOLCK), because the machine is mis-configured in that an NFS lock daemon is not running. (ENOLCK can also occur, whether or not a file is accessed via NFS, in the---somewhat unlikely---event that "satisfying the lock or unlock request would result in the number of locked regions in the system exceeding a system-imposed limit" [susv3].) In OOo, this leads to a cryptic error box "General input/output error while accessing ..." and failure to open a file via NFS.
There are at least five choices to handle this problem:
1 Disable file locking again. See issue 29425 for why it was enabled (involved developer: obr).
2 Blame the mis-configured machines and do nothing (no developer involved).
3 Handle ENOLCK in the OOo code by presenting a more appropriate error message (possibly including a hint that the problem might be a mis-configured Linux machine), but still refusing to open the file (involved developer: abi).
4 Handle ENOLCK in the OOo code by presenting a warning message that file locking did not work for some reason, and opening the file read-only (involved developer: abi).
5 Handle ENOLCK in the OOo code by presenting a warning message that file locking did not work for some reason, and opening the file read/write, but unlocked (involved developers: abi, obr).
In each case, an additional step might be to document the problem of mis-configured Linux machines in the readme or similar.
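Choice 4 above can be sketched roughly as follows. This is a minimal Python illustration, not the OOo code (which is C++): `open_document` and `posix_lock` are hypothetical names, and the `lock` parameter exists only so that the ENOLCK path can be exercised without a broken NFS setup.

```python
import errno
import fcntl
import os


def posix_lock(fd):
    """Take an exclusive, non-blocking fcntl()-style lock on the whole file."""
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)


def open_document(path, lock=posix_lock):
    """Return (fd, mode), where mode is 'read-write' or 'read-only'."""
    fd = os.open(path, os.O_RDWR)
    try:
        lock(fd)
        return fd, "read-write"
    except OSError as exc:
        os.close(fd)
        if exc.errno != errno.ENOLCK:
            raise  # e.g. EAGAIN/EACCES: another process really holds the lock
        # Choice 4: locking is unavailable, so warn and fall back to read-only.
        print(f"warning: file locking unavailable for {path}; opening read-only")
        return os.open(path, os.O_RDONLY), "read-only"
```

Choice 5 would differ only in the fallback: reopen with `os.O_RDWR` instead of `os.O_RDONLY` after showing the warning.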
*** Issue 53682 has been marked as a duplicate of this issue. ***
AFAIK file-locking is also needed by the storage implementation and the GNOME .recently-used functionality. However, these affect only the user's $HOME and the OOo program directory - if both are on the local disk, one still needs no locking daemon. Given that NFS locking has multiple problems (e.g. the lock is removed when the file gets opened a second time by the same process), my preferred approach would be to:
6. Change the document handling not to use locking, but to compare the access time of the file before overwriting it with the access time when the file was opened. In case these don't match, tell the user the file has changed on disk and ask whether (s)he would like to overwrite the file anyway, pick a new name for it or cancel the save operation.
@obr:
- "e.g. the lock is removed when the file gets opened a second time by the same process"
  Correct me if I am wrong, but as far as I know, locks are removed as soon as any file descriptor on the same file is closed, and that is a general design "feature" of fcntl(F_SETLK), irrespective of NFS.
- "compare the access time of the file before overwriting it with the access time when the file was opened"
  Compare-and-overwrite would have to be an atomic operation then, which it is not. Thus, your proposed approach #6 IMO gives a false sense of security.
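The close-releases-the-lock behaviour described above can be observed with a small experiment. This is a sketch assuming a POSIX system and a local filesystem; it uses Python's `fcntl.lockf`, which wraps the fcntl() record-locking calls, and asks a separate process whether the lock is still held. The helper names are made up for the illustration.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Child script: exit 0 if it could take the lock, 1 if another process holds it.
PROBE = """
import fcntl, sys
with open(sys.argv[1], "r+b") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(1)
"""


def locked_elsewhere(path):
    """Ask a separate process whether the file is currently locked."""
    return subprocess.run([sys.executable, "-c", PROBE, path]).returncode == 1


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        first = os.open(tmp.name, os.O_RDWR)
        fcntl.lockf(first, fcntl.LOCK_EX | fcntl.LOCK_NB)
        before = locked_elsewhere(tmp.name)

        # Open and close a second, unrelated descriptor for the same file.
        second = os.open(tmp.name, os.O_RDONLY)
        os.close(second)
        after = locked_elsewhere(tmp.name)

        os.close(first)
        return before, after
```

With POSIX semantics, `demo()` should report the lock as held before the second descriptor is closed and as gone afterwards, even though `first` is still open.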
> because the machine is mis-configured in that an NFS lock daemon is not running
Should simply having nlockmgr running fix this? I restarted my NFS, and now nlockmgr is running on the NFS client and server (reported via rpcinfo -p hostname). However, I still get "General input/output error.." when opening files via NFS with OOo 2.0 rc1 on Fedora Core 3.
I don't know what nlockmgr actually does, but the NFS lock daemon is usually started by a script named nfslock and shows up in the process list as [lockd].
@sb: hmm, strange: I thought I'd replied to your latest comments, but there is nothing in the issue :(. So here we go again:
- "Correct me if I am wrong, but as far as I know, locks are removed as soon as any file descriptor on the same file is closed, and that is a general design "feature" of fcntl(F_SETLK), irrespective of NFS."
  I don't remember the details (so you are probably right on them). However, the result is the same - we actively need to make sure that a file gets opened only once at a time.
- "Compare-and-overwrite would have to be an atomic operation then, which it is not. Thus, your proposed approach #6 IMO gives a false sense of security."
  The point I was trying to make is that the feedback IMHO needs to arise at the time the user _saves_ the document, not when (s)he opens it (which potentially could be days earlier). To make this more atomic, the algorithm could work like this for the save operation:
  - try locking the file and warn if that fails (cancel y/n)
  - compare modification time and warn if that fails (cancel y/n)
  - overwrite the document.
  I could even imagine that the modification time gets checked each time the window containing the visualization of the document receives the focus.
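The save-time steps listed above could look roughly like this. It is a minimal sketch, not OOo code: the `Document` class and the `confirm` callback (standing in for the "cancel y/n" dialog) are hypothetical. As noted earlier in this issue, the check and the overwrite are still not one atomic operation; the lock only narrows the window.

```python
import fcntl
import os


class Document:
    """Remembers the file's modification time as of when it was opened."""

    def __init__(self, path):
        self.path = path
        self.mtime_ns = os.stat(path).st_mtime_ns

    def changed_on_disk(self):
        return os.stat(self.path).st_mtime_ns != self.mtime_ns

    def save(self, data, confirm=lambda question: False):
        """Write `data`; return True if saved, False if the user cancelled."""
        fd = os.open(self.path, os.O_RDWR)
        try:
            # Step 1: try to lock; if that fails, warn instead of giving up.
            try:
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                if not confirm("Could not lock the file. Save anyway?"):
                    return False
            # Step 2: compare modification times.
            if self.changed_on_disk():
                if not confirm("File changed on disk. Overwrite?"):
                    return False
            # Step 3: overwrite the document.
            os.ftruncate(fd, 0)
            os.write(fd, data)
        finally:
            os.close(fd)  # closing the descriptor also drops the lock
        self.mtime_ns = os.stat(self.path).st_mtime_ns
        return True
```

The same `changed_on_disk()` check could be called whenever the document window receives the focus.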
The Debian/Sarge NFSv3 userspace server does not support locking. On my machine, I can open a file on NFS but it is read-only. I cannot save files to NFS. The [lockd] process is running on the local machine. IMHO this should not be called "misconfigured Linux system".
I am seeing this issue on one of my machines with OOo-2.0rc1, running Vectorlinux-4.3 on kernel 2.6.9. Reviewing this bug report led me to install and enable the nfslock mechanism, which appears to be working:
    [john@vector john]$ /usr/sbin/rpcinfo -p vector
       program vers proto   port
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper
        100005    1   udp    920  mountd
        100005    1   tcp    923  mountd
        100005    2   udp    920  mountd
        100005    2   tcp    923  mountd
        100005    3   udp    920  mountd
        100005    3   tcp    923  mountd
        100003    2   udp   2049  nfs
        100003    3   udp   2049  nfs
        100003    4   udp   2049  nfs
        100003    2   tcp   2049  nfs
        100003    3   tcp   2049  nfs
        100003    4   tcp   2049  nfs
        100021    1   udp  32770  nlockmgr
        100021    3   udp  32770  nlockmgr
        100021    4   udp  32770  nlockmgr
        100021    1   tcp  32768  nlockmgr
        100021    3   tcp  32768  nlockmgr
        100021    4   tcp  32768  nlockmgr
Despite this, the "General input/output error" persists and I am unable to open or write files over nfs. My two other machines running OOo-2.0rc1 (Xandros-3.0.2-OCE, kernel 2.6.11; Fedora Core 1, kernel 2.4.29) can read/write/open/save/whatever over nfs and locally. Vectorlinux -- problematic machine -- is slackware based, but since OOo-2.0rc1 is only available as a package of rpms I installed using rpm. Xandros-3.0.2-OCE is Debian based, but once again I used rpm to install -- but it worked here. Fedora Core 1 is rpm based and worked fine as well. This is a show-stopper issue for me with OOo-2.0, since my laptop running Vectorlinux is my primary machine these days. Fortunately, OOo-1.1.5 still works fine. Is this file-locking issue a user-configurable item in OOo-2.0? I.e., could I disable the need for using nfs file locking on the OOo-2.0rc1 installation on this one machine without having to recompile all or part of OOo?
One can always disable file locking by editing the program/soffice script and changing the line
    export SAL_ENABLE_FILE_LOCKING
to
    # export SAL_ENABLE_FILE_LOCKING
which was the default in OOo 1.1.x.
@kurti,jdthompson: It appears that there are two problems:
1 On certain Linux machines, file locking is known to fail due to the NFS lock daemon not running.
2 On certain other Linux machines, it appears that file locking fails due to some other, not yet analyzed reason.
I would like to keep this issue concentrated on problem 1. To analyze problem 2, please file a new issue (you can assign it to me) where you include the output of running strace on soffice.bin (if you run soffice.bin directly rather than through the soffice script, remember to export SAL_ENABLE_FILE_LOCKING!).
rpc.lockd was running on both client and server, but I had to start rpc.statd on my workstation to resolve this issue. IMHO, the error message should be changed to be more descriptive and the file should be opened read-only as a fall-back.
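To check for both daemons at once, the output of `rpcinfo -p` can be scanned for the corresponding service names. The sketch below assumes that rpc.statd registers under the name `status`, as it does in typical rpcinfo listings; that name and the helper functions are assumptions for illustration, not something stated in this issue.

```python
def registered_services(rpcinfo_output):
    """Return the set of service names listed by `rpcinfo -p`."""
    services = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port name
        if len(fields) == 5 and fields[0].isdigit():
            services.add(fields[4])
    return services


def missing_lock_services(rpcinfo_output, required=("nlockmgr", "status")):
    """List the required services that do not appear in the output."""
    found = registered_services(rpcinfo_output)
    return [name for name in required if name not in found]
```

Applied to the rpcinfo listing posted earlier in this issue, which shows nlockmgr but no `status` row, this would report `status` as missing.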
*** Issue 55086 has been marked as a duplicate of this issue. ***
*** Issue 54187 has been marked as a duplicate of this issue. ***
sb->lho: In case it was not clear: I think we need a specification of how to communicate the problem described in this issue to the user (see the initial description of this issue). Please dispatch this issue to someone who will come up with that specification.
I commented out 'export SAL_ENABLE_FILE_LOCKING' and verified that it works over NFS. Upon reading some of the previous posts, I noticed that there was some mention of lockd and statd. On my machine, both of these daemons are running, but I still experience the NFS problem. The only solution for now is to comment out the environment variable as above. Perhaps as an interim solution, OpenOffice should test file locking upon startup before doing anything else. If it fails, print an error message on the console and disable it for the rest of the session.
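The startup test suggested above might look like the following. This is a sketch only: `locking_works` is a hypothetical helper, and the `lock` parameter is there so a failing lock call can be simulated.

```python
import fcntl
import os
import sys
import tempfile


def locking_works(directory, lock=None):
    """Try to lock a scratch file in `directory`; return True on success."""
    if lock is None:
        lock = lambda fd: fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        lock(fd)
        return True
    except OSError as exc:
        # Report on the console and let the caller disable locking.
        print(f"file locking disabled for this session: {exc}", file=sys.stderr)
        return False
    finally:
        os.close(fd)
        os.unlink(path)
```

One limitation of this design: the result only describes the directory that was probed (for example $HOME), so documents on a different mount could still behave differently.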
On a machine with NFS file locking working (Fedora Core 1, kernel 2.4.29) and the default OOo setup (SAL_ENABLE_FILE_LOCKING=1), I had a crash in OOo2.0rc3 while configuring the tool bar. I had to "killall -9 soffice.bin" from the console to recover, as X was wedged and "killall soffice.bin" was not successful. When OOo terminated, it took X with it (X server restarted). After logging in and restarting OOo, I allowed OOo to recover the open files as requested and sent the error report. The two open documents were reported as successfully recovered, but I was only able to open them in "read-only" mode. These documents were saved in my home directory, which is NFS-mounted from another machine. Copying the documents to a new file and opening the copy worked fine, so suspecting an NFS lock issue, I closed OOo, commented out the SAL_ENABLE_FILE_LOCKING lines in the soffice script, restarted OOo, and was able to open the original recovered documents without problems. Will nfs file locks time out eventually and allow opening without disabling OOo's nfs file locking? Or does the nfs filesystem have to be remounted or nfs processes restarted in order to clear the file locks? Either way it seems cumbersome, especially when the documents were successfully recovered after the OOo crash. For the time being I am leaving nfs file locking disabled.
reassigned to Frank Loehmann (fl)
The evaluation of this task will take some time. Therefore I re-target this task to OOo2.0.2.
I just encountered this problem yesterday when I installed the "official" OpenOffice 2.0 (from the openoffice.org website) on an FC4 system. I will not add all of the boring details of which NFS systems worked or failed for me. However, I can say that based on experience of NFS from its beginnings, and from the experience of writing a correctly functioning file locking system for a Unix-based application,
*** This is a continuation of my previous comment (which accidentally got sent as I was typing...) My conclusion is: RELYING ON NFS FILE LOCKING FOR AN APPLICATION THAT SUPPORTS MORE THAN ONE SPECIFIC NFS IMPLEMENTATION WILL NEVER WORK. The comments to the effect that this will need thought and review are accurate. A very large amount of time and effort has gone into this over the past 30 years or so. So, my recommendation is: if you want file-locking in OO, think it through carefully, and implement a mechanism that does not depend on NFS.
mdgodfrey wrote on Sat Oct 29 17:40:59 -0700 2005:
"My conclusion is: RELYING ON NFS FILE LOCKING FOR AN APPLICATION THAT SUPPORTS MORE THAN ONE SPECIFIC NFS IMPLEMENTATION WILL NEVER WORK.
"The comments to the effect that this will need thought and review are accurate. A very large amount of time and effort have gone into this over the past 30 years or so. So, my recommendation is: if you want file-locking in OO, think it through carefully, and implement a mechanism that does not depend on NFS."
Are you able to share details of what you did to achieve a functioning file-locking mechanism for the application you developed?
jdthompson Sat Oct 29 19:24:20 -0700 2005 -------said:
Are you able to share details of what you did to achieve a functioning file-locking mechanism for the application you developed?
============================================================
Sure. It is in an open source VLSI Design system called Magic. This can be downloaded from: http://opencircuitdesign.com/magic/. Click on Version 7.3. The code was originally written about 8 years ago. It provides file locking for designers working on the same database. The main body of the code is in (after you untar the downloaded file) magic/utils/flock.c. There once was a narrative about the design, but that does not seem to be present in the current system. flock.c contains a fair amount of documentation, and points to where the rest of the code resides. I think that a system for OpenOffice.org has much more demanding requirements, but this code may still be helpful to the thought process. The code does deal with the fact that the Magic system, when it opens a new file, reads the file completely and then closes the file. During the several years that I was directly involved, the code performed correctly. As I think of it, 2 areas were not handled as well as they should have been:
1. Links are only followed for one level. This could be generalized, but would need to deal with recursive link loops, etc.
2. Stale locks (mostly due to crashes) could be cleared more gracefully. I would have fixed this, but no one complained enough.
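For readers wondering what a mechanism that does not depend on the NFS lock daemon might look like, one common pattern is a separate lock file created next to the document. The sketch below is a generic illustration of that pattern only; it is not taken from Magic's flock.c, and the lock-file naming and all function names are hypothetical. Note that exclusive creation (O_EXCL) is itself only supported over NFS from NFSv3 on, so this is not a complete answer either.

```python
import os


def lock_path(document):
    directory, name = os.path.split(document)
    return os.path.join(directory, f".{name}.lock")  # hypothetical naming


def acquire(document, owner):
    """Create the lock file; return True if we now own the lock."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path(document),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(owner)  # lets a later user see who left a stale lock
    return True


def holder(document):
    """Return the recorded owner, or None if the document is not locked."""
    try:
        with open(lock_path(document)) as f:
            return f.read()
    except FileNotFoundError:
        return None


def release(document):
    os.unlink(lock_path(document))
```

Recording the owner in the lock file is one way to approach the stale-lock problem mentioned in point 2: after a crash, the next user can at least be told who held the lock and be offered the choice to remove it.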
Another symptom of this problem: when storing ~ on NFS, the Welcome / Registration dialog appears every time OpenOffice.org is started. The Welcome / Registration problem is also fixed through the workaround of commenting out "export SAL_ENABLE_FILE_LOCKING."
Same problem here on OO 2.0 final, SuSE 10.0 server (nfs and lockd running) and SuSE 9.3 client with nfs-mount (no, I would not call it 'misconfigured system'). If I just try to save a new(!) document on the nfs-share, I get the error message 'File test.odt does not exist', followed by the general I/O error mentioned. I haven't tried the workaround (disable: export SAL_ENABLE_FILE_LOCKING), but I feel that this should be fixed somehow to increase usability.
*** Issue 57969 has been marked as a duplicate of this issue. ***
My target for this issue is, as proposed by SB, to handle the following:
1. On certain Linux machines, file locking is known to fail due to the NFS lock daemon not running.
2. Disabling file locking raises other problems, see issue 29425.
The following has to be considered:
- Report: Welcome dialog appears every time
- Report: AutoRecovery restores documents read-only
Recommended solution (tend to use #4, but maybe we need the possibility to open read/write):
#4 Handle ENOLCK in the OOo code by presenting a warning message that file locking did not work for some reason, and opening the file read-only (involved developer: abi).
or
#5 Handle ENOLCK in the OOo code by presenting a warning message that file locking did not work for some reason, and opening the file read/write, but unlocked (involved developers: abi, obr).
In each case, an additional step might be to document the problem of mis-configured Linux machines in the readme or similar.
Notes: We have to inform the user when loading the document.
Please find the spec here: http://specs.openoffice.org/appwide/fileIO/FileLockingOnLinux.sxw
Changed owner. Don't know why change of owner did not work last time in December.
Changed the mapping of osl to ucb error codes. The document will now be opened read-only without an error message.
FL: Please note that the spec. linked above is obsolete. Document will just open read only like described by ABI.
Sorry for the rant, but I'm really sad about how this issue was handled. In http://www.openoffice.org/issues/show_bug.cgi?id=29425 someone requests that an extremely rarely used feature be set as default. (I have never used or needed file locking, and I know nobody who has ever needed or used file locking.) The next comment is "fixed" and SAL_ENABLE_FILE_LOCKING is set by default. I also ran into the NFS problem and found this and numerous other issues about this exact problem.
The really sad part of the story is that despite many, many duplicate bug reports, despite many users having this problem and despite the votes for this issue, nobody just said "fixed" and unset SAL_ENABLE_FILE_LOCKING. Instead, a feature that is needed by almost nobody (the case that several people work on the same file at the same time is extremely rare) and is only needed by experts (those rare multiuser environments are usually run by administrators, who will not have any problems setting SAL_ENABLE_FILE_LOCKING, it is their job) is causing grief and trouble to many users. NFS is already fragile and problematic as it is. The line "This is a first step on the way to get rid of this variable completely" in http://www.openoffice.org/issues/show_bug.cgi?id=29425 actually gave me shivers.
Please consider removing SAL_ENABLE_FILE_LOCKING again from the default settings. It just is not worth the trouble, and in the rare cases in which it is needed, it can be set by the administrator (which is a necessity in any serious multiuser environment).
Thanks for listening
verified re-open issue and reassign to tm@openoffice.org
reassign to tm@openoffice.org
reset resolution to FIXED
Mount an NFS drive using "defaults" as the option. Trying to open a document (no matter if .odt, .sx*, or any other format) with OpenOffice.org 2.0 will result in an I/O error message. If you're lucky and the document opens, anyway (didn't have time to test under which circumstances this happens), you still cannot save that document, neither with its original name and format, nor with a different name or format. The solution to this is simple: Mount the NFS drive with the "nolock" option. For example, "defaults,nolock" will work. Opening or saving documents works like a charm. I tried this on a SuSE 10.0 Linux where new NFS drives are mounted with the "defaults" option only. Took me two weeks to find out, because everything else was working using "defaults", just OpenOffice.org 2.0 didn't.
Checked and verified in cws nfslockproblem -> OK !
closed
I still have this problem. I have tried exporting SAL_ENABLE_FILE_LOCKING=0. I have tried remounting my NFS mounts with nolock, all to no avail. Is there any information I can give to help find the solution to this? I really like the look of OOo2 but as such have not been able to use it for anything useful as ALL my files are stored on NFS shares.
You need to unset SAL_ENABLE_FILE_LOCKING, the implementation does not care about the value. Just put a '#' in front of the export SAL_ENABLE_FILE_LOCKING line in the 'soffice' shell script.
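The difference between "set to 0" and "unset" can be illustrated with a presence-only check. This is a Python sketch of the behaviour described above, not the actual OOo implementation; `locking_enabled` is a hypothetical name.

```python
def locking_enabled(environ):
    """Presence-only check: any value, even "0" or "", counts as enabled."""
    return "SAL_ENABLE_FILE_LOCKING" in environ
```

Under such a check, `locking_enabled({"SAL_ENABLE_FILE_LOCKING": "0"})` is True, which is why exporting the variable with the value 0 does not help; only removing it from the environment (an empty mapping here) yields False.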
Re-opening issue as this problem still exists with properly configured Linux and other UNIX systems. Suggest examining code for Mac OS X which uses a different type of file lock/unlock and see if it applies to other UNIX based systems. James McKenzie
adjust target to 2.3 since 2.0.2 has been released in the meantime
OF: I'm sorry, but due to resource shortage this issue can't be handled in 2.3 time frame.
What's the status of this popular issue right now?
As file locking with file system commands will be replaced in OOo 3.0, this issue will be resolved by implementing issue 85794.
*** This issue has been marked as a duplicate of 85794 ***
Closing. Interested people should watch Issue 85794.
I would like to delete an older version, 3.0.9, but I can't because I can't find the msi file that should be attached to it. How do I get a new msi file and attach it to this version so I can delete it? It says it is looking for the Openofficeorg30.msi file and can't find it. Or is there another way to do this? It takes up a lot of room and I need to clean things up. Jerry