Issue 54586 - Cannot open files via NFS on mis-configured Linux machines
Summary: Cannot open files via NFS on mis-configured Linux machines
Status: CLOSED DUPLICATE of issue 85794
Alias: None
Product: General
Classification: Code
Component: code
Version: 680m124
Hardware: All All
Importance: P3 Trivial with 10 votes
Target Milestone: OOo 3.0
Assignee: thorsten.martens
QA Contact: issues@framework
URL:
Keywords: oooqa
Duplicates: 53682 54187 55086 57969
Depends on:
Blocks: 61865
Reported: 2005-09-14 10:33 UTC by Stephan Bergmann
Modified: 2011-02-13 14:56 UTC

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description Stephan Bergmann 2005-09-14 10:33:20 UTC
Since issue 29425 is fixed, file locking is enabled in OOo.  On some Linux
machines, the attempt to lock a file accessed via NFS results in an error
(ENOLCK), because the machine is mis-configured in that an NFS lock daemon is
not running (ENOLCK can also occur, whether or not a file is accessed via NFS,
in the---somewhat unlikely---event that "satisfying the lock
or unlock request would result in the number of locked regions in the system
exceeding a system-imposed limit" [susv3]).

In OOo, this leads to a cryptic error box "General input/output error while
accessing ..." and failure to open a file via NFS.

There are at least five choices to handle this problem:
1  Disable file locking again.  See issue 29425 for why it was enabled (involved
developer: obr).
2  Blame the mis-configured machines and do nothing (no developer involved).
3  Handle ENOLCK in the OOo code by presenting a more appropriate error message
(possibly including a hint that the problem might be a mis-configured Linux
machine), but still refusing to open the file (involved developer: abi).
4  Handle ENOLCK in the OOo code by presenting a warning message that file
locking did not work for some reason, and opening the file read-only (involved
developer: abi).
5  Handle ENOLCK in the OOo code by presenting a warning message that file
locking did not work for some reason, and opening the file read/write, but
unlocked (involved developers: abi, obr).
In each case, an additional step might be to document the problem of
mis-configured Linux machines in the readme or similar.
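
Choices 3 to 5 above could be sketched roughly as follows. This is an illustrative Python sketch, not OOo code: the function name `open_with_lock_fallback` and the `on_enolck` policy switch are hypothetical. It relies on `fcntl.lockf`, which wraps `fcntl(F_SETLK)` and raises `OSError` with `errno.ENOLCK` when no lock can be obtained.

```python
import errno
import fcntl
import os


def open_with_lock_fallback(path, on_enolck="read-only"):
    """Open path read/write and try to take an exclusive advisory lock.

    Returns (fd, mode) where mode is "locked", "read-only" or "unlocked".
    on_enolck is a hypothetical policy switch: "read-only" mirrors choice 4,
    "unlocked" mirrors choice 5, anything else re-raises (choice 3).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Non-blocking exclusive lock on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd, "locked"
    except OSError as exc:
        if exc.errno != errno.ENOLCK:
            os.close(fd)
            raise
        if on_enolck == "unlocked":
            return fd, "unlocked"  # choice 5: read/write, but unlocked
        os.close(fd)
        if on_enolck == "read-only":
            # choice 4: reopen without write access
            return os.open(path, os.O_RDONLY), "read-only"
        raise  # choice 3: the caller presents a more specific error
```

A caller would show the warning message whenever the returned mode is not "locked".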
Comment 1 Stephan Bergmann 2005-09-14 10:36:23 UTC
*** Issue 53682 has been marked as a duplicate of this issue. ***
Comment 2 nospam4obr 2005-09-15 07:35:22 UTC
AFAIK file-locking is also needed by the storage implementation and the GNOME
.recently-used functionality. However, these affect only the users $HOME and the
OOo program directory - if both are on the local disk, one still needs no
locking daemon.

Given that NFS locking has multiple problems (e.g. the lock is removed when
the file gets opened a second time by the same process), my preferred approach
would be to:

6. Change the document handling not to use locking, but to compare the access
time of the file before overwriting it with the access time when the file was
opened. In case these don't match, tell the user the file has changed on disk
and ask whether (s)he would like to overwrite the file anyway, pick a new name
for it or cancel the save operation.
Comment 3 Stephan Bergmann 2005-09-15 12:47:20 UTC
@obr:

- "e.g. the lock is removed when the file gets opened a second time by the same
process"  Correct me if I am wrong, but as far as I know, locks are removed as
soon as any file descriptor on the same file is closed, and that is a general
design "feature" of fcntl(F_SETLK), irrespective of NFS.

- "compare the access time of the file before overwriting it with the access
time when the file was opened"  Compare-and-overwrite would have to be an atomic
operation then, which it is not.  Thus, your proposed approach #6 IMO gives a
false sense of security.
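
The close-drops-the-lock behaviour described in the first point can be observed with a small experiment. The sketch below is illustrative only (the helper names `other_process_can_lock` and `demo` are made up) and assumes a POSIX system with a local filesystem. It takes a lock, asks a forked child whether the lock is visible, then opens and closes a second descriptor on the same file and asks again.

```python
import fcntl
import os
import tempfile


def other_process_can_lock(path):
    """Fork a child that tries a non-blocking exclusive lock on path."""
    pid = os.fork()
    if pid == 0:  # child: report the result through the exit status
        code = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            code = 0  # lock obtained, so nobody else holds it
        except OSError:
            code = 1  # lock is held by another process
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    """Return (held_before, held_after) around closing a second descriptor."""
    fd1, path = tempfile.mkstemp()
    try:
        fcntl.lockf(fd1, fcntl.LOCK_EX | fcntl.LOCK_NB)
        held_before = not other_process_can_lock(path)

        # Open and close an unrelated descriptor on the same file.
        fd2 = os.open(path, os.O_RDONLY)
        os.close(fd2)
        held_after = not other_process_can_lock(path)
        return held_before, held_after
    finally:
        os.close(fd1)
        os.unlink(path)
```

With POSIX record locks, `demo()` should report the lock as held before the second descriptor is closed and as gone afterwards, even though `fd1` is still open.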
Comment 4 aziem 2005-09-28 16:10:25 UTC
> because the machine is mis-configured in that an NFS lock daemon is
not running 

Should simply running nlockmgr fix this?  I restarted my NFS, and now nlockmgr
is running on the NFS client and server (reported via rpcinfo -p hostname). 
However, I still get "General input/output error.." when opening files via NFS with
OOo 2.0 rc1 on Fedora Core 3.
Comment 5 nospam4obr 2005-09-29 07:08:44 UTC
I don't know what nlockmgr actually does, but the NFS lock daemon is usually
started by a script named nfslock and shows up in the process list as [lockd].
Comment 6 nospam4obr 2005-09-29 07:20:29 UTC
@sb: hmm, strange: I thought I'd replied to your latest comments, but there is
nothing in the issue :(.  So here we go again:

- "Correct me if I am wrong, but as far as I know, locks are removed as soon as
any file descriptor on the same file is closed, and that is a general design
"feature" of fcntl(F_SETLK), irrespective of NFS." 

I don't remember the details (so you are probably right on them). However, the
result is the same - we actively need to make sure that a file gets opened only
once at a time.

- "Compare-and-overwrite would have to be an atomic
operation then, which it is not.  Thus, your proposed approach #6 IMO gives a
false sense of security."

The point I was trying to make is that the feedback IMHO needs to arise at the
time the user _saves_ the document, not when (s)he opens it (which potentially
could be days earlier). To make this more atomic, the algorithm could work like:

save operation: try locking the file and warn if that fails (cancel y/n) -
compare modification time and warn if that fails  (cancel y/n) - overwrite the
document.

I could even imagine that the modification time gets checked each time the
window containing the visualization of the document receives the focus.
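
The save-time sequence described above (try to lock, compare modification time, overwrite) might look roughly like this. It is a sketch under assumptions, not the OOo implementation: `record_mtime`, `save`, `ChangedOnDisk` and the `force` flag are hypothetical names, and the two "cancel y/n" prompts are reduced to a return value and an exception.

```python
import fcntl
import os


class ChangedOnDisk(Exception):
    """The file's mtime differs from the one recorded when it was opened."""


def record_mtime(path):
    """Call when the document is opened; keep the result until save time."""
    return os.stat(path).st_mtime_ns


def save(path, data, mtime_at_open, force=False):
    """Overwrite path with data; return True if a lock was held meanwhile."""
    with open(path, "r+b") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            locked = True
        except OSError:
            locked = False  # a real UI would warn here and allow cancelling
        if not force and os.fstat(f.fileno()).st_mtime_ns != mtime_at_open:
            raise ChangedOnDisk(path)  # a real UI would ask before overwriting
        f.truncate(0)
        f.write(data)
    return locked
```

When `save` returns False, the comparison and the overwrite were not protected by a lock, so the race described in comment 3 remains possible in that case.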

Comment 7 kurti 2005-10-08 07:46:37 UTC
The Debian/Sarge NFSv3 userspace server does not support locking. On my machine,
I can open a file on NFS but it is read-only. I cannot save files to NFS. The
[lockd] process is running on the local machine.

IMHO this should not be called "misconfigured Linux system".
Comment 8 jdthompson 2005-10-08 20:10:21 UTC
I am seeing this issue on one of my machines with OOo-2.0rc1, running
Vectorlinux-4.3 on kernel 2.6.9. Reviewing this bug report led me to install and
enable the nfslock mechanism, which appears to be working:

[john@vector john]$ /usr/sbin/rpcinfo -p vector
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp    920  mountd
    100005    1   tcp    923  mountd
    100005    2   udp    920  mountd
    100005    2   tcp    923  mountd
    100005    3   udp    920  mountd
    100005    3   tcp    923  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100021    1   udp  32770  nlockmgr
    100021    3   udp  32770  nlockmgr
    100021    4   udp  32770  nlockmgr
    100021    1   tcp  32768  nlockmgr
    100021    3   tcp  32768  nlockmgr
    100021    4   tcp  32768  nlockmgr

Despite this, the "General input/output error" persists and I am unable to open
or write files over nfs. 

My two other machines running OOo-2.0rc1 (Xandros-3.0.2-OCE, kernel 
2.6.11; Fedora Core 1, kernel 2.4.29) can read/write/open/save/whatever over nfs
and locally. 

Vectorlinux -- problematic machine -- is slackware based, but since 
OOo-2.0rc1 is only available as a package of rpms I installed using rpm.

Xandros-3.0.2-OCE is Debian based, but once again I used rpm to install -- 
but it worked here.

Fedora Core 1 is rpm based and worked fine as well.

This is a show-stopper issue for me with OOo-2.0, since my laptop running
Vectorlinux is my primary machine these days. Fortunately, OOo-1.1.5 still works
fine.

Is this file-locking issue a user-configurable item in OOo-2.0? I.e., could I
disable the need for using nfs file locking on the OOo-2.0rc1 installation on
this one machine without having to recompile all or part of OOo?
Comment 9 nospam4obr 2005-10-10 07:11:22 UTC
One can always disable file locking by editing the program/soffice script and
changing the line

export SAL_ENABLE_FILE_LOCKING

to

# export SAL_ENABLE_FILE_LOCKING

which was the default in OOo 1.1.x.
Comment 10 Stephan Bergmann 2005-10-10 08:19:05 UTC
@kurti,jdthompson:  It appears that there are two problems:
1  On certain Linux machines, file locking is known to fail due to the NFS lock
daemon not running.
2  On certain other Linux machines, it appears that file locking fails due to
some other, not yet analyzed reason.
I would like to keep this issue concentrated on problem 1.
To analyze problem 2, please file a new issue (you can assign it to me) where
you include the output of running strace on soffice.bin (if you run soffice.bin
directly rather than through the soffice script, remember to export
SAL_ENABLE_FILE_LOCKING!).
Comment 11 acooks 2005-10-13 08:21:12 UTC
rpc.lockd was running on both client and server, but I had to start rpc.statd on
my workstation to resolve this issue.

IMHO, the error message should be changed to be more descriptive and the file
should be opened read-only as a fall-back.
Comment 12 andreschnabel 2005-10-14 13:20:31 UTC
*** Issue 55086 has been marked as a duplicate of this issue. ***
Comment 13 andreschnabel 2005-10-14 13:22:11 UTC
*** Issue 54187 has been marked as a duplicate of this issue. ***
Comment 14 Stephan Bergmann 2005-10-14 14:33:52 UTC
sb->lho:  In case it was not clear:  I think we need a specification how to
communicate the problem described in this issue to the user (see the initial
description of this issue).  Please dispatch this issue to someone who will come
up with that specification.
Comment 15 kumaran 2005-10-14 18:03:18 UTC
I commented out 'export SAL_ENABLE_FILE_LOCKING' and verified that it
works over NFS.  Upon reading some of the previous posts, I noticed that
there was some mention of lockd and statd.  On my machine, both of these
daemons are running, but I still experience the NFS problem.  The only
solution for now is to comment out the environment variable as above.

Perhaps as an interim solution, OpenOffice should test file locking
upon startup before doing anything else.  If it fails, print an error
message on the console and disable it for the rest of the session.
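
A probe of this kind could be sketched as below. This is only an illustration (the function name `locking_works` is made up), and it assumes the probe is run against the directory that will hold the documents, since locking can work on a local disk and still fail on an NFS mount.

```python
import fcntl
import os
import tempfile


def locking_works(directory):
    """Return True if an advisory lock can be taken on a scratch file."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        # ENOLCK and similar errors: treat locking as unavailable here.
        return False
    finally:
        os.close(fd)
        os.unlink(path)
```

An application could call `locking_works(os.path.dirname(document_path))` once per directory and skip locking, with a console message, wherever it returns False.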
Comment 16 jdthompson 2005-10-16 17:22:03 UTC
On a machine with NFS file locking working (Fedora Core 1, kernel 2.4.29) and
the default OOo setup (SAL_ENABLE_FILE_LOCKING=1), I had a crash in OOo2.0rc3
while configuring the tool bar. I had to "killall -9 soffice.bin" from the
console to recover, as X was wedged and "killall soffice.bin" was not
successful. When OOo terminated, it took X with it (X server restarted). After
logging in and restarting OOo, I allowed OOo to recover the open files as
requested and sent the error report. The two open documents were reported as
successfully recovered, but I was only able to open them in "read-only" mode.
These documents were saved in my home directory, which is NFS-mounted from
another machine. Copying the documents to a new file and opening the copy worked
fine, so suspecting an NFS lock issue, I closed OOo, commented out the
SAL_ENABLE_FILE_LOCKING lines in the soffice script, restarted OOo, and was able
to open the original recovered documents without problems. Will nfs file locks
time out eventually and allow opening without disabling OOo's nfs file locking?
Or does the nfs filesystem have to be remounted or nfs processes restarted in
order to clear the file locks? Either way it seems cumbersome, especially when
the documents were successfully recovered after the OOo crash. For the time
being I am leaving nfs file locking disabled.

Comment 17 lutz.hoeger 2005-10-19 15:39:58 UTC
reassigned to Frank Loehmann (fl)
Comment 18 thorsten.ziehm 2005-10-19 15:54:42 UTC
The evaluation of this task will take some time. Therefore I re-target this task
to OOo2.0.2.
Comment 19 mdgodfrey 2005-10-30 01:33:43 UTC
I just encountered this problem yesterday when I installed the "official"
OpenOffice 2.0 (from the openoffice.org website) in an FC4 system. 

I will not add all of the boring details of which NFS system worked or failed for me.
However, I can say that based on experience of NFS from its beginnings, and from
the experience of writing a correctly functioning file locking system for a
Unix-based application,  
Comment 20 mdgodfrey 2005-10-30 01:40:59 UTC
*** This is a continuation of my previous comment (which accidentally got sent
as I was typing...)

My conclusion is:  RELYING ON NFS FILE LOCKING FOR AN APPLICATION THAT SUPPORTS
MORE THAN ONE SPECIFIC NFS IMPLEMENTATION WILL NEVER WORK.

The comments to the effect that this will need thought and review are accurate.
A very large amount of time and effort have gone into this over the past 30
years or so.  So, my recommendation is: if you want file-locking in OO, think it
through carefully, and implement a mechanism that does not depend on NFS.
Comment 21 jdthompson 2005-10-30 03:24:20 UTC
mdgodfrey wrote on Sat Oct 29 17:40:59 -0700 2005:

"My conclusion is:  RELYING ON NFS FILE LOCKING FOR AN APPLICATION THAT SUPPORTS
MORE THAN ONE SPECIFIC NFS IMPLEMENTATION WILL NEVER WORK.

"The comments to the effect that this will need thought and review are accurate.
A very large amount of time and effort have gone into this over the past 30
years or so.  So, my recommendation is: if you want file-locking in OO, think it
through carefully, and implement a mechanism that does not depend on NFS."

Are you able to share details of what you did to achieve a functioning
file-locking mechanism for the application you developed?
Comment 22 mdgodfrey 2005-10-31 02:17:38 UTC
 jdthompson Sat Oct 29 19:24:20 -0700 2005 -------said:

Are you able to share details of what you did to achieve a functioning
file-locking mechanism for the application you developed?

============================================================
Sure. It is in an open source VLSI Design system called Magic.  This
can be downloaded from: http://opencircuitdesign.com/magic/. Click on
Version 7.3.  The code was originally written about 8 years ago. It provides
file locking for designers working on the same database.
The main body of the code is in (after you untar the downloaded file)
magic/utils/flock.c.  There once was a narrative about the design, but
that does not seem to be present in the current system. flock.c contains a
fair amount of documentation, and points to where the rest of the code resides.

I think that a system for OpenOffice.org has much more demanding requirements,
but this code may still be helpful to the thought process.  The code does
deal with the fact that the Magic system, when it opens a new file, reads
the file completely and then closes the file. During the several years that
I was directly involved, the code performed correctly.  As I think of it,
2 areas were not handled as well as should have been:

1. Links are only followed for one level.  This could be generalized, but would
   need to deal with recursive link loops, etc.

2. Stale locks (mostly due to crashes) could be cleared more gracefully. I
   would have fixed this, but no one complained enough.
Comment 23 aziem 2005-10-31 17:03:23 UTC
Another symptom of this problem: when storing ~ on NFS, the Welcome /
Registration dialog appears every time OpenOffice.org is started.

The Welcome / Registration problem is also fixed through the workaround of
commenting out "export SAL_ENABLE_FILE_LOCKING."
Comment 24 docb 2005-11-05 17:11:36 UTC
Same problem here on OO 2.0 final, SuSE 10.0 server (nfs and lockd running) and
SuSE 9.3 Client with nfs-mount (no, I would not call it 'misconfigured system').

If I just try to save a new(!) document on the nfs-share, I get the error
message 'File test.odt does not exist', followed by the general I/O error
mentioned. I haven't tried the workaround (disable: export
SAL_ENABLE_FILE_LOCKING), but I feel that this should be fixed somehow to
increase usability.
Comment 25 Regina Henschel 2005-11-16 08:46:24 UTC
*** Issue 57969 has been marked as a duplicate of this issue. ***
Comment 26 frank.loehmann 2005-12-09 15:51:34 UTC
My target for this issue is, like proposed by SB, to handle the following:
1. On certain Linux machines, file locking is known to fail due to the NFS lock
daemon not running.
2. Disabling file locking raises other problems, see issue 29425 

The following has to be considered:
- Report: Welcome dialog appears every time
- Report: AutoRecovery restores documents read-only

Recommended solution:
(I tend toward #4, but we may need the option to open read/write)
#4 Handle ENOLCK in the OOo code by presenting a warning message that file
locking did not work for some reason, and opening the file read-only (involved
developer: abi).

or

#5 Handle ENOLCK in the OOo code by presenting a warning message that file
locking did not work for some reason, and opening the file read/write, but
unlocked (involved developers: abi, obr).
In each case, an additional step might be to document the problem of
mis-configured Linux machines in the readme or similar.

Notes: We have to inform the user when loading the document.
Comment 27 frank.loehmann 2005-12-19 14:14:08 UTC
Please find the spec here:
http://specs.openoffice.org/appwide/fileIO/FileLockingOnLinux.sxw
Comment 28 frank.loehmann 2006-01-03 09:42:13 UTC
Changed owner. Don't know why change of owner did not work last time in December.
Comment 29 andreas.bille 2006-01-11 13:35:19 UTC
Changed mapping of osl to ucb error code. The document will now be opened
read-only without an error message.
Comment 30 frank.loehmann 2006-01-23 15:08:31 UTC
FL: Please note that the spec linked above is obsolete. The document will just open
read-only, as described by ABI.
Comment 31 rseuhs 2006-01-30 21:20:22 UTC
Sorry for the rant, but I'm really sad about how this issue was handled. 
 
In http://www.openoffice.org/issues/show_bug.cgi?id=29425 one requests an 
extremely rarely used feature to be set as default. (I have never used or 
needed file locking, I know nobody who has ever needed or used file locking) 
 
The next comment is "fixed" and SAL_ENABLE_FILE_LOCKING is set per default. 
 
I also ran into the NFS-problem and found this and numerous other issues about 
this exact problem. 
 
The really sad part of the story is that despite many, many duplicate 
bugreports, despite many users having this problem and despite the votes for 
this issue, nobody just said "fixed" and unset SAL_ENABLE_FILE_LOCKING, instead 
a feature that is needed by almost nobody (the case that several people work on 
the same file at the same time is extremely rare) and is only needed by experts 
(those rare multiuser environments are usually run by administrators, who will 
not have any problems setting SAL_ENABLE_FILE_LOCKING, it is their job) is 
causing grief and trouble to many users. 
 
NFS is already fragile and problematic as it is. 
 
The line "This is a first step on the way to get rid of 
this variable completely" in 
http://www.openoffice.org/issues/show_bug.cgi?id=29425 actually gave me 
shivers. 
 
Please consider removing SAL_ENABLE_FILE_LOCKING again in the default settings, 
it just is not worth the trouble and in the rare cases in which it is needed, 
can be set by the administrator (which is a necessity in any serious multiuser 
environment) 
 
Thanks for listening 
 
 
Comment 32 andreas.bille 2006-02-02 18:11:11 UTC
verified

re-open issue and reassign to tm@openoffice.org
Comment 33 andreas.bille 2006-02-02 18:11:21 UTC
reassign to tm@openoffice.org
Comment 34 andreas.bille 2006-02-02 18:11:32 UTC
reset resolution to FIXED
Comment 35 stefanhinz 2006-02-06 23:56:36 UTC
Mount an NFS drive using "defaults" as the option. Trying to open a document (no
matter if .odt, .sx*, or any other format) with OpenOffice.org 2.0 will result
in an I/O error message. If you're lucky and the document opens, anyway (didn't
have time to test under which circumstances this happens), you still cannot save
that document, neither with its original name and format, nor with a different
name or format.
The solution to this is simple: Mount the NFS drive with the "nolock" option.
For example, "defaults,nolock" will work. Opening or saving documents works like
a charm. I tried this on a SuSE 10.0 Linux where new NFS drives are mounted with
the "defaults" option only. Took me two weeks to find out, because everything
else was working using "defaults", just OpenOffice.org 2.0 didn't.
Comment 36 thorsten.martens 2006-02-07 12:02:40 UTC
Checked and verified in cws nfslockproblem -> OK !
Comment 37 thorsten.martens 2006-03-03 09:47:14 UTC
closed
Comment 38 daveqb 2006-04-24 23:35:27 UTC
I still have this problem.

I have tried exporting SAL_ENABLE_FILE_LOCKING=0.  I have tried remounting my 
NFS mounts with nolock, all to no avail.

Is there any information I can give to help find the solution to this?

I really like the look of OOo2 but as such have not been able to use it for 
anything useful as ALL my files are stored on NFS shares.

Comment 39 nospam4obr 2006-04-25 07:50:59 UTC
You need to unset SAL_ENABLE_FILE_LOCKING, the implementation does not care
about the value. Just put a '#' in front of the export SAL_ENABLE_FILE_LOCKING
line in the 'soffice' shell script.
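
The difference between "unset" and "set to 0" can be modelled as below. This only mirrors the behaviour described in this comment and is not taken from the OOo source; the function name `file_locking_enabled` is made up for illustration.

```python
import os


def file_locking_enabled(environ=os.environ):
    """Presence check only: the variable's value is ignored."""
    return "SAL_ENABLE_FILE_LOCKING" in environ
```

Under this model, exporting `SAL_ENABLE_FILE_LOCKING=0` still enables locking, because `file_locking_enabled({"SAL_ENABLE_FILE_LOCKING": "0"})` is true; only removing the variable from the environment turns it off.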
Comment 40 jjmckenzie 2007-03-07 04:55:49 UTC
Re-opening issue as this problem still exists with properly configured Linux and
other UNIX systems.  Suggest examining code for Mac OS X which uses a different
type of file lock/unlock and see if it applies to other UNIX based systems.
James McKenzie
Comment 41 Martin Hollmichel 2007-05-07 12:00:13 UTC
adjust target to 2.3 since 2.0.2 has been released in the meantime
Comment 42 Olaf Felka 2007-08-21 12:24:50 UTC
OF: I'm sorry, but due to resource shortage this issue can't be handled in 2.3
time frame. 
Comment 43 Martin Hollmichel 2008-02-29 13:25:50 UTC
what's the status of this popular issue right now ?
Comment 44 Mathias_Bauer 2008-04-07 11:19:30 UTC
As file locking with file system commands will be replaced in OOo 3.0 this issue
will be resolved by implementing issue 85794.

*** This issue has been marked as a duplicate of 85794 ***
Comment 45 Mathias_Bauer 2008-04-07 11:20:11 UTC
Closing.
Interested people should watch Issue 85794.
Comment 46 jkpitts 2011-02-13 14:56:42 UTC
I would like to delete an older version 3.0.9 but I can't because I can't find 
the msi file that should be attached to it. How do I get a new msi file and attach 
it to this version so I can delete it? It says it is looking for 
Openofficeorg30.msi file and can't find it. Or is there another way to do 
this? It takes up a lot of room and I need to clean things up.  Jerry