Andrew Mobbs (mobbsy) wrote,

Argh. To record this one for posterity.

Take one NFS mount on a RHEL 3 box.

It used to Just Work.

One day, "ls -l" consistently hangs, as do "mv" and "cp". Many other things work, including "ls", "echo hello > foo", and "lsof".

It turns out that the call that is hanging is "getxattr".

Another RHEL 3 box, installed from the same image, doesn't have this problem. "ls -l" on the same mount doesn't bother calling getxattr.

The processes on the problem machine can be recovered with the following procedure:
kill -9 <PID>
(process is still alive, still hung on disk wait)
umount -f <MOUNTPOINT>
(errors claiming fs is busy, but hung processes die)
umount -f <MOUNTPOINT>
(yes, again, but no errors this time)
mount <MOUNTPOINT>
(Ta-da - filesystem reappears, hung processes are dead. However, all commands that call getxattr still exhibit the same problem.)

Attempting a forced unmount without the kill doesn't do anything useful.

Rinse - repeat - get same result time and again - fiddle - write one-line test program for bug report - everything mysteriously starts working again. Even the getxattr test program just returns EOPNOTSUPP rather than hanging.
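For anyone wanting to reproduce the check, here is a rough sketch of what such a getxattr test might look like. This is not the original one-liner, just a modern Python equivalent; the function name probe_getxattr and the attribute name "user.test" are made up for illustration. On an affected mount the call would presumably hang in disk wait exactly as "ls -l" did, so expect to need the kill-and-umount dance above.

```python
import errno
import os
import sys


def probe_getxattr(path, name="user.test"):
    """Make one getxattr call on `path` and describe what happened.

    `name` is an arbitrary, hypothetical attribute name; the point is only
    to see whether the call returns at all, and with which errno.
    """
    if not hasattr(os, "getxattr"):
        # os.getxattr only exists on Linux builds of Python.
        return "unavailable"
    try:
        value = os.getxattr(path, name)
    except OSError as exc:
        # On Linux, ENOTSUP and EOPNOTSUPP are the same number, so either
        # name may be printed for an "operation not supported" result.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok: %d bytes" % len(value)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(probe_getxattr(target))
```

Run it with a path on the suspect mount as the only argument. If it prints an error name and exits, getxattr is returning; if it never prints anything, that's the hang.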

*sob*

[Oh, and for u.c.o.l readers: no, this is a different NFS problem to the one I was talking about there.]