WebLogic Server Doesn't Restart After Failure
After a node failure, WebLogic Server fails to start.
The WebLogic service log contains an error message similar to the following:
java.io.IOException: Error from fcntl() for file locking, Resource temporarily unavailable, errno=11
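To check whether a log contains this error, a short scan can help. The following is a minimal sketch, not an official tool: the function name find_lock_errors is hypothetical, and the pattern assumes the message keeps the wording shown above.

```python
import re

# Pattern for the fcntl() locking message shown above.
# Assumption: the reason text and errno keep this layout in the log.
LOCK_ERROR = re.compile(
    r"Error from fcntl\(\) for file locking, (?P<reason>.*?), errno=(?P<errno>\d+)"
)


def find_lock_errors(lines):
    """Return (line_number, reason, errno) for each matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = LOCK_ERROR.search(line)
        if match:
            hits.append((number, match.group("reason"), int(match.group("errno"))))
    return hits
```

The function accepts any iterable of lines, so an open log file can be passed to it directly. The location of the log file depends on the installation and is not assumed here.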
Cause 1: NFSv3 servers don't include a lock lease feature, so lock states aren't stored and locks can't be released after the node failure.
Solution 1: Request removal of file locks. For more information, see Removing File Locks from a Host that is No Longer Available.
Cause 2: Sometimes, the rpc-statd service, which is needed for NFSv3 locking, is in an unhealthy state after the server failure. To verify this, run a sample lock test using the Python fcntl module. For example:
$ python3
>>> import fcntl
>>> f = open('/fss/path/testfile.txt', 'r')  # Open an existing file in read mode (do not use 'w')
>>> fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # Raises a "no lock available" error if rpc-statd is unhealthy
>>> exit()
Solution 2: Restart the rpc-statd service.
- Open a terminal window on the instance and use the following commands as the root user:
  $ sudo systemctl status rpc-statd
  $ sudo systemctl stop rpc-statd
  $ sudo systemctl start rpc-statd
  $ sudo systemctl status rpc-statd
- Verify that the fcntl sample lock test completes without error.
- Start the WebLogic server.
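The fcntl lock test used for verification can also be run non-interactively. The following is a minimal sketch that makes the same flock call as the interactive example. The function name lock_test is hypothetical, and the path you pass must point to an existing file on the mounted file system.

```python
import fcntl


def lock_test(path):
    """Try to take, then release, an exclusive non-blocking lock on a file.

    Returns (True, None) if the lock was acquired, or (False, OSError)
    if the lock call failed. A missing file raises FileNotFoundError.
    """
    # Open read-only, as in the interactive test, so the file is never truncated.
    with open(path, "r") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # For example, a "no lock available" error while rpc-statd is unhealthy.
            return False, exc
        # Release the lock explicitly before the file is closed.
        fcntl.flock(f, fcntl.LOCK_UN)
        return True, None
```

For example, lock_test('/fss/path/testfile.txt') should return (True, None) once rpc-statd is healthy again. If it returns a failure, the errno and strerror attributes of the returned exception describe the cause.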
Cause 3: NFSv3 doesn't track lock owners, so NFS holds a lock indefinitely if the lock owner fails. After a node failure, a WebLogic restart attempt can't acquire the lock.
Solution 3: This is a general NFSv3 limitation. Immediate mitigation and long-term design considerations are provided in WebLogic's documentation. For more information, see Verifying Server Restart Behavior.