Using Node Doctor to Troubleshoot Worker Node Issues

On Compute Cloud@Customer, Node Doctor is a script that is included in the latest OKE images.

Note

If Compute Cloud@Customer has worker nodes that were created before Node Doctor was included in OKE images, you can use node cycling to update older worker node images. See Node Cycling an OKE Node Pool.

If a cluster has a worker node that's in a state other than Active or Running, use the Node Doctor utility to troubleshoot the issues.

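Node Doctor scans a worker node and reports the health status of the node. Node Doctor can do the following tasks:

  • Print troubleshooting information that identifies potential problem areas.

  • Create a support bundle with relevant information for Oracle Support.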

Important

Use Node Doctor only on worker nodes. Because Node Doctor is installed on OKE images, Node Doctor is also available on cluster control plane nodes. Don't use Node Doctor on control plane nodes.

Connect to the Worker Node Using SSH

Perform the following steps to connect to the worker node that you want to troubleshoot.

  1. Ensure that you have a private and public SSH key pair.

    You must have the private key that goes with the public key that was added to the node when the node was created.

  2. Get the node username. OKE images have the initial username opc configured.

  3. Get the IP address of the worker node that you need to troubleshoot.

    The IP address is on the Networking tab of the node details page in the Console.

    • If the node has a public IP address, use the public IP address.

    • If the node has only a private IP address, connect to the node through the bastion host.

      If a bastion host isn't available, see Creating a Bastion.

  4. Enter the following command at a shell prompt on your local system (public IP address) or on the bastion host (private IP address):

    ssh -i private_key_file username@ip-address

    • private_key_file. The full path and name of the file that contains the private SSH key that goes with the public key that was added to the node when the node was created.

    • username. The default username for the node. This value is typically opc.

    • ip-address. The node IP address that you got in the previous step.

  5. Ensure that you have execute permissions for the following script. You run this script in the sections that follow.

    ls -l /usr/local/bin/node-doctor.sh
    -rwxr-xr-x 1 user1 user1 6288 Dec  5  2024 /usr/local/bin/node-doctor.sh
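
As a rough sketch of steps 3 and 4, the following shell function assembles the ssh command for either case. The function name build_ssh_command and the sample addresses are hypothetical. The bastion variant assumes that your OpenSSH client supports the -J (jump host) option, which lets you start the connection from your local system instead of logging in to the bastion host first; treat it as an alternative to the procedure above, not a replacement. Options such as -i generally apply to the destination node and not to the jump host, so authentication to the bastion is assumed to be handled by your SSH agent or ~/.ssh/config.

```shell
#!/bin/sh
# Sketch only: prints the ssh command instead of running it.
# Usage: build_ssh_command KEY_FILE USERNAME NODE_IP [BASTION]
#   BASTION (optional) is user@host of the bastion host.
#   build_ssh_command is a hypothetical helper name.
build_ssh_command() {
  key_file=$1
  username=$2
  node_ip=$3
  bastion=${4:-}

  if [ -n "$bastion" ]; then
    # Private IP address: hop through the bastion with -J
    # (assumes OpenSSH 7.3 or later on the local system).
    printf 'ssh -i %s -J %s %s@%s\n' "$key_file" "$bastion" "$username" "$node_ip"
  else
    # Public IP address: connect directly.
    printf 'ssh -i %s %s@%s\n' "$key_file" "$username" "$node_ip"
  fi
}

# Example values are placeholders, not real hosts.
build_ssh_command "$HOME/.ssh/id_rsa" opc 203.0.113.10
build_ssh_command "$HOME/.ssh/id_rsa" opc 10.0.0.12 opc@203.0.113.5
```

Printing the command rather than running it lets you review the key file, username, and address before you connect.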

Print Troubleshooting Information

While logged in to the worker node as described in Connect to the Worker Node Using SSH, enter the following command to print information that identifies potential problem areas:

$ sudo /usr/local/bin/node-doctor.sh --check

Use the following command to see more options:

$ sudo /usr/local/bin/node-doctor.sh --help
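
If you want to compare results over time or share them, it can help to keep a copy of the check output. The following is a minimal sketch and is not part of Node Doctor: the function name run_check and the NODE_DOCTOR, SUDO, and LOG_DIR variables are assumptions introduced for illustration. Only the --check option shown above is used.

```shell
#!/bin/sh
# Sketch: run the Node Doctor check and keep a timestamped copy of the output.
NODE_DOCTOR=${NODE_DOCTOR:-/usr/local/bin/node-doctor.sh}
SUDO=${SUDO-sudo}            # set SUDO="" if you are already root
LOG_DIR=${LOG_DIR:-$HOME}    # assumed location for the saved output

run_check() {
  if [ ! -x "$NODE_DOCTOR" ]; then
    echo "Node Doctor not found or not executable: $NODE_DOCTOR" >&2
    return 1
  fi
  log_file="$LOG_DIR/node-doctor-check-$(date +%Y%m%dT%H%M%S).log"
  # Merge stderr into stdout; tee displays the output and writes the log.
  $SUDO "$NODE_DOCTOR" --check 2>&1 | tee "$log_file"
  echo "Check output saved to $log_file" >&2
}
```

On the worker node, define the function in your shell session and then call run_check.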

Create a Support Bundle

If you can't resolve the issue, use the following command to create a support bundle with relevant information for Oracle Support:

$ sudo /usr/local/bin/node-doctor.sh --generate

The support bundle is in the /tmp directory as oke-support-bundle-dateTtime.tar.
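
Because the bundle name includes the date and time, several bundles can accumulate in /tmp. As a small sketch, the following function prints the most recently modified bundle so that you can list or copy the right one. The function name latest_bundle is hypothetical, and the file pattern is assumed from the naming shown above.

```shell
#!/bin/sh
# Sketch: print the newest support bundle in a directory (default /tmp).
latest_bundle() {
  dir=${1:-/tmp}
  # ls -t sorts by modification time, newest first. Errors are hidden
  # so that nothing is printed when no file matches the pattern.
  ls -t "$dir"/oke-support-bundle-*.tar 2>/dev/null | head -n 1
}

bundle=$(latest_bundle /tmp)
if [ -n "$bundle" ]; then
  ls -lh "$bundle"
else
  echo "No support bundle found in /tmp" >&2
fi
```

From your local system, you could then copy the file with scp, using the same private key, username, and IP address that you used for ssh.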

Note

Monitor the /tmp directory to ensure that it doesn't fill up. For example, remove old files by using the rm command.
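
One cautious way to do this is to list old bundles first and delete them only after reviewing the list. The sketch below assumes GNU find (as on Oracle Linux), the bundle naming shown above, and a seven-day threshold chosen purely for illustration. The name list_old_bundles is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: check /tmp usage and list support bundles older than N days.
list_old_bundles() {
  dir=${1:-/tmp}
  days=${2:-7}
  # -mtime +N matches files last modified more than N days ago
  # (find rounds the file age down to whole days before comparing).
  find "$dir" -maxdepth 1 -type f -name 'oke-support-bundle-*.tar' -mtime +"$days"
}

df -h /tmp                 # how full is the file system that holds /tmp?
list_old_bundles /tmp 7    # review this list before deleting anything
```

After you review the list, remove the files that you no longer need with rm.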

See the following resources for information about submitting a Support Request and uploading a bundle: