Troubleshooting Load Balancer Backend Server Issues

Learn about backend server issues associated with load balancers.

Debugging a Backend Server Timeout

When a backend server takes too long to respond to a request, the load balancer returns a 504 error, which indicates that the backend server is either down or not responding to the request that the load balancer forwarded. The client application receives the following response code: HTTP/1.1 504 Gateway Timeout.

This error can occur for the following reasons:

  • The load balancer failed to establish a connection to the backend server before the connection timeout expired.

  • The load balancer established a connection to the backend server but the backend did not respond before the idle timeout period elapsed.

  • The security lists or network security groups for the subnet or the VNIC did not allow traffic between the load balancer and the backends.

  • The backend server or application server failed.

Follow these steps to troubleshoot the backend server timeout errors:

  1. Use the curl utility to directly test the backend server from a host in the same network.

    curl -i http://backend_ip_address

    If this test takes longer than one second to respond, an application-level issue is likely causing the latency. Oracle recommends that you check any upstream dependencies that might cause latency, including:

    • Network attached storage such as iSCSI or NFS

    • Database latency

    • An off-premises API

    • An application tier

  2. Check the application by accessing it directly from the backend server. Check its access logs to determine if the application can be accessed and is functioning properly.

  3. If the load balancer and the backend server are in different subnets, then check whether the security lists contain rules to allow traffic. If no rules exist, then traffic is not allowed.

  4. Enter the following commands to determine whether firewall rules exist on the backend servers that block traffic:

    sudo iptables -L
    Lists the firewall rules in the iptables filter table.

    sudo firewall-cmd --list-all
    Lists the firewalld configuration, including allowed services and ports, for the default zone.

  5. Enable logging on the load balancer to determine whether the load balancer or the backend server is causing the latency.
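
The one-second check in step 1 can be scripted with curl's --write-out timing variables. The following is a minimal sketch, not an Oracle-provided tool: the function names time_backend and is_slow are hypothetical, the 30-second --max-time limit is an arbitrary safety value, and the one-second default threshold is taken from step 1.

```shell
# time_backend URL
# Prints the connect time, time to first byte, and total time (in seconds)
# for a single request. Assumes curl is installed on the test host.
time_backend() {
  curl -s -o /dev/null --max-time 30 \
    -w 'connect=%{time_connect} first_byte=%{time_starttransfer} total=%{time_total}\n' \
    "$1"
}

# is_slow SECONDS [THRESHOLD]
# Succeeds (exit status 0) when SECONDS is greater than THRESHOLD (default: 1).
is_slow() {
  awk -v t="$1" -v limit="${2:-1}" 'BEGIN { exit !(t + 0 > limit + 0) }'
}
```

For example, running time_backend http://backend_ip_address/ prints the three timings, and passing the total value to is_slow reports whether the response took longer than one second. A large gap between the connect and first_byte values suggests that the delay is in the application or its dependencies rather than in the network path.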

Testing TCP and HTTP Backend Servers

This topic describes how to troubleshoot a load balancer connection. The topology used in this procedure has a public load balancer in a public subnet, with the backends in the same subnet.

Oracle recommends that you use the Oracle Cloud Infrastructure Logging service to troubleshoot issues. (See Details for Load Balancer Logs.)

In addition to the Oracle Cloud Infrastructure Logging service, you can use the utilities listed in this section to troubleshoot the traffic that the load balancer processes and sends to a backend. To perform these tests, Oracle recommends that you create an instance in the same network as your load balancer and allow its traffic in the same network security groups and security lists. Use the following tools to troubleshoot:

  • ping

    Before using the more advanced utilities listed here, Oracle recommends that you perform a basic ping test. For this test to succeed, you must allow ICMP traffic between the test instance and the backend.
    $ ping backend_ip_address
    The response should look similar to:
    PING 192.0.2.2 (192.0.2.2) 56(84) bytes of data.
    64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.028 ms
    64 bytes from 192.0.2.2: icmp_seq=2 ttl=64 time=0.044 ms

    If you receive a message that contains "64 bytes from...", then the ping succeeded.

    Receiving a message that contains "Destination Host Unreachable" indicates that no host could be reached at that address, either because the system does not exist or because no route to it is available.

    Receiving no reply typically indicates that the system exists but ICMP traffic is not allowed. Check all firewalls, security lists, and network security groups to ensure that ICMP is allowed.
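
The three ping outcomes described above can be told apart in a script. The following is a minimal sketch under stated assumptions: the function name classify_ping and its three output words are hypothetical, and the patterns match the Linux-style messages shown in this section.

```shell
# classify_ping reads ping output on stdin and prints one of:
#   reachable    at least one "bytes from" reply was received
#   unreachable  a "Destination Host Unreachable" message was received
#   no-reply     neither message appeared (ICMP is probably being blocked)
classify_ping() {
  output=$(cat)
  case $output in
    *"bytes from"*)                   echo reachable ;;
    *"Destination Host Unreachable"*) echo unreachable ;;
    *)                                echo no-reply ;;
  esac
}
```

A typical invocation on a Linux test instance might be ping -c 3 -W 2 backend_ip_address 2>&1 | classify_ping, where -c limits the number of probes and -W sets the reply timeout in seconds.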

  • curl

    Use the curl utility to send HTTP requests to a specific host, port, or URL.

    • The following example uses curl to connect to a backend that returns a 403 Forbidden error:

      $ curl -I http://backend_ip_address/health
      HTTP/1.1 403 Forbidden
      Date: Wed, 17 Mar 2021 17:47:10 GMT
      Content-Type: text/html; charset=UTF-8
      Content-Length: 3539
      Connection: keep-alive
      Last-Modified: Wed, 10 Mar 2021 20:33:28 GMT
      ETag: "dd3-5b3c6975e7600"
      Accept-Ranges: bytes

      In the preceding example, the health check fails with a 403 error, which indicates that the local file permissions for the health check page are not configured properly on the backend.

    • The following example uses curl to connect to a backend that returns a 404 Not Found error:

      $ curl -I http://backend_ip_address/health
      HTTP/1.1 404 Not Found
      Date: Wed, 17 Mar 2021 17:47:10 GMT
      Content-Type: text/html; charset=UTF-8
      Content-Length: 3539
      Connection: keep-alive
      Last-Modified: Wed, 10 Mar 2021 20:33:28 GMT
      ETag: "dd3-5b3c6975e7600"
      Accept-Ranges: bytes

      In the preceding example, the health check fails with a 404 error, which indicates that the health check page does not exist in the expected location.

    • The following example shows a backend that exists but refuses the connection, either because a network security group, the security lists, or a local firewall is blocking the traffic, or because no application is listening on the port:

      $ curl -I backend_ip_address
      curl: (7) Failed connect to backend_ip_address:port; Connection refused

    • The following example shows a backend that does not exist or cannot be reached:

      $ curl -I backend_ip_address
      curl: (7) Failed connect to backend_ip_address:port; No route to host

  • Netcat

    Netcat is a networking utility for reading from and writing to network connections using TCP or UDP.

    • The following example uses the netcat utility at the TCP level to verify that the destination backend server can receive a connection:

      $ nc -vz backend_ip_address port
      Ncat: Connected to backend_ip_address:port.

      In the preceding example, port is open for connections.

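    • The following example shows the same test failing because the connection attempt times out:

      $ nc -vz backend_ip_address port
      Ncat: Connection timed out.

      In the preceding example, port is closed or traffic to it is being blocked.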
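
When a backend exposes several ports, the same TCP-level test can be repeated in a loop. The following is a minimal sketch: the function name probe_ports and its output wording are hypothetical, and the five-second -w connect timeout is an arbitrary choice. It relies only on the exit status of nc, which is zero when the connection succeeds.

```shell
# probe_ports HOST PORT [PORT...]
# Tries a TCP connection to each port and prints one result line per port.
# -z closes the connection without sending data; -w 5 limits the wait to 5 seconds.
probe_ports() {
  host=$1
  shift
  for port in "$@"; do
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
      echo "$port open"
    else
      echo "$port not reachable"
    fi
  done
}
```

For example, probe_ports backend_ip_address 80 443 prints one line for each of the two ports. A port reported as not reachable can be either closed or blocked, so follow up with the firewall, security list, and network security group checks.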

  • Tcpdump

    Use the tcpdump utility to capture all traffic to a backend to determine which traffic is coming from a load balancer and what is being returned to the load balancer.

    sudo tcpdump -i any -A port port and src load_balancer_ip_address
    11:25:54.799014 IP 192.0.2.224.39224 > 192.0.2.224.80: Flags [P.], seq 1458768667:1458770008, ack 2440130792, win 704, options [nop,nop,TS val 461552632 ecr 208900561], length 1341: HTTP: POST /health HTTP/1.1

  • OpenSSL

    When troubleshooting SSL issues between the load balancer instance and the backend servers, Oracle recommends using the openssl utility. This utility opens an SSL connection to a specific host name and port, and prints the SSL certificate and other parameters.

    Options that are useful for troubleshooting include:
    • -showcerts

      This option prints all certificates in the certificate chain presented by the backend server. Use this option to identify issues, such as a missing intermediate certificate authority certificate.

    • -cipher cipher_name

      This option forces the client and server to use a specific cipher suite, which helps determine whether the backend allows that cipher.
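
The options above belong to the s_client subcommand of openssl, which is the subcommand that opens a connection and prints the certificate details. The following is a minimal sketch of how a check might be wrapped in a script: the function names check_backend_tls and verify_result are hypothetical, and the host and port arguments are placeholders that you supply.

```shell
# check_backend_tls HOST PORT
# Opens a TLS connection and prints the full certificate chain and handshake details.
# Redirecting stdin from /dev/null makes s_client exit after the handshake.
check_backend_tls() {
  openssl s_client -connect "$1:$2" -showcerts </dev/null 2>&1
}

# verify_result reads s_client output on stdin and prints the text that follows
# the first "Verify return code:" label, for example "0 (ok)".
verify_result() {
  sed -n 's/^ *Verify return code: *//p' | head -n 1
}
```

For example, check_backend_tls backend_ip_address 443 | verify_result prints the verification result on its own. A result other than 0 (ok), such as 21 (unable to verify the first certificate), is often a sign that the backend is not presenting a complete certificate chain.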

  • Netstat

    Use the netstat -natp command to verify that the application running on the backend server is up and running. For TCP or HTTP traffic, the backend application must be listening on the expected IP address and port. If the application port on the backend server is not in the LISTEN state, then the application is not accepting connections on that port.

    To resolve this issue, ensure that the application is up and running by restarting either the application or the backend server.
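
The listen-state check can also be scripted. The following is a minimal sketch that assumes the Linux netstat column layout, in which the local address is the fourth column and the state is the sixth; the function name is_listening is hypothetical.

```shell
# is_listening PORT
# Reads `netstat -nat` (or `netstat -natp`) output on stdin and succeeds if a
# socket in the LISTEN state is bound to PORT on any local address.
is_listening() {
  awk -v port="$1" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

For example, sudo netstat -natp | is_listening 80 && echo "port 80 is listening" prints the message only when an application is accepting connections on port 80.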