Troubleshooting
This topic covers how to diagnose data flow problems for Autonomous Databases, targets managed by Enterprise Manager Cloud Control, and targets managed by the Management Agent service.
The following topics are covered:
- Troubleshooting Enterprise Manager Data Flow Issues
- Troubleshooting Autonomous Database Target Data Flow Issues
- Troubleshooting Management Agent Service Data Flow Issues
- Troubleshooting Ops Insights Disablement Issues
- SQL Insights Are Collected For Enterprise Manager Databases But No SQL Text Is Displayed
Troubleshooting Enterprise Manager Data Flow Issues
The following steps allow you to interpret alarms related to the DataFlowDelayInHrs metric.
The following sample defines an alarm that finds targets for which some metrics are delayed by more than 48 hours.
DataFlowDelayInHrs[1d].grouping(telemetrySource, resourceId, sourceIdentifier).max() > 48
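To spot-check the same metric from a terminal, the query can also be sent to the Monitoring service with the OCI CLI. The following is a minimal sketch, not a definitive procedure. It assumes the OCI CLI is installed and configured. The function names are hypothetical, and the compartment OCID and metric namespace are placeholders you must supply, because this topic does not state them. The sketch drops the `> 48` comparison because it does not assume the query call accepts a trigger condition, so compare the returned values with your threshold yourself.

```shell
# Print the metric part of the alarm query from this topic (no threshold).
delay_metric_query() {
  printf '%s\n' 'DataFlowDelayInHrs[1d].grouping(telemetrySource, resourceId, sourceIdentifier).max()'
}

# Send the query to the Monitoring service.
# Assumptions: the OCI CLI is installed and configured; $1 is your compartment
# OCID and $2 is the metric namespace (both are placeholders you supply).
query_delay_metric() {
  compartment_ocid="$1"
  metric_namespace="$2"
  oci monitoring metric-data summarize-metrics-data \
    --compartment-id "$compartment_ocid" \
    --namespace "$metric_namespace" \
    --query-text "$(delay_metric_query)"
}

# Example (uncomment and fill in your own values):
# query_delay_metric "$COMPARTMENT_OCID" "$METRIC_NAMESPACE"
```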
The following triage process assumes you have already set up alarms on the DataFlowDelayInHrs metric in your tenancy. See Managing Alarms for instructions on setting up alarms.
- Obtain the EM Bridge OCID to which the target belongs from the value of the alarm sourceIdentifier dimension.
- Navigate to the EM Bridge Administration UI for that bridge and check its status.
- If the status is NEEDS ATTENTION, then refer to the policy shown in the UI and add the appropriate policy so that the EM bridge's status becomes ACTIVE.
- If its status is ACTIVE, then proceed with step three.
- Since the EM Bridge on the OCI side is in ACTIVE status, you now need to navigate to the Enterprise Manager console to troubleshoot this issue further.
- In the Enterprise Manager console, check the OCI Bridge connectivity using the steps outlined in Step 1: Export Enterprise Manager Data to OCI.
Click Test to check the OCI bridge connectivity for your bridge.
- If the OCI bridge test fails, then update the OCI credential accordingly so that Enterprise Manager has access to the provided Object Storage bucket.
- If the OCI bridge test succeeds, then check the data upload status for the Ops Insights service by following the steps outlined in Viewing Data Upload Status for a Service.
- Select the Ops Insights service and select Run Diagnostics from the menu to get the overall report.
- For a group or target with apparent problems (as shown in the UI), select Show Errors to get detailed information about the error.
If the error shows that the target or agent is down, then you can restart the target or agent from the Enterprise Manager console.
Troubleshooting Autonomous Database Target Data Flow Issues
- Obtain the OCID of the Autonomous Database for which the delay is occurring by getting the value of the sourceIdentifier dimension from the alarm.
- Navigate to the Autonomous Database UI console for that database and check its status.
If its status is STOPPED, then start the database.
- Check the Metrics section of the Autonomous Database home page and view the last 7 days of data.
If charts are not displaying any data for the last 2 days, then create a Support Request explaining that you are not seeing data in Metrics charts for that specific Autonomous Database.
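The status check and start step above can also be done from a terminal. The following is a minimal sketch, assuming the OCI CLI is installed and configured with permission to read and start the database. The function names are hypothetical, and the OCID argument is the value you obtained from the alarm's sourceIdentifier dimension.

```shell
# Print the lifecycle state of an Autonomous Database.
# Assumption: the OCI CLI is installed and configured; $1 is the database OCID.
adb_state() {
  oci db autonomous-database get \
    --autonomous-database-id "$1" \
    --query 'data."lifecycle-state"' \
    --raw-output
}

# Start the database only when it is reported as STOPPED.
start_adb_if_stopped() {
  adb_ocid="$1"
  state="$(adb_state "$adb_ocid")"
  echo "Current state: $state"
  if [ "$state" = "STOPPED" ]; then
    oci db autonomous-database start --autonomous-database-id "$adb_ocid"
  fi
}

# Example (uncomment and supply your own OCID):
# start_adb_if_stopped "$ADB_OCID"
```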
Troubleshooting Management Agent Service Data Flow Issues
- Obtain the OCID of the Management Agent for which the delay is occurring by getting the value of the sourceIdentifier dimension from the alarm.
- Next, check the Agent Health. This can be done via the Management Agent page; Ops Insights provides direct links to the Management Agent details page from the Host and Database (via the External Database Connector) administration pages.
- If the agent is in SILENT or NOT AVAILABLE state, check the health of the agent:
- Verify the agent status on Linux.
- Double check the Management Agent installation prerequisites (Steps 4 and 5)
- If the Management Agent is not ACTIVE, try performing an agent bounce (start/stop).
- If this issue cannot be resolved, an agent reinstallation may be required:
- Delete the Management Agent.
- After completing this step, disable Ops Insights and then re-enable it using a newly installed Management Agent.
- If the agent is ACTIVE, there may be an issue with the agent's ability to upload Ops Insights metrics. This can be verified by checking the agent logs:
- On the host where the agent is installed, navigate to the agent log directory (/opt/oracle/mgmt_agent/agent_inst/log/).
- Perform the following search:
grep operationsinsights mgmt_agent_client.log
- If the status code for these calls is 404, confirm the Ops Insights prerequisites are in place (and were not removed after installation).
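The log search above can be narrowed to the calls that returned 404. The following is a minimal sketch with a hypothetical helper name. This topic does not describe the log line format, so the helper simply assumes that the status code appears as a standalone `404` token on the same line as `operationsinsights`. That assumption can produce false positives, so review the matching lines before drawing conclusions.

```shell
# Count agent log lines that mention operationsinsights and also contain 404.
# Assumption: the status code appears as a standalone token on the same line;
# the exact log line format is not described in this topic.
count_opsi_404() {
  log_file="${1:-/opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent_client.log}"
  grep operationsinsights "$log_file" | grep -cw 404
}

# Example (uncomment to run against the default log location):
# count_opsi_404
```

A result of 0 means no matching line was found. In that case the final `grep` also exits with a non-zero status, which matters if you call the helper from a script that runs with `set -e`.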
Troubleshooting Ops Insights Disablement Issues
Disabling Ops Insights sometimes fails when the Management Agent is unavailable. In this situation, do the following:
- Check the Agent Health. This can be done via the Management Agent page; Ops Insights provides direct links to the Management Agent details page from the Fleet administration pages.
- If the Management Agent is in SILENT or NOT AVAILABLE state, check the health of the agent.
- Verify the agent status on Linux.
- If the Management Agent is not ACTIVE, try performing an agent bounce (start/stop).
- If the Management Agent cannot be started:
- Delete the Management Agent resource.
- After completing this step, try to disable Ops Insights again.
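The status check and agent bounce in the steps above can be sketched as follows for a systemd-based Linux host. This is an illustration under assumptions, not a definitive procedure: it assumes the agent is registered as the `mgmt_agent` service, and the `AGENT_SERVICE` variable and function names are hypothetical. Override the variable if your installation uses a different service name.

```shell
# Service name assumed for the Management Agent on systemd-based Linux.
# Assumption: override AGENT_SERVICE if your installation uses another name.
AGENT_SERVICE="${AGENT_SERVICE:-mgmt_agent}"

# Show the current service status.
agent_status() {
  sudo systemctl status "$AGENT_SERVICE"
}

# Bounce the agent: stop it, start it, then show the resulting status.
# Each step runs only if the previous one succeeded.
agent_bounce() {
  sudo systemctl stop "$AGENT_SERVICE" &&
  sudo systemctl start "$AGENT_SERVICE" &&
  agent_status
}

# Example (uncomment to run on the agent host):
# agent_bounce
```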
SQL Insights Are Collected For Enterprise Manager Databases But No SQL Text Is Displayed
If your Enterprise Manager managed databases are collecting SQL Insights but no SQL text is displayed, ensure that the installed Enterprise Manager agents are version 13.5 Release Update 13 (13.5.0.13) or higher. For more information, see MOS note 2864085.1.
Host Resource Shows as Needs Attention
Oracle Cloud Agent (OCA) enabled host resources show as Needs Attention on the host fleet page. The Oracle Cloud Agents on the instances are running, but they are not visible under Observability & Management when you select Agents. This error is caused by insufficient permissions to create directories or files under the /var/lib/oracle-cloud-agent/plugins/ path.
- Stop the oracle-cloud-agent:
sudo systemctl stop oracle-cloud-agent
- Clean up the old management agent plugin with the following commands:
cd /var/log/oracle-cloud-agent/plugins/
rm -rf oci-managementagent/*
cd /var/lib/oracle-cloud-agent/plugins/
rm -rf oci-managementagent/*
- Restart Oracle Cloud Agent:
sudo systemctl stop oracle-cloud-agent
sudo systemctl start oracle-cloud-agent
- Ensure that the /var/lib/oracle-cloud-agent/plugins/ directory has all the permissions needed to create directories and files.
- Create an oci-managementagent directory.
Note: If any permission issues occur on /var/lib/oracle-cloud-agent/plugins/, you need to manually create the oci-managementagent directory in /var/lib/oracle-cloud-agent/plugins/.
- Restart OCA:
sudo systemctl stop oracle-cloud-agent
sudo systemctl start oracle-cloud-agent
- Disable and re-enable the host in Host Fleet: navigate to OPSI Administration and select Host Fleet. Click Disable OPSI for this host and, once it is disabled, select Re-Enable OPSI for this host. The host then shows as Active.
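Before restarting the agent, you can confirm the permission condition described above with a quick check of the plugins directory. The following is a minimal sketch with a hypothetical helper name. It tests writability for the user who runs it, so it assumes you run it as the account the agent uses, which this topic does not name.

```shell
# Report whether the plugins directory exists and is writable by the
# current user, then list its ownership and mode.
# Assumption: run this as the account that the agent uses.
check_plugins_dir() {
  dir="${1:-/var/lib/oracle-cloud-agent/plugins}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ -w "$dir" ]; then
    echo "writable: $dir"
  else
    echo "not writable: $dir"
    return 1
  fi
  ls -ld "$dir"
}

# Example (uncomment to check the default path):
# check_plugins_dir
```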