Troubleshoot Update Failures
Update operations can fail for various reasons. Typically, an operation fails because a database node is down, there is insufficient space on the file system, or the database host cannot access the object store.
This article includes information to help you determine the cause of the failure and fix the problem. The information is organized into several sections, based on the error condition.
If you already know the cause, you can skip to the topic with the suggested solution. Otherwise, use the Identify the Cause of Failure topic to get started.
The following topics are covered in this article:
- Identify the Cause of Failure
- Database Service Agent Issues
- Object Store Connectivity Issues
- Host Issues
- Oracle Clusterware Issues
- Database Issues
- Get Additional Help
Tip:
You can also create serial console connections to troubleshoot your system in single-user mode. For information on creating a serial console connection in the OCI Console, see Manage Serial Console Connection to the DB System.
Identify the Cause of Failure
In the OCI Console, you can identify a failed update operation by viewing the update history of a DB system or an individual database. An update that was not successfully applied displays a status of Failed and includes a brief description of the error that caused the failure. If the error message does not contain enough information to point you to a solution, you can use the database CLI and log files to gather more data. Then, refer to the applicable section in this article for a solution.
The following topics are covered:
Identify the Root Cause of the Update Operation Failure
1. Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.
2. Determine the sequence of operations performed on the database:
   dbcli list-jobs
   Note the last job ID listed with a status other than Success.
3. With the job ID you noted from the previous step, use the following command to check the details of that job:
   dbcli describe-job -i <job_ID> -j
   Typically, running this command is enough to reveal the root cause of the failure.
4. If you require more information, review the /opt/oracle/dcs/log/dcs-agent.log file. You can find the job ID in this file by using the timestamp returned by the job report in step 2.
5. If the update failure is on a 2-node RAC database, perform steps 3 and 4 on both nodes.
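The job-selection part of steps 2 and 3 can be scripted. The following is a minimal sketch, not an Oracle-provided tool: the function name last_unsuccessful_job is hypothetical, and it assumes that dbcli list-jobs prints a header row, a dashed separator row, and then one row per job (oldest first) with the job ID in the first column and the status in the last column.

```shell
#!/bin/sh
# Hypothetical helper: read `dbcli list-jobs` output on stdin and print the
# ID of the last job whose status is not "Success".
# Assumes: header row, dashed separator row, then one job per row with the
# ID as the first field and the status as the last field.
last_unsuccessful_job() {
  awk '
    # Everything up to the dashed separator line is header.
    /^-+[- ]*$/ { in_body = 1; next }
    !in_body || NF == 0 { next }
    # The Created column contains spaces, so use the first and last fields
    # instead of fixed column offsets.
    $NF != "Success" { last = $1 }
    END { if (last == "") exit 1; print last }
  '
}
```

For example, as root: dbcli describe-job -i "$(dbcli list-jobs | last_unsuccessful_job)" -j. The function exits with a nonzero status when every listed job succeeded, so the describe-job call can be skipped in that case.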
Database Service Agent Issues
Your database makes use of an agent framework to allow you to manage it through the Oracle Cloud platform.
The following topics are covered:
Resolve Update Failures Caused by a Stopped Agent
Occasionally, to resolve an update failure, you might need to restart the dcsagent program if it has the status stop/waiting.
Restart the Database Service Agent
1. From a command prompt, check the status of the agent:
   initctl status initdcsagent
2. If the agent is in the stop/waiting state, try to restart the agent:
   initctl start initdcsagent
3. Check the status of the agent again to confirm that it has the start/running status:
   initctl status initdcsagent
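The check, restart, and re-check steps for the agent can be wrapped in one function. This is a sketch under assumptions: ensure_running is a hypothetical name, and it relies on the usual Upstart formats printed by initctl status ("<job> start/running, process <pid>" and "<job> stop/waiting"). Run it as root.

```shell
#!/bin/sh
# Hypothetical helper: make sure an Upstart job (for example initdcsagent)
# reaches start/running. Assumes standard Upstart `initctl status` output.
ensure_running() {
  local job status
  job="$1"
  status="$(initctl status "$job")" || return 1
  case "$status" in
    *start/running*) return 0 ;;          # already running, nothing to do
    *stop/waiting*) ;;                    # stopped: fall through to restart
    *) echo "unexpected status: $status" >&2; return 1 ;;
  esac
  initctl start "$job" || return 1
  # Re-check the status instead of trusting the exit code of `start` alone.
  status="$(initctl status "$job")" || return 1
  case "$status" in
    *start/running*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: ensure_running initdcsagent. Re-reading the status after the restart is the scripted form of the final confirmation step.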
Resolve Update Failures Caused by an Agent That Needs to Be Updated
An update operation can also fail if your agent needs to be updated. In this case, the system returns the following error message:
Current DcsAgent version is less than or equal to minimum required version.
To resolve this issue, perform the steps in the following section.
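Before you work through the steps, you might want to confirm that this message is really the cause. The following sketch searches a file for the exact message text. The helper name agent_update_required is hypothetical, and it assumes that the message is written to /opt/oracle/dcs/log/dcs-agent.log; if the message appears only in the dbcli describe-job output, save that output to a file and pass the file instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given file (default: the DCS agent log
# path named in this article) contains the agent-version error message.
agent_update_required() {
  local log
  log="${1:-/opt/oracle/dcs/log/dcs-agent.log}"
  # -F: match the message literally; -q: report only through the exit status.
  grep -qF 'Current DcsAgent version is less than or equal to minimum required version' "$log"
}
```

For example: agent_update_required && echo "agent update required".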
Contact Oracle Support to Update the OCI Database Service Agent
1. Confirm that the agent (dcsagent) and the DCS Admin program (dcsadmin) are running by using the following commands:
   initctl status initdcsagent
   initctl status initdcsadmin
2. If these programs are not running, use the following commands to restart them:
   initctl start initdcsagent
   initctl start initdcsadmin
3. Follow the instructions in Get Additional Help to collect your DCS agent log files.
4. Contact Oracle Support for assistance with updating the agent.
Object Store Connectivity Issues
The DB system and database updates are stored in OCI Object Storage. Therefore, successful update operations require connectivity between the DB system host and the Object Storage location from which the updates are downloaded.
The following topics are covered:
Ensure Your Database Host Can Connect to OCI Object Storage
1. Use the following command to verify that the host can access OCI Object Storage:
   dbcli describe-latestpatch
   Example output indicating success:
   componentType   availableVersion
   --------------- --------------------
   gi              12.2.0.1.180417
   gi              12.1.0.2.180417
   db              11.2.0.4.180417
   db              12.2.0.1.180417
   db              12.1.0.2.180417
   oak             12.1.2.11.3
   oak             12.2.1.1.0
   Example output indicating failure:
   DCS-10032:Resource patch metadata is not found.Failed to download patchmetadata from objectstore
2. If the host cannot connect to Object Storage, see Back Up a Database Using the Console for information about configuring Object Storage connectivity.
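The connectivity check can be made scriptable by classifying the command output. In this sketch, object_store_reachable is a hypothetical name. It treats the DCS-10032 error as a failure and otherwise expects a header row, a dashed separator row, and at least one componentType row; that layout is an assumption based on the example output.

```shell
#!/bin/sh
# Hypothetical helper: read `dbcli describe-latestpatch` output on stdin.
# Exit 0 if patch metadata rows were listed, 1 if DCS-10032 appeared or no
# rows followed the dashed separator line.
object_store_reachable() {
  awk '
    /DCS-10032/ { failed = 1 }
    /^-+[- ]*$/ { in_body = 1; next }
    in_body && NF >= 2 { rows++ }
    END { if (failed || rows == 0) exit 1; exit 0 }
  '
}
```

For example: dbcli describe-latestpatch 2>&1 | object_store_reachable || echo "cannot reach Object Storage". The 2>&1 redirection is there in case the error is printed on standard error, which is an assumption.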
Host Issues
One or more of the following conditions on the database host can cause update operations to fail.
The following topics are covered:
Database Node Not Running During the Update Operation
All nodes of the database must be active and running while an update operation is in progress, whether you are updating the DB system or the database home. Use the OCI Console to check that the status of each node is AVAILABLE, and start the node, if needed.
The File System Is Full
Update operations require a minimum of 15 GB of free space in the /u01 directory on the host file system. Use the df -h command on the host to check the available space. If the file system has insufficient space, you can remove old log or trace files to free up space.
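A scripted version of the free-space check is sketched below. The helper name has_free_space is hypothetical; it uses df -P -k rather than df -h so that the Available column is always the fourth field in KiB, and it treats 15 GB as 15 × 1024 × 1024 KiB, which is an assumption about the unit.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file system holding PATH (default /u01)
# has at least MIN_GB (default 15) gigabytes available.
# Exit status: 0 enough space, 1 not enough, 2 df produced no data line.
has_free_space() {
  local path min_gb
  path="${1:-/u01}"
  min_gb="${2:-15}"
  # POSIX output: one header line, then one data line; field 4 is Available.
  df -P -k "$path" | awk -v min="$min_gb" '
    NR == 2 { avail_kb = $4 }
    END {
      if (avail_kb == "") exit 2
      if (avail_kb >= min * 1024 * 1024) exit 0
      exit 1
    }'
}
```

For example: has_free_space /u01 15 || echo "less than 15 GB available in /u01".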
Oracle Clusterware Issues
The following topics are covered:
The Oracle Clusterware Is Not Running
Oracle Clusterware enables servers to communicate with each other so that they can function as a collective unit. The cluster software program must be up and running on the DB system for update operations to complete. Occasionally, you might need to restart Oracle Clusterware to resolve an update failure.
Restart the Oracle Clusterware
1. From a command prompt, check the status of Oracle Clusterware:
   crsctl check crs
   Example output when Oracle Clusterware is online:
   CRS-4638: Oracle High Availability Services is online
   CRS-4537: Cluster Ready Services is online
   CRS-4529: Cluster Synchronization Services is online
   CRS-4533: Event Manager is online
   For more detailed status information, you can run crsctl stat res -t.
2. If Oracle Clusterware is not online, try to restart the program:
   crsctl start crs
3. Check the status of Oracle Clusterware to confirm that it is online:
   crsctl check crs
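To use the status check in a script, you can test whether every CRS message reports "is online", as in the healthy output for crsctl check crs. This is a sketch; crs_all_online is a hypothetical name, and it assumes each status message starts with a CRS-<number>: prefix.

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl check crs` output on stdin. Succeed only
# if at least one CRS- message was seen and every one ends in "is online".
crs_all_online() {
  awk '
    /^CRS-[0-9]+:/ {
      seen++
      if ($0 !~ /is online *$/) bad++
    }
    END { if (seen == 0 || bad > 0) exit 1; exit 0 }
  '
}
```

For example, as root: crsctl check crs | crs_all_online || echo "Oracle Clusterware is not fully online". Requiring at least one CRS- line guards against treating empty output as healthy.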
The Oracle Grid Infrastructure (GI) is Not Updated
This problem occurs when you try to update a database before you update the DB system of that database. The error description indicates that the Oracle Grid Infrastructure must be updated first. To resolve this issue, update the DB system to the latest available version. After you update the DB system, you can retry the database update operation.
To get the current and latest-available GI versions for the DB system, use the following command:
dbcli describe-component
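When you compare the installed and available versions by hand, note that plain string comparison can misorder dotted versions: as strings, 12.1.2.11.3 sorts before 12.1.2.9.0. The following sketch uses GNU sort -V (version sort) instead; version_lt is a hypothetical helper name, and it does not parse the describe-component output itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is strictly older than $2.
# Relies on GNU coreutils `sort -V`, which compares numeric components.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

For example, if the installed GI version is 12.2.0.1.180116 and the available version is 12.2.0.1.180417, version_lt 12.2.0.1.180116 12.2.0.1.180417 succeeds, which indicates that the DB system can still be updated.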
Database Issues
An improper database state can lead to update failures.
The following topics are covered:
Database Not Running During the Update Operation
The database must be active and running for all of the update tasks to complete. If it was not running during the update, you must run the datapatch task manually.
Check That the Database Is Active and Running
Use the following command to check the state of your database, and ensure that any problems that might have put the database in an improper state are resolved:
srvctl status database -d <db_unique_name> -verbose
The system returns a message including the database instance status. The instance status must be Open for the update operation to succeed.
If the database is not running, use the following command to start it:
srvctl start database -d <db_unique_name> -o open
If the database is mounted but does not have the Open status, use the following commands to access the SQL*Plus command prompt and set the status to Open:
sqlplus / as sysdba
alter database open;
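To check the instance status from a script, you can parse the srvctl output. This is a sketch under format assumptions: all_instances_open is a hypothetical name, and it expects -verbose lines such as "Instance <name> is running on node <node>. Instance status: Open." It also strips a ",HOME=..." suffix after the status, which some releases might print; treat that as an assumption to verify against your own output.

```shell
#!/bin/sh
# Hypothetical helper: read `srvctl status database -d <db_unique_name>
# -verbose` output on stdin. Succeed only if at least one instance status was
# reported, every status is Open, and no instance is reported as not running.
all_instances_open() {
  awk '
    /is not running/ { bad++ }
    /Instance status:/ {
      seen++
      s = $0
      sub(/.*Instance status: */, "", s)  # keep only the status text
      sub(/,HOME=.*/, "", s)              # assumed optional suffix
      sub(/\. *$/, "", s)                 # drop the trailing period
      if (s != "Open") bad++
    }
    END { if (bad > 0 || seen == 0) exit 1; exit 0 }
  '
}
```

For example, as the oracle user: srvctl status database -d <db_unique_name> -verbose | all_instances_open || echo "not every instance is Open".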
Run the datapatch Task
Before you run the datapatch command, ensure that all pluggable databases (PDBs) are open. To open a PDB, you can use SQL*Plus to execute ALTER PLUGGABLE DATABASE <pdb_name> OPEN READ WRITE; against the PDB. Then run datapatch:
$ORACLE_HOME/OPatch/datapatch
Run the datapatch command for each database home.
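If the database has several PDBs, you can generate the ALTER PLUGGABLE DATABASE statement for each one instead of typing them. In this sketch, pdb_open_sql is a hypothetical helper that reads PDB names, one per line, and prints one statement per name. You supply the names yourself, for example from the output of SHOW PDBS in SQL*Plus.

```shell
#!/bin/sh
# Hypothetical helper: turn PDB names (one per line on stdin) into
# ALTER PLUGGABLE DATABASE ... OPEN READ WRITE; statements.
pdb_open_sql() {
  local name
  # `read` without IFS= trims leading and trailing whitespace; the extra
  # test handles a final line that lacks a newline.
  while read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue            # skip blank lines
    printf 'ALTER PLUGGABLE DATABASE %s OPEN READ WRITE;\n' "$name"
  done
}
```

For example, with placeholder names PDB1 and PDB2: printf '%s\n' PDB1 PDB2 | pdb_open_sql | sqlplus -s / as sysdba, followed by $ORACLE_HOME/OPatch/datapatch. A PDB that is already open may report an error for its statement.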
Get Additional Help
If you were unable to resolve the problem using the information in this article, follow the procedures below to collect relevant database and diagnostic information. After you have collected this information, contact Oracle Support.
The following topics are covered:
Collect Diagnostic Information Regarding Failed Jobs
1. Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.
2. Run the following two commands to generate information about the failed job:
   dbcli list-jobs | grep -i <dbname>
   dbcli describe-job -i <job_ID> -j
   The <job_ID> in the second command should be the ID of the latest failed job reported by the first command.
3. Run the diagnostics collector script to create a zip file with the diagnostic information for Oracle Support Services:
   diagcollector.py
   This command creates a file named diagLogs-<timestamp>.zip in the /tmp directory.
Collect DCS Agent Log Files
To collect DCS agent log files, do the following:
1. Log in as the opc user.
2. Run the following command:
   sudo /opt/oracle/dcs/bin/diagcollector.py
3. The system returns a message indicating that the agent logs are available in a zip file in a specified directory. For example:
   Log files collected to :/tmp/dcsdiag/diagLogs-1234567890.zip
   Logs are being collected to: /tmp/dcsdiag/diagLogs-1234567890.zip
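If you no longer have the message that names the zip file, you can locate the newest one by modification time. In this sketch, latest_diag_zip is a hypothetical name, and its default directory /tmp/dcsdiag is taken from the example message; the Collect Diagnostic Information Regarding Failed Jobs procedure mentions /tmp, so pass that directory if your file is there.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified diagLogs-*.zip in a
# directory (default /tmp/dcsdiag); exit 1 if there is none.
latest_diag_zip() {
  local dir newest
  dir="${1:-/tmp/dcsdiag}"
  # ls -t sorts newest first; errors from an unmatched glob are discarded.
  newest="$(ls -1t "$dir"/diagLogs-*.zip 2>/dev/null | head -n 1)"
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}
```

For example: latest_diag_zip, or latest_diag_zip /tmp.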
Collect Oracle Grid Infrastructure and Database Log Files
If an Oracle Grid Infrastructure or Oracle Database update failed, you can find log files for these failures in the following locations:
- Oracle Grid Infrastructure: $GI_HOME/cfgtoollogs/
- Oracle Database: $ORACLE_HOME/cfgtoollogs/
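Before you contact Oracle Support, it can help to package both log directories into one archive. This is a sketch: bundle_cfgtoollogs is a hypothetical name, and the usage below assumes that GI_HOME and ORACLE_HOME are set in your shell. Missing directories are skipped with a warning instead of aborting the archive.

```shell
#!/bin/sh
# Hypothetical helper: bundle_cfgtoollogs OUTPUT.tgz DIR...
# Creates a gzip-compressed tar archive of every DIR that exists.
bundle_cfgtoollogs() {
  local out n i d
  out="$1"; shift
  n=$#
  i=0
  # Rotate through the arguments, re-appending only existing directories.
  while [ "$i" -lt "$n" ]; do
    d="$1"; shift
    if [ -d "$d" ]; then
      set -- "$@" "$d"
    else
      echo "skipping missing directory: $d" >&2
    fi
    i=$((i + 1))
  done
  [ "$#" -gt 0 ] || return 1
  # Absolute paths are stored with the leading "/" removed by tar.
  tar -czf "$out" "$@"
}
```

For example: bundle_cfgtoollogs /tmp/cfgtoollogs.tgz "$GI_HOME/cfgtoollogs" "$ORACLE_HOME/cfgtoollogs". Full paths are kept in the archive because both directories share the name cfgtoollogs.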