Support and Incident Management
Incident management is the end-to-end business process of identifying, analyzing, and resolving an outage or service disruption. The goal of incident management is to keep services running or restore them as quickly as possible, while minimizing the impact to the business.
Incident Management Is Important
Service interruption incidents can be extremely costly to your business and its teams. Incidents can disrupt operations, lead to temporary downtime, and contribute to the loss of data and productivity. Incident management provides teams with a reliable method to prioritize incidents, get to resolution faster, and offer better service for users.
Benefits of Incident Management
Some of the benefits of incident management include the following:
- Increased productivity and efficiency.
- Increased visibility and transparency.
- Improved mean time to resolution (MTTR). MTTR is a combination of the average time to detect, diagnose, and mitigate incidents.
- Improved customer and employee experience.
- Prevention of incidents.
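As a rough illustration of the MTTR definition above, the following Python sketch averages each phase across a set of incident records and adds the averages together. It assumes that "combination" means a simple sum of the per-phase averages; the `Incident` record and its field names are hypothetical and not part of any Oracle API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; the field names are illustrative only."""
    started: datetime    # when the disruption began
    detected: datetime   # when monitoring or a user noticed it
    diagnosed: datetime  # when the cause was identified
    mitigated: datetime  # when service was restored


def mttr_breakdown(incidents):
    """Return the average duration of each phase and their sum, in minutes."""
    if not incidents:
        raise ValueError("at least one incident is required")

    def avg_minutes(pairs):
        # Average the elapsed time of each (start, end) pair, in minutes.
        return mean((end - start).total_seconds() / 60 for start, end in pairs)

    detect = avg_minutes((i.started, i.detected) for i in incidents)
    diagnose = avg_minutes((i.detected, i.diagnosed) for i in incidents)
    mitigate = avg_minutes((i.diagnosed, i.mitigated) for i in incidents)
    return {
        "detect": detect,
        "diagnose": diagnose,
        "mitigate": mitigate,
        # Assumption: MTTR is the sum of the three phase averages.
        "mttr": detect + diagnose + mitigate,
    }
```

Tracking the three phases separately, rather than only the total, shows whether detection, diagnosis, or mitigation contributes most to resolution time.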
Oracle Cloud Infrastructure Support
When using Oracle Cloud Infrastructure, you might sometimes need help from the community or from Oracle Support. For information about support options, see Getting Help and Contacting Support.
Recommendations
Design a support and incident management strategy to support your environment and minimize service disruptions.
Proactively define your support and incident management strategy wherever possible, but learn from experience and adjust your practices as needed.
Put controls in place to prepare for and respond to incidents. Recommendations include:
- Use a system to determine risks, threats, vulnerabilities, and impacts related to security
- Use a security information and event management (SIEM) system
- Set up a security operations center (SOC)
- Set up an incident response team
- Implement incident detection, response, and reporting
- Define escalation paths
- Build a standard post-mortem mechanism
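One way to make escalation paths concrete is to record them as data that tooling can evaluate. The Python sketch below is a minimal example under stated assumptions: the severity labels, time limits, and contact roles are all hypothetical placeholders, and a real policy would come from your own incident response plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # notify once the incident has gone unacknowledged this long
    contact: str        # role to notify (hypothetical names)


# Hypothetical escalation policy, keyed by severity. Steps are listed in order.
ESCALATION_PATHS = {
    "sev1": [
        EscalationStep(0, "on-call engineer"),
        EscalationStep(15, "incident response lead"),
        EscalationStep(30, "operations director"),
    ],
    "sev2": [
        EscalationStep(0, "on-call engineer"),
        EscalationStep(60, "incident response lead"),
    ],
}


def contacts_to_notify(severity, minutes_unacknowledged):
    """Return every contact whose escalation threshold has been reached."""
    steps = ESCALATION_PATHS[severity]
    return [s.contact for s in steps if minutes_unacknowledged >= s.after_minutes]
```

For example, `contacts_to_notify("sev1", 20)` returns the on-call engineer and the incident response lead, because the 30-minute threshold has not yet been reached.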
Develop an operations strategy to detect, prevent, respond to, and recover from events. Recommendations include:
- Monitor system performance metrics
- Document and test a disaster recovery plan
- Understand key roles needed for disaster recovery coordination
- Plan for interactions with Oracle Cloud Infrastructure support
- Respond to incidents
- Simulate attacks based on real incidents
- Prepare for application failure
- Recover from data corruption
- Recover from network outage
- Recover from a dependent service failure
- Recover from a region-wide service disruption
- Learn from disaster recovery tests, and improve processes
- Expect failure and learn from mistakes
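For the metric monitoring recommendation above, a common pattern is to raise an alert only after a metric stays above a threshold for several consecutive samples, which reduces noise from brief spikes. The Python sketch below illustrates that idea in isolation; the function name, threshold, and sample values are assumptions for illustration and do not represent an Oracle Cloud Infrastructure API.

```python
def breaches_threshold(samples, threshold, consecutive=3):
    """Return True if `consecutive` samples in a row exceed `threshold`."""
    run = 0  # length of the current streak of samples above the threshold
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


# Hypothetical CPU utilization samples, in percent.
cpu_samples = [50, 91, 92, 95, 60]
if breaches_threshold(cpu_samples, threshold=90):
    print("CPU above 90% for 3 consecutive samples: open an incident")
```

Requiring a sustained breach trades a slightly slower detection time for fewer false alarms; tune the `consecutive` value to match how quickly each metric needs a response.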
We recommend that you formalize a support contract with Oracle or an approved partner to help keep your organization's systems running at peak performance. Leverage these partnerships when critical events are scheduled, such as migrations or expected increases in demand, so that you benefit from the right support, best practices, and expertise. A support contract can also give you a direct feedback channel to Oracle engineering for continuous improvement of the platform.
Explore More
Documentation and other resources:
- Getting Help and Contacting Support
- Oracle Support Rewards
- Oracle Support for Oracle Cloud Infrastructure
My Oracle Support (sign in required):
- Oracle Support Training and Resources
- Working Effectively With Oracle Support - Best Practices
- Video: Oracle Support Essentials Series - Working Effectively with Support
OCI Status Dashboard: