Major Incident Management process in Service Operations Workspace

  • Release version: Xanadu
  • Updated August 1, 2024

    A major incident is an incident with high impact and urgency that affects a large number of users and deprives the business of one or more crucial services. Given the urgency of the situation, a well-coordinated response process is required to accelerate resolution and minimize the business impact.

    Responding to a major incident requires an effective and efficient system. Such a system involves the following actions:
    • Minimize the impact of service interruptions.
    • Ensure that an appropriate incident manager, major incident team, or management group is in place to manage a major incident.
    • Communicate to the stakeholders about the service interruptions, degradations, resolutions, and other major incident updates.
    • Collaborate with stakeholders to resolve the major incident and restore service.
    • Create a problem record to analyze the root cause.
    • Generate a post incident report (PIR) to review each major incident once the service is restored.

    The Major Incident Management process can be divided into the following phases.

    Major Incident identification and detection
    The first phase in the process is to identify a potential major incident candidate. You can detect and identify major incidents in the following ways:
    • An incident is detected based on the configured major incident trigger rules and is either proposed as a major incident candidate or promoted directly to a major incident automatically.
    • An agent reviews the incident information to determine whether the incident should be proposed as a major incident candidate. If it is proposed, a major incident manager reviews the candidate's information and promotes it to a major incident.
    • A major incident manager creates a major incident directly without the proposal process.
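
    The three identification paths above can be sketched as a small state model. This is an illustration only, not ServiceNow code: the class names, the state values, and the example rule conditions (such as the 500-user threshold) are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class MajorIncidentState(Enum):
    NONE = "none"          # ordinary incident
    PROPOSED = "proposed"  # major incident candidate awaiting review
    PROMOTED = "promoted"  # treated as a major incident


class TriggerAction(Enum):
    PROPOSE = "propose"
    PROMOTE = "promote"


@dataclass
class Incident:
    number: str
    priority: int          # 1 is the highest priority (assumption)
    affected_users: int
    state: MajorIncidentState = MajorIncidentState.NONE


@dataclass
class TriggerRule:
    name: str
    condition: Callable[[Incident], bool]
    action: TriggerAction


def apply_trigger_rules(incident: Incident, rules: List[TriggerRule]) -> Optional[TriggerRule]:
    """Apply the first matching rule and return it, or None if nothing matched."""
    for rule in rules:
        if rule.condition(incident):
            if rule.action is TriggerAction.PROMOTE:
                incident.state = MajorIncidentState.PROMOTED
            else:
                incident.state = MajorIncidentState.PROPOSED
            return rule
    return None


def propose(incident: Incident) -> None:
    """An agent proposes an ordinary incident as a major incident candidate."""
    if incident.state is MajorIncidentState.NONE:
        incident.state = MajorIncidentState.PROPOSED


def promote(incident: Incident) -> None:
    """A major incident manager promotes a candidate or creates a major incident directly."""
    incident.state = MajorIncidentState.PROMOTED


# Hypothetical rules: the conditions are invented for illustration.
EXAMPLE_RULES = [
    TriggerRule("P1 with wide impact",
                lambda i: i.priority == 1 and i.affected_users >= 500,
                TriggerAction.PROMOTE),
    TriggerRule("Any P1", lambda i: i.priority == 1, TriggerAction.PROPOSE),
]
```

    In this sketch the rules are evaluated in order, so a more specific rule that promotes directly is listed before a broader rule that only proposes a candidate.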

    Communication and collaboration
    The second phase is communication. Proper communication during a major incident is crucial to ensure that the IT teams, business stakeholders, and end users are informed about the impact and progress of the major incident.

    Communicating throughout a major incident requires a comprehensive communication plan that covers whom to contact, the methods and frequency of communication, and the messaging channels used, such as email and SMS. A communication plan enables the incident response team to focus their efforts on the resolution process and to set expectations for any future communications.

    You can define one or more communication plans based on the communication type, the priority of the incident, and the target audience of the major incident. Throughout the life cycle of the major incident, notifications and status updates are sent to the stakeholders to keep them informed and involved.
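
    As a rough illustration of how plans keyed on communication type, priority, and audience might be selected, consider the following sketch. The field names, plan names, and update intervals are hypothetical and do not reflect the product's actual data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class CommunicationPlan:
    name: str
    communication_type: str        # e.g. "technical" or "business" (assumed values)
    priorities: FrozenSet[int]     # incident priorities the plan applies to
    audience: str                  # e.g. "IT teams" or "business stakeholders"
    channels: Tuple[str, ...]      # e.g. ("email", "sms")
    update_interval_minutes: int   # how often status updates are sent


def plans_for(priority: int,
              plans: List[CommunicationPlan],
              audience: Optional[str] = None) -> List[CommunicationPlan]:
    """Return the plans that apply to a priority, optionally limited to one audience."""
    return [
        plan for plan in plans
        if priority in plan.priorities
        and (audience is None or plan.audience == audience)
    ]


# Hypothetical plans, invented for illustration.
EXAMPLE_PLANS = [
    CommunicationPlan("Technical bridge", "technical", frozenset({1, 2}),
                      "IT teams", ("email", "sms"), 30),
    CommunicationPlan("Executive briefing", "business", frozenset({1}),
                      "business stakeholders", ("email",), 60),
]
```

    With these example plans, a priority 1 major incident would match both plans, while a priority 2 incident would match only the technical one.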

    Along with communications, effective collaboration with IT teams and other business stakeholders is also important when resolving a major incident. You can use communication channels, such as Microsoft Teams conference calls, to collaborate and work toward issue resolution. The Incident record page in Service Operations Workspace provides various controls for collaboration. For more information, see Collaborate with stakeholders during a major incident.

    Resolution
    The next phase in the major incident life cycle is resolving the major incident. Resolving a major incident also involves resolving all associated child incidents, and the individual callers receive a notification about incident resolution.
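
    The relationship between a major incident and its child incidents at resolution time can be pictured with the sketch below. It is a simplified model under stated assumptions: the `Incident` class, the state strings, and the `notify` callback are invented for illustration, and whether the platform performs the cascade automatically is not specified here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    number: str
    caller: str
    state: str = "in_progress"
    children: List["Incident"] = field(default_factory=list)


def resolve_major_incident(major: Incident,
                           notify: Callable[[str, str], None]) -> List[str]:
    """Resolve the major incident and every unresolved child incident.

    Each caller whose incident is resolved here is notified once.
    Returns the numbers of the incidents resolved by this call.
    """
    resolved = []
    for incident in [major, *major.children]:
        if incident.state != "resolved":
            incident.state = "resolved"
            notify(incident.caller, f"{incident.number} has been resolved.")
            resolved.append(incident.number)
    return resolved
```

    Child incidents that were already resolved are skipped, so their callers are not notified a second time.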

    Problem record creation
    A problem record must be created to analyze the root cause of the major incident. You can configure the Create problem from major incident flow to create a problem record automatically after a major incident is resolved, or you can create a problem record manually, as required. You can also configure whether the major incident information is automatically copied to the problem record when it is created.
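
    The two configuration choices described above (automatic versus manual creation, and whether incident information is copied) can be modeled roughly as follows. This sketch does not use the flow's real settings: the option names and the list of copied fields are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ProblemConfig:
    # Hypothetical option names; the actual flow settings may differ.
    auto_create_on_resolve: bool = True
    copy_incident_info: bool = True
    fields_to_copy: Tuple[str, ...] = ("short_description", "description", "business_service")


def create_problem(incident: Dict[str, str], config: ProblemConfig) -> Dict[str, str]:
    """Create a problem record linked to the incident, copying fields if configured."""
    problem = {"source_incident": incident["number"]}
    if config.copy_incident_info:
        for name in config.fields_to_copy:
            if name in incident:
                problem[name] = incident[name]
    return problem


def on_major_incident_resolved(incident: Dict[str, str],
                               config: ProblemConfig) -> Optional[Dict[str, str]]:
    """Create the problem automatically only when the configuration asks for it."""
    if not config.auto_create_on_resolve:
        return None  # a person creates the problem record manually later
    return create_problem(incident, config)
```

    When automatic creation is turned off in this model, `create_problem` can still be called directly, which corresponds to the manual path.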

    Post incident report review
    The final phase of the major incident life cycle is the post incident report (PIR), which is generated after the major incident is resolved. You can use a PIR to analyze the incident and understand how to help prevent a similar incident in the future. This review also provides an opportunity to evaluate the incident response process and identify areas for improvement. The report contains the timeline of events that occurred after the incident was created. You can review and update the PIR during the review process before it’s shared with stakeholders.
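
    To make the idea of a report timeline concrete, the sketch below orders events that occurred after incident creation and renders them with elapsed minutes. The event structure and the text layout are assumptions made for illustration and do not represent the format of a generated PIR.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime
    description: str


def build_timeline(created_at: datetime, events: List[TimelineEvent]) -> List[TimelineEvent]:
    """Keep events at or after incident creation, in chronological order."""
    kept = [event for event in events if event.at >= created_at]
    return sorted(kept, key=lambda event: event.at)


def render_report(number: str, created_at: datetime, events: List[TimelineEvent]) -> str:
    """Render a plain-text timeline with minutes elapsed since creation."""
    lines = [f"Post incident report for {number}"]
    for event in build_timeline(created_at, events):
        elapsed = int((event.at - created_at).total_seconds() // 60)
        lines.append(f"+{elapsed} min  {event.description}")
    return "\n".join(lines)
```

    Because the rendered text is only a draft in this model, a reviewer could edit it before it is shared, which mirrors the review step described above.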