
    Noam Morginstin
    Admin
    Nov 28 7 min read

    Automate insights-rich incident summaries with generative AI


    Does this sound familiar? The incident has just been resolved and management is putting on a lot of pressure. They want to understand what happened and why. Now. 

    They want to make sure customers and internal stakeholders get updated about what happened and how it was resolved. ASAP.

    But putting together all the needed information about the why, how, when, and who can take weeks. Still, people are calling and writing. Nonstop. They’re asking where the incident summary is, expecting you to explain what the root cause was and what preventative actions you’re taking. They can’t wait weeks.

    So, yes, resolving security and IT incidents is a very demanding task. But as we know, all too painfully, the post-incident phase is not without its challenges.

    And one of the big post-incident challenges is creating the summary report.

     

    The strategic role of the incident summary

    The incident summary is critical to the efficacy of IT and security teams. It facilitates the incident review, including how the event unfolded, which systems and customers were impacted, the broader business and operational context, and how well (or not so well) processes were executed.

    This summary is what feeds the post-mortem and root cause analysis (RCA), so the organization can understand what happened, what needs to be fixed, and how to optimize the handling of future incidents.

     

    The summary challenge

    But putting together a robust summary that is accurate and contains all the required details can be extremely time-consuming. Often, it’s impossible to gather and compile all the information needed in a timely manner.

     

    The information required for an incident summary
    • What went wrong and which systems were impacted 
    • Which customers or clients were impacted, and how many
    • What was the operational, security, and business impact of the incident, e.g., data breach, not meeting SLAs or KPIs, etc.
    • What was the severity level  
    • When was the incident first reported internally/to customers
    • Who was notified internally/externally
    • What was communicated to leadership and customers
    • Which actions were taken to achieve the resolution
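
The checklist above can be captured as a structured record, which makes it easy to see what is still missing before a summary goes out. The following is only a minimal Python sketch: the class name `IncidentSummary`, its field names, and the `missing_fields` helper are hypothetical illustrations, not part of any specific product.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class IncidentSummary:
    """Hypothetical record mirroring the checklist items above."""
    what_went_wrong: Optional[str] = None
    systems_impacted: List[str] = field(default_factory=list)
    customers_impacted: List[str] = field(default_factory=list)
    impact: Optional[str] = None          # operational, security, and business impact
    severity: Optional[str] = None
    first_reported: Optional[str] = None  # internally / to customers
    notified: List[str] = field(default_factory=list)
    communications: Optional[str] = None  # what leadership and customers were told
    resolution_actions: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty (None, '' or [])."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example: a partially completed draft
draft = IncidentSummary(
    what_went_wrong="Batch processing performance issues",
    systems_impacted=["ERP", "CRM"],
    severity="Sev1",
)
print(draft.missing_fields())
```

Listing the empty fields gives the person compiling the report a concrete set of questions to chase, rather than a blank page.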

     

    This information can be very difficult to put together. Many different people need to be chased for the information, and their memories need to be jogged (too often unsuccessfully).

    Even if the right people took copious notes to document every step of the way, there is rarely enough bandwidth to review them all, extract the right insights, and compile them for incident handlers and management to review.

    Moreover, when a similar incident happens in the future (and chances are that it will), how can the organization be sure that the insights gained from the original incident will be applied when handling the latest one?

     

    80% of companies that suffer a ransomware attack will get hit again 
    40% of them will pay again 
    70% of those who do will pay a higher amount

    (source: SecurityWeek)

     

    When something goes boom, the last thing anyone has time for is sifting through old reports and looking for similarities.

    Ultimately, what these teams need is:

    • An efficient, accurate, and automated way to create incident summaries
    • Search-friendly, real-time access to insights on how to handle the recurrence of a similar incident
     

    The importance of accelerating the incident summary report

    The incident summary feeds the post-mortem, which should be held within 48 hours of resolution. Any later, and the team will likely forget, at least in part, what happened, how, and why.

    This makes learning from mistakes and leveraging what works nearly impossible.

     


     

    In comes generative AI

    Generative AI (genAI) is a transformative technology that’s changing how a lot of things are getting done today in our personal and professional lives.

    And for summarizing incidents, it possesses a number of important capabilities that make it an ideal tool for overcoming the challenge:

    • Synthesizing information from large volumes of data that’s stored across various sources.
    • Contextual understanding for picking up on key incident details and their correlation.
    • Natural language processing (NLP) for understanding and interpreting incident data, and then generating easy-to-understand summary text.
    • Speed for accelerating the process of finding information, analyzing it, detecting the connections, extracting insights, and putting together the summary report.

    When genAI is on the team, it can automatically populate all the relevant data into a summary template as soon as the incident has been resolved, for a clearly articulated and coherent report.

    And because genAI is particularly adept at identifying patterns and synthesizing masses of data, it can detect the root cause and advise on preventive actions.

    Moreover, when a similar incident hits, handlers can query genAI for previous incidents and summaries that shed light on which actions brought about a fast and effective resolution, as well as which actions should be avoided.
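
To give a rough feel for what looking up similar past incidents involves, the sketch below ranks stored summaries by word overlap (Jaccard similarity) with a new incident description. This is a deliberately simple stand-in, not how a genAI system actually retrieves context; the function names, incident IDs, and sample summaries are all made up for illustration.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared words divided by total distinct words (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar(query: str, past: Dict[str, str], top_n: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_n (incident_id, score) pairs, most similar first."""
    q = tokenize(query)
    scored = [(incident_id, jaccard(q, tokenize(text))) for incident_id, text in past.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical archive of past incident summaries
past_summaries = {
    "INC-1": "batch processing slow after password synchronization change rolled back",
    "INC-2": "email gateway outage caused by expired certificate",
    "INC-3": "helpdesk login failures after database failover",
}

results = rank_similar(
    "batch processing performance issues after password change", past_summaries
)
for incident_id, score in results:
    print(f"{incident_id}: {score:.2f}")
```

In practice, word overlap misses paraphrases and synonyms, which is exactly where a language model's contextual understanding is expected to help.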

    Sample incident summary created by generative AI

    When the incident took place: October 3rd, 2023.

    When first reported: 12:00 AM EDT, October 3rd, 2023.

    When resolved: 7:27 PM EDT, October 3rd, 2023.

    What happened: Multiple clients using fault-tolerant agent servers experienced performance issues with batch processing.

    Systems affected: The issue impacted multiple products, including the ERP, CRM, and helpdesk systems. 

    Severity: The severity of the incident was raised to Sev1 due to the broader impact, and a high-priority case was initiated with each of the relevant vendors.

    Customer impact: The affected clients included VIP customers – Bank A, Bank B, and Bank C, as well as Telco X, Telco Y, and Telco Z. 

    Business impact: SLA commitments weren’t met. 

    Resolution: The actions that brought about the resolution included rolling back a password synchronization change and issuing a replan to balance the workload. 
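
One figure that can be derived from the sample above is the time to resolution: from 12:00 AM to 7:27 PM on the same day is 19 hours and 27 minutes. A short Python sketch of that arithmetic follows. It assumes both timestamps are in the same time zone (EDT), so the offset can be ignored, and the variable names are illustrative only.

```python
from datetime import datetime

# Timestamps taken from the sample summary (both EDT, so no offset handling)
reported = datetime(2023, 10, 3, 0, 0)    # 12:00 AM, October 3rd, 2023
resolved = datetime(2023, 10, 3, 19, 27)  # 7:27 PM, October 3rd, 2023

elapsed = resolved - reported

# Split the elapsed time into whole hours and remaining minutes
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes = remainder // 60

print(f"Time to resolution: {hours}h {minutes}m")  # Time to resolution: 19h 27m
```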

     

    A note about data privacy & security

    It’s important to note that public generative AI platforms such as OpenAI’s ChatGPT should be avoided when seeking to automate the creation of IT and security incident summary reports, since content used in prompts can later be delivered by the platform as answers to other, unrelated parties.

    Therefore, the genAI used for the task should be private, local, and contained within a closed and trusted environment.

    This is the only way to avoid the risk of exposing sensitive information to third parties.

     

    In conclusion

    Optimizing incident handling is greatly dependent on the insights contained in a clear and actionable incident summary report.

    And with the increasing rates and damages of incidents, it is critical to be able to create these reports quickly, accurately, and automatically. The ground-breaking capabilities of generative AI make this possible.

    At Exigence we are committed to empowering IT and security teams with the most advanced technologies for taking incident resolution capabilities to a whole new level.

    Stay tuned. More on genAI-powered incident summaries and Exigence coming soon.

    In the meantime, to find out how we can help you optimize incident handling with intelligent automation, we invite you to reach out to us at info@exigence.io.

     


    Critical Incident Management major incident management CyberSecurity Incident Response Automating Critical Incident Management
