
    Noam Morginstin
    Oct 15 5 min read

    Microsoft’s 3 major incidents in 10 days: where did they go wrong?


     

    How to aggravate millions of users

    Just in case you haven’t heard, last week Microsoft experienced a huge outage that prevented users from accessing Office 365, its cloud-based subscription service, which serves 200 million monthly active users.

    This latest outage was the third in ten days, and it brought the company a deluge of complaints from customers who saw a 'something went wrong' message when they tried to access their accounts.

    Downdetector.com, a provider of real-time updates on issues and outages, showed that user reports had spiked at 14:26. Many users also took to social media to voice their anger and frustration over the continued Office 365 outages.

    “Microsoft users vented their frustrations on social media as the outage left them unable to send and receive important emails.” (CNBC)

    It was only at 14:48, more than twenty minutes later, that the Microsoft 365 Status Twitter account even acknowledged the issue. And only at 19:05, more than four and a half hours after the spike in complaints, did the company announce that affected services were back up and running.
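
As a quick check on the timeline above, the two gaps can be computed directly from the quoted timestamps. This is a minimal sketch that assumes all three times fall on the same day and in the same time zone; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

FMT = "%H:%M"


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two same-day HH:MM timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Times quoted in the post (same day and time zone assumed).
reports_spiked = "14:26"
acknowledged = "14:48"
restored = "19:05"

time_to_acknowledge = elapsed(reports_spiked, acknowledged)
time_to_restore = elapsed(reports_spiked, restored)

print(time_to_acknowledge)  # 0:22:00
print(time_to_restore)      # 4:39:00
```

Measured this way, acknowledgement took 22 minutes and restoration took 4 hours and 39 minutes from the spike in user reports.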

    As noted, unfortunately for Microsoft and users of Office 365, this was the third time in ten days that the service was rendered inaccessible for several hours at a time.

    Similar issues had occurred on both September 28th and October 1st.

     

    Three strikes and you’re out?

    Needless to say, no company wants to be in this situation, where its service is down for several hours. Clearly, the implications are dire.

    Regardless of the cause, whether it is new code pushed into production, a cyberattack, or a surge in usage, the damage from an outage can be tremendous, impacting customer satisfaction and loyalty, revenues, compliance, and brand equity.

    When it comes to customer satisfaction, the stats show that when something goes wrong, your customers will blame you. According to one survey, 66% of respondents believe that it’s the responsibility of the service provider to deal with the issue. And if the problem is not dealt with swiftly, frustration and the risk of attrition will rise.

    As for cost, another survey found that one hour of downtime can cost most businesses at least $100,000, and some between $1 million and $5 million.
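
To make the hourly figures concrete, here is a rough back-of-the-envelope sketch. The four-hour duration is a hypothetical input chosen for illustration, not a measured loss for any company, and the model simply assumes cost scales linearly with time.

```python
def downtime_cost(hours: float, cost_per_hour: float) -> float:
    """Estimate outage cost, assuming cost grows linearly with duration."""
    return hours * cost_per_hour


# Hourly rates taken from the survey figures quoted above.
HOURLY_RATES = {
    "typical floor": 100_000,
    "high-end low": 1_000_000,
    "high-end high": 5_000_000,
}

outage_hours = 4  # hypothetical duration, for illustration only

estimates = {
    label: downtime_cost(outage_hours, rate)
    for label, rate in HOURLY_RATES.items()
}

for label, cost in estimates.items():
    print(f"{label}: ${cost:,.0f}")
```

Under these assumptions, a four-hour outage would cost at least $400,000, and between $4 million and $20 million at the higher rates.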


     

    Acceleration is the name of the game

    So, what can organizations and their IT teams do about this?

    Well, for one – they can’t bank on 100% prevention.

    Outages are inevitable. There will always be network failures, sudden increases in system and app usage, human error, and software malfunctions.

    But, whereas incidents are not avoidable, lengthy incident resolution certainly is.

    In the Microsoft case, even with three incidents in less than two weeks, the way to mitigate the risk and reduce losses is to profoundly accelerate incident resolution.

    This strategic need becomes all the more critical when we consider that outages are more likely and more frequent today than ever, due to the massive shift of the workforce to the home following Covid-driven work-from-home directives.

    In fact, internet outages have reached an all-time high during the pandemic, with downtime increasing by an astounding 63% just in March.

     

    Don’t forget to add a good measure of cyberattacks

    Covid has not only increased the rate of outages, though. It has also prompted a dramatic rise in cyberattacks. The daily average of ransomware attacks, for example, was 50% higher in Q3 2020 than in the first half of the year.

    Among the cybercrime trends that have been noted is a sharp increase in malware attacks that use social engineering to exploit the global preoccupation with the virus. Cybercriminals are also taking advantage of the increased use of Zoom video conferencing during lockdown to launch phishing attacks.

    So, no matter how you look at it, whether through the lens of outages or through the prism of cyberattacks, no organization can afford not to resolve a major incident as quickly as possible.

    This is why it has never been more important to make sure that incident resolution capabilities, processes, and tools are at the top of their game.


     

    How Exigence can help

    Exigence offers a platform for automated incident management and orchestration for profoundly improving resolution capabilities and processes.

    This platform gives incident responders complete command, control, and oversight of critical incidents, whether for technology operations, security, drills, or business continuity tests.

    It provides a structured framework for a typically unstructured process, so responders can orchestrate stakeholders and tools and ensure that everyone is always on the same page and fully up to speed.

    Exigence automatically notifies and onboards all incident stakeholders, informing them of what type of incident has occurred, which systems have been impacted, and what their role is in the resolution effort.

    It also automatically opens a conference call bridge and collaboration channel, so no time is wasted in getting everyone aligned. Then, throughout the incident resolution journey, it automatically updates stakeholders about each status change in accordance with their stake in the incident.

    And once the incident is resolved, the team is automatically notified of incident closure, and incident summaries and root cause analysis reports can be created at the click of a button.
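
The general pattern described above (onboard everyone, then send status updates only to those with a stake in them) can be sketched in a few lines. This is a generic illustration, not the Exigence API: the `Stakeholder` and `Incident` classes, the `interested_in` field, and the sample names are all hypothetical, and messages are collected in a list rather than actually sent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stakeholder:
    name: str
    role: str
    # Statuses this person should hear about (hypothetical model of "stake").
    interested_in: set[str]


@dataclass
class Incident:
    kind: str
    impacted_systems: list[str]
    stakeholders: list[Stakeholder]
    status: str = "open"
    # (recipient, message) pairs; a real system would send these instead.
    outbox: list[tuple[str, str]] = field(default_factory=list)

    def onboard(self) -> None:
        """Tell every stakeholder what happened, what is impacted, and their role."""
        systems = ", ".join(self.impacted_systems)
        for s in self.stakeholders:
            message = f"{self.kind} incident affecting {systems}; your role: {s.role}"
            self.outbox.append((s.name, message))

    def set_status(self, status: str) -> None:
        """Record a status change and notify only the stakeholders who care."""
        self.status = status
        for s in self.stakeholders:
            if status in s.interested_in:
                self.outbox.append((s.name, f"status changed to {status}"))


incident = Incident(
    kind="outage",
    impacted_systems=["email"],
    stakeholders=[
        Stakeholder("Dana", "incident commander", {"mitigating", "resolved"}),
        Stakeholder("Lee", "customer communications", {"resolved"}),
    ],
)
incident.onboard()
incident.set_status("mitigating")
incident.set_status("resolved")

for recipient, message in incident.outbox:
    print(f"{recipient}: {message}")
```

In this sketch both stakeholders are onboarded and both hear about the resolution, but only the incident commander is told when mitigation begins, which keeps updates in line with each person's stake.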

     

    In closing

    The mandate to resolve incidents faster than ever is here to stay. However, with automated incident management and orchestration, the burden of speed, clarity, and efficiency can be lifted off the shoulders of those charged with the ever more important job of resolving major incidents.

    To learn more about how Exigence can help your team profoundly increase the speed and efficacy of incident resolution, we invite you to reach out to us at info@exigence.io.


    Critical Incident Management major incident management CyberSecurity Incident Response Automating Critical Incident Management
