Critical Next Steps For Tech And Security Leaders

What We Know – And What To Do Now

Technology leaders found that a software update from cybersecurity vendor CrowdStrike had gone badly wrong and disrupted major systems at their organizations. The impact has spread globally, with airports, governments, financial institutions, hospitals, ports, transportation hubs, and media outlets facing significant operational disruptions. Worldwide, airlines are urging people not to come to the airport, and American Airlines, Delta, and United halted operations for a time. Hospitals around the world have canceled surgeries, and emergency response in some cities has been affected. The outage will have severe economic consequences as well as impacts on health and wellbeing.

Earlier on Friday morning, CrowdStrike issued what seemed to be a routine update to its Falcon sensor software (endpoint protection, XDR, and CWP). The update caused Windows hosts running CrowdStrike Falcon (with its kernel-based threat protection) to fail to boot and hang on a Blue Screen of Death (BSOD). CrowdStrike CEO George Kurtz confirmed in an update on X this morning that “Mac and Linux hosts are not impacted”.

Because of the way the update was deployed, recovery options for affected machines are manual and therefore limited: administrators must attach a physical keyboard to each affected system, boot into Safe Mode, remove the faulty CrowdStrike update, and then reboot (see the official CrowdStrike knowledge base article). Some administrators have also reported being unable to access the BitLocker recovery keys needed to perform these remediation steps. If you are affected, follow CrowdStrike's guidance via official channels to work around this issue.

Forrester recommends that tech leaders do the following immediately:

  • Empower authorized system administrators to fix the problems quickly and effectively. This includes backing up disk encryption keys (BitLocker or third-party), because they may be critical for recovery in situations like this one, as well as using privileged identity management (PIM) solutions for break-glass emergency access.
  • Communicate effectively and clearly. Share the impacts, status, and progress of remediation efforts both internally and externally. Enlist marketing and PR, if available, to craft that messaging. Stay grounded in realistic impacts rather than theoretical worst-case scenarios, and keep an even tone.
  • Watch your back. Crisis events require an all-hands-on-deck response, but be sure to reserve a few analysts to keep monitoring other systems. Threat actors may use this time to attack while you're distracted.
  • Pay attention to the vendor's communications and follow official advice. Use official channels for instructions on addressing issues. Advice from social media may be inconsistent, conflicting, or outright incorrect and damaging.
  • Look after your people. This disruption hit on Friday evening in some geographies, just as people were heading home for the weekend. Tech incidents like this require an all-hands-on-deck approach, and your teams will be working 24/7 over the weekend to recover. Make sure they have adequate support and rest breaks to avoid burnout and mistakes. Clearly communicate roles, responsibilities, and expectations.

What To Do After The Crisis Subsides

Tech leaders should take the following steps once the immediate issue is fixed:

  • Implement infrastructure automation. Infrastructure automation is a must-have for controlled, managed software rollouts. While automated recovery is not possible in this specific instance, tech leaders should use infrastructure automation wherever possible to avoid manual recovery procedures. Develop rollback and regression capabilities and test them often to ensure you can recover to a prior state.
  • Refresh and rehearse your IT outage response plan. Regular practice of major outage response plans is vital, as is putting what you learn into practice. Tech leaders should develop the IT outage response plan, including contingencies and communications protocols for all major systems, services, and applications, along with the recovery procedures for working with and restoring them. Create and practice a “back-out” procedure specifically for updates that don't go as planned, so you can return to a known good state.
  • Get unified, written warranties from security vendors covering their quality assurance processes as well as their threat detection effectiveness. CrowdStrike offers a warranty if you suffer a breach while using its Falcon Complete platform, but this is specific to security breaches. Customers should ask for business interruption indemnification clauses in the event of a software update gone awry, such as the current CrowdStrike one. For software that runs in trusted spaces with automatic updates, especially software that uses kernel modules or may otherwise affect operating system stability, this could be a necessary step toward rebuilding trust.
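
As an illustration of controlled rollouts with rollback and a back-out path, here is a minimal Python sketch of a ring-based (staged) deployment that stops and reverts every updated host when a ring fails its health check. The `Ring` type, the `deploy`, `rollback`, and `is_healthy` callables, and the 95% health threshold are hypothetical placeholders rather than any vendor's API; in practice they would be wired to your own endpoint or configuration management tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Ring:
    """A deployment ring: a named group of hosts that receives an update together (hypothetical)."""
    name: str
    hosts: List[str]


@dataclass
class RolloutResult:
    completed_rings: List[str] = field(default_factory=list)
    rolled_back_hosts: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None


def staged_rollout(
    rings: List[Ring],
    deploy: Callable[[str], None],
    rollback: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    min_healthy_fraction: float = 0.95,  # assumed threshold; tune to your environment
) -> RolloutResult:
    """Deploy ring by ring. If a ring's healthy fraction falls below the threshold,
    back out every host updated so far and stop before touching later rings."""
    result = RolloutResult()
    updated: List[str] = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host)
            updated.append(host)
        healthy = sum(1 for host in ring.hosts if is_healthy(host))
        if ring.hosts and healthy / len(ring.hosts) < min_healthy_fraction:
            # Back-out procedure: revert in reverse order to the last known-good state.
            for host in reversed(updated):
                rollback(host)
            result.rolled_back_hosts = list(reversed(updated))
            result.halted_at = ring.name
            return result
        result.completed_rings.append(ring.name)
    return result


# Usage with simulated hosts: the "new" version is faulty, so updated hosts report unhealthy.
state = {}
rings = [Ring("canary", ["c1", "c2"]), Ring("broad", ["b1", "b2", "b3"])]
res = staged_rollout(
    rings,
    deploy=lambda host: state.__setitem__(host, "new"),
    rollback=lambda host: state.__setitem__(host, "old"),
    is_healthy=lambda host: state.get(host) != "new",
)
print(res.halted_at, res.rolled_back_hosts, sorted(state))
```

One caveat: the sketch assumes updated hosts stay reachable so they can be rolled back remotely. A host stuck on a boot-time BSOD, as in this incident, cannot run an automated rollback, which is why keeping the first ring small matters.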

What Tech Leaders Should Do In The Longer Term

Tech leaders should take the following steps in the longer term:

  • Reevaluate your third-party risk strategy and approach. If a third-party risk management (TPRM) program is overly focused on compliance, you'll likely miss significant events such as this one that affect even compliant vendors. Tech leaders can't afford to skip assessing vendors against multiple risk domains, such as business continuity and operational resilience, not just cybersecurity. Tech leaders also need to map their third-party ecosystem to identify significant concentration risk among vendors, especially those that support critical systems or processes.
  • Use the contract as a risk mitigation tool. Tech leaders, together with procurement and legal teams, should update contract language to include new security and risk clauses that assign accountability during disruptive events and clearly define timeframes for vendors to patch and remediate. Consider using incidents like this one and their impacts as the basis for new terms in contracts or service-level agreements (SLAs). If vendors push back, consider whether the price you negotiated still makes sense, and possibly whether to do business with them at all.
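
Mapping concentration risk can start as simply as counting how many critical systems depend on each vendor. The Python sketch below assumes a hypothetical inventory of systems tagged with criticality and vendor dependencies; the `System` type, the vendor names, and the 50% flagging threshold are illustrative assumptions, not a recommended standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class System:
    """One entry in a hypothetical inventory of systems and the vendors they depend on."""
    name: str
    critical: bool
    vendors: FrozenSet[str]


def concentration_report(
    systems: Iterable[System], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (vendor, share) pairs for vendors that sit under at least `threshold`
    of all critical systems, highest share first."""
    critical = [s for s in systems if s.critical]
    if not critical:
        return []
    counts = defaultdict(int)
    for system in critical:
        for vendor in system.vendors:
            counts[vendor] += 1
    flagged = [
        (vendor, n / len(critical))
        for vendor, n in counts.items()
        if n / len(critical) >= threshold
    ]
    # Sort by share descending, then by name so ties come out in a stable order.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Usage with made-up vendor names.
inventory = [
    System("check-in kiosks", True, frozenset({"EndpointVendorX", "CloudVendorY"})),
    System("payments", True, frozenset({"EndpointVendorX", "PaymentsVendorZ"})),
    System("booking engine", True, frozenset({"EndpointVendorX", "CloudVendorY"})),
    System("intranet wiki", False, frozenset({"CloudVendorY"})),
]
for vendor, share in concentration_report(inventory):
    print(f"{vendor}: {share:.0%} of critical systems")
# prints "EndpointVendorX: 100% of critical systems", then "CloudVendorY: 67% of critical systems"
```

A single vendor underpinning every critical system, like `EndpointVendorX` here, is the kind of concentration the bullet above warns about. The share metric is deliberately crude; it ignores factors such as how deeply each vendor's software is embedded or how it is updated.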

While Forrester is not a tech support firm, analysts are available to help you navigate this crisis and its longer-term repercussions. Forrester clients can request an inquiry or guidance session to discuss any of the above topics.