CrowdStrike Reveals Rapid Response Content Update Caused Global Outage
CrowdStrike has published a preliminary Post Incident Review (PIR) into the global IT outage on July 19, which was caused by a bug in a content update for its Falcon platform.
The cybersecurity vendor revealed the incident was caused by a Rapid Response Content update containing an undetected error.
The issue impacted 8.5 million Windows devices globally. All Windows hosts running sensor version 7.11 and above that were online between 04:09 UTC and 05:27 UTC on Friday, July 19, 2024, and received the update were affected.
The incident continues to disrupt critical sectors such as airlines, banks, media and healthcare.
The defect in the content update was reverted on Friday, July 19, 2024, at 05:27 UTC, and fixes and workarounds for affected customers have been deployed.
CrowdStrike Reveals How the Issue Occurred
CrowdStrike explained that it delivers security content configuration updates to its sensors in two ways:
- Sensor Content that is shipped with its sensor directly
- Rapid Response Content that is designed to respond to the changing threat landscape at operational speed
The July 19 issue was not triggered by Sensor Content, which is only delivered with the release of an updated Falcon sensor. CrowdStrike noted that customers have complete control over the deployment of the sensor.
Instead, the bug was in a Rapid Response Content update. Its origins trace back to sensor version 7.11, which was released on February 28, 2024.
This version introduced a new InterProcessCommunication (IPC) Template Type to detect novel attack techniques that abuse Named Pipes, and followed all of CrowdStrike’s Sensor Content testing procedures.
On March 5, CrowdStrike carried out a stress test of the IPC Template Type within its staging environment. The test passed, and an IPC Template Instance was released to production as part of a content configuration update.
Three additional IPC Template Instances were subsequently deployed between April 8 and April 24, all of which performed as expected in production.
On July 19, two additional IPC Template Instances were deployed. One of these instances passed validation despite containing problematic content data.
CrowdStrike said both instances were deployed as a result of the earlier successful testing performed before the initial deployment of the Template Type, trust in the checks performed in the Content Validator, and previous successful IPC Template Instance deployments.
However, when the instances were received by the sensor and loaded into the Content Interpreter, the problematic content in Channel File 291 resulted in an out-of-bounds memory read triggering an exception.
The sensor could not handle this exception gracefully, which caused the Windows operating system to crash and display the blue screen of death (BSOD).
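To illustrate the failure mode described above, here is a minimal Python sketch of a bounds-checked field read. It is not CrowdStrike's code: the function names `read_field` and `evaluate` are hypothetical, and because Python is memory safe it can only mirror the logic of the check, not the kernel-level crash itself.

```python
from typing import Optional, Sequence


def read_field(values: Sequence[str], index: int) -> Optional[str]:
    """Return values[index], or None when the index falls outside the data.

    Hypothetical helper: an interpreter that reads without this check
    would touch memory beyond the end of the supplied values.
    """
    if 0 <= index < len(values):
        return values[index]
    return None


def evaluate(values: Sequence[str], index: int, expected: str) -> bool:
    """Compare one field to an expected value; a missing field never matches."""
    field = read_field(values, index)
    return field is not None and field == expected
```

With a check of this kind, content that references a field that was never supplied produces a non-match rather than an unhandled exception.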
CrowdStrike Promises Changes to Testing Processes
CrowdStrike said it plans to roll out improvements to its Rapid Response Content testing processes to prevent similar issues occurring in the future.
These include applying additional types of testing, such as:
- Local developer testing
- Content update and rollback testing
- Stress testing, fuzzing and fault injection
- Stability testing
- Content interface testing
The firm also plans to add validation checks to the Content Validator for Rapid Response Content to prevent similar problematic content being deployed in the future, as well as enhance existing error handling in the Content Interpreter.
Further steps CrowdStrike plans to take to reduce the risk of bugs in Rapid Response Content deployment are:
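As a rough illustration of what an extra validation check could look like, the sketch below rejects content before deployment if its rules reference a field that the content does not supply. The `TemplateInstance` structure, its attributes and the `validate` function are assumptions made for this example and do not reflect the real Content Validator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateInstance:
    """Hypothetical stand-in for one piece of Rapid Response Content."""
    name: str
    field_count: int              # number of values the instance supplies
    referenced_fields: List[int]  # field indices its matching rules read


def validate(instance: TemplateInstance) -> List[str]:
    """Return a list of problems; an empty list means the instance passes."""
    errors = []
    for index in instance.referenced_fields:
        if index < 0 or index >= instance.field_count:
            errors.append(
                f"{instance.name}: rule reads field {index}, "
                f"but only {instance.field_count} fields are supplied"
            )
    return errors
```

Running a check like this at publishing time means a mismatch is caught before any sensor receives the content, instead of relying solely on error handling at runtime.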
- Implement a staggered deployment strategy for Rapid Response Content in which updates are gradually deployed to larger portions of the sensor base, starting with a canary deployment
- Improve monitoring for both sensor and system performance, collecting feedback during Rapid Response Content deployment to guide a phased rollout
- Provide customers with greater control over the delivery of Rapid Response Content updates by allowing granular selection of when and where these updates are deployed
- Provide content update details via release notes for customers
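The staggered deployment idea in the list above can be sketched in a few lines of Python. This is an illustrative model only: the `staged_rollout` function, the stage fractions and the `deploy` callback (assumed to return `True` when a host stays healthy after the update) are all hypothetical and not taken from CrowdStrike's tooling.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stages: Sequence[float],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Deploy to growing fractions of hosts, halting when a stage fails.

    `stages` holds cumulative fractions of the fleet, e.g. (0.01, 0.1, 1.0),
    so the first stage acts as the canary. Returns the hosts updated
    successfully before the rollout finished or was halted.
    """
    updated: List[str] = []
    start = 0
    for fraction in stages:
        end = min(len(hosts), max(start, round(len(hosts) * fraction)))
        batch = hosts[start:end]
        if not batch:
            continue
        failures = 0
        for host in batch:
            if deploy(host):
                updated.append(host)
            else:
                failures += 1
        # Stop before widening the rollout if this stage was unhealthy.
        if failures / len(batch) > max_failure_rate:
            break
        start = end
    return updated
```

With 100 hosts and stages of 1%, 10% and 100%, a defect that crashes every host would stop the rollout after the single canary host, rather than reaching the whole fleet.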