Delta’s CrowdStrike recovery stymied by crew-tracking systems failure
Dive Brief:
- Delta Air Lines is struggling to restore normal operations after a faulty CrowdStrike software update brought down devices running the Microsoft Windows operating system on Friday.
- The carrier has “a significant number of applications” that use Microsoft Windows, Delta CEO Ed Bastian said in a Sunday customer update. “In particular, one of our crew tracking-related tools was affected and unable to effectively process the unprecedented number of changes triggered by the system shutdown,” he said.
- Delta was the domestic carrier hit most heavily by the outage, according to FlightAware. The airline canceled 32% of its flights Friday and 36% on Sunday. As of Monday afternoon, Delta had canceled 817 flights, 22% of its scheduled routes, according to the flight tracking service.
Dive Insight:
The CrowdStrike snafu triggered a global crisis, impacting millions of Windows devices and causing operational disruptions at banks and state and federal agencies. But airports crowded with stranded travelers starkly highlighted the aviation industry’s vulnerabilities.
“Canceling a flight is always a last resort, and something we don’t take lightly,” Bastian said.
Airlines canceled more than 5,000 flights Friday, and numbers remained elevated over the weekend, according to FlightAware. Nearly 3,000 cancellations were reported Sunday, and the daily total surpassed 1,500 late Monday afternoon.
Delta had to cancel more than 3,500 flights through Saturday, Bastian acknowledged. “The technology issue occurred on the busiest travel weekend of the summer, with our booked loads exceeding 90%,” he said.
[Chart: Number of flights canceled by airline. Delta has canceled more flights than any other carrier over the last four days.]
Commercial carriers like Delta rely on interconnected operational technology to get passengers to their destinations. When bad weather or faulty code grinds one cog to a halt, even temporarily, it can precipitate systemwide shutdowns.
“Airlines who were customers of CrowdStrike have been especially impacted because there are multiple endpoint systems that play a critical role in checking people in, security checks, staffing planes and getting customers onto the plane,” Forrester Senior Analyst Brent Ellis said in an email.
The speed and effectiveness of the response depend on having technical resources on site for direct intervention. It’s a luxury not all airlines have.
“Many businesses have reduced internal desktop support staff or outsourced that function to services providers who cut costs by doing remote support,” Ellis said. The few available local personnel are then overwhelmed by the volume of simultaneous technical support requests, he added.
More than half of Delta’s IT systems are Windows-based, the company said in a Monday afternoon update. Teams had to “manually repair and reboot each of the affected systems,” and it took additional time for applications to resynchronize, according to Delta.
The prolonged disruptions mirror the crisis Southwest Airlines faced in December 2022, when a severe storm overwhelmed the carrier’s crew reassignment software. Southwest was forced to ground nearly 17,000 flights as it struggled to resume normal operations for the duration of the holiday travel period.
Delta’s current woes are tied to a similar piece of technology, according to the Monday update.
“One of Delta’s most critical systems – which ensures all flights have a full crew in the right place at the right time – is deeply complex and is requiring the most time and manual support to synchronize,” the company said.
The Southwest shutdown, which cost the airline more than $1.1 billion, surfaced industrywide IT resilience concerns.
The industry’s reliance on third-party technical support likely compounded service disruptions this time around, Gartner Director Analyst Eric Grenier told CIO Dive via email.
“One of the big reasons that this was hard to recover from is that there was no way to perform the fix remotely,” Grenier said. “This means that there must be physical access to the system to perform the fix.”
Ironically, cyber defense strategy may have contributed to the severity of the CrowdStrike outage, too.
“Essentially, this was a perfect storm of issues related to anxiety about being protected from zero-day security exploits, which has made enterprises dependent on updates directly from security vendors,” Ellis said.