Major outages at CrowdStrike, Microsoft leave the world with BSODs and confusion

July 19, 2024:

A passenger sits on the floor as long queues form at the check-in counters at Ninoy Aquino International Airport, on July 19, 2024 in Manila, Philippines.
Enlarge / A passenger sits on the floor as long queues form at the check-in counters at Ninoy Aquino International Airport, on July 19, 2024 in Manila, Philippines.

Ezra Acayan/Getty Images

Millions of people outside the IT industry are learning what CrowdStrike is today, and that’s a real bad thing. Meanwhile, Microsoft is also catching blame for global network outages, and between the two, it’s unclear as of Friday morning just who caused what.

After cybersecurity firm CrowdStrike shipped an update to its Falcon Sensor software that protects mission critical systems, Blue Screens of Death started taking down Windows-based systems. The problems started in Australia and followed the dateline from there. TV networks, 911 call centers, and even the Paris Olympics were affected. Banks and financial systems in India, South Africa, Thailand, and other countries fell as computers suddenly crashed. Some individual workers discovered that their work-issued laptops were booting to blue screens on Friday morning.

Airlines, never the most agile of networks, were particularly hard-hit, with American Airlines, United, Delta, and Frontier among the US airlines overwhelmed Friday morning.

CrowdStrike CEO George Kurtz posted on X (formerly Twitter) at 5:45 am Eastern time that the firm was working on “a defect found in a single content update for Windows hosts,” with Mac and Linux hosts unaffected. “This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed,” Kurtz wrote.

A CrowdStrike engineer posted in the official CrowdStrike subreddit that the workaround steps involve booting affected Windows systems into Safe Mode or the Recovery Environment, navigating to a CrowdStrike directory, and deleting a .sys file and rebooting. If this works, it’s not something that can be done through a network push, so a lot of manual work remains to be done.

Microsoft services were, in a seemingly terrible coincidence, also down overnight Thursday into Friday. Multiple Azure services went down Thursday evening, with the cause cited as “a backend cluster management workflow [that] deployed a configuration change causing backend access to be blocked between a subset of Azure Storage clusters and compute resources in the Central US region.”

News reporting on these outages has so far blamed either Microsoft, CrowdStrike, or an unclear mixture of the two as the responsible party for various outages. Security consultant Troy Hunt was quoted as describing the dual failures as “the largest IT outage in history,” saying, “basically what we were all worried about with Y2K, except it’s actually happened this time.”

Ars has reached out to CrowdStrike, Microsoft, and a number of airlines for comment and will update this post with response.

This is a developing story and this post will be updated as new information is available.

Source link