CrowdStrike blames bug
CrowdStrike says a bug in its quality control system led to a software update that caused widespread computer crashes last week.
In possibly the largest tech infrastructure failure of all time, the incident resulted in massive global disruption, affecting sectors from aviation to banking.
The full extent of the damage from the failed update is still under investigation.
Microsoft has reported that approximately 8.5 million Windows devices were impacted.
The US House of Representatives Homeland Security Committee has asked CrowdStrike CEO George Kurtz to testify about the incident.
US Fortune 500 companies, excluding Microsoft, are estimated to face losses well into the billions of dollars as a result of the failure, and separate estimates put the cost to the Australian economy at upwards of $1 billion.
Malaysia's digital minister has urged both CrowdStrike and Microsoft to consider compensating the affected companies.
CrowdStrike says the issue originated with its Falcon Sensor, an advanced security platform designed to protect systems from malware and cyberattacks.
The fault caused Windows systems to crash and display the ‘Blue Screen of Death’ (BSOD).
“Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data,” CrowdStrike said in a statement this week.
This internal quality control failure allowed the flawed data to bypass safety checks, ultimately leading to the crashes.
The company did not specify what the problematic content data entailed or why it was considered problematic.
A ‘Template Instance’ is a set of instructions that directs the software on what threats to detect and how to respond. CrowdStrike has since introduced a “new check” to its quality control processes to prevent similar issues in the future.
Although CrowdStrike provided information to fix the affected systems last week, experts have indicated that the recovery process will be time-consuming, involving the manual removal of the flawed code.
CrowdStrike’s Preliminary Post Incident Review (PIR) says that on July 19, two additional InterProcessCommunication (IPC) Template Instances were deployed.
A bug in the Content Validator allowed one of these instances to pass despite containing problematic data. This resulted in an out-of-bounds memory read in the Content Interpreter, triggering a Windows crash.
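To illustrate the general class of failure described here, the following Python sketch shows how an incomplete validation check can let mismatched content reach an interpreter that then reads past the end of its data. It is purely hypothetical: CrowdStrike has not said what the problematic data was, and the `TemplateInstance` fields, the `buggy_validate`, `strict_validate` and `interpret` functions, and the sample values are all invented for illustration rather than taken from the Falcon Sensor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TemplateInstance:
    # Hypothetical shape, not CrowdStrike's real format.
    name: str
    field_count: int   # how many fields the instance claims to carry
    values: List[str]  # the content data it actually carries


def buggy_validate(instance: TemplateInstance) -> bool:
    """Flawed check: only confirms there is a name and some data."""
    return bool(instance.name) and len(instance.values) > 0


def strict_validate(instance: TemplateInstance) -> bool:
    """Extra check: the declared field count must match the data supplied."""
    return buggy_validate(instance) and instance.field_count == len(instance.values)


def interpret(instance: TemplateInstance) -> List[str]:
    """Reads every declared field, trusting that validation already ran."""
    return [instance.values[i] for i in range(instance.field_count)]


def run(instance: TemplateInstance,
        validate: Callable[[TemplateInstance], bool]) -> str:
    """Validate, then interpret, and report what happened."""
    if not validate(instance):
        return "rejected"
    try:
        interpret(instance)
        return "ok"
    except IndexError:
        # Python raises an exception here; native code has no such safety net.
        return "out-of-bounds read"


good = TemplateInstance("instance_a", 2, ["x", "y"])
bad = TemplateInstance("instance_b", 3, ["x", "y"])  # claims 3 fields, carries 2

print(run(good, buggy_validate))   # ok
print(run(bad, buggy_validate))    # out-of-bounds read
print(run(bad, strict_validate))   # rejected
```

In Python the bad read surfaces as a catchable `IndexError`. In low-level system software an equivalent invalid memory read may not be recoverable, which is consistent with the crash CrowdStrike describes.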
To prevent such incidents in the future, CrowdStrike plans to add more validation checks to the Content Validator for Rapid Response Content.
Additionally, the company intends to implement a staggered deployment strategy for updates, starting with a small subset of systems before broader deployment.
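A staggered deployment of this kind can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not CrowdStrike's planned implementation: the `staged_rollout` function, the stage fractions, the `deploy` callback and the failure threshold are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Push an update to growing subsets of hosts, halting on failures.

    stage_fractions are cumulative shares of the fleet, e.g. [0.01, 0.10, 1.0].
    deploy(host) returns True if the host is healthy after the update.
    Returns the hosts that received the update before any halt.
    """
    touched: List[str] = []
    start = 0
    for fraction in stage_fractions:
        end = max(start, min(len(hosts), round(len(hosts) * fraction)))
        stage = hosts[start:end]
        if not stage:
            continue
        failures = 0
        for host in stage:
            touched.append(host)
            if not deploy(host):
                failures += 1
        # Stop before the next, larger stage if this one looks unhealthy.
        if failures / len(stage) > max_failure_rate:
            break
        start = end
    return touched


hosts = [f"host{i}" for i in range(100)]

# A faulty update that breaks every host it reaches is contained to stage one.
bad_update = staged_rollout(hosts, [0.01, 0.10, 1.0], deploy=lambda h: False)
print(len(bad_update))   # 1

# A healthy update proceeds through every stage.
good_update = staged_rollout(hosts, [0.01, 0.10, 1.0], deploy=lambda h: True)
print(len(good_update))  # 100
```

The point of the design is that a defect which slips past validation reaches only the first small group of machines, rather than the whole fleet at once.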
The company says its full Root Cause Analysis will be released publicly once the investigation is complete.