Right, that is what I said. It was just confusing because one happened right after the other.

And stagger your rollout so you can pull the plug if the first group experiences issues…
The Azure admin certification covers in detail how Azure divides VMs into upgrade domains so that not all machines in a group are upgraded simultaneously…
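To make the staggered-rollout idea concrete, here is a minimal sketch in Python. It is not how Azure actually implements upgrade domains. The helper names (`assign_upgrade_domains`, `staged_rollout`), the round-robin assignment, and the failure-rate threshold are all hypothetical, chosen only to show the principle: update one group at a time and stop before the later groups ever see a bad update.

```python
from typing import Callable, Dict, List, Tuple


def assign_upgrade_domains(hosts: List[str], num_domains: int) -> Dict[int, List[str]]:
    """Hypothetical helper: spread hosts round-robin across upgrade domains."""
    domains: Dict[int, List[str]] = {d: [] for d in range(num_domains)}
    for i, host in enumerate(hosts):
        domains[i % num_domains].append(host)
    return domains


def staged_rollout(
    domains: Dict[int, List[str]],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Tuple[List[int], bool]:
    """Update one domain at a time (assumed policy, for illustration only).

    Returns the domain ids that received the update and whether the
    rollout was halted because a domain's failure rate exceeded the threshold.
    """
    completed: List[int] = []
    for d in sorted(domains):
        hosts = domains[d]
        # apply_update returns True if the host came back healthy
        failures = sum(1 for h in hosts if not apply_update(h))
        completed.append(d)
        if hosts and failures / len(hosts) > max_failure_rate:
            # Pull the plug: the remaining domains never receive the update
            return completed, True
    return completed, False


hosts = [f"vm{i}" for i in range(6)]
domains = assign_upgrade_domains(hosts, 3)
# Simulate an update that breaks vm1: the rollout stops after domain 1,
# so domain 2 (vm2, vm5) is never touched.
print(staged_rollout(domains, lambda h: h != "vm1"))
```

In this sketch a bad update takes down at most one domain's worth of machines before the rollout halts, which is the whole point of not upgrading everything at once.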
From what I’ve read over the weekend, there were two separate incidents:
- Azure Central US failure
- CrowdStrike update
CrowdStrike provides an Endpoint Detection and Response (EDR) system that finds currently active threats and shares them with its users to reduce the time threat actors have to carry out successful attacks. Waiting too long to push those updates would endanger their customers. Some have blamed Microsoft for allowing other companies to run kernel drivers like the one that caused this issue. It turns out that it would be anti-competitive for Microsoft to have access to the kernel while not allowing other companies the same access.
Yes, and unfortunately some got hit by both.
Friday wasn’t a good day.
I know of a university that had a cyber incident last month, and they had just finished restoring all systems, which included adding CrowdStrike. Now they have to touch 'em all again. HugOps to that IT crew.
But you were at the water park on Friday.

I did catch this in another article… so why is it anti-competitive for MS, but not for macOS?
On Thursday and Friday I was taking calls from people and letting them know I couldn’t do anything.
Luckily, we were mostly good by about 5AM.
Nor Google.
When you control the information, it’s easier to get away with antitrust and similar issues.
It is not possible to test for a situation you can’t imagine. That’s what all our checklists and procedure documents are for… they are lists of scenarios that we (collectively) have imagined (and/or experienced).
“Mistakes”, whether of the dumb or honest variety, are most easily AVOIDED (you cannot remove ALL probability, but you can certainly reduce it) by having multiple people run the checklist.
The root cause of the CrowdStrike outage will be analyzed to death and added to that checklist… which is why the After Action Review is so important.
I loved this part of the article below:
14 years ago (2010):
“Defective McAfee update causes worldwide meltdown of XP PCs.”
In that case, McAfee had delivered a faulty virus definition (DAT) file to PCs running Windows XP. That file falsely detected a crucial Windows system file, Svchost.exe, as a virus and deleted it.
In the “You Can’t Make This Up Department”… CrowdStrike’s founder and CEO, George Kurtz, was McAfee’s Chief Technology Officer during that 2010 incident.
Speaking of the above and McAfee, go do a little reading on McAfee the man himself…
Oh Jeebus, I remember that event. Took us days to fix all our computers.
IMHO (not really all that humble), anyone running enterprise-level stuff on macOS or Android ~~deserves what they get~~ probably shouldn’t be surprised.
On a tablet, the only two options are Android and iOS. And I don’t believe categorically declaring Android as bad is accurate.
Every platform, including Windows, has its quirks. It’s a matter of how much TLC it needs.
You are correct… and I should have specified “enterprise-level servers”. Personal devices are not going away anytime soon… I don’t have a smart watch or smart refrigerator yet, but I DO have a web-connected home security system. Everything receives updates.
