CrowdStrike

aosemwengie1 · July 22, 2024, 2:41pm

Right, that is what I said. It was just confusing because one happened right after the other.

klincecum · July 22, 2024, 2:42pm

Man Toronto GIF by Bernardson

Evan_Purdy · July 22, 2024, 2:43pm

And stagger your roll out so you can pull the plug if the first group experiences issues…

Olga · July 22, 2024, 2:45pm

The Azure admin certification material contains a lot of info on how they divide VMs into upgrade domains so that not all machines in a group are upgraded simultaneously…
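To make the staggered roll-out idea concrete, here is a minimal sketch in Python. It is not how Azure or CrowdStrike actually do it. The function names (`assign_upgrade_domains`, `staged_rollout`) and the `apply_update` / `is_healthy` callbacks are hypothetical placeholders for whatever your tooling provides. Hosts are spread round-robin across a few upgrade domains, one domain is updated at a time, and the rollout stops as soon as a domain reports an unhealthy host.

```python
from typing import Callable, Dict, List, Optional, Tuple


def assign_upgrade_domains(hosts: List[str], domain_count: int) -> Dict[int, List[str]]:
    """Spread hosts round-robin across a fixed number of upgrade domains."""
    domains: Dict[int, List[str]] = {d: [] for d in range(domain_count)}
    for index, host in enumerate(hosts):
        domains[index % domain_count].append(host)
    return domains


def staged_rollout(
    domains: Dict[int, List[str]],
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],     # hypothetical: post-update health check
) -> Tuple[List[str], Optional[int]]:
    """Update one domain at a time and halt at the first unhealthy domain.

    Returns the hosts that received the update and the id of the domain
    that caused the halt (None if every domain came back healthy).
    """
    updated: List[str] = []
    for domain_id in sorted(domains):
        for host in domains[domain_id]:
            apply_update(host)
            updated.append(host)
        # Pull the plug before touching the next domain.
        if not all(is_healthy(host) for host in domains[domain_id]):
            return updated, domain_id
    return updated, None
```

With five hosts in two domains, a bad update that breaks one host in the first domain never reaches the second domain, so only part of the fleet needs hands-on recovery.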

Mark_Wonsil · July 22, 2024, 2:47pm

From what I’ve read over the weekend, there were two separate incidents:

Azure Central US Failure

CrowdStrike Update

CrowdStrike is an Endpoint Detection and Response (EDR) system that finds currently active threats and shares them with its users to reduce the time threat actors have to make successful attacks. Waiting too long to push an update would endanger their customers. Some have blamed Microsoft for allowing other companies to run the kernel drivers that caused this issue. It turns out that it is considered anti-competitive for Microsoft to have access to the kernel and not to allow other companies access.

klincecum · July 22, 2024, 2:50pm

Yes, and some got hit by both unfortunately.

Friday wasn’t a good day.

Mark_Wonsil · July 22, 2024, 2:55pm

I know of a university that had a cyber-incident last month and they had just finished restoring all systems, which included adding CrowdStrike. Now they have to touch 'em all again. HugOps to that IT crew.

Randy · July 22, 2024, 3:03pm

But you were at the water park on Friday.


dcamlin · July 22, 2024, 3:11pm

I did catch this in another article… so I’m not sure why it is anti-competitive for MS, but not for macOS?

klincecum · July 22, 2024, 3:13pm

Thursday, Friday I was taking calls from people and letting them know I can’t do anything.

Luckily, we were mostly good by about 5AM.

Mark_Wonsil · July 22, 2024, 3:13pm

Nor Google.

klincecum · July 22, 2024, 3:14pm

When you control the information, it’s easier to get by with Anti-Trust / Etc issues.

klincecum · July 22, 2024, 3:16pm

Obligatory

Ernie · July 22, 2024, 3:17pm

It is not possible to test for a situation you can’t imagine. That’s what all our checklists and procedure documents are for… they are lists of scenarios that we (collectively) have imagined (and/or experienced).

“Mistakes”, whether of the dumb or honest variety, are most easily AVOIDED (cannot remove ALL probability but can certainly reduce it) by having multiple people run the checklist.

The root cause of the CrowdStrike outage will be analyzed to death and added to that checklist… which is why the After Action Review is so important.

dcamlin · July 22, 2024, 3:23pm

I loved this part of the below article:

14 years ago (2010):
“Defective McAfee update causes worldwide meltdown of XP PCs.”
In that case, McAfee had delivered a faulty virus definition (DAT) file to PCs running Windows XP. That file falsely detected a crucial Windows system file, Svchost.exe, as a virus and deleted it.

In the “You Can’t Make This Up Department”… CrowdStrike’s founder and CEO, George Kurtz, was McAfee’s Chief Technology Officer during that 2010 incident.

klincecum · July 22, 2024, 3:28pm

Speaking of the above and McAfee, go do a little reading on McAfee the man himself…

Randy · July 22, 2024, 3:41pm

Oh Jeebus I remember that event. Took us days to fix all our computers.

Ernie · July 22, 2024, 3:48pm

IMHO (not really all that humble), anyone running enterprise-level stuff on macOS or Android ~~deserves what they get~~ probably shouldn’t be surprised.

deepak · July 22, 2024, 3:53pm

On a tablet, the only two options are Android and iOS. And I don’t believe categorically declaring Android as bad is accurate.

Every platform, including Windows, has its quirks. It’s a matter of how much TLC it needs.

Ernie · July 22, 2024, 4:02pm

You are correct… and I should have specified “enterprise-level servers”. Personal devices are not going away anytime soon… I don’t have a smart watch or refrigerator yet, but I DO have a web-connected home security system. Everything receives updates.