2025.2 flex 2 upgrade this weekend... and a "small subset of customers" outage at 12p ET on the day of it. :(

EPILOGUE (no pun intended)

We’re up on 2025.2 and so far things seem fine. Logged on yesterday and checked what I thought could be pain points (inbound EDI, printing via Network Edge Agent, system agent tasks, etc.) and found no issues thus far. No emails yet from our shipping or production teams either, and they start at 7a ET, before everyone else…

So maybe things will be OK for everyone else today too… while our ‘small subset’ waits a couple more days for the RCA document on Friday’s outage.

Even a technical person could write a bad BAQ by mistake. I don’t think there’s anyone here who can say they’ve never written a bad query, ever.

One customer running a bad query shouldn’t take out a ‘small subset’ of customers.

No. One customer running a bad query shouldn’t take out ANY other customers.

(I did say break a cluster :slight_smile: )

Still waiting on that RCA for the outage on 3/20…

I don’t know why you are so impatient lol. It’s going to say the exact same thing as all their other RCAs - absolutely nothing.

It’s the simple matter that they’ve left it hanging so long that grinds my gears. I just keep updating the EpicCare case every day with this:

Told RCA would be available within 5 business days. Today is business day ## and calendar day ##.

Even if the eventual RCA provides no insight into why our production environment crashed for two hours in the middle of a business day…again

Closed the case today. Finally realized it’s easier to do that and escape the disappointment of an inconsequential and insufficient explanation. Pretty obvious that neither the outage nor addressing its root cause mattered much in the end.

Did you leave a 1 on the survey? That typically gets attention. At Insights, the C-levels really like to show off their survey results. If this year is radio silence, that’ll speak volumes.

I haven’t even gotten the survey option yet - it’s still “pending tasks” on their end. When the survey option comes up I’ll be sure to rate them accordingly.

:face_with_symbols_on_mouth:

Epicor seems to have stopped reaching out about 1s on surveys. I’ve unfortunately had to leave a couple of 1s recently and never heard anything from Epicor. In one case support was completely wrong, and I pointed it out. Nothing back.

Woke up this morning to find the RCA posted to the case.

The RCA is dated April 1. Today is April 10.

What follow-up corrective actions have or will be taken?
This was an issue of server under load. To improve system behavior following action items are undertaken in future behavior improvement.
• An engineering investigation has been initiated to understand and address slow connection release behavior
• Monitoring and alerting for CPU and connection usage are improved to get alerted for similar behavior in future
• Configuration improvements are being evaluated to ensure connections are released more efficiently

Conclusion
This incident was the result of resource exhaustion driven by high CPU utilization and delayed connection release. Immediate service restoration was achieved through a restart, and steps are in progress to improve system resilience, monitoring, and operational response to prevent recurrence.

Looks like ChatGPT wrote it.

@aosemwengie1 was right