This is probably a “You should have known better…” moment, but here goes. Treat me like the newbie I am, like I don’t know the obvious things.
Last night I knew there was server maintenance happening on Kinetic production, but I did not pause our nightly MRP run, which is scheduled for midnight.
Today our MRP run was hung and still running more than 12 hours later. Because it was Sunday I hadn’t checked, but we are only 3 weeks past Go Live, so the production and finance teams are working through the weekends to catch up on the overwhelming mess we created for ourselves at Go Live (living the dream). This afternoon the teams started emailing that they couldn’t access buyer workbenches or other areas because MRP was running. Luckily I had a DataFix from a recent Epicor ticket about a hung MRP process, so I was able to stop it. Reviewing the log files, I found this line:
“---> Microsoft.Data.SqlClient.SqlException (0x80131904): SHUTDOWN is in progress.”
That sure seems like something that would stop an MRP process.
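To check other runs for the same symptom, a small script can scan a log file for that message. This is a minimal Python sketch, assuming the MRP log is plain text with one message per line; the function name `find_shutdown_errors` is hypothetical. A match only shows that SQL Server reported it was shutting down while the process was talking to it.

```python
import re
from typing import Iterable, List, Tuple

# Matches the SqlException text quoted above. The surrounding log format is
# an assumption; only the message text itself comes from the post.
SHUTDOWN_PATTERN = re.compile(
    r"SqlException \(0x80131904\): SHUTDOWN is in progress", re.IGNORECASE
)

def find_shutdown_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line reporting a SQL shutdown."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SHUTDOWN_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Usage: open the log with `open(path, encoding="utf-8", errors="replace")` and pass the file object straight in, since iterating a file yields its lines.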
Questions:
A) Was that the likely culprit behind our hung MRP run?
2) Should I, as a policy, pause MRP runs during updates and maintenance windows?
III. Do you avoid running automated tasks on Saturday nights, since those are the usual maintenance windows?
Yeah, stuff gets stuck for me all the time when they do maintenance. I usually just plan to check afterward, get anything unstuck that needs it, and re-run if applicable.
“fun”
I have automated tasks running at all sorts of times. With some serious thought you could probably build a function that disables tasks during service windows, reading from a UD table that you’d upload the maintenance calendar to... but that sounds like... more fun...
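At its core, the service-window idea is an interval-overlap check: skip a run if it would overlap any window in the uploaded calendar. A minimal Python sketch, assuming the UD table can be read into start/end pairs; `MaintenanceWindow` and `overlaps_maintenance` are hypothetical names, and actually disabling the Epicor task is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable

@dataclass(frozen=True)
class MaintenanceWindow:
    # Hypothetical row shape for a UD table holding the maintenance calendar.
    start: datetime
    end: datetime

def overlaps_maintenance(run_start: datetime, expected_duration: timedelta,
                         windows: Iterable[MaintenanceWindow]) -> bool:
    """True if [run_start, run_start + expected_duration) overlaps any window."""
    run_end = run_start + expected_duration
    # Two half-open intervals overlap when each starts before the other ends.
    return any(run_start < w.end and w.start < run_end for w in windows)
```

The check uses the expected run length rather than just the start time, because a midnight MRP run that takes three hours is still exposed to a window that opens at 2 AM.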
Yeah, it’s from their maintenance. It happens on-prem too if you’re doing maintenance while tasks are running: they get hung up when SQL or the app server becomes unavailable.
Our MRP runs overnight on a schedule. Since our IT always patches their servers on weekends, we’ve scheduled MRP not to run on the Fri>Sat and Sat>Sun nights. This keeps IT’s work from interfering with it.
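The post doesn’t say how the schedule is configured, so this Python sketch only spells out the rule and its one edge case: a run at midnight belongs to the night that started the evening before. The 12-hour offset and the names `night_of` and `should_run_mrp` are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Nights to skip, labeled by the weekday of the evening they start on
# (Monday == 0). Friday (4) and Saturday (5) are the Fri>Sat and Sat>Sun nights.
SKIP_NIGHTS = {4, 5}

def night_of(run_start: datetime) -> int:
    """Weekday of the evening a run belongs to; a 00:00 run counts as the previous night."""
    return (run_start - timedelta(hours=12)).weekday()

def should_run_mrp(run_start: datetime) -> bool:
    return night_of(run_start) not in SKIP_NIGHTS
```

With this rule a run at 00:00 Saturday is skipped (it belongs to Friday night), while 00:00 Monday still runs.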
Additionally, we’ve set up a BAQ on Ice.SysTask that returns all ‘Process MRP’ task rows with status ACTIVE or CANCELED and EndDate = NULL. These are tasks that didn’t finish properly, and they will block any new MRP task from running. This sometimes happens when we Epicor admins work on the server and forget that someone may run MRP during the day…
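The BAQ’s criteria amount to a three-part filter. This Python sketch applies them to rows already fetched; the `SysTaskRow` field names are assumptions modeled on the description above, not the exact Ice.SysTask column names, and in Epicor the filter itself lives in the BAQ designer.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional

@dataclass
class SysTaskRow:
    # Hypothetical fields mirroring the BAQ criteria, not the real schema.
    task_num: int
    description: str
    status: str
    end_date: Optional[datetime]

BLOCKING_STATUSES = {"ACTIVE", "CANCELED"}

def stuck_mrp_tasks(rows: Iterable[SysTaskRow]) -> List[SysTaskRow]:
    """'Process MRP' rows that are ACTIVE or CANCELED and never got an end date."""
    return [
        r for r in rows
        if r.description == "Process MRP"
        and r.status.upper() in BLOCKING_STATUSES
        and r.end_date is None
    ]
```

All three conditions are required: a CANCELED task that did record an end date finished cleanly and is not a blocker.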
I’ve put this BAQ as a widget on my Epicor home page. If I see an entry there, I go to the SQL server (we are on-prem) and delete the stuck entry from the Ice.SysTask table. However, since the widget is empty most of the time, you stop really looking at it, and we’ve missed one that way. So I’m considering creating a BAQ report on this BAQ and scheduling it to be emailed early in the morning so we get notified.
I like that strategy. I think there are some reports that I don’t want to get when there’s nothing to show and others where it would be nice to get an empty report so that I know it ran. I hadn’t considered your idea. One more tool in the toolbox. Thanks!
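The two kinds of report differ only in what happens when the query returns no rows. A hypothetical Python helper makes that choice an explicit `send_when_empty` flag; it is not an Epicor feature, and how Epicor’s scheduled BAQ reports handle empty results isn’t covered here.

```python
from typing import Optional, Sequence

def build_report_email(title: str, rows: Sequence[str],
                       send_when_empty: bool) -> Optional[str]:
    """Return an email body, or None when nothing should be sent."""
    if not rows:
        if not send_when_empty:
            return None  # "alert" style: silence means all clear
        return f"{title}: ran successfully, nothing to report."  # "proof it ran" style
    lines = [f"{title}: {len(rows)} item(s) need attention"]
    lines.extend(f"- {row}" for row in rows)
    return "\n".join(lines)
```

A stuck-task alert would typically use `send_when_empty=False`; a report whose arrival proves the job ran would use `True`.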
Did some similar things for heartbeat/wellness checks: a scheduled task that emails a single-page GL report early in the morning to make sure the System Agent was up, and another to verify our ODBC connection. Peace of mind.
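A heartbeat works because silence is the signal: the report is expected every morning, so its absence suggests the System Agent (or the ODBC link) may be down. A minimal Python sketch of the receiving-side check, assuming you record when the last heartbeat arrived; `heartbeat_ok` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional

def heartbeat_ok(last_seen: Optional[datetime], now: datetime,
                 max_age: timedelta) -> bool:
    """True if a heartbeat arrived no more than max_age before now."""
    if last_seen is None:
        return False  # never received one: treat as down
    return now - last_seen <= max_age
```

A `max_age` slightly over 24 hours tolerates small delays in a daily email without missing a skipped day.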
Just to be clear, imagine an App with a list of running tasks with the last activity date. If you know that it isn’t running, you highlight it and press delete. That’s what the Epicor Idea is asking for. So when YOU KNOW there’s a problem, YOU can fix it without burning up Support time.
Also, it would be kind of cool to list only processes (Windows or Linux) that no longer exist, for example because the container was killed during maintenance, or that are no longer accumulating any CPU cycles. We don’t want people accidentally killing the wrong process!
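The filtering suggested here (process gone, or no CPU time accumulating) can be sketched as a pure function over two CPU-time samples. This is a hypothetical Python sketch, not the Epicor Idea itself: it assumes each task’s OS process id is known and that something else collects cumulative CPU seconds per pid. It makes no OS calls and deletes nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

@dataclass
class RunningTask:
    task_num: int
    pid: Optional[int]      # OS process id for the task (assumed to be recorded)
    last_activity: datetime

def orphan_candidates(tasks: List[RunningTask],
                      cpu_before: Dict[int, float],
                      cpu_after: Dict[int, float],
                      now: datetime,
                      idle_for: timedelta) -> List[int]:
    """Task numbers that look safe to offer for deletion.

    cpu_before / cpu_after map pid -> cumulative CPU seconds from two samples.
    A pid absent from cpu_after is treated as a process that no longer exists.
    """
    result = []
    for t in tasks:
        if now - t.last_activity < idle_for:
            continue  # recent activity: never offer it
        if t.pid is None or t.pid not in cpu_after:
            result.append(t.task_num)  # process is gone
        elif t.pid in cpu_before and cpu_after[t.pid] - cpu_before[t.pid] <= 0:
            result.append(t.task_num)  # alive but burning no CPU
    return result
```

Requiring both a stale last-activity date and a dead or idle process keeps a task that is briefly waiting on I/O off the list.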