I once spoke with a consultant about this happening in my previous 10.2 environment. We could not find the cause, because it would happen at different times throughout the day/month/year and never happened during testing, even under heavy reporting or MRP processes. His recommendation, which I never got around to trying, was to add a second Task Agent for redundancy. This is obviously a band-aid, but in theory it would keep your reporting working; you would still need to periodically check the Task Agents and reset them. I didn't love the idea of not finding the root cause and just having yet another system to check manually. Windows Task Scheduler also didn't resolve the issue; I tried setting up a task to stop and start the Task Agent service.
When it comes to something like this, there are a lot of variables in play. The good news is that it seems to happen less often in more recent releases than it used to.
Assuming you are on-premises, how are your Epicor services configured? Do you have a single server with the database, application server, and Task Agent all on it? Or is each on a separate server?
Is there a specific task that "hangs" the Task Agent each time? Are the tasks that cause it to hang submitted manually by users, or through automation such as a data directive?
We have two servers: one hosts the database, and Epicor is installed on the other along with the Task Agent. We just manually print AP and AR invoice entries or payment entries. It worked most of the time but got stuck on a scheduled task once or twice a week.
I think it has something to do with when the task agent is in standby mode. However, I’m not sure how to troubleshoot that.
In my experience, there are two issues that come up all the time.
One is due to server restarts and the timing of the services starting up. If the Task Agent starts before the App Server has finished spinning up, it shows as running but isn't really connected. Setting the service to Delayed Start usually resolves this. The typical symptom is that the Task Agent is always down first thing in the morning.
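Delayed Start is the simplest fix. If it isn't enough in a given environment, a related approach is to start the Task Agent only once the App Server is actually accepting connections. Below is a minimal Python sketch of that idea, assuming the App Server is reachable over TCP; the host, port, and service name are hypothetical placeholders, not Epicor defaults. An open port only shows that something (e.g. IIS) is listening, not that the Epicor application has finished loading, so treat this as a rough gate rather than a guarantee.

```python
import socket
import subprocess
import time
from typing import Callable

# Hypothetical placeholders: substitute your App Server host/port and the
# Task Agent's actual Windows service name (see services.msc).
APP_SERVER_HOST = "appserver.example.local"
APP_SERVER_PORT = 443
TASK_AGENT_SERVICE = "YourTaskAgentServiceName"

# For reference, Delayed Start itself can be set from an elevated prompt with:
#   sc config <ServiceName> start= delayed-auto


def wait_for_port(host: str, port: int, timeout_s: float = 300.0,
                  interval_s: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds,
    or False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


def start_service(name: str) -> None:
    # sc.exe is the built-in Windows service control tool.
    subprocess.run(["sc", "start", name], check=True)


def start_task_agent_when_ready(host: str = APP_SERVER_HOST,
                                port: int = APP_SERVER_PORT,
                                service: str = TASK_AGENT_SERVICE,
                                start: Callable[[str], None] = start_service,
                                timeout_s: float = 300.0,
                                interval_s: float = 5.0) -> bool:
    """Start the (hypothetically named) Task Agent service only after the
    App Server port answers; return False without starting it otherwise."""
    if not wait_for_port(host, port, timeout_s, interval_s):
        return False
    start(service)
    return True
```

This would run as a startup task with the Task Agent service itself set to Manual, so the script, not the Service Control Manager, decides when it starts.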
If it's hanging throughout the workday, that is usually a problematic task of some sort. For example, at one point back in E9 you could run the BOM Cost report wide open, with no selection criteria. With a large enough dataset, this would crash the Task Agent every time. Another example is problematic Methods causing failures when MRP runs.
If the problem is task-related, it could happen at any time a user runs the problematic task. Do the server's Event Logs shed any light on what is happening?
Before restarting the agent/service, make note of whether there are "stuck" tasks in process. See if a pattern develops.
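One way to spot that pattern is to jot down, each time the agent is found hung, when it happened and which tasks were stuck in process, then tally the notes. The Python sketch below assumes a hypothetical record format of (timestamp, list of stuck task names); the task names in the usage section are placeholders. Hangs found early in the morning with nothing stuck would point toward the startup-timing issue, while the same task showing up repeatedly would point toward a problematic task.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical record: when the Task Agent was found hung, and the names
# of any tasks that were stuck "in process" at that moment.
Observation = Tuple[datetime, List[str]]


def summarize_hangs(observations: Iterable[Observation]) -> Tuple[Counter, Counter]:
    """Count how often each task was stuck, and at which hours of the day
    hangs were found."""
    by_task: Counter = Counter()
    by_hour: Counter = Counter()
    for found_at, stuck_tasks in observations:
        by_hour[found_at.hour] += 1
        by_task.update(stuck_tasks)
    return by_task, by_hour


if __name__ == "__main__":
    # Placeholder entries; replace with what you actually see before each restart.
    log = [
        (datetime(2024, 1, 8, 7, 45), []),
        (datetime(2024, 1, 10, 14, 5), ["Task A"]),
        (datetime(2024, 1, 15, 14, 20), ["Task A", "Task B"]),
    ]
    tasks, hours = summarize_hangs(log)
    print(tasks.most_common())  # tasks that keep showing up as stuck
    print(hours.most_common())  # hours of day when hangs were noticed
```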
I’m not convinced that auto restarting the service multiple times a day is a good idea. What if there is a process or a post running when you restart the service?