We didn’t have the issue (or it happened so rarely during that time that nobody reported it to me…) for the first 1.5 weeks after the transfer. We didn’t see it in any of our test transfers.
We opened a case, and after lots of discussion they did “server maintenance” that appeared to fix the issue. I closed the case, and then a number of days later all the problems came back. The second case is turning into a huge slog too. I used to have a direct contact on the cloud team, but they no longer respond to me.
Do these happen at a specific time of day, or randomly? We were having major issues with Epicor that support could not find, but we discovered the timing was the same as when our Shadow Copies were being made. We moved the schedule on those and our errors cleared up.
After I typed this I realized you are SaaS, so this probably does not apply to you.
I haven’t noticed a time of day. I do notice there seem to be fewer early in the week, and it picks up as the week goes on. I don’t know if we just get more active and more orders, quotes, etc. are coming in later in the week, or if something is getting worse over the week.
This almost sounds like the old error in early 10 where you had to restart the system monitor once a week. Or something like that. I am sure @Mark_Wonsil remembers.
I’ve already used that dashboard for this case. I don’t know if they need to turn on more or not, but I was seeing a few of the errors there, not as many as we see at the client.
It really sounds like something is hogging resources on your assigned server. If you had the logging turned on, it may point to a BPM that gets stuck (endless loop?)…
You know, shortly after we transferred, I was told by support to change all my email BPMs to synchronous instead of async. That wouldn’t make the server hang while it’s sending emails, would it?
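For context on why that worries me, here is a generic illustration (plain Python, not Epicor’s BPM engine) of the difference: a synchronous email step makes the SMTP call on the same thread that is processing the transaction, so a slow or unreachable mail server stalls that thread, while an async pattern hands the message to a background queue. The hostnames and addresses below are made up.

```python
# Generic sync-vs-async illustration (not Epicor's BPM engine).
import smtplib
import threading
import queue
from email.message import EmailMessage

SMTP_HOST = "smtp.example.com"  # hypothetical mail server


def build_message(to_addr: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "erp@example.com"  # hypothetical sender
    msg["To"] = to_addr
    msg["Subject"] = "Order update"
    msg.set_content(body)
    return msg


def send_sync(msg: EmailMessage) -> None:
    # Synchronous: the caller (the thread saving the order/quote) waits here.
    # A slow or unreachable SMTP server stalls that thread for the full timeout.
    with smtplib.SMTP(SMTP_HOST, timeout=30) as smtp:
        smtp.send_message(msg)


# Asynchronous pattern: queue the message and return immediately, so the
# transaction thread is never blocked waiting on the mail server.
outbox: "queue.Queue[EmailMessage]" = queue.Queue()


def mail_worker() -> None:
    while True:
        msg = outbox.get()
        try:
            send_sync(msg)  # only this background thread waits on SMTP
        except OSError as exc:
            print(f"email failed, would retry/log: {exc}")
        finally:
            outbox.task_done()


threading.Thread(target=mail_worker, daemon=True).start()


def send_async(msg: EmailMessage) -> None:
    outbox.put(msg)  # returns immediately
```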
From MT to DT, huh? I would request the SaaS team drop and rebuild all the DB indexes, or at least provide you with a fragmentation report, and ask them to check that the db properties are set correctly (optimize for ad hoc, etc.). Then ask them to check the task agent configurations. Ask SaaS what they’re seeing in Application Insights and whether you have any “noisy neighbors” sharing your App server.
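To make that ask concrete, below is a rough sketch of the kind of fragmentation report and settings check you would be requesting. It assumes direct SQL Server access (which you won’t have on SaaS; Cloud Ops would run it for you), the pyodbc driver, and placeholder server/database names.

```python
# Sketch of the fragmentation / settings report to request from the SaaS DBAs.
# Connection details are placeholders.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-db-server;DATABASE=YourErpDb;"  # hypothetical names
    "Trusted_Connection=yes;"
)

FRAG_QUERY = """
SELECT OBJECT_NAME(ips.object_id)        AS table_name,
       i.name                            AS index_name,
       ips.avg_fragmentation_in_percent  AS frag_pct,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.page_count > 1000                  -- ignore tiny indexes
  AND ips.avg_fragmentation_in_percent > 30  -- common rebuild threshold
ORDER BY ips.avg_fragmentation_in_percent DESC;
"""

ADHOC_QUERY = """
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'optimize for ad hoc workloads';
"""

with pyodbc.connect(CONN_STR) as conn:
    cur = conn.cursor()
    for table, index, frag, pages in cur.execute(FRAG_QUERY):
        print(f"{table}.{index}: {frag:.1f}% fragmented ({pages} pages)")
    for name, value in cur.execute(ADHOC_QUERY):
        print(f"{name} = {value}")  # 1 means the setting is enabled
```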
Check PingPlotter from your location to the DT server URL (without /saas). It might catch a bad routing hop or an overloaded network piece. Longer-running processes do seem to point to app/SQL server configuration, though.
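If you don’t have PingPlotter handy, even a crude connect-time sampler against the DT host will show whether there are latency spikes or timeouts on the path. The hostname below is a placeholder; substitute your actual DT URL (without /saas).

```python
# Crude latency sampler: times a TCP connect to the DT app server on 443
# every couple of seconds. Not a substitute for PingPlotter/traceroute, but
# spikes or timeouts here point at the network path rather than the app.
import socket
import statistics
import time

HOST = "yourcompany.epicorsaas.com"  # hypothetical DT hostname
PORT = 443
SAMPLES = 30

times_ms = []
for i in range(SAMPLES):
    start = time.perf_counter()
    try:
        with socket.create_connection((HOST, PORT), timeout=5):
            elapsed = (time.perf_counter() - start) * 1000
            times_ms.append(elapsed)
            print(f"sample {i + 1}: {elapsed:.1f} ms")
    except OSError as exc:
        print(f"sample {i + 1}: failed ({exc})")
    time.sleep(2)

if times_ms:
    print(f"min/median/max: {min(times_ms):.1f} / "
          f"{statistics.median(times_ms):.1f} / {max(times_ms):.1f} ms")
```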
Are you doing any web traffic filtering/monitoring/malware checking that might have been configured to skip the MT URL and hasn’t been updated to the DT address?
“they did “server maintenance” that appeared to fix the issue. I closed the case, and then a number of days later all the problems came back”
I would definitely request that a weekly app server restart be scheduled. Be sure to give them a time/timezone/day of week that works for your business, and check that you don’t have scheduled tasks running then either.
Obviously lots of facets, but since it started with the migration to DT… well, if it walks like a duck and talks like a duck… Still could be on either side, but much less likely to be on yours unless the URL wasn’t updated everywhere.
Zero progress on this… Currently we are being told to turn off anti-virus, turn additional tracing options on, and send them logs. Feels like they are grasping at straws. I asked what sort of things they have done server side and got this as the response:
“in regard to your question we are looking into things further however we have not been able to reproduce the issue hear yet i am continuing to look into it further”
I have no confidence that anyone competent has looked at the server yet, despite all my attempts to escalate.
Well, the moral of the story is: don’t give up with Epicor support. They eventually admitted there was something wrong with the server:
“Our Cloud Ops team identified the immediate cause and resolved it, and they have an open Support ticket with Microsoft to determine why the issue occurred.”
Whatever they found (they did not share what it was when I asked) took care of all of our performance issues and errors. (Actually, it took care of some errors I didn’t even think were related.)