So, on Wednesday, my users started receiving the error below. We ended up having to reboot the (virtual) server to clear it up. Ever since the reboot, Epicor has been running extremely slow. I ran the PDT and received the terrible results below, but I’m not sure what to do with that information. Before the error on Wednesday, the PDT was passing everything and our SQL results were around 2,700–3,000 ms. Where do I go from here?
Application Error
Exception caught in: mscorlib
Error Detail
============
Message: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:00:59.9938673'.
Inner Exception Message: An existing connection was forcibly closed by the remote host
Program: CommonLanguageRuntimeLibrary
Method: HandleReturnMessage
Client Stack Trace
==================
Server stack trace:
at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)
at System.ServiceModel.Channels.SocketConnection.Read(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)
at System.ServiceModel.Channels.DelegatingConnection.Read(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)
at System.ServiceModel.Channels.ConnectionUpgradeHelper.InitiateUpgrade(StreamUpgradeInitiator upgradeInitiator, IConnection& connection, ClientFramingDecoder decoder, IDefaultCommunicationTimeouts defaultTimeouts, TimeoutHelper& timeoutHelper)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.SendPreamble(IConnection connection, ArraySegment`1 preamble, TimeoutHelper& timeoutHelper)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.DuplexConnectionPoolHelper.AcceptPooledConnection(IConnection connection, TimeoutHelper& timeoutHelper)
at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(TimeSpan timeout)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open()
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at System.ServiceModel.ICommunicationObject.Open()
at Epicor.ServiceModel.Channels.ChannelEntry`1.CreateNewChannel()
at Epicor.ServiceModel.Channels.ChannelEntry`1.CreateChannel()
at Epicor.ServiceModel.Channels.ChannelEntry`1.GetContract()
at Epicor.ServiceModel.Channels.ImplBase`1.GetChannel()
at Epicor.ServiceModel.Channels.ImplBase`1.HandleContractBeforeCall()
at Ice.Proxy.BO.ReportMonitorImpl.GetRowsKeepIdleTime(String whereClauseSysRptLst, Int32 pageSize, Int32 absolutePage, Boolean& morePages)
at Ice.Adapters.ReportMonitorAdapter.GetRowsKeepIdleTime(SearchOptions opts, Boolean& MorePages)
Inner Exception
===============
An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)
aidacra
(Nathan, your friendly neighborhood Support Engineer)
Have you looked at the C-states on the physical host (made sure the host is set to high-performance mode in the BIOS)? Are other VMs on the same host running slowly? Were any Windows updates applied during the latest restart? I’ve caused the same behavior on purpose by throttling the CPU on the application pool, but most people wouldn’t do that.
I can’t offer actual answers, but those results look familiar because we saw similar things when setting up our recent new environment, which entailed going from a physical server to virtuals.
I do know the VMs have to be set up with cores, etc. tied strictly to what is physically available, and there are other strict setup rules besides that, but Epicor did all of that for us, so I don’t have the details. They struggled to find the last few settings that turned the VMs from piggishly slow to humming sweetly, but they got there in the end, so Support should be able to do the same for you.
@aidacra - C-states should be disabled. I was plagued by that back in 2016. The host was not rebooted, so I don’t think those would have changed. Other VMs seem okay. Actually, the VM itself seems fine. SQL isn’t moving too fast, though. No Windows updates were applied, and the app pool’s CPU limit is set to zero with ‘NoAction’.
@josecgomez - Yes, we do regular maintenance. I’m seeing that the latest indexing task failed, but I don’t see a reason in the history. Would that cause such a sudden change after a reboot? I’d imagine it would be more chronic.
Hmm, your index task failed… it shouldn’t be that big a deal, but a reboot does clear the SQL cached data, which could have been “hiding” some of the symptoms… I suppose.
I wonder if you are having hardware issues… any errors in Event Viewer outside the norm?
I’m not seeing anything in the Event Viewer that’s odd.
One thing to note regarding the hardware: it was about -50° with wind chill on Wednesday when this started happening, and, ironically, our AC died in the server room, so things got a bit hot. It’s been fixed since then, though. Would a full host reboot help?
Did one of your CPUs go bad? All Epicor’s test does is run a while loop, do some math, and time it with a StopWatch. Hope you figure it out. The only time I had bad CPU test results via PDT was after updating VMware to HW11 and after installing mismatched firmware on a Cisco blade.
Any BPM causing an infinite loop and hitting your AppServer? The test is run on the AppServer, so check the Windows Event Viewer there to see if it shows anything weird.
Take SQL out of the mix; that CPU test really only exercises the app server you run the test against, and not even IIS, but literally just the operating system of whichever instance you are connected to via PDT.
So now let’s put Epicor aside and focus on the CPU; it’s probably related to Windows or hardware. Take the app pool out of the picture, because that test is really just looping over division statements, nothing more, nothing less.
Now, if you trust me, download this, simply RDP into a few servers, paste it, and run it:
It is not a virus, guaranteed =) I had to strip the CPU test out of Epicor and run it on servers to figure out my bottlenecks months ago. It does the exact same logic as PDT (I took it out of the BO that executes that CPU test).
If I recall correctly, it’s the bottom number that is the CPU result.
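For anyone who’d rather not run a downloaded binary, here is a minimal stand-in in Python that follows the same idea described above (a stopwatch-timed tight loop of division statements). The function name and iteration count are my own choices for illustration; this is an approximation of the kind of test PDT runs, not Epicor’s actual code.

```python
import time

def cpu_benchmark(iterations: int = 5_000_000) -> float:
    """Time a tight loop of floating-point divisions and return elapsed ms.

    A rough stand-in for the PDT-style CPU test: no I/O, no SQL, just the
    OS scheduling raw arithmetic on one core.
    """
    start = time.perf_counter()
    total = 0.0
    for i in range(1, iterations + 1):
        total += 1_000_000.0 / i  # division-heavy busy work
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    assert total > 0  # keep the accumulator live so the loop isn't skipped
    return elapsed_ms

if __name__ == "__main__":
    # Run this on each box (app server, SQL, SSRS, the VM host itself)
    # and compare: a healthy machine should return a similar number
    # every run; one outlier points at that machine or its host.
    print(f"CPU test: {cpu_benchmark():.0f} ms")
```

The absolute number matters less than the comparison across machines: run it a few times on each server and look for the box whose times are wildly higher or more variable than the rest.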
Take this and run it on other, non-Epicor servers, probably even your Hyper-V or VM host if you can. Your SQL server, your SSRS server, your report server if it’s separate, and probably your other app servers too. Easier.
Awesome, so something is going on with your host. At least we can narrow it down: if you go to any OS hosted on that host and get the same results, now you know to look at the host.