The Caller was not Authenticated by the Service

So, we’ve been experiencing an intermittent issue with our production Epicor for the past 3-4 weeks and are running out of ideas to try to fix it.
About 2-4 times a day, users’ Epicor will close with the message “Epicor is Offline”. Clicking “Retry” gives the message “The Caller was not Authenticated by the Service”.


Recycling the app pool after this happens helps for a time, but it inevitably goes down again.
Epicor suggested removing the server from the domain and adding it back, which we did last night, but we are still seeing the issue.
From an Epicor server standpoint, nothing changed when it started happening. Epicor support is claiming it’s some sort of domain issue and isn’t really providing assistance because of that.
We’re hosted privately in the Azure cloud if that helps anyone.

I’m finding the following in the Event Viewer around the time it went down.

Domain issues are a bit outside of my, well, domain.
Does anyone have any ideas?

Here’s some more detail of the error I found in my local Event Viewer:

Unable to reach the server. Retrying...

System.ServiceModel.Security.SecurityNegotiationException: The caller was not authenticated by the service. ---> System.ServiceModel.FaultException: The request for security token could not be satisfied because authentication failed.
   at System.ServiceModel.Security.SecurityUtils.ThrowIfNegotiationFault(Message message, EndpointAddress target)
   at System.ServiceModel.Security.SspiNegotiationTokenProvider.GetNextOutgoingMessageBody(Message incomingMessage, SspiNegotiationTokenProviderState sspiState)
   --- End of inner exception stack trace ---

Server stack trace: 
   at System.ServiceModel.Security.IssuanceTokenProviderBase`1.DoNegotiation(TimeSpan timeout)
   at System.ServiceModel.Security.SspiNegotiationTokenProvider.OnOpen(TimeSpan timeout)
   at System.ServiceModel.Security.WrapperSecurityCommunicationObject.OnOpen(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
   at System.ServiceModel.Security.CommunicationObjectSecurityTokenProvider.Open(TimeSpan timeout)
   at System.ServiceModel.Security.SecurityProtocol.OnOpen(TimeSpan timeout)
   at System.ServiceModel.Security.WrapperSecurityCommunicationObject.OnOpen(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
   at System.ServiceModel.Channels.SecurityChannelFactory`1.ClientSecurityChannel`1.OnOpen(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
   at System.ServiceModel.Channels.CommunicationObject.Open()

Exception rethrown at [0]: 
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at System.ServiceModel.ICommunicationObject.Open()
   at Epicor.ServiceModel.Channels.ChannelEntry`1.CreateNewChannel()
   at Epicor.ServiceModel.Channels.ImplBase`1.GetChannel()
   at Epicor.ServiceModel.Channels.ImplBase`1.HandleContractBeforeCall()
   at Ice.Proxy.BO.ReportMonitorImpl.GetRowsKeepIdleTimeWithBallonInfo(String whereClauseSysRptLst, Boolean getBallonInfo, String whereClauseSysTask, String whereClauseSysTaskLog, DataSet& sysMonitorData)

Your domain controller is intermittently inaccessible. It is not an ERP problem; Epicor just complains about it.

And this is where I falter. I’m not too sure how to troubleshoot a domain controller issue. Was kinda hoping someone would have some suggestions to look at.
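
One low-effort starting point might be to check, from the Epicor server, whether the domain controller's usual service ports accept connections when the outage hits. Below is a minimal Python sketch of that idea. The host name `dc01.example.local` is a placeholder (nothing in this thread names the DC), and the port list is only the common defaults (DNS 53, Kerberos 88, LDAP 389, SMB 445); your environment may expose a different set. A successful TCP connect only shows the port is reachable, not that authentication works.

```python
import socket

# Common Active Directory service ports (defaults; adjust for your environment).
DC_PORTS = {"DNS": 53, "Kerberos": 88, "LDAP": 389, "SMB": 445}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def probe(host, ports=DC_PORTS, timeout=3.0):
    """Return a dict mapping each service name to True/False reachability."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}
```

For example, `probe("dc01.example.local")` (placeholder name) returns something like a dict of service name to `True`/`False`. Running it once while things are healthy and again during an outage would show whether anything changes at the network level.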

Azure Active Directory? :man_shrugging:

:wink:

Try this Microsoft tool. I believe you can run it from any connected machine.

https://www.microsoft.com/en-us/download/details.aspx?id=30005

Otherwise, it looks like a DNS issue. You can flush the DNS cache on the client machine if it’s a single-user issue; otherwise you might need to add a record for the Epicor server address in your DNS.

Well, at present, that tool does not show any issues. Should I run it the next time Epicor goes down?

Run command prompt and type ipconfig /all on one of the naughty machines.
Verify their network settings are correct.

You can also do nslookup MyErp.cloudaddress.net to verify your DNS knows what to do with your epicor address.
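
Since the problem is intermittent, a one-off nslookup may well pass. A rough way to catch it in the act is to repeat the lookup on a timer and keep a timestamped log. The sketch below is only an illustration of that idea in Python; the function names are made up for this example, and it uses the system resolver through `socket.getaddrinfo`, so it reflects what the machine itself would resolve rather than querying a specific DNS server the way nslookup can.

```python
import socket
import time
from datetime import datetime


def check_dns(hostname, resolver=socket.getaddrinfo):
    """Return (ok, detail): resolved IPs on success, the error text on failure."""
    try:
        results = resolver(hostname, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in results})
    return True, ", ".join(addresses)


def log_line(hostname, ok, detail, now=None):
    """Format one timestamped log line for a lookup result."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    status = "OK" if ok else "FAIL"
    return f"{stamp} {status} {hostname} {detail}"


def monitor(hostname, checks, interval_seconds, sleep=time.sleep):
    """Resolve hostname `checks` times, pausing interval_seconds between lookups."""
    for i in range(checks):
        ok, detail = check_dns(hostname)
        print(log_line(hostname, ok, detail), flush=True)
        if i < checks - 1:
            sleep(interval_seconds)
```

Something like `monitor("MyErp.cloudaddress.net", checks=1440, interval_seconds=60)` would log one lookup a minute for a day (the host name is the same placeholder as above). Any FAIL lines can then be lined up against the times Epicor went offline.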

It’s everyone that goes down. I believe it’s the server that loses its connection with the DNS.

Run this on a client machine.

Close Epicor
Open Command Prompt
ipconfig /flushdns
Open Epicor and Connect
ipconfig /displaydns

This should show all the sites that have used DNS since flushing and all the automatically assigned records for your local network. It might give you some ideas.

Did the one from CMD. It’s clearly something with the domain trusting the server but still not sure what to do.


Ouch,

Try leaving and rejoining the domain on a client computer you’re not worried about breaking. If you can’t successfully rejoin the domain, that’s a problem.

You should contact an MSP if you aren’t able to tackle this issue.

We have a ticket open with Microsoft. They suggested we demote the DC in Azure and promote it again with read and write access, since it’s read-only right now.


Promoting the Azure DC did the trick. :+1:
