ERP 10.1.500 client performance slow when using NetScaler load balancers

We are seeing strange behavior and complaints about client performance slowness when connecting to the ERP 10 application servers through a network load balancer (Citrix NetScaler 5650). When we bypass it and put the server name/IP directly in the connection string, users say performance is much better.

We have the setup and configuration below; kindly advise if you can help:
LB IP: 10.11.20.32
APP01: 10.11.3.17
APP02: 10.11.3.32

When clients connect directly to 10.11.3.17, they say performance is much better than when connecting through 10.11.20.32.

(diagram: NLB design)

Any advice or help would be appreciated.

What are you using for bindings - HTTP or net.tcp?
What timeout are you using for the connection in the load balancer?

I wonder if your load balancer is truncating connections while in use. You can measure this in the client traces. For example, here is a snippet I just captured:

<tracePacket>
  <businessObject>Erp.Proxy.BO.SalesOrderImpl</businessObject>
  <methodName>MasterUpdate</methodName>
  <executionTime total="2724" roundTrip="2691" channel="_0" bpm="0" other="33" />
  <retries>0</retries>
  <parameters>

See if you are getting retries.
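If it helps, a rough scan along these lines will flag any calls that retried or ran long. This is only a sketch in Python - the trace file path is passed on the command line, the 2-second threshold is arbitrary, and it assumes each entry is closed with a </tracePacket> tag like the snippet above:

```
import re
import sys

# Rough scan of an Epicor client trace log for retried or slow calls.
# Usage: python scan_trace.py <path-to-trace-file>
# Element names match the <tracePacket> snippet above; adjust them if your
# trace options write a different layout.
SLOW_MS = 2000  # flag anything slower than 2 seconds (arbitrary threshold)

def field(name, blob):
    """Return the text of <name>...</name> inside one packet, or ''."""
    m = re.search(rf"<{name}>(.*?)</{name}\s*>", blob, re.DOTALL)
    return m.group(1).strip() if m else ""

with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
    log = f.read()

for packet in re.findall(r"<tracePacket>.*?</tracePacket>", log, re.DOTALL):
    retries = int(field("retries", packet) or 0)
    total = re.search(r'executionTime total="(\d+)"', packet)
    total_ms = int(total.group(1)) if total else 0
    if retries > 0 or total_ms > SLOW_MS:
        print(f'{field("businessObject", packet)}.{field("methodName", packet)}: '
              f'{total_ms} ms, retries={retries}')
```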

e.g. - I have seen load balancers kill off connections after 45 or 90 seconds, while WCF and net.tcp like to keep connections going for 10 minutes. This is why we have a 'recycle connection' timer (channeltimetolive) of 9 minutes in the client .exe.config :wink:
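Just to spell out the arithmetic (the numbers below are illustrative only, not anyone's actual settings):

```
# Illustration of the timing relationship described above.  Example numbers
# only - substitute your own balancer and client settings.
lb_idle_timeout = 90     # seconds a balancer might allow a connection to sit idle
wcf_idle_window = 600    # net.tcp/WCF keeps idle connections around ~10 minutes
client_recycle  = 540    # the 9-minute 'recycle connection' (channeltimetolive) timer

# The client recycles its channel just under the 10-minute WCF window, so idle
# gaps of up to ~9 minutes are normal; the balancer's idle timeout needs to be
# comfortably longer than that.
assert client_recycle < wcf_idle_window

if lb_idle_timeout < client_recycle:
    print("Balancer drops idle connections before the client recycles them -> "
          "expect aborted sockets and retries in the trace.")
else:
    print("Balancer idle timeout outlives the client recycle timer -> OK.")
```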

Review your load balancer config, run the trace, and let me know what you find :slight_smile:

I found the topic I was trying to remember. It discusses Microsoft ARR rather than hardware load balancers, since that is something we can reasonably document - there are too many load balancers out there to document them all. Take a glance through to see if any of it helps, and do the tracing mentioned previously :slight_smile:

We are using net.tcp Windows bindings.
**We have two types of timeout:**
- Client idle timeout set to 3000 seconds (now changed to 10200)
- Server idle timeout set to 3000 seconds (now changed to 10200)
(screenshot: timeout settings)
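To double-check what the balancer actually does with idle connections, we can run a rough probe along these lines (a sketch only - the port below is a placeholder for whichever port the net.tcp endpoint listens on, and some devices drop idle sessions silently without a FIN/RST, in which case the close would only show up on the next write):

```
import select
import socket
import time

# Rough probe: hold a TCP connection open through the load balancer VIP and
# report when (or whether) the far end tears it down while idle.
VIP_HOST = "10.11.20.32"
NET_TCP_PORT = 9401      # placeholder - use the AppServer's actual net.tcp port
CHECK_EVERY = 30         # seconds between checks
MAX_WAIT = 15 * 60       # give up after 15 minutes

sock = socket.create_connection((VIP_HOST, NET_TCP_PORT), timeout=10)
start = time.time()
print("Connected through the VIP, idling...")

while time.time() - start < MAX_WAIT:
    time.sleep(CHECK_EVERY)
    try:
        readable, _, errored = select.select([sock], [], [sock], 0)
        # A FIN shows up as a zero-byte read; an RST raises ConnectionResetError.
        if errored or (readable and sock.recv(1, socket.MSG_PEEK) == b""):
            print(f"Connection closed by the peer after ~{int(time.time() - start)}s idle")
            break
    except OSError as exc:
        print(f"Connection reset after ~{int(time.time() - start)}s idle: {exc!r}")
        break
else:
    print(f"Connection still open after {MAX_WAIT} seconds idle")

sock.close()
```

Running the same probe against 10.11.3.17 directly would show whether any drop comes from the balancer or from the app server itself.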

For the retries, we didn't observe any - most of the traces, if not all, show 0. We will keep monitoring and check.

But at random times we get the exception below in the client traces:

System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '02:50:00'. ---> System.IO.IOException: The write operation failed, see inner exception. ---> System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '02:50:00'. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
at System.ServiceModel.Channels.SocketConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)
--- End of inner exception stack trace ---
at System.ServiceModel.Channels.SocketConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)
at System.ServiceModel.Channels.BufferedConnection.WriteNow(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, BufferManager bufferManager)
at System.ServiceModel.Channels.BufferedConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)
at System.ServiceModel.Channels.ConnectionStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.Net.Security.NegotiateStream.StartWriting(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.NegotiateStream.ProcessWrite(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
--- End of inner exception stack trace ---
at System.Net.Security.NegotiateStream.ProcessWrite(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.NegotiateStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.ServiceModel.Channels.StreamConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)
--- End of inner exception stack trace ---

Server stack trace:
at System.ServiceModel.Channels.StreamConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)
at System.ServiceModel.Channels.StreamConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout, BufferManager bufferManager)
at System.ServiceModel.Channels.FramingDuplexSessionChannel.OnSendCore(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.TransportDuplexSessionChannel.OnSend(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.OutputChannel.Send(Message message, TimeSpan timeout)
at System.ServiceModel.Dispatcher.DuplexChannelBinder.Request(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at Ice.Contracts.ReportMonitorSvcContract.GetRowsKeepIdleTime(String whereClauseSysRptLst, Int32 pageSize, Int32 absolutePage, Boolean& morePages)
at Ice.Proxy.BO.ReportMonitorImpl.GetRowsKeepIdleTime(String whereClauseSysRptLst, Int32 pageSize, Int32 absolutePage, Boolean& morePages)

Regarding the load balancing configuration (as in the ARR topic): previously we had requests distributed across both servers; now, once a client first contacts the load balancer it is assigned to one app server for all of its sessions, and this should bring some improvement.

But I'm looking to see if there are any specific recommendations or best practices for what is required on hardware load balancers to ensure smooth traffic flow and avoid performance issues.

Any feedback on my latest reply?

Is that a consistent 2-hour-50-minute timeout, or does it only happen on certain servers? (02:50:00 works out to 10,200 seconds - the same value as the idle timeout you mentioned setting above.)

I think that error is saying the server had been listening for almost three hours and recycled the connection - the client got disconnected. That should be fine, though - the client reconnects. LONG-running scenarios (the Posting Engine?) have some long connections that handle this manually, but otherwise there is not enough information here. It would need a larger event log dump to see the context, so I would recommend talking to support.