Remote/VPN Performance - Looking for Suggestions

Thanks for the input, I appreciate it.

Would any of you be willing to share an average load time for a popular form? Job Entry, Quote Entry, etc?

I would like to know what is a realistic goal. I don’t want to chase some level of performance that is not attainable.

We had an issue with a SonicWall misconfiguration: it would occasionally crash our MRP runs and slow down clients because it was filtering internal<->internal traffic. I’m not sure exactly what the fix was, but once they corrected the SonicWall configuration, everything flew like a rocket.

I am curious if you disabled SonicWall for an hour, if your speeds would be much better.

Since our VPN endpoints are the SonicWalls themselves we won’t be able to disable them entirely, but we will definitely review the configurations at each location.

Thanks

With the PDT, you want to be looking at the Network Test, which downloads a set amount of data from the server using your client protocol. It is really designed to be run only from a client (the number of times people seem to run it on their server as if to prove something beggars belief).

So install the PDT on a client on your LAN, run the test and then compare with a client at the remote location.

Also, I can’t see what your main site version is, and which protocol you are using - https may work better over a WAN than Net.tcp (this can be tested with the PDT Network Test tool).
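If you want a crude cross-check outside the PDT, timing a fixed HTTPS download from a client at each site gives you a similar LAN-vs-WAN comparison. A minimal sketch, assuming Python with the requests package is available on the client; the URL is a placeholder for any HTTPS endpoint on or near your app server:

```python
# Rough LAN-vs-WAN comparison, independent of the PDT Network Test.
# TEST_URL is a placeholder: any HTTPS endpoint on (or near) the app
# server that returns a reasonably sized response will do.
import time
import statistics
import requests

TEST_URL = "https://your-appserver.example.com/health"  # placeholder URL
RUNS = 10

timings = []
for _ in range(RUNS):
    start = time.perf_counter()
    resp = requests.get(TEST_URL, timeout=30)
    resp.raise_for_status()
    timings.append(time.perf_counter() - start)

print(f"runs: {RUNS}, bytes per run: {len(resp.content)}")
print(f"median: {statistics.median(timings) * 1000:.0f} ms, "
      f"max: {max(timings) * 1000:.0f} ms")
```

Run it from a LAN client and from a VPN client and compare the medians; it won’t replace the PDT Network Test, but it helps separate raw network behaviour from anything Epicor-specific.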

I’m having multiple errors when trying to use the PDT software. It’s happening on any computer I install it on. One error in Application Setup when attempting to ‘Read Configuration from SysConfig’, and another when attempting to use the Config Check or Network Diagnostics tools if the Connection Method is set to anything other than ‘REST’.

Epicor Support is telling me to only use the REST Connection Method, and that I should not use the ‘Read configuration from Sysconfig’ button.

Doesn’t that defeat the purpose of the tool? I thought the goal was to use it to test the settings of the SysConfig file that is in use by our clients? Using the same Connection Method / Binding?

PDT was initially just an internal dev tool as far as I am aware, so there may be a few glitches.

Error reading Config from Sysconfig: this can be bypassed by manually entering the URI, connection method, user name & password, and Client Directory.

Regarding the connection method, this will only work with the connection methods you have set up against your appserver. So Kinetic 202x and higher only support REST. But if you are still on 10.1.400, then you can use the older bindings.

Oh and it may be worth making sure you have downloaded the latest version from EpicWeb.

Here are my results:

  • All tests were run from client workstations.
  • All tests were run ‘Uncompressed’.
  • Iowa is the location of the App Server.
  • Iowa (VPN) is the VPN connection from my home, using a 4G hotspot.
  • California is our other facility, connected to Iowa via SonicWall Site-to-Site VPN.

According to the text in the Network Diagnostics tool, I am well within acceptable ranges. Much lower in fact.

Just to quantify the performance issues I’m referring to, I collected some timings for starting the client and for the Job Entry and Part Entry forms, both on initial open and on re-open.

  • Epicor Kinetic 2022.2.8
  • Windows Authentication
  • Active Homepage start screen

How does this compare to you all? Are these load times quite long or is it just us?

To me, this doesn’t seem to align with the Network Diagnostics results and the supposed target values that Epicor states in that tool.

I would say those times do align, to a degree, with your PDT results. The PDT results show that the network time is approximately 3-4x the Iowa times (and I would ignore the ‘Targets’ mentioned on the screen; I don’t think they have been updated since the E9 days, and I believe compression went out with E10).

So client load times being 2x longer in California sounds about right.

I would compare that with the results you are seeing with your Mexico system - the PDT Network Test should be a pretty similar test to the Kinetic client.

Other things you may want to consider: are the forms caching properly on the local disk, and is RDS the way to go for remote users?

You’re right, the load times are roughly proportional to the Network Diagnostics results.

My problem is, I have nothing to benchmark against. Especially if the numbers in the PDT are old and no longer relevant.

What should our load times be, how do I compare to other Epicor systems? Are these load times abnormally high? They certainly seem that way from the user’s perspective. Waiting 30+ seconds for a form to load is pretty jarring to a user’s workflow.

What should your times be? There’s an answer just there - “Waiting 30+ seconds for a form to load is pretty jarring to a user’s workflow.”

Not helpful I realise, but your system can deliver times significantly faster. So what are your options?

Well, as I said, check that the local client cache is being used properly. By default it is c:\ProgramData\Epicor\Server-Port… (or similar). If a user can’t write there, then they have to download the full forms each time, which significantly increases the workload, and all the wonderful info that @Rich posted about caching won’t help in the least. If you don’t want to give them access to this location, you can specify an ‘AlternateCacheLocation’ in the sysconfig file (though this does seem to be more of an issue on RDS servers, which brings me on to…). A quick writability check is sketched at the end of this post.

Consider RDS / Citrix / other remote desktop options: generally these have much lower bandwidth requirements (they only have to draw the screen), and you are likely to approach “local” speeds.

Dare I say it, try the Kinetic versions of the screens - I suspect these will be faster.

Work on whatever is making your network test 2x as slow over the WAN: Epicor clients don’t have huge bandwidth requirements, so the current differences seem excessive, though it’s been a while since I drew comparative data for sites over a WAN vs a LAN. I do, however, seem to recall a user who only felt the need to go to RDS when they were communicating from OZ to EU.
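On the caching point above, here is a quick way to confirm the cache folder is actually writable for the logged-in user. A minimal sketch; the path below is an example only - the real folder name depends on your server name/port (or the AlternateCacheLocation you set in the sysconfig):

```python
# Quick check that the Epicor client cache folder is writable for the
# current user. The path is an example placeholder; the real folder
# depends on your server name/port or AlternateCacheLocation setting.
import os
import tempfile

cache_dir = r"C:\ProgramData\Epicor\your-server-8080"  # example path

if not os.path.isdir(cache_dir):
    print(f"Cache folder does not exist: {cache_dir}")
else:
    try:
        # Try to create (and automatically delete) a temp file there.
        with tempfile.TemporaryFile(dir=cache_dir):
            pass
        print(f"Cache folder is writable: {cache_dir}")
    except OSError as exc:
        print(f"Cache folder is NOT writable: {cache_dir} ({exc})")
```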

I appreciate the input. Thanks.

Of course. But is it better to keep them as close as possible - i.e. both on one beefy server - or to separate them onto two servers? Maybe I’m oversimplifying here, but it’s basically a question of whether the lack of resource contention on split servers makes up for the distance you add between SQL and the AppServer, right?

I’ve heard a lot of debate on this but the consensus amongst the community seems to be that split is better. Not only better but so much better that anyone suggesting otherwise should be fired for incompetence. Some people really are that vocal about it, so I decided to listen to them…

We are in Azure and just did an upgrade from 10.2.700 to Kinetic 2022.2. On 10.2 we had one beefy server for SQL and the AppServer, but when we moved to Kinetic we decided to split SQL and the AppServer onto two machines. We are still throwing a lot of resources at both servers, but performance has definitely decreased. The P&D tool’s SQL Query Speed Test fails on us (it’s 5-6 times slower than the old prod box), and when we look at the server logs there are many, many method calls exceeding their millisecond thresholds (we’ve ruled out BPMs as culprits).

Of course, there are a million variables here, but as I rule more and more things out, I am left wondering if splitting servers was the wrong call. I’m optimistic that there is still some sort of misconfiguration we can find and correct, but we’re running out of things to look at. I hate to be the SQL/developer guy who blames his problems on networks and infrastructure, but that’s all I’m coming up with… Any thoughts on this are much appreciated!
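To put a number on the SQL hop itself, I’m considering timing trivial round trips from the app server to SQL. A minimal sketch; the connection details are placeholders, and it assumes pyodbc and a SQL Server ODBC driver are installed on the app server:

```python
# Rough measure of app-server -> SQL round-trip latency, run from the
# app server. Connection details are placeholders.
import time
import statistics
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-sql-server;DATABASE=master;"  # placeholder server name
    "Trusted_Connection=yes;"
)

with pyodbc.connect(conn_str, timeout=10) as conn:
    cur = conn.cursor()
    timings = []
    for _ in range(200):
        start = time.perf_counter()
        cur.execute("SELECT 1").fetchone()  # trivial query: mostly network time
        timings.append(time.perf_counter() - start)

print(f"median round trip: {statistics.median(timings) * 1000:.2f} ms, "
      f"p95: {sorted(timings)[int(len(timings) * 0.95)] * 1000:.2f} ms")
```

Roughly speaking, same-box or same-subnet round trips should come back well under a millisecond; numbers much higher than that point at the network path rather than SQL itself.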

Do you know if the VMs were configured with Accelerated Networking? It would make them “closer” and it’s free.
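If you want to check without clicking through the portal, something along these lines should show the flag per NIC. It assumes azure-identity and azure-mgmt-network are installed; the subscription, resource group, and NIC names are placeholders:

```python
# Check whether Accelerated Networking is enabled on each VM's NIC.
# Subscription, resource group, and NIC names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<your-subscription-id>"
resource_group = "<your-resource-group>"
nic_names = ["appserver-nic", "sqlserver-nic"]  # placeholder NIC names

client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

for name in nic_names:
    nic = client.network_interfaces.get(resource_group, name)
    print(f"{name}: accelerated networking = {nic.enable_accelerated_networking}")
```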

Old thread I know, but in your case I’d spring for RDS (even though I passionately hate RDS)

  • MES load time: around 3 seconds
  • Job Entry: 5 seconds
  • Part: 4 seconds
  • Any of these re-opened: sub-second

Thank you. Looks like Accelerated Networking was already enabled.

Turns out the servers were on different vNets, so Accelerated Networking wasn’t doing anything for us. We’ll be correcting that, and we’re also looking into Azure Proximity Groups. I’m optimistic that will make a big difference.
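In case it helps anyone checking the same thing, a rough sketch along these lines should show which vNet each VM’s NIC sits on and whether the VMs share a proximity placement group. It assumes azure-identity, azure-mgmt-compute, and azure-mgmt-network are installed; the subscription, resource group, and VM names are placeholders:

```python
# Sanity-check that both VMs sit on the same vNet and (optionally) share
# a proximity placement group. Names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<your-subscription-id>"
resource_group = "<your-resource-group>"
vm_names = ["appserver-vm", "sqlserver-vm"]  # placeholder VM names

cred = DefaultAzureCredential()
compute = ComputeManagementClient(cred, subscription_id)
network = NetworkManagementClient(cred, subscription_id)

for vm_name in vm_names:
    vm = compute.virtual_machines.get(resource_group, vm_name)
    ppg = vm.proximity_placement_group.id if vm.proximity_placement_group else None
    for nic_ref in vm.network_profile.network_interfaces:
        # NIC IDs look like /subscriptions/<sub>/resourceGroups/<rg>/.../<nic>
        parts = nic_ref.id.split("/")
        nic_rg, nic_name = parts[4], parts[-1]
        nic = network.network_interfaces.get(nic_rg, nic_name)
        # Subnet IDs look like .../virtualNetworks/<vnet>/subnets/<subnet>
        subnet_id = nic.ip_configurations[0].subnet.id
        vnet = subnet_id.split("/virtualNetworks/")[1].split("/")[0]
        print(f"{vm_name}: vNet={vnet}, proximity placement group={ppg}")
```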

Azure Proximity Groups definitely do make a difference. As for the PDT SQL Test failing, you don’t say which VM sizes you have gone for, but the D Series and E Series VMs pass the SQL Test adequately (when sized appropriately for your user count).

The Millisecond Threshold in the PDT is NOT a target or a marker of ‘bad performance’; it is a configurable parameter used when you analyse the logs, to let you easily pull out ‘long running’ method calls. I would never expect MRP to come in under the threshold.

That said, it is set at 3, because after that users get a little impatient waiting.

Thanks James. I understand the millisecond threshold; it was poor phrasing on my part.

Moving DB and Appserver to the same vNet has made a big difference. Config check passes now and we’re seeing a large reduction in method call durations in the server logs.

Proximity Groups may be our next step.