Performance Problems and Chatty LINQ Queries

On several occasions we’ve had to investigate performance problems in Epicor 10 (running trial balance reports, running AFR reports, doing data loads, etc.).

Performance issues seem hard to diagnose, given the layered architecture. But at the very back end, in SQL Server, we can run SQL Server Profiler and view the ERP activity as the database experiences it.

Whenever we do this, it shocks me how many queries are generated and sent from the application server to the database. For a single user, it can be 1000-4000 SQL batches per second, depending on what they are doing. Most of these look like tiny queries generated by LINQ and Entity Framework.
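To make the "many tiny queries" pattern concrete, here is a minimal sketch of the difference between a chatty, per-row access pattern and a single set-based query. It uses Python with an in-memory SQLite database purely as a stand-in; the `account` and `journal` tables, the `CountingConnection` wrapper, and the function names are all hypothetical and are not Epicor's schema or code. It only illustrates how a loop that issues one query per row multiplies the number of statements the database sees.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts statements sent through query()."""

    def __init__(self, conn):
        self.conn = conn
        self.statements = 0

    def query(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params).fetchall()


def build_demo_db():
    """Create a hypothetical two-table schema: 100 accounts, 3 journal rows each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE journal (
            id INTEGER PRIMARY KEY,
            account_id INTEGER,
            amount REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO account (id, name) VALUES (?, ?)",
        [(i, f"acct-{i}") for i in range(1, 101)],
    )
    conn.executemany(
        "INSERT INTO journal (account_id, amount) VALUES (?, ?)",
        [(i, 10.0) for i in range(1, 101) for _ in range(3)],
    )
    conn.commit()
    return CountingConnection(conn)


def balances_chatty(db):
    """One query for the account list, then one more query per account."""
    totals = {}
    for (account_id,) in db.query("SELECT id FROM account"):
        rows = db.query(
            "SELECT SUM(amount) FROM journal WHERE account_id = ?",
            (account_id,),
        )
        totals[account_id] = rows[0][0]
    return totals


def balances_set_based(db):
    """The same totals computed by the database in a single statement."""
    rows = db.query(
        "SELECT account_id, SUM(amount) FROM journal GROUP BY account_id"
    )
    return dict(rows)


if __name__ == "__main__":
    db = build_demo_db()
    balances_chatty(db)
    print("chatty statements:", db.statements)
    db.statements = 0
    balances_set_based(db)
    print("set-based statements:", db.statements)
```

Both functions return the same totals, but the chatty version sends one statement per account plus the initial list query, while the set-based version sends one. In a real client/server setup each of those statements is also a network round trip.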

Taking one step back to the “application server” tier (and the SSRS server), the CPU and network both appear to be quite active, and this seems to be mostly related to the chatter between the application server and the SQL database.

Can somebody please confirm whether these observations sound familiar? What are the best performance counters to use when troubleshooting performance in the ERP (versions 10.1.600.19 and 10.2.200.10)? Should we be looking primarily at the database or at the application server to find the true bottleneck? Or does it come down to the number of network round-trips? Is there a good way in Profiler to isolate the activity of one business operation from another? We have a very fast 10 Gbit connection between the two tiers, so it doesn’t seem to me like the network should be the problem, but I suppose that is possible if the application code gets chatty enough. I realize that my questions are very broad, but it would be nice to hear how others troubleshoot and diagnose performance issues. Also, I’m wondering how severe these types of performance issues would need to get before Epicor would be willing to take a support case and work on a software fix.
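As a rough way to frame the round-trip question: link bandwidth (10 Gbit) says little about how many sequential request/response exchanges fit into a second; per-round-trip latency does. The sketch below is back-of-the-envelope arithmetic only. The function names are hypothetical, and the latency figures are illustrative assumptions, not measurements from this environment.

```python
def round_trip_overhead_seconds(round_trips, latency_ms):
    """Time spent purely waiting on the network when round trips run one after another."""
    return round_trips * latency_ms / 1000.0


def max_sequential_rate(latency_ms, server_ms=0.0):
    """Upper bound on batches per second for one session that waits for each reply.

    latency_ms: assumed network round-trip time per batch.
    server_ms:  assumed time the database spends executing each batch.
    """
    return 1000.0 / (latency_ms + server_ms)


if __name__ == "__main__":
    # Illustrative latencies only; substitute measured values.
    for latency_ms in (0.1, 0.25, 0.5, 1.0):
        wait = round_trip_overhead_seconds(4000, latency_ms)
        rate = max_sequential_rate(latency_ms)
        print(f"latency {latency_ms} ms: 4000 round trips wait {wait:.2f} s, "
              f"at most {rate:.0f} sequential batches/s")
```

Under these assumptions, a single session issuing 4000 sequential batches with a 0.25 ms round trip would spend a full second just waiting on the network, regardless of how much bandwidth the link has. Measuring the actual round-trip time between the two tiers would show whether this is the limiting factor.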

What hardware platform are you running on? I have seen issues on different hardware stacks (such as blade architecture) with VMware where each guest is on a different physical host. Although the backbone was 10Gb, it never performed like it. Rather than troubleshoot, some sites chose affinity rules to keep the app and db together, which produced a significant improvement. Finding the root cause would have been better in these cases, but sometimes people prefer the quick and easy solution without worrying about future impact.

You mean the app and db ended up living on the same host?

We use separate Hyper-V VMs for the appserver and SQL. They don’t run on the same host. I’m pretty certain the network is correctly configured between them. I don’t think that putting them on the same host would improve things substantially, but I guess we can try.

How far did you get in finding the root cause before moving the two onto the same host? It sounds like you must have at least isolated the problem to a certain degree, if you decided that being on a single host would help. Did you take any measurements?