Poor Epicor Performance - Is there an SQL Server Indexing Guide?

Hello,

I have only just discovered this site and signed up. I am a SQL Server DBA that is tackling poor performance of an on-prem Epicor 10.2.700.28 running on a fully patched SQL 2017 Enterprise Edition.

I have been tasked with speeding things up, a process that used to take 3 hours now takes 16 and suffers from long running queries often causing deadlocks and blocking. There has been little to no database maintenance, and I am addressing this now by introducing statistics updates during the week, and larger longer running Index rebuilds etc. at the weekend. Previous to this little maintenance had been done at all. This Epicor has apparently been through a couple of upgrades but the last one was quite some time ago.

I am using SQL tools to identify missing indexes etc and I am at the start of my mission. I am just getting to know Epicor, so I do not now too much about it at the moment.

My first question is whether Epicor supply any kind of official Index guides to speed up Epicor queries? Or maybe built in analysis which helps with such things.

Apart from that, is there any standard things people with more experience with Epicor do to keep it running as efficiently as possible?

Thank you for any help or advice given, I’ll appreciate all of it.

Kind regards,

CG.

Epicor’s official position is that they are not responsible for SQL maintenance for on-prem customers. Index/stats maintenance, backups, etc are all up to the customer. Routine stats updates with full scan is arguably the most important thing.

Explain ā€œPoor performanceā€ a little more. When is the performance poor? Is it only running certain queries? Is it a certain process? Is it all users all of the time? This can be anything from poor sql maintenance, to a bad query, to lack of hardware.

What’s the hardware/server setup? How many users?

What process went from 3 hours to 16 hours?

There’s a Performance and Diagnostic Tool that’s available from Epicor. I don’t know if you can download it right from the Epicor Admin Console in 10.2, but it’s on Epic Web. That is a good place to start.

Be careful adding custom indexes. They can get deleted during upgrades. There’s also a lot to Epicor, so while an index might make a single query run better, it might tank performance during a large process like MRP.

There’s a menu called Data Dictionary Viewer, which shows you all of the keys for the tables.

If it’s custom queries/dashboards that are running poorly you want to make sure they’re joined correctly. Always join tables on Company first, except the Customer table in certain situations. In BAQ designer if you go to Actions, you can go to execution settings and add options like recompile, or optimize for unknown to force a new execution plan and handle parameter sniffing issues.

You came to the right place. There are a lot of people here with a lot of experience. So give as much detail as possible and I’m sure somebody will be able to point you in the right direction.

The Epicor Admin guide does provide some some guidance on setting up maintenance plans, however. I’d suggest you take a look at Brent Ozars website and get you hands on Sp_blitz and Sp_blitzfist along with the ola hellengren scripts which are an improved version of the standard plans (plus more). Brent Ozar scripts upon registering give you access to some free online courses to help you learn about indexing and system tuning,

If you have not had any maintenanace plans setup then I would start with getting reindexes setup, but before you do that, record the time it takes for these long running areas first then apply the updates and run again. To probe if the change made any difference..

As @cpilinko mentioned some more. Information around you long running task would be helpful to so we could help better.

One of the useful things that the Brent Ozar scripts do is provide a list of priorities to work on, prioritising the worst performing changes first.

Thank you for getting back, I will need to get someone else to run those items you mentioned, but that will not be a problem. The main slow down is the end of month processes specifically. I did some research into the high compilations today, it seems its a combination of small queries that are run almost constantly. The issue I have is not knowing if its an Epicor statement or not, AI (cringe) say it is and as such there is not much I can do about it. One thing it suggested was that (and it said it would only be a small improvement) was that the Epicor app servers may have differing connection strings, that different session settings (ANSI_NULLS, ARITHABORT, etc.) cause SQL Server to store separate plans, also that the logins may not have the Epicor database as the default database for the login. This was brought up because there appeared to be a lot of cross database querying because there is a process that uses Object_ID for an Ice. UD table making constantly repeated metadata checks. Optimize for ad hoc workloads is already on, as is forced parameters. I think there is a lot of parameter sniffing as well, because what works for a small query does not work for a much larger one.

There is also a CDC running on the Customer table (actually several hot tables but customer is 349 columns wide), so when inserts are done, as the table is wide, it is taking along time for that process. Again, I’m not sure if CDC is there because of Epicor or some other reason (I will be asking them).

I am going to answer Hally question next. But what are your thoughts? Thank you so much for answering, that is awesome.

One thing I have is queries that are running 16k (yes, 16 thousand!) times a minute on an ICE table. That cannot be right. Why is it executing over and over like that? I have several doing this all to do with a table in the ICE schema.

Hello Hally, thank you getting back. As part of my taking this on, I have Ola scripts running already, I have a big maintenance windows coming up so hoping to get some good stuff done there. If you can recommend any Epicor table big hitters that would be great. I am quite familiar with the first responder kit, and funnily enough I installed that on the server just last night to try to get to the bottom of things, so I’m glad I’m already using what you recommended. We are experiencing a few deadlocks, but that might to an external process. The database is several TBs in size.

ā€˜Information around you long running task would be helpful to so we could help better.’

Happy to give some info, is there anything in particular that would be useful for me to get?

HI,

I did some index work with Claude and it gave me some good suggestions, you can figure this out yourself its just easier to ask Clause to cross ref the missing index query with the ā€œRecent expensive Queriesā€ list

Are you defragging you current index’s? And are you running Solid-state or HDD on your SQL Server?

I think you answered the long running task thing. I am guessing that ice table is ice. Systask. Systask param or such.

Which wants me to ask what do your System agent schedules and tasks look like, have you go a situation where a user has scheduled a whole heap of tasks as recurring around the end of the month,. At capture cos/WIP Inventory WIP revon or customer statements. Several Tb size DB sounds very large, what does the disk space report tell you, have you got a massively huge log file.

To find out more about your CDC (Change Data Capture) you will need to find out what that is setup and connecting to. I’m not that familiar with the tool but essentially it allows synching of external systems with Epicor. Perhaps the schedule on the log processor is too frequent.

As far as big hitter tables, it really does depend on how you use your system and what modules you have licenced, big active tables typically are TranGLC, GLJrnDtl, PartTran. If you use configured parts then tables that start with PC (the main one being PCValueIn IIRC). Going off the top of my head for some of these tables.

If you have external reporting tools like Fasclose, SSRS, power BI, or OData Excel spreadsheets (I am sure there are others) that are more commonly used around the end of the month then the SP_Blitzfirst should show you the longest running queries pretty quickly.

The Sp_blizt scripts will also give you the query and the database that the queries are being executed against so it should be helpful to narrow down where the problems are coming from, as well as giving you a quick snapshot of the health of your sql server.

If you have not run the PDT tool then I would also be running that to check your app server config. Do make sure you check the KBs as to all of the detail around each reported value, in some instances it does report incorrectly depending on the SQL server version you are using.

Futher more the output from sp_blitz are recommendations and you should do your own research, or seek expert advice before just applying the change.

I hope that helps

And dare I say that SP_Blitz also has a AI option that will generate rather detailed prompts that you can use in any AI tool to help you. Here is a link

Hey Mark, thank you for replying. I am using Copilot yes, and it is helping with the indexing, I give it the query and a shot of the execution plan as well. I am currently reorganizing my indexes at the weekends along with updating statistics, and I am doing mid-week statistics updates as well. I’ve only just got going on it so I’d say the plans still had a fair bit to cover. Drives are SSD as far as I’m aware, but I’ll check and come back on that.

Currently I cannot allow any info to go out to AI. I use it and remove certain information.

The execution plan is useful. You can add index’s on at anytime during the day, I dont bother with downtime as the system will build and just not use it until it’s done. You may need to clear your query cache. Which Version of SQL are you on and are you using the Legacy Cardinal Estimator?

I did a writeup and a thread somewhere that goes though some of the settings to apply to SQL to help with performace.

FYI below is what we use as a weekly main script. It’s basically the default MS SQL Server 2005 one (still the best scripti for the job :slight_smile: ) passed though AI for any improvements.

USE Epicor10Live; --change this to you epicor database
SET NOCOUNT ON;
SET XACT_ABORT ON;

RAISERROR(ā€˜=========================================’,0,1) WITH NOWAIT;
RAISERROR(ā€˜WEEKEND MAINTENANCE STARTED’,0,1) WITH NOWAIT;

RAISERROR(ā€˜=========================================’,0,1) WITH NOWAIT;


– CHECKDB (PHYSICAL ONLY)

RAISERROR(ā€˜Running CHECKDB (PHYSICAL_ONLY)…’,0,1) WITH NOWAIT;
–DBCC CHECKDB WITH PHYSICAL_ONLY, NO_INFOMSGS;
RAISERROR(ā€˜CHECKDB complete.’,0,1) WITH NOWAIT;


– Capture Fragmentation (>10%)

IF OBJECT_ID(ā€˜tempdb..#frag’) IS NOT NULL DROP TABLE #frag;

SELECT
OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName,
OBJECT_NAME(ps.object_id) AS TableName,
i.name AS IndexName,
ps.avg_fragmentation_in_percent,
ps.page_count
INTO #frag
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, ā€˜LIMITED’) ps
JOIN sys.indexes i
ON ps.object_id = i.object_id
AND ps.index_id = i.index_id
WHERE
ps.index_id > 0
AND ps.page_count > 1000
AND ps.avg_fragmentation_in_percent > 10;


– Rebuild

DECLARE @Schema sysname,
@Table sysname,
@Index sysname,
@sql nvarchar(max);

DECLARE rebuild_cursor CURSOR FOR
SELECT SchemaName, TableName, IndexName
FROM #frag;

OPEN rebuild_cursor;
FETCH NEXT FROM rebuild_cursor INTO @Schema, @Table, @Index;

WHILE @@FETCH_STATUS = 0
BEGIN
RAISERROR(ā€˜Rebuilding %s.%s (%s)…’,0,1,@Schema,@Table,@Index) WITH NOWAIT;

SET @sql = '
ALTER INDEX [' + @Index + '] ON [' + @Schema + '].[' + @Table + ']
REBUILD WITH (SORT_IN_TEMPDB = ON, MAXDOP = 8);
';

EXEC(@sql);

FETCH NEXT FROM rebuild_cursor INTO @Schema, @Table, @Index;

END

CLOSE rebuild_cursor;
DEALLOCATE rebuild_cursor;

RAISERROR(ā€˜Index rebuild phase complete.’,0,1) WITH NOWAIT;


– FULLSCAN Statistics

RAISERROR(ā€˜Updating statistics WITH FULLSCAN…’,0,1) WITH NOWAIT;

DECLARE @tbl nvarchar(500);

DECLARE stats_cursor CURSOR FOR
SELECT QUOTENAME(SCHEMA_NAME(schema_id)) + ā€˜.’ + QUOTENAME(name)
FROM sys.tables
WHERE is_ms_shipped = 0;

OPEN stats_cursor;
FETCH NEXT FROM stats_cursor INTO @tbl;

WHILE @@FETCH_STATUS = 0
BEGIN
RAISERROR(ā€˜Updating stats on %s…’,0,1,@tbl) WITH NOWAIT;
EXEC(ā€˜UPDATE STATISTICS ’ + @tbl + ’ WITH FULLSCAN;’);
FETCH NEXT FROM stats_cursor INTO @tbl;
END

CLOSE stats_cursor;
DEALLOCATE stats_cursor;

RAISERROR(ā€˜=========================================’,0,1) WITH NOWAIT;
RAISERROR(ā€˜WEEKEND MAINTENANCE COMPLETE’,0,1) WITH NOWAIT;

RAISERROR(ā€˜=========================================’,0,1) WITH NOWAIT;

Hello Simon,

The ice table is a user table, a UD table. Yes I’ll be leaning on Blitz first and will be testing the recommendations in a lower environment before putting them forward for production use. Lof is backed up every 15 minutes so it tends to not get too big. I do know already the TempDB is hideously misconfigured, so that is also getting sorted out. And those tables are very familiar already, and I am focusing on those for maintenance. I will be asking them about the CDC to find out why it is there, and for whom. I have asked to get the PDT tool rerun to check anything, I’ve not seen or heard of it before coming here so I’ve no idea what to expect from it.

Query usage will just show what queries are being used on you DB and most of them will probably be Epicors. Either way I asked Clause to summarise my last work on Index Improvement:

AI Content Below

1. Recent expensive queries

This is the most useful starting point — shows what’s actually consuming CPU on the server right now:

sql

SELECT TOP 25
    qs.execution_count,
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.total_elapsed_time / 1000 AS total_duration_ms,
    qs.total_logical_reads,
    qs.total_physical_reads,
    
    (qs.total_worker_time / qs.execution_count) / 1000 AS avg_cpu_ms,
    (qs.total_elapsed_time / qs.execution_count) / 1000 AS avg_duration_ms,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    
    qs.creation_time,
    qs.last_execution_time,
    
    SUBSTRING(st.text,
        (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1
    ) AS query_text,
    
    DB_NAME(st.dbid) AS database_name,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE st.dbid = DB_ID('Epicor10Live')
ORDER BY qs.total_worker_time DESC;

Change the ORDER BY to find different things:

  • qs.total_worker_time DESC — biggest cumulative CPU drain (default)
  • (qs.total_worker_time / qs.execution_count) DESC — slowest individual queries
  • qs.total_logical_reads DESC — biggest IO consumers
  • qs.execution_count DESC — death by a thousand cuts

2. Missing index report

This is the DMV’s opinion on what indexes would theoretically help. Useful as input, not as gospel — many of its suggestions don’t correlate with queries that actually matter:

sql

SELECT TOP 50
    DB_NAME(mid.database_id) AS [Database],
    OBJECT_SCHEMA_NAME(mid.object_id, mid.database_id) AS [Schema],
    OBJECT_NAME(mid.object_id, mid.database_id) AS [Table],
    migs.user_seeks,
    migs.user_scans,
    migs.last_user_seek,
    migs.avg_total_user_cost,
    migs.avg_user_impact,
    CONVERT(DECIMAL(18,2),
        migs.avg_total_user_cost
        * migs.avg_user_impact
        * (migs.user_seeks + migs.user_scans)
    ) AS improvement_measure,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    mid.statement
FROM sys.dm_db_missing_index_groups        mig
JOIN sys.dm_db_missing_index_group_stats   migs ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details       mid  ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID()
ORDER BY improvement_measure DESC;

The recommendation process

Quick reminder of how we used these together — you went through this with me before, but it’s worth restating:

  1. Run the expensive queries (#1) first
  2. Pick the biggest-impact queries you can do something about
  3. Cross-reference against the missing index report (2)
  4. Only act on suggestions that correlate with a query you’ve identified as expensive
  5. Check what indexes already exist on the candidate table before creating anything
  6. Refine the suggestion (drop inequality columns from the key, add missing INCLUDE columns from the actual SELECT list, default to PAGE compression, drop ONLINE since you’re on Standard)
  7. Deploy in a maintenance window with SET LOCK_TIMEOUT 30000
  8. Verify usage afterwards

Send both outputs through and I’ll have a look at what’s surfacing.

One thing to mention while you’re at it — please also run this so we can see whether the SQL service has been bounced recently, because that resets all the DMV counters and would change how we read the numbers:

sql

SELECT sqlserver_start_time FROM sys.dm_os_sys_info;

If the start time is days ago, the data is meaningful. If it’s hours ago, we’ll need to give it time to accumulate before drawing conclusions.

Tahnk you, Mark. I’ll stick with the Ola plan for now as I’ve only just put that in place. I’m on a fully patched SQL 2017 Enterprise, at database level legacy cardinality estimation is off.

CDC is there for integration into other systems like Enterprise Search IF your pushing changes to it and not having ES pull changes. CDC subscriber management till tell you if it’s in use and you can turn it off in CDC Table Management. When Kinetic is installed on the server it should create a script to delete old rows

We find that SQL runs better with it on as it allows for slighter older queries to get optimised correctly.

The prompts don’t include any PII, you get the ability to preview the prompts before you send it to the AI of your choice.

For the most part the output of sp_blitz reports on information from the syatemn dmvs so it is all statistical. You can remove or substitute some of the column information if you had concerns.

Just another note the recommend indexes that get generated from Ssms toolsoften have the column names in the wrong order. Brent Ozar mentions this in his documentation.

Be cautious with how this is scoped. The compute workload can be surprisingly disruptive.

I think you’re on top of this… but rebuild only during off hours or users will notice. reorganize can be selectively run often, even on live data to a cautious extent, and usually does everything one expects from rebuild without the default impact of a statistics overhaul. Splitting rebuild into reorganize + update statistics allows you to keep those indexes tightly organized, and target updating stats where and when it’s needed and helpful.

Ola’s solution is the golden standard to start from. It’s also a great resource to sit down and read, reasonably transparent and lots to learn. Don’t be afraid to cherry pick and fine tune. Some of the metrics you’d want for checking your work are touched on inside.

ā€˜Splitting rebuild into reorganize + update statistics allows you to keep those indexes tightly organized, and target updating stats where and when it’s needed and helpful.’

Perfect! This is what I am doing.

ā€˜We find that SQL runs better with it on as it allows for slighter older queries to get optimised correctly.’

Interesting! I’ll get the maintenance up to scratch, deal with the indexes (I have an awful lot of duplicate and unused indexes to deal with) and then I’ll look into that. I do not want to make too many changes in one go, I’ll be moving the TempDB in a next couple of weeks as well, so hoping to see a difference there.