Process MRP Failing

web.config timeout?

Normal is 9 hours, though, so 2 seems odd to me.

1 Like

A couple months back we were getting a timeout error about once per week. We discovered the timeout setting in web.config and increased it to 10 hrs, and that address that particular error. The 2-hr trigger for ‘PROGRAM CANCELLED’ must be due to something else.

I would consider that but the first failure in this series occurred Fri eve and end of DST was early morning hours of Sun, so I’m thinking it’s something else.

I figured, but had to ask.

Any updates install on app or SQL servers at that time? I know I got bit by that many times; even the slightest restart of .NET (I think) would do it. This is way outside my comfort zone, so hopefully this makes sense…

I think you might be on to something Jason. When we first started messing around with MRP settings, it was because we were hitting a timeout at 8 hours. The issue appeared seemingly out of the blue. But once we figured out what was going on, it made sense. We had slowly been generating more and more jobs over time due to backlog and so on. One day, we started bumping up against the limit.

In this case, when I review the log, what I see is that there must be “operations” within the process… and it appears to me that those operations have a timeout of 1 hour. We’ve been seeing it timeout at 1 hour when it fails in full regen mode, but last night’s failed at 2 hours. I believe that’s because the first operation completed in just under 1 hour, therefore did not timeout (see screenshots below)… and the process moved on to the next operation. I don’t know if I am using the right terminology, but I see the word “operation” in the error.

Does anyone know where there might be a timeout setting that controls these “operation” timeouts? I checked web.config and I don’t see one that looks like it would control this.


Spitballing, as I’m in the Admin Console right now:

image

But if that’s 5 minutes, I cannot believe our server does EVERYTHING in under 5 minutes.

Another good question would be why does deleting the TFOrdDtl table take close to an hour. I found this in the archives. It’s over my head, though. @tkueppers maybe you can make sense of it?

There is an MRPTImeout setting that can be placed in the web.config. That solved a lot of our timeout errors while deleting unfirm jobs. If you look at the MRP log it tells you what those settings are at the top (at least in 10.2.400.13 it did)

  <appSettings>
    <add key="MRPRowCount" value="4000" />
    <add key="MRPCommandTimeout" value="21600" />
1 Like

aha! The query timeout is set to 3600 (60 seconds x 60 minutes). It’s failing to execute the delete query and then timing out. I bet this will get us past that timeout problem. I still want to investigate why it is taking an hour to execute… as it should only take seconds… like the other tables.

Right, that’s the doozy.

So, our MRP run generates (and deletes) over 1.5 million rows of JobMtl. It’s takes hours (by my math) to delete all of that but our timeout is also 3600, and when I read the “Delete of Table: JobMtl” line in the log, it supposedly took only 4 minutes. (Or zero; I don’t exactly know what “after” means there.) Details below if you expand it.

I’m not saying the timeout isn’t worth changing, since it helped @Paul_Millsaps . But I guess all I am saying is I have even less of a clue of what is going on than when this started.

My latest MRP log
                Sunday, November 7, 2021 08:00:22
08:00:22 MRP Regeneration process begin.
08:00:22 ------------------------------------------------------------
08:00:22 Part Run MRP                               -> False
08:00:22 Cut Off Date                               -> 11/7/2031 12:00:00 AM
08:00:22 Schedule Start Date                        -> 11/7/2021 12:00:00 AM
08:00:22 Run Finite Scheduling                      -> False
08:00:22 Ignore Constrained Materials               -> True
08:00:22 Allow Historical Dates                     -> True
08:00:22 Use Production Preparation Buffer          -> False
08:00:22 Sort Level 0 MRP Jobs by Date              -> False
08:00:22 Recycle MRP Jobs                           -> False
08:00:22 Get Details From Quote                     -> False
08:00:22 Run Multi Level Pegging                    -> False
08:00:22 Include Contract PO Parts                  -> False
08:00:22 Generate Purchase Schedules                -> False
08:00:22 Number of MRP Processes                    -> 7
08:00:22 Number of Schedulers                       -> 7
08:00:22 Rough Cut Schedule When Getting Details    -> True
08:00:22 Site List                                  -> MfgSys
08:00:22 Part List                                  -> 
08:00:22 Part Class List                            -> 
08:00:22 Product Group List                         -> 
08:00:22 ------------------------------------------------------------
08:00:22 MRPRowCount (rows)                         -> 2000
08:00:22 MRPCommandTimeout (seconds)                -> 3600
08:00:22 ------------------------------------------------------------
08:00:22 Process Control
08:00:22   Name -> ProcessorMain; Code -> Erp.Internal.MR.MrpExp.dll
08:00:22     Process Stop Queues -> Part~Delete~LotJobSplit~SchedJob~FirmJob~SaveLoad~QuotePart
08:00:22     Controller -> True; Range Start -> 0; Range End -> 0
08:00:22 Version 1.00.01
08:00:22   Name -> ProcessorMRP; Code -> Erp.Internal.MR.MrpExpCD.dll
08:00:22     Process Stop Queues -> 
08:00:22     Controller -> False; Range Start -> 1; Range End -> 100
08:00:22 Version 1.00.01
08:00:22   Name -> ProcessorSched; Code -> Erp.Internal.MR.MrpExpSched.dll
08:00:22     Process Stop Queues -> 
08:00:22     Controller -> False; Range Start -> 101; Range End -> 200
08:00:22 Version 1.00.01
08:00:22 ------------------------------------------------------------
08:00:22 Process Queue
08:00:22   Name -> BuildDelPOQueue; Queue In -> ; Queue Out -> DeleteAllPO; Controller -> True
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Delete; Check Group -> ; Repeat -> False; Regen Only -> True; Finite -> False
08:00:22   Name -> BuildDelJob; Queue In -> ; Queue Out -> DeleteJob; Controller -> True
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Delete; Check Group -> ; Repeat -> False; Regen Only -> True; Finite -> False
08:00:22   Name -> BuildDelTO; Queue In -> ; Queue Out -> DeleteTO; Controller -> True
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Delete; Check Group -> Delete; Repeat -> False; Regen Only -> True; Finite -> False
08:00:22   Name -> ProcessDelPO; Queue In -> DeleteAllPO; Queue Out -> ; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Delete; Check Group -> ; Repeat -> False; Regen Only -> True; Finite -> False
08:00:22   Name -> ProcessDelJob; Queue In -> DeleteJob; Queue Out -> ; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Delete; Check Group -> ; Repeat -> False; Regen Only -> True; Finite -> False
08:00:22   Name -> ProcessDelTO; Queue In -> DeleteTO; Queue Out -> ; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Delete; Check Group -> ; Repeat -> False; Regen Only -> True; Finite -> False
08:00:22   Name -> BuildSched; Queue In -> ; Queue Out -> PreJobF; Controller -> True
08:00:22     Alt Queue Out 1 -> PreJobI; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Sched; Check Group -> Sched; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> SchedJobF; Queue In -> PreJobF; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Sched; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> True
08:00:22   Name -> SchedJobI; Queue In -> PreJobI; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Sched; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> BuildQuoteJobQueue; Queue In -> ; Queue Out -> QuotePart; Controller -> True
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Quote; Check Group -> Quote; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> ProcessQuoteJob; Queue In -> QuotePart; Queue Out -> ; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Quote; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> BuildNonPart; Queue In -> ; Queue Out -> Part; Controller -> True
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> NonPart; Check Group -> NonPart; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> ProcessNonPart; Queue In -> Part; Queue Out -> SchedJobF; Controller -> False
08:00:22     Alt Queue Out 1 -> LotJobSplit; Alt Queue Out 2 -> SchedJobI; Alt Queue Out 3 -> FirmJob
08:00:22     Group -> NonPart; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> LotJobSplitNonPart; Queue In -> LotJobSplit; Queue Out -> SchedJobF; Controller -> False
08:00:22     Alt Queue Out 1 -> LotJobSplit; Alt Queue Out 2 -> SchedJobI; Alt Queue Out 3 -> FirmJob
08:00:22     Group -> NonPart; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> SchedNonPartF; Queue In -> SchedJobF; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> FirmJob; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> NonPart; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> True
08:00:22   Name -> SchedNonPartI; Queue In -> SchedJobI; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> FirmJob; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> NonPart; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> FirmJobNonPart; Queue In -> FirmJob; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> NonPart; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> BuildPart0; Queue In -> ; Queue Out -> Part; Controller -> True
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Zero; Check Group -> Zero; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> ProcessPart0; Queue In -> Part; Queue Out -> ZeroJob; Controller -> False
08:00:22     Alt Queue Out 1 -> LotJobSplit; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Zero; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> LotJobSplitPart0; Queue In -> LotJobSplit; Queue Out -> ZeroJob; Controller -> False
08:00:22     Alt Queue Out 1 -> LotJobSplit; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Zero; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> SortPart0; Queue In -> ZeroJob; Queue Out -> SchedJobF; Controller -> True
08:00:22     Alt Queue Out 1 -> SchedJobI; Alt Queue Out 2 -> FirmJob; Alt Queue Out 3 -> 
08:00:22     Group -> ZeroSort; Check Group -> ZeroSort; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> SchedPart0F; Queue In -> SchedJobF; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> FirmJob; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> ZeroSort; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> True
08:00:22   Name -> SchedPart0I; Queue In -> SchedJobI; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> FirmJob; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> ZeroSort; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> FirmJobPart0; Queue In -> FirmJob; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> ZeroSort; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> BuildPart; Queue In -> ; Queue Out -> Part; Controller -> True
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Part; Check Group -> Part; Repeat -> True; Regen Only -> False; Finite -> False
08:00:22   Name -> ProcessPartL; Queue In -> Part; Queue Out -> SchedJobF; Controller -> False
08:00:22     Alt Queue Out 1 -> LotJobSplit; Alt Queue Out 2 -> SchedJobI; Alt Queue Out 3 -> FirmJob
08:00:22     Group -> Part; Check Group -> ; Repeat -> True; Regen Only -> False; Finite -> False
08:00:22   Name -> LotJobSplitPartL; Queue In -> LotJobSplit; Queue Out -> SchedJobF; Controller -> False
08:00:22     Alt Queue Out 1 -> LotJobSplit; Alt Queue Out 2 -> SchedJobI; Alt Queue Out 3 -> FirmJob
08:00:22     Group -> Part; Check Group -> ; Repeat -> True; Regen Only -> False; Finite -> False
08:00:22   Name -> SchedPartLF; Queue In -> SchedJobF; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> FirmJob; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Part; Check Group -> ; Repeat -> True; Regen Only -> False; Finite -> True
08:00:22   Name -> SchedPartLI; Queue In -> SchedJobI; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> FirmJob; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Part; Check Group -> ; Repeat -> True; Regen Only -> False; Finite -> False
08:00:22   Name -> FirmJobPartL; Queue In -> FirmJob; Queue Out -> SaveLoad; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Part; Check Group -> ; Repeat -> True; Regen Only -> False; Finite -> False
08:00:22   Name -> ProcessLoad; Queue In -> ; Queue Out -> ; Controller -> True
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Load; Check Group -> Load; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22   Name -> SaveLoad; Queue In -> SaveLoad; Queue Out -> ; Controller -> False
08:00:22     Alt Queue Out 1 -> ; Alt Queue Out 2 -> ; Alt Queue Out 3 -> 
08:00:22     Group -> Load; Check Group -> ; Repeat -> False; Regen Only -> False; Finite -> False
08:00:22 ------------------------------------------------------------
08:00:22 Beginning of Delete Unfirm Jobs
09:40:18  Site: MfgSys After Delete of Table: ShopLoad
09:40:40  Site: MfgSys After Backout for JobMtl
09:40:45  Site: MfgSys After BackOut of JobAsmbl
09:42:03  Site: MfgSys After Delete of Table: JobProd
09:42:04  Site: MfgSys After Delete of Table: JobAudit
09:42:04  Site: MfgSys After Delete of Table: PcValueGrp
09:42:04  Site: MfgSys After Delete of Table: PcValueHead
09:42:04  Site: MfgSys After Delete of Table: PcValueSet
09:42:04  Site: MfgSys After Delete of Table: JobMtlRestriction
09:42:04  Site: MfgSys After Delete of Table: JObMtlRefDes
09:42:04  Site: MfgSys After Delete of Table: JobMtlInsp
09:42:04  Site: MfgSys After Delete of Table: JobMtlRestrictSubst
09:42:04  Site: MfgSys After Delete of Table: PcValueGrp
09:42:04  Site: MfgSys After Delete of Table: PcValueHead
09:42:04  Site: MfgSys After Delete of Table: PcValueSet
09:46:09  Site: MfgSys After Delete of Table: JobMtl
09:46:09  Site: MfgSys After Delete of Table: JobOperInsp
09:47:33  Site: MfgSys After Delete of Table: JobOper
09:49:47  Site: MfgSys After Delete of Table: JobOpDtl
09:49:47  Site: MfgSys After Delete of Table: JobOperRestriction
09:49:47  Site: MfgSys After Delete of Table: JobOperRestrictSubst
09:49:47  Site: MfgSys After Delete of Table: PartSug
09:49:47  Site: MfgSys After Delete of Table: PcValueGrp
09:49:47  Site: MfgSys After Delete of Table: PcValueHead
09:49:48  Site: MfgSys After Delete of Table: PcValueSet
09:50:28  Site: MfgSys After Delete of Table: JobAsmbl
09:50:28  Site: MfgSys After Delete of Table: JobAsmblRestriction
09:50:28  Site: MfgSys After Delete of Table: JobAsmblInsp
09:50:28  Site: MfgSys After Delete of Table: JobAsmblRestrictSubst
10:13:27  Site: MfgSys After Delete of Table: TFOrdDtl
10:13:28  Site: MfgSys After Delete of Table: JobPart
10:24:02  Site: MfgSys After Delete of Table: PartDtl
10:24:02  Site: MfgSys After Delete of Table: PegDmdMst
10:24:03  Site: MfgSys After Delete of Table: ChgLog
10:24:03  Site: MfgSys After Delete of Table: Memo
10:24:03  Site: MfgSys After Delete of Table: XFileAttch
10:24:03  Site: MfgSys After Delete of Table: ResourceTimeUsedSub
10:25:24  Site: MfgSys After Delete of Table: ResourceTimeUsed
10:25:24  Site: MfgSys After Delete of Table: JobOperMachParam
10:25:29  Site: MfgSys After Delete of Table: JobHead
10:25:29 End of Delete Unfirm Jobs===
10:25:29 Starting sub-processes
10:25:29 Deleting old Suggestions.
10:25:29 Deleting unfirm jobs...
10:25:30 Deleting transfer order suggestions ...
10:25:50 Scheduling Jobs...
10:25:58 Building Non-Part List
10:26:10 Building PartList Level: 0-0
21:00:21 Building PartList Level: 1-0
21:00:46 Building PartList Level: 2-0
21:25:25 Building PartList Level: 3-0
21:30:15 Building PartList Level: 4-0
21:30:42 Building PartList Level: 5-0
21:31:14 Building PartList Level: 6-0
21:31:26 Building PartList Level: 7-0
21:31:31 Processing Schedule Load
21:34:51 Processing Orphan PO's...
21:34:56 Processing Orphan PO:326806 Line:1 Rel:1.
21:35:11 MRP Regeneration process finished.

It’s nearly 2 years later and we still struggle with this same issue. The timeout setting has enabled us to keep MRP running even when it is terribly slow… We have bumped the MRP start time back to 7pm and it still sometimes runs over 12 hours and into the next workday. When it happens, it is 100% because of the gremlin that locks up TFOrdDtl table delete. It will take close to 2 hours per site when it’s really acting up. So I’ve seen it take over 6 hours to finish the deletes and start to rebuild the jobs. We do full regen each night and we like it that way. I know it can go faster because for several months at a time, it will seemingly hum along with only seconds to complete that table delete step. What on earth could cause it to fluctuate from a few seconds to a few hours?? I’m beginning to wonder if there’s bad data somewhere. An orphaned record? What could it be? Any ideas on what I could do to try to catch the culprit? As an example, two nights ago MRP completed at 3:14am, a respectable 8ish hours. I’ve seen it finish as early as 1:30-2:00am. But then last night it didn’t finish until 7:45am this morning.

No kidding, I had a huge issue with TFOrdDtl 11 months ago and now I see what you mean - I am not alone there!

So, I have a whole write-up on the site here about this… BUT - I can’t get to it since I am not a “Regular” at the moment. :cry:

But the solution is to get into SSMS and “force a good plan” from the query store.

That was gibberish to me until a walkthrough from a great Epicor consultant (they do exist).

1 Like

I have been becoming more and more familiar with SSMS and all the little tools and goodies. But I do not know how to force a good plan. I know how to check my plan. And I don’t know what will make a better plan than what I have…

I’ll try to do some screenshots. I’ve been meaning to.

Here you go. Pick a flavor:

Why is this step of MRP taking so long.pptx (690.4 KB)

Why is this step of MRP taking so long.pdf (651.8 KB)

1 Like

I cannot tell you the level of anticipation I have as I sit here waiting for SSMS to freaking launch already.

1 Like

PM me if you want to talk on the phone/Teams or whatever, if I need to explain anything.

So we didn’t have Query Store turned on. I enabled it… now I guess I wait…

THANK YOU FOR THIS VERY SIMPLE EASY TO FOLLOW EXPLANATION.

Oh yeah, that’s something that was already on for me. I don’t know who to thank there, but someone here must have done that. Unless it’s a default.

I wouldn’t even know where that is. I’m sure it’s easily google-able.

Ugh, so now, yeah, you’d have to somehow rerun MRP in regen many times in order to build up that history.

Don’t know if this is a dumb idea?

Recently we had a domino effect where an order entered via WordPress / service connect, had a part with no partplant record for a new site, and it caused service connect to hang. Then MRP hung later, and after the part was fixed, MRP went from 2 hours to 4 hours, and then infinite as it started hanging too.

I went to check the log and found that the first log that gets updated every day, was a text file claiming to be 30gb. Not sure if that’s even possible but it crashed notepad trying to open it. The following 8 logs were equally huge, then there were about 20 of >100mb claimed size. The bad part had evidently caused projectile-vomit level log entries.

I started deleting them and it was estimating 15 hours; I used PowerShell instead and it took 2 minutes.

Then MRP ran in an hour!

It has been getting slower and slower, from 40min 2 years ago to 2 hours lately. It seems at least some of the log files get appended, and maybe simply opening them gets slower as they increase in size?

Thinking about PowerShell being fast, I wrote a script running on Task Scheduler to move each day’s logs to an archive folder, then delete them after an aging period in case I have to look them up, and email me a daily report. 2 weeks later MRP is still running in an hour.