Turning off Generate PO Suggestion on a part CRASHING EPICOR DB

emanpittfan · February 7, 2024, 3:33pm

We have an issue: when we try to turn off “Generate PO Suggestion” in the “Plant” menu in Part Maintenance for a particular part and then click Save, our Epicor system immediately goes into lockdown and slows to a crawl. Rebooting is the only fix, and when we go back and check, the Generate PO Suggestion tickbox is still checked. Has anyone ever seen this kind of behavior and, if so, what was it about the part that caused it? We are at a loss, as we don’t see anything about this part that is much different from other parts.

jtownsend · February 7, 2024, 3:45pm

The only time I’ve ever crashed the DB is when I mangled a recursive query with the safeties off. That was entirely on me.

Do you follow Epicor or MS’s recommendations for DB maintenance? I mean, changing any single field from the UI shouldn’t crash the DB. I’m just trying to gauge the overall situation because this is really abnormal. It’s been a minute since I last touched 9.05, but all of my issues were with Progress. SQL Server was rock solid.

gpayne · February 7, 2024, 4:02pm

I have not, but I would follow the normal processes in E9 to find any issue. Is verbose logging turned on? Can you do the unchecking off hours and see what is happening in the log? Possibly BPMs fire when this is unchecked.

I would also check in SSMS for any queries that run.

emanpittfan · February 7, 2024, 4:14pm

Thanks Greg, we are on Progress so no SSMS but after about an hour, the server finally responded that a timeout occurred and let me close the client. BUT, the Progress DB had STOPPED RUNNING.

gpayne · February 7, 2024, 4:25pm

Did the log have any complaints? I assume Progress also has some logs of what it was trying to do for all of that time.

jim.ellis · February 7, 2024, 6:46pm

The only time I have ever seen the DB crash inside of Epicor was when a brand-new comptroller attempted to capture and post WIP inside of either V8 or E9 Progress (can’t remember which it was anymore) right as one of our twice-daily online Progress DB backups launched.

The DB crashed hard after exploding in size, and all of our online backups failed from that point forward, forcing me to perform nightly offline backups for a while.

I was only able to rid the Progress DB of the corruption which it had suffered by performing a full DB dump and load over a weekend. Then the system was happy again.

emanpittfan · February 7, 2024, 9:37pm

Our problem is inside the Epicor application and not a timing thing. We have this ONE part # that if we attempt to turn off “Generate PO Suggestions” and then click SAVE, it stalls out and then takes down the WHOLE system. The .b1 file of the database starts to grow until the server runs out of disk space.

jim.ellis · February 8, 2024, 8:05pm

The reason why I posted that story was because it was one of the only times the DB crashed, and when that happened, a dump and load was the only fix that worked.

However, as I think back, I’m remembering one more occasion when we had something similar happen inside of our system when we were running E9 Progress. At my previous employer we had some truly gargantuan job trees, some of which contained more than 10K sub-assemblies.

On one occasion every time a user made an update which impacted a certain assembly on one particular job tree, the DB went into melt-down and the .b1 file started exploding in size. It took me a couple of weeks of nail-biting before I finally tracked down what was happening, and I devised a fix.

What I finally discovered was some corruption inside of Epicor’s job tree which was completely invisible through the UI. Some background: the JobAsmbl table contains internal pointers to parent assemblies and sibling assemblies, as well as both Assembly and BOM sequences. Off the top of my head, I can’t remember if there is also a first-child assembly pointer.

If you queried the job assemblies ordered by the BOM sequence, everything looked fine at first. However, after writing a program to analyze where all the assembly pointers were pointing inside of the DB, I discovered that there was a cycle inside of the job tree! By definition there should never, EVER be a cycle inside of any tree structure!

So, whenever assemblies at a certain level inside of a certain branch of this particular job tree were modified or added to, the system attempted to modify all of the associated assemblies’ data, at which point it would get caught inside of an infinite update loop through the cycle which existed inside of this one corrupted branch of a single job tree!
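For anyone who wants to run a similar check, here is a minimal sketch of the idea in Python. It is not the original analysis program, and it assumes you have already exported the assembly rows for one job into a plain mapping of assembly sequence to parent assembly sequence. The function name `find_parent_cycle`, the mapping layout, and the use of `None` for the top assembly's parent are all hypothetical choices for illustration; check how your own data marks the top-level assembly and map that to `None` first.

```python
def find_parent_cycle(parent_of):
    """Return the assembly sequences that form a cycle, or None if there is none.

    parent_of: dict mapping assembly sequence -> parent assembly sequence.
    Assumption: the top-level assembly's parent is exported as None.
    """
    verified = set()  # assemblies already known to reach the top without looping

    for start in parent_of:
        path = []       # assemblies visited on this walk, in order
        position = {}   # assembly -> index in path, for fast "seen on this walk" checks
        node = start

        # Follow parent pointers until we reach the top (None), a dangling
        # pointer (not in the export), or an assembly we already verified.
        while node is not None and node in parent_of and node not in verified:
            if node in position:
                # We came back to an assembly on the current walk: that is a cycle.
                return path[position[node]:]
            position[node] = len(path)
            path.append(node)
            node = parent_of[node]

        verified.update(path)

    return None


# Healthy tree: every assembly eventually reaches the top-level assembly 0.
healthy = {0: None, 1: 0, 2: 1, 3: 1}
print(find_parent_cycle(healthy))    # None

# Corrupted tree: 2 -> 4 -> 3 -> 2 never reaches the top.
corrupted = {0: None, 1: 0, 2: 4, 3: 2, 4: 3}
print(find_parent_cycle(corrupted))  # [2, 4, 3]
```

The sketch only follows parent pointers. Sibling or child pointers could be checked the same way by building a second mapping and passing it to the same function.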

While I do not know whether this is what is happening for you as well, it stands to reason that you might be experiencing this same issue. I say this because if the part you are changing in the part master also happens to be contained inside of an invisible cycle in some corrupted job tree, then your system may well be getting caught inside of an infinite update loop around that cycle.

emanpittfan · February 8, 2024, 8:22pm

Jim, THANK YOU SO MUCH for this information. While I don’t know if this is what’s happening, I TRULY APPRECIATE you acknowledging that something like this CAN HAPPEN because what you described dealing with the .b1 file is EXACTLY what had occurred with us. I’m going to play around in our test system to see if I can go down the rabbit hole you describe and come up with anything similar. At least now I have SOMEWHERE I can start looking, as we’ve been TRULY lost trying to figure out why this occurs, and Epicor generally is only interested in trying to get us up and running again, not in figuring out what caused it in the first place. Thank you so much!