Intermittent Report Locking Issue

josecgomez · November 19, 2019, 1:39pm

This is a bit of a hail-mary but I figured I’d post here this question to see if anyone has any outside the box thoughts.
We have a custom report written by Epicor CSG (Custom RDD, SSRS) which is a bear very large report contains and sorts through a LOT of data. This report works fine 9 out of 10 times
We can run the report, and poof out it comes a few minutes later

However we have this non repeating pattern that “sometimes” the report will go into Active Tasks in Task Agent and just sit there “running” for hours, days, weeks it will never finish. The same report with the same data and same parameters will run fine 9 times and on the 10th or 11th or 45th run it will just sit there and “hang”

There are no visible error reports anywhere we’ve located, and the only fix is a COMPLETE reboot of the server before the Report will run again. I have to manually remove the report from the Active Task queue, restart Task Agent, Restart App Server and even restard the DB Server and then it will work fine.

Just restarting Task Agent, and or App Server doesn’t do it

We are using routing rules to break down this report into smaller “batches” and automatically email these to our customers. The report may have 500+ pages and the “routing” breaks it into say 5-10 pages per customer and emails those out.

Does anyone have any idea what may be going on here? @Rich @JeffLeBert et all?

I keep leaning to either SQL Record locking because of an issue in RDD or a memory leak in the routing / breaking process or SSRS but frankly I can’t seem to find any evidence of this specifically and since it can’t be reliably replicated it has been a bear to troubleshoot.

I have told my employer several times we should just re-write the report using RDD-BAQ instead of the custom CSG RDD they are using but thus far that has fallen on deaf ears.

Thoughts? Comments? Suggestions? Support for my re-write efforts?

josecgomez · November 19, 2019, 1:44pm

Just got some more feedback internally apparently the “locking” can happen even if we don’t use the Routing Option so there goes “Routing” as a potential source of the problem.

Any other ideas?

Evan_Purdy · November 19, 2019, 1:55pm

Well I haven’t a clue about the actual problem, but my experience is that if this is important to them, and you know re-writing using RDD-BAQ will fix the issue, it would most likely be the best choice.

Its nice to think that a little stroke of genius or luck will allow you to fix the existing report, but I have found that this is the less common outcome for intermittent, difficult to troubleshoot issues. More often I bang my head against the problem for so long, damaging my mindset greatly in the process, and then have to develop the new solution under the mindset of having failed. Its more satisfying to figure things out than to go around them, but sometimes we need to make practical choice that doesn’t satisfy us.

JeffLeBert · November 19, 2019, 3:37pm

As you all know I’m a big fan of the RDD-BAQ path. If it is trivial (which I doubt it is) I would rewrite the report just to make it more understandable.

Assuming we don’t rewrite, let’s try to figure out what is wrong with this report. I think the first thing to do is figure out where things are going wrong. Printing a report can be broken down into two basic steps: preparing the data and rendering the report(s) from that data. You mentioned you thought it might be a database locking issue. If that is the case then you would see the report start running from System Monitor, but you would never see any report being rendered by SSRS. Once the report data has been collected, only data in the temporary report database is used. That means there should be no way of locking that data since it will only be used by the report itself.

Here’s how you would figure that out. First, turning on tracing for reports in the AppServer.config file is simple and easy. It’s been a while since I looked at this, but you will see “trace://ice/fw/reporting” and “profile://ice/fw/printrouting”. I’m not sure if you will get anything useful, but it is cheap to turn on.

Probably the best thing though is SSRS’s own logging. See this for how to use: https://docs.microsoft.com/en-us/sql/reporting-services/report-server/report-server-executionlog-and-the-executionlog3-view?view=sql-server-ver15. This should tell you enough to figure out if the report render requests are actually making it to SSRS and which ones failed.

That’s about as far as we can go without getting more information. It should be enough to tell us what to look at next.

josecgomez · November 19, 2019, 3:40pm

Thanks Jeff
I’ll try turning on report tracing, I will note that sometimes when it “hangs” I do see reports in SSRS but generally this via routing so “some” of the routes make it out through SSRS and some “don’t”
Out of say 50 Routing Emails, it will get through 35 and then just “hang” (somewhere)
I’ll keep investigating and let you know

JeffLeBert · November 19, 2019, 3:49pm

OK, that means we got past building the data in the temporary report database. That leaves us with something in how SSRS is rendering a specific report. The SSRS executionlog3 I linked to above could help with that. Unfortunately, at that point you have to start digging into the RDL and figure out what is broken.

Also, converting to an RDD-BAQ probably wouldn’t help here because the big difference is how the data is collected before the reports as sent to SSRS to render.