Automation Studio error monitoring

jepp · January 20, 2023, 2:44pm

How are others doing error monitoring within Automation Studio? I am seeing documentation within the Workato Academy that refers to using the RecipeOps connector to monitor for errors in all of the other recipes and to detect when a recipe is stopped by Workato. I got that set up and working, but then I got a notice that I had exceeded our connection limit.

It seems that having to pay for an additional connection just to monitor the iPaaS for errors is a little redundant and unfair, but I also might be missing some other error monitoring solution that doesn’t have that requirement.

jepp · February 6, 2023, 4:27pm

Just for future reference, in case anyone comes across this thread in their searches.

I have been working back and forth with Epicor Support and my CAM on this issue and, so far, it doesn’t sound like Epicor supports error monitoring in a way that is intuitive or easy. I am still waiting on final confirmation from my CAM and will update this thread when/if something changes, but the response from Epicor support was:

I reviewed but this would be a question to review with Professional Services for Best practices and any other considerations for your Recipe setup and configuration. If you did use that connector it would be counted as you have seen.

Within an Automation Studio recipe, there is an error handler object, but that really only works for errors within the recipe, not connection/agent issues, etc. It also would need to be set up within each individual recipe, which means duplicating blocks across multiple recipes and adding unneeded complexity, in my opinion.

The RecipeOps connection from Workato is the tailor-made solution for this problem (it monitors for job failures, recipes that were stopped by Workato due to connection issues, etc.), but it is asinine to have to pay for an additional connection in order to use what should be a basic consideration/requirement for any integration platform.

I hope that Epicor will come up with a better solution for error handling because, other than this issue, I have been really happy with Automation Studio and it does seem to be very user-friendly, comprehensive, and powerful. It’s not all doom and gloom, as I think Epicor is on the right path here, but for anyone considering an Automation Studio implementation, you should expect to either have limited error monitoring options or to have to pay for one more connection than you really think you should need.

utaylor · February 6, 2023, 4:34pm

Yo, thank you for bringing this important topic up.

I was going back and forth with @Edge and @hkeric.wci on LinkedIn and that’s when I commented, “where I always get hung up is how to make integrations closed and contained. So if you have to utilize 4-5 different services to make a complete solution and each one of those can break down in the process, how do you contain that error for each respective service and log it so that someone can troubleshoot.”

I had similar thoughts about Automation Studio when they unveiled it… How do we resolve any errors in the workflow? What does that process look like?

-Utah

hmwillett · February 6, 2023, 4:48pm

Yeah, I’ve had issues with that too.
A lot of my error handling is within functions being called, but any widget that fails within Workato is a bit of a black hole.
It’s rather reactive at this point, as in the users go “Hey, XYZ didn’t happen last night.” “Oookkay, let me go look at the portal and try and debug this after the fact…”

jepp · February 6, 2023, 4:52pm

Exactly! This all came up after we had an AppServer failure over a weekend (different issue).

On Monday, we recycled the app pool and everything came back up (or so we thought). Apparently, since the AppServer was down for an extended period, Workato eventually stopped the recipe after 50 consecutive connection failures, so even though the AppServer was back up, it required manual intervention to log into Automation Studio and start the recipe again. Unfortunately, we didn’t realize this until several days later when, as you pointed out, a user said, “Hey! This new customer that I created in Epicor isn’t showing up in the CRM!”

As soon as we noticed and restarted the recipe, it queued the 3,000 updates that had happened during the time that the recipe was stopped and all of the jobs completed successfully (i.e., there were no actual failures within the recipe itself). Using in-recipe error handling would have done nothing to solve this particular case. Someone would have to know to check Workato after an AppServer outage to get it running again.
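The stopped-recipe case above comes down to a check that runs outside the recipes themselves: compare which recipes should be running against which ones actually are. Below is a minimal Python sketch of that check, under stated assumptions. `RecipeStatus`, its field names, and both helper functions are hypothetical, and the sketch deliberately leaves open where the status snapshot comes from, since nothing in this thread establishes a way to read recipe state from Automation Studio without the RecipeOps connection.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RecipeStatus:
    """Hypothetical snapshot of one recipe; the field names are assumptions."""
    name: str
    running: bool
    expected_running: bool = True  # False for recipes stopped on purpose


def find_stopped_recipes(statuses: Iterable[RecipeStatus]) -> List[str]:
    """Return the names of recipes that should be running but are not."""
    return [s.name for s in statuses if s.expected_running and not s.running]


def build_alert(stopped: List[str]) -> str:
    """Format a short alert message, or an empty string if nothing is stopped."""
    if not stopped:
        return ""
    return "Stopped recipes need a manual restart: " + ", ".join(sorted(stopped))


if __name__ == "__main__":
    # Made-up example data, not taken from a real Automation Studio tenant.
    snapshot = [
        RecipeStatus("Customer sync to CRM", running=False),
        RecipeStatus("Nightly order export", running=True),
        RecipeStatus("Old test recipe", running=False, expected_running=False),
    ]
    print(build_alert(find_stopped_recipes(snapshot)))
```

Run on a schedule, or as a checklist item after an AppServer outage, a check like this would flag the stopped recipe on Monday morning rather than several days later. The `expected_running` flag keeps recipes that were stopped intentionally from generating noise.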

hmwillett · February 6, 2023, 4:54pm

Interesting. I was not aware of this (hadn’t come up yet).
I’ll have to keep an eye on things.

MikeGross · February 6, 2023, 5:12pm

Thanks for sharing this - and I couldn’t agree more. On one hand I can see how the ‘headless’ processing state is really what we’re going for here (and Workato’s reasoning for the extra $$ “optional” endpoint), but until these recipes (or the task engine itself) can be made to self-monitor, self-determine, and self-heal, there simply must be a level of oversight, management, event logging, and notification that we can manage.

I haven’t implemented yet, so this will be on my to-do list from the start.