Odata crapping out on large BAQ's

mchinsky · March 27, 2020, 9:18pm

I’m trying to use an external tool (trying a cdata driver and also PowerBI with odata) against a BAQ that is simply about 10 fields from the parttrans table.

The source table is about 3m rows. It keeps crapping out with something like (500) Internal Server error.

Not until I put a filter in the baq to reduce the number of rows to about 600k will it run without an error.

I’m getting the sense something is dying insite the IIS worker process, perhaps memory issue or something.

Is there some setting to ratchet that up so it can handle a larger BAQ?

Have any of you been able to pull over parttrans or gl transactions into an external application via odata on a large database (ie. more than 500k records)?

bordway · March 28, 2020, 2:18pm

Never used that approach so not sure these would help
but… here are a couple things I have looked at when I’ve had a long running “PartTran” BAQ.

include fields Company and SysRowID
played with timeout setting in BAQ designer
ref: BAQ TIMEOUT - what is used - #6 by bmanners

mchinsky · March 30, 2020, 4:28pm

Unfortunately for SaaS clients, you can’t do anything in the admin tool

Olga · March 30, 2020, 5:09pm

With Saas client you typically cannot return more than 100 rows with OData, as far as I know.

Probably you can create BAQ with aggregations and then use its results?

mchinsky · March 30, 2020, 5:21pm

100 Rows? I’m getting as many as 500,000 rows. The problem is parttran is 3,000,000 rows

TommyD · March 31, 2020, 8:26pm

I’m an E10 SaaS customer, and I get 20k+ rows without a problem with Excel / Odata.

Olga · March 31, 2020, 9:02pm

I see, I did not know that limit was removed.
But anyway, there are no way you can get infinite number of rows from there, so you have to retrieve data in chunks

John_Mitchell · April 1, 2020, 2:19am

Can you not export the BAQ as a CSV initially and then load the CSV into your Azure Data Lake? After that you will only need to get the delta from each load which will be a much much smaller dataset. You can look at any of the zCRM BAQ’s to see something very similar.

It would be really really nice if we could get more BAQ Export destinations like Azure Data Lakes and Amazon S3. @Dmitry_Kashulin @Edge

Dmitry_Kashulin · April 1, 2020, 11:31am

100 rows is about methods that end up in calling GetRows method for a service.
BAQ via OData tries to execute the query, then transform the result into OData format and the return it out.
Probably that is an out-of-memory issue, as the error gone when you put a filter.
Did you try to run the query via BAQ Designer? You can turn off the UI limitation for maximum number of returned rows by adding RemoveTestRowLimit query execution setting.

Dmitry_Kashulin · April 1, 2020, 11:38am

@John_Mitchell, do you mean a ‘push’ approach in providing a query result? Currently, BAQ provides a query result as a response on OData request or as a scheduled upload to specified folder.
As I understand, you suggest teaching BAQ export process to pump query output directly into an external destination point, e.g. Amazon S3 storage.

Mark_Wonsil · April 1, 2020, 11:54am

Would not an Azure Web or AWS Batch Job serve as a way to extract and load to Data Lakes or S3 buckets?

danbedwards · April 1, 2020, 12:01pm

Power Automate aka Microsoft Flow does this really well. Pulls Odata (incrementally if you want) and loads the Target whether it is Azure Data Lake or S3. Not a lot of native functionality for S3 but getting better every release.

John_Mitchell · April 1, 2020, 1:46pm

@Dmitry_Kashulin Exactly. Is there a limit to the BAQ Export size though? I think @mchinsky is trying to dump PartTran to be loaded into PowerBI dataset.

@Mark_Wonsil I don’t think that would work in this example because of the limits on Odata size and they are using an Epicor hosted server. If it was on-prem or smaller tables that would work for sure.

danbedwards · April 1, 2020, 3:15pm

One way to do this is to pass the BAQ a date range and use multiple Power Queries to bring the data together - if you don’t want an intermediate data source.
Example:
Power Query1 pulls BAQ passing a date range of 2018
Power Query 2 pulls BAQ passing date range on 2019
Power Query 3 pulls BAQ passing 2020
You then aggregate them in Excel or Power BI. You can make the date dynamic like year -1 etc.

Mark_Wonsil · April 1, 2020, 3:28pm

Not sure why SaaS matters. We pulled large sets all of the time. In fact, staying within Azure while pulling from SaaS into the Data Lake (and maybe even AWS S3) will likely use bigger pipes than the ones pulling from On-Prem installations.

As Dan suggests, one could break down the number of pulls with some scheduled clever PowerShell using a year. But assume a year is still too big, I think you can rely on the SysRevID always increasing, maybe even the TranNum. Now pass the last SysRevID or TranNum from your Data Lake or S3 bucket into your BAQ and run the query with every record above that with a TOP limit. Keep running until no records are pulled or less than the limit.

hkeric.wci · April 1, 2020, 3:30pm

I always use pagination, or like you said pull 3 months at a time, then the next 3 months, then the next 3 months – eat the elephant one byte at a time and then combine it on my end and do what I have to do.

If you are pulling that much over and over again, you are prob using it wrong. store the historical records somewhere then don’t keep re-pulling them, find a checkpoint (mark the SysRowID you left off at or something) and then grab the batch going forward. If you were paying for your own Azure space, you would be mindful of this.

I think (personal opinion) Epicor should charge Cloud Users 1 penny per megabyte So that strategic planning is used by companies IT departments.

Bart_Elia · April 1, 2020, 3:40pm

It sounds like you want to just grab changed / added data. The CDC approach in 10.2.500+ is a great foundation for that. Stay tuned for 600 announcements that improve on that approach.

For a short term approach, you can use what @Mark_Wonsil mentioned -

select * from BAQ where SysRevID > ‘lastsyncsysrevid’.

Basically grab a chunk of data and the max sysrevid is the start of the next page of data. It worked wonderfully to keep multiple data sources in sync.

GTI07094 · March 1, 2021, 10:09pm

@danbedwards I am exploring using Power automate. Did you find some examples on this? Would like to see how you made this happen if you have some time.

John