Pulling SQL-Like queries from Server for Machine Learning and Data Analytics

Peter1 · June 24, 2021, 11:49am

I was brought on for data analytics, and I’m trying to learn the underlying structure of our information in epicor. At first, I thought a reasonable approach to get our data was cleaning production detail reports and efficiency logs and clean them for processing with python packages. This is definitely not the move and I need to find ways to query the underlying server for mass data collection.
In your opinion would connecting to the sql server via excel be a good place to start? Where would you start if you wanted to build searches with SQL queries?

NOTE: I have just started this so anything is helpful. I’ll do my best to provide as much information about our epicor layout if needed

josecgomez · June 24, 2021, 12:15pm

I would recommend data replication setup. Where you SQL DB or certain views are replicated to a separate server / db and that can be used to get data out make cubes etc.

I would not recommend that you connect your production db to excel that’s a terrible idea IMO the production DB should be as pristine as possible and excel is a pig.

You can then connect to the replicated db with Power Pivot or the analytics engine of your choice. If you insist on using Excel Id recommend an OData Feed Pull via BAQ’s and the REST Interface.
(Quite a few articles about that in this forum, but the gist is you can pull data directly into Excel via the Epicor REST interface using an OData feed)

Peter1 · June 24, 2021, 12:53pm

These recommendations are priceless. Excel is a mess I couldn’t agree more.

utaylor · June 24, 2021, 1:21pm

Great idea.

John_Mitchell · June 24, 2021, 2:23pm

You can also pay Epicor for a replicated DB that you can build analytics around depending on your requirements.

Peter1 · June 24, 2021, 2:31pm

Is the ability to clone your data onto a replicated DB surrounded by red tape or relatively straightforward?

John_Mitchell · June 24, 2021, 2:39pm

I honestly do not know. It’s just a replicated SQL db so I can’t imagine that there is too much red tape. Have you made a decision on your BI platform or is that still being decided?

Peter1 · June 24, 2021, 2:43pm

Haven’t decided yet. It isn’t as much about packaging the data into digestible dashboards, as it is taking it and running regressions and decision trees on it.

utaylor · June 24, 2021, 2:45pm

Peter, I would love to talk to you about this after you get something up and running. Would you keep me in mind?

I am looking to learn more about the subject matter and it would be cool to connect with you to learn about what you end up doing.

bmanners · June 25, 2021, 4:22am

How clean is the data? Do you need to edit it to clean it up? Is your system on-prem or in the cloud?

If your data is clean enough and doesn’t need any editing then I guess you dont need to stage it anyway. In that case a replicated Db you can work with will be good. You could connect to it with python, excel or any other tool you have without the worry of impacting production. If you are on prem then spinning up a copy of the database should be straight forward. I assume you don’t need update the minute data to work with and if the data is a month old that is fine.

I did some experimental machine learning on our data. I had to do a fair bit of editing to clean up the data and I didnt want to edit the live data. I used Azure Data Studio to query the data with SQL queries then saved it out as a csv file. Cleaned it up in excel then used ML.Net/C# to create the machine learning model (I am familiar with C# so this option for easiest for me). As i had to edit the data the csv files were a good staging area for that process.

Very keen to hear how you go.

Brett

PS there is the Data Dictionary screen in Epicor that lists all the tables and fields. You can use that to start to get to know the structure of the tables.

Mark_Wonsil · June 25, 2021, 12:45pm

Said every data scientist everywhere.