IDC training - on what "level" is it done?

Using IDC/ECM with a customer. They are getting bad data capture rates. We don’t use Class verification - they only have one type of document - Invoices.

They have numerous stores, and I suspect that a user in store 1 is interfering with the training done in store 2, for example.
Currently, it’s set up with one DFD (for all invoices), one Document Type (for all invoices), and numerous batch types (one for each department within each store, so lots).

Should I instead create multiple DFDs (one per store), multiple Document Types (one per store), or both?

Does each DFD get its own training data pool, or is it per Document Type? Based on the behaviour I’ve seen, I don’t think it’s per batch type, nor per user.

I suspected that the AP trainer and I were giving conflicting training, so I stopped; when we did SO automation, I used a single trainer for all teams.

I don’t know how different DFDs store their training, so it’s possible that wouldn’t help.

Below is from the Ancora Training Data Counter utility, which I obtained so I could remove badly trained documents. It only references fields, not users or documents, but the Ancora folks are great, so hopefully they can steer you in the right direction.

What version of IDC?

In version 9.34, IDC supports DFDs that either share or don’t share training data; there is a setting for this in the DFD configuration.
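To make the sharing setting concrete, here is a purely illustrative sketch, in Python, of how training data might be scoped per DFD versus pooled. This is NOT the real IDC API or its internal implementation; all names (`TrainingStore`, `pool_key`, etc.) are hypothetical, and it only models the idea that isolated DFDs cannot see each other’s training:

```python
# Hypothetical model (NOT the actual IDC implementation): training data
# scoping when DFDs can either share one pool or keep isolated pools.
from collections import defaultdict


class TrainingStore:
    def __init__(self):
        # pool_key -> field name -> list of trained examples
        self.pools = defaultdict(lambda: defaultdict(list))

    def pool_key(self, dfd_name, share_training):
        # When sharing is enabled, every DFD draws from one global pool;
        # otherwise each DFD keeps its own isolated pool.
        return "shared" if share_training else dfd_name

    def train(self, dfd_name, share_training, field, example):
        self.pools[self.pool_key(dfd_name, share_training)][field].append(example)

    def lookup(self, dfd_name, share_training, field):
        return self.pools[self.pool_key(dfd_name, share_training)][field]


store = TrainingStore()
# With isolated DFDs, Store 1 and Store 2 do not see each other's training,
# so one store's layout cannot overwrite the other's:
store.train("Store1_Invoices", False, "VendorName", "ACME Corp @ top-left")
store.train("Store2_Invoices", False, "VendorName", "ACME Corp @ footer")
```

Under this model, the original symptom (store 1 interfering with store 2’s training) would only occur in the shared configuration, which may be worth checking first.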

Since the IDC cloud was upgraded to 9.34, the Azure engine has been the default, and with this new engine I see 100% training retention on documents with very little training.

The usual story I see is that strange setups by users and consultants cause issues with the training process. I would suggest contacting support to work through the issue if you don’t have a consultant who understands IDC; using IDC and configuring IDC are different roles.

I usually try to strip away the complexity: export the training out of IDC, then delete the training data. Import the document and test how the processing works. If the document imports and can be trained a second time, this may indicate that the old training data was interfering with the training process.

To build on what @swilliasc111 has stated, in the later versions you can lock the training of a DFD once you are satisfied with it. This can be done for the entire DFD or per field, such as VendorName. Incorrect mappings made after the lock then have no impact on the training data.
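The lock behaviour described above can be sketched as a small conceptual model. Again, this is NOT the real IDC API; the class and method names (`LockableTraining`, `lock`, `train`) are invented purely to illustrate how a whole-DFD lock and a per-field lock would both stop later mappings from affecting the stored training:

```python
# Hypothetical model (NOT the actual IDC implementation): training locks
# applied to an entire DFD or to individual fields such as VendorName.
class LockableTraining:
    def __init__(self):
        self.examples = {}         # field name -> list of trained examples
        self.locked_fields = set() # fields locked individually
        self.locked_all = False    # whole-DFD lock

    def lock(self, field=None):
        # lock() with no argument locks the entire DFD;
        # lock("VendorName") locks just that field.
        if field is None:
            self.locked_all = True
        else:
            self.locked_fields.add(field)

    def train(self, field, example):
        # Mappings made after the lock are ignored for training purposes.
        if self.locked_all or field in self.locked_fields:
            return False
        self.examples.setdefault(field, []).append(example)
        return True


t = LockableTraining()
t.train("VendorName", "ACME Corp @ header")  # accepted before the lock
t.lock("VendorName")                         # freeze this field's training
t.train("VendorName", "bad mapping")         # rejected after the lock
```

The practical upshot: once the field is locked, a user mis-keying VendorName in production no longer pollutes the training pool.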

That’s a really nice feature. I need to upgrade.

You can also turn off training retention for specific users. That way, people can still process documents to get things through the system, but their work isn’t used for training, which keeps the training data consistent.

Version 9.33

So not cloud yet.

Is it the default? I thought you had to specifically enable Azure with a special config file? It works out of the box now?

that’s awesome

That’s really good news.

For on-prem, Azure is not the default; it needs to be enabled.

thanks for letting me know.

Azure becomes an option in v9.34; in v9.33 it is not available.

I would encourage an upgrade to v9.34, but I always encourage backing up databases before an upgrade. Try it out for your customer/team; by default v9.34 will not behave much differently, since the extra features need to be enabled: splitting by document separator variable, attachment detection, the Azure OCR engine, and date format handling in the DFD with format recognition.

Thanks Scott, I have been looking to upgrade for some time.
