How to Train SAP Document Information Extraction

Hi,

I'm trying to use SAP Document Information Extraction to extract data from invoices. I've used the standard model, and it is somewhat accurate. However, I would like to train the model using invoices from my company and test the results.

When I use the standard model (no schema/template), manually fix incorrect results, and confirm, am I training the global model or am I training it locally?

Also, when I create my own schema and add my own fields, there are no options in the default extractor dropdown. When I upload a document and use my own schema and then manually enter the results, when I use the same schema the second time it doesn't extract anything. Are schemas the thing you are training, or are they only a definition of which items you want?

Also, when I created a template, when I only had one document in the sample documents and used the template on that same document again, it was 100% correct. But when I tried it on a document of a different format, it didn't work, it appeared to just look for the same spot on the page as the one sample document. When I added a second sample document of a different format, it seemed to confuse the template and was unable to read the first sample document correctly.

Overall, if I'm trying to create my own invoice processor for my company, should I be looking to train the standard model, a schema, or a template? Thanks for your help.

Accepted Solutions (0)

Answers (1)

tomasz_janasz
Advisor

AC: I've used the standard model, and it is somewhat accurate.

SAP: We maintain a global model approach, which means that the service generalizes to unseen document structures and layouts.

AC: However, I would like to train the model using invoices from my company and test the results. When I use the standard model (no schema/template), manually fix incorrect results, and confirm, am I training the global model or am I training it locally?

SAP: The service is not automatically retrained yet. However, you can enable the "feedback loop" (confirm document) to SAP, so that SAP can use the corrected values for future retraining: https://help.sap.com/viewer/5fa7265b9ff64d73bac7cec61ee55ae6/SHIP/en-US/aca65c29c87246e8b93e385ceb2a...

AC: Also, when I create my own schema and add my own fields, there are no options in the default extractor dropdown.

SAP: You can use default extractors. To do so, select a document type that is supported by the standard service when you create the new schema.

AC: When I upload a document and use my own schema and then manually enter the results, when I use the same schema the second time it doesn't extract anything. Are schemas the thing you are training, or are they only a definition of which items you want?

SAP: A schema is an “artefact” you use to define the “data model” (fields) for extraction. With schemas you define and describe the fields you want to extract from the document; a schema is not something you train. As a basis you can use the schemas provided by SAP and amend them as you wish.
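
To make the "schema as data model" idea concrete, here is a minimal sketch in Python. It is not the service's real schema format: the class names, field names, and the `default_extractor` attribute are all hypothetical and only illustrate that a schema declares fields, some of which may be mapped to a default extractor of the standard model and some of which may not.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaField:
    """One field of a (hypothetical) schema: a name, a type, an optional extractor."""
    name: str
    data_type: str
    # None means no default extractor of the standard model is mapped to this field.
    default_extractor: Optional[str] = None


@dataclass
class Schema:
    """A schema only declares which fields are wanted; it holds no learned state."""
    name: str
    document_type: str
    header_fields: List[SchemaField] = field(default_factory=list)

    def fields_without_extractor(self) -> List[str]:
        """Names of fields that have no default extractor assigned."""
        return [f.name for f in self.header_fields if f.default_extractor is None]


# Illustrative schema: two fields reuse a default extractor, one is purely custom.
invoice_schema = Schema(
    name="my_invoice_schema",
    document_type="invoice",
    header_fields=[
        SchemaField("documentNumber", "string", default_extractor="documentNumber"),
        SchemaField("grossAmount", "number", default_extractor="grossAmount"),
        SchemaField("costCenter", "string"),
    ],
)

print(invoice_schema.fields_without_extractor())  # ['costCenter']
```

In this sketch, `costCenter` is the kind of custom field that has nothing to extract it, which matches the behaviour described in the question: entering values by hand does not teach the schema anything.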

AC: Also, when I created a template, when I only had one document in the sample documents and used the template on that same document again, it was 100% correct. But when I tried it on a document of a different format, it didn't work, it appeared to just look for the same spot on the page as the one sample document.

SAP: That is exactly the purpose of the “Template” feature: to process incoming documents with a known layout. You can use it, e.g., for strategic suppliers from whom you receive hundreds of invoices per month, or for recurring layouts such as personal IDs or non-machine-readable forms. You create one template for each layout.

AC: When I added a second sample document of a different format, it seemed to confuse the template and was unable to read the first sample document correctly.

SAP: A template supports only one layout. Adding a sample document with a different layout will break the existing template.

AC: Overall, if I'm trying to create my own invoice processor for my company, should I be looking to train the standard model, a schema, or a template? Thanks for your help.

SAP: In that case the recommendation would be: use the standard invoice model by default. For strategic suppliers (frequently recurring) start creating templates. To create a template you have to first define your schema, i.e. a data model for fields you want to extract for your business process. You can reuse the standard SAP schema for invoices (or amend it, if needed). If you need help please contact us directly.
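
The recommendation above amounts to a simple routing rule: one template per known layout, and the standard invoice model for everything else. A minimal sketch in Python follows, assuming your own application knows which supplier sent the invoice. The supplier IDs, template names, and function names are made up for illustration and are not part of the service.

```python
from typing import Dict, Optional

# Hypothetical mapping maintained by your own application:
# one template per strategic supplier layout (a template covers a single layout).
TEMPLATES_BY_SUPPLIER: Dict[str, str] = {
    "ACME-001": "template-acme-invoice",
    "GLOBEX-002": "template-globex-invoice",
}


def choose_template(supplier_id: Optional[str]) -> Optional[str]:
    """Return the template for a known supplier, or None if there is none."""
    if supplier_id is None:
        return None
    return TEMPLATES_BY_SUPPLIER.get(supplier_id)


def describe_route(supplier_id: Optional[str]) -> str:
    """Describe how an incoming invoice would be processed under this rule."""
    template = choose_template(supplier_id)
    if template is None:
        # Unknown or occasional supplier: rely on the standard invoice model.
        return "standard invoice model"
    return f"template {template}"


print(describe_route("ACME-001"))   # template template-acme-invoice
print(describe_route("UNKNOWN-9"))  # standard invoice model
print(describe_route(None))         # standard invoice model
```

A new layout from the same supplier would get its own entry and its own template rather than being added as a sample to an existing one.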

harianantha
Participant

Hi Tomasz,

SAP: In that case the recommendation would be: use the standard invoice model by default. For strategic suppliers (frequently recurring) start creating templates. To create a template you have to first define your schema, i.e. a data model for fields you want to extract for your business process. You can reuse the standard SAP schema for invoices (or amend it, if needed). If you need help please contact us directly.

The above works in the Document Information Extraction UI. In my case, however, I am calling the DOX service (Swagger API) as a REST service from my SAP MDK application. So my question is: if I create a new schema by adding data fields in the Document Information Extraction UI as described above, can I expect the same extraction results within my MDK app?
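
For reference, the kind of request described here could be assembled roughly as follows. This is only a sketch under assumptions: the option keys (`clientId`, `documentType`, `schemaId`, `templateId`) and the example values are not confirmed anywhere in this thread and should be checked against the Swagger definition of your service instance. The helper name `build_job_options` is hypothetical, and the sketch does not answer whether the results match the UI.

```python
import json
from typing import Optional


def build_job_options(
    client_id: str,
    document_type: str,
    schema_id: str,
    template_id: Optional[str] = None,
) -> str:
    """Build the JSON 'options' string sent alongside the uploaded file.

    All key names are assumptions to verify against the service's Swagger UI.
    """
    options = {
        "clientId": client_id,
        "documentType": document_type,
        "schemaId": schema_id,  # the schema created in the UI, referenced by its ID
    }
    if template_id is not None:
        options["templateId"] = template_id  # only when a template should be applied
    return json.dumps(options)


# Placeholder values; real IDs would come from your own tenant.
print(build_job_options("default", "invoice", "my-schema-id"))
```

The point of the sketch is that the REST call would reference the same schema ID that was created in the UI, rather than redefining the fields in the MDK app.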