Mechanism of Document Information Extraction

former_member797828 · ‎03-29-2022

Hi,

I'm trying to use SAP Document Information Extraction to extract data from documents.

I have read the answer to the question at the following URL and tested the results using several documents. I would appreciate it if you could answer the two questions below.

https://answers.sap.com/questions/13508298/how-to-train-sap-document-information-extraction.html

I created my own schema and created a template using one annotated sample document. When I applied the template to that same document, the extraction was 100% correct. But when I tried it on a document of the same format with different contents, several fields were not extracted from the right spot (the spot annotated in the sample document). According to the answer in the URL, the "template" feature is used to process incoming documents of known templates (the same layout that was used to define the template?), so all fields are expected to be extracted from the same spot or coordinates in the document. However, that does not seem to be the case. If the fields are not extracted based on the annotated coordinates, what is the mechanism behind the extraction? Is there a learning mechanism rather than simply remembering the coordinates?

Also, I noticed that multiple sample documents (5 max?) can be uploaded and annotated within one template, and according to the URL, all the samples should share the same format. What is the benefit of uploading multiple samples with the same format? For example, does it give higher accuracy/reliability of the extracted fields, or more robustness to misalignment or skew of the document?

Thank you for your help.

Best Regards.

tomasz_janasz · ‎09-22-2022

Hi Ludovic,

our support team would need to see the template and the sample document. Please raise a ticket with the following component: CA-ML-BDP-TEM. Please provide the template (you can export it via the UI).

https://support.sap.com/

Best regards,
Tomasz

tomasz_janasz · ‎09-22-2022

Hi Ludovic,

Please note: you can apply the Template feature to only one layout, i.e. combining different layouts within one Template will cause issues.

If you use a template for one of the standard document types (e.g. invoices or purchase orders), the pre-trained Global Model will also kick in to support the extraction of the values. If you do not want that to happen, you need to avoid using Default Extractors. You can define this in the Schema that you use for your Template creation:

https://help.sap.com/docs/DOCUMENT_INFORMATION_EXTRACTION/5fa7265b9ff64d73bac7cec61ee55ae6/020ab638c...

Best regards,

Tomasz

tomasz_janasz · ‎04-07-2022

Hi Gao,

the current templating feature of Document Information Extraction is coordinates-based. This means that you specify the location on the document where you expect the key-value pair to reside. That is also why you do not need to annotate more than 5 samples: doing so does not add value.
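To illustrate what "coordinates-based" means in general, here is a minimal Python sketch. It is not the actual implementation of Document Information Extraction; the `Box`, `Word` and `extract_by_coordinates` names, the normalized page coordinates and the "word center inside the region" rule are all assumptions made for the example. It shows why a value that moves outside the annotated region in another document is simply not picked up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in normalized page coordinates (0..1). Hypothetical type."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    """One OCR word with its bounding box. Hypothetical type."""
    text: str
    box: Box

    def center(self) -> tuple[float, float]:
        return ((self.box.x0 + self.box.x1) / 2, (self.box.y0 + self.box.y1) / 2)


def extract_by_coordinates(template: dict[str, Box], words: list[Word]) -> dict[str, str]:
    """For each field, join the words whose center lies inside the annotated region."""
    result = {}
    for field, region in template.items():
        hits = [w for w in words if region.contains(*w.center())]
        # Reading order: top to bottom, then left to right.
        hits.sort(key=lambda w: (w.box.y0, w.box.x0))
        result[field] = " ".join(w.text for w in hits)
    return result


# Region annotated once on the sample document (assumed values).
template = {"invoice_number": Box(0.60, 0.10, 0.90, 0.14)}

# Same layout: the value sits inside the annotated region.
same_layout = [Word("INV-001", Box(0.62, 0.11, 0.75, 0.13))]

# Same format, but the value has moved down the page, outside the region.
shifted = [Word("INV-002", Box(0.62, 0.16, 0.75, 0.18))]

print(extract_by_coordinates(template, same_layout))
print(extract_by_coordinates(template, shifted))
```

In this sketch the second document yields an empty value for `invoice_number`, because nothing falls inside the annotated region any more.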

If you use the template for incoming documents of the same layout, you need to specify the template ID or use the template auto-detect function. Please refer to the corresponding help documentation:

https://help.sap.com/viewer/5fa7265b9ff64d73bac7cec61ee55ae6/SHIP/en-US/b722fe7170af4dd8b171f8394f43...
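As a rough sketch of the two variants, the Python snippet below only builds the options payload that would accompany a document upload; it does not call the service. The field names (`clientId`, `documentType`, `schemaId`, `templateId`) and the `"detect"` value for auto-detection are assumptions based on my reading of the API documentation, the `build_job_options` helper and the example IDs are hypothetical, and everything should be checked against the help page linked above.

```python
import json


def build_job_options(client_id: str, document_type: str, schema_id: str,
                      template_id: str = "detect") -> str:
    """Build the JSON options string for a document upload (hypothetical helper).

    template_id: a concrete template ID, or "detect" (assumed value) to let
    the service pick a matching template automatically.
    """
    options = {
        "clientId": client_id,          # assumed field name
        "documentType": document_type,  # assumed field name
        "schemaId": schema_id,          # assumed field name
        "templateId": template_id,      # assumed field name
    }
    return json.dumps(options)


# Variant 1: explicit template ID (placeholder value).
explicit = build_job_options("default", "custom", "my-schema-id", "my-template-id")

# Variant 2: template auto-detection.
auto = build_job_options("default", "custom", "my-schema-id")

print(explicit)
print(auto)
```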

If you still experience poor extraction results with a template, please raise a support ticket with the following component: CA-ML-BDP under https://launchpad.support.sap.com/. Please provide a sample document and the exported template.

Best regards,

Tomasz (from the product team)
