Weekly Challenges

noravonthenen · 2 weeks ago

Welcome to week 1 of the May Developer Challenge on AI at SAP! The topic of this month’s challenge is the SAP AI Services: Document Information Extraction and Data Attribute Recommendation. To participate, just post a screenshot of your solution as a reply in the discussion for the corresponding week.

SAP AI Services help you implement custom use cases by providing powerful algorithms specifically tailored to business problems.

Document Information Extraction:

The Document Information Extraction service is available in two editions: the original Base Edition and the new genAI-based Premium Edition. The Premium Edition uses a large language model via generative AI hub on SAP AI Core to extract information from all kinds of documents.
With Document Information Extraction you can extract information from PDF files and from single-page JPEG, PNG and TIFF files.
Supported document types are: invoice, paymentAdvice, purchaseOrder, businessCard, deliveryNote, resume and birthCertificate. You can also create your own schema to process other document types.
You can also extract OCR results directly to process the raw text from your document files, and use the classification capabilities to sort your documents into three classes: invoice, purchase order and payment advice.
You can also enrich your extracted data with your metadata.
You can access the Document Information Extraction service via the UI, via swagger/client calls and the Python SDK.
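For the swagger/client route, a minimal Python sketch of preparing a job request could look like the following. It only builds the options payload and checks the file type locally; nothing is sent to the service. The file types and document types come from the list above. The option keys (clientId, documentType, extraction, headerFields, lineItemFields), the default client ID and the example field names are assumptions and should be verified in the Swagger UI of your own service instance.

```python
import json
from pathlib import Path

# File types named in this post: PDF, or single-page JPEG, PNG and TIFF.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}

# Document types named in this post.
SUPPORTED_DOCUMENT_TYPES = {
    "invoice", "paymentAdvice", "purchaseOrder", "businessCard",
    "deliveryNote", "resume", "birthCertificate",
}


def is_supported_file(filename: str) -> bool:
    """Return True if the file extension is one of the supported types."""
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES


def build_job_options(document_type, header_fields, line_item_fields,
                      client_id="default"):
    """Build the JSON options string for an extraction job.

    The key names used here are assumptions; check them against the
    Swagger UI of your service instance before relying on them.
    """
    if document_type not in SUPPORTED_DOCUMENT_TYPES:
        raise ValueError(f"unsupported document type: {document_type}")
    options = {
        "clientId": client_id,
        "documentType": document_type,
        "extraction": {
            "headerFields": list(header_fields),
            "lineItemFields": list(line_item_fields),
        },
    }
    return json.dumps(options)


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    print(build_job_options("invoice", ["documentNumber"], ["description"]))
```

In a real client, the resulting string would typically be sent together with the document file in a multipart request; the exact endpoint and form field names are best taken from the Swagger UI.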

Data Attribute Recommendation:

With Data Attribute Recommendation you can train your own model to classify data records. You can also tackle more complex classification problems, such as hierarchical classification of products, and predict missing data records.
Data Attribute Recommendation can be used via swagger/client calls as well as the AI API Python SDK and SAP AI Launchpad.
If you want to access Data Attribute Recommendation via Postman, you can download this Postman Collection.

Weekly Challenges

Week 1 Challenge – DOX UI

This week you will use the UI of the Document Information Extraction service to extract information from your favorite recipe. The UI is great for trying out your use case and getting a feeling for the capabilities of the service. For productive use cases you would call the APIs or implement a workflow using the Python SDK. For example, you could implement a workflow that processes documents right out of your mailbox, saves the extracted information in the system and structure you need, and triggers other necessary workflows.

For this week’s challenge, use the UI to extract the header fields “recipe name” and “portions” and the line items “quantity” and “ingredient” from your chosen recipe. To do so, you need to create a custom schema. Make sure the recipe is in one of the supported languages.

When creating a custom schema, choose the Setup Type auto to use the LLM/genAI-based Premium Edition. In the description field, provide information that helps the large language model understand what you are referring to, e.g. “the name of the recipe”.
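To plan the schema before clicking through the UI, it can help to write the fields down as plain data. The sketch below is only a local planning aid, not the service's schema format: the SchemaField class, the field names with underscores, the data types and all descriptions except “the name of the recipe” are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaField:
    """One field of the planned custom schema (hypothetical structure)."""
    name: str
    data_type: str          # assumed type label, e.g. "string" or "number"
    description: str        # hint for the large language model
    setup_type: str = "auto"  # "auto" selects the genAI-based Premium Edition


# Header fields asked for in the challenge.
HEADER_FIELDS = [
    SchemaField("recipe_name", "string", "the name of the recipe"),
    SchemaField("portions", "number",
                "the number of portions or servings the recipe yields"),
]

# Line item fields asked for in the challenge.
LINE_ITEM_FIELDS = [
    SchemaField("quantity", "string",
                "the amount of the ingredient, including its unit"),
    SchemaField("ingredient", "string",
                "the name of the ingredient without its quantity"),
]


def missing_descriptions(fields):
    """Return the names of fields that have an empty description."""
    return [f.name for f in fields if not f.description.strip()]


if __name__ == "__main__":
    all_fields = HEADER_FIELDS + LINE_ITEM_FIELDS
    print("Fields without description:", missing_descriptions(all_fields))
```

Checking for empty descriptions up front is useful because, with Setup Type auto, the description is the main information the model gets about each field.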

Get a free trial account and run DOX booster: https://developers.sap.com/tutorials/cp-aibus-dox-booster-key.html
Get the Document Information Extraction UI: https://developers.sap.com/tutorials/cp-aibus-dox-ui-sub.html
Create a custom schema: https://developers.sap.com/tutorials/cp-aibus-dox-ui-gen-ai.html
OPTIONAL: Create a template and add your document to the template (improves performance for future recipes)
Upload your favorite recipe to extract the name, portions, quantity and ingredients. Make sure your recipe PDF is only 1 or 2 pages long, otherwise you will quickly reach the limit (50 pages) of the trial plan. And try not to use the entire 50-page quota, because we will need it next week as well!
Submission: share a screenshot of the extraction results and the document and write a comment to share your experience using the UI in the discussion below.
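To keep an eye on the trial quota while experimenting, a small helper like the one below can do the arithmetic. The 50-page limit is the one mentioned above; the assumption that every page of every upload (including repeated uploads of the same recipe) counts against it is mine and should be checked against the trial limits documentation.

```python
TRIAL_PAGE_LIMIT = 50  # trial plan limit mentioned in this post


def remaining_pages(used_pages, limit=TRIAL_PAGE_LIMIT):
    """Pages still available, never below zero."""
    return max(limit - used_pages, 0)


def uploads_left(pages_per_document, used_pages=0, limit=TRIAL_PAGE_LIMIT):
    """How many more documents of the given length fit into the quota.

    Assumes every page of every upload counts against the limit.
    """
    if pages_per_document <= 0:
        raise ValueError("pages_per_document must be positive")
    return remaining_pages(used_pages, limit) // pages_per_document


if __name__ == "__main__":
    # A 2-page recipe on a fresh trial account.
    print(uploads_left(2))
    # The same recipe after 10 pages have already been processed.
    print(uploads_left(2, used_pages=10))
```

With a 2-page recipe this gives 25 uploads on a fresh quota, which leaves little room for retries if part of the quota has to be saved for next week.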

Example Screenshot:

Additional information:

Processing a ©Pokémon Card in 90 seconds with Document Information Extraction powered by generative AI: https://community.sap.com/t5/technology-blogs-by-sap/processing-a-pok%C3%A9mon-card-in-90-seconds-wi...

Be aware of limits that apply in free tier and trial accounts: https://help.sap.com/docs/document-information-extraction/document-information-extraction/free-tier-...

How to improve your results: https://help.sap.com/docs/document-information-extraction/document-information-extraction/best-pract...

In this “2-min of” video I describe the behind-the-scenes technical aspects of the Base service (which does not use an LLM).

geek · 2 weeks ago

Some positive results:

Some less so:

PieterB · 2 weeks ago

Here is my result.

Not yet a 100% correct result, but I am looking forward to the next challenges to learn more about the AI services.

satya-dev · 2 weeks ago

Read restaurant name and address from image

M-K · 2 weeks ago

Here is my result:

Interestingly, some of the ingredients were highlighted in the instructional text and not in the list; however, they were all correct.

Alpesa1990 · 2 weeks ago

My submission.

In my case (Spanish language), the AI couldn't separate the quantity and the ingredients... But it's so close...

IanStubbings · 2 weeks ago

My recipe. All good.

jasperdebie · 2 weeks ago

Interesting to see how it extracts data with the minimum of information:

There were some small mistakes, like the quantity not being placed separately in the quantity field but merged into the ingredients field, even after changing the type of Quantity. The highlights are almost fully correct.

Ruthiel · 2 weeks ago

Hello @noravonthenen!

Thanks for this wonderful content!

I am mesmerised by this tool and the results of it!

The unit in the time-related field surprised me: I had minutes and hours in the recipe, yet all the values were correctly entered in minutes.
I could distinguish the main ingredients and the quantities independently of the unit of measure for each line item!

kishore_kumar_g · 2 weeks ago

Hi @noravonthenen,

Thank you for the excellent content, it's truly impressive!

johna69 · 2 weeks ago

Is it May, Mai or M-AI challenge 😉

Nearly right:

noravonthenen · a week ago

@johna69 LOVE the M-AI challenge comment 😄

narendran_nv · 2 weeks ago

Not bad, though. In the first attempt it wasn't able to identify any of the ingredients from the document. I then marked them explicitly (only the first 5), and on my second run of the same document it tried to map those exact same lines.

gphadnis2000 · 2 weeks ago

Interesting how Document Information Extraction reads data with minimal effort.

thomas_mller13 · 2 weeks ago

Is this AI service using LayoutLM algorithms? In the context of a specific business application, e.g. incoming invoices or delivery notes for a single company, a large language model is maybe a sort of overkill, since there is so much more specific information available about these documents and they form a very small subset of all documents. A lot of specific information is not used. What AI model would you suggest in such a case?

noravonthenen · a week ago

Hi @thomas_mller13, no this service does not use LayoutLM but there are other algorithms based on layout that are being used. Here is a description of the underlying algorithms of the base edition: this “2-min of” video. The premium edition uses GPT in the background to determine all kinds of other values.

thomas_mller13 · a week ago

Thx

Sabarim_07 · 2 weeks ago

Hi,

If the document / image has text in it, then the data is extracted. That is working fine.
But can the ingredient or quantity not be determined from a picture that doesn't have any text?

Thanks

noravonthenen · a week ago

Hi @Sabarim_07, yes, the service we are using is for extracting text from documents (PDF or images) and identifying what the text is: title and ingredients in our example. In a business context that could be order number or customer or phone number, email and address, line items, total amount or currency and so on. Therefore, feeding in only an image without text does not work with this service. What you are suggesting would be an image recognition and object detection task.

moh_ali_square · a week ago

Hi,

I got nice results: the title of the recipe and the ingredients.

Venkat_Vyza · a week ago

Thank you for starting this AI Challenge @noravonthenen

Nagarajan-K · a week ago

@noravonthenen - Thanks for the challenge.

Here are my results. Pretty awesome. The Qty field did not detect the 1/2 and 1/4 in the image, but the rest was good.

I tried editing to make the model learn, but I believe it was not able to detect this field. I then tried converting Quantity to String; it still did not detect the qty, and both qty and ingredient were extracted into the ingredient field.

Hira · a week ago

Hi @noravonthenen ,

I tried my all-time favorite recipe and was able to read all ingredients successfully. I tried to read the steps as well, but in the case of line items the system gets confused.

Can we make sections above the line items just to differentiate the data?

RAHUL1221 · a week ago

Hey @noravonthenen, thank you for organizing this, it is really great stuff. Most importantly, it is really simple to use. I still remember having to write many lines of Python code to get this done. Can't wait to see more such simple-to-use tools.

______________________________________________________
1. Document that is uploaded for DOX(Homemade pizza).

___________________________________________________________
2. Image of result.

___________________________________________________
Learnings - It produced the best result when the wording in the image was simple and short. It was able to recognize all my items.

____________________________________________________
- RAHUL1221

sainithesh21 · a week ago

Hi, Here are my results

JoshuaLaw · a week ago

It worked quite well!

CameronWilson · a week ago

Great toolset to use, especially for when more complex tasks are issued. Great documentation and easy to read. Will definitely use this for future projects.

My submission

Bharathi_K · a week ago

Clearly my recipe didn't have the item, quantity, uom separated. So, everything is taken into ingredient, which is cool.

Jordi_C · a week ago

Done!

Salma_M · a week ago

Hi, here is my result.

I tried my best but did not get a 100% correct result. Looking forward to the next challenges to learn more about the AI services.

xavisanse · a week ago

Last but not least 🙂 Sorry for the delay! I'm a little bit disillusioned with the results. I first tried a Thermomix book, without any results. I thought that maybe the engine is not as accurate with Spanish as with English, so I looked for a recipe book on the internet. I uploaded 4 recipes in a new schema and put them in a template, and the only improvement I have seen is that from the second template on it learned the allergens. I'm pretty sure that more conventional formats will improve the result a lot. But maybe I expected a little bit more.

noravonthenen · 15 hours ago

Hi @xavisanse Have you opened the line items and checked the result in there? On the screenshot it looks like it detected the ingredients.

MioYasutake · Thursday

My submission for week1.

emiliocampo · Sunday

In my case, the same thing happened as with @Alpesa1990 . I have entered a recipe in Spanish and the service doesn't differentiate well between the ingredient and the quantity.

martaseq · Monday

I got an almost perfect result!

The only information I was unable to extract was the temperature for preheating the oven, which is at the beginning of the instructions. Maybe because I put it as a header field? Maybe because my description was not complete enough?

Anyway, it is a spectacular tool nonetheless!

acmebcn · 7 hours ago

Is it too late to engage on this challenge? 😊
