SAP CPI - Forwarding Raw Image Data through Integration Flow

Hi Experts,

I have a use case where I am calling a CPI endpoint, passing in the URL of a hosted jpeg image (like this one: http://braiden.net/images/img.jpg) and then I make a GET request to that URL using a request-reply to get the raw image data into the message body.

I have successfully built this part out. Now that I have this raw image data in the message body, I would like to make a request-reply to the SAP Optical Character Recognition service (documented here).

This API requires that the request is formatted like this:

• Content-Type: multipart/form-data; boundary=CPI

• APIKey: [some-valid-api-key]

• Content-Length: [length-of-content]

• Must contain a multipart form body, with a single field called "files" containing the image data

Because of the requirements of the request for this API, I have added a content modifier to remove all unnecessary headers (that may have been brought in from the first request) and add the required headers. (It seems that the "Content-Length" header is automatically added by CPI, and any attempts to change/delete this header do not work.)

Inside this content modifier, I also added the following in the body as an expression:

--CPI
Content-Disposition: form-data; name="files"; filename="img.jpg"
Content-Type: image/jpeg

${body}
--CPI--

This wraps the raw image data with the necessary boundary information required for the multipart/form-data content type. (This method of wrapping raw data was borrowed from this blog post.)

Here is a picture of my iflow:

Click here to download this iflow as a zip file.

One thing I have noticed is that the value of the Content-Length header differs drastically between the request sent from Postman and the request made from CPI, and, as I have already mentioned, there seems to be no way to set this header manually to a different value.

I have also noticed that when viewing the raw HTTP request made by CPI, the image data is slightly different from the image data when sending it using a REST Client. There are a few characters missing every so often in the request that CPI makes.

Here are two examples of the requests, one being sent from a REST client, the other being sent from CPI. The request from the REST client works when sent to the OCR API, but the request from CPI does not (returns a 500 error).

REST Client Request

CPI Request

Again, the differences are the Content-Length header value and some missing characters from the CPI request.

Here are some questions:

1. Is using a request-reply to get the raw image data the proper way to store the image in the message body, or is there a better method?

2. How can I properly send this image to the OCR API? I have only been getting 500 errors from the OCR API server so far.

3. Should I be using a MIME Multipart encoder or script to modify encoding? If so, how?

TL;DR: I want to send an image to an optical character recognition API using CPI. Any ideas on what I could be doing wrong, and why my request is not working?

Any help would be greatly appreciated!

engswee
Active Contributor

Hi Braiden

You have a very interesting scenario 🙂

I do not have a straight answer but I would like to offer my suggestion based on analysis of what you have provided.

I compared the files you provided and looked up the character differences in a hex editor (e.g. HxD). The request from the REST client has additional bytes EF BF BD, which is the UTF-8 encoding of the Unicode replacement character (U+FFFD). I'm not sure why the REST client request has this while the CPI request does not.

I would break up the scenario (if you haven't done so already) to ensure that the first part, which retrieves the image from the URL, works fine. After the request-reply, try routing the message to an SFTP server to save the image file. Then download the file from SFTP and try viewing it to see if there is any corruption. Also compare it against the original file (downloaded via browser) at byte level.

Regarding content length, I'm not sure if there is a way to override it. I'll try to dig more into that and let you know if I find something.

Regards

Eng Swee

Hi Eng Swee,

Thank you for your quick and thorough reply.

Yes, it is somewhat of a strange scenario! So to elaborate, I have a chatbot made with SAP Conversational AI, and I want to use CPI to handle the backend logic for the chatbot. Users can interact with the chatbot through mobile channels like Facebook Messenger. When the user uploads a photo, Facebook Messenger provides a link to the photo, hosted on Facebook's content distribution network. The chatbot then forwards this URL to CPI.

Anyway, I have done as you suggested and routed the message to an SFTP server right after the first request-reply. As a result, I received the image properly with no corruption and I was able to view it. I downloaded HxD, and both files appear to be identical:

Because of this, the problem does not appear to be with the first request-reply.

One other thing I tried was hardcoding the image data as a base64 string, and calling decodeBase64 on it to get the raw data and storing it in the message body using a groovy script. I then fed that message through the rest of the iflow, but when it comes time to make the request-reply to the OCR API, the request fails again with an internal server error. Do you happen to have any other suggestions?

Thank you for your support.

Regards,

Braiden Psiuk

engswee
Active Contributor

Hi braidenspencer

I downloaded your IFlow and had a look. My next "suspicion" is the Content Modifier. You are constructing the multipart body there, but my concern is that evaluating the ${body} line may convert the binary content to a string, which could corrupt it.

I'm going to try this out as well on my end to see how it goes.

Regards

Eng Swee

Hi Eng Swee,

Thank you for your continued support.

You were correct! The Content-Modifier is definitely the culprit. Oddly enough, if you still include the Content-Modifier in the iflow, but you leave the body as just

${body}

the data passes through just fine as if the Content-Modifier wasn't there. This is great because I can still use a Content-Modifier to manipulate the headers without corrupting the data in the body. But when you introduce some other text it corrupts the image data, for instance:

${body}SomeText

(Verified these findings using the SFTP transfer method, and using HxD)
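
The kind of corruption described above can be reproduced outside CPI. The following is a minimal Java sketch, not CPI's actual implementation: it assumes the binary-to-string conversion uses UTF-8, which the thread does not confirm. The class `BinaryRoundTrip` and the method `throughString` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTrip {
    // Decode the bytes as text, then encode the text back to bytes.
    // Assumption: UTF-8 stands in for whatever charset the runtime applies.
    static byte[] throughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample bytes of the kind found at the start of a JFIF JPEG file.
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] roundTripped = throughString(jpegStart);

        // Print whether the bytes survived and what the first three bytes became.
        System.out.println("unchanged: " + Arrays.equals(jpegStart, roundTripped));
        System.out.printf("first bytes after round trip: %02X %02X %02X%n",
                roundTripped[0], roundTripped[1], roundTripped[2]);
    }
}
```

The byte 0xFF is never valid in UTF-8, so the decoder substitutes U+FFFD, which is written back as EF BF BD. Pure ASCII content survives the same round trip unchanged, which is why text payloads do not show the problem.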

Do you know of a way to retrieve the body in the form of a byte array using a groovy script, and modify it to add the multipart body information while avoiding the conversion to a string?

If you think it would be beneficial, I can post this as a separate question here now that we have pinpointed the problem in the hopes that others will provide their input (since this question has probably lost some traction due to its age).

Thank You,

Braiden

engswee
Active Contributor

Hi braidenspencer

Since the image is in binary, you need to deal with the construction of the Multipart section in bytes to avoid conversion to String.

Please find below the Groovy script that will:

1) Extract the image into bytes

2) Construct the multipart

3) Set the content type (you can remove the hardcoded value in the Content Modifier)

4) Store the multipart into the message body

import com.sap.gateway.ip.core.customdev.util.Message

import javax.activation.DataHandler
import javax.mail.internet.ContentType
import javax.mail.internet.MimeBodyPart
import javax.mail.internet.MimeMultipart
import javax.mail.util.ByteArrayDataSource

Message processData(Message message) {
    byte[] bytes = message.getBody(byte[])
    //  Construct Multipart
    MimeBodyPart bodyPart = new MimeBodyPart()
    ByteArrayDataSource dataSource = new ByteArrayDataSource(bytes, 'image/jpeg')
    DataHandler byteDataHandler = new DataHandler(dataSource)
    bodyPart.setDataHandler(byteDataHandler)
    bodyPart.setFileName('img.jpg')
    bodyPart.setDisposition('form-data; name="files"')

    MimeMultipart multipart = new MimeMultipart()
    multipart.addBodyPart(bodyPart)

    // Set multipart into body
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream()
    multipart.writeTo(outputStream)
    message.setBody(outputStream)

    // Set Content type with boundary
    String boundary = (new ContentType(multipart.contentType)).getParameter('boundary');
    message.setHeader('Content-Type', "multipart/form-data; boundary=${boundary}")

    return message
}

I've tested this and it should work fine. Let me know how it goes.
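
To show what such a multipart body looks like on the wire, here is a dependency-free Java sketch that assembles a comparable body purely as bytes. It is not the script above and does not use javax.mail; the class `MultipartSketch`, the method `buildMultipart`, and its parameters are hypothetical. The boundary "CPI", the field name "files", and the file name "img.jpg" are taken from the original question.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class MultipartSketch {
    // Builds a single-part multipart/form-data body without ever turning
    // the file content into a String.
    static byte[] buildMultipart(byte[] fileBytes, String boundary,
                                 String fieldName, String fileName,
                                 String partContentType) {
        // Part headers are plain ASCII text, so encoding them is safe.
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + partContentType + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] tailBytes = ("\r\n--" + boundary + "--\r\n")
                .getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(fileBytes, 0, fileBytes.length); // raw bytes, untouched
        out.write(tailBytes, 0, tailBytes.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for real image data.
        byte[] image = {(byte) 0xFF, (byte) 0xD8, 0x00, (byte) 0xFF};
        byte[] body = buildMultipart(image, "CPI", "files", "img.jpg", "image/jpeg");
        System.out.println("body length in bytes: " + body.length);
    }
}
```

Because the body is a byte array, its length is also the value a Content-Length header would carry for this request.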

Regards

Eng Swee

Hi Eng Swee,

Your script works perfectly! I got a 200 back from the OCR server. Thank you very much!

But one last thing if you don't mind, could you explain the following lines? I am a bit confused as to why double quotes are required for the second parameter when setting the "Content-Type" header, and I'm not sure how the ${boundary} part works.

String boundary = (new ContentType(multipart.contentType)).getParameter('boundary');
message.setHeader('Content-Type', "multipart/form-data; boundary=${boundary}")

I will mark this as "answered" now. Thank you again for all of your help 🙂

Best,

Braiden Psiuk

engswee
Active Contributor

Hi braidenspencer

In Groovy, double quotes allow expressions to be interpolated within the string. This makes for more compact code (compared to Java) when writing Strings that contain dynamically populated variables, i.e. boundary in the case above. Such double-quoted strings with interpolated expressions are called GStrings in Groovy - it's really named that! 🙂
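
For comparison, a plain Java version of the same header construction might look like the sketch below. The class `BoundaryHeader` and its methods are hypothetical names for illustration. In Groovy, a single-quoted string such as 'Content-Type' is an ordinary String without interpolation, which is why only the second argument in the script needs double quotes.

```java
final class BoundaryHeader {
    // Java equivalent of the Groovy GString "multipart/form-data; boundary=${boundary}",
    // written with string concatenation.
    static String withConcatenation(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    // The same result using a format string.
    static String withFormat(String boundary) {
        return String.format("multipart/form-data; boundary=%s", boundary);
    }

    public static void main(String[] args) {
        System.out.println(withConcatenation("CPI"));
        System.out.println(withFormat("CPI"));
    }
}
```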

Since this thread is answered now, can you please also close the thread?

Lastly, once you have your end-to-end scenario built and working, it would be really great if you could write a blog post about it. It is a very interesting use case that truly showcases the integration capability of CPI, and it would also be good reference material for others looking into such scenarios in the future.

Regards

Eng Swee

vijay3773
Explorer

Thank you braidenspencer for posting the query, and thank you engswee.yeoh for the solution. I've tried using the same script, but somehow the form-data isn't being formed as expected; as a result, I'm getting a 400 Bad Request error saying the request is missing a media type.
I'm just trying to post a CSV file through form-data files to an HTTP endpoint as shown below. Can you please advise what I'm missing here?

Sincere Regards
Vijay