SAP CPI - Forwarding Raw Image Data through Integration Flow

Hi Experts,

I have a use case where I am calling a CPI endpoint, passing in the URL of a hosted jpeg image (like this one: http://braiden.net/images/img.jpg) and then I make a GET request to that URL using a request-reply to get the raw image data into the message body.

I have successfully built this part out. Now that I have this raw image data in the message body, I would like to make a request-reply to the SAP Optical Character Recognition service (documented here).

This API requires that the request is formatted like this:

•Content-Type: multipart/form-data; boundary=CPI

•APIKey: [some-valid-api-key]

•Content-Length: [length-of-content]

  • Must contain a multipart form body, with a single field called "files" containing the image data

Because of the requirements of the request for this API, I have added a content modifier to remove all unnecessary headers (that may have been brought in from the first request) and add the required headers. (It seems that the "Content-Length" header is automatically added by CPI, and any attempts to change/delete this header do not work.)

Inside this content modifier, I also added the following in the body as an expression:

--CPI
Content-Disposition: form-data; name="files"; filename="img.jpg"
Content-Type: image/jpeg

${body}
--CPI--

This wraps the raw image data with the necessary boundary information required for the multipart/form-data content type. (This method of wrapping raw data was borrowed from this blog post.)

Here is a picture of my iflow:

Click here to download this iflow as a zip file.

One thing I have noticed is that the value of the Content-Length header is drastically different when the request is sent from Postman than when it is sent from CPI, and as I have already mentioned, there seems to be no way to manually set this header to a different value.

I have also noticed that when viewing the raw HTTP request made by CPI, the image data is slightly different from the image data when sending it using a REST Client. There are a few characters missing every so often in the request that CPI makes.

Here are two examples of the requests, one being sent from a REST client, the other being sent from CPI. The request from the REST client works when sent to the OCR API, but the request from CPI does not (returns a 500 error).

REST Client Request

CPI Request

Again, the differences are the Content-Length header value and some missing characters from the CPI request.

Here are some questions:

1. Is using a request-reply to get the raw image data the proper way to store the image in the message body, or is there a better method?

2. How can I properly send this image to the OCR API? I have only been getting 500 errors from the OCR API server so far.

3. Should I be using a MIME Multipart encoder or script to modify encoding? If so, how?

TL;DR: I want to send an image to an optical character recognition API using CPI. Any ideas on what I could be doing wrong, and why my request is not working?

Any help would be greatly appreciated!

Accepted Solutions (1)

engswee
Active Contributor

Hi Braiden

You have a very interesting scenario 🙂

I do not have a straight answer but I would like to offer my suggestion based on analysis of what you have provided.

I compared the files you provided and looked up the character differences in a hex editor (e.g. HxD). The request from the REST client has the additional bytes EF BF BD, which is the UTF-8 encoding of the Unicode replacement character (U+FFFD). I'm not sure why the REST client has this while CPI does not.
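As a side note on where EF BF BD typically comes from: when binary data is decoded as UTF-8 text and then written out again, every byte sequence that is not valid UTF-8 is replaced by U+FFFD, which is encoded as those three bytes. A minimal Groovy sketch of that effect follows. The four sample bytes (a common start of a JPEG file) are an assumption for illustration and are not taken from the attached requests, and the sketch does not show which tool introduced the bytes here.

```groovy
import java.nio.charset.StandardCharsets

// Sample bytes chosen for illustration only; 0xFF is never valid in UTF-8.
byte[] raw = [0xFF, 0xD8, 0xFF, 0xE0] as byte[]

// Decoding binary data as text replaces each undecodable sequence with U+FFFD ...
String asText = new String(raw, StandardCharsets.UTF_8)

// ... and encoding that text again writes every U+FFFD as the bytes EF BF BD.
byte[] roundTrip = asText.getBytes(StandardCharsets.UTF_8)

// Hex view of the result, comparable to what a hex editor would show.
String hex = roundTrip.encodeHex().toString().toUpperCase()
println hex
```

The round-tripped array no longer matches the original bytes, which is the kind of damage a binary-to-string conversion can cause.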

I would break up the scenario (if you haven't done so already) to ensure that the first part, which retrieves the image from the URL, works fine. After the request-reply, try routing the message to an SFTP server to save the image file. Then download the file from SFTP and try viewing it to see if there is any corruption. Also compare it against the original file (downloaded via browser) at byte level.
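If no hex editor is at hand, the byte-level comparison can also be sketched in a few lines of Groovy. firstDifference is a hypothetical helper name, and the two arrays below are made-up sample data. In practice they would be loaded from the two files, for example with new File(...).bytes.

```groovy
// Hypothetical helper: returns the offset of the first differing byte,
// or -1 if both arrays are identical. If one array is a prefix of the
// other, the length of the shorter array is returned.
int firstDifference(byte[] a, byte[] b) {
    int common = Math.min(a.length, b.length)
    for (int i = 0; i < common; i++) {
        if (a[i] != b[i]) {
            return i
        }
    }
    return a.length == b.length ? -1 : common
}

// Made-up sample data standing in for the browser download and the SFTP copy.
byte[] original = [0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10] as byte[]
byte[] viaCpi   = [0xFF, 0xD8, 0x3F, 0xE0, 0x00, 0x10] as byte[]

int offset = firstDifference(original, viaCpi)
```

A result of -1 means the two files match byte for byte. Any other value is the offset to inspect in the hex editor.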

Regarding content length, I'm not sure if there is a way to override. I'll try to dig more into that and let you know if I find something.

Regards

Eng Swee

0 Kudos

Hi Eng Swee,

Thank you for your quick and thorough reply.

Yes, it is somewhat of a strange scenario! So to elaborate, I have a chatbot made with SAP Conversational AI, and I want to use CPI to handle the backend logic for the chatbot. Users can interact with the chatbot through mobile channels like Facebook Messenger. When the user uploads a photo, Facebook Messenger provides a link to the photo, hosted on Facebook's content distribution network. The chatbot then forwards this URL to CPI.

Anyway, I have done as you suggested and routed the message to an SFTP server right after the first request-reply. As a result, I received the image properly with no corruption and I was able to view it. I downloaded HxD, and both files appear to be identical:

Because of this, the problem does not appear to be with the first request-reply.

One other thing I tried was hardcoding the image data as a base64 string, and calling decodeBase64 on it to get the raw data and storing it in the message body using a groovy script. I then fed that message through the rest of the iflow, but when it comes time to make the request-reply to the OCR API, the request fails again with an internal server error. Do you happen to have any other suggestions?

Thank you for your support.

Regards,

Braiden Psiuk

engswee
Active Contributor

Hi braidenspencer

I downloaded your IFlow and had a look and my next "suspicion" would be at the Content Modifier. You are constructing the multipart body there, but my concern is that the evaluation of the ${body} line may undergo binary to string encoding which could potentially corrupt the content.

I'm going to try this out as well on my end to see how it goes.

Regards

Eng Swee

0 Kudos

Hi Eng Swee,

Thank you for your continued support.

You were correct! The Content-Modifier is definitely the culprit. Oddly enough, if you still include the Content-Modifier in the iflow, but you leave the body as just

${body}

the data passes through just fine as if the Content-Modifier wasn't there. This is great because I can still use a Content-Modifier to manipulate the headers without corrupting the data in the body. But when you introduce some other text it corrupts the image data, for instance:

${body}SomeText

(Verified these findings using the SFTP transfer method, and using HxD)

Do you know of a way to retrieve the body in the form of a byte array using a groovy script, and modify it to add the multipart body information while avoiding the conversion to a string?

If you think it would be beneficial, I can post this as a separate question here now that we have pinpointed the problem in the hopes that others will provide their input (since this question has probably lost some traction due to its age).

Thank You,

Braiden

engswee
Active Contributor

Hi braidenspencer

Since the image is in binary, you need to deal with the construction of the Multipart section in bytes to avoid conversion to String.

Please find below the Groovy script that will:-

1) Extract the image into bytes

2) Construct the multipart

3) Set the content type (you can remove the hardcode in Content Modifier)

4) Store the multipart into the message body

import com.sap.gateway.ip.core.customdev.util.Message

import javax.activation.DataHandler
import javax.mail.internet.ContentType
import javax.mail.internet.MimeBodyPart
import javax.mail.internet.MimeMultipart
import javax.mail.util.ByteArrayDataSource

Message processData(Message message) {
    byte[] bytes = message.getBody(byte[])
    //  Construct Multipart
    MimeBodyPart bodyPart = new MimeBodyPart()
    ByteArrayDataSource dataSource = new ByteArrayDataSource(bytes, 'image/jpeg')
    DataHandler byteDataHandler = new DataHandler(dataSource)
    bodyPart.setDataHandler(byteDataHandler)
    bodyPart.setFileName('img.jpg')
    bodyPart.setDisposition('form-data; name="files"')

    MimeMultipart multipart = new MimeMultipart()
    multipart.addBodyPart(bodyPart)

    // Set multipart into body
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream()
    multipart.writeTo(outputStream)
    message.setBody(outputStream)

    // Set Content type with boundary
    String boundary = (new ContentType(multipart.contentType)).getParameter('boundary');
    message.setHeader('Content-Type', "multipart/form-data; boundary=${boundary}")

    return message
}

I've tested this and it should work fine. Let me know how it goes.

Regards

Eng Swee

0 Kudos

Hi Eng Swee,

Your script works perfectly! I got a 200 back from the OCR server. Thank you very much!

But one last thing if you don't mind, could you explain the following lines? I am a bit confused as to why double quotes are required for the second parameter when setting the "Content-Type" header, and I'm not sure how the ${boundary} part works.

String boundary = (new ContentType(multipart.contentType)).getParameter('boundary');
message.setHeader('Content-Type', "multipart/form-data; boundary=${boundary}")

I will mark this as "answered" now. Thank you again for all of your help 🙂

Best,

Braiden Psiuk

engswee
Active Contributor

Hi braidenspencer

In Groovy, double quotes allow interpolated expressions within a string. This makes for more compact code (compared to Java) when writing Strings that contain dynamically populated variables, i.e. boundary in the case above. Such double-quoted strings are called GStrings in Groovy - it's really named that! 🙂
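A short sketch of the difference between the quote styles. The boundary value CPI is just sample data borrowed from the original question.

```groovy
String boundary = 'CPI'

// Double quotes: ${boundary} is evaluated and its value is inserted (a GString).
String interpolated = "multipart/form-data; boundary=${boundary}"

// Single quotes: a plain String, so ${boundary} stays as literal text.
String literal = 'multipart/form-data; boundary=${boundary}'

// Java-style concatenation that produces the same result as the GString.
String concatenated = 'multipart/form-data; boundary=' + boundary
```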

Since this thread is answered now, can you please also close the thread?

Lastly, once you have your end-to-end scenario built and working, it would be really great if you could write a blog post about it. It is a very interesting use case that truly showcases the integration capability of CPI, and it would also be good reference material for others who look into such scenarios in the future.

Regards

Eng Swee

vijay3773
Explorer
0 Kudos

Thank you braidenspencer for posting the query, and thank you engswee.yeoh for the solution. I've tried using the same script, but somehow the form-data isn't being formed as expected, and as a result I'm getting a 400 Bad Request error saying the request is missing a media type.
I'm just trying to post a CSV file through the form-data field files to an HTTP endpoint as shown below. Can you please advise what I'm missing here?


Sincere Regards
Vijay

Answers (7)

pharswan_sunil
Explorer

Thank you engswee.yeoh for the script.

The auto-generated boundary contains '=', which causes the message to fail.

Example -> "------=_Part_16_497424239.1620649784325"

Workaround code:

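import com.sap.gateway.ip.core.customdev.util.Message;

// The Content-Type header set below declares the boundary "--cpi", so every
// delimiter line in the body is "--" + "--cpi" = "----cpi" and the closing
// delimiter is "----cpi--".
def Message processData(Message message) {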
    byte[] fileContent = message.getBody(byte[]);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    String fileName = message.getProperty("fileName")
    String formDataPart1='----cpi\r\nContent-Disposition: form-data; name="cmisaction"\r\n\r\ncreateDocument\r\n----cpi\r\nContent-Disposition: form-data; name="media"; filename="'+fileName+'"\r\n\r\n'
    String formDataPart2='\r\n----cpi\r\nContent-Disposition: form-data; name="propertyValue[0]"\r\n\r\n'+fileName+'\r\n----cpi\r\nContent-Disposition: form-data; name="propertyId[0]"\r\n\r\ncmis:name\r\n----cpi\r\nContent-Disposition: form-data; name="propertyId[1]"\r\n\r\ncmis:objectTypeId\r\n----cpi\r\nContent-Disposition: form-data; name="propertyValue[1]"\r\n\r\ncmis:document\r\n----cpi\r\nContent-Disposition: form-data; name="succinct"\r\n\r\ntrue\r\n----cpi\r\nContent-Disposition: form-data; name="filename"\r\n\r\n'+fileName+'\r\n----cpi\r\nContent-Disposition: form-data; name="_charset"\r\n\r\nUTF-8\r\n----cpi\r\nContent-Disposition: form-data; name="includedAllowableActions"\r\n\r\ntrue\r\n----cpi--\r\n'
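    // Encode the text parts explicitly as UTF-8 (matching the _charset field)
    // instead of relying on the platform default charset
    byte[]  formDataPart1Bytes = formDataPart1.getBytes('UTF-8');
    byte[]  formDataPart2Bytes = formDataPart2.getBytes('UTF-8');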
    outputStream.write(formDataPart1Bytes);
    outputStream.write(fileContent);
    outputStream.write(formDataPart2Bytes);
    message.setBody(outputStream);
    message.setHeader("Content-Type","multipart/form-data; boundary=--cpi");
    return message;
}

Regards,

Sunil

yannickhorstmann
Discoverer

You, Sir, are a legend! This snippet helped me so much! This helped me to create a multipart-body to upload a binary-file to an AWS S3 bucket.

In my case the http-endpoint requires the http-header "Content-Length". Since you can't add it manually and the http-adapter won't generate it when the body is a ByteArrayOutputStream, change the setBody call to

message.setBody(outputStream.toByteArray())

to make it work.

Thanks a lot!

Yannick

Hi Sunil,

Thanks for posting the code. It really helped me solve an issue with my iflow, where I have to send a zip file as part of the form data.

pharswan_sunil
Explorer
0 Kudos

Hi yannickhorstmann bhattasankar ,

It is great to hear that the code snippet helped.

Cheers.

0 Kudos

Hello Sunil,

I have used the same code. Could you please help me send the same file to HTTP as an attachment? My requirement is also to pick up a zip file from an SFTP folder and send it to HTTP as an attachment.

Regards,

Janardhan

former_member198979
Participant

Thanks engswee.yeoh for the script. I was stuck on this part for quite some time and couldn't complete my POC a few months back because of an issue with handling the multipart/form body for the API request.

Cheers!

Chandan

engswee
Active Contributor
0 Kudos

Great to hear that. Looks like I killed two birds with one stone 😉

0 Kudos

Hi Eng Swee Yeoh, I am using your script but I am getting a 415 error.

The error is

"status":415,"error":"Unsupported Media Type","message":"Invalid mime type \"multipart/form-data; boundary=----=_Part_12_1515749279.1587462416702\": Invalid token character '=' in token \"----

Is there a way to escape the = while adding it as part of header?

ErikDM
Explorer
0 Kudos

Hi,

engswee.yeoh , I tried your script, but unfortunately the receiver has problems with the auto-generated boundary format:
"------=_Part_16_497424239.1620649784325"

When testing with Postman it worked perfectly with another boundary format:
"----------------------------874363087126357024911758"

Is there any way to generate a custom boundary or to change the generated one?

Thanks
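One possible way to control the boundary is to skip MimeMultipart and assemble the body by hand, in bytes, similar in spirit to the workaround Sunil posted in this thread. The following is only a sketch under assumptions: buildMultipart and its parameters are hypothetical names, the field name files and the file name img.jpg are carried over from the original question, and the sample bytes stand in for a real image.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper (not from the thread): wraps raw file bytes in a
// single-part multipart/form-data body using a caller-chosen boundary.
byte[] buildMultipart(byte[] fileBytes, String boundary, String fieldName,
                      String fileName, String mimeType) {
    // Part headers: delimiter line, disposition, type, then an empty line.
    String head = "--${boundary}\r\n" +
        "Content-Disposition: form-data; name=\"${fieldName}\"; filename=\"${fileName}\"\r\n" +
        "Content-Type: ${mimeType}\r\n\r\n"
    // Closing delimiter: the boundary with "--" on both sides.
    String tail = "\r\n--${boundary}--\r\n"

    ByteArrayOutputStream out = new ByteArrayOutputStream()
    out.write(head.getBytes(StandardCharsets.UTF_8))
    out.write(fileBytes)   // the binary payload is never converted to a String
    out.write(tail.getBytes(StandardCharsets.UTF_8))
    return out.toByteArray()
}

// Sample data for illustration only.
byte[] image = [0xFF, 0xD8, 0xFF, 0xE0] as byte[]

// Boundary made only of letters and digits, so it needs no quoting in the header.
String boundary = 'cpi' + UUID.randomUUID().toString().replace('-', '')

byte[] body = buildMultipart(image, boundary, 'files', 'img.jpg', 'image/jpeg')
```

Inside an iflow script, the result would then be passed on with message.setBody(body) and message.setHeader('Content-Type', "multipart/form-data; boundary=${boundary}"), the same calls the accepted answer uses.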

tverbeec
Explorer
0 Kudos

Use message.setHeader('Content-Type', "multipart/form-data; boundary=\"${boundary}\"")
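The reason quoting helps is that '=' is one of the special characters that RFC 2045 does not allow in an unquoted parameter value, which matches the "Invalid token character '='" error reported above. A small sketch of a helper that adds the quotes only when needed follows. contentTypeFor is a hypothetical name, and whether a given receiver accepts a quoted boundary still has to be tested.

```groovy
// Hypothetical helper: builds the Content-Type header value and quotes the
// boundary only if it contains a character not allowed in an unquoted value.
String contentTypeFor(String boundary) {
    // The "tspecials" listed in RFC 2045; a space also forces quoting.
    String tspecials = '()<>@,;:\\"/[]?= '
    boolean needsQuotes = false
    for (int i = 0; i < boundary.length(); i++) {
        if (tspecials.indexOf(boundary.substring(i, i + 1)) >= 0) {
            needsQuotes = true
            break
        }
    }
    String value = needsQuotes ? '"' + boundary + '"' : boundary
    return 'multipart/form-data; boundary=' + value
}

// Boundary in the auto-generated format reported in this thread.
String generated = contentTypeFor('----=_Part_12_1515749279.1587462416702')

// A plain boundary is left unquoted.
String plain = contentTypeFor('CPI')
```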

rafaeltsouza84
Explorer
0 Kudos

engswee.yeoh , could you share the script or let us know where to find it? We are facing a very similar issue in our project.

engswee
Active Contributor
0 Kudos

The script is already in this thread. You need to expand the full conversation on the accepted answer.

rafaeltsouza84
Explorer
0 Kudos

engswee.yeoh , thanks! Now I see the code!

At our project we are using a similar version of your code to send several files to a content server but when we do the multipart via groovy it seems it's corrupting the pdf data. I tried writing to the body with several different options and it always corrupts the content of the PDF. Any ideas?

message.setBody(outputStream)

message.setBody(outputStream.toString(StandardCharsets.UTF_8 as String))

message.setBody(outputStream.toByteArray())

We are using the ByteArrayDataSource dataSource = new ByteArrayDataSource(bytes, 'application/octet-stream') differently to handle the different attachment options. Basically we would have to handle anything that comes

0 Kudos

Hi Rafael Tadeu (@rafaeltsouza84). Were you able to achieve your use case? I have a similar requirement. Can you please provide your input?

rafaeltsouza84
Explorer
0 Kudos

shameer.shaik , yes. It seems the HTTPS adapter doesn't support this type of payload and corrupts the attachments (at least as far as the API we are calling is concerned). We tried a number of different ways, and the only way we were able to solve it temporarily was by using a groovy script that does the http post. We will later replace this with a custom adapter.

0 Kudos

Thanks Rafael. HTTP calls from Groovy are not recommended. It would be good if SAP came up with a fix/tweak for the standard connector.

However, as a temporary workaround, could you please share the script if you have it handy? Thank you so much!

JuanDK78
Participant
0 Kudos

Hej Everyone,

You may be having issues in CPI when calling /api/v2/image/ocr, which returns the error: "message": "Wrong request: No file sent in the request, please set the field 'files'"

It turns out that the CPI HTTP adapter is sending the payload with "Transfer-Encoding: chunked".

If I trace the message and use the Content-Type and body created by Eng Swee's code, the boundaries are correct and the payload is correct. It works from Postman when posting raw.

I have not figured out whether it is possible to disable the chunked encoding, as in the SOAP adapter, or how to generate chunked content.

0 Kudos

Hi engswee.yeoh,

I also have the same scenario, and I have used your code to post the attachment from CPI to a third-party application.

While trying to post the attachment, I am getting a "File is required" error. But I can post the attachment using Postman.

I have tried many options, but no luck.

Could you please help me with this?

Regards,

Vijay

gaetanbroutin
Explorer
0 Kudos

Thanks engswee.yeoh !!! I lost hours trying to find the problem why I could not open my PDF and it worked after 5 minutes trying with your code. Soooo good. Thanks!

Too bad the solution is lost in the comments. It was not that easy to find with google..