Smart Data Streaming - Push or Pull technology?

former_member258450
Participant

I was trying to understand the similarities and differences between Apache Kafka and SAP HANA Smart Data Streaming. Kafka's topic subscribers (consumers) use a "pull" methodology, requesting messages from Kafka's queue whenever needed - this avoids consumer bottlenecks on the subscriber's end.

What methodology does SAP Smart Data Streaming follow - push or pull? I read on a blog that it uses "push". Is this right? How are consumer bottlenecks prevented in that scenario?

Accepted Solutions (1)

JWootton
Advisor

First, let me say that HANA Smart Data Streaming (SDS) and Apache Kafka are designed for very different tasks and are often used together. Kafka is a message broker that delivers messages from publishers to subscribers. SDS is a streaming analytics engine that subscribes to message streams and applies "continuous queries" or rules to analyze, filter, transform or enrich the data, publishing derived results to configured destinations or subscribers. SDS comes with support for Kafka - both as an input and an output. So SDS can be configured to subscribe to Kafka topics, pulling new messages off the queue, and can also publish output/results to Kafka.
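To make that concrete, here's a minimal CCL sketch of the pattern - an input stream feeding a continuous query that filters and enriches the data. The stream, schema, and filter condition are hypothetical, and the Kafka adapter attachments are omitted:

// Hypothetical input stream (in practice fed by a Kafka input adapter)
CREATE INPUT STREAM RawTrades SCHEMA (symbol string, price float, qty integer);

// Continuous query: filter and enrich; SDS pushes the derived results
// to whatever subscribers/destinations are configured on BigTrades
CREATE OUTPUT STREAM BigTrades AS
    SELECT rt.symbol, rt.price, rt.price * rt.qty AS notional
    FROM RawTrades rt
    WHERE rt.qty > 10000;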

The output from SDS is "push". It is sent (pushed) to all subscribers/destinations in real time.

There are various mechanisms to deal with consumer bottlenecks (i.e. slow consumers):

  • Each subscriber has a buffer on each stream they are subscribing to - the buffer will smooth out small pacing issues but will fill up for a subscriber that simply can't keep up
  • Subscriptions can be configured to be "droppable" such that once the buffer fills up the connection will be dropped. Note that if the subscription was configured for "guaranteed delivery" the output messages will be queued until the subscriber re-connects. But this won't solve the problem of a perennially slow consumer
  • To deal with consumers that simply can't keep up, there are options for slowing down the data rate: you can conflate, sample, or intervalize the data - whatever fits the use case best (see the sketch after this list)
  • And of course finally, you can use a message broker (e.g. Kafka) between SDS and the consumer if you want
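To illustrate the fourth option above, here's a rough CCL sketch of intervalizing by aggregating over a jumping window, assuming the standard KEEP EVERY syntax - the stream, schema, and 5-second interval are all hypothetical:

// Hypothetical high-rate input stream
CREATE INPUT STREAM SensorReadings SCHEMA (sensorId string, reading float);

// Emit one aggregate row per sensor every 5 seconds (jumping window)
// instead of pushing every raw event to a slow subscriber
CREATE OUTPUT WINDOW SensorSummary
PRIMARY KEY DEDUCED
AS SELECT sr.sensorId,
          avg(sr.reading) AS avgReading,
          count(*) AS readingCount
   FROM SensorReadings sr KEEP EVERY 5 SECONDS
   GROUP BY sr.sensorId;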

Also, it's worth noting that the SDS output connection to HANA supports micro-batching with configurable parameters, so you can achieve very high insert rates from SDS into the HANA database by doing bulk inserts rather than single row inserts.
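As a rough sketch of what that output side can look like, using the hypothetical BigTrades stream from the earlier sketch - the adapter type and batching property names below are assumptions from memory, so check the HANA Output adapter reference for your SDS version before relying on them:

// Hypothetical HANA output attachment; property names are assumptions,
// not verified against the adapter documentation
ATTACH OUTPUT ADAPTER hana_out1 TYPE hana_out
TO BigTrades
PROPERTIES
    service = 'hanadb',             // database service defined for the target HANA instance
    table = 'TRADE_EVENTS',         // target HANA table
    bulkBatchSize = 10000,          // rows per bulk insert (assumed name)
    bulkInsertArraySize = 1000;     // micro-batch array size (assumed name)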

former_member258450
Participant

Very clear explanation. Thank you, Jeff!

Answers (1)

bintu__
Explorer

Hi jeff.wootton,

I am also facing a slow consumer issue while consuming messages via the JSON Kafka adapter. The slow consumption has resulted in lag on the source side.

I am using the following parameters in my Kafka input adapter:

ATTACH INPUT ADAPTER i_kafka_1 TYPE toolkit_kafka_json_input
TO i_biz_stream
PROPERTIES
    kafkaTopicPartition = 'EVENT1',
    kafkaOffset = 0,
    kafkaFromBeginning = FALSE,
    kafkaBootstrapServers = 'bk.0001.111',
    kafkaGroupId = 'hanasds_consumer',
    kafkaConsumerProperties = 'consum.properties',
    kafkaPollInterval = 1,
    jsonColumnMappingFile = '',
    jsonColumnMappingXml = '',
    jsonColsMappingList = 'topic,offset,value.content.eventId',
    jsonRootpath = '',
    additionalStreams = '',
    gdMaxRetryTimes = 1,
    jsonSecondDateFormat = '',
    jsonMsDateFormat = '',
    jsonTimeFormat = '',
    jsonBigdatetimeFormat = '',
    jsonCharsetName = '',
    enableGdMode = TRUE,
    maxPubPoolSize = 100000,
    useTransactions = FALSE;

In your answer above, the fourth point mentions controlling the data rate. Could you please guide me on how to configure the data rate? Is there any parameter I need to change on my end? Please suggest. Thanks in advance.

RobertWaywell
Product and Topic Expert

You need to post this as a new question.