10-19-2016 3:08 AM - edited 02-03-2024 7:53 PM
I was trying to understand the similarities and differences between Apache Kafka and SAP HANA Smart Data Streaming. Kafka's topic subscribers (consumers) use a "pull" methodology, requesting messages from Kafka's queue whenever they are ready; this avoids bottlenecks at the consumer's end.
What methodology does SAP Smart Data Streaming follow: push or pull? I read on a blog that it uses "push". Is this right? How are consumer bottlenecks prevented in that case?
First, let me say that HANA Smart Data Streaming (SDS) and Apache Kafka are designed for very different tasks and are often used together. Kafka is a message broker that delivers messages from publishers to subscribers. SDS is a streaming analytics engine that subscribes to message streams and applies "continuous queries" or rules to analyze, filter, transform or enrich the data, publishing derived results to configured destinations or subscribers. SDS comes with support for Kafka - both as an input and an output. So SDS can be configured to subscribe to Kafka topics, pulling new messages off the queue, and can also publish output/results to Kafka.
The output from SDS is "push". It is sent (pushed) to all subscribers/destinations in real time.
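The contrast between the two delivery models can be sketched in a few lines. This is an illustrative simulation only (the class and method names are hypothetical, not the real Kafka or SDS APIs): in the pull model the consumer controls the rate by deciding when and how much to poll, while in the push model every event is delivered to subscribers as soon as it is published.

```python
from collections import deque

class PullQueue:
    """Kafka-style broker buffer: the consumer asks for messages at its own pace."""
    def __init__(self):
        self._q = deque()

    def publish(self, msg):
        self._q.append(msg)  # the broker just buffers; no delivery happens here

    def poll(self, max_records=10):
        # The consumer controls the rate: at most max_records per call,
        # so a slow consumer simply polls less often and is never overrun.
        batch = []
        while self._q and len(batch) < max_records:
            batch.append(self._q.popleft())
        return batch

class PushStream:
    """SDS-style stream: each new event is delivered to subscribers immediately."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        for cb in self._subscribers:  # delivery happens at the producer's rate
            cb(event)

# Pull: 25 messages are buffered; the consumer takes 10 when it is ready.
q = PullQueue()
for i in range(25):
    q.publish(i)
first = q.poll(max_records=10)

# Push: subscribers receive each event as soon as it is published.
received = []
s = PushStream()
s.subscribe(received.append)
for i in range(3):
    s.publish(i)
```

Note that in the push model the subscriber callback runs at whatever rate the producer publishes, which is exactly why slow-consumer handling (discussed below) matters for SDS output.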
There are various mechanisms to deal with consumer bottlenecks (i.e. slow consumers):
Also, it's worth noting that the SDS output connection to HANA supports micro-batching with configurable parameters, so you can achieve very high insert rates from SDS into the HANA database by doing bulk inserts rather than single row inserts.
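The micro-batching idea mentioned above can be sketched as follows. This is a minimal illustration, not the actual SDS/HANA connector (the names here are hypothetical): rows are buffered and handed to a bulk-insert function once a batch-size threshold is reached, rather than issuing one insert per row.

```python
class MicroBatcher:
    """Buffer rows and emit them in bulk instead of one at a time."""
    def __init__(self, insert_fn, batch_size=100):
        self.insert_fn = insert_fn    # e.g. a function doing one bulk INSERT
        self.batch_size = batch_size
        self._buf = []

    def add(self, row):
        self._buf.append(row)
        if len(self._buf) >= self.batch_size:
            self.flush()

    def flush(self):
        # Emit whatever is buffered as a single bulk operation.
        if self._buf:
            self.insert_fn(self._buf)
            self._buf = []

# Feed 250 rows through a batcher that flushes every 100 rows:
batches = []
b = MicroBatcher(batches.append, batch_size=100)
for i in range(250):
    b.add({"id": i})
b.flush()  # flush the 50-row remainder
# Result: three bulk inserts (100, 100, 50) instead of 250 single-row inserts.
```

A real connector would typically also flush on a timer so that a trickle of rows is not held indefinitely; that is the kind of configurable parameter the comment above refers to.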
Hi jeff.wootton,
I am also facing a slow-consumer issue while consuming messages via the JSON Kafka adapter; the slow consumption has resulted in lag on the source side.
I am using the following parameters in my Kafka input adapter:
ATTACH INPUT ADAPTER i_kafka_1 TYPE toolkit_kafka_json_input
TO i_biz_stream
PROPERTIES
kafkaTopicPartition = 'EVENT1' ,
kafkaOffset = 0 ,
kafkaFromBeginning = FALSE ,
kafkaBootstrapServers = 'bk.0001.111' ,
kafkaGroupId = 'hanasds_consumer' ,
kafkaConsumerProperties = 'consum.properties' ,
kafkaPollInterval = 1 ,
jsonColumnMappingFile = '' ,
jsonColumnMappingXml = '' ,
jsonColsMappingList = 'topic,offset,value.content.eventId',
jsonRootpath = '' ,
additionalStreams = '' ,
gdMaxRetryTimes = 1 ,
jsonSecondDateFormat = '' ,
jsonMsDateFormat = '' ,
jsonTimeFormat = '' ,
jsonBigdatetimeFormat = '' ,
jsonCharsetName = '' ,
enableGdMode = TRUE ,
maxPubPoolSize = 100000 ,
useTransactions = FALSE ;
In the above comment, in the 4th point, you mentioned controlling the data rate. Could you please guide me on how to configure the data rate? Is there any parameter I need to change on my end? Please suggest. Thanks in advance.
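For reference, rate-related settings for a Kafka consumer are normally standard Kafka client properties, which in the adapter configuration above would go in the file referenced by kafkaConsumerProperties (consum.properties). The properties below are standard Apache Kafka consumer settings; whether the SDS JSON Kafka adapter passes them through to its embedded consumer should be verified against the adapter documentation.

```properties
# Standard Kafka consumer settings that influence consumption rate
# (verify pass-through support in the SDS adapter documentation):
max.poll.records=500                 # cap on records returned by each poll
max.partition.fetch.bytes=1048576    # cap on bytes fetched per partition
fetch.min.bytes=1                    # broker replies as soon as any data exists
fetch.max.wait.ms=500                # max wait before replying to a fetch
```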