Smart Data Streaming - Push or Pull technology?

Oct 19, 2016 at 02:08 AM

I am trying to understand the similarities and differences between Apache Kafka and SAP HANA Smart Data Streaming. Kafka's consumers (the topic subscribers) use a "pull" model: they request messages from Kafka whenever they are ready for them, which avoids bottlenecks on the consumer side.

What methodology does SAP HANA Smart Data Streaming follow - push or pull? I read on a blog that it uses "push". Is this right? If so, how are consumer bottlenecks prevented in this scenario?


1 Answer

Best Answer
Jeff Wootton
Oct 20, 2016 at 05:50 PM

First, let me say that HANA Smart Data Streaming (SDS) and Apache Kafka are designed for very different tasks and are often used together. Kafka is a message broker that delivers messages from publishers to subscribers. SDS is a streaming analytics engine that subscribes to message streams and applies "continuous queries" or rules to analyze, filter, transform or enrich the data, publishing derived results to configured destinations or subscribers. SDS comes with support for Kafka - both as an input and an output. So SDS can be configured to subscribe to Kafka topics, pulling new messages off the queue, and can also publish output/results to Kafka.

The output from SDS is "push". It is sent (pushed) to all subscribers/destinations in real-time.
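To make the pull-in / push-out distinction concrete, here is a minimal Python sketch. It is not SDS or Kafka code: `run_pipeline`, `high_temperature` and the in-memory `deque` standing in for a topic are all made-up names for illustration. The input side asks the queue for the next message (pull), while the output side calls every subscriber as soon as a result exists (push).

```python
from collections import deque


def run_pipeline(source, query, subscribers):
    """Pull each message from `source`, apply `query`, push results to subscribers."""
    while source:
        message = source.popleft()   # pull: we ask the queue for the next message
        result = query(message)
        if result is None:           # the query filtered this message out
            continue
        for deliver in subscribers:  # push: every subscriber callback is invoked right away
            deliver(result)


def high_temperature(reading):
    """Hypothetical continuous query: keep readings above 30 and flag them."""
    if reading["temp"] <= 30:
        return None
    return {**reading, "alert": True}


# An in-memory deque stands in for a message queue (assumption, not a Kafka client).
source = deque([{"id": 1, "temp": 25}, {"id": 2, "temp": 35}, {"id": 3, "temp": 40}])
received = []
run_pipeline(source, high_temperature, [received.append])
```

Note that the subscriber has no say in when `deliver` is called, which is why a push design needs the slow-consumer handling discussed next.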

There are various mechanisms to deal with consumer bottlenecks (i.e. slow consumers):

  • Each subscriber has a buffer on each stream they are subscribing to - the buffer will smooth out small pacing issues but will fill up for a subscriber that simply can't keep up
  • Subscriptions can be configured to be "droppable" such that once the buffer fills up the connection will be dropped. Note that if the subscription was configured for "guaranteed delivery" the output messages will be queued until the subscriber re-connects. But this won't solve the problem of a perennially slow consumer
  • To deal with consumers that simply can't keep up, there are options for slowing down the data rate. You can conflate, sample or intervalize the data, etc. - whatever fits the use case best
  • And of course finally, you can use a message broker (e.g. Kafka) between SDS and the consumer if you want
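As a rough illustration of the first three mechanisms, the Python sketch below models a bounded per-subscriber buffer with a "droppable" policy, plus a simple conflation helper. The names `DroppableSubscription` and `conflate` are hypothetical and do not correspond to any SDS API; the real product exposes these behaviours through its own configuration.

```python
from collections import deque


class DroppableSubscription:
    """Bounded per-subscriber buffer that disconnects the subscriber when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.connected = True

    def push(self, message):
        """Called by the publisher; returns True if the message was buffered."""
        if not self.connected:
            return False
        if len(self.queue) >= self.capacity:
            # Buffer is full: the subscriber cannot keep up, so drop the connection.
            self.connected = False
            self.queue.clear()
            return False
        self.queue.append(message)
        return True

    def poll(self):
        """Called on the consumer side to take the next buffered message, if any."""
        return self.queue.popleft() if self.queue else None


def conflate(updates):
    """Keep only the latest value per key, lowering the rate seen by the consumer."""
    latest = {}
    for key, value in updates:
        latest[key] = value
    return latest


sub = DroppableSubscription(capacity=2)
results = [sub.push(m) for m in ("a", "b", "c")]  # third push overflows the buffer
conflated = conflate([("IBM", 1), ("SAP", 5), ("IBM", 3)])
```

The buffer absorbs short bursts; once it overflows, the subscription is dropped rather than letting one slow consumer hold back the publisher. Conflation trades completeness for rate: the consumer sees only the most recent value for each key.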

Also, it's worth noting that the SDS output connection to HANA supports micro-batching with configurable parameters, so you can achieve very high insert rates from SDS into the HANA database by doing bulk inserts rather than single row inserts.
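The idea behind micro-batching can be sketched in a few lines of Python. This is only an illustration of the technique, not the SDS HANA adapter: it uses the standard-library `sqlite3` module as a stand-in for the database, and `MicroBatchWriter`, `batch_size` and the `events` table are assumed names.

```python
import sqlite3


class MicroBatchWriter:
    """Collect rows and write them with one bulk insert per batch."""

    def __init__(self, conn, batch_size=3):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        self.flushes = 0

    def write(self, row):
        """Buffer a row; flush automatically once the batch is full."""
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Insert all buffered rows in a single executemany call and commit once."""
        if not self.pending:
            return
        self.conn.executemany(
            "INSERT INTO events (id, value) VALUES (?, ?)", self.pending
        )
        self.conn.commit()
        self.pending.clear()
        self.flushes += 1


# In-memory SQLite database used purely as a stand-in target (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value REAL)")

writer = MicroBatchWriter(conn, batch_size=3)
for i in range(7):
    writer.write((i, i * 1.5))
writer.flush()  # write the final partial batch

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Seven rows are written with three round trips instead of seven. A real adapter would typically also flush on a timer so that a partial batch does not wait indefinitely; that is left out here to keep the sketch short.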


Very clear explanation. Thank you, Jeff!
