SDS Architecture


I have been reading the SDS whitepaper and the developer guide of SDS. I have a basic understanding of how projects can be developed on SDS.

I am interested in understanding the underlying architecture of SDS - how SDS actually works, and what makes it possible to handle multiple incoming data streams at such high velocity.

Something like the SanssouciDB paper by Hasso Plattner, which gives insight into how an in-memory DB can use a delta buffer, cold storage, etc. to improve performance.

Could somebody please guide me to the correct document that gives more insight into the SDS architecture?

Thanks

Accepted Solutions (1)

JWootton
Advisor

Hi Rusheel,

I'm afraid there isn't a published architecture paper on SDS - at least not that I'm aware of (if anyone else knows of one, speak up!).

But I'm happy to answer any specific questions.  And I can describe the architecture at a high level. While this no doubt stops short of the level of detail you were looking for,  it's a start.

Basic components:

1. Projects and the SDS "server":  each CCL project runs as a separate multi-threaded Linux process. When you deploy a project on the SDS cluster, the cluster manager starts the project on one of the available nodes. Once the project is running, since it is a separate process, it runs largely independently of all other components.

2. The SDS cluster manager normally runs as a peer network of two or more cluster managers to provide resiliency.  The cluster managers collectively provide services such as starting and stopping projects,  monitoring project status and restarting projects as necessary, providing URI name service and providing security services.

3. The WSP and SWS (which provide the REST and WebSocket interfaces to the cluster and projects) run as separate processes under the management of the cluster managers.

4. Most adapters run as separate processes that connect to projects - the exceptions are the HANA, IQ and ASE output adapters, which run in-process as part of a project.

Connectivity:

- When an adapter or application connects to a project, it first contacts the cluster manager with the URI of the stream/window it wants to connect to. The URI is essentially workspace/project/stream. It also provides credentials. If the cluster manager "approves" the connection, then it is established.

- but here's an important thing to realize:  the dataflow is direct between the client and the project - it does not go through the cluster managers.
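
To make that concrete: the client asks the cluster manager for something like the following (these names are made up):

    default/myproject/InputTrades

where "default" is the workspace, "myproject" is the project, and "InputTrades" is the stream or window to publish to or subscribe from. The cluster manager resolves that URI to wherever the project is actually running, and the client then connects to the project directly.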

Projects:

- Projects themselves are multi-threaded.  Each stream/window runs in a separate thread and all data is held in memory.  Even windows that are backed by logstores (so that they can recover their state after a restart) still hold all their data in memory.

- The multi-threading allows for significant parallel processing.  When a stream/window emits a record, it is queued for processing by any/all downstream streams/windows.  And rather than the record being copied, a pointer is passed, for added efficiency.

- So processing within the project is entirely event-driven, using a dataflow model, with each thread pulling records off its queue, processing the record, and then putting any output records onto the output queues.
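
To picture the dataflow graph, here's a minimal CCL sketch (names and schema are made up, and this is from memory, so check the CCL reference for exact syntax): the input stream feeds two derived elements, each of which runs in its own thread with its own input queue.

    // Input stream: the entry point for incoming events.
    CREATE INPUT STREAM Ticks
        SCHEMA (Symbol string, Price float);

    // Derived stream: runs in its own thread; a pointer to each record
    // emitted by Ticks is placed on this element's input queue.
    CREATE OUTPUT STREAM BigTicks
        AS SELECT * FROM Ticks
           WHERE Ticks.Price > 100.0;

    // Derived window: also runs in its own thread, fed the same records
    // in parallel. The KEEP clause creates an unnamed window over the
    // stream so it can be aggregated.
    CREATE OUTPUT WINDOW AvgPrice
        PRIMARY KEY DEDUCED
        AS SELECT Ticks.Symbol, avg(Ticks.Price) AS AvgPrice
           FROM Ticks KEEP 5 MINUTES
           GROUP BY Ticks.Symbol;

When Ticks emits a record, both BigTicks and AvgPrice pick it up from their own queues and process it concurrently.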


Thanks a lot Jeff. This helps.
I have a few more doubts that I would like to get clarified:

  1. As per my understanding, the cluster manager is like the 'master' and the projects act as slaves? Also, do these projects run independently of the cluster manager once started, or can the manager still exercise some level of control over them? If yes, could you please give a brief idea of those controls?
  2. In the case of multiple nodes, does each project run on a single node, or are the project and its data replicated across multiple nodes at runtime? Although I doubt projects themselves are distributed, because you mentioned that there is direct data flow between projects and clients.
  3. To continue the previous point, are SDS clusters a distributed shared-nothing architecture or a shared architecture? That is, in the case of multiple nodes, do the cluster managers work together through some kind of 'metadata exchange' or by reading from some common memory location?
  4. In case of failure, the cluster manager should restart the project. How are the managers notified about this? Do they periodically check the health of projects via something like 'pings', is a constant connection maintained between manager and project, or is there some other mechanism?
  5. As per what I read in the developer guide of SDS, each window/stream works as a separate thread and there can be multiple subscribers to each thread. But this thread can get blocked if any one of the subscribers' queues gets full. Is there any specific reason to implement it like this? Because then one subscriber is capable of denying service to the rest.
  6. What would happen when there is an overflow of data? That is, assume that a window is created with the 'KEEP' option for 24 hours, but the data inflow by the end of 23 hours exceeds the memory available to the SDS cluster. In such a scenario, what actions would be taken?
  7. To extend the previous point, does the increase in memory consumption due to the 'KEEP' option create any negative impact on performance? What I understand is that when events per second increase, performance degrades gracefully. But what happens when events/second are constant, but memory gets filled due to the 'KEEP' modifier?

Sorry for so many questions. I would highly appreciate it if you could please guide me in the correct direction.

Thanks a ton!

Best Regards

Rusheel

JWootton
Advisor

Good questions....

1. As per my understanding, the cluster manager is like the 'master' and the projects act as slaves? Also, do these projects run independently of the cluster manager once started, or can the manager still exercise some level of control over them? If yes, could you please give a brief idea of those controls?

Answer: Yes, that's sort of it.  But a caution in thinking of the projects as "slaves":  once they are started, they are completely independent Linux processes that will keep running, even if the SDS cluster went away.  In fact, if the cluster died, it wouldn't even affect existing client connections to the project (though new ones could not be established, because new connection requests go through the cluster manager).  So really the only control the cluster manager has is the ability to start and stop projects, and to "approve" incoming connection requests.

2. In the case of multiple nodes, does each project run on a single node, or are the project and its data replicated across multiple nodes at runtime? Although I doubt projects themselves are distributed, because you mentioned that there is direct data flow between projects and clients.

Answer: a single project instance runs on a single node and all of its data is held in memory on that single node.  You can start multiple instances of a project (in different workspaces), and thus have different instances on different nodes, but the cluster treats those as separate projects and not separate instances of the same project.  The one exception to this is if you start a project in "HA" mode, in which the cluster manager starts 2 instances of the project - a primary and a secondary.

3. To continue the previous point, are SDS clusters a distributed shared-nothing architecture or a shared architecture? That is, in the case of multiple nodes, do the cluster managers work together through some kind of 'metadata exchange' or by reading from some common memory location?

Answer: in a multi-node cluster, the cluster managers (one running on each node) communicate in a peer network, and stay synchronized using messaging (not shared memory).  The projects aren't aware of each other and are essentially "shared nothing".  The exception to this is a project started in "HA" mode, where there are 2 instances with a private communication channel the primary uses to keep the secondary in sync.

4. In case of failure, the cluster manager should restart the project. How are the managers notified about this? Do they periodically check the health of projects via something like 'pings', is a constant connection maintained between manager and project, or is there some other mechanism?

Answer:  The project instances send heartbeats to the cluster manager.  If/when the cluster manager detects a missing heartbeat it will restart the project. The heartbeat interval is configurable (though you wouldn't normally need to change it).

5. As per what I read in the developer guide of SDS, each window/stream works as a separate thread and there can be multiple subscribers to each thread. But this thread can get blocked if any one of the subscribers' queues gets full. Is there any specific reason to implement it like this? Because then one subscriber is capable of denying service to the rest.

Answer: you are correct - if a downstream subscriber isn't keeping up, then once the queue fills, the publisher will be blocked until the subscriber catches up.  In an SDS project, as the queues fill up from the bottleneck back to the input streams/windows, this can result in the project not accepting new input until the queues start to empty.  There are ways to prevent this from happening, however.  For internal bottlenecks within the SDS project, you can use the CCL partitioning feature to create multiple parallel partitions (threads) to eliminate the bottleneck - see the sketch below.  For slow subscribers, the SDK provides options on subscriptions to make them "droppable" or "lossy".  As for the reason this design was implemented: as with all designs there are tradeoffs, and this one was felt to offer the best balance of low latency, high throughput, resiliency and versatility.
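
Roughly, partitioning a derived element looks like this - a sketch from memory, so check the CCL reference for the exact PARTITION clause syntax in your version (names are made up, and Trades is assumed to be an input window defined elsewhere):

    // Fan the aggregation out across 4 parallel instances (threads),
    // hashed on Symbol, so a single thread is no longer the bottleneck.
    // NOTE: approximate syntax - verify against the CCL reference.
    CREATE OUTPUT WINDOW AvgBySymbol
        PRIMARY KEY DEDUCED
        AS SELECT Trades.Symbol, avg(Trades.Price) AS AvgPrice
           FROM Trades
           GROUP BY Trades.Symbol
        PARTITION BY Trades HASH (Symbol) PARTITIONS 4;

Each partition gets its own thread and input queue, so a record only waits behind other records that hash to the same partition.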

6. What would happen when there is an overflow of data? That is, assume that a window is created with the 'KEEP' option for 24 hours, but the data inflow by the end of 23 hours exceeds the memory available to the SDS cluster. In such a scenario, what actions would be taken?

Answer: if a project runs out of memory it will shut down (gracefully, hopefully - set project properties to ensure there is enough memory held in reserve for a graceful shutdown).  Thus you do need to take care that your project doesn't create windows that grow unbounded, or windows that have a KEEP policy that is incompatible with the amount of available memory.  The documentation has more info on monitoring and managing memory usage.
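
For reference, the retention policy you're describing is declared with a KEEP clause.  A minimal sketch (schema and names are made up):

    // Input window retaining 24 hours of events. Rows older than 24
    // hours are purged automatically, but everything retained is held
    // in memory, so the window's footprint tracks the input rate.
    CREATE INPUT WINDOW Trades
        SCHEMA (TradeId long, Symbol string, Price float, TradeTime msdate)
        PRIMARY KEY (TradeId)
        KEEP 24 HOURS;

If the arrival rate is high enough that 24 hours of rows won't fit in the memory you've provisioned, you hit exactly the out-of-memory scenario above.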

7. To extend the previous point, does the increase in memory consumption due to the 'KEEP' option create any negative impact on performance? What I understand is that when events per second increase, performance degrades gracefully. But what happens when events/second are constant, but memory gets filled due to the 'KEEP' modifier?

Answer: larger windows can affect performance - it depends on what you are using the window for.  Simply having a large window doesn't affect performance, but doing a lookup on a large window, or aggregating over a large window, will be slower than doing the same over a smaller window.

When you say that as events per second increase, performance degrades gracefully, that's not quite true: as long as the system is sized properly, such that it's not overloaded, "performance" in terms of latency should remain relatively constant regardless of the message rate.  Latency will grow if the system's capacity is strained such that queues start to build.

As for how latency is affected if the message rate is constant but windows grow: again, that comes down to how you are using the window and whether your window operations are "size sensitive" (the best example being an operation that iterates over all rows in a window, which will obviously suffer the most when windows get large).  You might want to check out the Sizing Guide for more insight.
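
To make "size sensitive" concrete, here's a sketch (made-up names, reusing the hypothetical Trades window from the sketch above): each incoming order does a lookup against the Trades window, and probing a window holding 24 hours of rows is generally more expensive than probing a small one, whereas a simple pass-through filter wouldn't care how big the window is.

    // Hypothetical input stream of orders.
    CREATE INPUT STREAM Orders
        SCHEMA (OrderId long, Symbol string);

    // Stream-window join: each order probes the Trades window for
    // matching Symbol rows. The per-event cost of this lookup depends
    // on the size of the window being probed.
    CREATE OUTPUT STREAM EnrichedOrders
        AS SELECT Orders.OrderId, Orders.Symbol, Trades.Price
           FROM Orders JOIN Trades
           ON Orders.Symbol = Trades.Symbol;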


Thanks again. You answered all of my questions, right to the point.


Just a final question, although this is not directly related to SDS architecture.

Are there any plans to make SDS distributed? Something like GFS, where the cluster manager could start the project on another node whenever the size of a window grows beyond some bound, with a combined result finally being sent to the subscribers. Or starting another project instance when the incoming event rate increases beyond what a single project can manage.
Would that even be practically possible while keeping latency within respectable limits, given that inter-node communication can be expensive?

JWootton
Advisor

Yes - this is on the roadmap.
