
SDS Architecture

I have been reading the SDS whitepaper and the SDS developer guide, and I have a basic understanding of how projects can be developed on SDS.

I am interested in understanding the underlying architecture of SDS: how does SDS actually work internally, such that it can handle multiple incoming data streams at such high velocity?

I am looking for something like the SanssouciDB paper by Hasso Plattner, which gives insight into how an in-memory database can use a delta buffer, cold storage, etc. to improve performance.

Could somebody please guide me to the correct document that gives more insight into the SDS architecture?

Thanks

1 Answer

  • Best Answer
    Posted on Jun 27, 2016 at 04:04 PM

    Hi Rusheel,

    I'm afraid there isn't a published architecture paper on SDS - at least not that I'm aware of (if anyone else knows of one, speak up!).

    But I'm happy to answer any specific questions. And I can describe the architecture at a high level. While this no doubt stops short of the level of detail you were looking for, it's a start.

    Basic components:

    1. Projects and the SDS "server": each CCL project runs as a separate multi-threaded Linux process. When you deploy a project on the SDS cluster, the cluster manager starts the project on one of the available nodes. Once the project is running, since it is a separate process, it operates largely independently of the other components (a sketch of this process model follows the list).

    2. The SDS cluster manager normally runs as a peer network of two or more cluster managers to provide resiliency. The cluster managers collectively provide services such as starting and stopping projects, monitoring project status and restarting projects as necessary, URI name resolution, and security services.

    3. The WSP and SWS (which provide the REST and WebSocket interfaces to the cluster and to projects) run as separate processes under the management of the cluster managers.

    4. Most adapters run as separate processes that connect to projects; the exceptions are the HANA, IQ, and ASE output adapters, which run in-process as part of a project.
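
    To make the process model in points 1 and 2 concrete, here's a minimal Python sketch of a supervisor that runs each deployed project as its own OS process and restarts it if it dies. This is purely illustrative (SDS is not implemented this way, and the project URI and launch command below are made up), but it captures the idea that projects are independent processes kept alive by the cluster managers:

    ```python
    import subprocess
    import time

    # Hypothetical project launch commands; stand-ins for real project binaries.
    PROJECTS = {
        "default/my_project": ["sleep", "30"],
    }

    def supervise(projects):
        """Run each project as its own OS process; restart any that exit."""
        procs = {uri: subprocess.Popen(cmd) for uri, cmd in projects.items()}
        while True:
            for uri, proc in procs.items():
                if proc.poll() is not None:  # process has exited
                    print(f"project {uri} stopped; restarting")
                    procs[uri] = subprocess.Popen(projects[uri])
            time.sleep(1)

    if __name__ == "__main__":
        supervise(PROJECTS)
    ```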

    Connectivity:

    - When an adapter or application connects to a project, it first contacts the cluster manager with the URI of the stream/window it wants to connect to (the URI is essentially workspace/project/stream), along with its credentials. If the cluster manager "approves" the connection, it is then established.

    - But here's an important thing to realize: the dataflow is direct between the client and the project; it does not go through the cluster managers (a simplified sketch of this handshake follows below).
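
    As a rough illustration of that handshake, here's a Python sketch. The lookup endpoint, response format, host name, and port are all hypothetical, not the real cluster manager API; the point is only the two-step pattern (authorize and resolve via the cluster manager, then stream directly to the project):

    ```python
    import base64
    import json
    import socket
    import urllib.request

    def connect_to_stream(cluster_mgr, uri, user, password):
        """Resolve a stream URI via the cluster manager, then connect directly."""
        # Step 1: ask a cluster manager where the project behind this URI is
        # running, presenting credentials (hypothetical lookup endpoint).
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        req = urllib.request.Request(
            f"http://{cluster_mgr}/lookup?uri={uri}",
            headers={"Authorization": f"Basic {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            endpoint = json.load(resp)  # e.g. {"host": "node2", "port": 41001}

        # Step 2: open the dataflow connection straight to the project process;
        # from here on, the cluster manager is no longer in the path.
        return socket.create_connection((endpoint["host"], endpoint["port"]))

    # e.g. sock = connect_to_stream("cmgr:9090", "ws1/proj1/stream1", "user", "pw")
    ```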

    Projects:

    - Projects themselves are multi-threaded. Each stream/window runs in a separate thread, and all data is held in memory. Even windows that are backed by logstores, so that they can recover state after a restart, still hold all their data in memory.

    - The multi-threading allows for significant parallel processing. When a stream/window emits a record, it is queued for processing by any/all downstream streams/windows. Rather than being copied, a pointer to the record is passed, for added efficiency.

    - So processing within the project is entirely event-driven, using a dataflow model, with each thread pulling records off its input queue, processing them, and putting any output records onto its output queues (see the sketch below).
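
    To illustrate this event-driven dataflow model, here's a toy Python sketch (again, just an illustration of the concept, not SDS internals): each node runs in its own thread, pulls records off its input queue, and hands references, not copies, to the queues of its downstream nodes:

    ```python
    import queue
    import threading
    import time

    class Node(threading.Thread):
        """A toy stream/window: one thread, one input queue, fan-out on output."""

        def __init__(self, name, transform):
            super().__init__(daemon=True)
            self.name = name
            self.transform = transform   # per-record processing function
            self.inbox = queue.Queue()   # this node's input queue
            self.downstream = []         # input queues of subscribed nodes

        def run(self):
            while True:
                record = self.inbox.get()      # block until a record arrives
                out = self.transform(record)
                if out is not None:
                    for q in self.downstream:
                        q.put(out)             # pass a reference, not a copy

    # Wire up a two-stage flow: filter -> sink.
    filt = Node("filter", lambda r: r if r["price"] > 100 else None)
    sink = Node("sink", lambda r: print("output:", r))
    filt.downstream.append(sink.inbox)
    filt.start()
    sink.start()

    # Inject two input events; only the second survives the filter.
    filt.inbox.put({"price": 50})
    filt.inbox.put({"price": 150})
    time.sleep(0.5)  # give the daemon threads time to drain the queues
    ```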
