Solved: SAP HANA Partitioning

matma24 · ‎04-18-2021

Hello guys,

I've been working with HANA for a little bit of time right now but I'm not sure about some of the quirks/ ways that HANA does stuff so I'd like to ask a questions (provide answer on how I see/ understand it) and get some feedback if my thinking is correctly or not.

1. How does insert works and when is locking happening?

As far as I know in traditional DB locks are put on a page - which simply means that if im trying to update two records (in to different sessions/ transactions) that are being stored in the same page then second transaction will wait for lock to being freed up (which might cause some delay), but in terms of HANA there is an additional data structure - Delta Storage.

Delta storage is basically the place where all inserts/ updates are stored for the time being and when this Delta storage grow up to some space content of Delta storage (optimized for Write operations) is being moved into the Main storage (optimized for Reads) and persisted on the disk.

Locks are happening only when we are trying to update the same ID record (one transaction getting access to that place in memory, while second one waiting for it to be freed up) and this is basically the only moment when locking is happening?

What are object locks? I've found that we have a lot of locks actually, most of them are Row Locks which seems to be really straightforward what is happening there but we also have some "Object Locks" on procedures and functions and I'm not really sure what's happening there.

2. How does partitioning a table influences performance?

For the time being I would like to consider Hash partitioning.

We are having a table that is partitioned by a Hash. Each partition act as a "separate table" which basically means that each partition got their own Delta Storage and Main Storage - this can improve inserts into the tables because theoretically if content of one Delta Storage (in partition 1) is being moved to Main Storage we still are able to write data to other partitions because they got their own Delta Storage. If we don't partition the table then everything lands in the same Delta Storage so performance of large inserts/updates is very heavily influenced by this merge operation because if content of Delta is being moved to Main Storage fot the time being all Write Operations are "halted"?

I've tested it - created two tables with the same schema, one with no partitions and one with 10partitions and inserted 500million rows to each of them (5 sessions 100million rows each). No surprise inserting to partitioned table was faster - around 4.2x faster to be precise.

However partitions come with a cost - they can heavily influence performance of Reads if we are not able to provide predicate in WHERE clause that points to columns used in hash. This simply mean that if I have a table with ID, FIRST_NAME and LAST_NAME where ID is primary key and is used in hash - then when I'm putting this ID in WHERE or JOIN everything is fine because HANA knows in which partition requested data is but when I'm doing search by LAST_NAME then I need to visit all partitions separately and union the result instead of going into one when there are no partitions.

3. Are triggers a good idea?

In our processing after two of the tables are being changed - there are inserts/updates going on there we are running a few triggers (between 7 and 12 - depends on the operation), this seems to impact performance by really a lot which seems to be clear because each operation (update/insert) is really 7+1 - 12+1 operations that need to happen in the background so it's "holding" the transaction to finish and that's why there is such a big delay before running another statement on the "main" tables?

4. Insert performance during big data loads.

Let's assume that I'm not able to partition my table so I'm not able to improve performance this way, the "simplest" thing I can do to improve inserts efficiency on non partitioned table is increasing Delta Storage space? By theory I will be able to have more data there in the same time = Merge Operation won't happen as that often and from documentation I can see this operation is the most time consuming one.

Would be really grateful for any hints/ tips/ confirmation/ critique and clarification.

Best regards,

Mat I

lbreddemann · ‎04-18-2021

Oh, gee, just one little question there, right? 🙂

Your questions cover a wide range of DBMS implementation topics.
In order to get a good understanding of how and possibly why HANA implements things in one way or another, I recommend getting familiar with the fundamentals. In "How to know all these things" I linked a few sources, that I found very helpful. I especially want to point out the OpenHPI course "In-Memory Data Management" as well as "The Red Book", both of which are available for free.

Now, let's see what I can add to your questions.

1. How does insert works and when is locking happening?

Locking is a mechanism to make sure that access to shared resources (records, DB objects like tables or views, storage structures, memory area, etc.) happens in an orderly fashion. That is to say: we want to be able to know in advance, what state the resource will be in after a series of actions. This is where the term "transaction protocol" comes into play as the playbook for how to share "stuff" in the system among independent processes/actors/sessions.

Locks are used in this context to indicate which changes are possible by which party at what point in time.

This in place, let's see what resources may need coordination in HANA:

- records? Sure, we want ACID transaction support for our SQL transactions, so records accesses need to be coordinated. These have locks.

- Columns? Tables? Views? Yes, again, as we don't want to allow things like changing the column definition while another session is still using the data. So, here are our object locks.

- Storage structures? You mentioned Delta Storage which is one of the HANA column store data structures that is used to capture changed data. This structure is a lock-less append-only data structure for the user - which is the SQL session running an UPDATE/INSERT/DELETE, etc. This means, for the action of adding a change record into this delta storage, no additional lock needs to be acquired.
However, this data structure stops being available for sessions, when a DELTA MERGE occurs. In this scenario, a second delta store structure gets used going forward and the old one is only read for the delta merge process itself.

There are plenty of other locking mechanisms in place, e.g. mutexes or semaphores/critical regions mostly for even more HANA internal processes, like memory management.

2. How does partitioning a table influences performance?

Wow, what a simple question ... not really.

One way to answer would of course be "a lot" and leave it at that.

The main driving factors for the performance of data change operations in column store tables are

- available CPUs

- available table partitions

- commit frequency

- delta merge frequency

Savepoints that can/will occur during the data change operations can also have a big impact, but for the sake of keeping this simpler, let's go with these four factors.

As this question is not new, there does exist a blog post about this: Merging right. Some tests with data loading in HANA from Down Under.

The fact that there is a single delta store for several session is less critical (the sessions are not waiting for a "delta store lock") than that the memory structures for this delta store "belong" to a single NUMA node and that the different threads trying to access it face scheduling by the operating system.

When a delta merge happens, as explained in my blog post, the data changing can continue, since there is a secondary delta log structure that is put in place for this reason. With the caveat that only new change transactions get assigned the new delta log. Committing data changes in between, can help with performance and resource consumption.

Read cost of partitions

You summed up one of the potential drawbacks of splitting tables into partitions well. If I have four files in which I store my information, and no dictionary for the information, then I have to go through every file. That's right.

This is, partly, what makes designing a partitioning scheme complex. Write patterns in DBMS are typically much better understood than read-pattern. So, commonly, finding the best compromise between all writers and readers becomes an iterative process.

3. Are triggers a good idea?

In my opinion: No!

Typically, triggers are thought of as some form of event-based business logic driver. But the data in the tables and the events the triggers react to are far too low level to represent actual business events well.

But not only are triggers a poor development artifact (how would you test and debug a series of triggers in a production environment with thousands of active sessions?), they tend to be very non-obvious.

When interacting with the database, there is nothing telling you which triggers you, well, trigger when performing any activity. They are like black-magic-f**ckery happening in the background with no obvious indication of their involvement. You update table A and tables B, C, D, and Z get updated/deleted/loaded into memory or copied. Why? You cannot know unless you explicitly look for triggers on the tables you just touched.

How would you do that? The standard HANA tools don't easily show triggers and in tools like DBeaver you have to go and open the table definition to learn if there are any triggers involved.

Are there certain scenarios that can reasonably be built with triggers? I'd say, logging data changes or versioning data cover the range of scenarios fairly well.

Again - this is my opinion; If you like triggers, fine with me.

4. Insert performance during big data loads.

Nope, "increasing delta storage" is not an option you have. HANA manages the delta store for each column/table automatically. What you can do is to influence under which conditions a delta merge occurs. Details on that are in the already mentioned blog post "Merging right".

There you go, now you know.

Lars

SAP HANA Partitioning

Accepted Solutions (1)

Accepted Solutions (1)

Answers (0)

Re: SSO using OKTA - launches FLP twice

Re: SSO using OKTA - launches FLP twice

SSO using OKTA - launches FLP twice

Re: SAP API Management and Target Endpoint OAuth2....

Recommended approach for Fiori Adapt UI in S/4H On...