Oct 18, 2014 at 12:40 AM

HANA Merge and Optimize Compression process



I'd love to know if anyone has any insight into how MergeDog works. The best article I can find is 2 years old:…

What I understand is that when you load, you load into the delta store, which is columnar but isn't sorted or compressed, so inserts are fast. This carries a read-performance penalty, so you periodically merge the delta into the main store. Easy so far.
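To make that mental model concrete, here is a minimal Python sketch of a single column with a sorted main dictionary and an append-only delta. It is a toy under my own assumptions, not how HANA is implemented: the class `Column` and its method names are hypothetical, and the hash map standing in for the delta's lookup structure is just a convenient substitute.

```python
from bisect import bisect_left


class Column:
    """Toy single-column store: sorted main dictionary plus append-only delta."""

    def __init__(self):
        self.main_dict = []   # sorted distinct values
        self.main_ids = []    # one value ID per row, indexing into main_dict
        self.delta_dict = []  # distinct values in arrival order (unsorted)
        self.delta_pos = {}   # value -> position in delta_dict
        self.delta_ids = []   # one value ID per row, indexing into delta_dict

    def insert(self, value):
        # Cheap write: append to the delta, never re-sort or recompress.
        if value not in self.delta_pos:
            self.delta_pos[value] = len(self.delta_dict)
            self.delta_dict.append(value)
        self.delta_ids.append(self.delta_pos[value])

    def count_equal(self, value):
        # Reads must consult both stores until the next merge.
        hits = 0
        i = bisect_left(self.main_dict, value)  # binary search works: main is sorted
        if i < len(self.main_dict) and self.main_dict[i] == value:
            hits += self.main_ids.count(i)
        j = self.delta_pos.get(value)  # delta is unsorted, so it needs a side lookup
        if j is not None:
            hits += self.delta_ids.count(j)
        return hits

    def merge(self):
        # Build a brand-new sorted dictionary and re-encode every row against it.
        rows = [self.main_dict[i] for i in self.main_ids]
        rows += [self.delta_dict[i] for i in self.delta_ids]
        new_dict = sorted(set(rows))
        pos = {v: i for i, v in enumerate(new_dict)}
        self.main_dict = new_dict
        self.main_ids = [pos[v] for v in rows]
        self.delta_dict, self.delta_pos, self.delta_ids = [], {}, []
```

The point of the sketch is that `merge` rewrites the whole main store for the column, which is why merging big tables is expensive, while `insert` only ever appends.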

There is a token process, defaulting to 2 tokens per table (parameter token_per_table). You can force a merge by using:


A forced merge is supposed to use all available resources, at the expense of system performance. In my system, it doesn't do this. Instead it uses just 3 processes for the pre-merge check (which presumably evaluates which partitions/tables need merging) and then just one process for the merge itself. I have big tables, so the merge takes forever.

Now, some while after loading the tables, when the system is quiet, MergeDog wakes up and scans my tables again. It then goes and compresses the partitions using thread method "optimize compression". It is possible to force an optimize compression evaluation using:


I guessed that syntax; it isn't in any reference guide I could find. But it only triggers an evaluation, and it won't run if you have high system load anyhow.

So does anyone understand how this thing works, how to force an optimize compression, and how to get it to use more cores and finish faster? And whilst we're there... what does optimize compression actually do? Does it improve query performance in most cases, and does it generally improve compression? Presumably this depends on the data in the table, and on whether the change in entropy means a different compression technique would make a difference? And why is it needed as a separate step? HANA builds a new main store during every merge anyhow, so surely it could recompress using a different algorithm at that point?

My guess is that it reads the statistics of the table, chooses a compression algorithm for the dictionary and attribute vector (run-length, prefix etc.), and then recompresses the table using the most appropriate technique.
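A rough Python sketch of that guess follows. It estimates the size of one attribute vector (the per-row value IDs) under three toy encodings and picks the smallest. Everything here is an assumption for illustration: the function names `estimate_sizes` and `choose_compression` are hypothetical, the cost formulas are my own simplifications, and HANA's real selection logic is not described anywhere I could find.

```python
from collections import Counter
from itertools import groupby


def bits_per_id(distinct):
    # Bits needed to address `distinct` dictionary entries (at least 1).
    return max(1, (distinct - 1).bit_length())


def estimate_sizes(value_ids):
    """Estimated size in bits of one attribute vector under three toy encodings."""
    n = len(value_ids)
    id_bits = bits_per_id(len(set(value_ids)))
    pos_bits = max(1, (n - 1).bit_length()) if n else 1  # bits for a row position
    runs = sum(1 for _ in groupby(value_ids))            # consecutive equal IDs
    most_common = Counter(value_ids).most_common(1)[0][1] if n else 0
    return {
        "default": n * id_bits,                     # plain bit-packed IDs
        "run_length": runs * (id_bits + pos_bits),  # (value ID, start row) per run
        "sparse": (n - most_common) * id_bits + n,  # drop top ID, keep a row bitmap
    }


def choose_compression(value_ids):
    # Pick whichever encoding the toy cost model says is smallest.
    sizes = estimate_sizes(value_ids)
    best = min(sizes, key=sizes.get)
    return best, sizes
```

For example, `choose_compression([0] * 8 + [1] * 8)` picks `"run_length"`, while the same IDs interleaved as `[0, 1] * 8` fall back to `"default"`. The winner depends on row order as well as value distribution, which would be one plausible reason for evaluating compression as a separate, occasional step rather than in every merge.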

This is all incredibly clever and in 99% of cases it means you never need to touch the system, it is self-tuning and requires no maintenance. But there are extreme circumstances (like the one I'm in) where I really want to control this process!

I'm guessing about the only person who can answer this is @Lars Breddemann, but I would be fascinated to hear from anyone who understands this process!