Differences in Insert Speed Colum Store vs. Row St...

Former Member · ‎08-19-2016

Hi,

I'm currently looking into the insert speeds into column- and row store using CSV files.

To keep it short the main points are:

Using CTL files for maximized performance
CSV files are present on hana machine
Using as many threads as I have CPU cores (12)
Disabled logging and delta logging for pure comparison of column and row store without being limited by disk

So now I expect the insert into the row table to be faster than the insert into the column table or, considering the delta storage, of the same speed as the insert into column table. And that is true at first glance, but when looking at the CPU load I notice, that when loading the csv into the row table, that the load is high (as expected) but low (15%), when loading into the column table. Also the insert speed into the row table is more than twice as fast as the insert into the column table, which seems to be quite much to me.

So I tried to further optimize the insert into the column table to reach the row store performance. Therefore I partitioned the column table (12 partitions, 12 threads, 12 cores) and voila: the CPU load gets up to around 100% and the speed is even higher than with the row table (even with merging delta into main storage after load). So my question is:

Why does the column table have to be partitioned in order to get high performance and to leverage the full CPU performance while the row table (which obviously can not be partitioned) is not that limited? Also: why is the insert into the partitioned column store even faster than the insert into the row store, where no compression, merge or further optimization in needed?

thank you very much!

EDIT:

could it be, that the massive insert is skipping L1-Delta, which is row-based and enter L2-Delta directly? And since L2-Delta is column-based, the insert speed is much slower (when not writing in parallel), then when inserting into the row-table directly? So that could explain, why the inserts into the column table are that much slower, than the inserts into the row table, with no partitioning. right? but it does not explain, why when using partitioned column table the speed is higher than with the pure row table. 😕

Former Member · ‎08-21-2016

Hello Moritz,

The shortest/simplified answer as to why inserts on partitioned tables is faster is that each partition is being worked upon independently. So if you have 12 partitions then 12 cores/threads will be working in parallel hence your seeing your CPU consumption. You can see this behaviour when you run "top -H" and you'll see the running threads.

I don't want to try and paraphrase the contents of a good article by on the subject so I'll point you to the article directly which will help you better understand what's going on and some best practices for data loads into HDB.

KR,

Amerjit

Differences in Insert Speed Colum Store vs. Row Store

Accepted Solutions (0)

Answers (1)

Answers (1)

Iterating through JSONModel with multiple nested a...

Re: Connecting SAP CAP with SAP Cloud ALM

How to use CopyProvider on Table created by TypeSc...

How to execute/fire onPost when user press enter o...

Re: Connecting SAP CAP with SAP Cloud ALM