Hello guys,
I've been working with HANA for a little bit of time right now but I'm not sure about some of the quirks/ ways that HANA does stuff so I'd like to ask a questions (provide answer on how I see/ understand it) and get some feedback if my thinking is correctly or not.
1. How does insert works and when is locking happening?
As far as I know in traditional DB locks are put on a page - which simply means that if im trying to update two records (in to different sessions/ transactions) that are being stored in the same page then second transaction will wait for lock to being freed up (which might cause some delay), but in terms of HANA there is an additional data structure - Delta Storage.
Delta storage is basically the place where all inserts/ updates are stored for the time being and when this Delta storage grow up to some space content of Delta storage (optimized for Write operations) is being moved into the Main storage (optimized for Reads) and persisted on the disk.
Locks are happening only when we are trying to update the same ID record (one transaction getting access to that place in memory, while second one waiting for it to be freed up) and this is basically the only moment when locking is happening?
What are object locks? I've found that we have a lot of locks actually, most of them are Row Locks which seems to be really straightforward what is happening there but we also have some "Object Locks" on procedures and functions and I'm not really sure what's happening there.
2. How does partitioning a table influences performance?
For the time being I would like to consider Hash partitioning.
We are having a table that is partitioned by a Hash. Each partition act as a "separate table" which basically means that each partition got their own Delta Storage and Main Storage - this can improve inserts into the tables because theoretically if content of one Delta Storage (in partition 1) is being moved to Main Storage we still are able to write data to other partitions because they got their own Delta Storage. If we don't partition the table then everything lands in the same Delta Storage so performance of large inserts/updates is very heavily influenced by this merge operation because if content of Delta is being moved to Main Storage fot the time being all Write Operations are "halted"?
I've tested it - created two tables with the same schema, one with no partitions and one with 10partitions and inserted 500million rows to each of them (5 sessions 100million rows each). No surprise inserting to partitioned table was faster - around 4.2x faster to be precise.
However partitions come with a cost - they can heavily influence performance of Reads if we are not able to provide predicate in WHERE clause that points to columns used in hash. This simply mean that if I have a table with ID, FIRST_NAME and LAST_NAME where ID is primary key and is used in hash - then when I'm putting this ID in WHERE or JOIN everything is fine because HANA knows in which partition requested data is but when I'm doing search by LAST_NAME then I need to visit all partitions separately and union the result instead of going into one when there are no partitions.
3. Are triggers a good idea?
In our processing after two of the tables are being changed - there are inserts/updates going on there we are running a few triggers (between 7 and 12 - depends on the operation), this seems to impact performance by really a lot which seems to be clear because each operation (update/insert) is really 7+1 - 12+1 operations that need to happen in the background so it's "holding" the transaction to finish and that's why there is such a big delay before running another statement on the "main" tables?
4. Insert performance during big data loads.
Let's assume that I'm not able to partition my table so I'm not able to improve performance this way, the "simplest" thing I can do to improve inserts efficiency on non partitioned table is increasing Delta Storage space? By theory I will be able to have more data there in the same time = Merge Operation won't happen as that often and from documentation I can see this operation is the most time consuming one.
Would be really grateful for any hints/ tips/ confirmation/ critique and clarification.
Best regards,
Mat I