Skip to Content
avatar image
Former Member

Table partition in scale out architecture

Hello,

We have a huge table with 857M records which it's key is GUID (In our reports we never select records from this table with the key\GUID) . Our system is scale out of 4 indexservers - 512G each.

We partitioned the table with 2 levels:

  1. 1.HASH for the key/GUID with number of partitions equal to the number of hosts = 4 in our case.
  2. 13 partitions of range according to the periods.

The table before the partition:

The table after the partition:

Questions:

  1. Because the key has a key we were forced to partition according to it although we never use it, Will it improve the performance?
  2. The memory consumed by the table was grown significantly from 73G to 107G and also he size on disk. Why?
  3. After the partitione finished all the partition are located in one indexserver. I know I can move them but does it make sense? On the one hand the calculation will be parlalize between 4 indexservers on the other hand the traffic of data between the servers will be huge. Because the key of the table doesn’t have any meaning I guess that every select above this table will use all the 4 indexservers and have a lot of traffic.

What do you think?

Thanks,

Amir

Untitled.jpg (93.5 kB)
Untitled2.jpg (125.1 kB)
Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

1 Answer

  • Best Answer
    Apr 26, 2014 at 12:04 AM

    Somehow I know this partitioning scheme...

    Ok, briefly to the questions:

    1. Because the key has a key we were forced to partition according to it although we never use it, Will it improve the performance?

    Why would it?

    And what performance are you talking about?

    Storing new data in the table?

    Updating existing data?

    Performing a delta merge?

    Selecting data?

    2. The memory consumed by the table was grown significantly from 73G to 107G and also he size on disk. Why?

    While the primary key column (your GUID) cannot be compressed anyway, most of the other columns very likely can and will be compressed.

    Now you copy the one single value catalog for each column 13 times and very likely it will contain the same set of values for each partition.

    On top of that, the additional compression on each column might be less effective as now the data is distributed in a different way in each column.

    3. After the partitione finished all the partition are located in one indexserver. I know I can move them but does it make sense? On the one hand the calculation will be parlalize between 4 indexservers on the other hand the traffic of data between the servers will be huge. Because the key of the table doesn’t have any meaning I guess that every select above this table will use all the 4 indexservers and have a lot of traffic.

    Do you plan to shuffle the majority of all the data over from the slave nodes? Why would you want to do that?

    Given the fact that you don't ever seem to access your data via the primary key (why are you using a expensive GUID then??) you could also simply go for round robin partitioning.

    Also, the 13 range partitions... 12 month + others, right?

    Really, before you even consider doing such a pattern, be absolutely sure about the request pattern of the reports you are going to run agains these tables.

    And, again, if you don't provide the HASH key when accessing this table, the range partitioning won't be evaluated. Or expressed otherwise: you won't get what you guess you'd get.

    - Lars

    Add comment
    10|10000 characters needed characters exceeded

    • Amir,

      HANA doesn't force you to do anything.

      If you want to use HASH partitioning to distribute the partitions over multiple hosts, that's your choice.

      Concerning point 3: I did not say that there is no point or reason to distribute the data.

      All I'm saying is that you don't want to transport a lot of data between the hosts.

      You want to aggregate first.

      To me the question is: what makes you believe that partitioning solves your problem at hand?

      Could it be sufficient to distribute tables over the different nodes without partitioning them?

      SAP BW powered by HANA is very reserved about partitioning of it's database objects, so I wouldn't rush into partitioning if I wouldn't know what exactly I get out of it and how to get it.

      - Lars