Hi Surya,
your question is a really very deep question...and it's not easy to give a best answer because there is no magical formula that can assure an optimum star schema from a performance point of view....
The fact table and dimension tables are both relational database tables.
These tables are linked to one another thanks to identification numbers (dimension IDs), which are in the key part of the particular database table.
The characteristics determine the granularity (the degree of detail) at which the key figures are kept in the InfoCube.
Characteristics that logically belong together (district and area, for example, belong to the regional dimension) are grouped together in a dimension.
If you follow this design criterion, dimensions are to a large extent independent of each other, and dimension tables remain small with regards to data volume, which is desirable for reasons of performance. This InfoCube structure is optimized for Reporting.
But this is not always true.
Dimension tables should have a 'relatively' small number of rows (in comparison to the Fact Table: factor at least 1:10 until 1:20).
If the dimension table is approaching the same size as the fact table, the dimension should be selected as a line item (you have to flag it in dimension management inside your cube).
This dimension can then be assigned to exactly 1 InfoObject (the line item InfoObject).
When activating the InfoCube, NO database tables will be created for this dimension; a field with the data element RSSID is written in the fact table instead, that points directly to the SID table for the InfoObject.
In other words: the route using a dimension table is omitted. From the star schema diagram: InfoCube-> Dimension->InfoObject will be (for this a line item InfoObject) InfoCube->InfoObject.
Besides, there is also the "High Cardinality" flag (for every dimension) that indicates that a single dimension contains a high number of records (at least 20% the size of the fact tables, measured in number of records).
If you set this flag, when you generate the data structures for a cube, the physical layout of the tables may be adjusted. For example, suitable index types are chosen (database-specific). This is necessary for good performance when reading the records from this dimnension.
Here you have to consider also the role of the master data attributes/hierarchies and so on...
Ralph Kimball writes in his book The Data Warehouse Toolkit:
"The nine database design decision points for a dimensional data warehouse consist of deciding on the following:
1.The processes, and hence the identity of the Fact Tables ((one Fact Table - one InfoCube...) -> intersection entities)
2.The dimensions of each Fact Table (-> strong entities)
3.The dimension attributes with complete descriptions and proper terminology (-> attributes and entities)
4.The grain of each Fact Table
5.The facts, including pre-calculated facts
6.How to track slowly changing dimensions
7.The aggregations, heterogeneous dimensions, mini-dimensions, query modes and other physical storage decisions
8.The historical duration of the database (archiving aspects)
9.The urgency with which the data is extracted and loaded into the data warehouse (time frame for loading)"
Hope it helps...
Bye,
Roberto
Add a comment