Skip to Content
author's profile photo Former Member
Former Member

Cube Redesign : how to identify the culprits in a bad dimension

Hello Community,

What is the best way to identify which "combination of characteristics" is guilty for causing a large record count in the dimeension table ?

We are working to redesign an infoCube that had many unbalanced dimensions. So far we've done quite well: all newly designed dimensions are well balanced *except for one*

Unfortunately, we don't have the luxury of just splitting it up further, since we're already using the maximum of 13 customer dimesions.

So now I must decide how to best shuffle the characteristics, without disrupting the balance of the other dimensions.

The cube was designed long before I joined the team, and I don't have a relationship diagram (nor, indeed, an intimate knowledge of the data). Is there a way to reverse engineer to find out which are the problem characteristics ?

For example, I've already selected the record count from each SID table. But I'm not certain what to do next.

So my question is : what is the best way to identify which "combination of characteristics" is guilty for causing a large record count in the dimeension table ?

Thanks so much for your help !

Keith

Add a comment
10|10000 characters needed characters exceeded

Related questions

3 Answers

  • Best Answer
    author's profile photo Former Member
    Former Member
    Posted on Mar 22, 2006 at 10:20 PM
    Add a comment
    10|10000 characters needed characters exceeded

    • Former Member

      To give some feedback about what I've done : I decided to run selects against the ODS that was used to load this cube. (In reality there are 3 ODS, but I'm using just one to test the dimension designs).

      Here is the select statement I've used to query for unique combinations of characteristics in the ODS.

      select count(*) from (select distinct char_1, char_2, char_3, ..char_N from sapr3."/BIC/table_name") ;

      A couple of things to keep in mind when you're setting this up :

      • char_1, char_2, char_3, ..char_N represents a comma separated list of the characteristics you wish to test as a plausable dimension

      • be certain that char_1, char_2, char_3, ..char_N are actually represented in the SQL statement as the FIELD NAMES for the table you're selecting against. For example characteristic 'ZEXAMPLE' has a field name in the ODS of /BIC/ZEXAMPLE and characteristic 0EXAMPLE2 has a field name in the ODS of EXAMPLE2

      • In the select statement, fields with a slash in the field name must be qualified by double quotation marks like this: (select distinct "/BIC/ZEXAMPLE", ..)

  • author's profile photo Former Member
    Former Member
    Posted on Mar 22, 2006 at 09:13 PM

    Hi!

    Here's what we did in order to find "good" dimensions:

    1) Copy some sample data (we took it from an ODS object that feeds the cube) into a database table (MS Access should work okay)

    2) Create a 'control table' that defines possible dimensions

    3) Create a macro that reads the control table and generates SQL statements like "SELECT COUNT DISTINCT MATERIAL PLANT COMP_CODE <fields you'd like to have in the same dimension> ... FROM Sample_Table"

    That way you can find out how many entries the corresponding dimension table would have in BW. The bigger the sample the better the guess of course.

    This method allowed us to keep all dimensions of an SD cube at less than 1% of fact table (except for line items...)

    Hope this helps,

    Christian

    Add a comment
    10|10000 characters needed characters exceeded

    • Former Member

      Christian, thanks.

      We are doing our 'balance' testing directly in production with sample production data. Normally the cube would hold 3 months of data, but our sample is one month in size.

      So in that case, with the test cube loaded, we have only one dimension that is still out of balance.

      To find the problem characteristics in that dimension, would you suggest that I "select count distinct" for the fields I'm interested in from

      • the dimension table itself

      • or from the ODS that was used to load the cube ?

      and of course, would you also explain why ?

      thanks so much !

      Keith

  • author's profile photo Former Member
    Former Member
    Posted on Mar 22, 2006 at 10:05 PM

    Keith,

    the "select count distinct" simply gives you the number of distinct combinations for certain characteristics.

    Assume the dimension contains char1, char2, ..., charN.

    Now you can run the "select count distinct" for subsets of these characteristics. Let's say there are a huge number of combinations for char1, char3, char5. Then these characteristics should not be in the same dimension...

    I think it doesn't matter whether you use the source ODS or the SIDs in the dimension table.

    Hope this helps,

    Christian

    Add a comment
    10|10000 characters needed characters exceeded

Before answering

You should only submit an answer when you are proposing a solution to the poster's problem. If you want the poster to clarify the question or provide more information, please leave a comment instead, requesting additional details. When answering, please include specifics, such as step-by-step instructions, context for the solution, and links to useful resources. Also, please make sure that you answer complies with our Rules of Engagement.
You must be Logged in to submit an answer.

Up to 10 attachments (including images) can be used with a maximum of 1.0 MB each and 10.5 MB total.