cancel
Showing results for 
Search instead for 
Did you mean: 

Cube Redesign : how to identify the culprits in a bad dimension

Former Member
0 Kudos

Hello Community,

What is the best way to identify which "combination of characteristics" is guilty for causing a large record count in the dimeension table ?

We are working to redesign an infoCube that had many unbalanced dimensions. So far we've done quite well: all newly designed dimensions are well balanced *except for one*

Unfortunately, we don't have the luxury of just splitting it up further, since we're already using the maximum of 13 customer dimesions.

So now I must decide how to best shuffle the characteristics, without disrupting the balance of the other dimensions.

The cube was designed long before I joined the team, and I don't have a relationship diagram (nor, indeed, an intimate knowledge of the data). Is there a way to reverse engineer to find out which are the problem characteristics ?

For example, I've already selected the record count from each SID table. But I'm not certain what to do next.

So my question is : what is the best way to identify which "combination of characteristics" is guilty for causing a large record count in the dimeension table ?

Thanks so much for your help !

Keith

Accepted Solutions (1)

Accepted Solutions (1)

Former Member
0 Kudos

Keith,

here's the documentation for the "select distinct" statement: <a href="http://help.sap.com/saphelp_47x200/helpdata/de/40/1311fd2fa511d3a98100a0c9449261/frameset.htm">http://help.sap.com/saphelp_47x200/helpdata/de/40/1311fd2fa511d3a98100a0c9449261/frameset.htm</a>

I did this in MS Access, so I don't have a sample...

Christian

Former Member
0 Kudos

To give some feedback about what I've done : I decided to run selects against the ODS that was used to load this cube. (In reality there are 3 ODS, but I'm using just one to test the dimension designs).

Here is the select statement I've used to query for unique combinations of characteristics in the ODS.

select count(*) from (select distinct char_1, char_2, char_3, ..char_N from sapr3."/BIC/table_name") ;

A couple of things to keep in mind when you're setting this up :

  • char_1, char_2, char_3, ..char_N represents a comma separated list of the characteristics you wish to test as a plausable dimension

  • be certain that char_1, char_2, char_3, ..char_N are actually represented in the SQL statement as the FIELD NAMES for the table you're selecting against. For example characteristic 'ZEXAMPLE' has a field name in the ODS of /BIC/ZEXAMPLE and characteristic 0EXAMPLE2 has a field name in the ODS of EXAMPLE2

  • In the select statement, fields with a slash in the field name must be qualified by double quotation marks like this: (select distinct "/BIC/ZEXAMPLE", ..)

Answers (2)

Answers (2)

Former Member
0 Kudos

Keith,

the "select count distinct" simply gives you the number of distinct combinations for certain characteristics.

Assume the dimension contains char1, char2, ..., charN.

Now you can run the "select count distinct" for subsets of these characteristics. Let's say there are a huge number of combinations for char1, char3, char5. Then these characteristics should not be in the same dimension...

I think it doesn't matter whether you use the source ODS or the SIDs in the dimension table.

Hope this helps,

Christian

Former Member
0 Kudos

I see. Of course it is the interaction between the characteristics which will cause a large number of DIM IDs.

Would you be so kind as to provide me the example SQL statement that I would use to pull distinct values for multiple fields in one select statement.

For example :

select distinct char1,char2,char3,count(*) from sapr3."/bic/d" ;

is that correct ?

Former Member
0 Kudos

Hello Keith,

Try running the program

SAP_INFOCUBE_DESIGNS this will give you the percentage ratio of the dimensions to fact table. This way you will findout the bad dimensions.

GSM.

Former Member
0 Kudos

Another quick check on the distinct values is to use ST04 or RSRT and get teh Explain Plan. BW generates the plan and each table reference in the graphical portion of the plan serves as a hyperlink to the system info on the table. If you doubleclick on a dimension table, you'll get info on the distinct values of the index built on characteristics (first 16 only).

As mentioned, this may give you a sense of where your problems may be, but as mentioned, it is really the interplay among the characteristics. If you have this data in an ODS or cube already, you can certainly query those Infoproviders using just the characteristics you are curious about.

If your shop allows, there's always the possibility of running SQl quereis directly against the tables in question rather than pulling the data out to access.

There was an article not that long ago in BW Expert that detailed how to use the BW 3.5 APD (Analysis Process Design) capability to perform the characteristic analysis you really want.

Former Member
0 Kudos

Hello Pizzaman

I like the idea of using an explain plan, but I don't know how to setup the analysis.

Would I use an existing query against the cube ? And can you describe in more detail on how to setup this analysis ?

Thanks !

Former Member
0 Kudos

Here's a link that provides more deatil on using ST04 or RSRT to get an Explain Plan.

Former Member
0 Kudos

Hi!

Here's what we did in order to find "good" dimensions:

1) Copy some sample data (we took it from an ODS object that feeds the cube) into a database table (MS Access should work okay)

2) Create a 'control table' that defines possible dimensions

3) Create a macro that reads the control table and generates SQL statements like "SELECT COUNT DISTINCT MATERIAL PLANT COMP_CODE <fields you'd like to have in the same dimension> ... FROM Sample_Table"

That way you can find out how many entries the corresponding dimension table would have in BW. The bigger the sample the better the guess of course.

This method allowed us to keep all dimensions of an SD cube at less than 1% of fact table (except for line items...)

Hope this helps,

Christian

Former Member
0 Kudos

Christian, thanks.

We are doing our 'balance' testing directly in production with sample production data. Normally the cube would hold 3 months of data, but our sample is one month in size.

So in that case, with the test cube loaded, we have only one dimension that is still out of balance.

To find the problem characteristics in that dimension, would you suggest that I "select count distinct" for the fields I'm interested in from

  • the dimension table itself

  • or from the ODS that was used to load the cube ?

and of course, would you also explain why ?

thanks so much !

Keith