
Aggregates versus Physical Cube Splits

Former Member

Hi,

What would be the criteria for deciding between aggregates and physical cubes? For example:

We have a lot of regional users who are interested only in their regional data (say Japan, the Americas, Europe, etc.).

Should we build a global cube with aggregates for the regions, or should we build multiple cubes, one per region, with a single MultiProvider on top of them?

Which would yield better performance?

Thanks in advance.

Shailesh

Accepted Solutions (1)


Former Member

Interesting debate.

I personally think it depends on what the requirements of the majority of the users are. If most of the reports are regional, it might make a lot of sense to build individual cubes. If the majority span several regions, then an MP might suffice.

However, from a performance, maintenance, and availability perspective, I think the physical cubes outweigh the MP.

Former Member

Hi,

As a general point, it also depends on the number of regions. If there are many regions, for example more than 10, you would have to maintain 10 InfoCubes and update rules. So with many regions, aggregates are a better option than a MultiProvider.

With rgds,

Anil Kumar Sharma.P


Former Member

Has anyone used an MP with more than 10 cubes? Does the performance degrade significantly?

If the users are all running regional queries on an MP with, say, 15 cubes, then a query on the MP splits into 15 sub-queries, of which only one would return data. So wouldn't the other sub-queries be very fast? Would this scenario be even better if the data in each cube was disproportionate?

Say region 1 has 50% of the data and the other regions contribute the other 50%. In a global cube, wouldn't the users of the other regions be impacted by the data size of region 1?

Shailesh

Former Member

I wouldn't talk about aggregates and MultiProviders as if they were alternatives you have to choose between. They really serve different purposes.

Let's get aggregates out of the way first, as that's fairly straightforward. An aggregate is useful if:

1 - The InfoCube is queried reasonably frequently.

2 - The data in your InfoCube is more granular than the results your users usually require, e.g. sales data in the cube is by day, but 80% of your users' queries want results by month.

If both of these are true (and under some circumstances I would consider a highly summarized aggregate that might not be used very often, but that's another discussion), then look at creating an aggregate (or several) to summarize the data once, rather than over and over in each query. SAP recommends an aggregate if the data can be summarized by a factor of 10, but I would go with something less for some very highly used queries. You have to compare the query cost against the InfoCube and against the aggregate and see how much it helps.
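The summarize-once idea can be sketched in Python (the sales data and the day-to-month roll-up are hypothetical, standing in for what the BW aggregate would materialize):

```python
from collections import defaultdict

def build_aggregate(fact_rows, roll_up_key):
    """Pre-summarize fact rows once, the way a BW aggregate would,
    so each query no longer re-aggregates the base cube."""
    agg = defaultdict(float)
    for row in fact_rows:
        agg[roll_up_key(row)] += row["sales"]
    return agg

# Hypothetical daily sales rows: 360 day-level rows across 12 months.
facts = [{"month": "2005-%02d" % (d % 12 + 1), "day": d, "sales": 1.0}
         for d in range(360)]
aggregate = build_aggregate(facts, lambda r: r["month"])

# Summarization factor: SAP's rule of thumb is that ~10x pays off.
factor = len(facts) / len(aggregate)
print(factor)  # -> 30.0 (360 daily rows collapse into 12 monthly rows)
```

Monthly queries then read the 12-row aggregate instead of re-summing the 360-row base cube on every run.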

Under Oracle, using aggregates that simply split the data into smaller cubes without summarizing it would be a bad idea: Oracle's STAR TRANSFORMATION quickly determines the rows it needs to read with minimal effort.

MultiProviders -

The decision to split data into multiple InfoCubes and use a MultiProvider is more difficult to answer - it depends on the data, how much of it there is, how it's used, etc. You really have to go into the design process knowing that whatever you do, you'll probably be ready to scrap it and rebuild it in a year or two anyway, as you get a better understanding of the tools, the data, and the users' needs. I know some might take issue with this, but I have seen it everywhere I've been.

Will users regularly query multiple regions, or generally just their own? If they will generally only query their own region, an MP doesn't buy you anything performance-wise. One of the main benefits of the MP is that if a query is against all regions, it will run <b>parallel queries</b> (concurrently) to reduce overall clock time. That can, however, also be a problem if your system is small and lots of queries are launched at the same time. You can turn parallelism off at the system, InfoProvider, or query level. Also, the latest Support Packages provide an option to restrict the number of queries that will launch at once from an MP. As one post mentioned, there is still some shaking out of the MultiProvider.
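The parallel fan-out can be illustrated with a small Python sketch (the per-region cubes, their contents, and the query runner are hypothetical stand-ins, not BW APIs):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-region cubes; in BW each would be a basic InfoCube.
CUBES = {"JAPAN": [10, 20], "AMERICAS": [5], "EUROPE": [7, 8]}

def query_cube(region, filter_region):
    """One sub-query launched by the MultiProvider against one cube."""
    if filter_region and region != filter_region:
        return []  # cube holds no matching data, so it returns quickly
    return CUBES[region]

def multiprovider_query(filter_region=None):
    # The MP fans one query out into one sub-query per cube, all run
    # concurrently, then unions the result sets.
    with ThreadPoolExecutor(max_workers=len(CUBES)) as pool:
        parts = pool.map(lambda r: query_cube(r, filter_region), CUBES)
    return [row for part in parts for row in part]

print(multiprovider_query("JAPAN"))  # only the JAPAN cube contributes
print(sum(multiprovider_query()))    # global query hits all cubes in parallel
```

The regional query still launches all three sub-queries here; the next point, fixed values in the Structure Specific Properties, is what lets the system skip the non-matching cubes entirely.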

As far as splits - if you will really have just one region per InfoCube, you would specify the region's value in the Structure Specific Properties for each InfoCube. That way, if a user submits a query for just 3 regions, the system looks at the Structure Specific values and excludes any InfoCubes that do not have one of the 3 requested values, so only 3 sub-queries are launched.
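That pruning step amounts to a simple filter over each cube's fixed value (cube names and the mapping below are illustrative, not real objects):

```python
# Hypothetical Structure Specific fixed value per InfoCube: each cube
# is constrained to exactly one region.
CUBE_FIXED_REGION = {
    "ZSALES_JP": "JAPAN",
    "ZSALES_US": "AMERICAS",
    "ZSALES_EU": "EUROPE",
    "ZSALES_AP": "APAC",
}

def cubes_to_query(requested_regions):
    """Exclude any cube whose fixed region is not in the selection,
    so only the matching sub-queries are launched."""
    return [cube for cube, region in CUBE_FIXED_REGION.items()
            if region in requested_regions]

print(cubes_to_query({"JAPAN", "EUROPE"}))  # 2 of 4 cubes queried
```

With the fixed values set, a single-region query launches one sub-query instead of one per cube in the MP.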

If you see this InfoProvider quickly growing to tens of millions of rows, then give some thought to a MultiProvider; but if it will not be that big, I wouldn't waste more time thinking about it.

One other thing you can do to preserve some flexibility in your design is to start with a single InfoCube, but define a MultiProvider on top of it, building your queries on the MultiProvider rather than the InfoCube. Then you can add more cubes to the MP later, or even point it to different cube(s), without having to recreate all your queries, which can create a lot of aggravation for your user base.

Pizzaman

Former Member

Hi Pizzaman,

Thanks. That was informative. However let me clarify the scenario further.

1. There is very little aggregation required from a usage perspective.

2. The data in the regions is very disparate. One region has 50% versus the others.

3. The users all run regional reports. Less than 2% need global views of the data.

In such a scenario, would a MultiProvider be faster than a global cube for a region with less data, since it has to scan, say, 20% of the data rather than the whole cube?

The same logic applies from an aggregate perspective: queries would hit the mini (regional) cube rather than the global cube.

Is it even worth it? Would a global cube without aggregates be good enough?

I do not know about Structure Specific Properties; where can I get info on them?

Thanks

Shailesh

Former Member

Hi Shailesh,

As Pizzaman already mentioned, it makes no sense to compare aggregates and multiproviders. They have different functionalities and can be combined perfectly. A multiprovider does not prevent you from building aggregates on the underlying infoproviders. When running a query on a multiprovider, it will send parallel queries to the different infoproviders and will use the aggregates built on them.

In your case, as 98% of the users will run regional reports, I suggest you create regional infoproviders on which you build the regional reports. For the other two percent, you can build a multiprovider to combine the different regions, while still using the aggregates and running parallel queries on the underlying infoproviders.

Check out following links:

https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/library/biw/a-c/bw performance tuning presentation.pdf

https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/library/events/asug-biti-0... bw query performance tuning with aggregates

Grtz, Julie

Former Member

Shailesh -

Was off for a couple of days, but I'm back in the office and ready to write....

1. Aggregation - if there is little aggregation of results, then don't bother with aggregates. I assume you have made this determination by reviewing the DB Rows Selected to Rows Transferred ratio? That would be the only way to know.
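That check can be sketched as a tiny Python calculation (the statistics figures below are made up for illustration; in BW they would come from the query runtime statistics):

```python
def aggregation_ratio(rows_selected, rows_transferred):
    """DB rows read vs. rows handed to the frontend: a high ratio means
    the OLAP engine is summarizing heavily, so an aggregate would help."""
    return rows_selected / rows_transferred

# SAP's rule of thumb: consider an aggregate above roughly 10:1.
ratio = aggregation_ratio(rows_selected=120_000, rows_transferred=400)
print(ratio)        # -> 300.0
print(ratio >= 10)  # -> True: a candidate for an aggregate
```

If the ratio is close to 1, the queries are already fetching roughly what they display, and an aggregate buys nothing.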

2 + 3. If users are running "region" reports, then the parallelism that a multi-provider provides offers you nothing.

Did you say what DB and release you are on? I can't be sure how all the different DBs execute, but if you are running Oracle, the use of bitmap indices and the Star Transformation results in indexed read accesses of the fact table - that is, the query only reads the rows it needs. So if a query only needs 20% of the data, it does not read any of the unnecessary 80%.

It would be worth reviewing an Execution Plan so you can see how the query runs.

You can get to Structure Specific Properties from the Extras menu item when you are in the InfoProvider Change dialog. This lets you set fixed values for a column, set the F4 properties for an InfoObject in just this InfoCube, and a few other options. Setting the fixed value would be very important if you do end up going with multiple cubes.

Based on everything I've read so far, there is no reason for you to consider splitting up the data by region.


Answers (2)


Former Member

I would go for the second option, where the data relevant to each region is stored in individual cubes and a single multiprovider presides over the cross-region reporting need.

The reason I would opt for this is that when a query hits a regional cube, it has to deal with a smaller data set and extracts/transfers fewer records, which will be faster in many cases.

Coming to the first choice, a global cube with several regions and users interested only in their own region's reports is a self-explanatory situation: it too points in the direction of the second choice. Also, a global cube is, or will become, a problem as the data grows humongous.

Most likely the second solution will yield better performance.

Hope it helped you.

Former Member

Hi,

Good question...

I guess you could rephrase the performance question as:

Which performs best: the multi/parallel processing of SQL used by the MP, or the OLAP aggregate manager? My experience with MP query handling is that it is difficult to predict the behavior of this engine, so you often end up forcing the query to hit a specific cube in the MP, whereas the aggregate handler in the OLAP engine is more stable.

When deciding whether to use MP or aggregate I think the aggregates are much more flexible as they are created and change very rapidly. You can respond to changes in requirements much faster. However if You have a situation where You need different logic on the different levels of granularity You are better of with individual cubes because You have individual updaterules for each cube. Obviously You do not have explicit update rules for aggregates.