
Understanding how BOBJ Explorer issues queries to HANA

justin_molenaur2
Contributor
0 Kudos

Hi all, just wanted to see if anyone had anything to add regarding the integration of BO Explorer connected to HANA. I'm starting to dive into a little more detail to understand how InfoSpaces and ViewSets scale, and how they change the queries that are issued to HANA. This is of course in support of making the response time in the application as fast as possible. Since the code is produced by the tool directly, the only opportunities for improvement are filtering the data and processing the model in the most effective manner.

In some cases I am hitting serious roadblocks in performance where HANA will literally sit at 100% CPU for up to 15 minutes while a single InfoSpace is processed, yet scaling back or removing a facet or two allows it to perform within a reasonable timeframe. As always, there seems to be room for improvement, but I found little information on SCN or elsewhere on the web.

It looks like Explorer takes advantage of GROUPING SETS with MULTIPLE RESULTSETS to produce the facets. The problem is that I am not all that familiar with this technique. In the SQL produced below, a grouping set is created for each facet with a corresponding sort order, which makes sense. This specific InfoSpace is based on an Analytic View with a fact table of around 85 million records surrounded by a handful of attribute views, the largest being around 4 million records. When I look at the explain plan (attached), the query performs reasonably, under 3s, but I notice that there is activity in the Calc Engine, which I can only guess happens as a result of the MULTIPLE RESULTSETS clause and the first result set, which provides some distinct value analysis along with the data.

SELECT "SOLD_TO" AS "ALIAS_B1C361C6" , "SIG" AS "ALIAS_000140B1" ,

"CUSTOMER_NAME" AS "ALIAS_3BB8F64C" , "SOLD_TO" AS "ALIAS_B1C361C6" ,

"VTEXT_2" AS "ALIAS_58A49976" , "PAPH2" AS "ALIAS_048619A9" , "VTEXT_1"

AS "ALIAS_58A49975" , "PAPH1" AS "ALIAS_048619A8" , "MATNR_T" AS

"ALIAS_5CEF3DD9" , "LVL3_NM" AS "ALIAS_4B61F44D" , "FPY_FLG" AS

"ALIAS_048A50D1" , "VTEXT_3" AS "ALIAS_58A49977" , "PAPH3" AS

"ALIAS_048619AA" , "FISCAL_YR_NUM" AS "ALIAS_591332E1" , "DIST_CHL" AS

"ALIAS_3FE4F66E" , "CLNDR_YR_MTH_NUM" AS "ALIAS_0F347DEE" ,

"ZZM_SOURCETYPE" AS "ALIAS_AE0F9C47" , "LVL4_NM" AS "ALIAS_4B6268AC" ,

"LVL5_NM" AS "ALIAS_4B62DD0B" , SUM("SALES") AS "ALIAS_864AB354"

FROM"_SYS_BIC"."Main.Analytics.Sales/AN_INTEGRATED_SALES"

WHERE (  ( "SALE_ITEM_GROUP_ID"= 'SRGP'  )  AND  ( "SIG"= 'SRGP'  )  )

GROUP BY

GROUPING SETS LIMIT 26 MULTIPLE RESULTSETS (

(("SOLD_TO") ORDER BY ALIAS_864AB354 DESC ,ALIAS_B1C361C6 ASC ),

(("SIG") ORDER BY ALIAS_864AB354 DESC,ALIAS_000140B1 ASC ),

(("CUSTOMER_NAME", "SOLD_TO") ORDER BY

ALIAS_864AB354 DESC ,ALIAS_3BB8F64C ASC ),

(("VTEXT_2", "PAPH2") ORDER BY ALIAS_864AB354 DESC ,ALIAS_58A49976 ASC ),

(("VTEXT_1", "PAPH1") ORDER BY ALIAS_864AB354 DESC ,ALIAS_58A49975 ASC ),

(("MATNR_T") ORDER BYALIAS_864AB354 DESC ,ALIAS_5CEF3DD9 ASC ),

(("LVL3_NM") ORDER BY ALIAS_864AB354 DESC ,ALIAS_4B61F44D ASC ),

(("FPY_FLG") ORDER BY ALIAS_864AB354 DESC ,ALIAS_048A50D1 ASC ),

(("VTEXT_3", "PAPH3") ORDER BY ALIAS_864AB354 DESC ,ALIAS_58A49977 ASC ),

(("FISCAL_YR_NUM") ORDER BY ALIAS_864AB354 DESC ,ALIAS_591332E1 ASC ),

(("DIST_CHL") ORDER BY ALIAS_864AB354 DESC ,ALIAS_3FE4F66E ASC ),

(("CLNDR_YR_MTH_NUM") ORDER BY ALIAS_864AB354 DESC ,ALIAS_0F347DEE ASC ),

(("ZZM_SOURCETYPE") ORDER BY ALIAS_864AB354 DESC ,ALIAS_AE0F9C47 ASC ),

(("LVL4_NM") ORDER BY ALIAS_864AB354 DESC ,ALIAS_4B6268AC ASC ),

(("LVL5_NM") ORDER BY ALIAS_864AB354 DESC ,ALIAS_4B62DD0B ASC )

)

WITH PARAMETERS('locale'='en_US' )
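For anyone else new to this syntax: as far as I can tell, each grouping set behaves like its own GROUP BY over the same filtered data, and MULTIPLE RESULTSETS returns each one as a separate result set in a single round trip, with LIMIT capping the rows per set. A minimal sketch of the same pattern, using made-up column and view names:

SELECT "REGION" AS "R", "YEAR" AS "Y", SUM("SALES") AS "TOTAL"
FROM "_SYS_BIC"."SomePackage/AN_SOME_VIEW"     -- hypothetical view
GROUP BY GROUPING SETS LIMIT 5 MULTIPLE RESULTSETS (
    (("REGION") ORDER BY TOTAL DESC, R ASC),   -- result set 1: top regions by sales
    (("YEAR")   ORDER BY TOTAL DESC, Y ASC)    -- result set 2: top years by sales
)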

Does anyone have positive experiences or any tuning tips for working with Explorer on HANA?

Regards,

Justin

Accepted Solutions (0)

Answers (1)


Former Member
0 Kudos

If Explorer just sends the query you outlined and it executes in 3 seconds but drives HANA to 100% CPU, then either something is seriously wrong with the interaction between HANA and the Exploration server, or there are many other queries Explorer is sending, which you should be able to find using the session log.
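Beyond the session log, one way to spot extra Explorer statements from the HANA side is the SQL plan cache; this is just a sketch against the standard SYS.M_SQL_PLAN_CACHE monitoring view:

-- Top statements by total execution time; look for unexpected
-- Explorer-generated SQL beyond the GROUPING SETS query.
SELECT STATEMENT_STRING,
       EXECUTION_COUNT,
       TOTAL_EXECUTION_TIME
FROM   SYS.M_SQL_PLAN_CACHE
ORDER  BY TOTAL_EXECUTION_TIME DESC
LIMIT  20;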

I hope your Explorer server has been sized correctly based on your needs, but as Explorer server sizing is different from BI4 sizing, it is possible there is a sizing issue you might want to explore, specifically on the exploration and indexing servers.

Also, you mentioned that if you remove two facets the performance is better; is there any pattern you are seeing with these two dimensions? For example, is it better to remove two facets from the 4-million-record tables vs. the smaller tables?

And when you remove the facets, is there any difference in query runtime from HANA Studio for the new query that gets generated?

justin_molenaur2
Contributor
0 Kudos

Thanks for the info, Durgamadhab.

That specific query is not the one that causes 100% CPU, but it is similar. I was able to modify the problematic one to the point that it runs reasonably. I couldn't really find a pattern of facet removal that would cause the condition; we are currently working with SAP on a diagnosis. Part of the solution was to filter the largest dimension, and that seemed to help.

I am taking the Explorer sizing out of the mix; it seems to add very little overhead to the processing (3s on HANA and 5s overall when opening the InfoSpace), so I am trying to focus on the HANA side of things first.

Any other feedback on Explorer-on-HANA scenarios would be helpful:

- What size datasets have you put Explorer on, and what were the runtimes?

- Did you have success optimizing performance in Explorer? If so, did any techniques in particular give you the biggest boost? Were these on the HANA side or on the Explorer servers?

- Did you notice any major overhead on the Explorer servers, and are there any techniques to optimize the processing there?

Just thinking off the top of my head here...any insight is very welcome.

Regards,

Justin

justin_molenaur2
Contributor
0 Kudos

So I was able to resolve the performance issue of the query that caused the CPU to peak out.

The key lesson here is that nested IF statements implemented as calculated columns are very expensive. Before, when I ran the PlanViz, I could see that the Calc engine was invoked with two separate processes.

Afterwards, when I removed the calculated column, the Calc engine was only invoked for one process and performance increased by 10x.

Regards,

Justin

Former Member
0 Kudos

Someone needs to disable calculated columns in HANA.

Curious what your solution was though - i.e. where you put the calculation - generated column?
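For clarity, by generated column I mean something along these lines (a rough sketch only; the table name, flag column, and condition here are made up, and the exact expression would depend on your flag logic):

-- Push the flag into the table itself so neither the OLAP
-- nor the Calc engine has to evaluate it at query time.
ALTER TABLE "SALES_FACT"
  ADD ("FPY_FLG" NVARCHAR(1)
       GENERATED ALWAYS AS
         CASE WHEN "FISCAL_YR_NUM" = 2014 THEN 'Y' ELSE 'N' END);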

justin_molenaur2
Contributor
0 Kudos

Hey Jody, thanks for stopping by.

The culprit was a nested IF statement on a time dimension that dynamically flagged a column (Y/N) in an attribute view. Joined to millions of rows in the fact table, this of course got expensive very quickly, and was even able to bring the DB to its knees for about 15 minutes, as described above.

Once I was able to recognize what the developer did, we materialized the logic in the underlying table (populated through a stored procedure) and the difference was incredible: something like 20s on 200m rows vs. 15 minutes at 100% CPU prior. After applying some filters to help narrow the join, I got it to about 3s, which was great.
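Conceptually, the materialization was along these lines (a simplified sketch; the table name and the flag condition are placeholders for the real nested logic):

-- Hypothetical sketch: compute the Y/N flag once at load time,
-- instead of a nested IF calculated column evaluated per query.
CREATE PROCEDURE "UPDATE_FPY_FLG" AS
BEGIN
  UPDATE "TIME_DIM"
    SET "FPY_FLG" = CASE
                      WHEN "FISCAL_YR_NUM" = 2014 THEN 'Y'  -- stand-in for the real nested logic
                      ELSE 'N'
                    END;
END;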

The problem (for me at least) is that it was not obvious from analyzing the VizPlan where the expense was coming from. All I could see was an extra step in the calc engine (ceGNavAggregationPop) and a long-running step in the ceOlapSearchPop step; example here (not the super-long-running example, however). It seems that with this SQL syntax, no matter what, we are always invoking the calc engine.

By trial and error, I was able to isolate the problem column and remove it from execution, and accordingly the VizPlan dropped that extra step and looked more like this.

As I have seen you state a few times, the cost of calculated columns can be immense, and this is a very serious example of that! In general, though, would you say that these are really only expensive when there is expression logic or string manipulation? From what I have seen, as long as the calculations involve only measures, the processing seems to stay within the OLAP engine. Do you have any examples of a VizPlan where it was obvious to you that a calculated column (string or logic) was causing the issue, or how do you typically diagnose that?

Another interesting observation regarding Explorer queries to HANA: any time there is a variable built on a model, you have the option to choose a value for this variable during Explorer InfoSpace design time (through the 'validate' button). One would assume that this would only affect the WHERE clause (which it does), but the added side effect is that each time the InfoSpace or ViewSet is executed, a SELECT DISTINCT <variable_column> FROM <model> is also executed, even though the value is already selected. This seems redundant and adds useless overhead to the processing. The problem materializes even more when creating ViewSets, as the same DISTINCT is issued for each component in a ViewSet (maybe 4-7 times) and adds no value; a sketch of the pattern is below. So we are in the process of analyzing how to achieve similar functionality without using a variable.
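To illustrate what I mean (the column shown is just an example of a variable column on this model):

-- Pattern Explorer issues once per ViewSet component, even though
-- the variable value was already chosen at design time.
SELECT DISTINCT "FISCAL_YR_NUM"
FROM "_SYS_BIC"."Main.Analytics.Sales/AN_INTEGRATED_SALES";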

Always something interesting to figure out these days!

Regards,

Justin

Former Member
0 Kudos

Interesting.

I don't have a whole lot to add to your observations - mine have been similar. What I'd say is that I've never really seen awful performance from the "je", "bw", or "olap" operators - since these are associated with the Join/OLAP Engine, all they ever really do are joins, aggregations, etc., nothing too tricky. Of course, there may be room for optimization via filters, correct modeling of join types, etc., but most of the time performance hang-ups have been with CalcEngine operations, so I've just become accustomed to looking there.

However, I'm now rather uncertain about the latest optimizations in HANA. Once upon a time, it was a fact that calculated fields with formulas referencing dimensions were calculated in the CalcEngine, after aggregation (in the OLAP Engine). In fact, the studio would give you a warning that "Calculate Before Aggregation" would be ignored in cases like these, meaning the OLAP Engine wouldn't handle such calculations.

However, I just created a very simple Analytic View with one dimension table, added a calculated column with an IF statement that referenced a field from the dimension table, and checked "Calculate Before Aggregation" - and the VizPlan shows everything being executed in the OLAP Engine (i.e. operators with "bw" or "olap" prefixes). See below:
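The calculated column in my test was along these lines (paraphrasing the modeler's expression syntax from memory; the column names are made up):

-- Calculated column referencing a dimension field:
--   if("REGION" = 'EMEA', "AMOUNT", 0)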

So I'm not quite sure what to make of this. In general, I'd say the usual recommendations apply - avoid calculated columns when possible, create them as generated columns where possible, etc. - but it looks like I need to get up to speed on HANA's latest nuances.

Weird about Explorer as well. I haven't done much with Explorer before although I have seen problematic SQL generated from Lumira. But that's another story. Keep me posted on what you find!

justin_molenaur2
Contributor
0 Kudos

Interesting observation. Just out of curiosity, what Rev are you working on?

In our scenario, the calculated column was in the attribute view itself. If you still have that test model handy, can you put the IF statement on the dimension (in the attribute view) instead and see what the VizPlan spits out?

I did receive a response from SAP about the problem query. The analysis correlates with my findings, but they also offered a solution, so I thought I would share. The developer had put the calculated column in the attribute view, and it used the now() expression, as discussed below.

>>
Regarding the high CPU utilization, it is due to the now() function in the complex expression in the calculated column FPY_FLG. The function now() is repeatedly called during runtime, whereas it should be replaced by the correct value during the compile phase.

So the workaround would be to create a mandatory input parameter with a default value of now(), then use this input parameter in place of now() in the FPY_FLG calculated column.
<<
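For anyone hitting the same thing, the shape of the workaround is roughly this (a sketch only; the parameter name P_NOW, the date column, and the flag condition are illustrative, not the actual model):

-- FPY_FLG calculated column, before (now() re-evaluated at runtime):
--   if(daysbetween("POSTING_DATE", now()) < 365, 'Y', 'N')
-- After, with a mandatory input parameter P_NOW defaulted to now():
--   if(daysbetween("POSTING_DATE", '$$P_NOW$$') < 365, 'Y', 'N')

-- Consumers then bind the parameter once per query:
SELECT "FPY_FLG", SUM("SALES")
FROM "_SYS_BIC"."Main.Analytics.Sales/AN_INTEGRATED_SALES"
     ('PLACEHOLDER' = ('$$P_NOW$$', '2014-01-15 00:00:00'))
GROUP BY "FPY_FLG";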

As a side note, and related to tuning Explorer, we are having fun with resultcache testing over in this thread.

Regards,

Justin