using table variables in a combined virtual and non-virtual environment

former_member466180
Discoverer

According to one of the earlier posts on table variables in HANA SQLScript, a table variable is not materialized when it is created; instead it is 'inlined' wherever it is used in a query. See the answer from Lars Breddemann in this thread: https://archive.sap.com/discussions/thread/3633470. According to Lars this allows for more optimizations than simply materializing, which makes sense. However, in an environment where you have data in HANA and data in virtual tables (SDA), this is unwanted behavior (at least it is in my use case).

I've tried creating table variables that only query data in the virtual tables, and table variables that only query data in tables that are in HANA memory, and then combining the two data sets, assuming that HANA would materialize both. But as Lars confirmed, HANA doesn't materialize the table variables, which in the end causes the overall query that contains both virtual and non-virtual tables to be pushed down to the remote database. For performance reasons that is not what I want.

Is there a way to force HANA to materialize the data that I pull from the virtual tables before I join it to the data that lives on HANA? If I can't force HANA to materialize a table variable, could I use an array to 'materialize' the virtual data?
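A simplified SQLScript sketch of what I tried (all object names are illustrative; `v_remote_orders` stands for a virtual table created via SDA):

```sql
-- Body of the scripted calculation view (illustrative names).
-- My expectation was that lt_remote would be materialized locally,
-- but the variable is inlined, so the final join gets pushed to the remote DB.
lt_remote = SELECT id, amount
            FROM v_remote_orders            -- virtual table (SDA)
            WHERE order_date >= '2017-01-01';

lt_local  = SELECT id, region
            FROM local_customers;           -- regular HANA table

var_out   = SELECT r.id, r.amount, l.region
            FROM :lt_remote AS r
            INNER JOIN :lt_local AS l ON r.id = l.id;
```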

Accepted Solutions (0)

Answers (1)


lbreddemann
Active Contributor

You are mixing different concepts/technologies here.

Query processing with virtual tables allows for "push-down" of joins to the remote site. This is completely independent of how SQLScript deals with table variables.

As for SDA, it can be quite complicated to determine whether or not a join or sub-select should be pushed to the remote site or whether it is better to get all necessary data into the local instance and continue processing the query there.

To my knowledge, HANA 1 basically optimizes for minimal total data transfer cost, as this is one of the main drivers of long runtimes. So HANA 1 tries to estimate how much data would need to be transferred to the remote site and back to the local instance to process the query. In many cases I've looked at, especially those with relatively large result sets, this approach leads to the decision not to push data down to the remote site.

Note that this data transfer cost estimation does not take into account how long the data transfer between the two systems actually takes. As far as I know, there is no configurable transfer cost coefficient available in HANA 1.

HANA 2 appears to me to be more sophisticated in this scenario and produces even better execution plans. But I don't have much insight into the implementation details here.

A rather important factor for using this query-block push-down is that the table/view meta-data (primary key/unique constraints, NOT NULL constraints and data statistics) is correct, up to date and as specific as possible. Not specifying e.g. the primary key on a table can lead to vastly different optimisation results.
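For virtual tables this typically means creating data statistics explicitly, since they have none by default. A minimal sketch, with illustrative object names:

```sql
-- Create histogram statistics on a virtual table so the optimizer
-- can estimate row counts and transfer costs (names are illustrative)
CREATE STATISTICS stats_remote_orders
    ON v_remote_orders (id, order_date)
    TYPE HISTOGRAM;
```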

Coming back to your question: so far it has worked out well for me to use a temporary table instead of a table variable in order to avoid the push-down to the remote site, which is what you describe you want to do.
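A minimal sketch of that workaround in plain SQL, with illustrative object names (this pattern fits a procedure or ad-hoc SQL context rather than the body of a scripted calc view, which cannot issue DDL directly):

```sql
-- Materialize the remote result set locally first ...
CREATE LOCAL TEMPORARY TABLE #remote_orders (id INTEGER, amount DECIMAL(15,2));

INSERT INTO #remote_orders
    SELECT id, amount
    FROM v_remote_orders              -- virtual table (SDA)
    WHERE order_date >= '2017-01-01';

-- ... then join: this statement only touches local data,
-- so nothing is pushed to the remote site
SELECT r.id, r.amount, l.region
FROM #remote_orders AS r
INNER JOIN local_customers AS l ON r.id = l.id;

DROP TABLE #remote_orders;
```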

former_member466180
Discoverer

Hi Lars, thanks for your speedy reply. I realize that I'm mixing different concepts here, but I'm not sure I fully understand what you're saying; probably a bit of ignorance on my side. Still, I'd like to understand what is happening and whether I can somehow influence HANA's behavior.

To set the scene: we have data in a remote Oracle database and data in HANA. I've created a scripted calculation view that needs data both from the remote DB and from HANA, and I'm using table variables inside the scripted view. My assumption (or better yet, my guess) was that HANA would materialize the results for those table variables on the HANA platform so I could then join them with the data that resides on HANA. That assumption was wrong, as your earlier response showed.

For this specific use case, I'm only querying a small result set from a much bigger data set on the remote database, and likewise a small result set from a much bigger data set on HANA. I can't use a temporary table because, as far as I'm aware, I can't create a (temporary) table from within a calculation view. The performance of the view is not great; it takes about 15 seconds to run. As a test I pulled the full data set from the remote database (once) into HANA, and with the exact same view, now pointed at the local data set, the query time went down to less than a second.

I can totally imagine that having data on HANA vs. querying data from a remote data source has a big impact on performance. However, again as a test, we then created two Java versions of the same logic, using the same queries as the scripted calc view: one that used only HANA as the data source (both for the virtual and the non-virtual data sets), and one that used a combination of HANA data and a direct connection to the remote database without going through HANA. The Java version that only used HANA data (a combination of virtual and non-virtual) took about 8 seconds to run, while the version that used HANA data plus a direct connection to the remote Oracle database took less than 1 second. What I'm trying to understand is why the HANA version that uses a combination of HANA data and remote data is so much slower than the Java version that uses the same combination of data sets, the same logic and the same queries.
It's all academic, but I'd still like to understand why we're seeing what we're seeing. Next to that, we've had a lot of issues where running the HANA calc view caused the creation of temporary tables on the remote Oracle side. All of those issues have been resolved by SAP patches, but I'm trying to understand why temporary tables are even created on the remote database, since creating tables can be a time-intensive operation. I understand that this is a very specific example and perhaps this is not the right forum, but is there anywhere I can read up on how HANA optimizes scripted calc views that include table variables, and on how/when/why temporary tables are created on remote databases when you run a HANA view that uses data from that remote database?

lbreddemann
Active Contributor

Ok, this user didn't accept the public profile, which means this conversation is now rather one-sided.

From the description, it's not possible to pinpoint what causes the performance problem in this case. One would have to look into this in detail.

As for the questions at the end:

  • table variables and SQLScript are independent of the handling of SQL statements against virtual tables
  • the decision to create and use temporary tables in remote DBs depends on the available data statistics, HANA parameters and the HANA version. I don't know a specific single resource that covers this in every detail, but there are a couple of SAP notes on these topics.

former_member466180
Discoverer

I wasn't aware of my public profile settings; I've accepted it just now. I'll go look for the SAP Notes.