Skip to Content
avatar image
Former Member

How to avoid cursor and make more efficient code

Hello HANA experts,

We have a problem with a preprocessing procedure. It prepares some data to be processed with R. A simpler version of what we need to do can be summarized as:

We have two tables: A and B. We need to do an aggregation from table B for each row in table A, using some table A columns to limit the range of table B rows to be aggregated.

As a sample, table A has information of some buildings (construction start, construction end and where it is located), and table B has weather information for the last years for many locations. What we want to do is to aggregate how much it rained during the construction of each building, and add that information to table A.

Currently we've developed it using a SQLSCRIPT procedure with a cursor, but it takes too long to preprocess the hole table due to the fact that cursors are not parallelized. Thus resulting in a sequential execution of as many aggregation queries as the number of rows in table A.

Do you have any advise regarding how to make more efficient our preprocessing stage? There is any way to run the aggregation queries in parallel, avoiding the use of the cursor?

Thanks in advance!

Juan

Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

3 Answers

  • Best Answer
    avatar image
    Former Member
    Mar 08, 2016 at 06:05 PM

    I think this can be done using a query like the below. The inner queries are to locate the station closest to the building. Outer query joins the building and weather tables using the found closest station ID.

    SELECT bloc.id_building, SUM(w1.prec) prec

      FROM (

        /* Select the station with the least distance */

      SELECT e.*

       FROM (

        /* Rank the distance */

      SELECT d.*, RANK() OVER (PARTITION BY d.id_building ORDER BY d.dist ASC) rnk

       FROM

      (

      /* Distance between the buildings & stations */

      SELECT b.*, w.id_estation, b.location.ST_DISTANCE(w.obs_location) dist

        FROM buildings b

       CROSS JOIN (SELECT DISTINCT id_estation, obs_location FROM weather) w

      ) d

      ) e

      WHERE e.rnk = 1

      ) bloc

    LEFT OUTER JOIN weather w1

       ON bloc.id_estation = w1.id_estation

      AND w1.obs_date BETWEEN bloc.cons_start and bloc.cons_end

    GROUP BY bloc.id_building

    ORDER BY bloc.id_building

      ;

    Add comment
    10|10000 characters needed characters exceeded

  • Mar 08, 2016 at 01:35 AM

    Hi Juan,

    I trust it might be possible do what you're doing with plain SQL or WITH but we need to understand what exactly you want to do.

    Provide the table create statements, data samples and desired results for the data samples.

    BRs,
    Lucas de Oliveira

    Add comment
    10|10000 characters needed characters exceeded

    • Former Member

      Hi Lucas

      A sample can be:

      DROP TABLE BUILDINGS;

      CREATE COLUMN TABLE BUILDINGS(

          ID_BUILDING INTEGER,

          CONS_START DATE,

          CONS_END DATE,

          LOCATION ST_GEOMETRY(4326)

      );

      DROP TABLE WEATHER;

      CREATE COLUMN TABLE WEATHER(

          ID_ESTATION INTEGER,

          OBS_LOCATION ST_GEOMETRY(4326),

          OBS_DATE DATE,

          PREC REAL

      );

      IMPORT FROM CSV FILE '/home/datadm/util/scripts_hana/weather.txt'

      INTO WEATHER

      WITH

        record delimited by '\n'

        field delimited by ','

        skip first 1 row;


      IMPORT FROM CSV FILE '/home/datadm/util/scripts_hana/buildings.txt'

      INTO BUILDINGS

      WITH

        record delimited by '\n'

        field delimited by ','

        skip first 1 row;



      We want to have an ouput as follows:

         

      CREATE COLUMN TABLE OUTPUT_TABLE (

          ID_BUILDING INTEGER,

          PREC_ACUM REAL

      );


      INSERT INTO OUTPUT_TABLE VALUES(1,43.38);

      INSERT INTO OUTPUT_TABLE VALUES(2,75.92);

      INSERT INTO OUTPUT_TABLE VALUES(3,66.64);


      That belongs to the following queries (note that id_estation is calculated as the closest one to the location of the building, and the dates used in the between come from cons_start and cons_end):

      select 1 as id_building, sum(prec) from weather where id_estation=99021 and obs_date between TO_DATE('01-01-2015','DD-MM-YYYY') and TO_DATE('12-02-2015','DD-MM-YYYY');

      select 2 as id_building, sum(prec) from weather where id_estation=99012 and obs_date between TO_DATE('12-01-2015','DD-MM-YYYY') and TO_DATE('13-03-2015','DD-MM-YYYY');

      select 3 as id_building, sum(prec) from weather where id_estation=99008 and obs_date between TO_DATE('01-02-2015','DD-MM-YYYY') and TO_DATE('29-03-2015','DD-MM-YYYY');


      I've ommited the part in which I've calculated the closest id_estation for each building, you can do it too if you want, using the ones specified in the queries.



      Thanks for your help



      Regards


      Juan




  • Mar 08, 2016 at 04:46 PM

    i think this document may help you

    Experiences with SAP HANA Geo-Spatial Features – Part 1

    after finding the closest distance you can aggregate PREC value. But what i didn't understand is, what is the threshold value for you defining the closest distance ? If it is 1km distance to building A, and 500meter distance to building B, can we accept it rained to both buildings?

    Add comment
    10|10000 characters needed characters exceeded

    • Former Member

      Hi Bilen,

      Thanks for your reply. I read that document couple of weeks ago when started using GIS functionalities in HANA.  My problem is not with the GIS part (that was already solved).

      My problem is how to do the query for each row in table A without having to loop over table A (without using a cursor).

      Regarding your threshold question, I'm just interested in the closest one. I mean, if one building is 500 meter away from station 1 and 800 meter away from station 2, I'm only  interested in querying station 1. But as I said, that was already solved, my problem comes with the cursor thing ; )

      Thanks!

      BR

      Juan