on 07-14-2021 8:01 AM
Hi,
I'm working with a graph database in sap hana and i've notices that performance isn't (consistently) great.
I've defined my edges, vertices and graph as:
CREATE COLUMN TABLE "VERTICES" (
"ID" BIGINT PRIMARY KEY,
"DIRECT_ID" BIGINT,
"TYPE" NVARCHAR(100)
);
CREATE TABLE "EDGES" (
"ID" BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
"DIRECT_ID" BIGINT,
"TYPE" NVARCHAR(255),
"SOURCE" BIGINT REFERENCES "VERTICES"("ID") ON DELETE CASCADE NOT NULL,
"TARGET" BIGINT REFERENCES "VERTICES"("ID") ON DELETE CASCADE NOT NULL
);
CREATE GRAPH WORKSPACE "GRAPH"
EDGE TABLE "EDGES"
SOURCE COLUMN "SOURCE"
TARGET COLUMN "TARGET"
KEY COLUMN "ID"
VERTEX TABLE "VERTICES"
KEY COLUMN "ID"
It consists or ~30M vertices and 57M edges. The edges are almost always double between two nodes: one edge (n1)-(n2) and another edge (n2)-(n1).
The problems I want to tackle are:
I've managed to get both questions answered, buy performance is not what I expect. Sometimes the one-to-one is lightning fast, and sometimes it take 1,30 minutes to do the same job.
How I managed to get the known one-to-one shortest path. This take 1,5 when call the procedure the first time. Calling it again with the same or some different node(s) is lightning fast.
CREATE TYPE "TT_VERTICES_SPOTO" AS TABLE ("ID" BIGINT, "TYPE" NVARCHAR(100), VERTEX_ORDER BIGINT);
CREATE OR REPLACE PROCEDURE "SHORTEST_ONE_TO_ONE"(
IN i_startVertex BIGINT, -- the ID of the start vertex
IN i_endVertex BIGINT, -- the ID of the end vertex
IN i_dir NVARCHAR(10), -- the the direction of the edge traversal: OUTGOING (default), INCOMING, ANY
OUT o_pathLength BIGINT, -- the hop distance between start and end
OUT o_pathWeight DOUBLE, -- the path weight/cost based on the WEIGHT attribute
OUT o_vertices "TT_VERTICES_SPOTO"
)
LANGUAGE GRAPH READS SQL DATA AS
BEGIN
-- Create an instance of the graph, refering to the graph workspace object
GRAPH g = Graph("GRAPH");
-- Create an instance of the start/end vertex
VERTEX v_start = Vertex(:g, :i_startVertex);
VERTEX v_end = Vertex(:g, :i_endVertex);
-- Running shortest path using the WEIGHT column as cost
WeightedPath<DOUBLE> p = Shortest_Path(:g, :v_start, :v_end, (Edge e) => DOUBLE{ return :e."SYSTEEMLENGTE"; });
-- Project the results from the path
o_pathLength = LENGTH(:p);
o_pathWeight = WEIGHT(:p);
o_vertices = SELECT :v."ID", :v."TYPE", :VERTEX_ORDER FOREACH v IN Vertices(:p) WITH ORDINALITY AS VERTEX_ORDER;
END;
How I've managed to get the known vertex to vertex of a certain type. This always takes about ~1,5 to 2,5 minutes
CREATE TYPE "TT_VERTICES_SPTT" AS TABLE ("ID" BIGINT, "TYPE" VNARCHAR(100), "DISTANCE" DOUBLE);
CREATE OR REPLACE PROCEDURE "SHORTEST_PATH_TO_ALL"(
IN i_startVertex BIGINT, -- the ID of the start vertex
OUT o_vertices "TT_VERTICES_SPTT"
)
LANGUAGE GRAPH READS SQL DATA AS
BEGIN
-- Create an instance of the graph, refering to the graph workspace object
GRAPH g = Graph("GRAPH");
-- Create an instance of the start/end vertex
VERTEX v_start = Vertex(:g, :i_startVertex);
GRAPH henl = SHORTEST_PATHS_ONE_TO_ALL(:g, :v_start, "DISTANCE", (Edge e) => DOUBLE{ return :e."SYSTEEMLENGTE";});
o_vertices = SELECT :v."ID", :v."TYPE", :v."DISTANCE" FOREACH v IN Vertices(:henl);
END;
-- Wrapper around SHORTEST_PATH_TO_ALL
CREATE OR REPLACE FUNCTION "SHORTEST_PATH_TO_TARGET_TYPE"(
IN "SOURCE" BIGINT, IN "TARGET_TYPE" NVARCHAR(100)
)
RETURNS TABLE ("ID" BIGINT, "TYPE" NVARCHAR(100), "DISTANCE" DOUBLE)
LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
CALL SHORTEST_PATH_TO_ALL(:SOURCE, o_table);
RETURN
WITH DESTINATIONS AS (
SELECT ID,
TYPE,
DISTANCE
FROM :o_table
WHERE NAME = :TARGET_TYPE
),
CLOSEST_VERTICES_OF_INTEREST AS (
SELECT MIN(DISTANCE) AS MIN_DIST
FROM DESTINATIONS
)
SELECT ID,
TYPE,
DISTANCE
FROM DESTINATIONS
JOIN CLOSEST_VERTICES_OF_INTEREST
ON DESTINATIONS.DISTANCE = CLOSEST_VERTICES_OF_INTEREST.MIN_DIST;
END;
How do I get this to perform better? Ultimately I want to do a lot of calls from a list of nodes to nodes of a certain type and get the shortest paths back.
Hej Duco,
when calling a graph procedure for the first time, an internal index is built. For subsequent calls, the index is used and hence query performance is much better. There are certain situations where the index is re-built. Changing a procedure is one of these situation. So if you work with HANA Cloud, I'd suggest to use "anonymous blocks" (DO() BEGIN...END; see the attached code) while in development/design phase. Once the logic is ok, turn the block into a "real" procedure.
For your second procedure (nearest vertex of a certain type), I'd suggest to do the filtering in the Graph Procedure. I don't see a reason why you should pass the complete SPOA result back to SQL. In the end, I guess your SQL statement in the RETURN clause is not optimal. In case you have "ties", i.e. neighbors with the same distance - do you want to return both?
Hope the attached code helps. It works on HANA Cloud QRC2nn-weigthed-and-filtered.txt.
Regards, Markus
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi markus.fath
Thanks so mutch for such an elaborate answer!
Your answer confirms my hunge that when changing the procedure the graph needs to be reinitialized.
As for the supplied example for the "nearest vertex of a certain type" I have two questions. Hopefully you could push me in the right direction.
I unfortunately cannot work with anonymous blocks with Graph language 😞 So I need to make procedures.
I've written it as:
CREATE OR REPLACE PROCEDURE "SHORTEST_PATH_TO_ALL"(
IN i_startVertex BIGINT, -- the ID of the start vertex
IN i_k BIGINT, -- number OF nearest neighbors TO be returned
IN i_dir NVARCHAR(10), -- direction OF edges TO be traversed
IN i_maxDist DOUBLE, -- maximum distance, TO LIMIT the SEARCH SCOPE OF the SPOA agorithm
IN i_type NVARCHAR(100), -- the FILTER criteria OF the nearest neighbor(s) TO be found
OUT o_res TABLE("ID" BIGINT, "TYPE" NVARCHAR(100), "$DISTANCE" DOUBLE)
)
LANGUAGE GRAPH READS SQL DATA AS
BEGIN
GRAPH g = Graph("AMKV", "NEO4OJEE");
VERTEX v_start = Vertex(:g, :i_startVertex);
GRAPH g_spoa = SHORTEST_PATHS_ONE_TO_ALL(:g, :v_start, "$DISTANCE",
(Edge e, DOUBLE distance, BIGINT hops) =>
DOUBLE{
IF(:distance <= :i_maxDist) { RETURN :e."SYSTEEMLENGTE"; }
ELSE { END TRAVERSE; }
}, :i_dir);
-- filter the SPOA vertice, and order by distance
SEQUENCE<Vertex> s_neighbors = SEQUENCE<Vertex>(v IN Vertices(:g_spoa) WHERE :v."TYPE" == :i_type) ORDER BY ("$DISTANCE" ASC);
FOREACH v IN :s_neighbors WITH ORDINALITY AS i {
IF(:i > :i_k){break;}
o_res."ID"[:i] = :v."ID";
o_res."TYPE"[:i] = :v."TYPE";
o_res."$DISTANCE"[:i] = :v."$DISTANCE";
}
END;
Could not execute 'CREATE OR REPLACE PROCEDURE "SHORTEST_PATH_TO_ALL"( IN i_startVertex BIGINT, -- the ID of the start ...'
SAP DBTech JDBC: [4901]: GraphEngine: compilation failed: exception 73002201: syntax error, unexpected "," near ",": line 19 col 5 (at pos 852)
This yield into a syntax error. I've had this before when implementing the SHORTEST_PATH procedure. When leaving out the :i_dir variable it usually works, but now gives the following compilation error:
Could not execute 'CREATE OR REPLACE PROCEDURE "SHORTEST_PATH_TO_ALL"( IN i_startVertex BIGINT, -- the ID of the start ...'
SAP DBTech JDBC: [4901]: GraphEngine: compilation failed: exception 73002201: Unexpected reserved keyword "ORDER": line 21 col 101 (at pos 1008)
This looks like the SQL-like language is not allowed in the graph-procedure.
Hej Duco,
which HANA version are you on? The code I gave you runs on HANA Cloud QRC2. If you are on an older release we need to see how we can adapt. I'll try to get you into a chat... let's see never used before.
Regards, Markus
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
90 | |
10 | |
10 | |
10 | |
7 | |
7 | |
6 | |
5 | |
4 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.