Speed up SAP HANA graph query

former_member756662 · ‎07-14-2021

Hi,

I'm working with a graph database in sap hana and i've notices that performance isn't (consistently) great.

I've defined my edges, vertices and graph as:

CREATE COLUMN TABLE "VERTICES" (
	"ID" BIGINT PRIMARY KEY,
	"DIRECT_ID" BIGINT,
	"TYPE" NVARCHAR(100)
);

CREATE TABLE "EDGES" (
		"ID" BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
		"DIRECT_ID" BIGINT,
		"TYPE" NVARCHAR(255),
		
		"SOURCE" BIGINT REFERENCES "VERTICES"("ID") ON DELETE CASCADE NOT NULL,
		"TARGET" BIGINT REFERENCES "VERTICES"("ID") ON DELETE CASCADE NOT NULL
);

CREATE GRAPH WORKSPACE "GRAPH"
	EDGE TABLE "EDGES"
		SOURCE COLUMN "SOURCE"
		TARGET COLUMN "TARGET"
		KEY COLUMN "ID"
	VERTEX TABLE "VERTICES" 
		KEY COLUMN "ID"

It consists or ~30M vertices and 57M edges. The edges are almost always double between two nodes: one edge (n1)-(n2) and another edge (n2)-(n1).

The problems I want to tackle are:

The shortest (weighted) path between two vertices in the graph (one-to-one)
The shortest path between a known vertex and a vertex of a specific type

I've managed to get both questions answered, buy performance is not what I expect. Sometimes the one-to-one is lightning fast, and sometimes it take 1,30 minutes to do the same job.

How I managed to get the known one-to-one shortest path. This take 1,5 when call the procedure the first time. Calling it again with the same or some different node(s) is lightning fast.

CREATE TYPE "TT_VERTICES_SPOTO" AS TABLE ("ID" BIGINT, "TYPE" NVARCHAR(100), VERTEX_ORDER BIGINT);
CREATE OR REPLACE PROCEDURE "SHORTEST_ONE_TO_ONE"(
	IN i_startVertex BIGINT, 		-- the ID of the start vertex
	IN i_endVertex BIGINT, 			-- the ID of the end vertex
	IN i_dir NVARCHAR(10), 			-- the the direction of the edge traversal: OUTGOING (default), INCOMING, ANY
	OUT o_pathLength BIGINT,		-- the hop distance between start and end
	OUT o_pathWeight DOUBLE,	-- the path weight/cost based on the WEIGHT attribute
	OUT o_vertices "TT_VERTICES_SPOTO"
	)
LANGUAGE GRAPH READS SQL DATA AS
BEGIN
	-- Create an instance of the graph, refering to the graph workspace object
	GRAPH g = Graph("GRAPH");


	-- Create an instance of the start/end vertex
	VERTEX v_start = Vertex(:g, :i_startVertex);
	VERTEX v_end = Vertex(:g, :i_endVertex);
	-- Running shortest path using the WEIGHT column as cost
	WeightedPath<DOUBLE> p = Shortest_Path(:g, :v_start, :v_end, (Edge e) => DOUBLE{ return :e."SYSTEEMLENGTE"; });
	
	-- Project the results from the path
	o_pathLength = LENGTH(:p);
	o_pathWeight = WEIGHT(:p);
	o_vertices = SELECT :v."ID", :v."TYPE", :VERTEX_ORDER FOREACH v IN Vertices(:p) WITH ORDINALITY AS VERTEX_ORDER;
END;

How I've managed to get the known vertex to vertex of a certain type. This always takes about ~1,5 to 2,5 minutes

CREATE TYPE "TT_VERTICES_SPTT" AS TABLE ("ID" BIGINT, "TYPE" VNARCHAR(100), "DISTANCE" DOUBLE);
CREATE OR REPLACE PROCEDURE "SHORTEST_PATH_TO_ALL"(
	IN i_startVertex BIGINT, 		-- the ID of the start vertex
	OUT o_vertices "TT_VERTICES_SPTT"
	)
LANGUAGE GRAPH READS SQL DATA AS
BEGIN
	-- Create an instance of the graph, refering to the graph workspace object
	GRAPH g = Graph("GRAPH");

	-- Create an instance of the start/end vertex
	VERTEX v_start = Vertex(:g, :i_startVertex);

	GRAPH henl = SHORTEST_PATHS_ONE_TO_ALL(:g, :v_start, "DISTANCE", (Edge e) => DOUBLE{ return :e."SYSTEEMLENGTE";});


	o_vertices = SELECT :v."ID", :v."TYPE", :v."DISTANCE" FOREACH v IN Vertices(:henl);
END;

-- Wrapper around SHORTEST_PATH_TO_ALL
CREATE OR REPLACE FUNCTION "SHORTEST_PATH_TO_TARGET_TYPE"(
	IN "SOURCE" BIGINT, IN "TARGET_TYPE" NVARCHAR(100)
	)
	RETURNS TABLE ("ID" BIGINT, "TYPE" NVARCHAR(100), "DISTANCE" DOUBLE)
LANGUAGE SQLSCRIPT READS SQL DATA AS


BEGIN
	CALL SHORTEST_PATH_TO_ALL(:SOURCE, o_table);
	RETURN
		WITH DESTINATIONS AS (
				SELECT 	ID,
						TYPE,
						DISTANCE
				FROM :o_table
				WHERE NAME = :TARGET_TYPE
			),
			
			CLOSEST_VERTICES_OF_INTEREST AS (
				SELECT MIN(DISTANCE) AS MIN_DIST
				FROM DESTINATIONS
			)
		SELECT 	ID,
				TYPE,
				DISTANCE
		FROM DESTINATIONS
		JOIN CLOSEST_VERTICES_OF_INTEREST
			ON DESTINATIONS.DISTANCE = CLOSEST_VERTICES_OF_INTEREST.MIN_DIST;
END;

How do I get this to perform better? Ultimately I want to do a lot of calls from a list of nodes to nodes of a certain type and get the shortest paths back.

mfath · ‎07-14-2021

Hej Duco,

when calling a graph procedure for the first time, an internal index is built. For subsequent calls, the index is used and hence query performance is much better. There are certain situations where the index is re-built. Changing a procedure is one of these situation. So if you work with HANA Cloud, I'd suggest to use "anonymous blocks" (DO() BEGIN...END; see the attached code) while in development/design phase. Once the logic is ok, turn the block into a "real" procedure.

For your second procedure (nearest vertex of a certain type), I'd suggest to do the filtering in the Graph Procedure. I don't see a reason why you should pass the complete SPOA result back to SQL. In the end, I guess your SQL statement in the RETURN clause is not optimal. In case you have "ties", i.e. neighbors with the same distance - do you want to return both?

Hope the attached code helps. It works on HANA Cloud QRC2nn-weigthed-and-filtered.txt.

Regards, Markus

mfath · ‎07-14-2021

Hej Duco,

which HANA version are you on? The code I gave you runs on HANA Cloud QRC2. If you are on an older release we need to see how we can adapt. I'll try to get you into a chat... let's see never used before.

Regards, Markus

Speed up SAP HANA graph query

Accepted Solutions (0)

Answers (2)

Answers (2)

AFIP Integration in CPI/CI

Re: OS compatibility

OS compatibility

Re: How to show value help with values for Applica...

Re: Allocation of multiple hierarchies from parent...