SQL Graph Shortest Path, Part I

An interesting problem I looked at lately is the problem of determining the shortest path between two nodes (aka vertexes) of a graph whose representation is stored in SQL tables. This turns out to be a moderately difficult problem. Suppose I have a conceptual graph as shown in the image below. There are 6 nodes. Node 0 has a directed connection to node 1, with weight 10. The connection from node 1 to node 0 has weight 18, and so on. Note that there is a connection from node 1 to node 4 (weight 12) but that there is no connection from node 4 to node 1. The first issue is to decide on a way to represent the graph as one or more SQL tables. This in itself is not a trivial problem, but at any rate I used the following SQL statements:
use master
if exists(select name from sys.sysdatabases where name=’dbDemoGraph’)
drop database dbDemoGraph
create database dbDemoGraph
use dbDemoGraph
create table tblNodes
nodeid int primary key,
userid bigint not null unique
create index idxTblNodesByColUserid on tblNodes(userid)
insert into tblNodes values(0,000000000) — node id, userid
insert into tblNodes values(1,111111111)
insert into tblNodes values(2,222222222)
insert into tblNodes values(3,333333333)
insert into tblNodes values(4,444444444)
insert into tblNodes values(5,555555555)
insert into tblNodes values(6,666666666)
select * from tblNodes order by nodeid
create table tblEdges
fromNode int not null, — a nodeid
toNode int not null, — a nodeid
edgeWeight int not null — some arbitrary measure
insert into tblEdges values(0,1,10) — from node 0 to node 1, weight 10
insert into tblEdges values(1,0,18) — from node 1 to node 0, weight 18
insert into tblEdges values(1,2,30) — from 1 to 2, weight 11
insert into tblEdges values(2,1,24) — etc.
insert into tblEdges values(1,3,20)
insert into tblEdges values(3,1,20)
insert into tblEdges values(1,4,12)
— cannot go from 4 to 1
— cannot go from 2 to 4
insert into tblEdges values(4,2,14)
insert into tblEdges values(2,5,16)
insert into tblEdges values(5,2,26)
I use two tables. The first table, tblNodes, has a 0-based node ID like the ones shown in the image. There is also a "User ID" value, which is a way to map the general graph to some specific data. In this situation I am imagining that each node represents a user of some sort. The second table, tblEdges, stores the relationships between nodes, including edge weights which could represent something like the number of times a user sends a text message to another user.
Next time I’ll describe a SQL stored procedure which accepts a start node ID, and an end node ID, and which determines the shortest path (based on edge weights) between the two.
This entry was posted in Software Test Automation. Bookmark the permalink.