## SQL Graph Shortest Path, Part II

In my last blog entry I described an interesting problem I looked at where I wanted to determine the shortest path between two nodes (aka vertexes) of a graph whose representation is stored in SQL tables. Last time I defined two tables, tblNodes and tblEdges. In this entry I’ll describe the SQL stored procedure I came up with. Basically I took the Wikipedia shortest path algorithm and implemented that algorithm in SQL:

create procedure usp_shortestPath
@startNode int,
@endNode int
as
begin
set nocount on
create table #tblAllNodes
(
nodeid int not null,
distance int null,
previous int null,
done bit null
)

insert into #tblAllNodes(nodeid)
select distinct nodeid from tblNodes

update #tblAllNodes
set distance = 2147483647, previous = -1, done = 0

update #tblAllNodes
set distance = 0
where nodeid = @startNode

declare @dist int, @smallestDistance int, @nodeWithSmallestDistance int
declare @finished bit = 0

while @finished = 0
begin
select @nodeWithSmallestDistance = -1

select @smallestDistance = MIN(distance) from #tblAllNodes
where done = 0
select @nodeWithSmallestDistance = nodeid, @dist = distance
from #tblAllNodes
where distance = @smallestDistance and done = 0

if @nodeWithSmallestDistance = -1 break
if @nodeWithSmallestDistance = @endNode break

update #tblAllNodes
set done = 1 where nodeid = @nodeWithSmallestDistance

update #tblAllNodes
set distance = @dist + tblEdges.edgeWeight,
previous = @nodeWithSmallestDistancet
from #tblAllNodes
inner join tblEdges on #tblAllNodes.nodeid = tblEdges.toNode
where tblEdges.fromNode = @nodeWithSmallestDistance
and (@dist + tblEdges.edgeWeight) < #tblAllNodes.distance
end

declare @result int
select @result = distance
from #tblAllNodes
where nodeid = @endNode
if @result = 2147483647
return -1
else
return @result
end
go

My stored procedure begins:

create procedure usp_shortestPath
@startNode int,
@endNode int
as
begin

The sp accepts a start node and an end node. See the previous blog entry for a description of these. Next:

set nocount on
create table #tblAllNodes
(
nodeid int not null,
distance int null,
previous int null,
done bit null
)

I set nocount on because I will be performing lots of inserts into a temp table and I want to suppress the "n rows affected" output. Temp table #tblAllNodes is a scratch table where most of the computations are performed. Column nodeid is any to-node in the graph. Distance is the cumulative shortest distance from startNode to nodeid. Previous is the node id of the node which comes before nodeid in the shortest path. Done is like a Boolean to flag nodes which have been processed in the main processing loop. I allow null values on distance, previous and done so that I can set the value of nodeid first and then later add values for those three columns. Next:

insert into #tblAllNodes(nodeid)
select distinct nodeid from tblNodes

update #tblAllNodes
set distance = 2147483647, previous = -1, done = 0

update #tblAllNodes
set distance = 0
where nodeid = @startNode

First I populate the temp table with node ids, then I set initial values for distance, previous, and done. The 2147483647 is SQL max int and represents "infinity" in the Wikipedia algorithm. Then I initialize the start node, setting its distance to 0 because the distance from start node to itself is 0. Next:

declare @dist int, @smallestDistance int,
@nodeWithSmallestDistance int
declare @finished bit = 0

These are scratch variables where I will compute before placing their final values into the temp table. The @nodeWithSmallestDistance variable is called "u" in the Wikipedia algorithm. Next:

while @finished = 0
begin
select @nodeWithSmallestDistance = -1

select @smallestDistance = MIN(distance) from #tblAllNodes
where done = 0
select @nodeWithSmallestDistance = nodeid, @dist = distance
from #tblAllNodes
where distance = @smallestDistance and done = 0
I enter a main processing loop. The first statement assigns -1 to "u". Then I find the smallest distance from the current node to all nodes which have not ywt been processed, and then use that distance to lookup the associated node id. Next:

if @nodeWithSmallestDistance = -1 break
if @nodeWithSmallestDistance = @endNode break

update #tblAllNodes
set done = 1 where nodeid = @nodeWithSmallestDistance

I check for two stopping conditions, first if I did not find a valid node or scond if I am at my end node. Otherwise I flag the "u" node as having been processed, which is "remove ‘u’ from queue" in the Wikipedia algorithm. Next:
update #tblAllNodes
set distance = @dist + tblEdges.edgeWeight,
previous = @nodeWithSmallestDistancet
from #tblAllNodes
inner join tblEdges on #tblAllNodes.nodeid = tblEdges.toNode
where tblEdges.fromNode = @nodeWithSmallestDistance
and (@dist + tblEdges.edgeWeight) < #tblAllNodes.distance
end –- end while

This is the tricky part and took me a while to figure out. I perform a join on the underlyingb tblEdges table and the temp table. The new distance is the old distance plus the just looked-up distance. The last "and" clause is to make sure that an update occurs only when a shorter path has been found. The stored proc finishes with:

declare @result int
select @result = distance
from #tblAllNodes
where nodeid = @endNode
if @result = 2147483647
return -1
else
return @result
end
The -1 return means no path was found from startNode to endNode. The sp can be called like so:

declare @result int
exec @result = usp_shortestPath 0, 2
print ‘Shortest path from node 0 to node 2 is’
print @result

And the output would be "Shortest path from node 0 to node 2 is 36."