SQL Graph Shortest Path, Part II

In my last blog entry I described an interesting problem I looked at where I wanted to determine the shortest path between two nodes (aka vertexes) of a graph whose representation is stored in SQL tables. Last time I defined two tables, tblNodes and tblEdges. In this entry I’ll describe the SQL stored procedure I came up with. Basically I took the Wikipedia shortest path algorithm and implemented that algorithm in SQL:
 
create procedure usp_shortestPath
  @startNode int,
  @endNode int
as
begin
  set nocount on   
  create table #tblAllNodes
  (
  nodeid int not null,
  distance int null,
  previous int null,
  done bit null
  )
   
  insert into #tblAllNodes(nodeid)
  select distinct nodeid from tblNodes
 
  update #tblAllNodes
  set distance = 2147483647, previous = -1, done = 0
 
  update #tblAllNodes
  set distance = 0
  where nodeid = @startNode
 
  declare @dist int, @smallestDistance int, @nodeWithSmallestDistance int
  declare @finished bit = 0
 
  while @finished = 0
  begin
    select @nodeWithSmallestDistance = -1
   
    select @smallestDistance = MIN(distance) from #tblAllNodes
    where done = 0
    select @nodeWithSmallestDistance = nodeid, @dist = distance
    from #tblAllNodes
    where distance = @smallestDistance and done = 0
       
    if @nodeWithSmallestDistance = -1 break
    if @nodeWithSmallestDistance = @endNode break
   
    update #tblAllNodes
    set done = 1 where nodeid = @nodeWithSmallestDistance
       
    update #tblAllNodes
    set distance = @dist + tblEdges.edgeWeight,
    previous = @nodeWithSmallestDistancet
    from #tblAllNodes
    inner join tblEdges on #tblAllNodes.nodeid = tblEdges.toNode
    where tblEdges.fromNode = @nodeWithSmallestDistance
    and (@dist + tblEdges.edgeWeight) < #tblAllNodes.distance
  end
   
  declare @result int
  select @result = distance
  from #tblAllNodes
  where nodeid = @endNode
  if @result = 2147483647
    return -1
  else
    return @result 
end
go
 
 
My stored procedure begins:
 
  create procedure usp_shortestPath
    @startNode int,
    @endNode int
  as
  begin
 
The sp accepts a start node and an end node. See the previous blog entry for a description of these. Next:
 
  set nocount on   
  create table #tblAllNodes
  (
  nodeid int not null,
  distance int null,
  previous int null,
  done bit null
  )
 
I set nocount on because I will be performing lots of inserts into a temp table and I want to suppress the "n rows affected" output. Temp table #tblAllNodes is a scratch table where most of the computations are performed. Column nodeid is any to-node in the graph. Distance is the cumulative shortest distance from startNode to nodeid. Previous is the node id of the node which comes before nodeid in the shortest path. Done is like a Boolean to flag nodes which have been processed in the main processing loop. I allow null values on distance, previous and done so that I can set the value of nodeid first and then later add values for those three columns. Next:
 
  insert into #tblAllNodes(nodeid)
  select distinct nodeid from tblNodes
 
  update #tblAllNodes
  set distance = 2147483647, previous = -1, done = 0
 
  update #tblAllNodes
  set distance = 0
  where nodeid = @startNode
 
First I populate the temp table with node ids, then I set initial values for distance, previous, and done. The 2147483647 is SQL max int and represents "infinity" in the Wikipedia algorithm. Then I initialize the start node, setting its distance to 0 because the distance from start node to itself is 0. Next:
 
  declare @dist int, @smallestDistance int,
  @nodeWithSmallestDistance int
  declare @finished bit = 0
 
These are scratch variables where I will compute before placing their final values into the temp table. The @nodeWithSmallestDistance variable is called "u" in the Wikipedia algorithm. Next:
 
  while @finished = 0
  begin
    select @nodeWithSmallestDistance = -1
   
    select @smallestDistance = MIN(distance) from #tblAllNodes
    where done = 0
    select @nodeWithSmallestDistance = nodeid, @dist = distance
    from #tblAllNodes
    where distance = @smallestDistance and done = 0
I enter a main processing loop. The first statement assigns -1 to "u". Then I find the smallest distance from the current node to all nodes which have not ywt been processed, and then use that distance to lookup the associated node id. Next:
 
    if @nodeWithSmallestDistance = -1 break
    if @nodeWithSmallestDistance = @endNode break
   
    update #tblAllNodes
    set done = 1 where nodeid = @nodeWithSmallestDistance
 
I check for two stopping conditions, first if I did not find a valid node or scond if I am at my end node. Otherwise I flag the "u" node as having been processed, which is "remove ‘u’ from queue" in the Wikipedia algorithm. Next:
    update #tblAllNodes
    set distance = @dist + tblEdges.edgeWeight,
    previous = @nodeWithSmallestDistancet
    from #tblAllNodes
    inner join tblEdges on #tblAllNodes.nodeid = tblEdges.toNode
    where tblEdges.fromNode = @nodeWithSmallestDistance
    and (@dist + tblEdges.edgeWeight) < #tblAllNodes.distance
  end –- end while
 
This is the tricky part and took me a while to figure out. I perform a join on the underlyingb tblEdges table and the temp table. The new distance is the old distance plus the just looked-up distance. The last "and" clause is to make sure that an update occurs only when a shorter path has been found. The stored proc finishes with:
 
  declare @result int
  select @result = distance
  from #tblAllNodes
  where nodeid = @endNode
  if @result = 2147483647
    return -1
  else
    return @result 
end
The -1 return means no path was found from startNode to endNode. The sp can be called like so:
 
  declare @result int
  exec @result = usp_shortestPath 0, 2
  print ‘Shortest path from node 0 to node 2 is’
  print @result
 
And the output would be "Shortest path from node 0 to node 2 is 36."
 
About these ads
This entry was posted in Software Test Automation. Bookmark the permalink.