In my last blog entry I described an interesting problem I looked at where I wanted to determine the shortest path between two nodes (aka vertexes) of a graph whose representation is stored in SQL tables. Last time I defined two tables, tblNodes and tblEdges. In this entry I’ll describe the SQL stored procedure I came up with. Basically I took the Wikipedia shortest path algorithm and implemented that algorithm in SQL:

create procedure usp_shortestPath

@startNode int,

@endNode int

as

begin

set nocount on

create table #tblAllNodes

(

nodeid int not null,

distance int null,

previous int null,

done bit null

)

insert into #tblAllNodes(nodeid)

select distinct nodeid from tblNodes

update #tblAllNodes

set distance = 2147483647, previous = -1, done = 0

update #tblAllNodes

set distance = 0

where nodeid = @startNode

declare @dist int, @smallestDistance int, @nodeWithSmallestDistance int

declare @finished bit = 0

while @finished = 0

begin

select @nodeWithSmallestDistance = -1

select @smallestDistance = MIN(distance) from #tblAllNodes

where done = 0

select @nodeWithSmallestDistance = nodeid, @dist = distance

from #tblAllNodes

where distance = @smallestDistance and done = 0

if @nodeWithSmallestDistance = -1 break

if @nodeWithSmallestDistance = @endNode break

update #tblAllNodes

set done = 1 where nodeid = @nodeWithSmallestDistance

update #tblAllNodes

set distance = @dist + tblEdges.edgeWeight,

previous = @nodeWithSmallestDistancet

from #tblAllNodes

inner join tblEdges on #tblAllNodes.nodeid = tblEdges.toNode

where tblEdges.fromNode = @nodeWithSmallestDistance

and (@dist + tblEdges.edgeWeight) < #tblAllNodes.distance

end

declare @result int

select @result = distance

from #tblAllNodes

where nodeid = @endNode

@startNode int,

@endNode int

as

begin

set nocount on

create table #tblAllNodes

(

nodeid int not null,

distance int null,

previous int null,

done bit null

)

insert into #tblAllNodes(nodeid)

select distinct nodeid from tblNodes

update #tblAllNodes

set distance = 2147483647, previous = -1, done = 0

update #tblAllNodes

set distance = 0

where nodeid = @startNode

declare @dist int, @smallestDistance int, @nodeWithSmallestDistance int

declare @finished bit = 0

while @finished = 0

begin

select @nodeWithSmallestDistance = -1

select @smallestDistance = MIN(distance) from #tblAllNodes

where done = 0

select @nodeWithSmallestDistance = nodeid, @dist = distance

from #tblAllNodes

where distance = @smallestDistance and done = 0

if @nodeWithSmallestDistance = -1 break

if @nodeWithSmallestDistance = @endNode break

update #tblAllNodes

set done = 1 where nodeid = @nodeWithSmallestDistance

update #tblAllNodes

set distance = @dist + tblEdges.edgeWeight,

previous = @nodeWithSmallestDistancet

from #tblAllNodes

inner join tblEdges on #tblAllNodes.nodeid = tblEdges.toNode

where tblEdges.fromNode = @nodeWithSmallestDistance

and (@dist + tblEdges.edgeWeight) < #tblAllNodes.distance

end

declare @result int

select @result = distance

from #tblAllNodes

where nodeid = @endNode

if @result = 2147483647

return -1

else

return @result

end

go

return -1

else

return @result

end

go

My stored procedure begins:

create procedure usp_shortestPath

@startNode int,

@endNode int

as

begin

@startNode int,

@endNode int

as

begin

The sp accepts a start node and an end node. See the previous blog entry for a description of these. Next:

set nocount on

create table #tblAllNodes

(

nodeid int not null,

distance int null,

previous int null,

done bit null

)

create table #tblAllNodes

(

nodeid int not null,

distance int null,

previous int null,

done bit null

)

I set nocount on because I will be performing lots of inserts into a temp table and I want to suppress the "n rows affected" output. Temp table #tblAllNodes is a scratch table where most of the computations are performed. Column nodeid is any to-node in the graph. Distance is the cumulative shortest distance from startNode to nodeid. Previous is the node id of the node which comes before nodeid in the shortest path. Done is like a Boolean to flag nodes which have been processed in the main processing loop. I allow null values on distance, previous and done so that I can set the value of nodeid first and then later add values for those three columns. Next:

insert into #tblAllNodes(nodeid)

select distinct nodeid from tblNodes

update #tblAllNodes

set distance = 2147483647, previous = -1, done = 0

update #tblAllNodes

set distance = 0

where nodeid = @startNode

select distinct nodeid from tblNodes

update #tblAllNodes

set distance = 2147483647, previous = -1, done = 0

update #tblAllNodes

set distance = 0

where nodeid = @startNode

First I populate the temp table with node ids, then I set initial values for distance, previous, and done. The 2147483647 is SQL max int and represents "infinity" in the Wikipedia algorithm. Then I initialize the start node, setting its distance to 0 because the distance from start node to itself is 0. Next:

declare @dist int, @smallestDistance int,

@nodeWithSmallestDistance int

@nodeWithSmallestDistance int

declare @finished bit = 0

These are scratch variables where I will compute before placing their final values into the temp table. The @nodeWithSmallestDistance variable is called "u" in the Wikipedia algorithm. Next:

while @finished = 0

begin

select @nodeWithSmallestDistance = -1

select @smallestDistance = MIN(distance) from #tblAllNodes

where done = 0

begin

select @nodeWithSmallestDistance = -1

select @smallestDistance = MIN(distance) from #tblAllNodes

where done = 0

select @nodeWithSmallestDistance = nodeid, @dist = distance

from #tblAllNodes

where distance = @smallestDistance and done = 0

from #tblAllNodes

where distance = @smallestDistance and done = 0

I enter a main processing loop. The first statement assigns -1 to "u". Then I find the smallest distance from the current node to all nodes which have not ywt been processed, and then use that distance to lookup the associated node id. Next:

if @nodeWithSmallestDistance = -1 break

if @nodeWithSmallestDistance = @endNode break

update #tblAllNodes

set done = 1 where nodeid = @nodeWithSmallestDistance

if @nodeWithSmallestDistance = @endNode break

update #tblAllNodes

set done = 1 where nodeid = @nodeWithSmallestDistance

I check for two stopping conditions, first if I did not find a valid node or scond if I am at my end node. Otherwise I flag the "u" node as having been processed, which is "remove ‘u’ from queue" in the Wikipedia algorithm. Next:

update #tblAllNodes

set distance = @dist + tblEdges.edgeWeight,

previous = @nodeWithSmallestDistancet

from #tblAllNodes

inner join tblEdges on #tblAllNodes.nodeid = tblEdges.toNode

where tblEdges.fromNode = @nodeWithSmallestDistance

and (@dist + tblEdges.edgeWeight) < #tblAllNodes.distance

end –- end while

set distance = @dist + tblEdges.edgeWeight,

previous = @nodeWithSmallestDistancet

from #tblAllNodes

inner join tblEdges on #tblAllNodes.nodeid = tblEdges.toNode

where tblEdges.fromNode = @nodeWithSmallestDistance

and (@dist + tblEdges.edgeWeight) < #tblAllNodes.distance

end –- end while

This is the tricky part and took me a while to figure out. I perform a join on the underlyingb tblEdges table and the temp table. The new distance is the old distance plus the just looked-up distance. The last "and" clause is to make sure that an update occurs only when a shorter path has been found. The stored proc finishes with:

declare @result int

select @result = distance

from #tblAllNodes

where nodeid = @endNode

select @result = distance

from #tblAllNodes

where nodeid = @endNode

if @result = 2147483647

return -1

else

return @result

end

return -1

else

return @result

end

The -1 return means no path was found from startNode to endNode. The sp can be called like so:

declare @result int

exec @result = usp_shortestPath 0, 2

print ‘Shortest path from node 0 to node 2 is’

print @result

exec @result = usp_shortestPath 0, 2

print ‘Shortest path from node 0 to node 2 is’

print @result

And the output would be "Shortest path from node 0 to node 2 is 36."

Advertisements
(function(g,$){if("undefined"!=typeof g.__ATA){
g.__ATA.initAd({collapseEmpty:'after', sectionId:26942, width:300, height:250});
g.__ATA.initAd({collapseEmpty:'after', sectionId:114160, width:300, height:250});
}})(window,jQuery);
var o = document.getElementById('crt-806194359');
if ("undefined"!=typeof Criteo) {
var p = o.parentNode;
p.style.setProperty('display', 'inline-block', 'important');
o.style.setProperty('display', 'block', 'important');
Criteo.DisplayAcceptableAdIfAdblocked({zoneid:388248,containerid:"crt-806194359",collapseContainerIfNotAdblocked:true,"callifnotadblocked": function () {var o = document.getElementById('crt-806194359'); o.style.setProperty('display','none','important');o.style.setProperty('visbility','hidden','important'); }
});
} else {
o.style.setProperty('display', 'none', 'important');
o.style.setProperty('visibility', 'hidden', 'important');
}