I wrote an article titled “Graph-Based Shortest-Path Analysis with SQL” in the December 2012 issue of MSDN Magazine. See http://msdn.microsoft.com/en-us/magazine/jj863138.aspx. The shortest path problem is to determine the shortest path between two nodes in a graph. In the early days of computer science the problem was widely studied. The most well-known algorithm for solving the shortest path problem is called Dijkstra’s algorithm. Most implementations and examples of the shortest path algorithm assume that the graph being analyzed is relatively small (roughly 1,000 nodes or fewer) and so the graph can be stored entirely in machine main memory as a matrix or adjacency list.
Applying a shortest path algorithm to a very large graph, such as one representing a social network with hundreds of thousands of nodes, is quite a challenging problem. In general, I use one of three approaches. The first approach, which I describe in the MSDN article, is to store the graph as a SQL table, and then write a shortest-path stored procedure in the SQL language. An alternative approach is to store the graph in SQL, then write a procedural language shortest path function that pulls graph information into memory as needed. A third approach is to store the graph in SQL, then write a stored procedure using a procedural language (C# is my preferred language) to find the shortest path. The pure-SQL approach is by far the simplest but is usually the slowest. The other two approaches increase in complexity but improve in performance.