Fast Shortest Path Analysis for Large Graphs using a CLR Stored Procedure

I’ve been working on some really fascinating code lately. The main goal is to create a desktop-type application that calculates the shortest path between two nodes in a very large graph. The nodes represent people in a social network and the edges between nodes represent a communication of some sort. By large graph, I mean a graph that has millions of nodes/people and so the graph won’t fit into machine memory; therefore, in practical terms, the graph is stored in a SQL database.

My first experiment was to write a SQL stored procedure using the T-SQL language (I’m working in a 100% Microsoft technology environment). Because there is a ton of processing involved (I’m using the Dijkstra Shortest Path algorithm) and T-SQL is not really designed to do such processing, the performance was not so good — for my baseline test case with 5,000,000 nodes, computing the shortest path took 25 minutes.

My second experiment was to write a C# method. The C# method did all of the algorithm processing and called a helper function named GetAdjacents that used ADO.NET to retrieve all the nodes in the graph connected to a given node (the key data/graph access part of Dijksta). Performance was quite good; the baseline case took about 90 seconds. But now I was hooked into this problem and wanted to do better.

My next experiment tried several code trick tweaks to the second experiment. I discovered that the recommended approach of using the C# ‘using’ statement, like

using (SqlConnection sc = new SqlConnection(connString))

had a pretty significant bad impact on performance. Not sure why this is so but I ditched using the using statement and explicitly opened and closed connections. I was able to squeeze about 10% improvement in performance, so my baseline test case took about 80 seconds.

I was ready to move on when I recalled that the .NET environment allows you to write stored procedures using C#. These are called CLR Stored Procedures. Maybe you recall that it’s possible to write a SQL ‘extended’ stored procedure using C++ (which is a real pain in the patooty). I wondered if by pushing both the data access and the logic processing into a CLR Stored Procedure, and then calling the stored procedure from a C# helper method, I could improve shortest path performance. Well, the net results were excellent: my baseline test case run time was reduced to about 13.25 seconds.

To summarize, when using Microsoft technologies, a technique to do very fast shortest path analysis for a large graph stored in SQL is to write a CLR Stored Procedure using C# that does both the data access and logic processing and then call the resulting stored procedure using a C# helper method. The advantage is very fast performance. The disadvantage of the approach is that the technical details are very tricky. If I find time, I’ll write up the details of this technique.

This entry was posted in Software Test Automation. Bookmark the permalink.

One Response to Fast Shortest Path Analysis for Large Graphs using a CLR Stored Procedure

  1. Pingback: Five Blogs – 27 May 2012 « 5blogs

Comments are closed.