Two Excellent .NET Developer Conferences

There are two excellent conferences coming up for software developers who use Microsoft .NET technology. By sheer coincidence (I asked the organizers), both conferences will be held during the same week, March 26 through March 30, in Las Vegas, Nevada. The two conferences are DevConnections (http://www.devconnections.com/home.aspx) and Visual Studio Live (http://vslive.com/Home.aspx). I’ve spoken at and attended both of these conferences in the past and can recommend both. The best way for you to get a feel for what they’re like, and how they differ from each other, is to go to the conferences’ Web sites and examine the schedules and descriptions of talks. I will be presenting short out-of-band talks at both events. My talks will be about artificial intelligence techniques: things that are probably not immediately useful to most developers, but topics that many developers find quite interesting. By the way, my recommendation of these two conferences is not based on the fact that I’ll be speaking at them; I wouldn’t be speaking at the events if I didn’t believe they delivered a good product to attendees. My two talks will be on “Simulated Bee Colony Algorithms” and “Particle Swarm Optimization”. Check out the conference sites and, if you make it out to Las Vegas, be sure to track me down and say hello.

Posted in Software Test Automation

Demystifying the C# Yield-Return Mechanism

In the current February 2012 issue of Visual Studio Magazine, I wrote an article titled “Demystifying the C# Yield-Return Mechanism”. My article starts:

I remember when the C# language yield-return statement was released as part of C# 2.0 along with Visual Studio 2005. Early documentation for the yield-return mechanism made the statement sound exotic and mysterious. Recent discussions with some of my developer colleagues suggested that several years later now the yield-return statement isn’t used very often. We suspect this lack of use is not so much because developers don’t know how to use the yield-return mechanism, but rather mostly because developers aren’t quite sure exactly when to use yield-return. In this article I’ll describe three common programming scenarios where you might want to consider using yield-return:

1. Generating a sequence of strings where each string in the sequence depends on the value of the previous string.
2. Processing a text file sequentially where different kinds of lines are treated differently.
3. Sequentially filtering or modifying a large List collection of objects.

You can read the rest of the article online at http://visualstudiomagazine.com/articles/2012/02/01/demystifying-the-c-yield-return-mechanism.aspx
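
As a rough illustration of the first scenario (a sequence where each string depends on the previous one), here is a minimal sketch. It is not the code from the article. The look-and-say sequence and the names YieldDemo, LookAndSay, and Describe are assumptions chosen only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class YieldDemo
{
    // Hypothetical example: lazily yields `count` terms of the look-and-say
    // sequence, where each string is built from the previous one.
    public static IEnumerable<string> LookAndSay(string seed, int count)
    {
        string current = seed;
        for (int i = 0; i < count; ++i)
        {
            yield return current;        // hand one value to the caller and pause here
            current = Describe(current); // resumes only when the caller asks again
        }
    }

    // Reads a string aloud as runs of equal characters: "1211" -> "111221".
    private static string Describe(string s)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < s.Length)
        {
            int j = i;
            while (j < s.Length && s[j] == s[i]) ++j; // find the end of the run
            sb.Append(j - i).Append(s[i]);            // run length, then the character
            i = j;
        }
        return sb.ToString();
    }

    public static void Main()
    {
        foreach (string term in LookAndSay("1", 5))
            Console.WriteLine(term); // 1, 11, 21, 1211, 111221 (one per line)
    }
}
```

Because the method uses yield-return, no list of results is built up front; each term is computed only when the foreach loop asks for it.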

Ant Colony Optimization

In the February 2012 issue of MSDN Magazine I wrote an article that presents and explains Ant Colony Optimization (ACO). See http://msdn.microsoft.com/en-us/magazine/hh781027.aspx. ACO is a meta-heuristic (a set of guidelines that can be used to create a specific algorithm) that is intended to solve network graph problems such as the Traveling Salesman Problem (TSP), where the goal is to find the shortest path that visits each city exactly once. ACO models the pheromone-laying behavior of ants. Each ant represents a possible solution. As each ant visits a node in the graph, it lays an amount of pheromone on the node. Shorter paths are visited more often and so more pheromones are deposited on nodes that are parts of shorter paths. When each ant picks a semi-random path through the graph, at each node the choice of next node to take is influenced by the pheromones on the candidate nodes: nodes with more pheromones are more likely to be selected. Implementing ACO is surprisingly tricky. ACO is a kind of combinatorial optimization technique, that is, a technique to find the best combination from a large set of possible combinations. In the TSP, each possible path is a combination of cities. I’ve seen ACO stretched and applied to all kinds of problems, but in my opinion ACO is best suited for problems that very closely resemble TSP, while other combinatorial optimization techniques such as Simulated Annealing and Simulated Bee Colony algorithms are better suited for more general types of problems.
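
The "more pheromones means more likely to be selected" step can be sketched as a roulette-wheel pick. This is a simplified illustration, not the code from the MSDN article: the names PheromoneChoice and PickNext are hypothetical, and the sketch weights candidates by pheromone alone, whereas many ACO formulations also factor in the distance to each candidate.

```csharp
using System;

public static class PheromoneChoice
{
    // Picks an index with probability proportional to its pheromone value.
    // Assumes every value is >= 0 and at least one value is > 0.
    public static int PickNext(double[] pheromones, Random rnd)
    {
        double total = 0.0;
        foreach (double p in pheromones) total += p;

        double r = rnd.NextDouble() * total; // a point on the "roulette wheel"
        double cumulative = 0.0;
        for (int i = 0; i < pheromones.Length; ++i)
        {
            cumulative += pheromones[i];
            if (r < cumulative) return i;    // r landed in candidate i's slice
        }
        return pheromones.Length - 1;        // guard against rounding at the top end
    }

    public static void Main()
    {
        double[] pheromones = { 1.0, 3.0, 6.0 }; // made-up amounts on three candidate nodes
        var rnd = new Random(0);
        int[] counts = new int[pheromones.Length];
        for (int trial = 0; trial < 10000; ++trial)
            ++counts[PickNext(pheromones, rnd)];
        // The counts should come out roughly in a 1:3:6 ratio.
        Console.WriteLine(string.Join(" ", counts));
    }
}
```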

Custom Latitude Longitude Index Interval using Binary Search

This week I was working on an interesting problem and found an interesting bug. The overall problem was to create a custom indexing scheme for latitude and longitude data by writing a function that accepts a lat-lon and returns a sector ID. As part of that problem I wanted to write a helper function that accepts a latitude (which ranges from -90.0 to +90.0) and returns which 0.1 degree interval the latitude falls in. Let me illustrate using a smaller fake example where “latitude” ranges from -1.0 to +1.0. For this example I want the following:

[-1.0 to -0.9) -> 0
[-0.9 to -0.8) -> 1
[-0.8 to -0.7) -> 2
. . .
[-0.1 to 0.0) -> 9
[ 0.0 to 0.1) -> 10
[ 0.1 to 0.2) -> 11
. . .
[ 0.9 to 1.0] -> 19

Notice that all but the last interval are half-open intervals (closed on the left, open on the right), and the last is closed to capture the final 1.0 value. If an input is 0.245 then the output should be 12. I could do a sequential search but I wanted to use a binary search because I was going to call the function millions of times. My first attempt was this:

static int LatIndexOf(double latitude) // small experiment
{
  // input is a fake latitude in [-1.0, +1.0]
  // divide into 0.1 degree intervals: [-1.0, -0.9) = 0, etc.

  if (latitude == -1.0) return 0;
  if (latitude == 1.0) return 19;

  int lo = 0;  // index of first interval
  int hi = 19; // index of last interval

  while (lo <= hi) // stop when the search range is empty
  {
    int mid = (lo + hi) / 2;
    double left = -1.0 + (0.1 * mid); // left edge of interval
    double right = left + 0.1; // right edge of interval
    if (latitude >= left && latitude < right)
      return mid;
    else if (latitude < left)
      hi = mid - 1;
    else
      lo = mid + 1;
  }
  throw new Exception("LatIndexOf no value");
}

Because I was going to process hundreds of millions of real lat-lon data points, I took the time to run pretty thorough tests on my function. As first written, the function ran into very nasty floating point errors that gave incorrect results for certain inputs. The hacky fix was to round all floating point values:

static int LatIndexOf(double latitude)
{
  latitude = Math.Round(latitude, 8); // round to 8 places
  . . .

  double left = -1.0 + (0.1 * mid);
  left = Math.Round(left, 8);
  . . .
  double right = left + 0.1;
  right = Math.Round(right, 8);
  . . .

Moral: watch out for equality comparisons on floating point values.
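
One way to sidestep boundary comparisons on doubles altogether is to convert the input to a whole number of small units once, and then use only integer arithmetic. The sketch below is an alternative illustration for the fake [-1.0, +1.0] example, not the code from this post. The names LatIndexSketch and LatIndexOfFixed and the 8-decimal-place resolution are assumptions, the latter chosen to mirror the rounding above.

```csharp
using System;

public static class LatIndexSketch
{
    // Alternative sketch: map a fake latitude in [-1.0, +1.0] to its
    // 0.1 degree interval index (0..19) using fixed-point integers.
    public static int LatIndexOfFixed(double latitude)
    {
        if (latitude < -1.0 || latitude > 1.0)
            throw new ArgumentOutOfRangeException("latitude");

        const long UnitsPerDegree = 100000000L;            // 8 decimal places
        const long UnitsPerInterval = UnitsPerDegree / 10; // one 0.1 degree interval

        long units = (long)Math.Round(latitude * UnitsPerDegree); // the only rounding step
        long offset = units + UnitsPerDegree;              // shift [-1, +1] to [0, 2]
        int index = (int)(offset / UnitsPerInterval);      // integer division, no doubles
        return Math.Min(index, 19);                        // +1.0 belongs to the closed last interval
    }

    public static void Main()
    {
        Console.WriteLine(LatIndexOfFixed(-1.0));  // 0
        Console.WriteLine(LatIndexOfFixed(-0.9));  // 1
        Console.WriteLine(LatIndexOfFixed(0.245)); // 12
        Console.WriteLine(LatIndexOfFixed(1.0));   // 19
    }
}
```

As with the rounding fix, an input within about 0.000000005 below an interval boundary is treated as sitting on that boundary, so this only makes sense when 8 decimal places is more precision than the data really has.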

Calculating Category Utility

Category Utility (CU) is a clever measure of how good a clustering of categorical data is. Here’s an example of how to calculate category utility. Suppose you have three attributes: color, size, and tax. Color can be red, blue, green, or yellow. Size can be small, medium, or large. Tax can be false or true. Let’s say you have five tuples and cluster them into two parts, k = 0 and k = 1, like so:

-----------
Red    Small   True
Red    Large   False
-----------
Blue   Medium  True
Green  Medium  True
Green  Medium  False
-----------

Step 1 – Calculate the probability of each cluster.

P(k = 0) = 2/5 = 0.40
P(k = 1) = 3/5 = 0.60

Step 2 – Calculate the unconditional expectation = sum of squared probabilities of all attribute values across all clusters.

Red (2/5)^2 = 0.16
Blue (1/5)^2 = 0.04
Green (2/5)^2 = 0.16
Yellow (0/5)^2 = 0.00

Small (1/5)^2 = 0.04
Medium (3/5)^2 = 0.36
Large (1/5)^2 = 0.04

False (2/5)^2 = 0.16
True (3/5)^2 = 0.36
----
1.32

Step 3 – Calculate conditional expectations for each cluster.

A. k = 0

Red (2/2)^2 = 1.00
Blue (0/2)^2 = 0.00
Green (0/2)^2 = 0.00
Yellow (0/2)^2 = 0.00

Small (1/2)^2 = 0.25
Medium (0/2)^2 = 0.00
Large (1/2)^2 = 0.25

False (1/2)^2 = 0.25
True (1/2)^2 = 0.25
----
2.00

B. k = 1

Red (0/3)^2 = 0.00
Blue (1/3)^2 = 0.11
Green (2/3)^2 = 0.44
Yellow (0/3)^2 = 0.00

Small (0/3)^2 = 0.00
Medium (3/3)^2 = 1.00
Large (0/3)^2 = 0.00

False (1/3)^2 = 0.11
True (2/3)^2 = 0.44
----
2.11

Step 4 – Put it all together.

CU = [ (0.40 * (2.00 - 1.32)) + (0.60 * (2.11 - 1.32)) ] / 2
= 0.3733
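
The four steps above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names CategoryUtilityDemo, SumSquaredProbs, and CategoryUtility are hypothetical, data is held as an array of string tuples, and cluster ids are assumed to run from 0 to numClusters - 1.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CategoryUtilityDemo
{
    // Sum of squared value frequencies over every attribute column, using only
    // rows assigned to `cluster` (or all rows when cluster is -1).
    static double SumSquaredProbs(string[][] data, int[] clustering, int cluster)
    {
        int numAttributes = data[0].Length;
        int rowCount = 0;
        var counts = new Dictionary<string, int>[numAttributes];
        for (int j = 0; j < numAttributes; ++j)
            counts[j] = new Dictionary<string, int>();

        for (int i = 0; i < data.Length; ++i)
        {
            if (cluster != -1 && clustering[i] != cluster) continue;
            ++rowCount;
            for (int j = 0; j < numAttributes; ++j)
            {
                int c;
                counts[j].TryGetValue(data[i][j], out c); // c stays 0 for a new value
                counts[j][data[i][j]] = c + 1;
            }
        }

        double sum = 0.0;
        if (rowCount == 0) return sum; // an empty cluster contributes nothing
        foreach (var column in counts)
            foreach (int n in column.Values)
                sum += Math.Pow((double)n / rowCount, 2);
        return sum;
    }

    public static double CategoryUtility(string[][] data, int[] clustering, int numClusters)
    {
        double unconditional = SumSquaredProbs(data, clustering, -1);  // Step 2
        double total = 0.0;
        for (int k = 0; k < numClusters; ++k)
        {
            int size = 0;
            foreach (int id in clustering) if (id == k) ++size;
            double pCluster = (double)size / data.Length;              // Step 1
            double conditional = SumSquaredProbs(data, clustering, k); // Step 3
            total += pCluster * (conditional - unconditional);
        }
        return total / numClusters;                                    // Step 4
    }

    public static void Main()
    {
        string[][] data = {
            new[] { "Red", "Small", "True" },
            new[] { "Red", "Large", "False" },
            new[] { "Blue", "Medium", "True" },
            new[] { "Green", "Medium", "True" },
            new[] { "Green", "Medium", "False" }
        };
        int[] clustering = { 0, 0, 1, 1, 1 };
        double cu = CategoryUtility(data, clustering, 2);
        Console.WriteLine(cu.ToString("F4", CultureInfo.InvariantCulture)); // 0.3733
    }
}
```

Values that never occur, such as Yellow, simply add zero to each sum, so they do not need to be listed.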

The equation for the last step is too tricky for me to type out. If you search the Internet for Category Utility, you’ll find the equation in a Wikipedia entry. Coding up a routine to compute category utility is surprisingly tricky. See image below for a demo example.
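
For reference, the general equation behind Step 4 can be written in LaTeX notation roughly as shown here. This is a transcription of the commonly cited definition, so treat the symbol names as assumptions: K is the number of clusters, C_k is cluster k, and v_ij is the j-th possible value of attribute A_i.

```latex
CU = \frac{1}{K} \sum_{k=0}^{K-1} P(C_k)
     \left[ \sum_{i} \sum_{j} P(A_i = v_{ij} \mid C_k)^2
          - \sum_{i} \sum_{j} P(A_i = v_{ij})^2 \right]

% With the numbers from the example (K = 2):
CU = \frac{1}{2} \left[ 0.40\,(2.00 - 1.32) + 0.60\,(2.11 - 1.32) \right] \approx 0.3733
```

The first inner sum is the conditional expectation from Step 3, and the second is the unconditional expectation from Step 2.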

Creating a Super Simple COM Object

Every now and then I teach a beginning API test automation class. As part of that class I like to walk students through the process of creating a super simple COM object to test. The idea is to show students some of the magic behind the scenes, which I believe will help them better understand the test automation. Here are the steps for creating a super simple COM object named MathCOMLib, housed in file MathCOMLib.dll. The object contains an interface named IMathClass, which is exposed as a class MathClass, which contains a single method Sum(int x, int y) that returns the sum of integers x and y. It’s surprisingly tricky in the sense that one bad mouse click means disaster.

Launch an instance of Visual Studio, making sure to use “Run as administrator”. I’m using VS 2010 but most recent versions of VS work the same to create a COM component. Click File | New | Project. Select the C++ | ATL project template, name it MathCOMLib (the name of the project will become the name of the resulting DLL), specify any convenient Location, clear the “Create directory for solution” and “Add to source control” check boxes, and click OK.

In the Solution Explorer window, right click on the bolded MathCOMLib project (be careful to get the project, not the solution entry), and select Add | Class from the context menu. Select the ATL Simple Object item and click Add.

In the ATL Simple Object Wizard, enter MathClass into the Short Name field. In VS 2008 and earlier, all the remaining fields (including ProgID) will be filled in automatically. But, immensely annoyingly, with VS 2010 you need to manually type in the ProgID field: MathCOMLib.MathClass (the name of the Project + “.” + the name of the Class). The ProgID field is used if you want to allow your COM object to be callable from a scripting language like JavaScript. Click Next. Leave all the File Type Handler Options blank, and click Next. Leave the Options default selections alone, and click Finish.

Now in the Solution Explorer window, select the Class View tab. You should see an interface entry named IMathClass. Right click on that interface (not the class as you might expect) and select Add | Add Method. In the Add Method Wizard, enter Sum for the Method Name. Select the “in” Parameter Attribute, select LONG for the Parameter Type, enter x for the Parameter Name, and then click the Add button. Repeat for y (“in”, “LONG”, “y”, Add). Now select the LONG* Parameter Type, which enables the retval Parameter Attribute, check retval, enter pRes (which stands for pointer to result) as the Parameter Name, and click Add. Click Next.

Leave the IDL Attribute alone, and click Finish. Now go back to the main Solution Explorer window. You should see a MathClass.cpp entry; double click on it. You will be transferred to auto-generated code that looks like:

STDMETHODIMP CMathClass::Sum(LONG x, LONG y, LONG* pRes)
{
  // TODO: Add your implementation code here
  return S_OK;
}

Replace the comment line with:

*pRes = x + y;

Notice that return values are passed using a pointer because the regular return is reserved for an HRESULT. Now, on a 32-bit machine you can just build the COM object DLL by clicking Build | Build Solution. But on a 64-bit machine you need to set the Build Configuration to 64-bit by Build | Configuration Manager -> Active Solution Platform -> x64. The Build process also registers the COM object on your current machine so you don’t have to do it by using the regsvr32.exe utility. At this point you can use Windows Explorer and verify that you just created a file named MathCOMLib.dll at wherever you told VS to save it, in the .\MathCOMLib\Debug subdirectory (along with a bazillion other files).

Now you can call your Sum method to test it in several ways. The easiest is from a C# console application. After creating the application you would first Add Reference to the MathCOMLib.dll file. Then calling code would look like this:

MathLib.MathClass mc = new MathLib.MathClass();
int x = 2;
int y = 3;
int sum = mc.Sum(x, y);
Console.WriteLine(sum);

From JavaScript you could call the method like this:

var mc = new ActiveXObject("MathCOMLib.MathClass");
var x = 2;
var y = 3;
var sum = mc.Sum(x, y);
WScript.Echo(sum);

And of course you could call the Sum method by using C++ if you’re a real glutton for punishment.

Simulated Annealing

In the January 2012 issue of MSDN Magazine I have an article about simulated annealing. Simulated annealing is an artificial intelligence technique that is modeled on the behavior of cooling metals. Simulated annealing can be used to estimate the optimal solution to difficult or impossible combinatorial optimization problems such as the Traveling Salesman problem, where the goal is to find the order in which to visit a set of cities so that the total distance traveled is minimized. Related techniques are Simulated Bee Colony Algorithms and Ant Colony Optimization. However, Genetic Algorithms, Particle Swarm Optimization, and Bacterial Foraging Optimization are more distantly related to Simulated Annealing because they are best suited to solve numeric (rather than combinatorial) optimization problems. You can check the article out at: http://msdn.microsoft.com/en-us/magazine/hh708758.aspx.
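
To give a feel for the cooling idea, here is a bare-bones sketch of simulated annealing applied to a tiny Traveling Salesman instance. It is not the code from the MSDN article. The names AnnealingSketch, Solve, and TourLength, the swap-two-cities neighbor move, the geometric cooling schedule, and the made-up city data are all assumptions chosen for brevity.

```csharp
using System;

public static class AnnealingSketch
{
    // Total length of the closed tour that visits cities in the given order.
    public static double TourLength(int[] order, double[][] dist)
    {
        double total = 0.0;
        for (int i = 0; i < order.Length; ++i)
            total += dist[order[i]][order[(i + 1) % order.Length]];
        return total;
    }

    public static int[] Solve(double[][] dist, Random rnd,
        double startTemp, double coolingRate, int maxIterations)
    {
        int n = dist.Length;
        int[] current = new int[n];
        for (int i = 0; i < n; ++i) current[i] = i; // initial tour 0, 1, 2, ...
        double currentLen = TourLength(current, dist);
        int[] best = (int[])current.Clone();
        double bestLen = currentLen;
        double temp = startTemp;

        for (int iter = 0; iter < maxIterations; ++iter)
        {
            // Neighbor solution: swap two randomly chosen cities.
            int[] candidate = (int[])current.Clone();
            int a = rnd.Next(n), b = rnd.Next(n);
            int tmp = candidate[a]; candidate[a] = candidate[b]; candidate[b] = tmp;
            double candidateLen = TourLength(candidate, dist);

            // Always accept a shorter tour; accept a longer one with a
            // probability that shrinks as the temperature falls.
            double delta = candidateLen - currentLen;
            if (delta < 0 || rnd.NextDouble() < Math.Exp(-delta / temp))
            {
                current = candidate;
                currentLen = candidateLen;
            }
            if (currentLen < bestLen) // remember the best tour seen so far
            {
                best = (int[])current.Clone();
                bestLen = currentLen;
            }
            temp *= coolingRate; // cool down
        }
        return best;
    }

    public static void Main()
    {
        // Six made-up cities on a line, listed out of order.
        double[] x = { 0, 3, 1, 5, 2, 4 };
        int n = x.Length;
        double[][] dist = new double[n][];
        for (int i = 0; i < n; ++i)
        {
            dist[i] = new double[n];
            for (int j = 0; j < n; ++j)
                dist[i][j] = Math.Abs(x[i] - x[j]);
        }
        int[] best = Solve(dist, new Random(0), 10.0, 0.99, 5000);
        Console.WriteLine(string.Join(" ", best) + "  length = " + TourLength(best, dist));
    }
}
```

Early on, when the temperature is high, worse tours are accepted fairly often, which helps the search escape poor local solutions. As the temperature drops, the search settles into accepting only improvements.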