Recently I was working with matrices. In general I prefer writing my own custom matrix methods rather than using pre-written code libraries, because I like having complete understanding of, and control over, my code (which usually happens only when I write the code myself). For years I’ve known that in principle, matrix multiplication can be performed in parallel. But before .NET 4.0, coding a parallelized version of matrix multiplication was tricky. The release of the Task Parallel Library, in the System.Threading.Tasks namespace in .NET 4.0, makes easy parallelization a reality.

Suppose you implement a matrix in C# as an array of arrays (a jagged array). The parallelization I’ll demonstrate shortly also works if you use true multidimensional arrays. For example:

static double[][] MatrixCreate(int rows, int cols)
{
  // do error checking here
  double[][] result = new double[rows][];
  for (int i = 0; i < rows; ++i)
    result[i] = new double[cols];
  return result;
}

A traditional approach to matrix multiplication is:

static double[][] MatrixProduct(double[][] matrixA, double[][] matrixB)
{
  int aRows = matrixA.Length;
  int aCols = matrixA[0].Length;
  int bRows = matrixB.Length;
  int bCols = matrixB[0].Length;
  if (aCols != bRows)
    throw new Exception("Non-conformable matrices in MatrixProduct");
  double[][] result = MatrixCreate(aRows, bCols);
  for (int i = 0; i < aRows; ++i) // each row of A
    for (int j = 0; j < bCols; ++j) // each col of B
      for (int k = 0; k < aCols; ++k) // could use k < bRows
        result[i][j] += matrixA[i][k] * matrixB[k][j];
  return result;
}

Calling the code could resemble:

double[][] ma = MatrixCreate(3, 4);
double[][] mb = MatrixCreate(4, 2);
// assign values to ma and mb
double[][] prod = MatrixProduct(ma, mb);

Matrix multiplication is O(n^3) and therefore can be time-consuming. For example, if matrix A has size 300 x 400 and matrix B has size 400 x 200, computing the product requires 300 x 400 x 200 = 24,000,000 type double multiplications (and as many additions). A simple way to parallelize matrix multiplication is:

static double[][] MatrixProduct(double[][] matrixA, double[][] matrixB)
{
  int aRows = matrixA.Length;
  int aCols = matrixA[0].Length;
  int bRows = matrixB.Length;
  int bCols = matrixB[0].Length;
  if (aCols != bRows)
    throw new Exception("Non-conformable matrices in MatrixProduct");
  double[][] result = MatrixCreate(aRows, bCols);
  Parallel.For(0, aRows, i => // each row of A, in parallel
  {
    for (int j = 0; j < bCols; ++j) // each col of B
      for (int k = 0; k < aCols; ++k) // could use k < bRows
        result[i][j] += matrixA[i][k] * matrixB[k][j];
  });
  return result;
}
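As mentioned earlier, the same approach works with true multidimensional arrays. Here is a minimal sketch of that variant; the method name MatrixProduct2D is my own, not from the code above:

```csharp
using System;
using System.Threading.Tasks;

class MatrixDemo2D
{
  // Hypothetical variant using a true 2-D array instead of a jagged array
  static double[,] MatrixProduct2D(double[,] a, double[,] b)
  {
    int aRows = a.GetLength(0);
    int aCols = a.GetLength(1);
    int bCols = b.GetLength(1);
    if (aCols != b.GetLength(0))
      throw new Exception("Non-conformable matrices");
    double[,] result = new double[aRows, bCols];
    Parallel.For(0, aRows, i => // each row of A, in parallel
    {
      for (int j = 0; j < bCols; ++j)
        for (int k = 0; k < aCols; ++k)
          result[i, j] += a[i, k] * b[k, j];
    });
    return result;
  }

  static void Main()
  {
    double[,] a = { { 1, 2 }, { 3, 4 } };
    double[,] b = { { 5, 6 }, { 7, 8 } };
    double[,] p = MatrixProduct2D(a, b);
    Console.WriteLine(p[0, 0]); // 1*5 + 2*7 = 19
    Console.WriteLine(p[1, 1]); // 3*6 + 4*8 = 50
  }
}
```

Each iteration of the Parallel.For writes only to row i of the result, so there is no contention between threads.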

The Parallel.For statement can be loosely interpreted as, “Divide up all statements that use i, where i ranges from 0 up to (but not including) aRows, and perform them in parallel, potentially on different processor cores.”
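To make that interpretation concrete, here is a minimal standalone sketch, unrelated to matrices, where each iteration fills exactly one slot of an array:

```csharp
using System;
using System.Threading.Tasks;

class ParallelForDemo
{
  static void Main()
  {
    int n = 5;
    int[] squares = new int[n];
    // Each value of i is handled independently, potentially on a different thread.
    // The writes are safe because iteration i touches only squares[i].
    Parallel.For(0, n, i =>
    {
      squares[i] = i * i;
    });
    // Parallel.For blocks until all iterations complete
    Console.WriteLine(string.Join(" ", squares)); // 0 1 4 9 16
  }
}
```

The iterations may execute in any order, but because Parallel.For does not return until every iteration has finished, the array is fully populated afterward.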

All of the ugly plumbing details are abstracted away, and as a developer all I need to know is some slightly new syntax. I’ve left out a ton of important details but the main idea is simple and powerful.
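One sanity check I find useful is verifying that the parallel version produces exactly the same result as the sequential version. Here is a sketch, with the parallel method renamed MatrixProductParallel (my name, not from the code above) so both can coexist in one class; because each result cell is accumulated in the same k-order in both versions, the floating-point results should match exactly:

```csharp
using System;
using System.Threading.Tasks;

class MatrixCheck
{
  static double[][] MatrixCreate(int rows, int cols)
  {
    double[][] result = new double[rows][];
    for (int i = 0; i < rows; ++i)
      result[i] = new double[cols];
    return result;
  }

  static double[][] MatrixProduct(double[][] a, double[][] b)
  {
    int aRows = a.Length; int aCols = a[0].Length; int bCols = b[0].Length;
    double[][] result = MatrixCreate(aRows, bCols);
    for (int i = 0; i < aRows; ++i)
      for (int j = 0; j < bCols; ++j)
        for (int k = 0; k < aCols; ++k)
          result[i][j] += a[i][k] * b[k][j];
    return result;
  }

  // Parallel version, renamed so both methods can live in one class
  static double[][] MatrixProductParallel(double[][] a, double[][] b)
  {
    int aRows = a.Length; int aCols = a[0].Length; int bCols = b[0].Length;
    double[][] result = MatrixCreate(aRows, bCols);
    Parallel.For(0, aRows, i =>
    {
      for (int j = 0; j < bCols; ++j)
        for (int k = 0; k < aCols; ++k)
          result[i][j] += a[i][k] * b[k][j];
    });
    return result;
  }

  static void Main()
  {
    var rnd = new Random(0); // fixed seed for repeatability
    double[][] ma = MatrixCreate(30, 40);
    double[][] mb = MatrixCreate(40, 20);
    for (int i = 0; i < 30; ++i)
      for (int j = 0; j < 40; ++j)
        ma[i][j] = rnd.NextDouble();
    for (int i = 0; i < 40; ++i)
      for (int j = 0; j < 20; ++j)
        mb[i][j] = rnd.NextDouble();

    double[][] p1 = MatrixProduct(ma, mb);
    double[][] p2 = MatrixProductParallel(ma, mb);

    bool same = true;
    for (int i = 0; i < 30; ++i)
      for (int j = 0; j < 20; ++j)
        if (p1[i][j] != p2[i][j]) same = false;
    Console.WriteLine(same ? "results match" : "results differ");
  }
}
```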