A C# Big Array

When working with large files and C#, sooner or later I always seem to need a big array. By default, the maximum size of a single array in .NET is about 2 GB, even on a 64-bit machine, so the maximum array length depends on the size of the objects being stored in the array. For an array of type long, since each long is 64 bits = 8 bytes, the maximum array length is roughly 2^31 / 8 = 268,435,456 longs (a bit fewer in practice because of object overhead). One way to make an array larger than 2 GB on a 64-bit machine is to create a custom big array class, which is basically a set of smaller arrays that are each less than 2 GB in size. Technically this isn’t difficult, but the design options are interesting. First, should the custom class hold a specific type (such as long), any type (by using type object), or any type (by using .NET generics)? The question of how general vs. specific to make code is always an issue. In general I prefer to make my utility classes as simple as possible, which means I make them very un-generic. So for a big array of type long I created a class BigArrayOfLong. It looks like this:

class BigArrayOfLong
{
    private long[][] master;
    public readonly int length;
    private const int numberCellsInEachSubArray = 100000000;
    . . .

The values are stored in a jagged array structure named master. I could have used a .NET collection such as ArrayList or List<long[]> instead. The length is the number of cells. Notice I chose to make length a public readonly field to avoid a separate get property. I hard-coded the number of cells in each subarray (except for possibly the last subarray) to 100,000,000. For type long, since a long is 64 bits = 8 bytes, this is 800 million bytes, or about 0.8 GB each. Here is another typical design decision: do you parameterize the number of cells in each subarray or not? In my case I chose simple over flexible. The constructor is:

public BigArrayOfLong(int length)
{
    this.length = length;
    int remainder = length % numberCellsInEachSubArray;

    int numberSubArrays;
    int numberCellsInLastSubArray;

    if (remainder == 0)
    {
        // evenly divisible means every subarray has the same length
        numberSubArrays = length / numberCellsInEachSubArray;
        this.master = new long[numberSubArrays][];
        for (int i = 0; i < this.master.Length; ++i)
            this.master[i] = new long[numberCellsInEachSubArray];
    }
    else // the last subarray has shorter length
    {
        numberSubArrays = (length / numberCellsInEachSubArray) + 1;
        numberCellsInLastSubArray = remainder;
        this.master = new long[numberSubArrays][];
        // all subarrays except the last are full size
        for (int i = 0; i < this.master.Length - 1; ++i)
            this.master[i] = new long[numberCellsInEachSubArray];
        this.master[this.master.Length - 1] = new long[numberCellsInLastSubArray];
    }
} // ctor

The constructor determines how many subarrays are required to create a big array of length ‘length’ and then allocates them into master, being careful to deal with the possibly shorter last subarray.
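The layout logic in the constructor can be checked in isolation with small numbers. The following is only a sketch: the SubArrayLayout class and its Compute method are hypothetical names that are not part of BigArrayOfLong, and the cells-per-subarray value is passed as a parameter purely so the arithmetic can be tried with tiny sizes.

```csharp
using System;

// Hypothetical helper for illustration; not part of BigArrayOfLong.
static class SubArrayLayout
{
    // Returns the length of each subarray needed to hold `length` cells when
    // every subarray except possibly the last holds `cellsPerSubArray` cells.
    public static int[] Compute(int length, int cellsPerSubArray)
    {
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length));
        if (cellsPerSubArray <= 0)
            throw new ArgumentOutOfRangeException(nameof(cellsPerSubArray));

        int fullSubArrays = length / cellsPerSubArray;
        int remainder = length % cellsPerSubArray;
        int count = (remainder == 0) ? fullSubArrays : fullSubArrays + 1;

        int[] lengths = new int[count];
        for (int i = 0; i < count; ++i)
            lengths[i] = cellsPerSubArray;
        if (remainder != 0)
            lengths[count - 1] = remainder; // the last subarray is shorter
        return lengths;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Compute(10, 4))); // 4, 4, 2
        Console.WriteLine(string.Join(", ", Compute(8, 4)));  // 4, 4
    }
}
```

With length 10 and 4 cells per subarray, the remainder is 2, so there are two full subarrays plus a last one of length 2. This is the same case the else branch of the constructor handles.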

To access the big array I have:

public long GetValue(int index)
{
    int row = index / numberCellsInEachSubArray; // which subarray
    int col = index % numberCellsInEachSubArray; // which cell within that subarray
    return this.master[row][col];
}

public void SetValue(int index, long value)
{
    int row = index / numberCellsInEachSubArray; // which subarray
    int col = index % numberCellsInEachSubArray; // which cell within that subarray
    this.master[row][col] = value;
}
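To make the row/column mapping concrete, here is a small worked example. It is a sketch under assumptions: the IndexMapping class and Locate method are hypothetical names, and the explicit range check is an addition that the accessors above do not have (they rely on the underlying arrays to throw for a bad index).

```csharp
using System;

// Hypothetical helper for illustration; not part of BigArrayOfLong.
static class IndexMapping
{
    // Maps a flat index to (row, col): row selects the subarray,
    // col selects the cell within that subarray.
    public static (int Row, int Col) Locate(int index, int length, int cellsPerSubArray)
    {
        // Added for the sketch: fail early with a descriptive exception.
        if (index < 0 || index >= length)
            throw new ArgumentOutOfRangeException(nameof(index));
        return (index / cellsPerSubArray, index % cellsPerSubArray);
    }

    static void Main()
    {
        var (row, col) = Locate(250000000, 400000000, 100000000);
        Console.WriteLine(row + ", " + col); // 2, 50000000
    }
}
```

Index 250,000,000 lands in the third subarray (row 2) at offset 50,000,000, because two full subarrays of 100,000,000 cells come before it.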

With this code in place I can call the big array along the lines of:

BigArrayOfLong ba = new BigArrayOfLong(400000000);
ba.SetValue(0, 123456789);
long x = ba.GetValue(0);
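To put rough numbers on that call: 400,000,000 longs need 3.2 GB of payload, which is more than a single default .NET array can hold. The sketch below just does that arithmetic. SizeEstimate and its members are hypothetical names, and the 2^31-byte figure is an approximation of the default single-array limit rather than an exact runtime value.

```csharp
using System;

// Hypothetical helper for illustration; it only does arithmetic and allocates nothing.
static class SizeEstimate
{
    // Payload bytes for `length` longs; cast to long first to avoid int overflow.
    public static long PayloadBytes(int length)
    {
        return (long)length * sizeof(long);
    }

    // Approximate number of longs that fit in a single 2 GB (2^31 byte) array.
    public static long ApproxMaxLongsInSingleArray()
    {
        return (1L << 31) / sizeof(long);
    }

    static void Main()
    {
        Console.WriteLine(PayloadBytes(400000000));       // 3200000000
        Console.WriteLine(ApproxMaxLongsInSingleArray()); // 268435456
    }
}
```

Keep in mind that constructing a BigArrayOfLong of this length really does allocate about 3.2 GB, so the demo call needs a 64-bit process and enough memory.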

I use explicit methods to get and set values rather than a C# indexer (a property-like member declared with this[int index]), which would allow calls like:

ba[0] = 123456789;
long x = ba[0];

The indexer approach is elegant, but in my opinion it is not as good as the explicit method approach, because the explicit calls make it very clear that I am working with a custom BigArrayOfLong object and not a built-in type.
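For comparison, here is roughly what the bracket-syntax alternative could look like. This is a sketch only, not the class described above: BigArrayOfLongWithIndexer is a hypothetical name, and the cells-per-subarray value is a constructor parameter here solely so the example runs with tiny sizes.

```csharp
using System;

// Hypothetical variant for illustration; not the BigArrayOfLong class above.
class BigArrayOfLongWithIndexer
{
    private readonly long[][] master;
    public readonly int length;
    private readonly int cellsPerSubArray;

    public BigArrayOfLongWithIndexer(int length, int cellsPerSubArray)
    {
        this.length = length;
        this.cellsPerSubArray = cellsPerSubArray;

        // Ceiling division, done in long arithmetic to avoid int overflow.
        int count = (int)(((long)length + cellsPerSubArray - 1) / cellsPerSubArray);
        this.master = new long[count][];
        for (int i = 0; i < count; ++i)
        {
            int remaining = length - i * cellsPerSubArray;
            this.master[i] = new long[Math.Min(cellsPerSubArray, remaining)];
        }
    }

    // The indexer: enables ba[i] syntax for both reads and writes.
    public long this[int index]
    {
        get { return this.master[index / cellsPerSubArray][index % cellsPerSubArray]; }
        set { this.master[index / cellsPerSubArray][index % cellsPerSubArray] = value; }
    }
}

static class IndexerDemo
{
    static void Main()
    {
        var ba = new BigArrayOfLongWithIndexer(10, 4);
        ba[9] = 123456789;
        Console.WriteLine(ba[9]); // 123456789
    }
}
```

The call sites read exactly like a built-in array, which is both the appeal and the drawback noted above.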