Introduction to Spark for .NET Developers

I wrote an article “Introduction to Spark for .NET Developers” in the December 2015 issue of MSDN Magazine. See

Spark is an open source “computing framework” for Big Data, and it’s not easy to explain exactly what that means. First, Spark is not a data storage system such as SQL Server, HBase, Cassandra, or HDFS files. To use Spark you must already have some data storage system in place.

Second, Spark is not a programming language. To use Spark you must work through a language such as Scala (for interactive commands), or Java or Python (for programmatic interaction).

Spark is basically a library of functions that perform operations such as extracting data or doing arithmetic on data. In this regard Spark can be thought of as roughly analogous to the SQL language. Another way to think of Spark is as an alternative to Hadoop’s MapReduce processing engine.
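To make the SQL analogy a bit more concrete, here is a small sketch that uses only the Python standard library, not Spark itself. It computes the same total twice: once with a declarative SQL query (through `sqlite3`) and once as a chain of filter, map, and sum steps, which is roughly the style of Spark’s programmatic API. The `sales` table, its columns, and the sample rows are made-up illustration data.

```python
import sqlite3

# Hypothetical sample data: (product, quantity, unit_price)
rows = [("apple", 3, 0.50), ("pear", 1, 0.75), ("plum", 4, 0.25)]


def total_with_sql(data):
    """Declarative style: describe the result and let the SQL engine compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, qty INTEGER, price REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", data)
    (total,) = conn.execute(
        "SELECT SUM(qty * price) FROM sales WHERE qty > 1"
    ).fetchone()
    conn.close()
    return total


def total_with_pipeline(data):
    """Function-chaining style, similar in spirit to Spark's filter/map operations."""
    kept = filter(lambda r: r[1] > 1, data)     # like WHERE qty > 1
    amounts = map(lambda r: r[1] * r[2], kept)  # like SELECT qty * price
    return sum(amounts)                         # like SUM(...)


if __name__ == "__main__":
    print(total_with_sql(rows))
    print(total_with_pipeline(rows))
```

With PySpark installed, the pipeline version translates almost directly to `sc.parallelize(rows).filter(...).map(...).sum()`, where `sc` is a `SparkContext`; the difference is that Spark can spread that work across a cluster instead of running it in one process.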


In my article, I explain how to install Spark (along with Scala and Java) on a Windows machine. The installation I describe isn’t a full-fledged system suitable for production, because it stores data on the local Windows file system, which is not a distributed file system.

Spark is a relatively new technology and it still has a lot of rough edges. However, I think it has a lot of promise, and because it seems to be gaining a lot of traction, it should evolve fairly quickly.
