Parsing a Text File of Numeric Values using Python

I’ve been using Python quite a bit recently, mostly because I’ve been looking at the TensorFlow and CNTK machine learning libraries which both have a Python interface.

Some tools, such as Weka, require a specific data file format (ARFF). But TensorFlow and CNTK operate at a lower level. Reading raw data into a suitable data structure is not exciting, but it’s a key part of using TensorFlow or CNTK.

It’s possible to use the built-in “reader” functions, but sooner or later I know I’ll need to create a custom reader, so I figured I’d refresh my Python knowledge by reading a text file that simulates the Iris data set into two Python numeric lists.

I created a dummy text file:


The first four items in each line are the “features” (predictor variables) and the last three items are the “labels”. Then after a somewhat surprisingly long time (my Python was quite rusty) I wrote a demo script that read the file into a list of the features and a list of the labels.

# parse a text file of numeric values into two lists

print("\nBegin demo \n")

ftrs = []  # features (4 values per line)
lbls = []  # labels (3 values per line)

f = open('C:\\Data\\CNTK_Scripts\\iris.txt', 'r')
for line in f:
  ff = []
  ll = []
  line = line.rstrip('\n')
  xx = line.split(',')

  for i in range(0, 4):
    ff.append(float(xx[i]))
  for i in range(4, 7):
    ll.append(float(xx[i]))

  ftrs.append(ff)
  lbls.append(ll)
f.close()

print("\nEnd script \n")
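
As a sanity check, the same parsing logic can be exercised on a small in-memory sample (the data values here are made up for illustration, not taken from my actual dummy file):

```python
# hypothetical sample lines: 4 feature values then 3 label values per line
sample = [
  "5.1,3.5,1.4,0.2,1,0,0",
  "7.0,3.2,4.7,1.4,0,1,0",
]

ftrs = []
lbls = []
for line in sample:
  xx = line.rstrip('\n').split(',')
  ftrs.append([float(xx[i]) for i in range(0, 4)])  # features
  lbls.append([float(xx[i]) for i in range(4, 7)])  # labels

print(ftrs[0])  # first feature vector: [5.1, 3.5, 1.4, 0.2]
print(lbls[0])  # first label vector: [1.0, 0.0, 0.0]
```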

I don’t think there’s a bottom line to this blog post, except maybe that using Python, like all programming languages, requires practice.


Posted in Machine Learning | Leave a comment

Installing CNTK v2.0 Beta

The Microsoft CNTK tool does deep neural networks and is more-or-less a direct competitor to Google TensorFlow. CNTK v2.0 is in Beta as I write this. I figured I’d install CNTK v2.0 Beta to play around with it. The installation process wasn’t hideous, but it wasn’t completely trivial either.

During the installation process I grabbed screenshots every few steps. So, here’s installing CNTK v2.0 Beta in 17 screenshots. You can click on an image to enlarge.

0. Before starting the installation process I did two small preparatory steps. First I created a directory named “Local” in the C:\ root (see step #6). Second, I launched a PowerShell window with Administrative privileges and issued a “set-executionpolicy unrestricted” and then minimized the shell for use later (see step #11).

1. I went to the CNTK Releases Web page and scrolled down to the v2.0 Beta 11 section. (These instructions should work for later releases, but they definitely work with Beta 11.) I selected the CPU-only version because my machine didn’t have a GPU.


2. I accepted the various license agreements. Yawn.


3. The download is a zip file. I selected Save (I could have done Save As).


4. After the download of the zip file finished, I opened the download directory in a File Explorer window.


5. I right-clicked on the zip file and selected the Extract All option from the context menu.


6. Before starting the install process, I had created a C:\Local directory because later on CNTK puts a lot of files there by default. I extracted the zip file into C:\Local.


7. There is a root cntk directory.


8. Inside the cntk root directory there is another cntk directory (groan) that holds the core CNTK DLLs.


9. The setup instructions tell you to create a new User Environment variable named MYCNTKPATH but I don’t think it’s ever used.


10. The setup instructions didn’t say to edit the System PATH variable to point to the CNTK DLLs, but I did so anyway. I’m not sure if it’s needed or not, but because my install eventually worked, it seems OK to add such a path.


11. To install CNTK you run a PowerShell script. I launched PowerShell with Administrative privileges. I checked my execution policy to make sure I could actually run scripts. Then I issued an install.ps1 command, which runs six sub-scripts. You can either unblock each sub-script or enter ‘R’ to run each.



12. The first run of the install.ps1 command is really just a trial run of the install to see if anything blows up. So next I ran “install.ps1 -execute” to actually perform the install.


13. CNTK 2.0 has a new Python interface and part of the installation is an Anaconda distribution which includes Python + NumPy + SciPy + kitchen sink.


14. The Anaconda part of the install runs in a separate command shell. It takes about 10 minutes.


15. The “install complete” message was a pleasant result. . .


16. I closed PowerShell. But to actually activate the installation I had to run a post-install cntkpy35.bat script in an ordinary command shell. Notice that my shell didn’t recognize python at first. After the activation script finished, I ran the example by typing python (which didn’t get captured in the screenshot). It worked. Somewhat unusually, the documentation says you have to run the cntkpy35.bat activation script before running any CNTK script written in Python. Somewhat strange, but eh.


17. I finished by running the Logistic Regression example, which uses the CNTK BrainScript language. BrainScript used to be the primary CNTK interface (Python seems to be the interface of choice now).


Whew. Now I’m ready to try a few of my own CNTK examples with Python, and I hope I’ll be somewhat prepared to install the non-Beta version of CNTK 2.0 when it’s released.

To summarize, installing CNTK isn’t like the nice process you may be used to with other Windows programs (“Next”, “Next”, “Next”) but if you follow the setup instructions closely you should have little or no trouble installing.

Posted in Machine Learning | Leave a comment

Running TensorFlow for Windows

TensorFlow is a code library of sophisticated machine learning algorithms. TensorFlow was created on, and is primarily intended for use on, Linux machines. However, several weeks ago the TF guys released a version of TensorFlow for Windows.

TensorFlow for Windows is pretty rough around the edges, but I was able to get a fairly complex demo running. I went to the TensorFlow Web site, clicked on the Install tab, and found instructions for installing TF on Windows. The first requirement was Python 3.5 (you access the TF code library using Python).

I like Python, but the split between Python 2 and Python 3 is really, really irritating. I blew away my existing Python installation, then went to the Python download page and found a link to a self-extracting executable installer. The installation went smoothly.


Next, I edited my System environment PATH variable so that my shell commands would find python.exe and the pip3.exe installer.


Then I verified that Python was working by launching a Windows shell and issuing a python --version command. Then I installed TensorFlow for Windows.

(prompt) python --version
Python 3.5.3
(prompt) pip3 install --upgrade tensorflow
. . .
Successfully installed . . . 


Finally I verified TensorFlow by printing a Hello message:

>>> import tensorflow as tf
>>> h = tf.constant('Hello message from TF!')
>>> s = tf.Session()
>>> print(s.run(h))

The shell spewed all kinds of warning messages but eventually printed my message. I will describe how I got a demo of image recognition using a convolutional neural network to run in a future blog post.


Bottom line: If you use Windows, TensorFlow for Windows is almost ready for mainstream use.

Posted in Machine Learning | Leave a comment

The Outer Product of Two Vectors

I was working with some machine learning code that required the use of the outer product of two vectors. I realized I hardly ever used an outer product in code, so I had to brush up quickly by reviewing the Wikipedia entry on the topic.

Suppose vector A = [1, 3, 5] and vector B = [4, 2]. The outer product of A and B is:

 4   2
12   6
20  10

The number of rows of the outer product is the length of A and the number of columns is the length of B. Each value in the result is a product of the two corresponding entries in A and B.

Now, for the ML code I was working with, the two vectors had only 0 and 1 values, so the resulting outer product has only 0 and 1 values. For example, if A = [0, 0, 1, 1, 0, 0] and B = [0, 1, 0], then the outer product is:

0  0  0
0  0  0
0  1  0
0  1  0
0  0  0
0  0  0

Kind of weird. It’s very simple but I have a hard time visualizing the outer product.
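
Both examples above can be verified with NumPy’s built-in outer product (a quick sketch; NumPy isn’t required for this, but it makes the shape rule easy to see):

```python
import numpy as np

# the result has len(A) rows and len(B) columns; entry [i][j] is A[i] * B[j]
A = np.array([1, 3, 5])
B = np.array([4, 2])
print(np.outer(A, B))   # [[ 4  2] [12  6] [20 10]]

# with 0/1 vectors, the result is also all 0s and 1s
A2 = np.array([0, 0, 1, 1, 0, 0])
B2 = np.array([0, 1, 0])
print(np.outer(A2, B2)) # 6x3, with 1s only where A2[i] = B2[j] = 1
```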


Posted in Machine Learning | Leave a comment

L2 Regularization and Back-Propagation

L2 regularization, also called weight decay (for a reason I’ll explain shortly), is simple but difficult to explain because there are many interrelated ideas. Briefly, L2 regularization is a technique intended to reduce the overfitting of neural networks (or similar machine learning models based on math equations).

So, to really understand the “why” of L2 regularization, you have to understand neural network weights and training, and such an explanation would take a couple of pages at least. Moving on, NN overfitting is often characterized by weight values that are very large in magnitude. The main idea of L2 regularization is to reduce the magnitude of the weights in order to reduce overfitting.

Every math-based model requires training, which is the process of using data that has known inputs and known correct outputs to find the values of the weights and biases (special weights). When training, the optimization algorithm, for example back-propagation or swarm optimization, needs a measure of error. L2 regularization adds a penalty, a fraction of the sum of the squared weights, to the error term. Therefore larger weight values contribute to larger error, and so smaller weights are rewarded.
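
For example, if the base error is mean squared error, the augmented error can be sketched like this (a minimal illustration with my own variable names and lambda value, not code from any particular library):

```python
# augmented error = base error + (lambda / 2) * sum of squared weights
def error_with_L2(targets, outputs, weights, lam):
  n = len(targets)
  mse = sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n
  penalty = (lam / 2.0) * sum(w * w for w in weights)
  return mse + penalty

# larger-magnitude weights yield larger error, all else being equal
small = error_with_L2([1.0], [0.8], [0.1, 0.2], lam=0.01)
large = error_with_L2([1.0], [0.8], [5.0, 9.0], lam=0.01)
print(small < large)  # True
```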

Note that at this point, to fully grasp L2 regularization, you must also understand how training error is measured and how training optimization algorithms work, which, again, would take several pages of explanation.

And now things get really messy. In back-propagation training, the basic weight update, expressed as an equation (with eta as the learning rate and E as the error function), is:

w = w - eta * (dE/dw)

After doing some math that involves taking the derivative of the error function (also called the cost function), the update when using L2 regularization becomes:

w = w * (1 - (eta * lambda) / n) - eta * (dE/dw)

Note: And before I forget, when using L2 regularization, the update equation for the bias values doesn’t change — a small detail that can cause a lot of grief if you’re writing code and don’t pay attention.

In words, when using back-propagation with L2 regularization, when adjusting a weight value you first reduce the weight by a factor of 1 - (eta * lambda) / n, where eta is the learning rate, lambda is the L2 regularization constant, and n is the number of training items involved (n = 1 for “online” learning), and then subtract eta times the partial derivative (loosely referred to as the gradient) of the cost (error) function. The weight values tend to decrease, or “decay”, during training.

And, sadly, the messy details continue. When implementing L2 regularization, instead of adjusting weights according to the math equations, you can simplify the code to adjust the weight as normal without L2 regularization, and then subtract a fraction of the original weight value. This approach reduces weight values but completely changes the meaning of the lambda constant.
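
The two update approaches the preceding paragraphs describe can be sketched in a few lines of Python (names like eta, lam, and decay are my own; this is an illustration of the update rules, not code from any library):

```python
# exact L2 update: shrink the weight, then apply the gradient step
def update_L2(w, grad, eta, lam, n):
  return w * (1.0 - (eta * lam) / n) - eta * grad

# common simplification: do the normal update, then subtract a
# fraction of the original weight ("weight decay")
def update_decay(w, grad, eta, decay):
  return (w - eta * grad) - decay * w

# bias values are updated WITHOUT the decay factor
def update_bias(b, grad, eta):
  return b - eta * grad

w = update_L2(0.5, grad=0.2, eta=0.1, lam=0.01, n=1)
print(w)  # 0.4795 -- slightly less than the plain update 0.5 - 0.1*0.2 = 0.48
```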

And, sigh, things get messier if you consider training algorithms such as particle swarm optimization that directly use the error term, rather than algorithms such as back-propagation that use error indirectly (to calculate the gradient).

Well, if you’re reading this blog post because you want to understand L2 regularization, all these complications are probably a bit depressing. But the basic idea is simple: L2 regularization reduces weight values, which reduces model overfitting. L2 regularization itself is actually very simple; the difficulty is that fully understanding it requires understanding virtually every neural network concept.

The good news is that completely understanding L2 regularization is possible — you just have to understand all the related concepts.

Posted in Machine Learning | Leave a comment

Three Interesting Signs

I enjoy Las Vegas – for anyone who likes psychology, sociology, and mathematics, Las Vegas has all kinds of interesting weirdness. Here are three images from a recent trip.

I was staying at the Planet Hollywood hotel. Here’s what I saw in the hallway, right next to my room.


Hmmmm, quite a paradox.

This sign was in the Planet Hollywood casino area.


The sign brings up an obvious question – what do I have to do to get a reward? (Not sure I want to know.)

During my trip, it was the Chinese New Year, the year of the rooster.


Apparently, not all roosters are honored this year.

Posted in Conferences, Miscellaneous, Top Ten | Leave a comment

At the 2017 Microsoft TechReady Event

Once or twice a year, Microsoft puts on a large conference for its employees called TechReady. I attended the 2017 event which ran from February 6 to 10 at the Washington State Convention Center in Seattle. The conference is intended primarily for employees who are in customer-facing roles — although that can mean several different things.


Attendance at TechReady is by invitation only, and employees greatly value the chance to attend. This year’s event had about 5,000 attendees plus about 1,000 speakers, presenters, and event organizers.

I usually give a technical talk at TechReady, but this year I manned a booth for a new internal training effort called the Microsoft AI School. The goal of the AI School is to educate Microsoft employees about Machine Learning and Artificial Intelligence tools and concepts so that they can add intelligence into Microsoft products and services.


Interest in the AI School was huge. I printed approximately 1,600 information flyers and most were distributed. It was actually quite exhausting trying to explain what AI is many hundreds of times. I think I’ll switch back to being a regular speaker at the next TechReady.

One thing that I always enjoy at TechReady is visiting the gigantic lecture hall. You can see from the photo the hall is big, but the photo doesn’t really capture the scale of the room. There’s a rumor going around that TechReady may be combined with Microsoft’s two other large (about 8,000 people each) internal events. If true, that will be a truly enormous event.


TechReady – for me, an interesting event and good use of my time.

Posted in Conferences | Leave a comment