A Quick Look at MongoDB

MongoDB is an open source database system that works with JSON-style documents (as opposed to traditional SQL databases with rows and columns). I hadn’t worked with MongoDB in quite a long time, so I thought I’d take the latest version, v3.4.2, out for a test drive.
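To make the document idea concrete, here’s a tiny Python sketch (my own illustration, not actual MongoDB code) of why JSON-style documents are more flexible than rows and columns:

```python
# A SQL row has a fixed set of columns; a MongoDB document is free-form JSON.
# Two documents in the same collection can have different fields.
users = []  # stands in for a MongoDB collection

users.append({"first": "John", "last": "Smith"})
users.append({"first": "Sally", "last": "Jones", "age": 30})  # extra field is OK

# A query is essentially a filter over documents
smiths = [u for u in users if u.get("last") == "Smith"]
print(smiths)  # [{'first': 'John', 'last': 'Smith'}]
```

In a relational table, adding that "age" field would mean altering the schema for every row; with documents it’s a per-document decision.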

Installation was quick and easy. I accepted all defaults except for installing at C:\mongodb instead of C:\Program Files. I created the directories C:\Data\db and C:\Data\log, which are the defaults for holding data and log files respectively.

The MongoDB system can run as a Windows service or as a more traditional Linux-style daemon. I used the service approach.

My tiny demo first connected to the running service and listed existing system databases (admin and local):

C:\mongodb\bin > mongo
> show dbs
admin
local

Next I created a dummy database named mydb with a single entry:

> use mydb
> db.users.insert({first: 'John', last: 'Smith'})

Then I cleaned up by deleting the dummy database:

> use mydb
> db.dropDatabase()

Bottom line: MongoDB is very, very nice. Technical thumbs up!

Posted in Miscellaneous

What’s the Deal with all the JavaScript Frameworks?

I don’t create Web applications as often as I code backend algorithms. I’m continually amused by the Web world and its constant chaos. In my mind, Web development falls into six categories based on the technology used: ASP.NET, PHP, raw HTML + JavaScript, Ruby, JavaScript frameworks, and other. Of course there are many important technologies I’ve left out, but I think these six categories capture most Web application programming.

It seems like a new JavaScript framework emerges every few months. If you’re not a programmer, the basic idea is that you can code everything from scratch using HTML and JavaScript, but this approach requires a lot of time and expertise. A JavaScript framework is a collection of code libraries that can be used to create a Web application. It’s sort of like using Legos to create a robot instead of building a robot from raw metal and plastic.

Here are the eight JavaScript frameworks that I run into most often, not in any particular order.

1. jQuery – The jQuery framework is a very low-level library of code functions that are especially useful when writing code that must work across different browsers. It is often used by the other frameworks on this list.

2. AngularJS and Angular 2 – Perhaps the most common high-level framework I see. Created by Google so has strong support. Very comprehensive.

3. Backbone.js – A framework designed with a traditional database backend in mind. Uses an MVC / MVP model. Created by a single guy, so it is possibly fragile with regards to long-term support.

4. ReactJS – A framework that emphasizes creating user interfaces. Maintained by Facebook so is likely to be well-supported.

5. Vue.js – Another framework that emphasizes UI. Created by one guy.

6. Ember.js – Emphasizes single-page-applications that connect to a database. Very popular.

7. MeteorJS – A general purpose framework, closely associated with the MongoDB database and therefore unstructured documents.

8. KnockoutJS – A framework that emphasizes database connectivity via MVVM. Written by one guy. One of the earliest popular frameworks but seems to be losing steam.

There are dozens of other JavaScript frameworks. The upside of these frameworks is that you can be more productive, more quickly. The downside is that most have a very steep learning curve, and using one framework can lock you in forever.

Over time, I fully expect most JavaScript frameworks to fade into obscurity, with maybe two or three dominating.

Posted in Miscellaneous

Recap of the 2017 Visual Studio Live Conference

I spoke at the 2017 Visual Studio Live Conference. The event ran from March 13 – 17, in Las Vegas (Bally’s). VS Live is a conference for software developers who use Microsoft technologies, and it’s one of my favorite conferences.

I estimate there were about 400 attendees. A typical attendee was a senior developer at a large company or a state or federal agency. For example, I talked to people who worked at hospitals, financial companies, energy companies, and all kinds of government agencies. Attendees were overwhelmingly male (perhaps 95%) which is normal for developer conferences.

VS Live has been around for many years, and the event organizers, Danielle, Brent, Sara, and the rest, all do a great job with regards to logistics. Unlike most developer conferences which happen once a year, there are seven VS Live events every year — Austin, Washington DC, Redmond, Chicago, Anaheim, Orlando, and my favorite location, Las Vegas.

I gave two talks. My first talk was “Introduction to R and Microsoft R Server”. I explained what the R language is from the point of view of a software developer, and gave a few opinions about how R might be useful. My R talk went OK, but not great — I was a bit flat and so was the audience.

My second talk was “Introduction to Azure Machine Learning”. I walked through a complete end-to-end demo that showed how to use Azure ML to create a prediction model for the famous Iris Data set. That talk went very well. I was energized and the audience was engaged too. The attendee feedback was very good.

Conferences are exhausting. You get up early, concentrate all day, stay up late chatting, sleep for a couple of hours, then start again. But attendees (and speakers like me) get exposed to all kinds of new ideas and technologies in a very short time frame.

For me, I feel it’s important for Microsoft’s clients to understand a bit about what Microsoft Research is doing with regards to things that will affect developers. I’ve been in software for decades and I’ve never seen a more exciting time. The advances in machine learning and artificial intelligence are accelerating and will likely change the world in ways that are hard to imagine.

Posted in Conferences

Experimenting with Neural Network L2 Regularization

Regularization is a standard technique used in neural network training. The most common form of regularization is called L2. The idea is to add the sum of squared weight values (the “2” in “L2”) to the error term during training. This penalty acts to reduce the magnitude of the weights, which in turn reduces the possibility of model over-fitting.

I coded up a demo using the Python language so I could gain a full understanding of L2 regularization. During my preliminary research, I found a lot of confusing and contradictory information on the Internet.

For example, when using L2, according to several sources, the theoretical weight update equation is:

w = w - (eta * gradient) - (eta * lambda / n) * w

Here eta (like a script lower case n) is the learning rate, lambda (like a triangle without the bottom) is a regularization constant, and n is the number of training items. It doesn’t make any sense to me that the weight penalty should depend on the number of training items.

Several sources say that regularization should not be applied to the bias values. This too doesn’t make any sense to me. Biases can grow very big so why not restrict them?

And several resources simplify the weight update equation to:

w = (d * w) - (eta * gradient)

where d is a decay constant with a value like 0.99. But when I tried this approach with some synthetic data, all the weight values quickly went to 0.0 and training completely stalled.

In my demo, the approach that finally seemed to work best was to use a “conditional decay”, where weights are decayed using the simple equation but only when the absolute value of a weight is greater than 1.0 (which was arbitrary — perhaps a larger threshold would work better). And I decayed both weights and biases.
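The conditional decay idea can be sketched in Python. This is just my illustration of the update rule; the learning rate, decay constant, and threshold are placeholder values:

```python
def update_weight(w, grad, eta=0.01, d=0.99, threshold=1.0):
    """One weight (or bias) update with conditional decay.

    The gradient step is always applied; the multiplicative decay
    is applied only when the weight magnitude exceeds the threshold."""
    w = w - eta * grad   # ordinary gradient descent step
    if abs(w) > threshold:
        w = w * d        # decay only large-magnitude values
    return w

print(update_weight(0.5, 0.0))  # 0.5 -- small weight left alone
print(update_weight(2.0, 0.0))  # 1.98 -- large weight pulled toward zero
```

Because small weights are never decayed, they can’t be driven all the way to 0.0 the way the unconditional decay drove them in my earlier experiment.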

The moral is that even though there is a lot of information about neural network L2 regularization available on the Internet, I’m skeptical of a lot of that info.

Posted in Machine Learning

Experimenting with Neural Network Dropout

Whenever I read about some sort of technology, no matter how clear the explanation is, I never feel that I fully understand the topic unless I can code a demo program. This is probably a character strength and weakness of mine.

I was thinking about several neural network ideas related to over-fitting such as L1 regularization, L2 regularization, weight restriction, and dropout. So, even though I am reasonably familiar with all these ideas, I thought I’d take a close look at dropout.

In neural network dropout, as each training item is presented during training (when the system finds values for the network’s weights and biases), a random 50% of the hidden processing nodes are virtually dropped; you pretend the dropped nodes aren’t there.

It’s not obvious, but using dropout helps prevent over-fitting, which is when a network predicts very well on the training data, but when presented with new test data that wasn’t used during training, the network predicts poorly.
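The core mechanic is easy to sketch in Python. In this illustration of mine the node values are made up, and the keep/drop decision is a coin flip per hidden node:

```python
import random

def hidden_with_dropout(h_values, rng, drop_prob=0.5):
    """Zero out each hidden node's output with probability drop_prob,
    as if the dropped nodes weren't there for this training item."""
    return [0.0 if rng.random() < drop_prob else v for v in h_values]

rng = random.Random(0)
hidden = [0.9, -0.4, 0.7, 0.2, -0.8, 0.6]
print(hidden_with_dropout(hidden, rng))
```

A standard follow-up detail: at prediction time no nodes are dropped, but the hidden node outputs (or the outgoing weights) are scaled by the keep probability, here 0.5, to compensate for having trained with only about half the nodes active.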

As usual, even with rather simple ideas, there are many details you have to address when coding an actual implementation. I spent a few hours one rainy Seattle weekend exploring neural network dropout. I now believe I have a very solid understanding of how to implement dropout, and a decent understanding of the theoretical aspects of dropout.

In my demo, a neural net trained without dropout over-fit some synthetic data — the accuracy on the training data was 94.00% but when presented with test data the accuracy was only 67.50%. But when trained using dropout, the accuracy on the test data improved to 72.50%.

Posted in Machine Learning

Factor Analysis using R

I wrote an article in the March 2017 issue of Visual Studio Magazine titled “Revealing Secrets with R and Factor Analysis”. See https://visualstudiomagazine.com/articles/2017/03/01/revealing-secrets-r-factor-statistics.aspx.

Factor Analysis is a classical statistics technique that analyzes data to determine if some set of observed data can be explained by a smaller set of “latent variables.” The idea is rather subtle and is best explained by example. In my article I create a set of fake movie preference data for 20 people. Each person rates how much they like each of seven movies: “Forbidden Planet”, “Dark City”, “The Hangover”, “Meet the Parents”, “Ben Hur”, “Gladiator”, and “Galaxy Quest”.

The first two movies (“Forbidden Planet” and “Dark City”) are science fiction. The next two are comedies. The two after that are historical. The last movie, “Galaxy Quest”, is both science fiction and comedy.

A factor analysis can tell you if people’s movie preferences are related to the latent variable, genre. If so, then you could use that information to predict how the people in the data set would rate some new movie.

Performing factor analysis with R is very easy. The harder part is interpreting the results. For my dummy data, the key part of the R results is:

Loadings:
                Factor1 Factor2 Factor3
ForbiddenPlanet -0.141   0.987         
TheHangover      0.930          -0.205 
MeetTheParents   0.798  -0.174  -0.226 
BenHur          -0.216  -0.142   0.964 
Gladiator       -0.484  -0.182   0.665 
GalaxyQuest      0.591   0.557  -0.488 
DarkCity                 0.761  -0.273

Notice “Forbidden Planet” and “Dark City” have high values of the “Factor2” latent variable (which we know to be “science fiction”). Similarly, “The Hangover” and “Meet the Parents” correspond to a “Factor1”, and “Ben Hur” and “Gladiator” correspond to a “Factor3”.
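To read a loadings table programmatically, one simple rule is to assign each movie to the factor with the largest absolute loading. Here’s a quick Python sketch of mine using the values above (the blank entries in the R output are loadings below the print cutoff, which I treat as 0.0):

```python
# Loadings copied from the R output above; blanks treated as 0.0
loadings = {
    "ForbiddenPlanet": (-0.141,  0.987,  0.0),
    "TheHangover":     ( 0.930,  0.0,   -0.205),
    "MeetTheParents":  ( 0.798, -0.174, -0.226),
    "BenHur":          (-0.216, -0.142,  0.964),
    "Gladiator":       (-0.484, -0.182,  0.665),
    "GalaxyQuest":     ( 0.591,  0.557, -0.488),
    "DarkCity":        ( 0.0,    0.761, -0.273),
}

for movie, vals in loadings.items():
    best = max(range(3), key=lambda i: abs(vals[i]))  # dominant factor index
    print(movie, "-> Factor" + str(best + 1))
```

Note that “Galaxy Quest” loads almost equally on Factor1 and Factor2, which matches its dual comedy / science fiction nature.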

Factor analysis isn’t too common in the hard sciences, but it’s used fairly often in fields such as psychology and marketing.

Posted in Machine Learning, R Language

March Madness and Machine Learning

As I write this blog post, I’m at the 2017 Visual Studio Live Conference in Las Vegas. By coincidence, the 2017 NCAA college basketball tournament (“March Madness”) started just a few minutes ago.

I’ve always been fascinated by things related to probability and prediction. For example, every year I write a computer program that predicts the outcomes of NFL football games. In my mind, machine learning is any system that uses data to make some sort of a prediction, so predicting outcomes related to March Madness is a possible machine learning problem.

There is huge interest in the NCAA basketball tournament. By that I mean an enormous amount of money is bet on the games. The best estimate I’ve seen indicates that people will wager approximately $10 billion on March Madness over the next few weeks. That’s “billion” with a “b”.

A good friend of mine (PW) knew that I was in Vegas and so he sent me an e-mail message and asked me to place a $100 bet on UCLA for a friend of his (DL) to win the tournament. DL picked UCLA because that’s where he went to school. The approximate Vegas odds of UCLA winning the tournament are about 12 to 1 so if UCLA wins my friend’s friend will win about $1200.
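The arithmetic behind the bet is simple. Here’s a quick Python sketch (mine, just formalizing the 12-to-1 numbers from the story); the implied probability is rough because it ignores the house’s cut:

```python
def payout_and_implied_prob(stake, odds_for, odds_against=1.0):
    """For odds of odds_for-to-odds_against (e.g., 12-to-1), return the
    profit on a winning bet and the rough implied win probability."""
    profit = stake * (odds_for / odds_against)
    implied_prob = odds_against / (odds_for + odds_against)
    return profit, implied_prob

profit, p = payout_and_implied_prob(100, 12)
print(profit)       # 1200.0
print(round(p, 3))  # 0.077
```

So the bookmakers are pricing UCLA at roughly an 8% chance to win it all.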

So, I walked across the street from Bally’s (where VS Live is) to the Bellagio Hotel, which has a big “sports book” (betting operation) to place the bet. There were hundreds of people there and tremendous energy, and the first game (Notre Dame vs. Princeton) had just tipped off. It was very exciting.

Now the interesting thing here is that the current odds are determined by how much people bet on each team. And for March Madness, people often tend to bet with their hearts (i.e., the school they went to) rather than their heads. This likely creates imbalances in the odds that could be taken advantage of with a sophisticated machine learning system. I wish I had time to explore such a prediction system, but I don’t.

Posted in Machine Learning | 2 Comments