I wrote an article titled “Sentiment Analysis Using CNTK” in the October 2018 issue of Microsoft MSDN Magazine. See https://msdn.microsoft.com/en-us/magazine/mt830362.
CNTK (the Microsoft Cognitive Toolkit, originally named the “Computational Network Toolkit”) is Microsoft’s neural network library. It's comparable in many ways to Google’s TensorFlow code library and Facebook’s PyTorch library.
The article uses the IMDB movie reviews dataset. The goal is to accept a movie review written by someone, complete with misspellings and bad grammar, and predict whether the review is positive (“this was an excellent film”) or negative (“I should have stayed home”). Sentiment analysis is a problem that, until a couple of years ago, was not really feasible.
The IMDB dataset has a total of 50,000 reviews. There are 25,000 reviews for training the model, and 25,000 test reviews for evaluating the accuracy/quality of the trained model.
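To make the evaluation step concrete, here is a minimal sketch of how classification accuracy on held-out test reviews can be computed. This is not code from the article; the `accuracy` function and the tiny label lists are hypothetical, with 1 standing for a positive review and 0 for a negative one.

```python
def accuracy(predicted, actual):
    """Fraction of reviews whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled review")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels for five test reviews: 1 = positive, 0 = negative.
actual_labels = [1, 0, 1, 1, 0]
predicted_labels = [1, 0, 0, 1, 0]

test_accuracy = accuracy(predicted_labels, actual_labels)
print(test_accuracy)  # 4 of 5 correct -> 0.8
```

Measuring accuracy on reviews the model never saw during training is what makes the number a fair estimate of how the model will do on new reviews.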
The demo neural network in the article is a long short-term memory (LSTM) network. LSTMs have a form of memory, which is important for analyzing text input because the meaning of a word in a sentence often depends on previous words in the sentence. For example, the two words “great movie” have a different meaning if preceded by “I can endorse this as a” than if preceded by “If I was a masochist then this would be a”.
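The memory idea can be illustrated with a toy LSTM cell written in plain Python. This is only a sketch under stated assumptions, not the CNTK code from the article: the `TinyLSTMCell` class, the `make_embeddings` and `run_lstm` helpers, the random untrained weights, and the tiny dimensions are all made up for illustration. It uses the standard LSTM gate equations and shows that the final state after “great movie” depends on the words that came before.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTMCell:
    """Toy LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        n = input_dim + hidden_dim
        self.hidden_dim = hidden_dim
        # One weight matrix and bias vector per gate:
        # i = input gate, f = forget gate, o = output gate, g = candidate values.
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                      for _ in range(hidden_dim)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # concatenate current input with previous hidden state

        def affine(k):
            return [sum(w * v for w, v in zip(row, z)) + bias
                    for row, bias in zip(self.W[k], self.b[k])]

        i = [sigmoid(a) for a in affine("i")]
        f = [sigmoid(a) for a in affine("f")]
        o = [sigmoid(a) for a in affine("o")]
        g = [math.tanh(a) for a in affine("g")]
        # The cell state c is the memory: keep part of the old state, add new info.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def make_embeddings(words, dim, seed=1):
    """Assign each distinct word a fixed random vector (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for w in sorted(set(words))}

def run_lstm(cell, embeddings, words):
    """Feed the words through the cell one at a time and return the final hidden state."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    for w in words:
        h, c = cell.step(embeddings[w], h, c)
    return h

praise = "i can endorse this as a great movie".split()
sarcasm = "if i was a masochist then this would be a great movie".split()

embeddings = make_embeddings(praise + sarcasm, dim=4)
cell = TinyLSTMCell(input_dim=4, hidden_dim=3)

h_praise = run_lstm(cell, embeddings, praise)
h_sarcasm = run_lstm(cell, embeddings, sarcasm)
print(h_praise)
print(h_sarcasm)
```

Both sentences end with the same two words, but the two final hidden states are different vectors because the cell state carries information forward from the earlier words. In a real system the weights and embeddings are learned from the training reviews, and the final state is fed to a layer that outputs the positive or negative prediction.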
Sentiment analysis is still an extremely difficult challenge. But with neural network libraries like CNTK, TensorFlow/Keras, and PyTorch, creating a sentiment analysis system is now within the reach of ordinary software engineers with limited time and budget.
Note: Thanks again to my colleagues Joey Carson, Si-Qing Chen, Eunice Kim, and Lucas Meyer who helped me with the article.