Category Archives: Machine Learning

Xenobots: Tiny Bio-Robots Designed Using Machine Learning

I ran into a truly fascinating research paper recently that described “xenobots”. Briefly, a xenobot is a tiny (about 4 one-hundredths of an inch in diameter — about the size of a grain of sand) programmable bio-robot made from frog … Continue reading

Posted in Machine Learning | 1 Comment

Reading IMDB Movie Review Dataset Files

I was working on the well-known IMDB movie review sentiment analysis problem. The goal is to create a machine learning model that accepts the text of a movie review and predicts if the review is positive (class 1) or negative … Continue reading


The Best Algorithm I’ve Discovered for Positive and Unlabeled Learning (PUL)

A positive and unlabeled learning (PUL) problem occurs when a set of machine learning training data has only a few positive (class 1) labeled items and many unlabeled items (each could be either negative class 0 or positive class 1). For … Continue reading


A Predict-Next-Word Example Using Hugging Face and GPT-2

Deep neural transformer architecture (TA) systems can be considered the successors to LSTM (long short-term memory) networks. TAs have revolutionized the field of natural language processing (NLP). Unfortunately, TA systems are extremely complicated and implementing a TA system from scratch … Continue reading


Principal Component Analysis (PCA) From Scratch vs. Scikit

A few days ago I coded up a demo of anomaly detection using principal component analysis (PCA) reconstruction error. I implemented the PCA functionality — computation of the transformed data, the principal components, and the variance explained by each component … Continue reading


Nucleus Sampling for Natural Language Processing

I ran into an interesting idea called nucleus sampling, also called top-p sampling. Nucleus sampling is used for natural language processing (NLP) next-word prediction. Suppose you have a sentence that starts with “I got up and ran to the . … Continue reading
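The core idea of nucleus (top-p) sampling can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the post: the function names `nucleus_filter` and `nucleus_sample`, the threshold `p=0.9`, and the probability list in the usage note are all made up for the example. The sketch keeps the smallest set of highest-probability candidates whose cumulative probability reaches p, renormalizes them, and samples from that reduced set.

```python
import random

def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of top-probability indices whose cumulative
    probability reaches p, and renormalize them to sum to 1.
    probs is a list of next-word probabilities (hypothetical input)."""
    # Rank candidate indices from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda t: t[1], reverse=True)
    kept, cum = [], 0.0
    for idx, pr in ranked:
        kept.append((idx, pr))
        cum += pr
        if cum >= p:  # the nucleus is complete
            break
    # Renormalize so the kept probabilities form a distribution.
    return {idx: pr / cum for idx, pr in kept}

def nucleus_sample(probs, p=0.9, rng=random):
    """Sample one index from the nucleus of probs."""
    nucleus = nucleus_filter(probs, p)
    indices = list(nucleus.keys())
    weights = list(nucleus.values())
    return rng.choices(indices, weights=weights, k=1)[0]
```

For example, with the made-up probabilities `[0.5, 0.3, 0.1, 0.05, 0.05]` and `p=0.75`, only the first two candidates survive the filter, so `nucleus_sample` can only return index 0 or index 1. Low-probability candidates in the tail are never selected.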


Graphing the Michalewicz Function Using Matplotlib

The Michalewicz function is an interesting math function that is sometimes used to test the effectiveness of numerical optimization algorithms. The function can accept two or more input values. The function is tricky to minimize because there are several local … Continue reading
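For reference, here is a minimal sketch of the function itself, using the commonly cited definition f(x) = -sum over i of sin(x_i) * sin(i * x_i^2 / pi)^(2m), with i starting at 1. The function name `michalewicz` and the default steepness `m=10` are assumptions made for this example (m = 10 is the usual choice); this is not the graphing code from the post.

```python
import math

def michalewicz(x, m=10):
    """Michalewicz function for a point x (a sequence of two or more floats).
    m controls how steep the valleys are; m=10 is assumed as the usual default."""
    total = 0.0
    for i, xi in enumerate(x, start=1):  # i is 1-based in the definition
        total += math.sin(xi) * math.sin(i * xi * xi / math.pi) ** (2 * m)
    return -total
```

For two inputs, the commonly reported minimum is roughly -1.8013 near x = (2.20, 1.57), so `michalewicz([2.20, 1.57])` should come out close to that value, while `michalewicz([0.0, 0.0])` is zero.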


A Sentence Fill-in-The-Blank Example Using Hugging Face

Deep neural transformer architecture (TA) systems have revolutionized the field of natural language processing (NLP). Unfortunately, TA systems are incredibly complex and implementing such a system from scratch can take months. Enter the Hugging Face code library. Terrible name, excellent … Continue reading

Posted in Machine Learning, PyTorch

Computing the Similarity Between Two Machine Learning Datasets in Visual Studio Magazine

I wrote an article titled “Computing the Similarity Between Two Machine Learning Datasets” in the September 2021 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2021/09/20/dataset-similarity.aspx. A common task in many machine learning scenarios is the need to compute the similarity … Continue reading

Posted in Machine Learning, PyTorch

Anomaly Detection Using Principal Component Analysis (PCA) Reconstruction Error

I was reviewing some research papers that had been submitted to an internal conference at the tech company I work for. I was the Area Chair for the Unsupervised and Semi-Supervised Learning track of the conference. One of the research … Continue reading
