Category Archives: Machine Learning

The Hellinger Distance Between Two Probability Distributions Using Python

A fairly common sub-problem when working with machine learning algorithms is to compute the distance between two probability distributions. For example, suppose distribution P = (0.36, 0.48, 0.16) and Q = (0.33, 0.33, 0.33). What is the difference between P … Continue reading
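The excerpt is truncated, but a minimal from-scratch sketch of the Hellinger distance for the P and Q above (the function name is mine) might look like:

```python
import math

def hellinger(p, q):
    # Hellinger distance: (1 / sqrt(2)) times the Euclidean distance
    # between the element-wise square roots of the two distributions
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s) / math.sqrt(2.0)

P = (0.36, 0.48, 0.16)
Q = (0.33, 0.33, 0.33)
print(hellinger(P, Q))  # roughly 0.1502
```

The result is always between 0.0 (identical distributions) and 1.0, which is one reason the Hellinger distance is sometimes preferred over the unbounded Kullback-Leibler divergence.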

The Worst Logistic Regression Graph Diagram on the Internet

Argh! I have to post on this topic. Strewn throughout the Internet is a graph that is supposed to explain what logistic regression is and how it works. I’ve seen this graph, and variations of it, for years and it … Continue reading


Neural Network Lottery Ticket Hypothesis: The Engineer In Me Is Not Impressed

The neural network lottery ticket hypothesis was proposed in a 2019 research paper titled “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks” by J. Frankle and M. Carbin. Their summary of the idea is: We find that a standard … Continue reading

Implementing Kullback-Leibler Divergence from Scratch Using Python

The Kullback-Leibler divergence is a number that is a measure of the difference between two probability distributions. I wrote some machine learning code for work recently and I used a version of a KL function from the Python scipy.stats.entropy code … Continue reading
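The post is cut off here, but the basic from-scratch computation is short. A sketch (the example distributions are illustrative, not from the post) follows:

```python
import math

def kl_divergence(p, q):
    # KL(P || Q) = sum over i of p[i] * ln(p[i] / q[i])
    # Assumes every q[i] > 0. Note KL is not symmetric:
    # in general KL(P || Q) != KL(Q || P).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = (0.36, 0.48, 0.16)
Q = (0.33, 0.33, 0.33)
print(kl_divergence(P, Q))  # roughly 0.0953
```

One caveat when comparing against library code: scipy.stats.entropy normalizes its inputs to sum to 1 before computing, so results can differ slightly if the raw vectors are not exact probability distributions.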

Positive and Unlabeled Learning (PUL) Using PyTorch

I wrote an article titled “Positive and Unlabeled Learning (PUL) Using PyTorch” in the May 2021 edition of the online Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2021/05/20/pul-pytorch.aspx. A positive and unlabeled learning (PUL) problem occurs when a machine learning set of … Continue reading

I was preparing to start a project that will use JavaScript. To get ready, I coded up a few short demos of my favorite problems, using JavaScript. One of these favorite problems is Parrondo’s Paradox. It’s one of the most … Continue reading
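The excerpt is truncated, but the paradox itself is easy to demonstrate: two games that each lose money on their own can win money when randomly alternated. A simulation sketch (in Python rather than JavaScript, with the game parameters from the standard textbook statement of the paradox, not from the post):

```python
import random

def parrondo(n_plays=100000, eps=0.005, seed=0):
    # Game A: one biased coin, win prob 0.5 - eps (losing on its own).
    # Game B: if capital is a multiple of 3, win prob 0.1 - eps,
    # otherwise win prob 0.75 - eps (also losing on its own).
    # Randomly alternating A and B tends to gain: Parrondo's paradox.
    rnd = random.Random(seed)
    def play(policy):
        capital = 0
        for _ in range(n_plays):
            if policy() == "A":
                p = 0.5 - eps
            else:
                p = (0.1 - eps) if capital % 3 == 0 else (0.75 - eps)
            capital += 1 if rnd.random() < p else -1
        return capital
    final_a = play(lambda: "A")
    final_b = play(lambda: "B")
    final_ab = play(lambda: rnd.choice("AB"))
    return final_a, final_b, final_ab
```

With a large number of plays, final_a and final_b drift negative while final_ab drifts positive, which is the counterintuitive part.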

Why I Dislike XGBoost and Why I Like XGBoost

First, the title of this blog post is moderately click-bait. I dislike many characteristics of XGBoost but I like some of them too. XGBoost (“extreme gradient boost”) is a huge library of many functions, with hundreds of parameters and possible … Continue reading


Tomek Links for Pruning Imbalanced Data

Imbalanced data occurs when you have machine learning training data with many items of one class and very few items of the other class. For example, some medical data might have many thousands of data items that are “no disease” … Continue reading
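The excerpt describes the setting; the Tomek link idea itself is simple: a pair of opposite-class points that are each other's nearest neighbor. A tiny brute-force O(n^2) sketch (the data and function names are mine, not from the post):

```python
import math

def tomek_links(data, labels):
    # A Tomek link is a pair (i, j) with different labels where
    # i is j's nearest neighbor and j is i's nearest neighbor.
    def nearest(i):
        best, best_d = None, float("inf")
        for j in range(len(data)):
            if j != i and math.dist(data[i], data[j]) < best_d:
                best, best_d = j, math.dist(data[i], data[j])
        return best
    links = []
    for i in range(len(data)):
        j = nearest(i)
        if labels[i] != labels[j] and nearest(j) == i and i < j:
            links.append((i, j))
    return links

data = [(0.0, 0.0), (0.0, 1.0), (4.0, 4.0), (0.2, 0.0), (5.0, 5.0)]
labels = [0, 0, 0, 1, 1]
print(tomek_links(data, labels))  # [(0, 3), (2, 4)]
```

To prune, one would typically delete the majority-class member of each link, which cleans up the class boundary without touching the scarce minority items.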

Researchers Explore Intelligent Sampling of Huge ML Datasets to Reduce Costs and Maintain Model Fairness

I contributed to an article titled “Researchers Explore Intelligent Sampling of Huge ML Datasets to Reduce Costs and Maintain Model Fairness” in the May 2021 edition of the online Pure AI site. See https://pureai.com/articles/2021/05/03/intelligent-ai-sampling.aspx. Researchers devised a new technique to … Continue reading

Neural Networks, Dogs, and JavaScript

I was walking my two dogs, Riley and Kevin, early one wet Pacific Northwest Saturday morning. I enjoy walking and thinking while my dogs do their dog-thing and look for rabbits. My dogs have never caught a rabbit but they’re … Continue reading