Using Reinforcement Learning for Anomaly Detection

I often think about new machine learning ideas in two different but related ways. One approach is to start with a specific, practical problem and then mentally scan my repertoire of ML techniques to see if one of them can solve it. The second approach is to start with the repertoire itself and run thought experiments that combine or modify two or more techniques, to see if the hypothetical new technique can solve some problem.

So, one day while I was walking my dogs, my brain came up with the idea of using a technique called Q-learning, from the field of Reinforcement Learning, to identify anomalies in a dataset.

Q-learning is a clever technique that can be used to find the best path through a maze. Each position in the maze is a State, and each possible move from a given position is an Action. Using an idea called the Bellman equation, you can construct a table that assigns a Q value (“quality”) to every possible Action in every State. Then, to solve the maze from the starting State, you repeatedly take the Action that moves you to the position/State with the highest Q value.

See https://jamesmccaffrey.wordpress.com/2018/10/22/q-learning-using-python/ for a concrete example of Q-learning.
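The heart of the method is a Bellman-based update rule that nudges each Q value toward the reward for a move plus the discounted best Q value of the State the move leads to. Here is a minimal self-contained sketch on a toy five-cell corridor maze (the maze, the rewards, and all parameter values here are made up for illustration; they are not taken from the linked post):

import numpy as np

np.random.seed(0)
n_states = 5                    # corridor positions 0..4; position 4 is the goal
actions = [-1, +1]              # Action 0 = move left, Action 1 = move right
Q = np.zeros((n_states, len(actions)))
gamma, lr, eps = 0.9, 0.5, 0.2  # discount, learning rate, exploration rate

for episode in range(200):
    s = 0                       # every episode starts at the left end
    while s != n_states - 1:
        # epsilon-greedy: mostly exploit the current Q table, sometimes explore
        if np.random.rand() < eps:
            a = np.random.randint(2)
        else:
            a = int(np.argmax(Q[s]))
        s_next = min(max(s + actions[a], 0), n_states - 1)
        # small penalty per step encourages short paths; big reward at the goal
        reward = 1.0 if s_next == n_states - 1 else -0.1
        # Bellman-based Q-learning update:
        # Q(s,a) += lr * (reward + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += lr * (reward + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))  # in each row, the larger Q value points toward the goal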

So my rather strange idea for an anomaly detection technique is the following. Suppose you have a dataset. Each item in the dataset is a State. You want to move through the dataset one item at a time, going from the first data item/State to the last data item/State. Using Q-learning, you construct a Q value for moving from each data item to each other data item. And then . . . drum roll please . . . the data items that have the lowest Q values are the anomalies.

Maybe.

The next step is to do some experiments by actually writing code. Many of my ideas that combine techniques don’t work out. But some do.
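As a first cut, here is a rough sketch of one way such an experiment could be coded. This is just one possible interpretation of the idea: the dataset, the distance-based reward, and all parameter values are hypothetical illustrations, not a finished technique.

import numpy as np

np.random.seed(1)
data = np.random.randn(10, 3)  # ten ordinary 3-dimensional items . . .
data[6] += 8.0                 # . . . with a planted outlier at index 6

n = len(data)
Q = np.zeros((n, n))           # Q[s, a] = quality of moving from item s to item a
np.fill_diagonal(Q, -np.inf)   # moving from an item to itself is not allowed
gamma, lr = 0.5, 0.3           # discount and learning rate

for episode in range(500):
    s = 0
    for _ in range(n):         # a short random walk through the items
        a = np.random.randint(n)
        while a == s:
            a = np.random.randint(n)
        # reward a move by similarity: nearby items give a reward near zero,
        # distant items give a strongly negative reward
        reward = -np.linalg.norm(data[s] - data[a])
        Q[s, a] += lr * (reward + gamma * np.max(Q[a]) - Q[s, a])
        s = a

# an item that every other item is "reluctant" to move to is a candidate anomaly
incoming = Q.max(axis=0)       # best Q value for moving *to* each item
print("anomaly candidate: item", int(np.argmin(incoming)))  # expected: item 6

Whether the Q-learning machinery adds anything beyond a plain nearest-neighbor distance score is exactly the kind of question the experiments would have to answer.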



The Taylor Aerocar is arguably the most successful of the many attempts to combine an automobile and an airplane. Six were designed and built by Moulton Taylor between 1949 and 1960, and one is still flying. Software engineering experiments are much less risky than mechanical engineering experiments.
