“Data Anomaly Detection Using a Neural Autoencoder with C#” in Visual Studio Magazine

I wrote an article titled “Data Anomaly Detection Using a Neural Autoencoder with C#” in the April 2024 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/Articles/2024/04/15/data-anomaly-detection.aspx.

Data anomaly detection is the process of examining a set of source data to find data items that are different in some way from the majority of the source items. My article explains how to use a neural autoencoder implemented using raw C# to find anomalous data items.

My demo program uses a synthetic dataset that has 240 items. The raw data looks like:

F  24  michigan  29500.00  liberal
M  39  oklahoma  51200.00  moderate
F  63  nebraska  75800.00  conservative
M  36  michigan  44500.00  moderate
F  27  nebraska  28600.00  liberal
. . .

Each line of data represents a person. The fields are sex (male, female), age, state (Michigan, Nebraska, Oklahoma), income, and political leaning (conservative, moderate, liberal).

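A neural autoencoder is trained to reproduce its own input, so the items it reproduces poorly are the candidates for anomalies. In the demo, the data item that has the largest reconstruction error is (M, 36, nebraska, $53000.00, liberal), which has the encoded and normalized form (0.00000, 0.36000, 0.00000, 1.00000, 0.00000, 0.53000, 0.00000, 0.00000, 1.00000).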
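To make the encoded and normalized form concrete, here is a minimal sketch, written in Python rather than the C# used in the article. The function name encode() is hypothetical. The scheme is inferred from the single encoded item shown above: M = 0, age divided by 100, nebraska = (0, 1, 0), income divided by 100,000, and liberal = (0, 0, 1). The value F = 1 and the positions of the other states and political leanings are my assumptions.

```python
# Category orders are assumptions, except nebraska (2nd) and liberal (3rd),
# which are fixed by the encoded item shown in the text.
STATES = ["michigan", "nebraska", "oklahoma"]
POLITICS = ["conservative", "moderate", "liberal"]

def encode(sex, age, state, income, politics):
    """Encode one raw data item as a list of 9 values in roughly [0, 1]."""
    sex_val = 0.0 if sex == "M" else 1.0  # M = 0 from the text; F = 1 is assumed
    state_vec = [1.0 if s == state else 0.0 for s in STATES]      # one-hot
    pol_vec = [1.0 if p == politics else 0.0 for p in POLITICS]   # one-hot
    return [sex_val, age / 100.0] + state_vec + [income / 100_000.0] + pol_vec

item = encode("M", 36, "nebraska", 53000.00, "liberal")
print(item)  # [0.0, 0.36, 0.0, 1.0, 0.0, 0.53, 0.0, 0.0, 1.0]
```

Dividing age by 100 and income by 100,000 keeps the numeric fields in about the same range as the 0-or-1 encoded fields, so no single field dominates the error.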

The predicted output is (-0.00122, 0.40366, -0.00134, 0.99657, 0.00477, 0.49658, 0.01607, -0.01048, 0.99440). This indicates that the anomalous data item has an age value that’s a bit too small (actual 36 versus a predicted 40) and an income value that’s a bit too large (actual $53,000 versus a predicted $49,658).
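The comparison between the actual and predicted vectors can be sketched in a few lines of Python (the article itself uses C#). This sketch assumes reconstruction error is the sum of squared differences between the two vectors; the article may use a different measure, such as Euclidean distance or mean squared error. The two vectors are copied from the text above, and the helper name squared_error() is hypothetical.

```python
# Encoded input and the autoencoder's predicted output, copied from the text.
actual = [0.00000, 0.36000, 0.00000, 1.00000, 0.00000,
          0.53000, 0.00000, 0.00000, 1.00000]
predicted = [-0.00122, 0.40366, -0.00134, 0.99657, 0.00477,
             0.49658, 0.01607, -0.01048, 0.99440]

def squared_error(x, y):
    """Sum of squared differences (an assumed error measure)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

error = squared_error(actual, predicted)

# Find the two components that contribute most to the error.
diffs = [a - p for a, p in zip(actual, predicted)]
worst = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]), reverse=True)[:2]

# Undo the normalization on the predicted age and income.
pred_age = predicted[1] * 100.0
pred_income = predicted[5] * 100_000.0

print(f"reconstruction error = {error:.6f}")
print(f"largest differences at indices {worst}")  # [1, 5] = age, income
print(f"predicted age = {pred_age:.1f}, predicted income = {pred_income:.0f}")
```

The two largest differences are at index 1 (age) and index 5 (income), which matches the interpretation that age is a bit too small and income is a bit too large for this item.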

The neural autoencoder anomaly detection technique presented in the article is just one of many ways to look for data anomalies. The technique assumes you are working with tabular data, such as log files. Image data, time series data, and natural language data all require more specialized techniques.



In many science fiction movies, acting intelligent is anomalous behavior.

Left: In “Deep Blue Sea” (1999), scientists sedate a super-intelligent, genetically enhanced shark. Choice A = Leave it alone. Choice B = Go poke it to see if it’s really sedated or just pretending.

Center: In “Alien” (1979), a space crew finds an abandoned alien ship with a cargo full of creepy, menacing egg-like pods. Choice A = Get away quickly. Choice B = Go poke one, and when it slowly opens, stick your helmet with an incredibly fragile glass faceplate directly in front of the pod.

Right: In “Life” (2017), a space station crew retrieves a Mars probe that carries an unknown life form. Choice A = Assume it might be dangerous, keep it isolated, and leave it alone until it can be transferred to a secure facility. Choice B = Assume it’s friendly, give it a cute name, and poke it with your hand covered only by a cheap plastic glove.

