“Microsoft Researchers Use Neural Transformer Architecture for Cybersecurity” on the Pure AI Web Site

I contributed to an article titled “Microsoft Researchers Use Neural Transformer Architecture for Cybersecurity” in the October 2022 edition of the Pure AI web site. See https://pureai.com/articles/2022/10/03/neural-transformer-architecture.aspx.

The article describes two new techniques for cybersecurity that use deep neural transformer architecture. The first system computes the similarity of two datasets. The second system identifies anomalous items in a dataset containing mixed numeric and non-numeric data.

Previous research has explored dataset similarity and anomaly detection using standard deep neural multi-layer perceptron architecture. The new systems use transformer architecture, which is a significantly more sophisticated approach. A loose analogy is old vacuum tube technology versus transistor technology in electronics: both can be used to build a radio, but transistor technology is more powerful.

Knowing the similarity between two datasets can be useful in several cybersecurity scenarios. One example is comparing two machine learning training datasets to determine if one has been compromised by a so-called poisoning attack. Another example is comparing two output datasets to determine if a prediction model has been compromised in some way.
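
The article doesn’t spell out the similarity algorithm, so the sketch below is just one plausible approach, written in PyTorch under my own assumptions: the encoder dimensions, the use of nn.TransformerEncoder, and the cosine-similarity comparison are all illustrative rather than taken from the research. The idea is to map every item of each dataset to a latent vector with a shared transformer encoder, then compare summary statistics of the two sets of latent vectors. In practice the encoder would first be trained on representative data; here it is left untrained purely to show the mechanics.

import torch
import torch.nn as nn

d_model, n_cols = 16, 9  # assumed latent size and number of columns
embed = nn.Linear(1, d_model)  # lift each column value to a d_model token
enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=2,
                                       batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=2)

def dataset_signature(ds):
    # mean latent vector over all items and columns, shape (d_model,)
    with torch.no_grad():
        z = encoder(embed(ds.unsqueeze(-1)))  # (n_items, n_cols, d_model)
        return z.mean(dim=(0, 1))

ds_a = torch.randn(200, n_cols)        # placeholder dataset A
ds_b = torch.randn(200, n_cols) + 0.5  # placeholder dataset B, shifted
sim = torch.cosine_similarity(dataset_signature(ds_a),
                              dataset_signature(ds_b), dim=0)
print(f"similarity = {sim.item():0.4f}")

A similarity score near 1.0 suggests the two datasets look alike to the encoder; a noticeably lower score would flag something like a poisoning attack or a compromised prediction model for closer inspection.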

The anomaly detection system scans each item in the source dataset and uses a TransformerEncoder component to generate a condensed latent representation of the item. The latent representation is then expanded back to a reconstruction of the item’s nine values. Data items with a large reconstruction error don’t fit the transformer architecture (TA) model and are likely anomalous in some way.
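
Here is a minimal sketch of that reconstruction-error idea, again in PyTorch. The article gives no implementation details, so the class name, layer sizes, and training loop are my assumptions, and I assume any non-numeric columns have already been encoded as numbers; only the overall pattern — encode with nn.TransformerEncoder, expand back to nine values, rank items by reconstruction error — follows the description above.

import torch
import torch.nn as nn

class TransformerAutoencoder(nn.Module):
    def __init__(self, n_cols=9, d_model=16, n_heads=2, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)  # each column value becomes a token
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.decoder = nn.Linear(d_model, 1)  # expand latent back to values

    def forward(self, x):  # x: (batch, 9)
        z = self.encoder(self.embed(x.unsqueeze(-1)))  # condensed latent rep
        return self.decoder(z).squeeze(-1)             # (batch, 9) reconstruction

model = TransformerAutoencoder()
data = torch.randn(100, 9)  # placeholder for the (numerically encoded) dataset

# train the autoencoder to reconstruct the source data
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(100):
    opt.zero_grad()
    loss = torch.mean((data - model(data)) ** 2)
    loss.backward()
    opt.step()

# items with the largest reconstruction error are flagged as anomalous
with torch.no_grad():
    err = torch.mean((data - model(data)) ** 2, dim=1)
print("most anomalous item index:", err.argmax().item())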

I was quoted in the article:

McCaffrey added, “It’s somewhat surprising that transformer-based systems often perform better than standard neural-based systems, even when there is no explicit sequential data. We don’t completely understand exactly why transformer systems work so well, and this is an active area of research.”

McCaffrey further observed, “There is no single silver bullet technique for cybersecurity. Systems based on transformer architecture appear to be a promising new class of techniques.” But he cautioned, “Because transformer architecture is so complex, it’s important to do a cost-benefit analysis before starting up a TA project.”



Several of Charles Dickens’s novels feature the transformation of a boy into a man. Here are three of my favorite movie adaptations. Left: In “Oliver Twist” (1948), Oliver meets the ambiguous criminal Fagin and the Artful Dodger. Center: In “David Copperfield” (1935), David has many adventures and is helped by the kindly Mr. Micawber. Right: In “Great Expectations” (1946), Pip meets the escaped convict Abel Magwitch, who secretly becomes Pip’s benefactor.

