A Few Observations About the Fisher-Yates Shuffle Algorithm

Shuffling an array into random order is a common task in machine learning algorithms. For example, if an array holds array-index values such as idxs = [0, 1, 2, 3, . . ] then shuffling the idxs array gives you a way to iterate through training data in random order.

The Python language NumPy library has a built-in numpy.random.shuffle() function. But there are times when you want to implement a custom shuffle() function, and some programming languages don’t have a built-in shuffle() function.

The usual algorithm to shuffle the contents of an array is called the Fisher-Yates shuffle, or sometimes the Knuth shuffle. There are several different variations. The variation I prefer, implemented in Python, is:

def shuffle_basic(arr, rndobj):
  # last i iteration not necessary
  n = len(arr)
  for i in range(n):  # 0 to n-1 inclusive
    j = rndobj.randint(i, n)  # i to n-1 inclusive
    tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp

I iterate forward, i ranging from 0 to n-1 inclusive. This is not the most common technique. The last iteration will always swap the value in the last cell with itself. Therefore, it’s more common to write for i in range(n-1). I don’t mind the extra iteration because 1.) if an array has just one cell, range(n-1) will be out-of-range for some programming languages (not Python), and 2.) the for-loop range and the randint() range are the same which has a pleasant symmetry.

Somewhat weirdly, the Wikipedia article on Fisher-Yates gives a basic algorithm that iterates backwards with the i variable in the outer loop. This baffles me — I can think of no reason to awkwardly iterate backwards when a forward iteration works just fine.

Additionally, I prefer to pass in a local random_object rather than use the global NumPy random_object. This makes reproducibility much easier because no other functions are modifying the random_object.
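
One detail worth a sketch: the shuffle_basic() function calls rndobj.randint(i, n) with an upper bound that is exclusive, which is how a NumPy RandomState object behaves. Python's standard random.Random.randint() includes its upper bound, so passing that kind of object to shuffle_basic() could index past the end of the array. The variant below is an assumption-laden sketch, not the original demo: it uses a local standard-library random.Random object and randrange(), and the names shuffle_stdlib and shuffled_indices are made up for illustration.

```python
import random

def shuffle_stdlib(arr, rnd):
  # rnd is a local random.Random object, not the global generator
  n = len(arr)
  for i in range(n):           # 0 to n-1 inclusive
    j = rnd.randrange(i, n)    # i to n-1 inclusive (upper bound excluded)
    arr[i], arr[j] = arr[j], arr[i]

def shuffled_indices(n, seed):
  # a fresh local generator per call makes the result reproducible
  rnd = random.Random(seed)
  idxs = list(range(n))
  shuffle_stdlib(idxs, rnd)
  return idxs

print(shuffled_indices(10, seed=1))
print(shuffled_indices(10, seed=1))  # same seed, same order
```

Because the generator is created locally from a seed, no other code can advance its state between calls.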

It’s well known that it’s easy to mess up the Fisher-Yates shuffle algorithm. Specifically:

def shuffle_bad(arr, rndobj):
  # biased: j is drawn from the whole array on every iteration
  n = len(arr)
  for i in range(n):  # 0 to n-1 inclusive
    j = rndobj.randint(0, n)  # 0 to n-1 is wrong
    tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp

The call to the randint() function looks correct, and the algorithm does spit out random-looking arrangements. However, this version is biased towards certain permutations. Specifically, if arr = [0,1,2], then the first, fifth and sixth arrangements [0,1,2], [2,0,1], [2,1,0] are slightly less likely (4/27 = 0.148 each) than the second, third and fourth arrangements [0,2,1], [1,0,2], [1,2,0] (5/27 = 0.185 each).

I implemented the bad version of Fisher-Yates and ran it to shuffle [0, 1, 2] 100,000 times. If all six permutations were equally likely, you’d get 1/6 = 0.1667 of each. Strangely, in my demo I got [0.1665 0.1673 0.1641 0.1676 0.1682 0.1664], which is close to uniform. This is not at all what theory predicts and I’m not sure what’s going on. I know from a lifetime of experience that this problem is now in the back of my mind and will stay there, nagging at me, until I solve it. But I will.
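
The theoretical bias can be checked without any random numbers. For n = 3 the bad version makes three draws of j, each from {0, 1, 2}, so there are exactly 27 equally likely draw sequences. The sketch below (the helper name shuffle_bad_with is made up for illustration) replays every sequence starting from [0, 1, 2] and counts the resulting arrangements.

```python
from itertools import product
from collections import Counter

def shuffle_bad_with(arr, js):
  # same swaps as shuffle_bad(), but the "random" j values are supplied
  a = list(arr)
  for i, j in enumerate(js):
    a[i], a[j] = a[j], a[i]
  return tuple(a)

n = 3
counts = Counter(shuffle_bad_with(range(n), js)
                 for js in product(range(n), repeat=n))

for perm in sorted(counts):
  print(perm, counts[perm], "/ 27")
# (0, 1, 2) 4 / 27
# (0, 2, 1) 5 / 27
# (1, 0, 2) 5 / 27
# (1, 2, 0) 5 / 27
# (2, 0, 1) 4 / 27
# (2, 1, 0) 4 / 27
```

These counts assume every trial starts from a fresh [0, 1, 2]. One possible explanation for near-uniform empirical frequencies, offered only as a guess because the demo code isn't shown here, is that the same array was reshuffled repeatedly without being reset. In that case each trial applies a random permutation to the previous result, and the long-run frequencies drift toward 1/6 each even though a single bad shuffle is biased.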

Left: Synchronized shuffle dancing. Center: The Shuffle Inn arcade bowling game. Right: Shuffling playing cards.

Posted in Machine Learning

Example of Calculating the Energy Distance Between Two Samples

I stumbled across an interesting idea called energy distance. Energy distance is a number that is a measure of the distance between two distributions. There are many other ways to define the distance between two distributions. The Kullback-Leibler divergence is one example.

Suppose you have two distributions, X and Y, where each item is a vector with 4 values (so dim = 4). You draw n = 3 samples from the first distribution and m = 2 samples from the second distribution. If X and Y are:

X =
 [0.1  0.5  0.3  0.1 ]
 [0.3  0.6  0.0  0.1 ]
 [0.0  0.8  0.05 0.15]

Y =
 [0.1  0.3  0.2  0.4 ]
 [0.05 0.25 0.4  0.3 ]

then the energy distance between X and Y is 0.8134 — maybe (see below).

I was motivated to explore the idea of energy distance because I did some Internet searches for examples and found literally no examples. Lack of information such as this always intrigues me.

The Wikipedia page on energy distance feels like it was written by the inventor. It wasn’t much help for me as an implementation guide.

The inventor of the idea of energy distance is a mathematician named G.J. Szekely. The few research papers I found were all written by him, and the Wikipedia article on energy distance looks like it was entirely written by him too. Somewhat of a red flag.

If you have a sample X and a sample Y, you must compute Euclidean distances (or any other vector distance measure) between all pairs of X items, distances between all pairs of Y items, and distances between all pairs of X and Y items. Then you compute the average distance between all X items, the average distance between all Y items, and the average distance between all X-Y pairs. Then energy distance is:

sqrt[ (2 * avg_xy) - avg_xx - avg_yy ]

At least I think this is how energy distance is computed, based on the research papers I read. Notice that energy distance won’t scale well to large datasets because of the huge number of potential distance calculations.
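
Here is a minimal sketch of that formula in plain Python, using the X and Y samples shown above. It is one reading of the description, not a reference implementation: each average is taken over all ordered pairs, including an item paired with itself (distance 0), and the function names avg_dist and energy_distance are made up for illustration.

```python
import math

def avg_dist(A, B):
  # average Euclidean distance over all ordered pairs (a, b)
  total = 0.0
  for a in A:
    for b in B:
      total += math.dist(a, b)
  return total / (len(A) * len(B))

def energy_distance(X, Y):
  avg_xy = avg_dist(X, Y)
  avg_xx = avg_dist(X, X)  # includes the zero self-distances
  avg_yy = avg_dist(Y, Y)
  # max() guards against a tiny negative value from rounding error
  return math.sqrt(max(0.0, 2 * avg_xy - avg_xx - avg_yy))

X = [[0.1, 0.5, 0.3, 0.1],
     [0.3, 0.6, 0.0, 0.1],
     [0.0, 0.8, 0.05, 0.15]]
Y = [[0.1, 0.3, 0.2, 0.4],
     [0.05, 0.25, 0.4, 0.3]]

print(round(energy_distance(X, Y), 4))  # 0.8134
```

With this averaging convention the result agrees with the 0.8134 value quoted above. Averaging only over distinct pairs within X and within Y would give a different number.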

I concluded that energy distance is perhaps somewhat of a vanity project — an interesting idea but one that doesn’t appeal to anyone other than the inventor. Many times, new ideas are useful and valid, but they don’t provide a big enough advantage over existing techniques. Maybe energy distance could be useful, but the research papers are written in a style that only deep experts can understand — and so nobody will take the ideas behind energy distance and popularize them for data scientists.

I coded up a demo using Python. But my demo could be quite wrong because I had very little to go on. (Note: my demo is not efficient in the sense that it computes both dist(x,y) and dist(y,x), which are the same for Euclidean distance.)

Vanity Fair Magazine started publication in 1913. The magazine content was folded into Vogue Magazine for several years. Left: The June 1914 issue, just a few days before the start of World War I in July. Center: The July 1929 issue, just a few days before the start of the Great Depression in August 1929. Right: The September 1941 issue, just a few weeks before the U.S. entry into World War II on December 7, 1941.

It seems like people have a remarkable ability to overcome adversity and bounce back stronger than ever with renewed energy.

Posted in Miscellaneous

Great Technology in Vegas That Fails Because It’s Too Good

I noticed an example where great technology is rendered useless because of a lack of understanding of human psychology.

My story takes place in Las Vegas. I was there speaking at a conference. Before I explain, it’s important to know that I grew up playing cribbage with my father, poker with my high school friends, and in college I learned how to count cards while playing Blackjack.

I greatly enjoy playing poker and Blackjack because of the mathematical decisions involved, not because of the gambling aspect. On the other hand, mentally passive games, like roulette, slot machines and craps, bore the heck out of me.

OK, so one evening, after the Vegas conference sessions had all finished, I was wandering around the casino area of the Aria hotel. I observed many people playing roulette, Blackjack, craps, the Big Wheel, Pai Gow Poker, baccarat, and other games. They were playing in the traditional way, with several players at a table and a dealer (or dealers in the case of craps).

Left: Why aren’t systems like this more popular? I think I found out why. Right: The user screen.

Then I noticed a separate area with about 16 chairs. Each chair had a tabletop console. In front of the chairs there were five large monitors, about 6 feet tall. Players can sit down and play one of five games — Blackjack, craps, roulette, baccarat, or Big Wheel. But there were only a couple of people playing this big computer system.

The idea seems to make sense — you don’t have to worry about annoying players, and the casino will save huge amounts of money by not having to pay dealers.

So I sat down and played Blackjack on the system. And I was completely bored after five minutes, and I left.

The next day I gave the incident some thought. Why is it that I like to play traditional Blackjack with a human dealer, but the exact same game with a computer dealer bored the heck out of me?

I concluded the computer system is too efficient and not human enough.

Left: The large monitors are supposed to attract players to the system. Right: I laughed when I saw a BSOD on one console. I didn’t know that Microsoft Windows powers part of Sin City.

When you play Blackjack with a human dealer, every hand is dealt a bit differently in terms of speed, and the way the dealer tosses or lays out your cards. Furthermore, when playing with humans, there are all kinds of minor activities going on — a new player joins the table, the cocktail waitress comes by and asks if anyone wants a drink, the dealer shuffles the deck of cards up, people playing at a nearby table shout for joy in unison, and so on.

Playing with a computer system is insanely repetitive and my brain went to mush very quickly from the lack of variety and lack of stimuli.

So, all this is just an observation.

But if I were the designer of a computer gambling system, I would program all kinds of deliberate but minor actions, such as varying speed and unexpected little popups, to make the system more human and less perfect.

Four dresses made from playing cards. Very clever!

Posted in Miscellaneous

Strange Results Using Isolation Forest Anomaly Detection

I was exploring anomaly detection using isolation forests and got some strange results that make me skeptical of the technique. An isolation forest takes some data, then repeatedly constructs binary trees where the split at each branch is random, based on a single column/feature. The idea is that unusual values in a column will be branched quickly and therefore data items with unusual values will be close to the tree root. If you do this repeatedly and track the average depth of an item, items with small average depth are anomalous.
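
To make the depth idea concrete, here is a deliberately simplified sketch in plain Python. It is not the scikit-learn implementation: it skips subsampling and the usual depth normalization, it only follows the branch that contains the item of interest, and the function names and the tiny one-column dataset are made up for illustration.

```python
import random

def isolation_depth(item, rows, rnd, depth=0, max_depth=12):
  # stop when the item is alone or the depth limit is reached
  if len(rows) <= 1 or depth >= max_depth:
    return depth
  # only columns with more than one distinct value can be split
  cols = [c for c in range(len(item))
          if min(r[c] for r in rows) < max(r[c] for r in rows)]
  if not cols:
    return depth
  c = rnd.choice(cols)                # random feature
  lo = min(r[c] for r in rows)
  hi = max(r[c] for r in rows)
  split = rnd.uniform(lo, hi)         # random split value
  # keep only the rows on the same side of the split as the item
  if item[c] < split:
    same_side = [r for r in rows if r[c] < split]
  else:
    same_side = [r for r in rows if r[c] >= split]
  return isolation_depth(item, same_side, rnd, depth + 1, max_depth)

def average_depth(item, rows, n_trees=200, seed=0):
  rnd = random.Random(seed)
  total = sum(isolation_depth(item, rows, rnd) for _ in range(n_trees))
  return total / n_trees

rows = [[0.0], [0.1], [0.2], [0.3], [10.0]]  # made-up data; 10.0 is the outlier
depths = [average_depth(r, rows) for r in rows]
print(depths)  # the last item should have the smallest average depth
```

Because every split looks at a single column, an item is isolated quickly only when its value is extreme in at least one column.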

Left: baseline data. Right: One item changed to a young person who makes a lot of money.

In my mind I wondered how this algorithm could deal with a situation where a data item has two features, and each feature value is common but the combination of the two is very unusual. For example, suppose you have a bunch of people data with sex, age, income. The 20-29 year olds all have income in the $30,000s. The 50-59 year olds all have income in the $60,000s. But if you see a 20-year old with an income of $62,000, that person should definitely be flagged as an extreme anomaly.

I created such a 20-item dummy data file and ran it through the isolation forest implementation in the scikit-learn code library. With all normal data, weirdly, the isolation forest flagged four items as normal (+1) and the other 16 items as anomalies (-1). Hmm. Strange.

Then I changed one data item from (0, 0.22, 37), meaning a male, 22-year old who makes $37,000, to (0, 0.22, 62), so the same person now makes $62,000, which is far more than any other person in their 20s. With the modified data, the isolation forest got quite different results that didn’t make much sense. Although the high-earning 22-year old was flagged as anomalous, it wasn’t flagged as being as anomalous as a 20-year old who makes $39,000 or the 59-year old who makes $60,000.

It looks as though the isolation forest is only looking at extreme values in individual features, and isn’t finding anomalies that result from interactions between features. For sure, there are many isolation forest parameters that I didn’t explore, and my dummy dataset was tiny, but still, the isolation forest anomaly detection results were strange.

Superheros and mutants are anomalous human beings. I’m not really a fan of superhero and mutant movies, but here are three clever spoofs that I enjoyed watching.

Left: “Mystery Men” (1999) featured a collection of third-rate superheroes including The Shoveler who is very competent at shoveling dirt, Mr. Furious who has the ability to get very angry, and The Blue Raja who can throw forks and spoons with great accuracy. Very funny movie.

Center: “Supervized” (2019) was set in a retirement home for superheros. The retirees included Pendle, Shimmy, and Ray. An OK film but not as good as the other two I show here.

Right: “Superhero Movie” (2008) mostly spoofs Spiderman but also has references to X-Men, the Fantastic Four and other movies. An uneven movie but some parts of it are hilarious.

Here’s the demo code with embedded data:

# iforest_weak_demo.py

import numpy as np
from sklearn.ensemble import IsolationForest

def main():
  print("\nBegin scikit Isolation Forest demo ")

  data = np.array([
    # [0,0.22,37],
    [1,0.59,60]], dtype=np.float32)

  np.set_printoptions(precision=4, suppress=True, linewidth=100)
  print("\nData: ")
  print(data)

  iso_for = IsolationForest()
  model = iso_for.fit(data)
  print("\nPredictions: ")
  predictions = model.predict(data)  # +1 = normal, -1 = anomaly
  print(predictions)

  np.set_printoptions(precision=3, suppress=True, linewidth=40)
  print("\nAnomaly scores: ")
  scores = iso_for.score_samples(data)  # lower = more anomalous
  print(scores)

  print("\nEnd demo ")

if __name__ == "__main__":
  main()

Posted in Machine Learning

A Comparison of Symmetric Kullback-Leibler, Jensen-Shannon, and Hellinger Distances

A surprisingly common task in machine learning is to compute the distance between (or similarity of) two probability or frequency distributions. For example, suppose P = (0.20, 0.50, 0.30) and Q = (0.10, 0.30, 0.60). What is the distance between P and Q?

There are many ways to compute a distance. Four of the most common are Kullback-Leibler (KL), symmetric Kullback-Leibler (sKL), Jensen-Shannon (JS), and Hellinger (H).

Note: Another common distance is the Wasserstein distance, also known as the Earth Mover distance, but it’s more difficult to compute so I don’t compare it with KL, JS, H. Theoretically, Wasserstein is supposed to be superior to KL, JS, and H distances. I intend to figure out how to implement Wasserstein distance from scratch, rather than using the scipy library wasserstein_distance() function, and then I’ll compare Wasserstein distance with KL, JS, and H distances.

Four probability distributions for my investigations.

After I learned how to compute these distances, the next question was, “Which one is best under which circumstances?” There is no good information available on the Internet. But here are my observations.

Regular KL is not good because KL(P,Q) != KL(Q,P) — it’s not symmetric. There are several ways to make a symmetric KL. The simplest is to define sKL(P,Q) = KL(P,Q) + KL(Q,P) — just sum both directions. However, none of the simple symmetric versions of KL satisfy the triangle inequality (TI).

Top: equation for Kullback-Leibler distance. Middle: Jensen-Shannon. Bottom: Hellinger.

For a distance function D and distributions P, Q, R, if D(P,R) <= D(P,Q) + D(Q,R) then the distance function D satisfies the triangle inequality, which is a nice characteristic.

The Jensen-Shannon and Hellinger distance functions satisfy three nice properties that qualify them as formal “metrics” (as opposed to the general meaning of a number that measures something).

1. D(P,Q) >= 0 and D(P,Q) == 0 IFF P == Q
2. D(P,Q) == D(Q,P)
3. D(P,R) <= D(P,Q) + D(Q,R)

I wrote a short demo program to look at sKL, JS, and H distance for a few sample distributions. I was surprised that JS and H gave very similar results. The fact that KL does not satisfy the triangle inequality is a negative feature, so, in short, I judge JS and H distance preferable to KL and symmetric KL for most scenarios.

From an engineering perspective, JS(P,Q) has a minor downside that a naive computation will fail if you have paired values that sum to 0.0. For example, P = [0.20, 0.00, 0.80] and Q = [0.40, 0.00, 0.60] will invoke a log(0.0) which is negative infinity. Hellinger distance doesn't have this kind of minor engineering glitch to watch for.
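
The three distances can be sketched in a few lines of plain Python. This is a minimal version under stated assumptions, not the demo program mentioned above: it uses natural logs, treats any term with a 0.0 probability in the first argument as contributing 0 (the usual convention), and takes JS distance to be the square root of the JS divergence. The function names are made up for illustration.

```python
import math

def kl(p, q):
  # terms where p[i] == 0 contribute 0 by convention;
  # a q[i] of 0 where p[i] > 0 still raises ZeroDivisionError
  return sum(pi * math.log(pi / qi)
             for pi, qi in zip(p, q) if pi > 0.0)

def sym_kl(p, q):
  return kl(p, q) + kl(q, p)

def js_distance(p, q):
  # m[i] is 0 only when p[i] and q[i] are both 0, and those
  # terms are skipped inside kl(), so no log(0.0) occurs
  m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
  div = 0.5 * kl(p, m) + 0.5 * kl(q, m)
  return math.sqrt(max(0.0, div))

def hellinger(p, q):
  s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
          for pi, qi in zip(p, q))
  return math.sqrt(s) / math.sqrt(2.0)

P = [0.20, 0.50, 0.30]
Q = [0.10, 0.30, 0.60]
print("sKL =", sym_kl(P, Q))
print("JS  =", js_distance(P, Q))
print("H   =", hellinger(P, Q))

# the paired-zeros case from the paragraph above
P2 = [0.20, 0.00, 0.80]
Q2 = [0.40, 0.00, 0.60]
print("JS with paired zeros =", js_distance(P2, Q2))
```

Skipping the zero-probability terms is a design choice; another common approach is to add a tiny constant to every value before taking logs.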

Ultimately, if one distance function was clearly best, there would only be one distance function. The fact that there are so many distance functions suggests that either 1.) all work pretty much the same, or 2.) different functions work best for different problems, or 3.) both.

So, there is no definitive answer to, “Which distance function is best?” If you have a choice, Hellinger distance is appealing, but Wasserstein distance might be better (TBD based on some future experiments).

The artist who identifies himself as “batjorge” describes himself as an “iterative fractalist”. Whoever or whatever he is, he creates beautiful digital images where the distance between reality and art is just enough to make the images appealing to me.

Posted in Machine Learning

Examples of Python Colormaps for Surface Plots

I was sitting in an airport recently, waiting to board a plane to fly to a conference. I wanted to make use of the time so I decided to review the different colormaps for use with Python matplotlib 3D surface plots. A colormap is a collection of colors that will be applied automatically to a surface plot, where the color depends on the value of the plot.

There are dozens and dozens of colormaps. I usually use “hsv” (Hue, Saturation, Value) or “jet”, which range from red to orange to yellow to green to blue. But for some plots, different colormaps give a better visualization; it’s all very subjective. If you omit an argument value for the cmap parameter, you get shades of solid blue-gray, which isn’t very nice for most plots.

A good reference for colormaps is the Web page at matplotlib.org/stable/tutorials/colors/colormaps.html. That page comments that the hsv colormap is not recommended for some visualizations, but I mildly disagree with that opinion.

For my demo surface function I used the simple sphere function, f(x,y) = x^2 + y^2.

Seven of the colormaps I use most often. The last tab20 colormap is not a smooth gradient of colors and so the coloring of the surface is discrete rather than continuous.

Large polyp stony corals are just that — stony-calcium bases with relatively large polyps (the flower-looking things). For some reason, corals have always frightened me somewhat — they look like man-eating plants that sting. But they’re pretty. Left: Red and green Blastomussa Wellsi. Center: Tricolor Gonistrea. Right: Orange tubastrea.

Here’s the demo code:

# colormaps_demos.py

from matplotlib import cm  # color map
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-2, 2, 100)    
Y = np.linspace(-1, 2, 100)    
X, Y = np.meshgrid(X, Y)
Z = X**2 + Y**2  # "sphere" function

fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent matplotlib versions
ax = fig.add_subplot(projection='3d')

# change the value of the "cmap" parameter
surf = ax.plot_surface(X, Y, Z,
  rstride=1, cstride=1, cmap=cm.hsv,
  edgecolor='darkred', linewidth=0.1)

# plt.savefig('sphere.jpg')
plt.show()

Posted in Miscellaneous

Data Entry For Machine Learning Is No Fun At All

I’m in the process of preparing my Zoltar system that predicts the outcomes of NFL football games. I’ve done Zoltar for several years. Every year I try new algorithms and techniques. This year I decided to write two little data entry programs: one for entering Las Vegas point spread information, and one for entering game results information.

In previous years I’d just enter data into text files in a more or less ad hoc way. Getting NFL point spread data and result data is surprisingly difficult — there is no standard source and so I end up manually entering data from various Web sites. It’s not fun.

My idea is simple. I start with schedule data. I iterate through each scheduled game for a specified week and prompt for either point spread or game result information. The process is still a bit tedious but it’s much better than entering data from scratch.

There are two morals to this story. First, most of machine learning starts with data, and sometimes data is not easy to get. Second, in computer science, there are times when it’s important to be very clever and come up with smart ideas, but there are times when you just have to grit your teeth and hard-wire logic and data into programs.

Manual hard-wired data entry may not be fun, but there are worse scenarios. It’s safe to assume these three drivers did not have the best possible hard-wired experience.

# make_result_data.py

# target: (winner listed first) - no "@" means home team won
# 1,Thu,September 10,8:20PM,Kansas City Chiefs,,Houston Texans,
#   boxscore,34,20,377,0,371,1
# 1,Sun,September 13,1:00PM,Seattle Seahawks,@,Atlanta Falcons,
#  boxscore,38,25,406,0,522,2

# from: (visitor listed first)
# //
# 1,Thu,September 9,Dallas Cowboys,,@,Tampa Bay Buccaneers
#  ,,8:20 PM
# 1,Mon,September 13,Baltimore Ravens,,@,Las Vegas Raiders
#  ,,8:15 PM

# -------------------------------------------------------------

# print("\nEnter visitor score first then home score ")
week = input("Enter week number: ")      # as string
result = ""                              # big result string
f = open("ScheduleData2021.txt", "r")
for line in f:
  line = line.strip()
  tokens = line.split(",")
  if tokens[0] != week:
    continue  # skip comments and games in other weeks

  day = tokens[1]  # Mon
  month_date = tokens[2]  # September 13
  # month = month_date.split(" ")[0]  # September
  # date = month_date.split(" ")[1]   # 13
  visitor = tokens[3]
  home = tokens[6]
  time = tokens[8].split(" ")[0]    # 8:15
  ampm = tokens[8].split(" ")[1]    # PM

  print("Enter score for visitor team " + visitor + \
    " : ", end = "")
  visitor_score = int(input())
  print("Enter score for home team " + home + " : ", end = "")
  home_score = int(input())
  s = week + ","             # 1
  s += day + ","             # Sun
  s += month_date + ","      # September 13
  s += time + ampm + ","     # 1:00PM

  if home_score >= visitor_score:  # home team won game (or tie)
    s += home + ",,"
    s += visitor + ","
    s += "boxscore" + ","
    s += str(home_score) + ","
    s += str(visitor_score) + ","
    s += "-1,-1,-1,-1" + "\n"
  else:  # visitor team won game
    s += visitor + ",@,"
    s += home + ","
    s += "boxscore" + ","
    s += str(visitor_score) + ","
    s += str(home_score) + ","
    s += "-1,-1,-1,-1" + "\n"

  result += s

f.close()
print(result)  # display the accumulated result lines

# -------------------------------------------------------------
Posted in Zoltar

Dataset Similarity Using Autoencoded Jensen-Shannon Distance

A problem that pops up over and over in machine learning and data science scenarios is the need to compute the similarity (or nearly equivalently, difference or distance) between two datasets. At first thought, this doesn’t seem difficult, but the problem is extremely difficult. Briefly, if you try to compare individual lines between datasets, you hit the combinatorial explosion problem: there are just too many comparisons. Additionally, there are the problems of dealing with different dataset sizes, and dealing with non-numeric data. The bottom line is that there is no simple way to calculate the similarity/distance between two datasets.

Several months ago, I came up with an interesting technique for the distance between datasets P and Q. First, transform each data line of P and Q into an embedding vector representation using a deep neural autoencoder. For example, if you set the embedding dim to 3, a line of data like (male, 34, engineer, $72,000.00, Fullerton, 0.7717) might be transformed into a vector like [0.3256, 0.8911, 0.7936].

Next, each dataset representation determines a frequency distribution for each latent variable. For example:

P 1: [0.10, 0.00, 0.05, 0.05, 0.20, 0.10, 0.00, 0.30, 0.10, 0.10]
Q 1: [0.05, 0.10, 0.00, 0.15, 0.20, 0.10, 0.20, 0.10, 0.00, 0.10]

This means that for latent variable 1 (of the three), in dataset P, 10% of the data items are between 0.00 and 0.10, 0% are between 0.10 and 0.20, 5% are between 0.20 and 0.30, and so on.

In my original thought, I figured to use the Kullback-Leibler divergence to compute the difference between the P and Q frequency distributions. But in my most recent thought I wondered about using Jensen-Shannon distance. So you compute the three distances between the three different P and Q distributions using JS. And last, you compute the average of the three JS distances to give the final distance between P and Q.
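
The binning and averaging steps can be sketched in plain Python. This is only an illustration of the last two steps, not the PyTorch code: it assumes the autoencoder has already produced embedding vectors with every value in [0, 1], it uses 10 equal-width bins, and the function names freq_dist, js_distance and dataset_distance are made up.

```python
import math

def freq_dist(values, n_bins=10):
  # fraction of values that fall in each equal-width bin on [0, 1]
  counts = [0] * n_bins
  for v in values:
    b = max(0, min(int(v * n_bins), n_bins - 1))  # 1.0 goes in the last bin
    counts[b] += 1
  return [c / len(values) for c in counts]

def kl(p, q):
  # zero-probability terms in p contribute 0 by convention
  return sum(pi * math.log(pi / qi)
             for pi, qi in zip(p, q) if pi > 0.0)

def js_distance(p, q):
  m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
  return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

def dataset_distance(P_emb, Q_emb, n_bins=10):
  # P_emb, Q_emb: lists of embedding vectors with the same dim
  dim = len(P_emb[0])
  total = 0.0
  for k in range(dim):  # one JS distance per latent variable
    p = freq_dist([row[k] for row in P_emb], n_bins)
    q = freq_dist([row[k] for row in Q_emb], n_bins)
    total += js_distance(p, q)
  return total / dim

# the two frequency distributions for latent variable 1 shown above
P1 = [0.10, 0.00, 0.05, 0.05, 0.20, 0.10, 0.00, 0.30, 0.10, 0.10]
Q1 = [0.05, 0.10, 0.00, 0.15, 0.20, 0.10, 0.20, 0.10, 0.00, 0.10]
print("JS distance for latent variable 1 =", js_distance(P1, Q1))
```

Because the two datasets are reduced to frequency distributions before they are compared, they don't need to have the same number of lines.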

To test this idea, I coded it up using PyTorch. Then I created a reference dataset P that is 100 lines of the UCI Digits dataset. Each line represents a crude 8×8 handwritten digit and has 64 pixel values between 0 and 16 followed by the class label between 0 and 9. Then I made 10 different Q datasets where each is based on P but with a different percentage of lines where the values have been assigned randomly — 0%, 10%, 20% . . 100%. The idea is that if P and Q are the same, the distance should be 0, and as the percentage of randomization increases, the distance should increase.

My experiment worked very nicely and I was pleased.

There aren’t very many true identical twins actors and actresses. Among the most well known are actors James and Oliver Phelps who played Fred and George Weasley in the “Harry Potter” series. It’s much easier to get a single actor to play dual roles when a twin is needed. Here are three other pairs of true identical twins. Left: Mary and Madeline Collinson as vampires in “Twins of Evil” (1971). Center: Peter and David Paul as barbarian warriors in “The Barbarians” (1987). Right: Leigh and Lynette Harris as sisters being menaced by the evil wizard Traigon in “Sorceress” (1982).

Posted in PyTorch

Poking Around the Rosenbrock Function

The Rosenbrock function is a standard benchmark for optimization algorithms. The function can be defined in various dimensions. For n dimensions, f(X) = Sum_i[ 100 * (x[i+1] - x[i]^2)^2 + (1 - x[i])^2 ]. The function has a minimum value of 0.0 at [1, 1, 1, . . 1].

The Wikipedia page for the Rosenbrock function.

The Rosenbrock function is interesting because it slopes steeply in many regions, but close to the global minimum, the function gets flat, which gives many optimization algorithms some trouble.

I was looking at an optimization algorithm called Differential Evolution and needed some test functions. I was somewhat familiar with the Rosenbrock function but hadn’t looked at it for quite some time, so I figured I’d explore the Rosenbrock function so I could use it as a test function for Differential Evolution experiments.

I coded up a little Python demo program that defines a version of the Rosenbrock function for n dimensions and also graphs the function for n = 2 dimensions.

There’s no big moral to the story. The little moral to the story is that mathematics and machine learning are endlessly interesting.

Almost all of the tech conferences I speak at are in Las Vegas. Las Vegas is endlessly interesting. Left: A couple is getting married on the roller coaster in the New York New York hotel. Center: A marriage in the giant aquarium at the Silverton hotel. Right: A drive-through marriage in Vegas.

# rosenbrock_graph.py

from matplotlib import cm  # color map
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

def rosenbrock(x):
  # x is a vector with dim >= 2
  dim = len(x)
  total = 0.0  # avoid shadowing the built-in sum()
  for i in range(dim-1):
    a = 100 * ((x[i+1] - x[i]**2)**2)
    b = (1 - x[i])**2
    total += a + b
  return total

print("\nBegin Rosenbrock function demo ")
x = np.array([1.0, 1.0, 1.0, 1.0])
print("\nx = ", end=""); print(x)
z = rosenbrock(x)
print("f(x) = ", end=""); print(z)

x = np.array([-0.5, 0.2])
print("\nx = ", end=""); print(x)
z = rosenbrock(x)
print("f(x) = ", end=""); print(z)

print("\nGraphing Rosenbrock dim = 2 ")

# Rosenbrock dim = 2
X = np.linspace(-2, 2, 100)    
Y = np.linspace(-1, 3, 100)    
X, Y = np.meshgrid(X, Y)

a = 100 * ((Y - X**2)**2)
b = (1 - X)**2
Z = a + b

fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent matplotlib versions
ax = fig.add_subplot(projection='3d')
surf = ax.plot_surface(X, Y, Z,
  rstride=1, cstride=1, cmap=cm.hsv,
  edgecolor='darkred', linewidth=0.1)

# plt.savefig('rosenbrock.jpg')
plt.show()

Posted in Machine Learning

Recap of the 2021 ISC West Security Conference

I gave a short talk at the 2021 ISC West Security Conference. The event was held July 19-21 in Las Vegas, Nevada. My talk was titled “AI and ML for Cyber Threat Prevention”. The ISC West event covers all aspects of security, from physical systems like video surveillance to software systems. The event Web site describes ISC West as the largest security conference in the U.S. I estimate there were somewhere between 10,000 and 15,000 attendees and exhibitors.

Left: The title slide from my talk. It’s clear my graphic design skills are limited. My talk was arranged by a large technical company (who asked not to be named), and attendance was by invitation-only. Right: Most of the rooms at ISC West were reserved by companies for meetings with customers. I noticed Johnson Controls — I worked on a project with them about three years ago where we explored ML for temperature control.

In my presentation, first I gave a brief overview of machine learning techniques, with an emphasis on why supervised techniques (that require labeled training data) don’t work well in most security scenarios. Then I described four techniques: LSTM systems, Transformer Architecture systems, deep neural autoencoder reconstruction error, and variational autoencoder reconstruction probability. I also briefly described isolation forest anomaly detection with an emphasis on its strengths and weaknesses.

ISC West is an industry conference. This means there was a heavy emphasis on business, rather than on education and information. Most of the meeting rooms were reserved by various companies to hold sessions with clients. I find that learning about the business aspects of industry conferences helps me understand the big picture of how software systems integrate into overall security platforms.

Left: ISC West was the first major in-person conference in Vegas since covid hit in March 2020. All attendees had to take a daily temperature check. Right: I saw this Cavalier King Charles Spaniel in the casino area of the Bellagio Casino. He was very friendly (but then, I’ve never met an unfriendly Cavalier). Those are my conference shoes in the background. My work shoes are tennis shoes.

By far the biggest aspect of the ISC West conference — and most industry conferences — was the event Expo. Approximately 400 companies set up display booths. I don’t fully understand the return-on-investment for these Expo booths. They are very expensive. I suspect that it’s mostly for companies to demonstrate they are major players in the security (or whatever industry) game. Or, put another way, a company’s absence from a major event might hurt its image and reputation. Maybe.

All in all, I think that attending the ISC West security conference was a good use of my time. I’m confident I represented my company well, and I learned many potentially important, business-related facts about integrated security — information that can be useful in my technical world of machine learning for security.

Whenever I evaluate a conference, I ask myself, “Am I eager to go back next year?” For ISC West 2021, the answer is a definite “yes”.

Left: The 2021 ISC West Expo had hundreds of interesting booths. Center: There were quite a few systems that protect from aerial drone attacks. Right: Everyone needs security systems, even Buddhist monks.

Left: At an Expo with hundreds of companies, it’s difficult to attract attention. But one company did so cleverly with a Mr. “Know-it-All” (aka technical expert), “Fuel” (aka beer), and “Booth Babe” (aka booth babe). Center: Many companies touted their ML and AI capabilities. Right: One of several themes at the Expo was security integration via IoT.

Posted in Conferences