Whatever Happened to Percolation Theory?

Many years ago I was quite interested in mathematical percolation theory. But over the past 10 years or so I haven’t seen many new research papers published on the topic.

This figure illustrates some of the main ideas of percolation:

If some edges are randomly connected, how likely is it that there’s a path from the top to the bottom? In this example, there’s a 7×7 square grid. This creates 84 up-down and left-right edges. Edges were connected randomly with p = 0.4 (which means that the probability that an edge wasn’t connected is 0.6). After the random connections were added, there were 27 out of 84 edges connected, so the actual probability of a connection in this particular grid is 27/84 = 0.32.

The resulting grid does in fact have a connection from top to bottom. The connected path contains 12 of the 27 connected edges so 12/27 = 0.4 of the edges are part of the connection.

There has been a huge amount of research done on percolation theory. One of the results is, for a square grid, that if the probability of connection is 0.5, then there is likely a connected path from top to bottom. Therefore, in this example, it was a bit lucky that a connected path exists.

Note: Let me point out that I have (deliberately) butchered several of the vocabulary terms and a few of the ideas to keep the explanation simple and understandable. If you are a mathematician who works with percolation theory, please don’t slaughter me in the Comments to this post.

As with any math topic, there are gazillions of generalizations. There are many kinds of grids (more accurately lattices), many kinds of connections (bond vs. site, etc.), 3D versions, n-dimensional versions, and on and on. Percolation theory has many relationships with graph theory, fractals, lattice theory, cellular automata, and other areas of math.

So, back to my original question: Why are there so few new research papers on percolation theory? First, maybe there are lots of new research papers on percolation and I’m just not seeing them. But if the number of research papers is in fact declining, I suspect that either 1.) percolation theory has not proven to be particularly useful in applied areas (such as oil moving through porous rock or materials science), or 2.) all the main ideas in percolation theory have been explored. But this is speculation on my part.

Percolation means, “the slow passage of a liquid through a filtering medium”. But in informal usage percolation means giving some thought to a plan or idea. Here are four examples from an Internet search for “bad idea crime” where criminals should have percolated a bit more. (Interestingly, none of stories with these photos identified the criminal — perhaps too embarrassing). Left: A man in a wheelchair in Brazil tries to rob a store while holding a gun with his feet. Did not go well. Left-Center: A man in Philadelphia tries to rob a store using a banana. Not a good idea. Right-Center: A man in Missouri robs a convenience store but a few hours later he tried to grab a gun away from a police officer. Did not go well. Right: A woman in Ohio tried to diguise herself using a cow costume. The mugshot shows her plan needed more thought.

Posted in Miscellaneous | Leave a comment

Spiral Dynamics Inspired Optimization Demo Using Python

I’ve been looking at an interesting optimization algorithm based on a 2011 research paper titled “Spiral Dynamics Inspired Optimization” by K. Tamura and K. Yasuda. SDI optimization is somewhat similar to particle swarm optimization (SPO). Briefly, SDI sets up a collection of random possible points/solutions. The best current solution is set as a center, and then the points are moved towards the current center in a spiral motion. A new center is set, and the process continues.

I put together a simple end-to-end demo using Python. The goal is to minimize the Sphere function in 3 dimensions, f(x,y,z) = x^2 + y^2 + z^2. The function has a known minimum of f = 0.0 at (0, 0, 0).

The parameters of the SDI optimization technique are:

m = number of points/solutions
theta = controls curvature of spiral movement
r = controls how quickly points move towards center
max_iter = number of times to iterate

The SDI optimization technique uses a clever rotation matrix R that determines how each point/solution moves towards the current spiral center. My demo hard-codes the R matrix. In a non-demo scenario, I’d write code to programmatically generate R.

A very interesting experiment. I intend to look at SDI optimization in more depth. When I fully understand the subtleties of the technique, I’ll write up a complete explanation.

I’m not a big fan of movies that have heavy doses of symbolism but “Dark City” (1998) has an interesting plot too. Aliens kidnap humans and put them in a fake city where it’s always night, and then experiment with their memories. The movie has cinematographic themes of spirals and circles. From left: A spiral clock with no hands, one of the humans suspects the truth and becomes obsessed with spirals, the hero has a memory implanted that makes him think he murdered a prostitute, a spiral staricase in a chase scene.

Code below. Not thoroughly tested.

# spiral_optimization.py
# minimize Sphere function using spiral optimization

import numpy as np

def error(X):
  # compute Sphere value for n=3, take squared diff
  z = 0.0
  for j in range(3):
    z += X[j]**2
  err = (z - 0.0)**2
  return err

def find_center(points, m):
  best_err = error(points[0])
  idx = 0
  for i in range(m):
    err = error(points[i])
    if err "less-than" best_err:
      idx = i; best_err = err
  return np.copy(points[idx])    

def move_point(X, R, center):
  # move X vector to new position, spiraling around center
  I3 = np.array([[1,0,0], [0,1,0], [0,0,1]], dtype=np.float32)
  RminusI = R - I3
  offset = np.matmul(RminusI, center)
  new_X = np.matmul(R, X) - offset
  return new_X  

def main():
  print("\nBegin spiral dynamics optimization demo ")
  print("Goal is to minimize Sphere function in dim=3")

  # 0. prepare
  np.set_printoptions(precision=6, suppress=True, sign=" ")
  theta = np.pi/4  # radians. small = curved, large = squared
  r = 0.95  # closer to 1 = tight spiral, closer 0 = loose 
  m = 5  # number of points / possible solutions
  # n = 3  # problem dimension is hard-coded
  max_iter = 100
  print("\nSetting theta = %0.4f " % theta)
  print("Setting r = %0.2f " % r)
  print("Setting number of points m = %d " % m)
  print("Setting max_iter = %d " % max_iter)

  # 1. set up the Rotation matrix for n=3
  print("\nSetting up hard-coded spiral Rotation matrix R ")
  R12 = np.array([[np.cos(theta), -np.sin(theta), 0],
                  [np.sin(theta),  np.cos(theta), 0],
                  [0,              0,             1]],

  R13 = np.array([[np.cos(theta),  0, -np.sin(theta)],
                  [0,              1,              0],
                  [np.sin(theta),  0,  np.cos(theta)]],

  R23 = np.array([[1,                 0,          0],
                  [0, np.cos(theta), -np.sin(theta)],
                  [0, np.sin(theta),  np.cos(theta)]],
  R = np.matmul(R23, np.matmul(R13, R12))  # compose
  R = r * R  # scale

  # 2. create m intial points and find curr center (best point)
  print("\nCreating %d initial random points " % m)
  points = np.random.uniform(low=-5.0, high=5.0, size=(m, 3))

  center = find_center(points, m)
  print("\nInitial center (best) point: ")

  # 3. spiral the points towards curr center
  print("\nUsing spiral dynamics optimization: ")
  for itr in range(max_iter):
    if itr % 20 == 0:
      print("itr = %5d  curr center = " % itr, end="")
    for i in range(m):  # move each point toward curr center
      X = points[i]
      points[i] = move_point(X, R, center)
    # print(points); input()
    center = find_center(points, m)  # find new center
  # 4. show best center found 
  print("\nBest solution found: ")

  print("\nEnd spiral dynamics optimization demo ")  

if __name__ == "__main__":
Posted in Machine Learning | Leave a comment

Rastrigin Function Graph Using matplotlib With Edge Colors

The Rastrigin function is a standard benchmark problem for optimization algorithms. The general form of the function in n dimensions is f(X) = Sum[xi^2 – 10*cos(2*pi*xi)] + 10n. For the specific case when n = 2, the function is f(x1, x2) = [(x1^2 – 10*cos(2*pi*x1)) + (x2^2 – 10*cos(2*pi*x2))] + 20.

I have graphed the Rastrigin function using R, MatLab, SciLab (a MatLab clone), Excel, and Python’s matplotlib library. I most often use matplotlib. When I used matplotlib a few years ago, the default surface plot had a wireframe outline that made the shape of the function easier to see. But at some point in early 2018, the default settings for the surface_plot() function no longer added a wireframe. This is nice for very simple graphs, but for complex shapes like the Rastrigin function, the lack of a wireframe made it difficult to understand the shape of the graph.

Here is the Rastrigin function for n = 2, using default no-wireframe outline (on the left), and with a wireframe outline added (on the right). To me at least, the graph on the right is clearly preferable.

My annoyance level finally hit a threshhold and I sat down to figure out how to add a wireframe outline to a matplotlib surface plot. If you aren’t familiar with graphing libraries, your first impression is likely, “Oh come on McCaffrey, how hard could it be to look at the matplotlib documentation and find the appropriate parameter(s)?” If you are familiar with graphing libraries, your first impression is something like, “Yeah, good luck with that. Let me know what you find over the next several days.”

Anyway, as always, the answer is easy once you know what to do: add the edgecolor and linewidth parameters to the plot_surface() function. The diffiulty is that these two parameters aren’t explicit parameters of plot_surface() — they are parameters to a hidden base class Poly3DCollection class.

Anyway, I don’t think there’s a significant moral to this story/post. But maybe it’s that for “us” (meaning me and you and anyone else who reads blog posts like this one), we mentally store up and remember minor technical problems, put them in our mental TODO list, and these problems don’t go away until we eventually solve them.

The matplotlib show_plot() function constructs a surface by using many small polygons. Modern photo editing software can convert a photograph to an image made of polygons. I’m not a huge fan of the low poly portraits because they lack the soul of pure-human art, but here are three nice ones that somewhat resemble the image of the Rastrigin function graph. Notice that none of the three illustrations uses a line edge.

# rastrigin_graph.py

from matplotlib import cm  # color map
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-5, 5, 200)    
Y = np.linspace(-5, 5, 200)    
X, Y = np.meshgrid(X, Y)

Z = (X**2 - 10 * np.cos(2 * np.pi * X)) + \
  (Y**2 - 10 * np.cos(2 * np.pi * Y)) + 20

fig = plt.figure()
ax = fig.gca(projection='3d')
surf = ax.plot_surface(X, Y, Z, \
  rstride=1, cstride=1, cmap=cm.jet, \
  edgecolor='darkred', linewidth=0.1)
# plt.savefig('rastrigin.png')
Posted in Machine Learning | Leave a comment

Anomaly Detection Using Principal Component Analysis (PCA) Reconstruction Error

I was working on an anomaly detection system recently. The system used a deep neural autoencoder. As part of the system evaluation, we looked at anomaly detection using principal component analysis (PCA). PCA is a classical statistics technique that decomposes source data in a very clever, complicated way. If you only look at part of the decomposition components, you are performing a kind of dimensionality reduction.

You can use the decomposition to reconstruct a prediction of the original source data. If you compare the reconstructed data with the original source data and compute error as the difference between source and reconstruction, items with large reconstruction error are anomalous in some way.

I coded up a partial demo of anomaly detection using PCA reconstruction error. I used several resources I found on the Internet, especially a response to a question at stats.stackexchange.com/questions/229092/how-to-reverse-pca-and-reconstruct-original-variables-from-several-principal-com.

The source data is the well-known Fischer Iris Dataset. It consists of 150 items. Each data item has four values: sepal length and width, and petal length and width.

The demo loads the 150-item Iris source data into memory then computes PCA components and uses that information to transform the source data. Because each data item has four values, there are four principal components. It is possible to completely reconstuct the source data by using all four principal components, but this isn’t useful for anomaly detection. The demo code uses just 2 of the 4 components to reconstruct the source data.

The first two data source items and their reconstructions are:

source items:
[ 5.1  3.5  1.4  0.2]
[ 4.9  3.0  1.4  0.2]

[ 5.0830  3.5174  1.4032  0.2135]
[ 4.7463  3.1575  1.4636  0.2402]

The reconstructions are very close to the source items. This make sense because the Iris data is very simple and there aren’t any major anomalies. To complete an anomaly detection system, I’d compute squared error between each source item and its reconstruction, then sort by error from large to small.

Based on my experience, anomaly detection based on neural autoencoder reconstruction error works better than detection based on PCA reconstruction error. But even in the worst case scenario, PCA reconstruction error anomaly detection can serve as a baseline for evaluation of other techniques.

Good fun.

Forensic reconstruction takes human remains and uses a variety of techniques to create a lifelike image that’s often very close to what the deceased person truly looked like. Left: An Egyptian girl known as “Meritamun” was about 20 years old when she died approximately 2,000 years ago. Her light complexion indicates she was a member of the upper class rather than a dark-skinned slave. Center: A woman from Britain. She lived approximately 5,000 years ago during the neolithic era and was about 22 years old when she died. Right: “Jane” was about 15 years old when she died during a famine in 1609 in Jamestown, Virginia, the first English settlement in the Americas.

Code below. Continue reading

Posted in Machine Learning | Leave a comment

Spiral Dynamics Optimization in 3D

I’m slowly but surely dissecting the ideas in a 2011 research paper titled “Spiral Dynamics Inspired Optimization” by K. Tamura and K. Yasuda. The idea of SDI optimization is an algorithm to find the minimum value of some function using a geometric technique, rather than a technique based on Calculus gradients. SDI optimization is similar in some respects to particle swarm optimization (SPO) and Nelder-Mead optimization (also known as amoeba method).

My most recent experiment is best explained by this image of a demo program:

The problem is set up in 3 dimensions. The starting point is at (10, 10, 10) and the goal of the demo is to iterate so that each succeeding point gets closer to the center point of (0, 0, 0) by moving in a spiral.

The ideas aren’t terribly complicated, but they’re difficult to explain. In a 2D scenario if you have a point (x,y), you can move in a spiral by multiplying the point by a fraction (like 0.95) of the matrix R12 =

cos(theta)  -sin(theta)
sin(theta)   cos(theta)

Here theta is an angle in radians, typically like pi/4.

In 3D you can look at each pair of the dimensions and compute three R matrices:

R12 =
cos(theta)  -sin(theta)   0
sin(theta)   cos(theta)   0
0            0            1

R13 =
cos(theta)   0           -sin(theta)
0            1            0
sin(theta)   0            cos(theta)

R23 =
1            0            0
0            cos(theta)  -sin(theta)
0            sin(theta)   cos(theta)

Here R12 means in the 1st and 2nd plane, or x-y plane. R13 is the x-z plne and R23 is the y-z plane.

If you multiply a 3D point (x,y,z) by any one of the three matrices, you will spiral in the associated plane. But if you compose the three matrices like R = R23 * R13 * R12, you will get a matrix that will spiral in 3D. Quite tricky.

Spiraling in 3D sets up the core of an optimiztion technique where you can minimize some function of three variables. Briefly, you set up several random, possible solutions. Each initial possible solution looks something like (13.0, 14.5, 11.7). From the collection of possible solutions, you select the current best solution and use it as the spiral center. Then you spiral all possible solutions towards the current center. Then you compute a new center, and continue.

As you iterate you explore possible solutions in a principled way and will eventually find a good (but not necessarily best) solution.

The next task on my list of things to understand is to extend the spiral mechanism to 4 or more dimensions. I have a lot more work to do before I get a working end-to-end spiral optimization demo working, but I’m confident I’ll get there eventaully.

Some of the most memorable women in old science fiction movies have spiral inspired hair styles. Left: Princess Achen played by Zeta Graff in “The Fifth Element” (1997). Left Center: Dame Vaako played by Thandie Newton in “The Chronicles of Riddick” (2004). Right Center: Queen Amidala played by Natalie Portman in “Star Wars the Phantom Menace” (1999). Right: Lady Jessica played by Francesca Annis in “Dune” (1984).

Code below.

# spirals.py
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

print("\nBegin 3D spiral rotation demo ")

np.set_printoptions(precision=3, suppress=True)
theta = np.pi/4  # in radians. small = curved, large = squared
r = 0.95  # closer to 1 = tight spiral, closer 0 = loose 

R12 = np.array([[np.cos(theta), -np.sin(theta), 0],
                [np.sin(theta),  np.cos(theta), 0],
                [0,              0,             1]],

R13 = np.array([[np.cos(theta),  0, -np.sin(theta)],
                [0,              1,              0],
                [np.sin(theta),  0,  np.cos(theta)]],

R23 = np.array([[1,                 0,          0],
                [0, np.cos(theta), -np.sin(theta)],
                [0, np.sin(theta),  np.cos(theta)]],
R12 = r * R12; R13 = r * R13; R23 = r * R23


R = np.matmul(R23, np.matmul(R13, R12))  # compose

X = np.array([10.0, 10.0, 10.0], dtype=np.float32)  # 1x3
X = X.T  # 3x1 column vector, starting point

xd = []; yd = []; zd = []  # for plotting

for k in range(7):
  xd.append(X[0]); yd.append(X[1]); zd.append(X[2])
  print("k = " + str(k) + "   X = ", end="")
  X = np.matmul(R, X)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.set_xlim3d(-14, 14)
ax.set_ylim3d(-10, 10)
ax.set_zlim3d(-14, 14)

for i in range(len(xd)):
  ax.scatter(xd[i], yd[i], zd[i], c='red')
  ax.text(xd[i], yd[i], zd[i], '. k=%s' % (str(i)), size=10, \
    zorder=1, color='k')

ax.scatter(0,0,0, c='green')
ax.text(0,0,0, '. %s' % 'center', size=8, zorder=1, color='k') 




print("\nEnd demo ")
Posted in Machine Learning | Leave a comment

The Birthday Paradox

If you have 70 people in a room, what is the probability that two or more of them have the same birthday? Most people guess something like 70/365 = 0.19 or 19%. But amazingly, the probability is 0.9990 — almost for sure. This is called the birthday paradox.

If you select n people at random, from all living people, what is the probability that two or more of them share the same birthday (such as July 4 or September 20)? For example, if n = 2 people, the first person can have any birthday with probality 365/365 = 1. Then the second person will have a different birthday with p = 364/365. Therefore, the probability that the two people have different birthdays is (365/365) * (364/365) = 0.9973, and therefore the probability that the two people have the same birthday is 1 – 0.9973 = 0.0027.

Now, suppose there are n = 3 people. Using the pattern above, the probability that all three have different birthdays is (365/365) * (364/365) * (363/365) = 0.9918, and therefore the probability that two or more have the same birthday is 1 – 0.9918 = 0.0082.

You can continue looking at different values of n up until n = 365 because if you have 365 people, in the absolute most extreme case, they all have different birthdays. But as soon as you hit n = 366, at least two people must have the same birthday.

Here are probabilities that in a room with n people, two or more have the same birthday, for different values of n, from the Wikipedia article on the topic.

OK, it’s not surprising that the probability that two people have the same birthday is small. But what is really unexpected is that as the number of people in the room increases, the probability that two or more people share a birthday gets very large very quickly.

Just for fun, I coded up a short simulation to estimate the probability that in a room with n people, two or more shar the same birthday. I ran the program for n = 30 people and got p = 0.704 which is very close to the exact probability of 0.706.

There are approximately 220 countries in the world that celebrate an Independence Day of some sort. It’s not mathematically surprising that many of the country birthdays fall on the same month and day. For example, Armenia, Belize, and Malta all mark September 21 as their country birthday.

# birthday_paradox.py
# in room with n=30 people, estimate prob that 2 or more 
# people have the same birthday.

import numpy as np

def has_two_or_more(days):
  for i in range(len(days)):  # 365
    if days[i] "greater-equal" 2:  # replace with symbols
      return True
  return False

print("\nBegin Birthday Paradox simulation ")

num_trials = 100000
n = 30
num_success = 0  # two or more same
num_fail = 0     # no same

print("\nEstimating prob two or more same for n = %d people " % n)
print("Using %d trials " % num_trials)

for trial in range(num_trials):
  days = np.zeros(365, dtype=np.int64)  # record days
  for i in range(n):  # make n birthdays
    bd = np.random.randint(low=0, high=365)  # [0, 364]
    days[bd] += 1

  if has_two_or_more(days) == True:
    num_success += 1
    num_fail += 1

p = (num_success * 1.0) / num_trials
print("\nEstimated p = %0.4f " % p)

print("\nEnd simulation ")
Posted in Miscellaneous | Leave a comment

Why Do I Never Remember the Differences Between ROC AUC and PR AUC?

I was working on an anomaly detection system recently. Whem working with any machine learning prediction system, you should evaluate the effectiveness of the system. The basic effectiveness metric is prediction accuracy. But in systems where there is imbalanced data, accuracy isn’t very good. For example, in an anomaly detection scenario, suppose 99% of your data items are normal (class 0, negative) and 1% are anomalous (class 1, positive). If you just predict class 0 for every input item you’ll get 99% accuracy.

So, with imbalanced data you should look at precision and recall. Both of these metrics ignore true negatives, but in a slightly different way.

In addition to imbalanced data, a second characteristic of ML prediction systems is the threshold value. Most prediction systems emit a pseudo-probability (pp) value between 0 and 1. The default approach is to set a threshold value of 0.5, and then a computed pp less than 0.5 indicates class 0 = normal = negative, and a computed pp value greater than 0.5 indicates class 1 = anomalous = positive. But you can set the threshold value to whatever you want. Adjusting the threshold value will change accuracy, precision, and recall.

OK, I’m slowly getting to the point.

An anomaly detection system is a binary prediction system — an input item is predicted as normal or anomalous. In such situations there are exactly four possible outcomes when you make a prediction.

TP: true positive (corectly predict an anomaly)
TN: true negative (correctly predict a normal)
FP: false positive (incorrectly predict an anomaly as normal)
FN: false negative (incorrectly predict a normal as anomaly)

For a specific threshold value, you can calculate four basic metrics for an ML prediction system:

accuracy  = (TP + TN) / (TP + TN + FP + FN) - uses all
precision = TP / (TP + FP) - ignores TNs
recall    = TP / (TP + FN) - ignores TNs
F1        = harmonic_avg(pre, rec) - ignores TNs

The F1 score is the harmonic average of precision and recall. You need to look at both precision and recall because if you adjust the threshold, if precision goes up (good) then recall will go down (bad), and vice versa.

The area under the ROC curve for model B is greater than the area under model A, so model B is better overall. Each ROC curve is really just three points that correspond to three threshold values.

Now if you evaluate a prediction system using different theshold values, you can graph the results. There are two common graphs — the ROC (hideously named receiver operating characteristic) curve, and the PR (precision-recall) curve:

ROC curve = FP (x-axis) vs. TP (y-axis) - ignores TN and FN
PR curve  = recall (x-axis) vs. precision (y-axis) - ignores TN

Each curve is made of many points. One way to summarize a curve/graph as a single number is to compute the area under the curve (AUC). So there is ROC AUC and PR AUC.

Putting it all together, ROC AUC is a metric that summarizes model effectiveness (over different threshold values), which ignores all negative / normal / class 0 items. PR AUC is a metric that summarizes model effectivenss, which ignores true negatives. Both are appropriate to use when there are many class 0 / normal items and very few class 1 / anomalous items.

Here are two scenarios where you don’t need a machine learning model to predict what will happen.

Posted in Machine Learning | Leave a comment

A Quick Look at the Oqtane Framework

Oqtane is an open source framework for building Web applications based on Microsoft .NET Blazor technology.

Whoa. That’s a lot to grok.

It’s not all that difficult to create a Web application using basic HTML, CSS, and JavaScript. But there are a truly bewildering number of alternative approaches. There are endless JavaScript-based frameworks. And Microsoft has released one technology after another for almost 20 years, starting with classic ASP with VBScript, followed by ASP.NET with Web Forms, ASP.NET Web API, ASP MVC, Silverlight, Web Matrix, Razor, etc., etc.

The latest Microsoft Web technology is called Blazor. There are actually two flavors of Blazor. You can create a strictly client-side application that runs entirely in a user’s browser using a technology called WebAssembly. Or you can write a server-side application where the browser sends requests to the server and gets reponses using a technology called SignalR.

Blazor allows developers to use the C# language and the .NET Core framework, and so Blazor is a very complex wrapper over HTML, CSS, and JavaScript.

Oqtane is a framework. One way to think of a Web framework is that you get a pre-built Web application that is ready to be extended and customized. So Oqtane is a very complex wrapper over Blazor which is a very complex wrapper over HTML, CSS, and JavaScript.

Here’s an example of a basic Web application using Oqtane. There are zillions (approximately) of dependencies. Ultimately all of this crazy-complex code gets translated down to simple HTML, CSS, and JavaScript.

The main advantage of working with a framework like Oqtane (or any of the dozens of other Web frameworks) is that you get a working application immediately.

The cons of working with a Web framework include: 1.) the application is extremely complex and enormous in code size compared to a from-scratch version, 2.) if the framework was designed to handle a particular scenario you are good but if you have an unanticipated scenario you are in a world of hurt, 3.) once you start down a path with a framework, you are stuck with the framework permanently, 4.) because there are thousands of dependencies, there are thousands of ways your application code will break due to some sort of change in the dependency code, 5.) frameworks often have a very steep learning curve.

The Oqtane framework is essentially a port to .NET Core (aka .NET 5) of the old DotNetNuke framework, which was originally designed for content management but evolved into a more general framework.

In general, I don’t enjoy working with Web frameworks, but as a person who often uses Microsoft technologies, Oqtane is one of the better Web application frameworks I’ve seen. If you work on Web development, Oqtane is worth a look in my opinion.

An art style is somewhat of a virtual framework that guides artist when they render an illustration. “Ligne claire” is a style of art that is characterized by strong lines of the same width, little contrast, shadows are often illuminated, strong colors, and a combination of cartoon-like characters against a realistic background. The style was popularized by Georges Remi, the Belgian artist known as Herge, who did the Tintin adventures. Ligne claire means clear line in French.

Left: Tintin is in the blue sweater, with Captain Haddock, in “The Calculus Affair”. Center: A panel by artist Peter van Dongen, done for a Dutch magazine article on bookstores. Right: A panel from “The Rainbow Orchid” by artist Garen Ewing. Adventurer hero Julius Chancer is in trouble against the evil villainess Evelyn Crow.

Posted in Miscellaneous | Leave a comment

Creating a Custom Python Vocabulary Object From Scratch For NLP Problems

I’ve been looking at natural language processing (NLP) problems on-and-off for quite some time. One weekend I decided to implement a custom Vocabulary class from scratch. A Vocabulary object accepts a word/token and returns a unique integer ID. For example,

vocab = make_vocab(src_text, min_freq)
idx = vocab["and"]        # calls stoi[]
print(idx)                # 7
idx = vocab.stoi["code"]  # stoi directly
wrd = vocab.itos[7]
print(wrd)                # 'and'

The TorchText library has a built-in Vocab object but it’s very complex (so that it can handle almost any scenario) and you must still write code to parse the source text and frequencies and feed them to the Vocab constructor.

Writing a custom Vocabulary object isn’t too difficult — I guess. I’ve done so several times and it’s easy for me now. But I remember it wasn’t so easy the first time I tried.

An interesting design choice is how to work with the internal stoi (“string to integer”) dictionary object. Because you have to deal with words/tokens that aren’t in the dictionary, for example “flooglebarge”, you want to return ID = 0 for unknown words. A common strategy for situations like this is to write a __getitem__() function that calls into the built-in stoi dictionary get() method (all Python dictionaries have a get() method) and uses the default parameter for get() so that an unknown word returns the default value. Tricky.

Most Vocabulary objects implement an itos (“integer to string”) method that accepts an ID and returns the corresponding word/token. That can be implemented as either an ordinary list (because lists are accessed by an integer index) or as a dictionary (to keep design symmetry with the stoi dictionary).

All in all, it was an interesting exporation that solidified my knowledge of Vocabulary objects for NLP problems.

Four of the many different book covers for the James Bond spy novel “Casino Royale” by author Ian Fleming. Published in 1953, this was the first novel in the series. It was fairly well received in the U.K. but did poorly in the U.S. But in an interview in 1961, U.S. President J.F. Kennedy mentioned that “From Russia with Love” (1957, fifth in the series) was one of his favorite novels, and book sales exploded.

A Vocabulary object based on a Bond novel’s text would likely have separate IDs for “Bond” (the spy) and the verb “bond” (to combine things) and the noun “bond” (the financial instrument or word meaning a coupling).

Code below. Long. Continue reading

Posted in Machine Learning, PyTorch | Leave a comment

Exploring Spiral Dynamics Inspired Optimization

I came across an interesting research paper titled “Spiral Dynamics Inspired Optimization” (2011) by K. Tamura and K. Yasuda. The idea of SDI optimization is an algorithm to find the minimum value of some function using a geometric technique, rather than a technique based on Calculus gradients. SDI optimization is similar in some respects to particle swarm optimization (SPO) and Nelder-Mead optimization (also known as amoeba method).

The first two pages of the research. My goal was to use the equations circled in red to reproduce the left-most graph shown in Figure 3.

One possible use of SDI optimization would be to train a neural network — finding values for the network weights and biases that minimize error between computed output values and target output values from training data.

SDI is based on spirals. If you want to find a point on the xy plane, you can start near a corner then spiral towards the center until you find the target point or get close. You can extend the idea by starting at several points and have all of them spiral toward a moving center, where the moving center is the best known point so far.

I decided to explore SDI optimization. The ideas are moderately complicated so I decided to dissect the problem one step at a time. My first step was to understand how to spiral a single point in the 2D xy plane towards some specified, fixed center. Ultimately I’ll need to spiral several points in n-dimensions towards a moving center, where each point represents a possible solution to the target minimization problem.

Using the research paper as a guide, I coded up a short demo. The starting point is (10, 10). The center point is (0, 0). The parameters that control the spiral are theta = pi / 4, and r = 0.95. The theta value controls how curved the spiral is, and the r value controls how tight the spiral is.

The demo code is both simple and tricky. The code is simple in the sense that it uses only basic techniques such as sin and cos rotation, and matrix multiplication. But the code is tricky in the sense that if a person doesn’t fully understand the basic techniques, I think the code would be nearly impossible to understand.

My next steps will be to figure out how to spiral points in n-dimensions, and then use that technique to solve a simple problem such as finding the values of (x1, x2, x3, x4) that minimize the sphere function f(x1, x2, x3, x4) = x1^2 + x2^2 + x3^2 + x4^2.

Two center images: Padaung women are ethnic minorities from Myanmar (Burma). They are known for striking neck spiral coil jewelry (not rings as sometimes thought). The style has been adapted for modern jewelry too. To me, a few coils are interesting and attractive, but when the coil gets too long, the appearance gets mildly creepy. The four examples here are about as extreme as I find interesting rather than disturbing.

# spirals.py
import numpy as np

print("\nBegin spiral rotation demo ")

np.set_printoptions(precision=4, suppress=True)

center = np.array([0.0, 0.0], dtype=np.float32)  # 1x2
center = center.T    # transpose

X = np.array([10.0, 10.0], dtype=np.float32)  # 1x2
X = X.T  # 2x1 column vector

print("\nCenter = ", center)
print("Initial point = ", X)

theta = np.pi/4  # in radians. small = more curved
r = 0.95  # closer to 1 = tight spiral, closer to 0 = loose spiral

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=np.float32)
rR = r * R
I2 = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
# foo = rR - I2
# bar = np.matmul(foo, center)
offset = np.matmul((rR - I2), center)

for k in range(5):
  print("k = " + str(k) + "   X = ", end="")
  X = np.matmul(rR, X) - offset # (2x2) * (2x1) = (2,)

print("\nEnd demo ")
Posted in Machine Learning | Leave a comment