NFL 2021 Week 20 (Division Championships) Predictions – Zoltar Likes the Vegas Favorite Chiefs to Cover

Zoltar is my NFL football prediction computer program. It uses custom reinforcement learning and a neural network. Here are Zoltar’s predictions for week #20 (division championships) of the 2021 season. It usually takes Zoltar about four weeks to hit his stride and takes humans about eight weeks to get up to speed, so weeks six through nine are usually Zoltar’s sweet spot. After week nine, injuries start having a big effect.

Zoltar:      titans  by    6  dog =     bengals    Vegas:      titans  by    3
Zoltar:     packers  by    6  dog = fortyniners    Vegas:     packers  by    5
Zoltar:  buccaneers  by    5  dog =        rams    Vegas:  buccaneers  by    3
Zoltar:      chiefs  by    6  dog =       bills    Vegas:      chiefs  by    2

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. In mid-season I usually use 3.0 points difference but for the first few weeks and last few weeks of the season I go a bit more conservative and use 4.0 points difference as the advice threshold criterion. In middle weeks I sometimes go ultra-aggressive and use a 1.0-point threshold.
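The advice-threshold rule can be sketched in a few lines of Python. This is a simplified illustration, not Zoltar's actual code; both margins are expressed as points the Vegas favorite is predicted to win by.

```python
def advice(zoltar_margin, vegas_margin, threshold=4.0):
    # both margins: predicted points the Vegas favorite wins by
    diff = zoltar_margin - vegas_margin
    if diff >= threshold:
        return "bet favorite"   # Zoltar thinks the favorite covers easily
    if diff <= -threshold:
        return "bet underdog"   # Zoltar thinks the line is too wide
    return "no play"

print(advice(6.0, 2.0))  # the Chiefs-Bills situation -> bet favorite
```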

Note: Because of Zoltar’s initialization (all teams regress to an average power rating) and other algorithms, Zoltar is much too strongly biased towards Vegas underdogs. I need to fix this.

For week #20 (division championships):

Zoltar likes Vegas favorite Chiefs over the Bills

A bet on the favorite Chiefs will pay off only if the Chiefs win by more than the point spread of 2.0 points (in other words, by 3 points or more). If the Chiefs win by less than 2.0 points, or if the Bills win by any score, the wager on the Chiefs is lost. If the Chiefs win by exactly 2 points, the wager is a push.
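In code, grading a wager on the favorite is just a three-way comparison against the spread. A minimal sketch (the function name is mine; margin is the favorite's actual winning margin, negative if the favorite loses):

```python
def grade_favorite_bet(favorite_margin, spread=2.0):
    # favorite_margin: points the favorite actually won by (negative = lost)
    if favorite_margin > spread:
        return "win"    # favorite covered the spread
    if favorite_margin == spread:
        return "push"   # wager refunded
    return "loss"       # favorite won too narrowly, or lost outright

print(grade_favorite_bet(3))   # -> win
print(grade_favorite_bet(2))   # -> push
print(grade_favorite_bet(-4))  # -> loss
```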

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
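The 53% figure follows from the pricing: risking $110 to win $100 breaks even when p * 100 = (1 - p) * 110, which gives p = 110/210, about 52.4%. A quick check:

```python
# break-even win probability for standard -110 pricing
risk, win = 110.0, 100.0
breakeven = risk / (risk + win)   # probability where expected profit = 0
print(round(breakeven * 100, 1))  # -> 52.4
```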

In week #19, against the Vegas point spread, Zoltar went 0-2 (using the standard conservative 4.0 points as the advice threshold). Zoltar liked the underdog Raiders who came close against the Bengals, and Zoltar liked the underdog Steelers but they got crushed by the Chiefs. Overall, for the season, Zoltar is 65-53 against the spread (~55%).

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting. In week #19, just predicting the winning team, Zoltar went 5-1 which is quite good.

In week #19, just predicting the winning team, Vegas — “the wisdom of the crowd” — also went 5-1, the same as Zoltar.

Zoltar sometimes predicts a 0-point margin of victory, which means the two teams are evenly matched. In those situations, to pick a winner (only so I can track the raw number of correct predictions), Zoltar picks the home team to win in the first few weeks of the season; after that, Zoltar uses his algorithms to pick a winner. There are no such games in week #20.



Left: My system is named after the Zoltar fortune teller machine you can find in arcades. Coin-operated fortune teller machines have been around for well over 100 years.

Center: On the “Mystic Mirror” machine, you set a pointer to one of six questions — 1. Your future wife. 2. Am I going to travel? 3. Shall I be wealthy? 4. Am I going to marry? 5. Shall I have a family? 6. Your future husband. The answer is displayed in a mutoscope screen, sort of an early motion picture device that works like a flip book with opaque photographs.

Right: The “Wizard Fortune Teller” machine is similar. The six questions are 1. (men) My future occupation. 2. What is my principal quality? 3. What is my greatest defect? 4. How many times shall I marry? 5. Shall I soon fall in love? 6. (women) My future occupation. The answer is displayed in a sign in a small window.


Posted in Zoltar | Leave a comment

The Flax Neural Network Library

I came across two interesting, related, Python libraries recently: JAX and Flax. JAX (“just after execution”) is sort of an enhanced NumPy (numerical Python) library. JAX adds support for numeric arrays on GPU and TPU hardware, and automatic gradient calculation. Flax is a neural network code library, somewhat similar to PyTorch or TensorFlow, that is built upon JAX.

I asked the Flax GitHub Discussion board if Flax is an acronym or not. According to two of the main contributors, the name stands for both “functional layers for JAX” (in early versions) and “flexible JAX” (from the design principles).

Here’s a code snippet from the Flax documentation that creates a 10-[12-8]-4 neural network.

from typing import Sequence

import numpy as np
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
  features: Sequence[int]

  @nn.compact
  def __call__(self, x):
    for feat in self.features[:-1]:
      x = nn.relu(nn.Dense(feat)(x))
    x = nn.Dense(self.features[-1])(x)
    return x

model = MLP([12, 8, 4])
batch = jnp.ones((32, 10))
variables = model.init(jax.random.PRNGKey(0), batch)
output = model.apply(variables, batch)

In even a tiny example like this, there is a lot going on. In the import statements, notice that Flax has dependencies on jax and jax.numpy — that’s a big topic by itself.



Two pages from the Flax documentation Web site. It looks like getting up to speed with Flax would take many weeks of dedicated effort.


The code snippet uses the relatively new typing library so that the features field can be declared as a Sequence holding int values, rather than a List of arbitrary types.

The @nn.compact decorator is a syntax mechanism that lets a simple neural network define its layers inline in __call__(), skipping a separate setup() method.

The JAX library uses a somewhat unusual API for random number generation.
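Instead of a hidden global seed, JAX random functions take an explicit PRNG key, and you split the key whenever you need fresh randomness. A small sketch (assuming jax is installed):

```python
import jax

key = jax.random.PRNGKey(0)           # explicit PRNG state, no global seed
key, subkey = jax.random.split(key)   # split; never reuse a consumed key
x = jax.random.normal(subkey, (3,))   # draws are a pure function of the key
print(x.shape)
```

The same key always produces the same draw, which makes experiments reproducible by construction.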

I’m intrigued by Flax. I am very familiar with the PyTorch and Keras libraries, and part of me is thinking, “PyTorch works perfectly well for the work I do, and PyTorch has a large, fairly stable ecosystem. So why should I spend valuable time learning a new neural library?”

But another part of me is thinking, “Flax and JAX look very well thought-out and probably have learned lots from the lessons of PyTorch. Maybe Flax represents a big step forward.”

Well, this is the good and the bad of machine learning — there’s always something new and interesting.



Left: Flax is a plant. Linen is made from flax. Ancient Egyptians made linen clothes. Center: A painting by contemporary artist Louise Flax. Right: “Field of Flax” (1892) by Edgar Degas.


Posted in Machine Learning | Leave a comment

IMDB Movie Review Sentiment Analysis Using an LSTM with PyTorch

When I was first learning PyTorch, I implemented a demo of the IMDB movie review sentiment analysis problem using an LSTM. I recently revisited that code to incorporate all the things I learned about PyTorch since that early example.

My overall approach is to preprocess the IMDB data by encoding each word as an integer ID, rather than encoding on the fly during training. IDs are assigned by frequency, so small ID numbers correspond to the most common words. This makes it easy to filter out rare words like “floozle”. Preparing the raw movie data is the most difficult part of creating the sentiment analysis system.
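The frequency-ranked encoding idea, shown on a hypothetical two-review mini-corpus (not the actual IMDB vocabulary):

```python
from collections import Counter

corpus = ["the movie was great", "the movie was bad"]
freq = Counter(w for line in corpus for w in line.split())
# most common word gets the smallest ID, so rare words get large IDs
vocab = {w: i + 1 for i, (w, _) in enumerate(freq.most_common())}
encoded = [vocab[w] for w in "the movie was great".split()]
print(vocab["the"], encoded)
```

With this scheme, "keep only common words" becomes a simple comparison against an ID cutoff.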

I created a root directory named IMDB with subdirectories Data and Models. I downloaded the 50,000 movie reviews from https://ai.stanford.edu/~amaas/data/sentiment/ as aclImdb_v1.tar.gz to the root IMDB directory, then unzipped using the 7-Zip utility to get file aclImdb_v1.tar, and then I unzipped that file to get an aclImdb directory that contains all the movie reviews. I moved that directory and its contents into the Data directory.


Here I illustrate the data preprocessing for tiny reviews that are 20 words or less. Notice there is a duplicate review.

The goal of my preprocessing is to create files imdb_train_50w.txt and imdb_test_50w.txt for training and testing respectively. These are files where the movie reviews are very small — 50 words or less — because working with the entire dataset of reviews is very difficult. This generated just 620 training items/reviews which is too few to get good results. In a non-demo NLP scenario you need several thousand training items.

The words in the reviews are tokenized into integer values like “the” = 4 and “movie” = 20. I reserved 0 for (PAD) to pad all reviews to exactly 50 words. Most punctuation is stripped out and all words are converted to lower case. Each line in the train and test files is one review where padding is at the beginning and the class label to predict (0 = negative, 1 = positive) is the last value on each line. This preprocessing script is complicated and took me several days of coding and debugging. See the code below.

The program to create and train a sentiment analysis model using a PyTorch LSTM also took several days of work. The model definition is:

import numpy as np
import torch as T
device = T.device('cpu')

class LSTM_Net(T.nn.Module):
  def __init__(self):
    # vocab_size = 129892
    super(LSTM_Net, self).__init__()
    self.embed = T.nn.Embedding(129892, 32)
    self.lstm = T.nn.LSTM(32, 75)
    self.drop = T.nn.Dropout(0.10)
    self.fc1 = T.nn.Linear(75, 10)
    self.fc2 = T.nn.Linear(10, 2)  # 0=neg, 1=pos

  def forward(self, x):
    # x = review/sentence. length = 50 (fixed w/ padding)
    z = self.embed(x) 
    z = z.view(50, 1, 32)  # "seq batch input"
    lstm_oupt, (h_n, c_n) = self.lstm(z)
    z = lstm_oupt[-1]
    z = self.drop(z)
    z = T.tanh(self.fc1(z)) 
    z = self.fc2(z)  # CrossEntropyLoss will apply softmax
    return z  

There are virtually unlimited design choices for an LSTM-based network. There are no good rules of thumb for design — it’s all trial and error guided by experience.

The make_data_files.py data preprocessing program determined that there are 129,892 distinct words/tokens in the entire training data. This is far too many words to get good results so in a non-demo scenario I’d filter the vocabulary down to just the 10 or 20 thousand most common words/tokens.
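Because IDs are frequency ranks (offset by the four reserved indices 0 through 3), restricting the vocabulary is just a comparison against a cutoff; everything past it becomes the out-of-vocabulary index 2. A sketch along those lines (the function name is mine):

```python
OOV = 2  # out-of-vocabulary index in the encoded data files

def restrict_vocab(ids, top_n=20000):
    # a word with frequency rank r has ID r + 3, so the top_n most
    # common words occupy IDs 4 .. top_n + 3; reserved IDs 0-3 pass through
    return [i if i <= top_n + 3 else OOV for i in ids]

print(restrict_vocab([0, 4, 20003, 20004], top_n=20000))  # -> [0, 4, 20003, 2]
```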

Each word ID in an input review is converted into an embedding vector of 32 values (in a non-demo scenario, an embedding size of 100 is more common). The LSTM component converts these to 75 values. These 75 values are passed to two Linear layers that map down to 10 values and then down to 2 final output values, corresponding to class 0 (negative) and class 1 (positive).

For simplicity, during training I used a batch size of 1, meaning I processed just one review at a time. In a non-demo scenario, I’d probably use a batch size of 16.

After the model was trained, I fed a movie review of “the movie was a great waste of my time” to the model. I converted each word manually: “the” = 4, “movie” = 20, “was” = 16, etc., by using the vocab_dict dictionary in the make_data_files.py program. In a non-demo scenario, I would have programmatically determined the ID values for each word by using the vocab_file.txt file that was generated by the data preparation program.
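Programmatic encoding is just a dictionary lookup plus the offset, out-of-vocabulary, and padding rules described earlier. A sketch with a made-up mini vocabulary (the ranks below are chosen so the resulting IDs match the ones used in the demo):

```python
# hypothetical mini vocabulary: word -> 1-based frequency rank,
# the format stored in vocab_file.txt ("word index" per line)
vocab = {"the": 1, "was": 13, "movie": 17}
OFFSET, OOV, PAD, MAX_LEN = 3, 2, 0, 50  # ID = rank + 3; 2 = out-of-vocab

def encode_review(text):
    ids = [vocab[w] + OFFSET if w in vocab else OOV
           for w in text.lower().split()]
    return [PAD] * (MAX_LEN - len(ids)) + ids  # pre-pend 0s to length 50

ids = encode_review("the movie was floozle")
print(len(ids), ids[-4:])  # -> 50 [4, 20, 16, 2]
```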

The prediction result for the review was [0.9984, 0.0016], which maps to class 0, a negative review.

Whew! Natural language processing problems like movie review analysis are mysterious and very, very difficult. But very, very interesting.



Three mystery movies that I give positive sentiment reviews to. Left: “Murder on the Orient Express” (1974). Center: “Sherlock Holmes and the House of Fear” (1945). Right: “The Nice Guys” (2016).


Demo code for make_data_files.py:

# make_data_files.py
#
# input: source Stanford 50,000 data files reviews
# output: one combined train file, one combined test file
# output files are in index version, using the Keras dataset
# format where 0 = padding, 1 = 'start', 2 = OOV, 3 = unused
# 4 = most frequent word ('the'), 5 = next most frequent, etc.
# i'm skipping the start=1 because it makes no sense here.
# these data files will be loaded into memory then fed to
# a built-in Embedding layer (rather than custom embeddings)

# the reviews will be just those that have 50 words or less.
# short reviews will have 0s pre-pended. the class
# label (0 or 1) is the very last value.

import os

# allow the Windows cmd shell to deal with wacky characters
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)

# -------------------------------------------------------------

def get_reviews(dir_path, num_reviews, punc_str):
  punc_table = {ord(char): None for char in punc_str}  # dict
  reviews = []  # list-of-lists of words
  ctr = 1
  for file in os.listdir(dir_path):
    if ctr > num_reviews: break
    curr_file = os.path.join(dir_path, file)
    f = open(curr_file, "r", encoding="utf8") 
    for line in f:
      line = line.strip()
      if len(line) > 0:  # number characters
        # print(line)  # to show non-ASCII == errors
        line = line.translate(punc_table)  # remove punc
        line = line.lower()  # lower case
        line = " ".join(line.split())  # remove consecutive WS
        word_list = line.split(" ")  # list of words
        reviews.append(word_list)    # 
    f.close()  # close curr file
    ctr += 1
  return reviews

# -------------------------------------------------------------

def make_vocab(all_reviews):
  word_freq_dict = {}   # key = word, value = frequency

  for i in range(len(all_reviews)):
    reviews = all_reviews[i]
    for review in reviews:
      for word in review:
        if word in word_freq_dict:
          word_freq_dict[word] += 1
        else:
          word_freq_dict[word] = 1

  kv_list = []  # list of word-freq tuples so can sort
  for (k,v) in word_freq_dict.items():
    kv_list.append((k,v))

  # list of tuples index is 0-based rank, val is (word,freq)
  sorted_kv_list = \
    sorted(kv_list, key=lambda x: x[1], \
      reverse=True)  # sort by freq

  f = open(".\\vocab_file.txt", "w", encoding="utf8")
  vocab_dict = {}  
  # key = word, value = 1-based rank 
  # ('the' = 1, 'a' = 2, etc.)
  for i in range(len(sorted_kv_list)):
    w = sorted_kv_list[i][0]  # word is at [0]
    vocab_dict[w] = i+1       # 1-based as in Keras dataset

    f.write(w + " " + str(i+1) + "\n")  # word-space-index
  f.close()

  return vocab_dict

# -------------------------------------------------------------

def generate_file(reviews_lists, outpt_file, w_or_a, 
  vocab_dict, max_review_len, label_char):

  # write first time, append later
  fout = open(outpt_file, w_or_a, encoding="utf8")  
  offset = 3  # Keras offset: 'the' = 1 (most frequent)
      
  for i in range(len(reviews_lists)):  # walk each review
    curr_review = reviews_lists[i]
    n_words = len(curr_review)     
    if n_words > max_review_len:
      continue  # next i, continue without writing anything

    n_pad = max_review_len - n_words   # num of 0s to pre-pend

    for j in range(n_pad):  # write padding to get 50 values
      fout.write("0 ")
    
    for word in curr_review: 
      # a word in test set might not have been in training set
      if word not in vocab_dict:  
        fout.write("2 ")   # 2 is out-of-vocab index        
      else:
        idx = vocab_dict[word] + offset
        fout.write("%d " % idx)
    
    fout.write(label_char + "\n")  # add label '0' or '1'
        
  fout.close()

# -------------------------------------------------------------

def main():
  remove_chars = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~"
  # leave ' for words like it's  

  print("\nLoading all reviews into memory - be patient ")
  pos_train_reviews = get_reviews(".\\aclImdb\\train\\pos", 
    12500, remove_chars)
  neg_train_reviews = get_reviews(".\\aclImdb\\train\\neg",
    12500, remove_chars)
  pos_test_reviews = get_reviews(".\\aclImdb\\test\\pos",
    12500, remove_chars)
  neg_test_reviews = get_reviews(".\\aclImdb\\test\\neg",
    12500, remove_chars)

  # mp = max(len(l) for l in pos_train_reviews)  # 2469
  # mn = max(len(l) for l in neg_train_reviews)  # 1520
  # mm = max(mp, mn)  # longest review is 2469
  # print(mp, mn)

# -------------------------------------------------------------

  print("\nAnalyzing reviews and making vocabulary ")
  vocab_dict = make_vocab([pos_train_reviews, 
    neg_train_reviews])  # key = word, value = word rank
  v_len = len(vocab_dict)  
  # need this value, plus 4, for Embedding: 129888+4 = 129892
  print("\nVocab size = %d -- use this +4 for \
    Embedding nw " % v_len)

  max_review_len = 20   # use None for all reviews (any len)
  # if max_review_len == None or max_review_len > mm:
  #   max_review_len = mm

  print("\nGenerating training file len %d words or less " \
    % max_review_len)

  generate_file(pos_train_reviews, ".\\imdb_train_20w.txt", 
    "w", vocab_dict, max_review_len, "1")
  generate_file(neg_train_reviews, ".\\imdb_train_20w.txt",
    "a", vocab_dict, max_review_len, "0")

  print("Generating test file with len %d words or less " \
    % max_review_len)

  generate_file(pos_test_reviews, ".\\imdb_test_20w.txt", 
    "w", vocab_dict, max_review_len, "1")
  generate_file(neg_test_reviews, ".\\imdb_test_20w.txt", 
    "a", vocab_dict, max_review_len, "0")

  # inspect a generated file
  # vocab_dict was used indirectly (offset)

  print("\nDisplaying encoded training file: \n")
  f = open(".\\imdb_train_20w.txt", "r", encoding="utf8")
  for line in f: 
    print(line, end="")
  f.close()

# -------------------------------------------------------------

  print("\nDisplaying decoded training file: \n") 

  index_to_word = {}
  index_to_word[0] = "<PAD>"
  index_to_word[1] = "<ST>"
  index_to_word[2] = "<OOV>"
  for (k,v) in vocab_dict.items():
    index_to_word[v+3] = k

  f = open(".\\imdb_train_20w.txt", "r", encoding="utf8")
  for line in f:
    line = line.strip()
    indexes = line.split(" ")
    for i in range(len(indexes)-1):  # last is '0' or '1'
      idx = int(indexes[i])
      w = index_to_word[idx]
      print("%s " % w, end="")
    print("%s " % indexes[len(indexes)-1])
  f.close()

if __name__ == "__main__":
  main()

Demo code for imdb_lstm.py:

# imdb_lstm.py

# PyTorch 1.9.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10 

import numpy as np
import torch as T
device = T.device('cpu')

# -----------------------------------------------------------

class LSTM_Net(T.nn.Module):
  def __init__(self):
    # vocab_size = 129892
    super(LSTM_Net, self).__init__()
    self.embed = T.nn.Embedding(129892, 32)
    self.lstm = T.nn.LSTM(32, 75)
    self.drop = T.nn.Dropout(0.10)
    self.fc1 = T.nn.Linear(75, 10)  
    self.fc2 = T.nn.Linear(10, 2)  # 0=neg, 1=pos

  def forward(self, x):
    # x = review/sentence. length = 50 (fixed w/ padding)
    z = self.embed(x) 
    z = z.view(50, 1, 32)  # "seq batch input"
    lstm_oupt, (h_n, c_n) = self.lstm(z)
    z = lstm_oupt[-1]
    z = self.drop(z)
    z = T.tanh(self.fc1(z)) 
    z = self.fc2(z)  # CrossEntropyLoss will apply softmax
    return z  

# -----------------------------------------------------------

def accuracy(model, data_x, data_y):
  # data_x and data_y are lists of tensors
  model.eval()
  num_correct = 0; num_wrong = 0
  for i in range(len(data_x)):
    X = data_x[i]
    Y = data_y[i].reshape(1)
    with T.no_grad():
      oupt = model(X) 

    idx = T.argmax(oupt.data)
    if idx == Y:  # predicted == target
      num_correct += 1
    else:
      num_wrong += 1
  acc = (num_correct * 100.0) / (num_correct + num_wrong)
  model = model.train()
  return acc

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin PyTorch IMDB LSTM demo ")
  print("Using only reviews with 50 or less words ")
  T.manual_seed(1)
  np.random.seed(1)

  # 1. load data from file
  print("\nLoading preprocessed train and test data ")
  max_review_len = 50 # exact review length
  
  train_xy = np.loadtxt(".\\Data\\imdb_train_50w.txt", 
    delimiter=" ",  usecols=range(0,51), dtype=np.int64)
  train_x = train_xy[:,0:50]
  train_y = train_xy[:,50]

  test_xy = np.loadtxt(".\\Data\\imdb_test_50w.txt", 
    delimiter=" ",  usecols=range(0,51), dtype=np.int64)
  test_x = test_xy[:,0:50]
  test_y = test_xy[:,50]
 
  # 1b. convert to tensors
  train_x = T.tensor(train_x, dtype=T.int64).to(device)
  train_y = T.tensor(train_y, dtype=T.int64).to(device)
  test_x = T.tensor(test_x, dtype=T.int64).to(device)
  test_y = T.tensor(test_y, dtype=T.int64).to(device)

  N = len(train_x)
  print("Data loaded. Number train items = %d " % N)

# -----------------------------------------------------------

  # 2. create network
  net = LSTM_Net().to(device)

  # 3. train model
  loss_func = T.nn.CrossEntropyLoss()  # does log-softmax()
  optimizer = T.optim.Adam(net.parameters(), lr=1.0e-3)
  max_epochs = 12
  log_interval = 2  # display progress

  print("\nStarting training with bat_size = 1")
  for epoch in range(0, max_epochs):
    net.train()  # set training mode
    indices = np.arange(N)
    np.random.shuffle(indices)
    tot_err = 0.0

    for i in range(N):  # one review at a time
      j = indices[i]
      X = train_x[j]
      Y = train_y[j].reshape(1)
      
      optimizer.zero_grad()
      oupt = net(X)  
      loss_val = loss_func(oupt, Y) 
      tot_err += loss_val.item()
      loss_val.backward()  # compute gradients
      optimizer.step()     # update weights

    if epoch % log_interval == 0:
      print("epoch = %4d  |" % epoch, end="")
      print("  avg loss = %7.4f  |" % (tot_err / N), end="")
      train_acc = accuracy(net, train_x, train_y)
      print("  accuracy = %7.2f%%" % train_acc)
      # test_acc = accuracy(net, test_x, test_y)  # 
      # print("  test accuracy = %7.2f%%" % test_acc)
  print("Training complete")

# -----------------------------------------------------------

  # 4. evaluate model
  test_acc = accuracy(net, test_x, test_y)
  print("\nAccuracy on test data = %7.2f%%" % test_acc)

  # 5. save model
  print("\nSaving trained model state")
  fn = ".\\Models\\imdb_model.pt"
  T.save(net.state_dict(), fn)

  # saved_model = Net()
  # saved_model.load_state_dict(T.load(fn))
  # use saved_model to make prediction(s)

  # 6. use model
  print("\nFor \"the movie was a great waste of my time\"")
  print("0 = negative, 1 = positive ")
  review = np.array([4, 20, 16, 6, 86, 425, 7, 58, 64], \
    dtype=np.int64)
  padding = np.zeros(41, dtype=np.int64)
  review = np.concatenate([padding, review])  # pre-pend 0s, as in training data
  review = T.tensor(review, dtype=T.int64)
  
  net.eval()
  with T.no_grad():
    prediction = net(review)  # raw outputs
  print("\nlogits: ", end=""); print(prediction) 
  probs = T.softmax(prediction, dim=1)  # pseudo-probabilities
  probs = probs.numpy()
  print("pseudo-probs: ", end="")
  print("%0.4f %0.4f " % (probs[0][0], probs[0][1]))

  print("\nEnd PyTorch IMDB LSTM sentiment demo")

if __name__ == "__main__":
  main()
Posted in PyTorch | Leave a comment

Matrix Inverse From Scratch Using Python

Computing the inverse of a matrix is a fundamental algorithm in machine learning and numerical programming. On a recent flight to a conference, just for hoots (and for programming exercise) I decided to implement a matrix inverse function from scratch using Python.

Computing the inverse of a matrix is fantastically complicated. The Wikipedia article on matrix inversion lists 10 categories of techniques, and each category has many variations. The fact that there are so many different ways to invert a matrix is an indirect indication of how difficult the problem is. Briefly, relatively simple matrix inversion techniques such as using cofactors and adjugates only work well for small matrices (roughly 8×8 or smaller). For larger matrices you should write code that involves a complex technique called matrix decomposition.

Although matrix decomposition is very difficult, using decomposition as an intermediate step for matrix inverse is still easier than computing an inverse directly.

My high-level function is called mat_inverse(). The mat_inverse() function calls a mat_decompose() function and a helper() function. Both of these helper functions could have been defined locally inside mat_inverse().

Anyway, it was good mental exercise. I can’t think of a practical use case where you’d want to implement matrix inverse from scratch, except if you really wanted to avoid an external dependency such as the numpy.linalg.inv() function.
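As a rough illustration of the from-scratch idea — using Gauss-Jordan elimination with partial pivoting rather than the decomposition approach described above, so fine only for small matrices:

```python
def mat_inverse(A):
    # Gauss-Jordan elimination with partial pivoting.
    # A is a square list-of-lists matrix; returns a new matrix.
    n = len(A)
    # augment each row of A with the corresponding row of the identity
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1.0e-12:
            raise ValueError("matrix is singular (or nearly so)")
        M[col], M[piv] = M[piv], M[col]      # swap pivot row into place
        p = M[col][col]
        M[col] = [x / p for x in M[col]]     # scale pivot row so pivot = 1
        for r in range(n):
            if r != col:                     # zero out the rest of the column
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]            # right half is now the inverse

inv = mat_inverse([[4.0, 7.0], [2.0, 6.0]])
print([[round(x, 6) for x in row] for row in inv])  # -> [[0.6, -0.7], [-0.2, 0.4]]
```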



I enjoy alternative movie posters created by artists. Here are three alternative posters for “The Matrix” (1999) that are at least as good as the real poster.


Demo code. Replace “lt” (less-than), “gt”, etc., with symbols. Continue reading

Posted in Machine Learning | Leave a comment

JavaScript and the Sapir-Whorf Hypothesis

The Sapir-Whorf hypothesis loosely states that the structure of a spoken language affects the way its speakers see and understand the world. This makes intuitive sense — a person whose language doesn’t have words for concepts related to charity will have difficulty grasping the idea of classical Christian-style charity.

I am quite certain that the Sapir-Whorf hypothesis idea applies to computer programming languages too. When I work with a set-based language like SQL, my brain has to work quite a bit differently than when I’m working with a procedural language like C or Java. Similarly, working with one language (like Python) gives me insights into other closely related languages (like JavaScript). For that reason, I try to write code almost every day, and I use different programming languages.

To keep fresh with JavaScript, I like to implement a neural network from scratch. Coding up a neural network in JavaScript requires a knowledge of just about every aspect of the language. To support my from-scratch JavaScript neural network, I use a small utility library of basic functions. This keeps the size of the actual neural network code smaller. Some of my utility functions for a JavaScript neural network are:

  vecMake() - create a numeric array/vector.
  matMake() - create an array-of-arrays style matrix.
  vecShow() - display a vector to shell.
  matShow() - display a matrix to shell.
  vecMax() - the largest value in a vector.
  argmax() - the index of the largest value in a vector.
  arange() - create a vector that holds [0, 1, 2, ...]
  loadTxt() - read a text file of numeric data into a matrix.
  Erratic() - a class to generate semi-random numbers.
  hyperTan() - hyperbolic tangent of a value.
  logSig() - logistic sigmoid of a value.
  softmax() - softmax values of a vector.

I guess the moral of the story is that I, and guys like me, practice coding because we enjoy it, not because we’re forced to. When I interview job candidates at the company I work for, one of my go-to questions is to ask the person applying for the job to describe a side project they’ve done. About half of the candidates I talk to don’t have a good, quick answer. The other half of the candidates get very excited and launch into great conversations about something they have done. These are the guys I look for.



I am very good with programming languages. I am absolutely terrible with spoken languages. According to some research I found on the Internet, there’s a strong consensus that Mandarin is the most difficult spoken language to learn for English speakers. Other difficult languages to learn are (left to right) Russian, Icelandic, and Mongolian.


Demo code for utilities_lib.js, followed by a short test program:

// utilities_lib.js
// ES6. node.js

let FS = require('fs');

// there is no easy way to read line-by-line !!
function loadTxt(fn, delimit, usecols)
{
  let all = FS.readFileSync(fn, "utf8");  // giant string
  all = all.trim();  // strip final crlf in file
  let lines = all.split("\n");
  let rows = lines.length;
  let cols = usecols.length;
  let result = matMake(rows, cols, 0.0); 
  for (let i = 0; i < rows; ++i) {  // each line
    let tokens = lines[i].split(delimit);
    for (let j = 0; j < cols; ++j) {
      result[i][j] = parseFloat(tokens[usecols[j]]);
    }
  }
  return result;
}

function arange(n)
{
  let result = [];
  for (let i = 0; i < n; ++i) {
    result[i] = Math.trunc(i);
  } 
  return result;
}

class Erratic
{
  constructor(seed) {
    this.seed = seed + 0.5;  // avoid 0
  }

  next() {
    let x = Math.sin(this.seed) * 1000;
    let result = x - Math.floor(x);  // [0.0,1.0)
    this.seed = result;  // for next call
    return result;
  }

  nextInt(lo, hi) {
    let x = this.next();
    return Math.trunc((hi - lo) * x + lo);
  }
}

function vecMake(n, val)
{
  let result = [];
  for (let i = 0; i < n; ++i) {
    result[i] = val;
  }
  return result;
}

function matMake(rows, cols, val)
{
  let result = [];
  for (let i = 0; i < rows; ++i) {
     result[i] = [];
     for (let j = 0; j < cols; ++j) {
       result[i][j] = val;
     }
  }
  return result;
}

function vecShow(v, dec, len)
{
  for (let i = 0; i < v.length; ++i) {
    if (i != 0 && i % len == 0) {
      process.stdout.write("\n");
    }
    if (v[i] >= 0.0) {
      process.stdout.write(" ");  // + or - space
    }
    process.stdout.write(v[i].toFixed(dec));
    process.stdout.write("  ");
  }
  process.stdout.write("\n");
}

function matShow(m, dec)
{
  let rows = m.length;
  let cols = m[0].length;
  for (let i = 0; i < rows; ++i) {
    for (let j = 0; j < cols; ++j) {
      if (m[i][j] >= 0.0) {
        process.stdout.write(" ");  // + or - space
      }
      process.stdout.write(m[i][j].toFixed(dec));
      process.stdout.write("  ");
    }
    process.stdout.write("\n");
  }
}

function argmax(vec)
{
  let result = 0;
  let m = vec[0];
  for (let i = 0; i < vec.length; ++i) {
    if (vec[i] > m) {
      m = vec[i];
      result = i;
    }
  }
  return result;
}

function hyperTan(x)
{
  if (x < -20.0) {
    return -1.0;
  }
  else if (x > 20.0) {
    return 1.0;
  }
  else {
    return Math.tanh(x);
  }
}

function logSig(x)
{
  if (x < -20.0) {
    return 0.0;
  }
  else if (x > 20.0) {
    return 1.0;
  }
  else {
    return 1.0 / (1.0 + Math.exp(-x));
  }
}

function vecMax(vec)
{
  let mx = vec[0];
  for (let i = 0; i < vec.length; ++i) {
    if (vec[i] > mx) {
      mx = vec[i];
    }
  }
  return mx;
}

function softmax(vec)
{
  //let m = Math.max(...vec);  // or 'spread' operator
  let m = vecMax(vec);
  let result = [];
  let sum = 0.0;
  for (let i = 0; i < vec.length; ++i) {
    result[i] = Math.exp(vec[i] - m);
    sum += result[i];
  }
  for (let i = 0; i < result.length; ++i) {
    result[i] = result[i] / sum;
  }
  return result;
}

module.exports = {
  vecMake,
  matMake,
  vecShow,
  matShow,
  argmax,
  loadTxt,
  arange,
  Erratic,
  hyperTan,
  logSig,
  vecMax,
  softmax
};
// test_utils.js

let U = require("../Utilities/utilities_lib.js");
// module.exports = { vecMake, matMake, vecShow,
//  matShow, argmax, loadTxt, arange, Erratic,
//  hyperTan, logSig, vecMax, softmax, };

function main()
{
  process.stdout.write("\x1b[0m");  // reset
  process.stdout.write("\x1b[1m" + "\x1b[37m");  // bright white
  console.log("\nBegin JavaScript utilities for NN demo ");

  let v = U.vecMake(4, 0.0);  // 4 cells, all 0.0
  v[0] = 5.8; v[1] = 3.7; v[2] = 7.3; v[3] = 2.9; 
  process.stdout.write("\nv = ");
  U.vecShow(v, 2, 12);  // 2 decimals, 12 per line

  let x = U.vecMax(v);
  let mi = U.argmax(v);
  console.log("\nLargest value: ");
  console.log(x);
  console.log("\nIndex of largest value: ");
  console.log(mi);

  let rnd = new U.Erratic(13);
  let lo = 1.0; let hi = 5.0;
  let z = (hi - lo) * rnd.next() + lo;
  console.log("\nRandom val in [1.0, 5.0] = ");
  console.log(z.toFixed(4));

  process.stdout.write("\x1b[0m");  // reset
  console.log("\nEnd demo ");
} // main()

main();
Posted in JavaScript, Machine Learning

NFL 2021 Week 19 (Wild Card) Predictions – Zoltar Likes Vegas Underdogs Raiders and Steelers

Zoltar is my NFL football prediction computer program. It uses custom reinforcement learning and a neural network. Here are Zoltar’s predictions for week #19 (wild card round) of the 2021 season. It usually takes Zoltar about four weeks to hit his stride and takes humans about eight weeks to get up to speed, so weeks six through nine are usually Zoltar’s sweet spot. After week nine, injuries start having a big effect.

Zoltar:     bengals  by    1  dog =     raiders    Vegas:     bengals  by  6.5
Zoltar:       bills  by    6  dog =    patriots    Vegas:       bills  by  4.5
Zoltar:  buccaneers  by    6  dog =      eagles    Vegas:  buccaneers  by   10
Zoltar:     cowboys  by    6  dog = fortyniners    Vegas:     cowboys  by    3
Zoltar:      chiefs  by    6  dog =    steelers    Vegas:      chiefs  by 13.5
Zoltar:        rams  by    6  dog =   cardinals    Vegas:        rams  by  4.5

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. In mid-season I usually use 3.0 points difference but for the first few weeks and last few weeks of the season I go a bit more conservative and use 4.0 points difference as the advice threshold criterion. In middle weeks I sometimes go ultra-aggressive and use a 1.0-point threshold.
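
The advice rule above is simple enough to state as code. A minimal sketch (my reconstruction, with a hypothetical advice() helper; the real Zoltar program is much larger):

```python
# Sketch of the betting-advice threshold rule described above.
# Both margins are the points by which the Vegas favorite is
# predicted to win (Zoltar's margin vs. the Vegas line).
def advice(zoltar_margin, vegas_margin, threshold=3.0):
    diff = zoltar_margin - vegas_margin
    if diff >= threshold:
        return "bet favorite"    # Zoltar thinks the favorite covers
    elif diff <= -threshold:
        return "bet underdog"    # Zoltar thinks the favorite won't cover
    else:
        return "no bet"          # difference not significant

# week #19 Chiefs-Steelers: Zoltar says chiefs by 6, Vegas says 13.5
print(advice(6.0, 13.5))  # bet underdog
```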

Note: Because of Zoltar’s initialization (all teams regress to an average power rating) and other algorithms, Zoltar is much too strongly biased towards Vegas underdogs. I need to fix this.

For week #19 (wild card):

1. Zoltar likes Vegas underdog Raiders against the Bengals
2. Zoltar likes Vegas underdog Steelers against the Chiefs

For example, a bet on the underdog Raiders against the Bengals will pay off if the Raiders win by any score, or if the favored Bengals win but by less than the point spread of 6.5 points (in other words, by 6 points or less).
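
The settlement rules in that example can be sketched as a small function (my reconstruction of standard point-spread rules; the names are hypothetical):

```python
# Settle a bet on the underdog, given the favorite's actual margin
# of victory (negative if the underdog won outright) and the spread.
def underdog_bet(favorite_margin, spread):
    if favorite_margin < spread:
        return "win"    # underdog won, or favorite failed to cover
    elif favorite_margin == spread:
        return "push"   # exact tie against the spread; stake returned
    else:
        return "lose"   # favorite covered the spread

print(underdog_bet(-3, 6.5))  # win  (Raiders win outright)
print(underdog_bet(6, 6.5))   # win  (Bengals win by less than 6.5)
print(underdog_bet(7, 6.5))   # lose (Bengals cover)
```

With a half-point spread like 6.5 a push is impossible, which is exactly why half-point lines are common.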

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
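
The 53% figure comes from a one-line break-even calculation:

```python
# At -110 odds you risk $110 to win $100, so the break-even win
# probability p solves 100*p = 110*(1 - p).
p = 110.0 / 210.0
print(round(100 * p, 1))  # 52.4 -- rounds up to the "53% or better" rule
```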

In week #18, against the Vegas point spread, Zoltar went 3-1 (using the standard conservative 4.0 points as the advice threshold). Overall, for the season, Zoltar is 65-51 against the spread (~56%).

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting. In week #18, just predicting the winning team, Zoltar went 9-7, which is very poor.

In week #18, just predicting the winning team, Vegas (“the wisdom of the crowd”) also went 9-7, the same as Zoltar.

Zoltar sometimes predicts a 0-point margin of victory, which means the two teams are evenly matched. In those situations, to pick a winner (only so I can track the raw number of correct predictions), Zoltar picks the home team to win in the first few weeks of the season; after that, Zoltar uses his algorithms to pick a winner. There are no such games in week #19.



My prediction system is named after the Zoltar fortune teller machine you can find in arcades. My favorite arcade games are pinball machines. Here are two NFL football themed pinball machines. Center: “Pro Football” (1973) by Gottlieb. Right: “Monday Night Football” (1989) by Data East.

Posted in Zoltar

The Kendall Tau Distance For Permutations Example C# Code

Suppose you have a permutation p1 = (0, 2, 4, 1, 3) and a second permutation p2 = (4, 0, 3, 2, 1) and you want to know the distance/difference between them.

The Kendall Tau distance between two permutations is the number of pairs of elements that have a different ordering in the two permutations. For the two permutations above, there are 5 elements so there are a total of 5 * (5-1) / 2 = 10 pairs of elements.

Of the 10 pairs of elements, 4 pairs have different ordering: (0,4), (1,3), (2,4), (2,3). In p1 element 0 comes before 4 but in p2 element 0 comes after 4, and so on. The other 6 pairs of elements have the same ordering. For example, for the pair (1,2) element 2 comes before element 1 in both p1 and p2.

The normalized Kendall Tau distance is the raw number of mismatched-order pairs of elements divided by the total number of possible pairs. For the two permutations above, the normalized Kendall Tau distance is 4 / 10 = 0.4000. This can be interpreted as percentage of mismatched pairs.
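
The pair-counting definition translates directly into a short brute-force sketch (Python; the C# demo later in this post uses a faster index-lookup approach):

```python
from itertools import combinations

def kendall_tau(p1, p2):
    # position of each element within each permutation
    pos1 = {e: i for i, e in enumerate(p1)}
    pos2 = {e: i for i, e in enumerate(p2)}
    n = len(p1)
    # count element pairs whose relative order differs
    raw = 0
    for a, b in combinations(range(n), 2):
        if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b]):
            raw += 1
    return raw, raw / (n * (n - 1) / 2)

print(kendall_tau([0, 2, 4, 1, 3], [4, 0, 3, 2, 1]))  # (4, 0.4)
```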

A few weeks ago I implemented Kendall Tau distance for permutations using Python. One weekend morning while I was waiting for the rain to stop so I could walk my dogs, for mental exercise I decided to refactor the implementation using the C# language.

Good fun.



Playing card games are all about permutations of 52 elements. Here are three interesting examples of playing cards used for 3D art rather than for mathematical permutations.


Demo code.

using System;

namespace KendallTauPermDistance
{
  class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nKendall Tau perm distance ");

      int[] p1 = new int[] { 0, 2, 4, 1, 3 };
      int[] p2 = new int[] { 4, 0, 3, 2, 1 };

      Console.Write("\npermutation p1 = "); ShowPerm(p1);
      Console.Write("permutation p2 = "); ShowPerm(p2);

      double[] distances = KendallTauDist(p1, p2);
      int rawDist = (int)distances[0];
      double normDist = distances[1];

      Console.WriteLine("\nRaw KT distance = " +
        rawDist);
      Console.WriteLine("Normalized KT distance = " +
        normDist.ToString("F4"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main()

    static void ShowPerm(int[] p)
    {
      int n = p.Length;
      Console.Write("[ ");
      for (int i = 0; i < n; ++i)
        Console.Write(p[i] + " ");
      Console.WriteLine("]");
    }

    static double[] KendallTauDist(int[] p1, int[] p2)
    {
      int n = p1.Length;
      int[] indexOf = new int[n];  // lookup into p2
      for (int i = 0; i < n; ++i)
      {
        int element = p2[i];
        indexOf[element] = i;
      }

      int d = 0;  // raw distance = num mismatched-order pairs
      for (int i = 0; i < n; ++i)
      {
        for (int j = i + 1; j < n; ++j)
        {
          if (indexOf[p1[i]] > indexOf[p1[j]])  // examine p2
            ++d;
        }
      }

      double normTerm = n * (n - 1) / 2.0;  // total number pairs
      double nd = d / normTerm;
      double[] result = new double[] { (double)d, nd };
      return result;
    }

  } // Program

} // ns
Posted in Miscellaneous

Assigning Fixed Weight and Bias Values to a PyTorch Neural Network

Sometimes it’s useful to assign fixed weight and bias values to a neural network. Doing so requires knowledge of how those values are stored.

I wrote a short demo program to illustrate the technique. The demo creates a 3-4-2 neural network. The single hidden layer is named hid1 and has a total of 3 x 4 = 12 weights and 4 biases. PyTorch stores the weight values in a 4×3 shaped matrix named self.hid1.weight.data and the bias values in self.hid1.bias.data.

Similarly, the output layer is named oupt and has a total of 4 x 2 = 8 weights and 2 biases. The weights are stored in a 2×4 shaped matrix named self.oupt.weight.data and the biases in self.oupt.bias.data.

The demo code iterates through the weights and biases and stores the values 0.01, 0.02, . . . 0.26 into the network.
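
As a quick sanity check on that 0.01 through 0.26 claim, counting the parameter slots in the order the demo fills them gives 26 values (plain Python, no PyTorch needed):

```python
# parameter shapes for a 3-4-2 network, in the demo's fill order
shapes = [("hid1.weight", (4, 3)), ("hid1.bias", (4,)),
          ("oupt.weight", (2, 4)), ("oupt.bias", (2,))]
total = 0
for name, shape in shapes:
    count = 1
    for dim in shape:
        count *= dim
    total += count
print(total)                   # 26 parameters in all
print(round(0.01 * total, 2))  # 0.26 -- the last value stored
```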

The diagram above shows the conceptual view of the neural network, and a representation of the weight and bias data structures.



Software system conceptual diagrams are a facade over the reality and complexity of the underlying code. Here are three remarkable building facades that give a flat wall the appearance of 3D complexity.


Demo code.

# layer_assign_wts.py

# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10 

import torch as T
device = T.device("cpu")  # apply to Tensor or Module

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(3, 4)  # 3-4-2
    self.oupt = T.nn.Linear(4, 2)

    v = 0.01

    for i in range(4):      # hid1 4x3
      for j in range(3):
        self.hid1.weight.data[i][j] = v
        v += 0.01
    for i in range(4):
      self.hid1.bias.data[i] = v
      v += 0.01

    for i in range(2):      # oupt 2x4
      for j in range(4):
        self.oupt.weight.data[i][j] = v
        v += 0.01
    for i in range(2):
      self.oupt.bias.data[i] = v
      v += 0.01

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = self.oupt(z)  # no softmax for CrossEntropyLoss()
    return z

def main():
  print("\nBegin ")
  T.manual_seed(1)

  print("\nCreating a 3-4-2 network with fixed wts and biases ")
  net = Net().to(device)

  print("\nhid1 wts and biases: ")
  print(net.hid1.weight.data)
  print(net.hid1.bias.data)

  print("\noupt wts and biases: ")
  print(net.oupt.weight.data)
  print(net.oupt.bias.data)

  print("\nEnd ")

if __name__ == "__main__":
  main()
Posted in PyTorch

Chi-Square From Scratch Using Python

One night I just couldn’t fall asleep so to kill time productively I decided to implement chi-square from scratch using Python.

The term “chi-square” has multiple related meanings. There is a chi-square test that compares a set of observed counts with a corresponding set of expected (theoretical) counts. There is a chi-square statistic which is a single value computed from observed and expected counts. There is a chi-square distribution.

For my demo, I set up observed counts of [192, 163, 25] and expected/theoretical counts of [180, 180, 20]. These correspond to roulette wheel counts for red, black, and green if you spin an American wheel (18 red, 18 black, and 2 green pockets) 380 times. Is the roulette wheel fair?

The main challenge is to implement a function that returns the area under the curve of the chi-square distribution from a given chi-square statistic to infinity. I did this using ACM Algorithm 299, which calls a function that returns the area under the Gaussian (Normal) distribution using ACM Algorithm 209. Very complicated.

To verify my from-scratch implementation of a chi-square test, I fed the observed and expected counts to the scipy chisquare() function and to my my_chisquare() function. Both versions computed a chi-square value of 3.6555, which in turn yielded a p-value of 0.1607. The p-value is the probability of seeing a chi-square statistic at least this large if the observed counts really did come from the expected distribution, so a small p-value suggests the observed and expected counts don’t match. Typically, if the p-value is less than 0.05 you conclude the observed doesn’t match the expected; otherwise (p greater than 0.05) you conclude there’s not enough evidence for a strong statement.
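
Those numbers are easy to double-check by hand in this particular case: with three categories there are df = 2 degrees of freedom, and for df = 2 the chi-square right-tail area has the closed form exp(-x/2), so no ACM 299 machinery is needed:

```python
import math

obs = [192, 163, 25]
expect = [180, 180, 20]
# chi-square statistic: sum of (obs - expect)^2 / expect
chi_stat = sum((o - e) ** 2 / e for o, e in zip(obs, expect))
p = math.exp(-chi_stat / 2)  # right-tail area, valid only for df = 2
print(round(chi_stat, 4))  # 3.6556
print(round(p, 5))         # 0.16077
```

The general-df case is what requires the numerical approximations in the demo code below.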



Roulette etiquette, courtesy of a search for “elegant gambling” in stock photos. Left: It’s not considered good form to lunge onto the table and grab the roulette wheel to stop it where you want. Center: I suspect roulette dealers aren’t fond of players tossing chips randomly onto the betting area. Right: Emotional support pandas for roulette are generally frowned upon, at least in the casinos I’ve been to.


Demo code below.

# chisq_area.py
# chi-square test from scratch

import numpy as np
from scipy.stats import chisquare  # for verification

def my_chisq_pval(x, df):
  # x = computed chi-square stat
  # df = degrees of freedom
  # ACM algorithm 299 (calls ACM 209)
  if x <= 0.0 or df < 1:
    print("FATAL argument error ")

  a = 0.0  # 299 variable names
  y = 0.0
  s = 0.0
  z = 0.0
  ee = 0.0  # change from e
  c = 0.0

  even = True
  a = 0.5 * x
  if df % 2 == 0: even = True
  else: even = False

  if df > 1: y = my_exp(-a)  # ACM update remark (4)
  if (even == True): s = y
  else: s = 2.0 * gauss(-np.sqrt(x))

  if df > 2:
    x = 0.5 * (df - 1.0)
    if even == True: z = 1.0
    else: z = 0.5
    if a > 40.0:  # ACM remark (5)
      if even == True: ee = 0.0
      else: ee = 0.5723649429247000870717135
      c = np.log(a)   # log base e
      while z <= x:
        ee = np.log(z) + ee
        s = s + my_exp(c * z - a - ee)  # ACM update remark (6)
        z = z + 1.0
      return s
    else:  # a <= 40.0
      if even == True:
        ee = 1.0
      else:
        ee = 0.5641895835477562869480795 / np.sqrt(a)

      c = 0.0
      while z <= x:
        ee = ee * (a / z)  # ACM update remark (7)
        c = c + ee
        z = z + 1.0
      return c * y + s
  else:  # df <= 2
    return s

def my_exp(x):
  if x < -40.0:  # ACM update remark (8)
    return 0.0
  else:
    return np.exp(x)

def gauss(z):
  # input: z-value (-inf to +inf)
  # output = p under Normal curve from -inf to z
  # ACM 209

  y = 0.0
  p = 0.0
  w = 0.0

  if z == 0.0:
    p = 0.0
  else:
    y = np.abs(z) / 2
    if y >= 3.0:
      p = 1.0
    elif y < 1.0:
      w = y * y
      p = ((((((((0.000124818987 * w \
        - 0.001075204047) * w + 0.005198775019) * w \
        - 0.019198292004) * w + 0.059054035642) * w \
        - 0.151968751364) * w + 0.319152932694) * w \
        - 0.531923007300) * w + 0.797884560593) * y \
        * 2.0
    else:
      y = y - 2.0
      p = (((((((((((((-0.000045255659 * y \
        + 0.000152529290) * y - 0.000019538132) * y \
        - 0.000676904986) * y + 0.001390604284) * y \
        - 0.000794620820) * y - 0.002034254874) * y \
       + 0.006549791214) * y - 0.010557625006) * y \
       + 0.011630447319) * y - 0.009279453341) * y \
       + 0.005353579108) * y - 0.002141268741) * y \
       + 0.000535310849) * y + 0.999936657524

  if z > 0.0:
    return (p + 1.0) / 2
  else:
    return (1.0 - p) / 2

def my_chisquare(obs, expect):
  x = 0.0
  for i in range(len(obs)):
    x += (obs[i] - expect[i])**2 / expect[i]
  df = len(obs) - 1
  p_val = my_chisq_pval(x, df)
  return (x, p_val)

def main():
  print("\nBegin chi-square demo ")

  obs = [192, 163, 25]
  expect = [180, 180, 20]
  print("\nObserved counts, expected counts: ")
  print(obs)
  print(expect)

  # scipy
  (chi_stat, pvalue) = chisquare(obs, expect)
  print("\nchi_stat, p_val from scipy: ")
  print("%0.8f" % chi_stat)
  print("%0.8f" % pvalue)

  # scratch
  (chi_stat, pvalue) = my_chisquare(obs, expect)
  print("\nchi_stat, p_val from scratch: ")
  print("%0.8f" % chi_stat)
  print("%0.8f" % pvalue)

  if pvalue < 0.05:
    print("\nData suggests obs does not match expect! ")
  else:
    print("\nNo evidence obs and expect do not match ")

  print("\nEnd demo ")

if __name__ == "__main__":
  main()
Posted in Miscellaneous

PyTorch Explicit vs. Implicit Weight and Bias Initialization

Sometimes library code is too helpful. In particular, I don’t like library code that uses default mechanisms. One example is PyTorch library weight and bias initialization. Consider this PyTorch neural network definition:

import torch as T
device = T.device("cpu")

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(3, 4)  # 3-(4-5)-2
    self.hid2 = T.nn.Linear(4, 5)
    self.oupt = T.nn.Linear(5, 2)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)
    return z

. . .

net = Net().to(device)

The code defines a 3-(4-5)-2 neural network. But how are the weights and bias values initialized? If you don’t explicitly specify weight and bias initialization code, PyTorch will use default code.


Left: A 3-(4-5)-2 neural network with default weight and bias initialization. Right: The same network but with explicit weight and bias initialization gives identical values.

I don’t like invisible default code. Default code can change over time — and usually does. This makes program runs non-reproducible. As it turns out, for Linear() layers, PyTorch uses fairly complicated default weight and bias initialization. I went to the initialization source code at C:\Users\(user)\Anaconda3\Lib\site-packages\torch\nn\modules\linear.py and saw default initialization is kaiming_uniform() for weights and uniform() for biases, but with some tricky parameters.

I copy/pasted the library code into the __init__ method and got code that produces the exact same initial weights and biases but is explicit:

import torch as T
device = T.device("cpu")

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(3, 4)  # 3-(4-5)-2
    self.hid2 = T.nn.Linear(4, 5)
    self.oupt = T.nn.Linear(5, 2)

    T.nn.init.kaiming_uniform_(self.hid1.weight,
      a=math.sqrt(5.0))
    bound = 1 / math.sqrt(3)
    T.nn.init.uniform_(self.hid1.bias, -bound, bound)

    T.nn.init.kaiming_uniform_(self.hid2.weight, 
      a=math.sqrt(5.0))
    bound = 1 / math.sqrt(4)
    T.nn.init.uniform_(self.hid2.bias, -bound, bound)

    T.nn.init.kaiming_uniform_(self.oupt.weight, 
      a=math.sqrt(5.0))
    bound = 1 / math.sqrt(5)
    T.nn.init.uniform_(self.oupt.bias, -bound, bound)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)
    return z

. . .

net = Net().to(device)

The sqrt(5.0) is a magic parameter for kaiming_uniform(). In the sqrt(3), sqrt(4), sqrt(5) for the biases, the 3, 4, 5 are the “fan_in” values for each layer, that is, the number of inputs to the layer.
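
Based on my reading of the Linear() source, kaiming_uniform_() computes gain = sqrt(2 / (1 + a^2)) and bound = gain * sqrt(3 / fan_in); with a = sqrt(5) the gain is sqrt(1/3), so the weight bound also collapses to 1 / sqrt(fan_in), matching the bias bound. A quick check of the arithmetic (plain Python, no PyTorch needed):

```python
import math

def kaiming_uniform_bound(fan_in, a=math.sqrt(5.0)):
    # bound used by kaiming_uniform_ with the default
    # leaky_relu nonlinearity: gain * sqrt(3 / fan_in)
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain * math.sqrt(3.0 / fan_in)

for fan_in in (3, 4, 5):
    print(kaiming_uniform_bound(fan_in) - 1.0 / math.sqrt(fan_in))  # ~0.0
```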

The downside to explicit weight and bias initialization is more code. But in non-demo production scenarios, it’s almost always better to use explicit code rather than rely on implicit default code that can lead to non-reproducibility.



The goal of photorealistic art is to create an explicit representation of reality. The art deco movement of the 1920s and 1930s used implicit representations of reality. From left to right: Georges Lepape, Erte, Tamara Lempicka.


Demo code.

# layer_default_init.py
# see C:\Users\(user)\Anaconda3\Lib\site-packages
#   \torch\nn\modules\linear.py

# PyTorch 1.10.0-CPU Anaconda3-2020.02  Python 3.7.6
# Windows 10 

import math
import torch as T
device = T.device("cpu")  # apply to Tensor or Module

class Net(T.nn.Module):
  def __init__(self, init_type):
    super(Net, self).__init__()
    # T.manual_seed(1)
    self.hid1 = T.nn.Linear(3, 4)  # 3-(4-5)-2
    self.hid2 = T.nn.Linear(4, 5)
    self.oupt = T.nn.Linear(5, 2)

    if init_type == 'default':
      pass
    elif init_type == 'explicit':
      T.nn.init.kaiming_uniform_(self.hid1.weight, 
        a=math.sqrt(5.0))
      bound = 1 / math.sqrt(3.0)
      T.nn.init.uniform_(self.hid1.bias, -bound, bound)

      T.nn.init.kaiming_uniform_(self.hid2.weight, 
        a=math.sqrt(5.0))
      bound = 1 / math.sqrt(4.0)
      T.nn.init.uniform_(self.hid2.bias, -bound, bound)

      T.nn.init.kaiming_uniform_(self.oupt.weight, 
        a=math.sqrt(5.0))
      bound = 1 / math.sqrt(5.0)
      T.nn.init.uniform_(self.oupt.bias, -bound, bound)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)  # CrossEntropyLoss() 
    return z

def main():
  print("\nBegin ")
  T.manual_seed(1)

  # print("\nCreating a 3-(4-5)-2 network default init ")
  # net = Net('default').to(device)

  print("\nCreating a 3-(4-5)-2 network explicit init ")
  net = Net('explicit').to(device)

  print("\nhid1 wts and biases: ")
  print(net.hid1.weight.data)
  print(net.hid1.bias.data)

  print("\nhid2 wts and biases: ")
  print(net.hid2.weight.data)
  print(net.hid2.bias.data)


  print("\noupt wts and biases: ")
  print(net.oupt.weight.data)
  print(net.oupt.bias.data)

  print("\nEnd ")

if __name__ == "__main__":
  main()
Posted in PyTorch