Programmatically Analyzing Chess Games Using Stockfish With Python

One rainy Saturday afternoon, I thought I’d investigate the possibility of programmatically analyzing chess positions and entire chess games.

After spending some time on the Internet, I realized there were lots of possible ways to approach this problem. I ended up using a Python library of functions that interact with the Stockfish chess engine.

First I downloaded the Stockfish engine for Windows from https://stockfishchess.org/download/windows/ by clicking on the 64-bit link button. This downloaded file stockfish-windows-x86-64.zip to my Downloads directory. I unzipped the file and copied the extracted files to a local directory C:\Python\Stockfish that I created. After unzipping, the engine executable is stockfish-windows-x86-64.exe, located in subdirectory Stockfish\stockfish-windows-x86-64\stockfish.

You don’t normally run the Stockfish executable directly. Stockfish is a command-line engine with no graphical interface, so it needs an application program that communicates with the executable.

I installed a Python library interface to Stockfish by opening a command shell and issuing the command “pip install stockfish”. The stockfish Python library is just an interface — it doesn’t include the Stockfish engine. The stockfish library is very slick and the documentation at https://pypi.org/project/stockfish/ is very good.

After a few hours of experimentation, I was able to programmatically analyze a chess game. Part of the output of my demo program looks like:

----------

position = 40 | white to move |  move # 21

position in FEN =
r2qr3/pp1b1pkp/2ppnn2/4pp2/3PP3/P3PQNP/BPP3P1/R4RK1 w - - 0 21

position:
+---+---+---+---+---+---+---+---+
| r |   |   | q | r |   |   |   | 8
+---+---+---+---+---+---+---+---+
| p | p |   | b |   | p | k | p | 7
+---+---+---+---+---+---+---+---+
|   |   | p | p | n | n |   |   | 6
+---+---+---+---+---+---+---+---+
|   |   |   |   | p | p |   |   | 5
+---+---+---+---+---+---+---+---+
|   |   |   | P | P |   |   |   | 4
+---+---+---+---+---+---+---+---+
| P |   |   |   | P | Q | N | P | 3
+---+---+---+---+---+---+---+---+
| B | P | P |   |   |   | P |   | 2
+---+---+---+---+---+---+---+---+
| R |   |   |   |   | R | K |   | 1
+---+---+---+---+---+---+---+---+
  a   b   c   d   e   f   g   h

Position evaluation = {'type': 'cp', 'value': 376}

Best moves in this position:
{'Move': 'g3f5', 'Centipawn': 399, 'Mate': None}
{'Move': 'd4e5', 'Centipawn': 185, 'Mate': None}
{'Move': 'f3f5', 'Centipawn': 54, 'Mate': None}
{'Move': 'h3h4', 'Centipawn': -173, 'Mate': None}
{'Move': 'a2e6', 'Centipawn': -269, 'Mate': None}

----------

The evaluation values, such as 399 and -269 above, are measured in centipawns (1/100 of a pawn). A positive value means the position favors White, and a negative value means the position favors Black. Most popular chess programs display these evaluations divided by 100, so the move 21.Ng3xf5 leads to a position evaluated at +3.99 (3.99 pawns in favor of White), and the move 21.Ba2xe6 leads to a position evaluated at -2.69 (2.69 pawns in favor of Black).
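As a rough illustration of the centipawn conversion, the sketch below turns an evaluation dictionary of the shape shown in the demo output into a pawn-unit string. The helper name format_eval is hypothetical and is not part of the stockfish library. The 'mate' branch assumes that forced mates are reported with 'type' equal to 'mate' and a signed move count, which is worth checking against the library documentation.

```python
def format_eval(evaluation):
    """Convert {'type': 'cp' or 'mate', 'value': int} to a readable string.

    Hypothetical helper, not part of the stockfish package.
    """
    kind = evaluation["type"]
    value = evaluation["value"]
    if kind == "cp":
        # centipawns -> pawns; the sign says which side is favored
        return f"{value / 100:+.2f}"
    if kind == "mate":
        # assumed convention: positive = White mates, negative = Black mates
        side = "White" if value > 0 else "Black"
        return f"mate in {abs(value)} for {side}"
    raise ValueError(f"unknown evaluation type: {kind}")


print(format_eval({"type": "cp", "value": 399}))   # +3.99
print(format_eval({"type": "cp", "value": -269}))  # -2.69
```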



Left: The Stockfish engine download page. Right: The online tool I used to convert PGN notation to FEN notation.



Left: The documentation for the Python Stockfish library. Right: The Nakamura-Topalov game at chessgames.com


I was analyzing a game between Hikaru Nakamura (White) and Veselin Topalov (Black) from 2017. I downloaded the game from chessgames.com. In PGN format, the game is:

[Event "Champions Showdown in Saint Louis (Blitz)"]
[Site "St Louis, MO USA"]
[Date "2017.11.12"]
[EventDate "2017.10.21"]
[Round "12.1"]
[Result "1-0"]
[White "Hikaru Nakamura"]
[Black "Veselin Topalov"]
[ECO "C26"]
[WhiteElo "2774"]
[BlackElo "2749"]
[PlyCount "43"]

1. e4 e5 2. Nc3 Nf6 3. Bc4 Bc5 4. d3 c6 5. Bb3 d6 6. Nf3 O-O
7. h3 Nbd7 8. O-O Bb6 9. a3 Nc5 10. Ba2 Ne6 11. Ne2 Re8
12. Be3 Bxe3 13. fxe3 Qc7 14. Nh4 Qd8 15. Nf3 Bd7 16. Ng3 g6
17. d4 Qc7 18. Nh4 Qd8 19. Qf3 Kg7 20. Nhf5+ gxf5 21. Nxf5+
Kg6 22. Bxe6 1-0

I was unable to determine how to use the stockfish Python library with PGN data, so I had to convert the PGN data to FEN (“Forsyth–Edwards Notation”) data. Luckily I discovered a very nice online tool to do this at https://www.lutanho.net/pgn/pgn2fen.html. The game in FEN format is:

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq e6 0 2
rnbqkbnr/pppp1ppp/8/4p3/4P3/2N5/PPPP1PPP/R1BQKBNR b KQkq - 1 2
rnbqkb1r/pppp1ppp/5n2/4p3/4P3/2N5/PPPP1PPP/R1BQKBNR w KQkq - 2 3
rnbqkb1r/pppp1ppp/5n2/4p3/2B1P3/2N5/PPPP1PPP/R1BQK1NR b KQkq - 3 3
. . .
r2qr3/pp1b1p1p/2ppnnk1/4pN2/3PP3/P3PQ1P/BPP3P1/R4RK1 w - - 1 22
r2qr3/pp1b1p1p/2ppBnk1/4pN2/3PP3/P3PQ1P/1PP3P1/R4RK1 b - - 0 22

Each line of data is a position from the game. The full FEN is listed below. The Wikipedia article on Forsyth–Edwards Notation has a good explanation.
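To make the six space-separated FEN fields concrete, here is a minimal sketch that splits one line of the game data into named parts. The function name parse_fen and the dictionary keys are hypothetical choices for illustration. The sketch does no validation beyond counting fields.

```python
def parse_fen(fen):
    """Split a FEN string into its six space-separated fields."""
    fields = fen.strip().split(" ")
    if len(fields) != 6:
        raise ValueError("expected 6 FEN fields, got " + str(len(fields)))
    placement, side, castling, en_passant, halfmove, fullmove = fields
    return {
        "ranks": placement.split("/"),     # rank 8 first, rank 1 last
        "side_to_move": "white" if side == "w" else "black",
        "castling": castling,              # e.g. "KQkq", or "-" if none
        "en_passant": en_passant,          # target square, or "-"
        "halfmove_clock": int(halfmove),   # plies since last capture or pawn move
        "fullmove_number": int(fullmove),  # incremented after Black moves
    }


info = parse_fen(
    "r2qr3/pp1b1pkp/2ppnn2/4pp2/3PP3/P3PQNP/BPP3P1/R4RK1 w - - 0 21")
print(info["side_to_move"], info["fullmove_number"])  # white 21
```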

Here are a few of the key lines of my demo program. These statements fire up the Stockfish engine:

from stockfish import Stockfish

loc = "C:\\Python\\Stockfish\\" + \
  "stockfish-windows-x86-64\\stockfish\\" + \
  "stockfish-windows-x86-64.exe"
stockfish = Stockfish(path=loc)

These lines set and display a position from the game:

game = ".\\game_04.txt"  # FEN data
f = open(game, "r")
. . .
  line = f.readline().strip()  # read a line of FEN data, drop the newline
  stockfish.set_fen_position(line)  # set position using FEN
  vis_pos = stockfish.get_board_visual() # ASCII visual
  print("position: ")
  print(vis_pos)
. . .

All in all it was a very interesting experiment and there are many ideas to explore. For example, chess positions where about half of the possible moves result in positive evaluations, and the other possible moves result in negative evaluations, seem like “risky” positions in some sense. I suspect some grandmaster chess players tend to make moves that lead to risky positions, as opposed to objectively the best move, because a risky position gives their opponent more chance to make a mistake.
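One way to start exploring the "risky position" idea is to count how many candidate moves have positive versus negative scores. The sketch below is only an assumption about how such a measure might be built. The function name sign_split is hypothetical, the input is a list of dictionaries shaped like the "Best moves" output shown earlier, and it looks only at the moves the engine was asked for, not at every legal move.

```python
def sign_split(top_moves):
    """Count candidate moves with positive and negative centipawn scores.

    top_moves: list of dicts with a "Centipawn" key, as in the demo output.
    Moves reported as forced mates (Centipawn is None) are skipped.
    """
    scores = [m["Centipawn"] for m in top_moves
              if m["Centipawn"] is not None]
    n_pos = sum(1 for s in scores if s > 0)
    n_neg = sum(1 for s in scores if s < 0)
    return n_pos, n_neg


moves = [
    {"Move": "g3f5", "Centipawn": 399, "Mate": None},
    {"Move": "d4e5", "Centipawn": 185, "Mate": None},
    {"Move": "f3f5", "Centipawn": 54, "Mate": None},
    {"Move": "h3h4", "Centipawn": -173, "Mate": None},
    {"Move": "a2e6", "Centipawn": -269, "Mate": None},
]
n_pos, n_neg = sign_split(moves)
print(n_pos, n_neg)  # 3 2
```

Whether a near-even split is a useful measure of risk is an open question. The sketch only shows how the counts could be collected.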

I’m sure there’s a lot more to learn about the stockfish library, but even so, I think I’ve made a good start.



Chess grandmasters often have very warped personalities because they must dedicate essentially their entire life to chess. For example, Robert Fischer (1943-2008, 11th champion 1972-1975) was truly a bizarre human being, and not in a nice way. But several of the 17 modern world chess champions have reputations as being nice people. These nice guys of chess include Jose Raul Capablanca (1888-1942, 3rd champion 1921-1927), Max Euwe (1901-1981, 5th champion 1935-1937), Vasily Smyslov (1921-2010, 7th champion 1957-1958), Boris Spassky (b. 1937, 10th champion 1969-1971), and Viswanathan Anand (b. 1969, 15th champion 2007-2013).

Left: Jose Raul Capablanca. Center: Max Euwe. Right: Viswanathan Anand.


Demo program:

# stockfish_demo.py
# Anaconda3-2023.09-0  Python 3.11.5
# Windows 10/11

from stockfish import Stockfish

loc = "C:\\Python\\Stockfish\\" + \
  "stockfish-windows-x86-64\\stockfish\\" + \
  "stockfish-windows-x86-64.exe"

stockfish = Stockfish(path=loc)
stockfish.update_engine_parameters({"UCI_Elo": 2000})
p = stockfish.get_parameters()
print("\nstockfish parameters: ")
print(p)

game = ".\\game_04.txt"
f = open(game, "r")
pos_number = 0
while True:
  line = f.readline()
  if not line: break
  if line.startswith("#"): continue
  if line.startswith("["): continue
  line = line.strip()
  print("\n----------")
  print("\nposition = " + str(pos_number) +\
    " | ", end="")
  tokens = line.split(" ")
  if tokens[1] == "w":
    print("white to move | ", end="")
  elif tokens[1] == "b":
    print("black to move | ", end="")
  print(" move # " + str(tokens[-1]))
  print("\nposition in FEN = ")
  print(line)
  stockfish.set_fen_position(line)
  vis_pos = stockfish.get_board_visual()
  print("\nposition: ")
  print(vis_pos)

  curr_eval = stockfish.get_evaluation()
  print("Position evaluation = ", end="")
  print(curr_eval)

  bms = stockfish.get_top_moves(5)
  print("\nBest moves in this position:")
  for i in range(len(bms)):
    print(bms[i])

  pos_number += 1
  print("\n----------")
  # input()
f.close()

print("\nEnd analysis ")

Data:

# game_04.txt
#
# [Event "Champions Showdown in Saint Louis (Blitz)"]
# [Site "St Louis, MO USA"]
# [Date "2017.11.12"]
# [EventDate "2017.10.21"]
# [Round "12.1"]
# [Result "1-0"]
# [White "Hikaru Nakamura"]
# [Black "Veselin Topalov"]
# [ECO "C26"]
# [WhiteElo "2774"]
# [BlackElo "2749"]
# [PlyCount "43"]
#
# 1. e4 e5 2. Nc3 Nf6 3. Bc4 Bc5 4. d3 c6 5. Bb3 d6 6. Nf3 O-O
# 7. h3 Nbd7 8. O-O Bb6 9. a3 Nc5 10. Ba2 Ne6 11. Ne2 Re8
# 12. Be3 Bxe3 13. fxe3 Qc7 14. Nh4 Qd8 15. Nf3 Bd7 16. Ng3 g6
# 17. d4 Qc7 18. Nh4 Qd8 19. Qf3 Kg7 20. Nhf5+ gxf5 21. Nxf5+
# Kg6 22. Bxe6 1-0
#
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
rnbqkbnr/pppp1ppp/8/4p3/4P3/8/PPPP1PPP/RNBQKBNR w KQkq e6 0 2
rnbqkbnr/pppp1ppp/8/4p3/4P3/2N5/PPPP1PPP/R1BQKBNR b KQkq - 1 2
rnbqkb1r/pppp1ppp/5n2/4p3/4P3/2N5/PPPP1PPP/R1BQKBNR w KQkq - 2 3
rnbqkb1r/pppp1ppp/5n2/4p3/2B1P3/2N5/PPPP1PPP/R1BQK1NR b KQkq - 3 3
rnbqk2r/pppp1ppp/5n2/2b1p3/2B1P3/2N5/PPPP1PPP/R1BQK1NR w KQkq - 4 4
rnbqk2r/pppp1ppp/5n2/2b1p3/2B1P3/2NP4/PPP2PPP/R1BQK1NR b KQkq - 0 4
rnbqk2r/pp1p1ppp/2p2n2/2b1p3/2B1P3/2NP4/PPP2PPP/R1BQK1NR w KQkq - 0 5
rnbqk2r/pp1p1ppp/2p2n2/2b1p3/4P3/1BNP4/PPP2PPP/R1BQK1NR b KQkq - 1 5
rnbqk2r/pp3ppp/2pp1n2/2b1p3/4P3/1BNP4/PPP2PPP/R1BQK1NR w KQkq - 0 6
rnbqk2r/pp3ppp/2pp1n2/2b1p3/4P3/1BNP1N2/PPP2PPP/R1BQK2R b KQkq - 1 6
rnbq1rk1/pp3ppp/2pp1n2/2b1p3/4P3/1BNP1N2/PPP2PPP/R1BQK2R w KQ - 2 7
rnbq1rk1/pp3ppp/2pp1n2/2b1p3/4P3/1BNP1N1P/PPP2PP1/R1BQK2R b KQ - 0 7
r1bq1rk1/pp1n1ppp/2pp1n2/2b1p3/4P3/1BNP1N1P/PPP2PP1/R1BQK2R w KQ - 1 8
r1bq1rk1/pp1n1ppp/2pp1n2/2b1p3/4P3/1BNP1N1P/PPP2PP1/R1BQ1RK1 b - - 2 8
r1bq1rk1/pp1n1ppp/1bpp1n2/4p3/4P3/1BNP1N1P/PPP2PP1/R1BQ1RK1 w - - 3 9
r1bq1rk1/pp1n1ppp/1bpp1n2/4p3/4P3/PBNP1N1P/1PP2PP1/R1BQ1RK1 b - - 0 9
r1bq1rk1/pp3ppp/1bpp1n2/2n1p3/4P3/PBNP1N1P/1PP2PP1/R1BQ1RK1 w - - 1 10
r1bq1rk1/pp3ppp/1bpp1n2/2n1p3/4P3/P1NP1N1P/BPP2PP1/R1BQ1RK1 b - - 2 10
r1bq1rk1/pp3ppp/1bppnn2/4p3/4P3/P1NP1N1P/BPP2PP1/R1BQ1RK1 w - - 3 11
r1bq1rk1/pp3ppp/1bppnn2/4p3/4P3/P2P1N1P/BPP1NPP1/R1BQ1RK1 b - - 4 11
r1bqr1k1/pp3ppp/1bppnn2/4p3/4P3/P2P1N1P/BPP1NPP1/R1BQ1RK1 w - - 5 12
r1bqr1k1/pp3ppp/1bppnn2/4p3/4P3/P2PBN1P/BPP1NPP1/R2Q1RK1 b - - 6 12
r1bqr1k1/pp3ppp/2ppnn2/4p3/4P3/P2PbN1P/BPP1NPP1/R2Q1RK1 w - - 0 13
r1bqr1k1/pp3ppp/2ppnn2/4p3/4P3/P2PPN1P/BPP1N1P1/R2Q1RK1 b - - 0 13
r1b1r1k1/ppq2ppp/2ppnn2/4p3/4P3/P2PPN1P/BPP1N1P1/R2Q1RK1 w - - 1 14
r1b1r1k1/ppq2ppp/2ppnn2/4p3/4P2N/P2PP2P/BPP1N1P1/R2Q1RK1 b - - 2 14
r1bqr1k1/pp3ppp/2ppnn2/4p3/4P2N/P2PP2P/BPP1N1P1/R2Q1RK1 w - - 3 15
r1bqr1k1/pp3ppp/2ppnn2/4p3/4P3/P2PPN1P/BPP1N1P1/R2Q1RK1 b - - 4 15
r2qr1k1/pp1b1ppp/2ppnn2/4p3/4P3/P2PPN1P/BPP1N1P1/R2Q1RK1 w - - 5 16
r2qr1k1/pp1b1ppp/2ppnn2/4p3/4P3/P2PPNNP/BPP3P1/R2Q1RK1 b - - 6 16
r2qr1k1/pp1b1p1p/2ppnnp1/4p3/4P3/P2PPNNP/BPP3P1/R2Q1RK1 w - - 0 17
r2qr1k1/pp1b1p1p/2ppnnp1/4p3/3PP3/P3PNNP/BPP3P1/R2Q1RK1 b - - 0 17
r3r1k1/ppqb1p1p/2ppnnp1/4p3/3PP3/P3PNNP/BPP3P1/R2Q1RK1 w - - 1 18
r3r1k1/ppqb1p1p/2ppnnp1/4p3/3PP2N/P3P1NP/BPP3P1/R2Q1RK1 b - - 2 18
r2qr1k1/pp1b1p1p/2ppnnp1/4p3/3PP2N/P3P1NP/BPP3P1/R2Q1RK1 w - - 3 19
r2qr1k1/pp1b1p1p/2ppnnp1/4p3/3PP2N/P3PQNP/BPP3P1/R4RK1 b - - 4 19
r2qr3/pp1b1pkp/2ppnnp1/4p3/3PP2N/P3PQNP/BPP3P1/R4RK1 w - - 5 20
r2qr3/pp1b1pkp/2ppnnp1/4pN2/3PP3/P3PQNP/BPP3P1/R4RK1 b - - 6 20
r2qr3/pp1b1pkp/2ppnn2/4pp2/3PP3/P3PQNP/BPP3P1/R4RK1 w - - 0 21
r2qr3/pp1b1pkp/2ppnn2/4pN2/3PP3/P3PQ1P/BPP3P1/R4RK1 b - - 0 21
r2qr3/pp1b1p1p/2ppnnk1/4pN2/3PP3/P3PQ1P/BPP3P1/R4RK1 w - - 1 22
r2qr3/pp1b1p1p/2ppBnk1/4pN2/3PP3/P3PQ1P/1PP3P1/R4RK1 b - - 0 22

Regression Example Using LightGBM (Light Gradient Boosting Machine)

I’ve been looking at the LightGBM (light gradient boosting machine) system lately. One morning before work, I figured I’d zap out a regression demo.

LightGBM is a sophisticated tree-based system that can perform classification (multi-class and binary), regression, and ranking.

There are three programming language interfaces to LightGBM — C, Python, R. I like the relatively easy-to-use Python scikit-learn API. LightGBM isn’t installed by default with the Anaconda Python distribution I use, so I installed it with the command “pip install lightgbm”.

For my demo, I used one of my standard synthetic datasets. The regression problem goal is to predict income from sex, age, State, and political leaning. The 240-item tab-delimited raw data looks like:

F   24   michigan   29500.00   liberal
M   39   oklahoma   51200.00   moderate
F   63   nebraska   75800.00   conservative
M   36   michigan   44500.00   moderate
F   27   nebraska   28600.00   liberal
. . .

For LightGBM, it’s best to use ordinal encoding for categorical variables. I encoded the sex variable as M = 0 and F = 1. I encoded State as Michigan = 0, Nebraska = 1, Oklahoma = 2. I encoded politics as conservative = 0, moderate = 1, liberal = 2.
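The ordinal encoding described above can be sketched as a small lookup-table function. The name encode_line and the three dictionaries are hypothetical. The sketch assumes each raw line has exactly five whitespace- or tab-delimited fields in the order sex, age, State, income, politics.

```python
# lookup tables matching the encoding described in the text
SEX = {"M": 0, "F": 1}
STATE = {"michigan": 0, "nebraska": 1, "oklahoma": 2}
POLITICS = {"conservative": 0, "moderate": 1, "liberal": 2}


def encode_line(raw_line):
    """Encode one raw data line as a comma-delimited string."""
    sex, age, state, income, politics = raw_line.split()
    return ",".join([
        str(SEX[sex]),
        age,                              # numeric, left as-is
        str(STATE[state.lower()]),
        income,                           # numeric, left as-is
        str(POLITICS[politics.lower()]),
    ])


print(encode_line("F\t24\tmichigan\t29500.00\tliberal"))  # 1,24,0,29500.00,2
```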

Because LightGBM is tree-based, it’s not necessary to normalize numeric data.

I split the encoded data into a 200-item set of training data and a 40-item set of test data. The resulting comma-delimited encoded data looks like:

1, 24, 0, 29500.00, 2
0, 39, 2, 51200.00, 1
1, 63, 1, 75800.00, 0
0, 36, 0, 44500.00, 1
1, 27, 1, 28600.00, 2
. . .

The key statements of my demo program are:

import numpy as np
import lightgbm as lgbm  # Python scikit API

train_file = ".\\Data\\people_train.txt"
# sex, age, State, income, politics
#  0    1     2       3       4
x_train = np.loadtxt(train_file, usecols=[0,1,2,4],
  delimiter=",", comments="#", dtype=np.float64)
y_train = np.loadtxt(train_file, usecols=3,
  delimiter=",", comments="#", dtype=np.float64)

params = {
  'objective': 'regression', # not required
  'boosting_type': 'gbdt',  # default
  'num_leaves': 31,  # default
  'learning_rate': 0.05,  # default = 0.10
  'min_data_in_leaf': 2,  # default = 20
  'random_state': 99,  # default = None
  'verbosity': -1
}
model = lgbm.LGBMRegressor(**params) 
model.fit(x_train, y_train)

The main challenge when using LightGBM is wading through the dozens of parameters. There are 57 Learning Control Parameters (min_data_in_leaf, bagging_fraction, etc.), and the LGBMRegressor class has 19 parameters, for a total of 76 parameters to deal with. Here are the 19 model parameters:

boosting_type='gbdt', 
num_leaves=31,
max_depth=-1,
learning_rate=0.1,
n_estimators=100,
subsample_for_bin=200000,
objective=None,
class_weight=None,
min_split_gain=0.0,
min_child_weight=0.001,
min_child_samples=20,
subsample=1.0,
subsample_freq=0,
colsample_bytree=1.0,
reg_alpha=0.0,
reg_lambda=0.0,
random_state=None,
n_jobs=None,
importance_type='split',
**kwargs

Because the number of parameters is not manageable, you must rely on the default values and then try to find the handful of parameters that will create a good model. For my demo, I changed the learning rate from default 0.10 to 0.05, the random_state (from default None to an arbitrary value of 99, to get reproducible results), and the min_data_in_leaf from the default of 20 to 2 — it had a big effect. I also set verbosity to -1 to suppress messages, but in a non-demo scenario you really want to see all system warning and error messages. The near-impossibility of fully understanding all the LightGBM parameters and their interactions is the main reason why I rarely use LightGBM.

Anyway, the LightGBM model predicted the 40-item test data with 85% accuracy (34 out of 40 correct) where a correct income prediction is one that’s within 10% of the true income. This is roughly comparable to the accuracy achieved by a neural network regression model. When LightGBM works, it often works very well. However, tree-based systems are highly susceptible to overfitting, although LightGBM has mechanisms that mitigate it.



There are many ways to generate income. A well-known movie theme is the attractive woman who is after a rich man — the “gold digger”. Here are three gold digger comedies that I like.

In “Heartbreakers” (2001) Max (actress Sigourney Weaver) and Page (actress Jennifer Love Hewitt) are a mother-daughter team of con artists. They conspire to get tycoon William Tensy (actor Gene Hackman) to propose marriage to Max. This movie has some scenes that I thought were hilarious.

In “Tommy Boy” (1995), widower “Big Tom” Callahan is a wealthy owner of an automobile parts company. Beverly (actress Bo Derek) tricks Big Tom into marrying her. Her plan is foiled by son Tommy (actor Chris Farley) who has a heart of gold and a head of lead — but who rises to the challenge when the chips are down. Another movie with some very funny scenes, and it has become something of a cult favorite.

In “Gentlemen Prefer Blondes” (1953), Lorelei Lee (actress Marilyn Monroe) and her friend Dorothy (actress Jane Russell) are showgirls looking for husbands. Lorelei, who has a good heart, meets and falls in love with young scion Gus Esmond. Great scene at the end of the movie where the rich father of Gus says, “Young lady, you don’t fool me one bit. Have you got the nerve to stand there and expect me to believe that you don’t want to marry my son for his money?” Lorelei replies, “Of course not. I want to marry him for YOUR money.” And they all live happily ever after. A famous, iconic, and quite entertaining movie.


Demo program:

# people_income_lgbm.py

import numpy as np
import lightgbm as lgbm

def accuracy(model, data_x, data_y, pct_close):
  # fraction of predictions within pct_close of the true value
  n = len(data_x)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_x[i].reshape(1, -1)
    y = data_y[i]  # true income
    pred = model.predict(x)  # predicted income []
    if np.abs(pred[0] - y) < np.abs(pct_close * y):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 1.0) / (n_correct + n_wrong)

def main():
  # 0. get started
  print("\nBegin People predict income using LightGBM ")
  print("Predict income from sex, age, State, politics ")
  np.random.seed(1)

  # 1. load data
  # sex, age, State, income, politics
  #  0    1     2       3       4
  print("\nLoading train and test data ")
  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"

  x_train = np.loadtxt(train_file, usecols=[0,1,2,4],
    delimiter=",", comments="#", dtype=np.float64)
  y_train = np.loadtxt(train_file, usecols=3,
    delimiter=",", comments="#", dtype=np.float64)

  x_test = np.loadtxt(test_file, usecols=[0,1,2,4],
    delimiter=",", comments="#", dtype=np.float64)
  y_test = np.loadtxt(test_file, usecols=3,
    delimiter=",", comments="#", dtype=np.float64)

  np.set_printoptions(precision=0, suppress=True)
  print("\nFirst few train data: ")
  for i in range(3):
    print(x_train[i], end="")
    print("  | " + str(y_train[i]))
  print(". . . ")

  # 2. create and train model
  print("\nCreating and training LightGBM regression model ")
  params = {
    'objective': 'regression',  # not required
    'boosting_type': 'gbdt',  # default
    'num_leaves': 31,  # default
    'learning_rate': 0.05,  # default = 0.10
    'feature_fraction': 1.0,  # default
    'min_data_in_leaf': 2,  # default = 20
    'random_state': 99,
    'verbosity': -1
  }
  model = lgbm.LGBMRegressor(**params)  # scikit API
  model.fit(x_train, y_train)
  print("Done ")

  # 3. evaluate model
  print("\nEvaluating model accuracy (within 0.10) ")
  acc_train = accuracy(model, x_train, y_train, 0.10)
  print("accuracy on train data = %0.4f " % acc_train)
  acc_test = accuracy(model, x_test, y_test, 0.10)
  print("accuracy on test data = %0.4f " % acc_test)

  # 4. use model
  print("\nPredicting income for M 35 Oklahoma moderate ")
  x = np.array([[0, 35, 2, 1]], dtype=np.float64)
  y_pred = model.predict(x)
  print("\nPredicted income = %0.2f " % y_pred[0])

  print("\nEnd demo ")

if __name__ == "__main__":
  main()

Training data:

# people_train.txt
# sex (M = 0, F = 1)
# age
# State (Michigan = 0, Nebraska = 1, Oklahoma = 2)
# income
# politics (conservative = 0, moderate = 1, liberal = 2)
#
1,24,0,29500.00,2
0,39,2,51200.00,1
1,63,1,75800.00,0
0,36,0,44500.00,1
1,27,1,28600.00,2
1,50,1,56500.00,1
1,50,2,55000.00,1
0,19,2,32700.00,0
1,22,1,27700.00,1
0,39,2,47100.00,2
1,34,0,39400.00,1
0,22,0,33500.00,0
1,35,2,35200.00,2
0,33,1,46400.00,1
1,45,1,54100.00,1
1,42,1,50700.00,1
0,33,1,46800.00,1
1,25,2,30000.00,1
0,31,1,46400.00,0
1,27,0,32500.00,2
1,48,0,54000.00,1
0,64,1,71300.00,2
1,61,1,72400.00,0
1,54,2,61000.00,0
1,29,0,36300.00,0
1,50,2,55000.00,1
1,55,2,62500.00,0
1,40,0,52400.00,0
1,22,0,23600.00,2
1,68,1,78400.00,0
0,60,0,71700.00,2
0,34,2,46500.00,1
0,25,2,37100.00,0
0,31,1,48900.00,1
1,43,2,48000.00,1
1,58,1,65400.00,2
0,55,1,60700.00,2
0,43,1,51100.00,1
0,43,2,53200.00,1
0,21,0,37200.00,0
1,55,2,64600.00,0
1,64,1,74800.00,0
0,41,0,58800.00,1
1,64,2,72700.00,0
0,56,2,66600.00,2
1,31,2,36000.00,1
0,65,2,70100.00,2
1,55,2,64300.00,0
0,25,0,40300.00,0
1,46,2,51000.00,1
0,36,0,53500.00,0
1,52,1,58100.00,1
1,61,2,67900.00,0
1,57,2,65700.00,0
0,46,1,52600.00,1
0,62,0,66800.00,2
1,55,2,62700.00,0
0,22,2,27700.00,1
0,50,0,62900.00,0
0,32,1,41800.00,1
0,21,2,35600.00,0
1,44,1,52000.00,1
1,46,1,51700.00,1
1,62,1,69700.00,0
1,57,1,66400.00,0
0,67,2,75800.00,2
1,29,0,34300.00,2
1,53,0,60100.00,0
0,44,0,54800.00,1
1,46,1,52300.00,1
0,20,1,30100.00,1
0,38,0,53500.00,1
1,50,1,58600.00,1
1,33,1,42500.00,1
0,33,1,39300.00,1
1,26,1,40400.00,0
1,58,0,70700.00,0
1,43,2,48000.00,1
0,46,0,64400.00,0
1,60,0,71700.00,0
0,42,0,48900.00,1
0,56,2,56400.00,2
0,62,1,66300.00,2
0,50,0,64800.00,1
1,47,2,52000.00,1
0,67,1,80400.00,2
0,40,2,50400.00,1
1,42,1,48400.00,1
1,64,0,72000.00,0
0,47,0,58700.00,2
1,45,1,52800.00,1
0,25,2,40900.00,0
1,38,0,48400.00,0
1,55,2,60000.00,1
0,44,0,60600.00,1
1,33,0,41000.00,1
1,34,2,39000.00,1
1,27,1,33700.00,2
1,32,1,40700.00,1
1,42,2,47000.00,1
0,24,2,40300.00,0
1,42,1,50300.00,1
1,25,2,28000.00,2
1,51,1,58000.00,1
0,55,1,63500.00,2
1,44,0,47800.00,2
0,18,0,39800.00,0
0,67,1,71600.00,2
1,45,2,50000.00,1
1,48,0,55800.00,1
0,25,1,39000.00,1
0,67,0,78300.00,1
1,37,2,42000.00,1
0,32,0,42700.00,1
1,48,0,57000.00,1
0,66,2,75000.00,2
1,61,0,70000.00,0
0,58,2,68900.00,1
1,19,0,24000.00,2
1,38,2,43000.00,1
0,27,0,36400.00,1
1,42,0,48000.00,1
1,60,0,71300.00,0
0,27,2,34800.00,0
1,29,1,37100.00,0
0,43,0,56700.00,1
1,48,0,56700.00,1
1,27,2,29400.00,2
0,44,0,55200.00,0
1,23,1,26300.00,2
0,36,1,53000.00,2
1,64,2,72500.00,0
1,29,2,30000.00,2
0,33,0,49300.00,1
0,66,1,75000.00,2
0,21,2,34300.00,0
1,27,0,32700.00,2
1,29,0,31800.00,2
0,31,0,48600.00,1
1,36,2,41000.00,1
1,49,1,55700.00,1
0,28,0,38400.00,0
0,43,2,56600.00,1
0,46,1,58800.00,1
1,57,0,69800.00,0
0,52,2,59400.00,1
0,31,2,43500.00,1
0,55,0,62000.00,2
1,50,0,56400.00,1
1,48,1,55900.00,1
0,22,2,34500.00,0
1,59,2,66700.00,0
1,34,0,42800.00,2
0,64,0,77200.00,2
1,29,2,33500.00,2
0,34,1,43200.00,1
0,61,0,75000.00,2
1,64,2,71100.00,0
0,29,0,41300.00,0
1,63,1,70600.00,0
0,29,1,40000.00,0
0,51,0,62700.00,1
0,24,2,37700.00,0
1,48,1,57500.00,1
1,18,0,27400.00,0
1,18,0,20300.00,2
1,33,1,38200.00,2
0,20,2,34800.00,0
1,29,2,33000.00,2
0,44,2,63000.00,0
0,65,2,81800.00,0
0,56,0,63700.00,2
0,52,2,58400.00,1
0,29,1,48600.00,0
0,47,1,58900.00,1
1,68,0,72600.00,2
1,31,2,36000.00,1
1,61,1,62500.00,2
1,19,1,21500.00,2
1,38,2,43000.00,1
0,26,0,42300.00,0
1,61,1,67400.00,0
1,40,0,46500.00,1
0,49,0,65200.00,1
1,56,0,67500.00,0
0,48,1,66000.00,1
1,52,0,56300.00,2
0,18,0,29800.00,0
0,56,2,59300.00,2
0,52,1,64400.00,1
0,18,1,28600.00,1
0,58,0,66200.00,2
0,39,1,55100.00,1
0,46,0,62900.00,1
0,40,1,46200.00,1
0,60,0,72700.00,2
1,36,1,40700.00,2
1,44,0,52300.00,1
1,28,0,31300.00,2
1,54,2,62600.00,0

Test data:

# people_test.txt
#
0,51,0,61200.00,1
0,32,1,46100.00,1
1,55,0,62700.00,0
1,25,2,26200.00,2
1,33,2,37300.00,2
0,29,1,46200.00,0
1,65,0,72700.00,0
0,43,1,51400.00,1
0,54,1,64800.00,2
1,61,1,72700.00,0
1,52,1,63600.00,0
1,30,1,33500.00,2
1,29,0,31400.00,2
0,47,2,59400.00,1
1,39,1,47800.00,1
1,47,2,52000.00,1
0,49,0,58600.00,1
0,63,2,67400.00,2
0,30,0,39200.00,0
0,61,2,69600.00,2
0,47,2,58700.00,1
1,30,2,34500.00,2
0,51,2,58000.00,1
0,24,0,38800.00,1
0,49,0,64500.00,1
1,66,2,74500.00,0
0,65,0,76900.00,0
0,46,1,58000.00,0
0,45,2,51800.00,1
0,47,0,63600.00,0
0,29,0,44800.00,0
0,57,2,69300.00,2
0,20,0,28700.00,2
0,35,0,43400.00,1
0,61,2,67000.00,2
0,31,2,37300.00,1
1,18,0,20800.00,2
1,26,2,29200.00,2
0,28,0,36400.00,2
0,59,2,69400.00,2

Clustering Mixed Categorical and Numeric Data Using k-Means With C#

Data clustering is the process of grouping data items together so that similar items are in the same group/cluster. For strictly numeric data, the k-means clustering technique is simplest, and the most commonly used. For non-numeric, i.e. categorical data, there are fairly complicated techniques that use entropy or Bayesian probability or categorical utility. But clustering mixed categorical and numeric data is very tricky.

I use a technique for clustering mixed data that I haven’t seen described anywhere. Briefly, for numeric data, I use min-max normalization. For standard nominal categorical data, I encode using one-over-n-hot encoding. For binary categorical data, I use reduced one-over-n-hot encoding (one value is encoded as 0.0 and the other as 0.5). For ordinal categorical data, I encode using equal-interval encoding. After normalizing and encoding this way, all values will be between 0.0 and 1.0 so that k-means can be used without modification.

The normalization and encoding are best explained using a concrete example. I created a synthetic 240-item dataset that looks like:

F  short   24  arkansas  29500  liberal
M  tall    39  delaware  51200  moderate
F  short   63  colorado  75800  conservative
M  medium  36  illinois  44500  moderate
F  short   27  colorado  28600  liberal
. . .

Each line represents a person. The fields are sex, height, age, State, income, political leaning.

The encoded and normalized data looks like:

0.5, 0.25, 0.12, 0.25, 0.00, 0.00, 0.00, 0.1496, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.42, 0.00, 0.00, 0.25, 0.00, 0.5024, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.90, 0.00, 0.25, 0.00, 0.00, 0.9024, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.36, 0.00, 0.00, 0.00, 0.25, 0.3935, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.18, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.0000, 0.3333
. . .

The sex variable is binary categorical so I encode M = 0.0 and F = 0.5.

The height variable is ordinal categorical so I use equal-interval encoding as short = 0.25, medium = 0.50, tall = 0.75.

The age variable is numeric so I use min-max normalization. The min age in the dataset is 18 and the max age is 68, so normalized age = (age - 18) / (68 - 18).

There are four possible values for the nominal categorical State variable so I encode them as Arkansas = 0.25 0 0 0, Colorado = 0 0.25 0 0, Delaware = 0 0 0.25 0, Illinois = 0 0 0 0.25.

The income variable is numeric so I use min-max normalization. The min income in the dataset is $20,300 and the max income is $81,800, so normalized income = (income – 20300) / (81800 – 20300).

There are three possible values for the nominal categorical politics variable so I encode them as conservative = 0.3333 0 0, moderate = 0 0.3333 0, liberal = 0 0 0.3333.
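The encoding rules above can be collected into one short sketch. It is written in Python rather than C# purely for brevity. The names encode_person and one_over_n_hot are hypothetical, and the min and max constants are the ones stated in the text for this dataset, so the function is specific to this data.

```python
HEIGHT = {"short": 0.25, "medium": 0.50, "tall": 0.75}  # equal-interval
STATES = ["arkansas", "colorado", "delaware", "illinois"]
POLITICS = ["conservative", "moderate", "liberal"]


def one_over_n_hot(value, categories):
    """Put 1/n in the slot for value and 0.0 elsewhere (n = number of categories)."""
    n = len(categories)
    return [1.0 / n if c == value else 0.0 for c in categories]


def encode_person(line):
    """Encode one raw line: sex, height, age, State, income, politics."""
    sex, height, age, state, income, politics = line.split()
    row = [0.0 if sex == "M" else 0.5]                     # binary categorical
    row.append(HEIGHT[height])                             # ordinal categorical
    row.append((int(age) - 18) / (68 - 18))                # min-max age
    row.extend(one_over_n_hot(state, STATES))              # nominal, 4 values
    row.append((float(income) - 20300) / (81800 - 20300))  # min-max income
    row.extend(one_over_n_hot(politics, POLITICS))         # nominal, 3 values
    return row


row = encode_person("F  short   24  arkansas  29500  liberal")
print([round(v, 4) for v in row])
```

Rounded to four decimals, the result should agree with the first line of the encoded data shown earlier.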

I fed the encoded and normalized data to a C# implementation of k-means clustering. I used k = 3 clusters and got this clustering:

Clustering with k=3 seed=0
Done

Result clustering:
  0  1  2  0  0  2  2  0  0  1  0  0  0  1  2  2 . . .
Result WCSS = 49.3195

The seed value controls the initial random cluster assignments. Different seed values should give very similar (but not necessarily identical) results. If different seed values give significantly different results, the k-means technique is not a good choice for the dataset.

The clustering result means item [0] belongs to cluster 0, item [1] belongs to cluster 1, item [2] belongs to cluster 2, item [3] belongs to cluster 0, and so on. The WCSS (within cluster sum of squares) is the value that k-means attempts to minimize, so smaller values are better.
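For readers who want to see the WCSS computation spelled out, here is a minimal Python sketch using the standard definition: the sum of squared Euclidean distances from each item to the mean of its assigned cluster. The function name wcss and the tiny dataset are hypothetical. The C# demo's internal computation is assumed to follow the same definition.

```python
def wcss(data, clustering, means):
    """Within-cluster sum of squares.

    data: list of numeric vectors
    clustering: cluster ID for each item in data
    means: one mean vector per cluster ID
    """
    total = 0.0
    for item, cid in zip(data, clustering):
        # squared Euclidean distance from item to its cluster mean
        total += sum((x - m) ** 2 for x, m in zip(item, means[cid]))
    return total


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
clustering = [0, 0, 1]
means = [[0.0, 1.0], [10.0, 10.0]]
print(wcss(data, clustering, means))  # 2.0
```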

Another way to view the clustering results is by-cluster:

cluster 0 | count = 89 :
   0    3    4    7    8   10   11   12   17   18   19  . . .

cluster 1 | count = 77 :
   1    9   13   16   21   30   31   36   37   38   42  . . .

cluster 2 | count = 74 :
   2    5    6   14   15   20   22   23   25   26   27  . . .

This means data items [0], [3], [4], etc. are in cluster 0, and so on. A third way to view the results is source data by cluster. For cluster 1:

cluster 1:
[  1]  M tall 39 delaware 51200 moderate
[  9]  M tall 39 delaware 47100 liberal
[ 13]  M tall 33 colorado 46400 moderate
 . . .

So cluster 1 looks like the “tall male mid-30s” cluster. The demo program concludes by displaying the 3 means/centroids for the clusters:

Means:
[  0]    0.3  0.40  0.19  0.09  0.06  0.09  0.02  0.2568  0.1049  0.1123  0.1161
[  1]    0.0  0.63  0.66  0.08  0.06  0.08  0.03  0.6758  0.0390  0.1645  0.1299
[  2]    0.5  0.32  0.69  0.07  0.08  0.05  0.05  0.6542  0.1576  0.1531  0.0225

The data items assigned to cluster 0 average to (0.3, 0.40, 0.19, 0.09, 0.06, 0.09, 0.02, 0.2568, 0.1049, 0.1123, 0.1161). All the data items assigned to cluster 0 are closer to that mean/centroid vector than to the other two means/centroids. And so on.
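The relationship between a clustering and its means can be sketched in a few lines of Python. The function names cluster_means and nearest_mean are hypothetical, the data is a toy example, and the sketch assumes every cluster has at least one item.

```python
def cluster_means(data, clustering, k):
    """Average the items assigned to each of the k clusters."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for item, cid in zip(data, clustering):
        counts[cid] += 1
        for j, x in enumerate(item):
            sums[cid][j] += x
    # assumes no cluster is empty (counts[c] > 0)
    return [[s / counts[c] for s in sums[c]] for c in range(k)]


def nearest_mean(item, means):
    """Index of the mean closest to item (squared Euclidean distance)."""
    dists = [sum((x - m) ** 2 for x, m in zip(item, mean))
             for mean in means]
    return dists.index(min(dists))


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
clustering = [0, 0, 1, 1]
means = cluster_means(data, clustering, k=2)
print(means)                                         # [[0.0, 1.0], [11.0, 10.0]]
print([nearest_mean(item, means) for item in data])  # [0, 0, 1, 1]
```

When k-means has converged, the second list matches the clustering, which is the property described above.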

Compared to specialized techniques for clustering mixed categorical and numeric data (such as k-prototypes clustering) an advantage of the technique described here is that you can use any k-means implementation. For example, I passed the normalized and encoded data to the scikit-learn library KMeans module and got identical results. I’ve listed that Python program at the very bottom of this post.



I’m a big fan of old 1950s science fiction movies. Here’s a cluster of three movies that I like, which feature very slow-moving threats.

Left: In “Caltiki the Immortal Monster” (1959), Caltiki is a big blob monster that lives in ancient Mayan ruins. He moves at a glacial pace, yet somehow manages to trap several archeologists. I watched this at least one hundred times on TV when I was young.

Center: In “From Hell It Came” (1957), native Kimo is falsely accused of murder and is executed. Kimo’s body is placed in a hollow tree stump — that has been exposed to radiation from atomic tests. Bad idea. Tree-Kimo may be the slowest threat in sci fi movie history, but I like this film anyway.

Right: In “The Creeping Unknown” (1955), also known as “The Quatermass Xperiment”, Dr. Quatermass oversees a first-men-into-space effort. Three go up. Only one returns. He’s infected with something and becomes a blob-like creature that threatens to grow until it overwhelms the planet.


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operator symbols. (My blog editor often chokes on these symbols).

using System;
using System.IO;
using System.Collections.Generic;

namespace ClusterMixedKMeans
{
  internal class ClusterMixedProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin mixed data k-means" +
        " using C# ");

      string rf =
        "..\\..\\..\\Data\\people_raw_space.txt";
      string[] rawFileArray = FileLoad(rf, "#");

      Console.WriteLine("\nRaw source data: ");
      for (int i = 0; i < 4; ++i)
      {
        Console.Write("[" + i.ToString().PadLeft(3) + "]  ");
        Console.WriteLine(rawFileArray[i]);
      }
      Console.WriteLine(" . . . ");

      // preprocessed data version
      string fn =
        "..\\..\\..\\Data\\people_encoded.txt";
      double[][] X = MatLoad(fn,
        new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 },
        ',', "#");

      // programmatic version
      //string rf =
      //  "..\\..\\..\\Data\\people_raw_space.txt";
      //double[][] X = NormAndEncode(rf, ' ', "#");

      Console.WriteLine("\nEncoded data: ");
      // decimals to display
      int[] decs = new int[] { 1, 2,2,2,2,2,2, 4,4,4,4 };
      MatShow(X, decs, 4, true);

      Console.WriteLine("\nClustering with k=3 seed=0");
      KMeans km = new KMeans(X, k:3, seed:0);
      // km.trials = X.Length * 5; // set n trials explicit
      int[] clustering = km.Cluster();
      Console.WriteLine("Done ");
      
      Console.WriteLine("\nResult clustering: ");
      VecShow(clustering, 3, 16);
      Console.WriteLine("Result WCSS = " + 
        km.bestWCSS.ToString("F4"));

      List"lt"int"gt"[] clusterLists = 
        ItemsByCluster(clustering, k:3);
      Console.WriteLine("\nItem indices by cluster ID: ");
      ShowItemIndicesByCluster(clusterLists, 12);

      Console.WriteLine("\nSource data by cluster ID: ");
      ShowItemsByCluster(clusterLists, rawFileArray, 3);

      Console.WriteLine("\nMeans: ");
      MatShow(km.bestMeans, decs, nRows:3,
        showIndices:true);
 
      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main

    // ------------------------------------------------------
    // helper: NormAndEncode() for this data only
    // ------------------------------------------------------

    static double[][] NormAndEncode(string fn, char delim,
      string comment)
    {
      // specific to this demo data
      // F,short,24,arkansas,29500,liberal
      // M,tall,39,delaware,51200,moderate
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      string line = "";
      string[] tokens = null;

      double[][] result = new double[240][];
      for (int k = 0; k "lt" 240; ++k)
        result[k] = new double[11];

      int i = 0;
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true) continue;
        line = line.Trim();
        tokens = line.Split(delim);

        // sex
        string sexStr = tokens[0].Trim();
        if (sexStr == "M") result[i][0] = 0.0;
        else if (sexStr == "F") result[i][0] = 0.5;
        // height
        string heightStr = tokens[1].Trim();
        if (heightStr == "short") result[i][1] = 0.25;
        else if (heightStr == "medium") result[i][1] = 0.50;
        else if (heightStr == "tall") result[i][1] = 0.75;
        // age
        double age = double.Parse(tokens[2].Trim());
        double ageMin = 18.0;
        double ageMax = 68.0;
        result[i][2] = (age - ageMin) / (ageMax - ageMin);
        // State
        string stateStr = tokens[3].Trim();
        if (stateStr == "arkansas") result[i][3] = 0.25;
        else if (stateStr == "colorado") result[i][4] = 0.25;
        else if (stateStr == "delaware") result[i][5] = 0.25;
        else if (stateStr == "illinois") result[i][6] = 0.25;
        // income
        double income = double.Parse(tokens[4]);
        double incomeMin = 20300.0;
        double incomeMax = 81800.0;
        result[i][7] = 
          (income - incomeMin) / (incomeMax - incomeMin);
        // political leaning
        string politicsStr = tokens[5].Trim();
        if (politicsStr == "conservative") 
          result[i][8] = 0.3333;
        else if (politicsStr == "moderate") 
          result[i][9] = 0.3333;
        else if (politicsStr == "liberal") 
          result[i][10] = 0.3333;

        ++i;  // next row
      }
      sr.Close(); ifs.Close();  // release the file handle
      return result;
    }

    // ------------------------------------------------------
    // helpers specifically for k-means: ItemsByCluster(),
    // ShowItemIndicesByCluster(), ShowItemsByCluster()
    // ------------------------------------------------------

    static List"lt"int"gt"[] ItemsByCluster(int[] clustering,
      int k)
    {
      // this.clustering is like [2, 0, 1, 1, . . ]
      List"lt"int"gt"[] result = new List"lt"int"gt"[k];
      // array of Lists of int
      for (int cid = 0; cid "lt" k; ++cid)
        result[cid] = new List"lt"int"gt"();

      int n = clustering.Length;
      for (int i = 0; i "lt" n; ++i)
      {
        int clusterID = clustering[i];
        result[clusterID].Add(i);
      }
      return result;
    }

    // ------------------------------------------------------

    static void ShowItemIndicesByCluster(List"lt"int"gt"[]
      arr, int nItemsPerCluster)
    {
      // nItemsPerCluster limits display
      for (int cid = 0; cid "lt" arr.Length; ++cid)
      {
        Console.WriteLine("\ncluster " + cid + 
          " | count = " + arr[cid].Count + " : ");
        // local limit so that a small cluster does not
        // shrink the display limit for later clusters
        int nShow =
          Math.Min(nItemsPerCluster, arr[cid].Count);
        for (int i = 0; i "lt" nShow; ++i)
        {
          Console.Write(arr[cid][i].ToString().
            PadLeft(4) + " ");
        }
        if (nShow "lt" arr[cid].Count)
          Console.Write(" . . . ");
        Console.WriteLine("");
      }
    }

    // ------------------------------------------------------

    static void ShowItemsByCluster(List"lt"int"gt"[] arr,
      string[] rawData, int nItemsPerCluster)
    {
      // nItemsPerCluster limits display
      for (int cid = 0; cid "lt" arr.Length; ++cid)
      {
        Console.WriteLine("\ncluster " + cid + ": ");
        // local limit so that a small cluster does not
        // shrink the display limit for later clusters
        int nShow =
          Math.Min(nItemsPerCluster, arr[cid].Count);
        for (int i = 0; i "lt" nShow; ++i)
        {
          int idx = arr[cid][i];
          string s = rawData[idx];
          Console.Write("[" + idx.ToString().
            PadLeft(3) + "]  ");
          Console.WriteLine(s);
        }
        if (nShow "lt" arr[cid].Count)
          Console.WriteLine(" . . . ");
        else Console.WriteLine("");
      }
    }

    // ------------------------------------------------------
    // general helpers:
    // MatShow(), VecShow(), FileLoad(), MatLoad()
    // ------------------------------------------------------

    // ------------------------------------------------------

    static void MatShow(double[][] m, int[] decs,
      int nRows, bool showIndices)
    {
      // decs[] = number decimals to display for each column
      for (int i = 0; i "lt" nRows; ++i)
      {
        if (showIndices == true)
          Console.Write("[" + i.ToString().
            PadLeft(3) + "]  ");
        for (int j = 0; j "lt" m[0].Length; ++j)
        {
          double v = m[i][j];
          Console.Write(v.ToString("F" + decs[j]).
            PadLeft(decs[j] + 4));
        }
        Console.WriteLine("");
      }
      if (nRows "lt" m.Length)
        Console.WriteLine(" . . . ");
    }

    // ------------------------------------------------------

    static void VecShow(int[] vec, int wid, int nItems)
    {
      if (vec.Length "lt" nItems) nItems = vec.Length;
      for (int i = 0; i "lt" nItems; ++i)
      {
        Console.Write(vec[i].ToString().PadLeft(wid));
      }
      if (nItems "lt" vec.Length) Console.Write(" . . . ");
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    static string[] FileLoad(string fn, string comment)
    {
      List"lt"string"gt" lst = new List"lt"string"gt"();
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      string line = "";
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment)) continue;
        line = line.Trim();
        lst.Add(line);
      }
      sr.Close(); ifs.Close();
      string[] result = lst.ToArray();
      return result;
    }

    // ------------------------------------------------------

    static double[][] MatLoad(string fn, int[] usecols,
      char sep, string comment)
    {
      // self-contained
      int nRows = 0;
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
        if (line.StartsWith(comment) == false)
          ++nRows;
      sr.Close(); ifs.Close();  // could reset fp instead

      int nCols = usecols.Length;
      double[][] result = new double[nRows][];
      for (int r = 0; r "lt" nRows; ++r)
        result[r] = new double[nCols];

      line = "";
      string[] tokens = null;
      ifs = new FileStream(fn, FileMode.Open);
      sr = new StreamReader(ifs);

      int i = 0;
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        tokens = line.Split(sep);
        for (int j = 0; j "lt" nCols; ++j)
        {
          int k = usecols[j];  // into tokens
          result[i][j] = double.Parse(tokens[k]);
        }
        ++i;
      }
      sr.Close(); ifs.Close();
      return result;
    }

    // ------------------------------------------------------
  } // Program

  public class KMeans
  {
    // all members public for easier debugging
    public double[][] data;
    public int k;
    public int N;
    public int dim;
    public int trials;  // to find best
    public int maxIter; // inner loop
    public Random rnd;
    public int[] clustering; // scratch not final
    public double[][] means; // scratch not final

    public int[] bestClustering;
    public double[][] bestMeans; // allocated in Cluster()
    public double bestWCSS;

    // ------------------------------------------------------
    // public methods:
    //   KMeans(), Cluster()
    //
    // private methods:
    //   Initialize(), Shuffle(),  SumSquared(), WCSS()
    //   EucDistance(), ArgMin(), AreEqual(),
    //   UpdateMeans(), UpdateClustering(), ClusterOnce()
    // ------------------------------------------------------

    public KMeans(double[][] data, int k, int seed)
    {
      this.data = data;  // by ref
      this.k = k;  // assumes k is 2 or greater
      this.N = data.Length;
      this.dim = data[0].Length;
      this.trials = N * 5;   // for Cluster()
      this.maxIter = N * 2;  // sanity for ClusterOnce()
      this.Initialize(seed); // seed, means, clustering
    }

    public int[] Cluster()
    {
      // special case k = 1
      if (this.k == 1)
      {
        // single mean of all data; zero first because
        // Initialize() has already stored means here
        for (int j = 0; j "lt" this.dim; ++j)
          this.means[0][j] = 0.0;
        for (int i = 0; i "lt" this.data.Length; ++i)
          for (int j = 0; j "lt" this.dim; ++j)
            this.means[0][j] += this.data[i][j];
        for (int j = 0; j "lt" this.dim; ++j)
          this.means[0][j] /= this.N;
        this.bestMeans = Copy(this.means);

        // all items belong to cluster 0
        for (int i = 0; i "lt" this.N; ++i)
          this.clustering[i] = 0;
        this.bestClustering = Copy(this.clustering);

        // WCSS
        double wcss = 0.0;
        for (int i = 0; i "lt" this.N; ++i)
          wcss += SumSquared(this.bestMeans[0],
            this.data[i]);
        this.bestWCSS = wcss;

        return this.bestClustering;
      }

      // k = 2 or greater
      this.bestWCSS = this.WCSS();  // initial clustering
      this.bestClustering = Copy(this.clustering);
      this.bestMeans = Copy(this.means);

      for (int i = 0; i "lt" this.trials; ++i)
      {
        this.Initialize(i);  // new seed, means, clustering
        int[] clustering = this.ClusterOnce();
        double wcss = this.WCSS();
        if (wcss "lt" this.bestWCSS)
        {
          this.bestWCSS = wcss;
          this.bestClustering = Copy(clustering);
          this.bestMeans = Copy(this.means);
        }
      }
      return this.bestClustering;
    } // Cluster()

    private int[] ClusterOnce()
    {
      int sanityCt = 1;
      while (sanityCt "lte" this.maxIter)  // N * 2
      {
        // each update returns false when nothing changed
        // or the update was rejected, so stop iterating
        if (this.UpdateClustering() == false) break;
        if (this.UpdateMeans() == false) break;
        ++sanityCt;
      }
      // consider warning if sanityCt "gt" maxIter
      return this.clustering;
    } // ClusterOnce()

    private void Initialize(int seed)
    {
      this.rnd = new Random(seed);
      this.clustering = new int[this.N];  // scratch
      this.means = new double[this.k][];  // scratch
      for (int i = 0; i "lt" this.k; ++i)
        this.means[i] = new double[this.dim];

      // initial clustering
      // Random Partition (not Forgy or k-means++)
      int[] indices = new int[this.N];
      for (int i = 0; i "lt" this.N; ++i)
        indices[i] = i;
      Shuffle(indices);
      for (int i = 0; i "lt" this.k; ++i)  // first k items
        this.clustering[indices[i]] = i;
      for (int i = this.k; i "lt" this.N; ++i)
        this.clustering[indices[i]] =
          this.rnd.Next(0, this.k); // remaining items
      this.UpdateMeans();
    }

    private void Shuffle(int[] indices)
    {
      // Fisher-Yates mini-algorithm
      int n = indices.Length;
      for (int i = 0; i "lt" n; ++i)
      {
        int r = this.rnd.Next(i, n);
        int tmp = indices[i];
        indices[i] = indices[r];
        indices[r] = tmp;
      }
    }

    private static double SumSquared(double[] v1,
      double[] v2)
    {
      // used by EucDistance() and WCSS()
      int dim = v1.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" dim; ++i)
        sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
      return sum;
    }

    private static double EucDistance(double[] item,
      double[] mean)
    {
      double ss = SumSquared(item, mean);
      return Math.Sqrt(ss);
    }

    private static int ArgMin(double[] v)
    {
      // index of smallest value in v
      int minIdx = 0;
      double minVal = v[0];
      for (int i = 1; i "lt" v.Length; ++i)
      {
        if (v[i] "lt" minVal)
        {
          minVal = v[i];
          minIdx = i;
        }
      }
      return minIdx;
    }

    private static bool AreEqual(int[] a1, int[] a2)
    {
      // to check if clustering has changed
      int dim = a1.Length;
      for (int i = 0; i "lt" dim; ++i)
        if (a1[i] != a2[i]) return false;
      return true;
    }

    private static int[] Copy(int[] arr)
    {
      // called by Cluster()
      // make a copy of new best clustering
      int dim = arr.Length;
      int[] result = new int[dim];
      for (int i = 0; i "lt" dim; ++i)
        result[i] = arr[i];
      return result;
    }

    private static double[][] Copy(double[][] matrix)
    {
      // make a copy of new best means
      int nr = matrix.Length;
      int nc = matrix[0].Length;
      double[][] result = new double[nr][];
      for (int i = 0; i "lt" nr; ++i)
        result[i] = new double[nc];
      for (int i = 0; i "lt" nr; ++i)
        for (int j = 0; j "lt" nc; ++j)
          result[i][j] = matrix[i][j];
      return result;
    }

    private bool UpdateMeans()
    {
      // first, verify no zero-counts
      // should never happen
      int[] counts = new int[this.k];
      for (int i = 0; i "lt" this.N; ++i)
      {
        int cid = this.clustering[i];
        ++counts[cid];
      }
      for (int kk = 0; kk "lt" this.k; ++kk)
      {
        if (counts[kk] == 0)
          throw
            new Exception("0-count in UpdateMeans()");
      }

      // compute proposed new means
      for (int kk = 0; kk "lt" this.k; ++kk)
        counts[kk] = 0;  // reset
      double[][] newMeans = new double[this.k][];
      for (int i = 0; i "lt" this.k; ++i)
        newMeans[i] = new double[this.dim];
      for (int i = 0; i "lt" this.N; ++i)
      {
        int cid = this.clustering[i];
        ++counts[cid];
        for (int j = 0; j "lt" this.dim; ++j)
          newMeans[cid][j] += this.data[i][j];
      }
      for (int kk = 0; kk "lt" this.k; ++kk)
        if (counts[kk] == 0)
          return false;  // bad attempt to update

      for (int kk = 0; kk "lt" this.k; ++kk)
        for (int j = 0; j "lt" this.dim; ++j)
          newMeans[kk][j] /= counts[kk];

      // copy new means
      for (int kk = 0; kk "lt" this.k; ++kk)
        for (int j = 0; j "lt" this.dim; ++j)
          this.means[kk][j] = newMeans[kk][j];

      return true;
    } // UpdateMeans()

    private bool UpdateClustering()
    {
      // first, verify no zero-counts
      int[] counts = new int[this.k];
      for (int i = 0; i "lt" this.N; ++i)
      {
        int cid = this.clustering[i];
        ++counts[cid];
      }
      // should never happen
      for (int kk = 0; kk "lt" this.k; ++kk)
      {
        if (counts[kk] == 0)
          throw new
            Exception("0-count in UpdateClustering()");
      }

      // proposed new clustering
      int[] newClustering = new int[this.N];
      for (int i = 0; i "lt" this.N; ++i)
        newClustering[i] = this.clustering[i];

      double[] distances = new double[this.k];
      for (int i = 0; i "lt" this.N; ++i)
      {
        for (int kk = 0; kk "lt" this.k; ++kk)
          distances[kk] =
            EucDistance(this.data[i], this.means[kk]);
        // pick the nearest mean only after all k
        // distances for this item have been computed
        int newID = ArgMin(distances);
        newClustering[i] = newID;
      }

      if (AreEqual(this.clustering, newClustering) == true)
        return false;  // no change; short-circuit

      // make sure no count went to 0
      for (int i = 0; i "lt" this.k; ++i)
        counts[i] = 0;  // reset
      for (int i = 0; i "lt" this.N; ++i)
      {
        int cid = newClustering[i];
        ++counts[cid];
      }
      for (int kk = 0; kk "lt" this.k; ++kk)
        if (counts[kk] == 0)
          return false;  // bad update attempt

      // no 0 counts so update
      for (int i = 0; i "lt" this.N; ++i)
        this.clustering[i] = newClustering[i];

      return true;
    } // UpdateClustering()
    
    private double WCSS()
    {
      // within-cluster sum of squares
      double sum = 0.0;
      for (int i = 0; i "lt" this.N; ++i)
      {
        int cid = this.clustering[i];
        double[] mean = this.means[cid];
        double ss = SumSquared(this.data[i], mean);
        sum += ss;
      }
      return sum;
    }

  } // class KMeans
} // ns
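
The WCSS value reported by the demo is a sum of squared Euclidean distances, computed column by column as in SumSquared(). Here is a minimal sketch that applies the same arithmetic to the first two rows of the encoded data, to show how much each kind of column contributes. The class name SumSquaredDemo is hypothetical and is not part of the demo program.

```csharp
using System;

// Hypothetical stand-alone sketch; not part of the demo program.
public static class SumSquaredDemo
{
  // Same arithmetic as KMeans.SumSquared(): sum of squared
  // column-wise differences between two vectors.
  public static double SumSquared(double[] v1, double[] v2)
  {
    double sum = 0.0;
    for (int i = 0; i != v1.Length; ++i)
      sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
    return sum;
  }

  public static void Main()
  {
    // first two rows of people_encoded.txt
    double[] item0 = { 0.5, 0.25, 0.12, 0.25, 0.00, 0.00,
      0.00, 0.1496, 0.0000, 0.0000, 0.3333 };
    double[] item1 = { 0.0, 0.75, 0.42, 0.00, 0.00, 0.25,
      0.00, 0.5024, 0.0000, 0.3333, 0.0000 };

    double ss = SumSquared(item0, item1);
    Console.WriteLine("squared distance = " +
      ss.ToString("F4"));
    Console.WriteLine("Euclidean distance = " +
      Math.Sqrt(ss).ToString("F4"));
  }
}
```

For these two rows the squared distance works out to about 1.0616. The sex mismatch contributes 0.25, the State mismatch contributes 2 * 0.25^2 = 0.125, and the politics mismatch contributes 2 * 0.3333^2, which is about 0.2222.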

Raw data:

# people_raw_space.txt
# space delimited
#
F short 24 arkansas 29500 liberal
M tall 39 delaware 51200 moderate
F short 63 colorado 75800 conservative
M medium 36 illinois 44500 moderate
F short 27 colorado 28600 liberal
F short 50 colorado 56500 moderate
F medium 50 illinois 55000 moderate
M tall 19 delaware 32700 conservative
F short 22 illinois 27700 moderate
M tall 39 delaware 47100 liberal
F short 34 arkansas 39400 moderate
M medium 22 illinois 33500 conservative
F medium 35 delaware 35200 liberal
M tall 33 colorado 46400 moderate
F short 45 colorado 54100 moderate
F short 42 illinois 50700 moderate
M tall 33 colorado 46800 moderate
F tall 25 delaware 30000 moderate
M medium 31 colorado 46400 conservative
F short 27 arkansas 32500 liberal
F short 48 illinois 54000 moderate
M tall 64 illinois 71300 liberal
F medium 61 colorado 72400 conservative
F short 54 illinois 61000 conservative
F short 29 arkansas 36300 conservative
F short 50 delaware 55000 moderate
F medium 55 illinois 62500 conservative
F medium 40 illinois 52400 conservative
F short 22 arkansas 23600 liberal
F short 68 colorado 78400 conservative
M tall 60 illinois 71700 liberal
M tall 34 delaware 46500 moderate
M medium 25 delaware 37100 conservative
M short 31 illinois 48900 moderate
F short 43 delaware 48000 moderate
F short 58 colorado 65400 liberal
M tall 55 illinois 60700 liberal
M tall 43 colorado 51100 moderate
M tall 43 delaware 53200 moderate
M medium 21 arkansas 37200 conservative
F short 55 delaware 64600 conservative
F short 64 colorado 74800 conservative
M tall 41 illinois 58800 moderate
F medium 64 delaware 72700 conservative
M medium 56 illinois 66600 liberal
F short 31 delaware 36000 moderate
M tall 65 delaware 70100 liberal
F tall 55 illinois 64300 conservative
M short 25 arkansas 40300 conservative
F short 46 delaware 51000 moderate
M tall 36 illinois 53500 conservative
F short 52 illinois 58100 moderate
F short 61 delaware 67900 conservative
F short 57 delaware 65700 conservative
M tall 46 colorado 52600 moderate
M tall 62 arkansas 66800 liberal
F short 55 illinois 62700 conservative
M medium 22 delaware 27700 moderate
M tall 50 illinois 62900 conservative
M tall 32 illinois 41800 moderate
M short 21 delaware 35600 conservative
F medium 44 colorado 52000 moderate
F short 46 illinois 51700 moderate
F short 62 colorado 69700 conservative
F short 57 illinois 66400 conservative
M medium 67 illinois 75800 liberal
F short 29 arkansas 34300 liberal
F short 53 illinois 60100 conservative
M tall 44 arkansas 54800 moderate
F medium 46 colorado 52300 moderate
M tall 20 illinois 30100 moderate
M medium 38 illinois 53500 moderate
F short 50 colorado 58600 moderate
F short 33 colorado 42500 moderate
M tall 33 colorado 39300 moderate
F short 26 colorado 40400 conservative
F short 58 arkansas 70700 conservative
F tall 43 illinois 48000 moderate
M medium 46 arkansas 64400 conservative
F short 60 arkansas 71700 conservative
M tall 42 arkansas 48900 moderate
M tall 56 delaware 56400 liberal
M short 62 colorado 66300 liberal
M short 50 arkansas 64800 moderate
F short 47 illinois 52000 moderate
M tall 67 colorado 80400 liberal
M tall 40 delaware 50400 moderate
F short 42 colorado 48400 moderate
F short 64 arkansas 72000 conservative
M medium 47 arkansas 58700 liberal
F medium 45 colorado 52800 moderate
M tall 25 delaware 40900 conservative
F short 38 arkansas 48400 conservative
F short 55 delaware 60000 moderate
M tall 44 arkansas 60600 moderate
F medium 33 arkansas 41000 moderate
F short 34 delaware 39000 moderate
F short 27 colorado 33700 liberal
F short 32 colorado 40700 moderate
F tall 42 illinois 47000 moderate
M short 24 delaware 40300 conservative
F short 42 colorado 50300 moderate
F short 25 delaware 28000 liberal
F short 51 colorado 58000 moderate
M medium 55 colorado 63500 liberal
F short 44 arkansas 47800 liberal
M short 18 arkansas 39800 conservative
M tall 67 colorado 71600 liberal
F short 45 delaware 50000 moderate
F short 48 arkansas 55800 moderate
M short 25 colorado 39000 moderate
M tall 67 arkansas 78300 moderate
F short 37 delaware 42000 moderate
M short 32 arkansas 42700 moderate
F short 48 arkansas 57000 moderate
M tall 66 delaware 75000 liberal
F tall 61 arkansas 70000 conservative
M medium 58 delaware 68900 moderate
F short 19 arkansas 24000 liberal
F short 38 delaware 43000 moderate
M medium 27 arkansas 36400 moderate
F short 42 arkansas 48000 moderate
F short 60 arkansas 71300 conservative
M tall 27 delaware 34800 conservative
F tall 29 colorado 37100 conservative
M medium 43 arkansas 56700 moderate
F medium 48 arkansas 56700 moderate
F medium 27 delaware 29400 liberal
M tall 44 arkansas 55200 conservative
F short 23 colorado 26300 liberal
M tall 36 colorado 53000 liberal
F short 64 delaware 72500 conservative
F short 29 delaware 30000 liberal
M short 33 arkansas 49300 moderate
M tall 66 colorado 75000 liberal
M medium 21 delaware 34300 conservative
F short 27 arkansas 32700 liberal
F short 29 arkansas 31800 liberal
M tall 31 arkansas 48600 moderate
F short 36 delaware 41000 moderate
F short 49 colorado 55700 moderate
M short 28 arkansas 38400 conservative
M medium 43 delaware 56600 moderate
M medium 46 colorado 58800 moderate
F short 57 arkansas 69800 conservative
M short 52 delaware 59400 moderate
M tall 31 delaware 43500 moderate
M tall 55 arkansas 62000 liberal
F short 50 arkansas 56400 moderate
F short 48 colorado 55900 moderate
M medium 22 delaware 34500 conservative
F short 59 delaware 66700 conservative
F short 34 arkansas 42800 liberal
M tall 64 arkansas 77200 liberal
F short 29 delaware 33500 liberal
M medium 34 colorado 43200 moderate
M medium 61 arkansas 75000 liberal
F short 64 delaware 71100 conservative
M short 29 arkansas 41300 conservative
F short 63 colorado 70600 conservative
M medium 29 colorado 40000 conservative
M tall 51 arkansas 62700 moderate
M tall 24 delaware 37700 conservative
F medium 48 colorado 57500 moderate
F short 18 arkansas 27400 conservative
F short 18 arkansas 20300 liberal
F short 33 colorado 38200 liberal
M medium 20 delaware 34800 conservative
F short 29 delaware 33000 liberal
M short 44 delaware 63000 conservative
M tall 65 delaware 81800 conservative
M tall 56 arkansas 63700 liberal
M medium 52 delaware 58400 moderate
M medium 29 colorado 48600 conservative
M tall 47 colorado 58900 moderate
F medium 68 arkansas 72600 liberal
F short 31 delaware 36000 moderate
F short 61 colorado 62500 liberal
F short 19 colorado 21500 liberal
F tall 38 delaware 43000 moderate
M tall 26 arkansas 42300 conservative
F short 61 colorado 67400 conservative
F short 40 arkansas 46500 moderate
M medium 49 arkansas 65200 moderate
F medium 56 arkansas 67500 conservative
M short 48 colorado 66000 moderate
F short 52 arkansas 56300 liberal
M tall 18 arkansas 29800 conservative
M tall 56 delaware 59300 liberal
M medium 52 colorado 64400 moderate
M medium 18 colorado 28600 moderate
M tall 58 arkansas 66200 liberal
M tall 39 colorado 55100 moderate
M tall 46 arkansas 62900 moderate
M medium 40 colorado 46200 moderate
M medium 60 arkansas 72700 liberal
F short 36 colorado 40700 liberal
F short 44 arkansas 52300 moderate
F short 28 arkansas 31300 liberal
F short 54 delaware 62600 conservative
M medium 51 arkansas 61200 moderate
M short 32 colorado 46100 moderate
F short 55 arkansas 62700 conservative
F short 25 delaware 26200 liberal
F medium 33 delaware 37300 liberal
M medium 29 colorado 46200 conservative
F short 65 arkansas 72700 conservative
M tall 43 colorado 51400 moderate
M short 54 colorado 64800 liberal
F short 61 colorado 72700 conservative
F short 52 colorado 63600 conservative
F short 30 colorado 33500 liberal
F short 29 arkansas 31400 liberal
M tall 47 delaware 59400 moderate
F short 39 colorado 47800 moderate
F short 47 delaware 52000 moderate
M medium 49 arkansas 58600 moderate
M tall 63 delaware 67400 liberal
M medium 30 arkansas 39200 conservative
M tall 61 delaware 69600 liberal
M medium 47 delaware 58700 moderate
F short 30 delaware 34500 liberal
M medium 51 delaware 58000 moderate
M medium 24 arkansas 38800 moderate
M short 49 arkansas 64500 moderate
F medium 66 delaware 74500 conservative
M tall 65 arkansas 76900 conservative
M short 46 colorado 58000 conservative
M tall 45 delaware 51800 moderate
M short 47 arkansas 63600 conservative
M tall 29 arkansas 44800 conservative
M tall 57 delaware 69300 liberal
M medium 20 arkansas 28700 liberal
M medium 35 arkansas 43400 moderate
M tall 61 delaware 67000 liberal
M short 31 delaware 37300 moderate
F short 18 arkansas 20800 liberal
F medium 26 delaware 29200 liberal
M medium 28 arkansas 36400 liberal
M tall 59 delaware 69400 liberal
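
To see how one raw line maps to one encoded row, here is a minimal sketch that encodes a single line using the same constants that are hard-coded in NormAndEncode() (age min 18 and max 68, income min 20,300 and max 81,800). The names EncodeOneLineDemo and EncodeLine are hypothetical and are not part of the demo program.

```csharp
using System;

// Hypothetical stand-alone sketch; not part of the demo program.
public static class EncodeOneLineDemo
{
  // Encode one space-delimited raw line into the 11-column
  // layout: sex, height, age, State (4), income, politics (3).
  public static double[] EncodeLine(string line)
  {
    string[] t = line.Trim().Split(' ');
    double[] v = new double[11];  // all cells start at 0.0

    // sex: M = 0.0, F = 0.5
    if (t[0] == "F") v[0] = 0.5;

    // height: ordinal values, equally spaced
    if (t[1] == "short") v[1] = 0.25;
    else if (t[1] == "medium") v[1] = 0.50;
    else if (t[1] == "tall") v[1] = 0.75;

    // age: min-max normalized using min = 18, max = 68
    v[2] = (double.Parse(t[2]) - 18.0) / (68.0 - 18.0);

    // State: one-hot scaled to 0.25, columns [3] to [6]
    if (t[3] == "arkansas") v[3] = 0.25;
    else if (t[3] == "colorado") v[4] = 0.25;
    else if (t[3] == "delaware") v[5] = 0.25;
    else if (t[3] == "illinois") v[6] = 0.25;

    // income: min-max normalized using 20300 and 81800
    v[7] = (double.Parse(t[4]) - 20300.0) /
      (81800.0 - 20300.0);

    // politics: one-hot scaled to 0.3333, columns [8] to [10]
    if (t[5] == "conservative") v[8] = 0.3333;
    else if (t[5] == "moderate") v[9] = 0.3333;
    else if (t[5] == "liberal") v[10] = 0.3333;

    return v;
  }

  public static void Main()
  {
    double[] v =
      EncodeLine("F short 24 arkansas 29500 liberal");
    foreach (double x in v)
      Console.Write(x.ToString("F4") + " ");
    Console.WriteLine("");
  }
}
```

For the first raw line, the age cell is (24 - 18) / 50 = 0.12 and the income cell is (29500 - 20300) / 61500, which is about 0.1496. Both agree with the first row of the encoded data.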

Encoded and normalized data:


# people_encoded.txt
#
# sex (M = 0.0, F = 0.5)
# height (short = 0.25, medium = 0.50, tall = 0.75)
# age (min = 18, max = 68)
# State [Arkansas = (0.25 0 0 0), Colorado = (0 0.25 0 0),
#   Delaware = (0 0 0.25 0), Illinois = (0 0 0 0.25)]
# income (min = 20,300.00, max = 81,800.00)
# politics [conservative = (0.3333 0 0), moderate = (0 0.3333 0),
#   liberal = (0 0 0.3333)]
# 
0.5, 0.25, 0.12, 0.25, 0.00, 0.00, 0.00, 0.1496, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.42, 0.00, 0.00, 0.25, 0.00, 0.5024, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.90, 0.00, 0.25, 0.00, 0.00, 0.9024, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.36, 0.00, 0.00, 0.00, 0.25, 0.3935, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.18, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.64, 0.00, 0.25, 0.00, 0.00, 0.5886, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.64, 0.00, 0.00, 0.00, 0.25, 0.5642, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.02, 0.00, 0.00, 0.25, 0.00, 0.2016, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.08, 0.00, 0.00, 0.00, 0.25, 0.1203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.42, 0.00, 0.00, 0.25, 0.00, 0.4358, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.32, 0.25, 0.00, 0.00, 0.00, 0.3106, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.08, 0.00, 0.00, 0.00, 0.25, 0.2146, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.34, 0.00, 0.00, 0.25, 0.00, 0.2423, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.30, 0.00, 0.25, 0.00, 0.00, 0.4244, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.54, 0.00, 0.25, 0.00, 0.00, 0.5496, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.48, 0.00, 0.00, 0.00, 0.25, 0.4943, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.30, 0.00, 0.25, 0.00, 0.00, 0.4309, 0.0000, 0.3333, 0.0000
0.5, 0.75, 0.14, 0.00, 0.00, 0.25, 0.00, 0.1577, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.26, 0.00, 0.25, 0.00, 0.00, 0.4244, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.18, 0.25, 0.00, 0.00, 0.00, 0.1984, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.60, 0.00, 0.00, 0.00, 0.25, 0.5480, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.92, 0.00, 0.00, 0.00, 0.25, 0.8293, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.86, 0.00, 0.25, 0.00, 0.00, 0.8472, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.72, 0.00, 0.00, 0.00, 0.25, 0.6618, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.22, 0.25, 0.00, 0.00, 0.00, 0.2602, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.64, 0.00, 0.00, 0.25, 0.00, 0.5642, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.74, 0.00, 0.00, 0.00, 0.25, 0.6862, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.44, 0.00, 0.00, 0.00, 0.25, 0.5220, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.08, 0.25, 0.00, 0.00, 0.00, 0.0537, 0.0000, 0.0000, 0.3333
0.5, 0.25, 1.00, 0.00, 0.25, 0.00, 0.00, 0.9447, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.84, 0.00, 0.00, 0.00, 0.25, 0.8358, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.32, 0.00, 0.00, 0.25, 0.00, 0.4260, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.14, 0.00, 0.00, 0.25, 0.00, 0.2732, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.26, 0.00, 0.00, 0.00, 0.25, 0.4650, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.50, 0.00, 0.00, 0.25, 0.00, 0.4504, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.80, 0.00, 0.25, 0.00, 0.00, 0.7333, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.74, 0.00, 0.00, 0.00, 0.25, 0.6569, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.50, 0.00, 0.25, 0.00, 0.00, 0.5008, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.50, 0.00, 0.00, 0.25, 0.00, 0.5350, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.06, 0.25, 0.00, 0.00, 0.00, 0.2748, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.74, 0.00, 0.00, 0.25, 0.00, 0.7203, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.92, 0.00, 0.25, 0.00, 0.00, 0.8862, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.46, 0.00, 0.00, 0.00, 0.25, 0.6260, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.92, 0.00, 0.00, 0.25, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.76, 0.00, 0.00, 0.00, 0.25, 0.7528, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.26, 0.00, 0.00, 0.25, 0.00, 0.2553, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.94, 0.00, 0.00, 0.25, 0.00, 0.8098, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.74, 0.00, 0.00, 0.00, 0.25, 0.7154, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.14, 0.25, 0.00, 0.00, 0.00, 0.3252, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.56, 0.00, 0.00, 0.25, 0.00, 0.4992, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.36, 0.00, 0.00, 0.00, 0.25, 0.5398, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.68, 0.00, 0.00, 0.00, 0.25, 0.6146, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.86, 0.00, 0.00, 0.25, 0.00, 0.7740, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.78, 0.00, 0.00, 0.25, 0.00, 0.7382, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.56, 0.00, 0.25, 0.00, 0.00, 0.5252, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.88, 0.25, 0.00, 0.00, 0.00, 0.7561, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.74, 0.00, 0.00, 0.00, 0.25, 0.6894, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.08, 0.00, 0.00, 0.25, 0.00, 0.1203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.64, 0.00, 0.00, 0.00, 0.25, 0.6927, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.28, 0.00, 0.00, 0.00, 0.25, 0.3496, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.06, 0.00, 0.00, 0.25, 0.00, 0.2488, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.52, 0.00, 0.25, 0.00, 0.00, 0.5154, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.56, 0.00, 0.00, 0.00, 0.25, 0.5106, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.88, 0.00, 0.25, 0.00, 0.00, 0.8033, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.78, 0.00, 0.00, 0.00, 0.25, 0.7496, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.98, 0.00, 0.00, 0.00, 0.25, 0.9024, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.22, 0.25, 0.00, 0.00, 0.00, 0.2276, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.70, 0.00, 0.00, 0.00, 0.25, 0.6472, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.52, 0.25, 0.00, 0.00, 0.00, 0.5610, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.56, 0.00, 0.25, 0.00, 0.00, 0.5203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.04, 0.00, 0.00, 0.00, 0.25, 0.1593, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.40, 0.00, 0.00, 0.00, 0.25, 0.5398, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.64, 0.00, 0.25, 0.00, 0.00, 0.6228, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.30, 0.00, 0.25, 0.00, 0.00, 0.3610, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.30, 0.00, 0.25, 0.00, 0.00, 0.3089, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.16, 0.00, 0.25, 0.00, 0.00, 0.3268, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.80, 0.25, 0.00, 0.00, 0.00, 0.8195, 0.3333, 0.0000, 0.0000
0.5, 0.75, 0.50, 0.00, 0.00, 0.00, 0.25, 0.4504, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.56, 0.25, 0.00, 0.00, 0.00, 0.7171, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.84, 0.25, 0.00, 0.00, 0.00, 0.8358, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.48, 0.25, 0.00, 0.00, 0.00, 0.4650, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.76, 0.00, 0.00, 0.25, 0.00, 0.5870, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.88, 0.00, 0.25, 0.00, 0.00, 0.7480, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.64, 0.25, 0.00, 0.00, 0.00, 0.7236, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.58, 0.00, 0.00, 0.00, 0.25, 0.5154, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.98, 0.00, 0.25, 0.00, 0.00, 0.9772, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.44, 0.00, 0.00, 0.25, 0.00, 0.4894, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.48, 0.00, 0.25, 0.00, 0.00, 0.4569, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.92, 0.25, 0.00, 0.00, 0.00, 0.8407, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.58, 0.25, 0.00, 0.00, 0.00, 0.6244, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.54, 0.00, 0.25, 0.00, 0.00, 0.5285, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.14, 0.00, 0.00, 0.25, 0.00, 0.3350, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.40, 0.25, 0.00, 0.00, 0.00, 0.4569, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.74, 0.00, 0.00, 0.25, 0.00, 0.6455, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.52, 0.25, 0.00, 0.00, 0.00, 0.6553, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.30, 0.25, 0.00, 0.00, 0.00, 0.3366, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.32, 0.00, 0.00, 0.25, 0.00, 0.3041, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.18, 0.00, 0.25, 0.00, 0.00, 0.2179, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.28, 0.00, 0.25, 0.00, 0.00, 0.3317, 0.0000, 0.3333, 0.0000
0.5, 0.75, 0.48, 0.00, 0.00, 0.00, 0.25, 0.4341, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.12, 0.00, 0.00, 0.25, 0.00, 0.3252, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.48, 0.00, 0.25, 0.00, 0.00, 0.4878, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.14, 0.00, 0.00, 0.25, 0.00, 0.1252, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.66, 0.00, 0.25, 0.00, 0.00, 0.6130, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.74, 0.00, 0.25, 0.00, 0.00, 0.7024, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.52, 0.25, 0.00, 0.00, 0.00, 0.4472, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.00, 0.25, 0.00, 0.00, 0.00, 0.3171, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.98, 0.00, 0.25, 0.00, 0.00, 0.8341, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.54, 0.00, 0.00, 0.25, 0.00, 0.4829, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.60, 0.25, 0.00, 0.00, 0.00, 0.5772, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.14, 0.00, 0.25, 0.00, 0.00, 0.3041, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.98, 0.25, 0.00, 0.00, 0.00, 0.9431, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.38, 0.00, 0.00, 0.25, 0.00, 0.3528, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.28, 0.25, 0.00, 0.00, 0.00, 0.3642, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.60, 0.25, 0.00, 0.00, 0.00, 0.5967, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.96, 0.00, 0.00, 0.25, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.86, 0.25, 0.00, 0.00, 0.00, 0.8081, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.80, 0.00, 0.00, 0.25, 0.00, 0.7902, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.02, 0.25, 0.00, 0.00, 0.00, 0.0602, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.40, 0.00, 0.00, 0.25, 0.00, 0.3691, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.18, 0.25, 0.00, 0.00, 0.00, 0.2618, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.48, 0.25, 0.00, 0.00, 0.00, 0.4504, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.84, 0.25, 0.00, 0.00, 0.00, 0.8293, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.18, 0.00, 0.00, 0.25, 0.00, 0.2358, 0.3333, 0.0000, 0.0000
0.5, 0.75, 0.22, 0.00, 0.25, 0.00, 0.00, 0.2732, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.50, 0.25, 0.00, 0.00, 0.00, 0.5919, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.60, 0.25, 0.00, 0.00, 0.00, 0.5919, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.18, 0.00, 0.00, 0.25, 0.00, 0.1480, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.52, 0.25, 0.00, 0.00, 0.00, 0.5675, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.10, 0.00, 0.25, 0.00, 0.00, 0.0976, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.36, 0.00, 0.25, 0.00, 0.00, 0.5317, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.92, 0.00, 0.00, 0.25, 0.00, 0.8488, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.22, 0.00, 0.00, 0.25, 0.00, 0.1577, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.30, 0.25, 0.00, 0.00, 0.00, 0.4715, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.96, 0.00, 0.25, 0.00, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.06, 0.00, 0.00, 0.25, 0.00, 0.2276, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.18, 0.25, 0.00, 0.00, 0.00, 0.2016, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.22, 0.25, 0.00, 0.00, 0.00, 0.1870, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.26, 0.25, 0.00, 0.00, 0.00, 0.4602, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.36, 0.00, 0.00, 0.25, 0.00, 0.3366, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.62, 0.00, 0.25, 0.00, 0.00, 0.5756, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.20, 0.25, 0.00, 0.00, 0.00, 0.2943, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.50, 0.00, 0.00, 0.25, 0.00, 0.5902, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.56, 0.00, 0.25, 0.00, 0.00, 0.6260, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.78, 0.25, 0.00, 0.00, 0.00, 0.8049, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.68, 0.00, 0.00, 0.25, 0.00, 0.6358, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.26, 0.00, 0.00, 0.25, 0.00, 0.3772, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.74, 0.25, 0.00, 0.00, 0.00, 0.6780, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.64, 0.25, 0.00, 0.00, 0.00, 0.5870, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.60, 0.00, 0.25, 0.00, 0.00, 0.5789, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.08, 0.00, 0.00, 0.25, 0.00, 0.2309, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.82, 0.00, 0.00, 0.25, 0.00, 0.7545, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.32, 0.25, 0.00, 0.00, 0.00, 0.3659, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.92, 0.25, 0.00, 0.00, 0.00, 0.9252, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.22, 0.00, 0.00, 0.25, 0.00, 0.2146, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.32, 0.00, 0.25, 0.00, 0.00, 0.3724, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.86, 0.25, 0.00, 0.00, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.92, 0.00, 0.00, 0.25, 0.00, 0.8260, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.22, 0.25, 0.00, 0.00, 0.00, 0.3415, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.90, 0.00, 0.25, 0.00, 0.00, 0.8179, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.22, 0.00, 0.25, 0.00, 0.00, 0.3203, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.66, 0.25, 0.00, 0.00, 0.00, 0.6894, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.12, 0.00, 0.00, 0.25, 0.00, 0.2829, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.60, 0.00, 0.25, 0.00, 0.00, 0.6049, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.00, 0.25, 0.00, 0.00, 0.00, 0.1154, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.00, 0.25, 0.00, 0.00, 0.00, 0.0000, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.30, 0.00, 0.25, 0.00, 0.00, 0.2911, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.04, 0.00, 0.00, 0.25, 0.00, 0.2358, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.22, 0.00, 0.00, 0.25, 0.00, 0.2065, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.52, 0.00, 0.00, 0.25, 0.00, 0.6943, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.94, 0.00, 0.00, 0.25, 0.00, 1.0000, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.76, 0.25, 0.00, 0.00, 0.00, 0.7057, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.68, 0.00, 0.00, 0.25, 0.00, 0.6195, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.22, 0.00, 0.25, 0.00, 0.00, 0.4602, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.58, 0.00, 0.25, 0.00, 0.00, 0.6276, 0.0000, 0.3333, 0.0000
0.5, 0.50, 1.00, 0.25, 0.00, 0.00, 0.00, 0.8504, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.26, 0.00, 0.00, 0.25, 0.00, 0.2553, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.86, 0.00, 0.25, 0.00, 0.00, 0.6862, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.02, 0.00, 0.25, 0.00, 0.00, 0.0195, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.40, 0.00, 0.00, 0.25, 0.00, 0.3691, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.16, 0.25, 0.00, 0.00, 0.00, 0.3577, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.86, 0.00, 0.25, 0.00, 0.00, 0.7659, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.44, 0.25, 0.00, 0.00, 0.00, 0.4260, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.62, 0.25, 0.00, 0.00, 0.00, 0.7301, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.76, 0.25, 0.00, 0.00, 0.00, 0.7675, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.60, 0.00, 0.25, 0.00, 0.00, 0.7431, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.68, 0.25, 0.00, 0.00, 0.00, 0.5854, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.00, 0.25, 0.00, 0.00, 0.00, 0.1545, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.76, 0.00, 0.00, 0.25, 0.00, 0.6341, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.68, 0.00, 0.25, 0.00, 0.00, 0.7171, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.00, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.80, 0.25, 0.00, 0.00, 0.00, 0.7463, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.42, 0.00, 0.25, 0.00, 0.00, 0.5659, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.56, 0.25, 0.00, 0.00, 0.00, 0.6927, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.44, 0.00, 0.25, 0.00, 0.00, 0.4211, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.84, 0.25, 0.00, 0.00, 0.00, 0.8520, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.36, 0.00, 0.25, 0.00, 0.00, 0.3317, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.52, 0.25, 0.00, 0.00, 0.00, 0.5203, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.20, 0.25, 0.00, 0.00, 0.00, 0.1789, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.72, 0.00, 0.00, 0.25, 0.00, 0.6878, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.66, 0.25, 0.00, 0.00, 0.00, 0.6650, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.28, 0.00, 0.25, 0.00, 0.00, 0.4195, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.74, 0.25, 0.00, 0.00, 0.00, 0.6894, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.14, 0.00, 0.00, 0.25, 0.00, 0.0959, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.30, 0.00, 0.00, 0.25, 0.00, 0.2764, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.22, 0.00, 0.25, 0.00, 0.00, 0.4211, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.94, 0.25, 0.00, 0.00, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.50, 0.00, 0.25, 0.00, 0.00, 0.5057, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.72, 0.00, 0.25, 0.00, 0.00, 0.7236, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.86, 0.00, 0.25, 0.00, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.68, 0.00, 0.25, 0.00, 0.00, 0.7041, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.24, 0.00, 0.25, 0.00, 0.00, 0.2146, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.22, 0.25, 0.00, 0.00, 0.00, 0.1805, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.58, 0.00, 0.00, 0.25, 0.00, 0.6358, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.42, 0.00, 0.25, 0.00, 0.00, 0.4472, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.58, 0.00, 0.00, 0.25, 0.00, 0.5154, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.62, 0.25, 0.00, 0.00, 0.00, 0.6228, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.90, 0.00, 0.00, 0.25, 0.00, 0.7659, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.24, 0.25, 0.00, 0.00, 0.00, 0.3073, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.86, 0.00, 0.00, 0.25, 0.00, 0.8016, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.58, 0.00, 0.00, 0.25, 0.00, 0.6244, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.24, 0.00, 0.00, 0.25, 0.00, 0.2309, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.66, 0.00, 0.00, 0.25, 0.00, 0.6130, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.12, 0.25, 0.00, 0.00, 0.00, 0.3008, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.62, 0.25, 0.00, 0.00, 0.00, 0.7187, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.96, 0.00, 0.00, 0.25, 0.00, 0.8813, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.94, 0.25, 0.00, 0.00, 0.00, 0.9203, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.56, 0.00, 0.25, 0.00, 0.00, 0.6130, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.54, 0.00, 0.00, 0.25, 0.00, 0.5122, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.58, 0.25, 0.00, 0.00, 0.00, 0.7041, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.22, 0.25, 0.00, 0.00, 0.00, 0.3984, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.78, 0.00, 0.00, 0.25, 0.00, 0.7967, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.04, 0.25, 0.00, 0.00, 0.00, 0.1366, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.34, 0.25, 0.00, 0.00, 0.00, 0.3756, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.86, 0.00, 0.00, 0.25, 0.00, 0.7593, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.26, 0.00, 0.00, 0.25, 0.00, 0.2764, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.00, 0.25, 0.00, 0.00, 0.00, 0.0081, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.16, 0.00, 0.00, 0.25, 0.00, 0.1447, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.20, 0.25, 0.00, 0.00, 0.00, 0.2618, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.82, 0.00, 0.00, 0.25, 0.00, 0.7984, 0.0000, 0.0000, 0.3333

Python k-means program gives identical results:

# kmeans_demo.py

import numpy as np
from sklearn.cluster import KMeans

fn = ".\\Data\\people_encoded.txt"
X = np.loadtxt(fn, usecols=[0,1,2,3,4,5,6,7,8,9,10],
  delimiter=',', comments="#", dtype=np.float64)
print("\nsource data:")
print(X)
km = KMeans(n_clusters=3, random_state=1, init='random')
km.fit(X)
print("\nclustering (first 12) = ")
print(km.labels_[0:12])
print("\nWCSS = %0.4f " % km.inertia_)
print("\ncounts: ")
print(np.sum(km.labels_ == 0))
print(np.sum(km.labels_ == 1))
print(np.sum(km.labels_ == 2))
print("\nmeans: ")
np.set_printoptions(precision=2)
print(km.cluster_centers_)

Output:

C:\VSM\ClusterMixedKMeans: python kmeans_scikit_demo.py

source data:
[[0.5  0.25 0.12 ... 0.   0.   0.33]
 [0.   0.75 0.42 ... 0.   0.33 0.  ]
 [0.5  0.25 0.9  ... 0.33 0.   0.  ]
 ...
 [0.5  0.5  0.16 ... 0.   0.   0.33]
 [0.   0.5  0.2  ... 0.   0.   0.33]
 [0.   0.75 0.82 ... 0.   0.   0.33]]

clustering (first 12) =
[0 1 2 0 0 2 2 0 0 1 0 0]

WCSS = 49.3195

counts:
89
77
74

means:
[[0.26 0.4  0.19 0.09 0.06 0.09 0.02 0.26 0.1  0.11 0.12]
 [0.   0.63 0.66 0.08 0.06 0.08 0.03 0.68 0.04 0.16 0.13]
 [0.5  0.32 0.69 0.07 0.08 0.05 0.05 0.65 0.16 0.15 0.02]]

Data Anomaly Detection For Mixed Data Using a Self-Organizing Map (SOM) From Scratch JavaScript

Several days ago, I put together a demo of data anomaly detection for mixed numeric and categorical data using a self-organizing map (SOM), from scratch, using the C# language. Then, a few days later, I refactored the C# version to Python. And then, for this blog post, I figured I’d refactor the system to raw JavaScript.

Refactoring a non-trivial system from one language to another always gives me new insights into the system, algorithms, and data structures, as well as features of the two programming languages involved.

A self-organizing map (SOM) is a data structure and associated algorithms that can be used to cluster data. Each cluster has a representative vector, somewhat similar to the way each cluster in k-means clustering has an associated mean/centroid. Data items that are assigned to a SOM cluster but are far (Euclidean distance) from the cluster representative vector are anomalous.

I made a 240-item set of synthetic data that looks like:

F  short   24  arkansas  29500  liberal
M  tall    39  delaware  51200  moderate
F  short   63  colorado  75800  conservative
M  medium  36  illinois  44500  moderate
F  short   27  colorado  28600  liberal
. . .

The fields are sex, height, age, State, income, political leaning.

Because SOM clustering uses Euclidean distance, the data must be normalized and encoded. I used min-max normalization on the age (min = 18, max = 68) and income (min = $20,300, max = $81,800) columns. I used one-over-n-hot encoding on the sex, State, and political leaning columns. I used equal-interval encoding for the height column, because it has a natural order.

The resulting normalized and encoded data looks like:

0.5, 0.25, 0.1200, 0.25, 0.00, 0.00, 0.00, 0.1496, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.4200, 0.00, 0.00, 0.25, 0.00, 0.5024, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.9000, 0.00, 0.25, 0.00, 0.00, 0.9024, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.3600, 0.00, 0.00, 0.00, 0.25, 0.3935, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1800, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.0000, 0.3333
. . .
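
The normalization and encoding described above can be sketched in a few lines of JavaScript. This is only an illustration, not the code used to prepare the demo data: the helper names (minMax, oneOverNHot, encodeLine) are made up, and the category orders and the 0.0 / 0.5 values for sex are inferred from the sample rows shown in this post.

```javascript
// Sketch of the encoding; helper names are hypothetical.
// Min/max values come from the post; category orders and the
// M = 0.0, F = 0.5 mapping are inferred from the sample rows.
const AGE_MIN = 18, AGE_MAX = 68;
const INC_MIN = 20300, INC_MAX = 81800;
const SEX = { M: 0.0, F: 0.5 };
const HEIGHT = { short: 0.25, medium: 0.50, tall: 0.75 };  // equal-interval
const STATES = ["arkansas", "colorado", "delaware", "illinois"];
const POLITICS = ["conservative", "moderate", "liberal"];

// min-max normalization to [0, 1]
function minMax(x, lo, hi) {
  return (x - lo) / (hi - lo);
}

// one-over-n-hot: 1/n in the matching position, 0 elsewhere
function oneOverNHot(value, categories) {
  const n = categories.length;
  return categories.map(c => (c === value ? 1.0 / n : 0.0));
}

// encode one whitespace-delimited raw line into 11 numeric values
function encodeLine(line) {
  const [sex, height, age, state, income, politics] =
    line.trim().split(/\s+/);
  return [
    SEX[sex],
    HEIGHT[height],
    minMax(parseFloat(age), AGE_MIN, AGE_MAX),
    ...oneOverNHot(state, STATES),
    minMax(parseFloat(income), INC_MIN, INC_MAX),
    ...oneOverNHot(politics, POLITICS)
  ];
}

const row = encodeLine("F  short   24  arkansas  29500  liberal");
console.log(row.map(v => v.toFixed(4)).join(", "));
// 0.5000, 0.2500, 0.1200, 0.2500, 0.0000, 0.0000, 0.0000, 0.1496, 0.0000, 0.0000, 0.3333
```

The printed values match the first encoded row shown above, apart from the number of decimals displayed.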

I set up the demo SOM as a 2-by-2 map for a total of 4 map nodes. Creating a SOM is an iterative process that requires a steps_max value (stepsMax in the JavaScript code; I used 1,000) and a lrn_rate_max value (lrnRateMax in the code; I used 2.00). A SOM is very sensitive to these values, and they must be determined by trial and error. I monitored the map building every 200 iterations by computing the sum of Euclidean distances (SED) between each data item and the vector of the map node / cluster it is assigned to:

Computing SOM clustering
map build step 0     |  SED = 311.4767
map build step 200   |  SED = 229.7895
map build step 400   |  SED = 160.0903
map build step 600   |  SED = 122.9567
map build step 800   |  SED = 105.7636
Done
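
The demo code listed later in this post shrinks both the learning rate and the neighborhood range linearly as the step count grows. Here is a minimal sketch of that schedule using the demo settings (stepsMax = 1000, lrnRateMax = 2.00, a 2-by-2 map). The function name schedule is made up for illustration; the arithmetic mirrors the cluster() method.

```javascript
// Linear decay as in the demo's cluster() method.
// schedule() is a hypothetical helper name, not part of the demo.
function schedule(step, stepsMax, lrnRateMax, mapRows, mapCols) {
  const rangeMax = mapRows + mapCols;     // starting neighborhood range
  const pctLeft = 1.0 - step / stepsMax;  // fraction of training left
  return {
    currRange: pctLeft * rangeMax,
    currLrnRate: pctLeft * lrnRateMax
  };
}

for (let step = 0; step < 1000; step += 200) {
  const s = schedule(step, 1000, 2.00, 2, 2);
  console.log("step " + step.toString().padStart(4, " ") +
    "  range = " + s.currRange.toFixed(2) +
    "  lrnRate = " + s.currLrnRate.toFixed(2));
}
```

With these settings, all four nodes are updated while the range is at least 2 (the largest Manhattan distance in a 2-by-2 map), and only the closest node is updated once the range drops below 1.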

Each of the 4 map nodes is identified by a [row][col] pair of indices. The resulting four map node associated vectors are:

SOM map nodes:
[0][0] :  0.00  0.66  0.70  0.08  0.09  0.03  0.05  0.71  0.02  0.12  0.19
[0][1] :  0.50  0.33  0.22  0.08  0.07  0.09  0.01  0.21  0.02  0.08  0.23
[1][0] :  0.50  0.31  0.70  0.05  0.10  0.07  0.04  0.66  0.17  0.14  0.02
[1][1] :  0.00  0.53  0.19  0.03  0.06  0.11  0.05  0.32  0.16  0.18  0.00

It’s important to look at the SOM mapping to determine if the steps_max and lrn_rate_max parameter values are good. The 240 data items were assigned to map nodes according to this distribution:

SOM mapping:
[0][0] : 43 items
[0][1] : 49 items
[1][0] : 77 items
[1][1] : 71 items

These counts seem reasonable. My demo has a function to display the [r][c] cluster ID for each data item. The first four cluster assignments are:

clustering:
X[0] :  0  1
X[1] :  1  1
X[2] :  1  0
X[3] :  1  1
. . .

After the SOM map was constructed, I analyzed the data, looking for the data item assigned to each cluster/node that is farthest from the map node representative vector:

Analyzing

node [0][0] :
  most anomalous data idx =  229
  0.00  0.25  0.58  0.25  0.00  0.00  0.00  0.70  0.33  0.00  0.00
  M  short   47  arkansas  63600  conservative
  distance = 0.6081

node [0][1] :
  most anomalous data idx =  179
  0.50  0.75  0.40  0.00  0.00  0.25  0.00  0.37  0.00  0.33  0.00
  F  tall    38  delaware  43000  moderate
  distance = 0.6267

node [1][0] :
  most anomalous data idx =   99
  0.50  0.75  0.48  0.00  0.00  0.00  0.25  0.43  0.00  0.33  0.00
  F  tall    42  illinois  47000  moderate
  distance = 0.6505

node [1][1] :
  most anomalous data idx =  232
  0.00  0.50  0.04  0.25  0.00  0.00  0.00  0.14  0.00  0.00  0.33
  M  medium  20  arkansas  28700  liberal
  distance = 0.5363

I displayed the index of the anomalous data item, its normalized and encoded form, its raw form, and the distance from the item to its map node vector. In a non-demo scenario, these data items would be examined to determine if they are in fact anomalies, and if so, what might be the cause.
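
As a sanity check, the distance for node [0][0] can be recomputed from the values displayed above. This sketch uses the two-decimal values shown in the output, so the result only approximately matches the reported 0.6081; the demo itself works with unrounded values.

```javascript
// Euclidean distance, same formula as the demo's eucDistance()
function eucDistance(v1, v2) {
  let sum = 0.0;
  for (let i = 0; i < v1.length; ++i) {
    sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
  }
  return Math.sqrt(sum);
}

// data item 229 and map node [0][0], as displayed (rounded)
const item = [0.00, 0.25, 0.58, 0.25, 0.00, 0.00, 0.00,
  0.70, 0.33, 0.00, 0.00];
const node = [0.00, 0.66, 0.70, 0.08, 0.09, 0.03, 0.05,
  0.71, 0.02, 0.12, 0.19];

const d = eucDistance(item, node);
console.log("distance = " + d.toFixed(3));  // about 0.608
```

Most of the distance comes from the height value (0.25 versus 0.66) and the first political leaning value (0.33 versus 0.02), which is consistent with a short, conservative person being unusual for that cluster.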

An interesting exploration!



Inanimate objects normally don’t go around trying to kill people. Here are three anomalies in movies. Left: In “Rubber” (2010), a tire in the desert becomes sentient and has psychokinetic powers. A clever, funny, strange, experimental horror film. Center: In “Amityville 4: The Evil Escapes” (1989), a haunted house has an evil lamp. A full-on horror film that’s pretty scary. Right: “Killer Sofa” (2019) is a comedy-horror film made in New Zealand, where, I guess, they call recliner chairs sofas. I thought this film about a chair that is possessed by an evil spirit was very nicely done. I especially liked the scenes where the chair shuffles around the apartment in which it lives.


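Demo code. Before running it, replace the quoted placeholders “lt” (less than), “gt” (greater than), “lte” (less than or equal), “gte” (greater than or equal), and “and” with the corresponding operator symbols.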
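
If you’d rather not make the replacements by hand, a small script can do it. This is a rough sketch with a made-up function name. It assumes the placeholders appear exactly as double-quoted tokens such as "lt", and that no genuine string literal in the pasted code is one of these words.

```javascript
// Hypothetical helper: swap quoted placeholders for operators.
// Longer names (lte, gte) are listed first in the pattern for clarity.
function restoreOperators(src) {
  const ops = { lt: "<", gt: ">", lte: "<=", gte: ">=", and: "&&" };
  return src.replace(/"(lte|gte|lt|gt|and)"/g,
    (match, name) => ops[name]);
}

const sample =
  'for (let i = 0; i "lt" n; ++i) { if (x "gte" 2) ++k; }';
console.log(restoreOperators(sample));
// for (let i = 0; i < n; ++i) { if (x >= 2) ++k; }
```

To convert a whole file, read it with FS.readFileSync(), pass the text through the function, and write the result back out with FS.writeFileSync().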

// anomaly_som.js
// self-organizing map (SOM) anomaly detection

let FS = require('fs');  // to read data file

// ----------------------------------------------------------

class ClusterSOM
{
  constructor(X, mapRows, mapCols, seed)
  {
    this.mapRows = mapRows;
    this.mapCols = mapCols;
    this.data = X;  // by ref
    this.seed = seed + 0.5;  // avoid 0
    let dim = X[0].length;
    this.map = this.makeMap(mapRows, mapCols, dim);
    for (let i = 0; i "lt" mapRows; ++i) {
      for (let j = 0; j "lt" mapCols; ++j) {
        for (let k = 0; k "lt" dim; ++k) {
          this.map[i][j][k] = this.next();  // random
        }
      }
    }
    this.mapping = 
      this.makeMap(mapRows, mapCols, 1); // matrix of lists
  }

  // --------------------------------------------------------
  // methods: cluster(), getClustering(), analyze()
  // --------------------------------------------------------

  cluster(lrnRateMax, stepsMax) 
  {
    let n = this.data.length;
    let dim = this.data[0].length;
    let rangeMax = this.mapRows + this.mapCols;

    for (let step = 0; step "lt" stepsMax; ++step) {

      if (step % Math.trunc(stepsMax / 5) == 0) {
        process.stdout.write("map build step = ");
        process.stdout.write(step.toString().
          padStart(4, ' '));
        let sum = 0.0;
        for (let ix = 0; ix "lt" n; ++ix) {
          let RC = this.closestNode(ix);

          //console.log(RC[0].toString());
          //console.log(RC[1].toString());

          let item = this.data[ix];
          let node = this.map[RC[0]][RC[1]];
          let dist = this.eucDistance(item, node);
          sum += dist;
        }
        console.log("  |  SED = " + 
          sum.toFixed(4).toString().padStart(9, ' '));
      } // show progress

      let pctLeft = 1.0 - ((step * 1.0) / stepsMax);
      let currRange = (pctLeft * rangeMax);
      let currLrnRate = pctLeft * lrnRateMax;

      // pick a random index
      let idx = this.nextInt(0, n);
      let bmuRC = this.closestNode(idx);
      // move each map node
      for (let i = 0; i "lt" this.mapRows; ++i) {
        for (let j = 0; j "lt" this.mapCols; ++j) {
          if (this.manhattDist(bmuRC[0],
            bmuRC[1], i, j) "lte" currRange) {
            for (let d = 0; d "lt" dim; ++d) {
              this.map[i][j][d] = this.map[i][j][d] +
                currLrnRate * (this.data[idx][d] -
                  this.map[i][j][d]);
            } // d
          } // if
        } // j
      } // i

    } // step
    // map has been created

    // compute mapping
    for (let idx = 0; idx "lt" n; ++idx) {
      let rc = this.closestNode(idx);
      let r = rc[0]; let c = rc[1];
      this.mapping[r][c].push(idx);
    }

    // each mapping cell was created with one dummy 0.0
    // first value; always remove it so that an empty
    // node does not keep a bogus data index 0
    for (let i = 0; i "lt" this.mapRows; ++i) {
      for (let j = 0; j "lt" this.mapCols; ++j) {
        this.mapping[i][j].shift();  // remove first
      }
    }

    return;
  } // cluster()

  // --------------------------------------------------------

  getClustering()
  {
    // cluster (r,c) ID for every data item
    let n = this.data.length;
    let result = this.matMake(n, 2, 0.0);
    for (let i = 0; i "lt" this.mapRows; ++i) {
      for (let j = 0; j "lt" this.mapCols; ++j) {
        for (let k = 0; k "lt" this.mapping[i][j].length;
          ++k) {
          let idx = this.mapping[i][j][k];
          result[idx][0] = i;
          result[idx][1] = j;
        }
      }
    }
    return result;
  }

  // --------------------------------------------------------

  analyze(rawFileArray)
  {
    for (let i = 0; i "lt" this.mapRows; ++i) {
      for (let j = 0; j "lt" this.mapCols; ++j) {
        let nodeVec = this.map[i][j];
        let largeDist = 0.0;
        let anomIdx = 0;
        for (let k = 0; k "lt" this.mapping[i][j].length;
          ++k) {
          let idx = this.mapping[i][j][k];
          let item = this.data[idx];
          let dist = this.eucDistance(nodeVec, item);
          if (dist "gt" largeDist) {
            largeDist = dist;
            anomIdx = idx;
          }
        } // k

        console.log("\nnode [" + i.toString() + "][" +
          j.toString() + "] : ");
        console.log("  most anomalous data idx = " +
          anomIdx.toFixed(0).toString().padStart(4, " "));

        for (let jj = 0; jj "lt" this.data[anomIdx].length;
          ++jj) {
          //process.stdout.write(this.data[anomIdx][jj].
          //  toFixed(4).toString().padStart(8));
          process.stdout.write(this.data[anomIdx][jj].
            toFixed(2).toString().padStart(6));
        }
        console.log("");

        console.log("  " + rawFileArray[anomIdx]);
        console.log("  distance = " + 
          largeDist.toFixed(4).toString());
      } // j
    } // i
  }

  // --------------------------------------------------------
  // helper functions: makeMap(), matMake(), vecMake(),
  //  closestNode(), eucDistance(), manhattDist(), next(),
  //  nextInt()
  // --------------------------------------------------------

  makeMap(d1, d2, d3)
  {
    let result = [];
    for (let i = 0; i "lt" d1; ++i) {
      result[i] = [];
      for (let j = 0; j "lt" d2; ++j) {
        result[i][j] = [];
        for (let k = 0; k "lt" d3; ++k) {
          result[i][j][k] = 0.0;
        }
      }
    }
    return result;
  }

  // --------------------------------------------------------

  matMake(rows, cols, val)
  {
    let result = [];
    for (let i = 0; i "lt" rows; ++i) {
      result[i] = [];
      for (let j = 0; j "lt" cols; ++j) {
        result[i][j] = val;
      }
    }
    return result;
  }

  // --------------------------------------------------------

  vecMake(n, val)
  {
    let result = [];
    for (let i = 0; i "lt" n; ++i) {
      result[i] = val;
    }
    return result;
  }

  // --------------------------------------------------------

  closestNode(idx)
  {
    // r,c of map vec closest to data[idx]
    let smallDist = 100000000.0;
    let result = this.vecMake(2, 0);
    result[0] = -1;
    result[1] = -1;
    for (let i = 0; i "lt" this.map.length; ++i) {
      for (let j = 0; j "lt" this.map[0].length; ++j) {
        let dist = this.eucDistance(this.data[idx],
          this.map[i][j]);
        if (dist "lt" smallDist) {
          smallDist = dist;
          result[0] = i;
          result[1] = j;
        }
      }
    }
    return result;
  } // closestNode()

  // --------------------------------------------------------

  eucDistance(v1, v2)
  {
    let dim = v1.length;
    let sum = 0.0;
    for (let i = 0; i "lt" dim; ++i)
      sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
    return Math.sqrt(sum);    
  }

  // --------------------------------------------------------

  manhattDist(r1, c1, r2, c2)
  {
    return Math.abs(r1 - r2) + Math.abs(c1 - c2);
  }

  // --------------------------------------------------------

  next() // next double
  {
    let x = Math.sin(this.seed) * 1000;
    let result = x - Math.floor(x);  // [0.0,1.0)
    this.seed = result;  // for next call
    return result;
  }

  // --------------------------------------------------------

  nextInt(lo, hi)
  {
    let x = this.next();
    return Math.trunc((hi - lo) * x + lo);
  }

  // --------------------------------------------------------

} // class ClusterSOM


// ----------------------------------------------------------
// helpers for main(): loadTxt(), fileLoad(),
//  matShow(), vecShow()
// ----------------------------------------------------------

function loadTxt(fn, delimit, usecols, comment) 
{
  // efficient but mildly complicated
  let all = FS.readFileSync(fn, "utf8");  // giant string
  all = all.trim();  // strip final crlf in file
  let lines = all.split("\n");  // array of lines

  // count number non-comment lines
  let nRows = 0;
  for (let i = 0; i "lt" lines.length; ++i) {
    if (!lines[i].startsWith(comment))
      ++nRows;
  }
  let nCols = usecols.length;
  let result = [];
  for (let i = 0; i "lt" nRows; ++i) {
    result[i] = [];
    for (let j = 0; j "lt" nCols; ++j) {
      result[i][j] = 0.0;
    }
  }
  
  let r = 0;  // into lines
  let i = 0;  // into result[][]
  while (r "lt" lines.length) {
    if (lines[r].startsWith(comment)) {
      ++r;  // next row
    }
    else {
      let tokens = lines[r].split(delimit);
      for (let j = 0; j "lt" nCols; ++j) {
        result[i][j] = parseFloat(tokens[usecols[j]]);
      }
      ++r;
      ++i;
    }
  }

  return result;
} // loadTxt()

// ----------------------------------------------------------

function fileLoad(fn, comment) 
{
  // efficient but mildly complicated
  let all = FS.readFileSync(fn, "utf8");  // giant string
  all = all.trim();  // strip final crlf in file
  let lines = all.split("\n");  // array of lines

  let result = [];
  for (let i = 0; i "lt" lines.length; ++i) {
    if (!lines[i].startsWith(comment)) {
      result.push(lines[i].trim());
    }
  }

  return result;
} // fileLoad()

// ----------------------------------------------------------

function matShow(m, dec, wid, showIndices)
{
  let rows = m.length;
  let cols = m[0].length;
  for (let i = 0; i "lt" rows; ++i) {
    if (showIndices == true)
      process.stdout.write("[" + i.toString().
        padStart(3, ' ') + "]");
    for (let j = 0; j "lt" cols; ++j) {
      let v = m[i][j];
      if (Math.abs(v) "lt" 0.000001) v = 0.0;  // avoid -0
      let vv = v.toFixed(dec);
      let s = vv.toString().padStart(wid, ' ');
      process.stdout.write(s);
      process.stdout.write("  ");
    }
    process.stdout.write("\n");
  }
}

// ----------------------------------------------------------

function vecShow(vec, dec, wid)
{
  for (let i = 0; i "lt" vec.length; ++i) {
    let x = vec[i].toFixed(dec);
    let s = x.toString().padStart(wid, ' ');
    process.stdout.write(s);
    process.stdout.write(" ");
  }
  process.stdout.write("\n");
}

// ----------------------------------------------------------

function main()
{
  console.log("\nBegin self-organizing" +
    " map (SOM) anomaly analysis using JavaScript ");

  // 1. load data
  console.log("\nLoading 240-item People dataset ");
  let rf = ".\\Data\\people_raw.txt";
  let rawFileArray = fileLoad(rf, "#");
  // for (let i = 0; i "lt" rawFileArray.length; ++i) {
  //   console.log(rawFileArray[i]);
  // }

  let fn = ".\\Data\\people_240.txt";
  let X = loadTxt(fn, ",", [0,1,2,3,4,5,6,7,8,9,10], "#");
  // matShow(X, 1, 8, true);
  console.log("\nFirst three normalized and encoded: ");
  for (let i = 0; i "lt" 3; ++i) {
    vecShow(X[i], 4, 8);
  }

  // 2. create ClusterSOM object and cluster
  let mapRows = 2;
  let mapCols = 2;
  let lrnRateMax = 2.00;
  let stepsMax = 1000;
  console.log("\nSetting mapRows = " + 
    mapRows.toString());
  console.log("Setting mapCols = " + 
    mapCols.toString());
  console.log("Setting lrnRateMax = " + 
    lrnRateMax.toFixed(3).toString());
  console.log("Setting stepsMax = " + 
    stepsMax.toString());

  console.log("\nComputing SOM clustering ");
  let seed = 1;
  let som = new ClusterSOM(X, mapRows, mapCols, seed);
  som.cluster(lrnRateMax, stepsMax);
  console.log("Done ");

  // 3. show the SOM map and mapping
  console.log("\nSOM map nodes: ");
  for (let i = 0; i "lt" mapRows; ++i) {
    for (let j = 0; j "lt" mapCols; ++j) {
      process.stdout.write("[" + i.toString() +
        "][" + j.toString() + "] : ");
      //vecShow(som.map[i][j], 4, 7);
      vecShow(som.map[i][j], 2, 5);
    }
  }

  console.log("\nSOM mapping: ");
  for (let i = 0; i "lt" mapRows; ++i) {
    for (let j = 0; j "lt" mapCols; ++j) {
      // show count
      process.stdout.write("[" + i.toString() + "][" +
        j.toString() + "] : ");
      console.log(som.mapping[i][j].length.toString() +
        " items ");
    }
  }

  // 4. show clustering result
  console.log("\nclustering: ");
  let clustering = som.getClustering();
  for (let i = 0; i "lt" 4; ++i) { // first four
    process.stdout.write("X[" + i.toString() + "] : ");
    vecShow(clustering[i], 0, 2);
  }

  // 5. anomaly analysis
  console.log("\nAnalyzing ");
  som.analyze(rawFileArray);
 
  console.log("\nEnd SOM anomaly demo");
}

main()

Raw data:

# people_raw.txt
#
F  short   24  arkansas  29500  liberal
M  tall    39  delaware  51200  moderate
F  short   63  colorado  75800  conservative
M  medium  36  illinois  44500  moderate
F  short   27  colorado  28600  liberal
F  short   50  colorado  56500  moderate
F  medium  50  illinois  55000  moderate
M  tall    19  delaware  32700  conservative
F  short   22  illinois  27700  moderate
M  tall    39  delaware  47100  liberal
F  short   34  arkansas  39400  moderate
M  medium  22  illinois  33500  conservative
F  medium  35  delaware  35200  liberal
M  tall    33  colorado  46400  moderate
F  short   45  colorado  54100  moderate
F  short   42  illinois  50700  moderate
M  tall    33  colorado  46800  moderate
F  tall    25  delaware  30000  moderate
M  medium  31  colorado  46400  conservative
F  short   27  arkansas  32500  liberal
F  short   48  illinois  54000  moderate
M  tall    64  illinois  71300  liberal
F  medium  61  colorado  72400  conservative
F  short   54  illinois  61000  conservative
F  short   29  arkansas  36300  conservative
F  short   50  delaware  55000  moderate
F  medium  55  illinois  62500  conservative
F  medium  40  illinois  52400  conservative
F  short   22  arkansas  23600  liberal
F  short   68  colorado  78400  conservative
M  tall    60  illinois  71700  liberal
M  tall    34  delaware  46500  moderate
M  medium  25  delaware  37100  conservative
M  short   31  illinois  48900  moderate
F  short   43  delaware  48000  moderate
F  short   58  colorado  65400  liberal
M  tall    55  illinois  60700  liberal
M  tall    43  colorado  51100  moderate
M  tall    43  delaware  53200  moderate
M  medium  21  arkansas  37200  conservative
F  short   55  delaware  64600  conservative
F  short   64  colorado  74800  conservative
M  tall    41  illinois  58800  moderate
F  medium  64  delaware  72700  conservative
M  medium  56  illinois  66600  liberal
F  short   31  delaware  36000  moderate
M  tall    65  delaware  70100  liberal
F  tall    55  illinois  64300  conservative
M  short   25  arkansas  40300  conservative
F  short   46  delaware  51000  moderate
M  tall    36  illinois  53500  conservative
F  short   52  illinois  58100  moderate
F  short   61  delaware  67900  conservative
F  short   57  delaware  65700  conservative
M  tall    46  colorado  52600  moderate
M  tall    62  arkansas  66800  liberal
F  short   55  illinois  62700  conservative
M  medium  22  delaware  27700  moderate
M  tall    50  illinois  62900  conservative
M  tall    32  illinois  41800  moderate
M  short   21  delaware  35600  conservative
F  medium  44  colorado  52000  moderate
F  short   46  illinois  51700  moderate
F  short   62  colorado  69700  conservative
F  short   57  illinois  66400  conservative
M  medium  67  illinois  75800  liberal
F  short   29  arkansas  34300  liberal
F  short   53  illinois  60100  conservative
M  tall    44  arkansas  54800  moderate
F  medium  46  colorado  52300  moderate
M  tall    20  illinois  30100  moderate
M  medium  38  illinois  53500  moderate
F  short   50  colorado  58600  moderate
F  short   33  colorado  42500  moderate
M  tall    33  colorado  39300  moderate
F  short   26  colorado  40400  conservative
F  short   58  arkansas  70700  conservative
F  tall    43  illinois  48000  moderate
M  medium  46  arkansas  64400  conservative
F  short   60  arkansas  71700  conservative
M  tall    42  arkansas  48900  moderate
M  tall    56  delaware  56400  liberal
M  short   62  colorado  66300  liberal
M  short   50  arkansas  64800  moderate
F  short   47  illinois  52000  moderate
M  tall    67  colorado  80400  liberal
M  tall    40  delaware  50400  moderate
F  short   42  colorado  48400  moderate
F  short   64  arkansas  72000  conservative
M  medium  47  arkansas  58700  liberal
F  medium  45  colorado  52800  moderate
M  tall    25  delaware  40900  conservative
F  short   38  arkansas  48400  conservative
F  short   55  delaware  60000  moderate
M  tall    44  arkansas  60600  moderate
F  medium  33  arkansas  41000  moderate
F  short   34  delaware  39000  moderate
F  short   27  colorado  33700  liberal
F  short   32  colorado  40700  moderate
F  tall    42  illinois  47000  moderate
M  short   24  delaware  40300  conservative
F  short   42  colorado  50300  moderate
F  short   25  delaware  28000  liberal
F  short   51  colorado  58000  moderate
M  medium  55  colorado  63500  liberal
F  short   44  arkansas  47800  liberal
M  short   18  arkansas  39800  conservative
M  tall    67  colorado  71600  liberal
F  short   45  delaware  50000  moderate
F  short   48  arkansas  55800  moderate
M  short   25  colorado  39000  moderate
M  tall    67  arkansas  78300  moderate
F  short   37  delaware  42000  moderate
M  short   32  arkansas  42700  moderate
F  short   48  arkansas  57000  moderate
M  tall    66  delaware  75000  liberal
F  tall    61  arkansas  70000  conservative
M  medium  58  delaware  68900  moderate
F  short   19  arkansas  24000  liberal
F  short   38  delaware  43000  moderate
M  medium  27  arkansas  36400  moderate
F  short   42  arkansas  48000  moderate
F  short   60  arkansas  71300  conservative
M  tall    27  delaware  34800  conservative
F  tall    29  colorado  37100  conservative
M  medium  43  arkansas  56700  moderate
F  medium  48  arkansas  56700  moderate
F  medium  27  delaware  29400  liberal
M  tall    44  arkansas  55200  conservative
F  short   23  colorado  26300  liberal
M  tall    36  colorado  53000  liberal
F  short   64  delaware  72500  conservative
F  short   29  delaware  30000  liberal
M  short   33  arkansas  49300  moderate
M  tall    66  colorado  75000  liberal
M  medium  21  delaware  34300  conservative
F  short   27  arkansas  32700  liberal
F  short   29  arkansas  31800  liberal
M  tall    31  arkansas  48600  moderate
F  short   36  delaware  41000  moderate
F  short   49  colorado  55700  moderate
M  short   28  arkansas  38400  conservative
M  medium  43  delaware  56600  moderate
M  medium  46  colorado  58800  moderate
F  short   57  arkansas  69800  conservative
M  short   52  delaware  59400  moderate
M  tall    31  delaware  43500  moderate
M  tall    55  arkansas  62000  liberal
F  short   50  arkansas  56400  moderate
F  short   48  colorado  55900  moderate
M  medium  22  delaware  34500  conservative
F  short   59  delaware  66700  conservative
F  short   34  arkansas  42800  liberal
M  tall    64  arkansas  77200  liberal
F  short   29  delaware  33500  liberal
M  medium  34  colorado  43200  moderate
M  medium  61  arkansas  75000  liberal
F  short   64  delaware  71100  conservative
M  short   29  arkansas  41300  conservative
F  short   63  colorado  70600  conservative
M  medium  29  colorado  40000  conservative
M  tall    51  arkansas  62700  moderate
M  tall    24  delaware  37700  conservative
F  medium  48  colorado  57500  moderate
F  short   18  arkansas  27400  conservative
F  short   18  arkansas  20300  liberal
F  short   33  colorado  38200  liberal
M  medium  20  delaware  34800  conservative
F  short   29  delaware  33000  liberal
M  short   44  delaware  63000  conservative
M  tall    65  delaware  81800  conservative
M  tall    56  arkansas  63700  liberal
M  medium  52  delaware  58400  moderate
M  medium  29  colorado  48600  conservative
M  tall    47  colorado  58900  moderate
F  medium  68  arkansas  72600  liberal
F  short   31  delaware  36000  moderate
F  short   61  colorado  62500  liberal
F  short   19  colorado  21500  liberal
F  tall    38  delaware  43000  moderate
M  tall    26  arkansas  42300  conservative
F  short   61  colorado  67400  conservative
F  short   40  arkansas  46500  moderate
M  medium  49  arkansas  65200  moderate
F  medium  56  arkansas  67500  conservative
M  short   48  colorado  66000  moderate
F  short   52  arkansas  56300  liberal
M  tall    18  arkansas  29800  conservative
M  tall    56  delaware  59300  liberal
M  medium  52  colorado  64400  moderate
M  medium  18  colorado  28600  moderate
M  tall    58  arkansas  66200  liberal
M  tall    39  colorado  55100  moderate
M  tall    46  arkansas  62900  moderate
M  medium  40  colorado  46200  moderate
M  medium  60  arkansas  72700  liberal
F  short   36  colorado  40700  liberal
F  short   44  arkansas  52300  moderate
F  short   28  arkansas  31300  liberal
F  short   54  delaware  62600  conservative
M  medium  51  arkansas  61200  moderate
M  short   32  colorado  46100  moderate
F  short   55  arkansas  62700  conservative
F  short   25  delaware  26200  liberal
F  medium  33  delaware  37300  liberal
M  medium  29  colorado  46200  conservative
F  short   65  arkansas  72700  conservative
M  tall    43  colorado  51400  moderate
M  short   54  colorado  64800  liberal
F  short   61  colorado  72700  conservative
F  short   52  colorado  63600  conservative
F  short   30  colorado  33500  liberal
F  short   29  arkansas  31400  liberal
M  tall    47  delaware  59400  moderate
F  short   39  colorado  47800  moderate
F  short   47  delaware  52000  moderate
M  medium  49  arkansas  58600  moderate
M  tall    63  delaware  67400  liberal
M  medium  30  arkansas  39200  conservative
M  tall    61  delaware  69600  liberal
M  medium  47  delaware  58700  moderate
F  short   30  delaware  34500  liberal
M  medium  51  delaware  58000  moderate
M  medium  24  arkansas  38800  moderate
M  short   49  arkansas  64500  moderate
F  medium  66  delaware  74500  conservative
M  tall    65  arkansas  76900  conservative
M  short   46  colorado  58000  conservative
M  tall    45  delaware  51800  moderate
M  short   47  arkansas  63600  conservative
M  tall    29  arkansas  44800  conservative
M  tall    57  delaware  69300  liberal
M  medium  20  arkansas  28700  liberal
M  medium  35  arkansas  43400  moderate
M  tall    61  delaware  67000  liberal
M  short   31  delaware  37300  moderate
F  short   18  arkansas  20800  liberal
F  medium  26  delaware  29200  liberal
M  medium  28  arkansas  36400  liberal
M  tall    59  delaware  69400  liberal

Normalized and encoded data:

# people_240.txt
#
# sex (M = 0.0, F = 0.5)
# height (short, medium, tall)
# age (min = 18, max = 68)
# State (Arkansas, Colorado, Delaware, Illinois)
# income (min = $20,300, max = $81,800)
# political leaning (conservative, moderate, liberal)
#
0.5, 0.25, 0.1200, 0.25, 0.00, 0.00, 0.00, 0.1496, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.4200, 0.00, 0.00, 0.25, 0.00, 0.5024, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.9000, 0.00, 0.25, 0.00, 0.00, 0.9024, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.3600, 0.00, 0.00, 0.00, 0.25, 0.3935, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1800, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.6400, 0.00, 0.25, 0.00, 0.00, 0.5886, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.6400, 0.00, 0.00, 0.00, 0.25, 0.5642, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.0200, 0.00, 0.00, 0.25, 0.00, 0.2016, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.0800, 0.00, 0.00, 0.00, 0.25, 0.1203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.4200, 0.00, 0.00, 0.25, 0.00, 0.4358, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.3200, 0.25, 0.00, 0.00, 0.00, 0.3106, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.0800, 0.00, 0.00, 0.00, 0.25, 0.2146, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.3400, 0.00, 0.00, 0.25, 0.00, 0.2423, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.4244, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5400, 0.00, 0.25, 0.00, 0.00, 0.5496, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.4800, 0.00, 0.00, 0.00, 0.25, 0.4943, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.4309, 0.0000, 0.3333, 0.0000
0.5, 0.75, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.1577, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.2600, 0.00, 0.25, 0.00, 0.00, 0.4244, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.1800, 0.25, 0.00, 0.00, 0.00, 0.1984, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.6000, 0.00, 0.00, 0.00, 0.25, 0.5480, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9200, 0.00, 0.00, 0.00, 0.25, 0.8293, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.8600, 0.00, 0.25, 0.00, 0.00, 0.8472, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7200, 0.00, 0.00, 0.00, 0.25, 0.6618, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.2602, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.6400, 0.00, 0.00, 0.25, 0.00, 0.5642, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.7400, 0.00, 0.00, 0.00, 0.25, 0.6862, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.4400, 0.00, 0.00, 0.00, 0.25, 0.5220, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.0800, 0.25, 0.00, 0.00, 0.00, 0.0537, 0.0000, 0.0000, 0.3333
0.5, 0.25, 1.0000, 0.00, 0.25, 0.00, 0.00, 0.9447, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.8400, 0.00, 0.00, 0.00, 0.25, 0.8358, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.3200, 0.00, 0.00, 0.25, 0.00, 0.4260, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.2732, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.2600, 0.00, 0.00, 0.00, 0.25, 0.4650, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5000, 0.00, 0.00, 0.25, 0.00, 0.4504, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8000, 0.00, 0.25, 0.00, 0.00, 0.7333, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.7400, 0.00, 0.00, 0.00, 0.25, 0.6569, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.5000, 0.00, 0.25, 0.00, 0.00, 0.5008, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.5000, 0.00, 0.00, 0.25, 0.00, 0.5350, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.0600, 0.25, 0.00, 0.00, 0.00, 0.2748, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7400, 0.00, 0.00, 0.25, 0.00, 0.7203, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.9200, 0.00, 0.25, 0.00, 0.00, 0.8862, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.4600, 0.00, 0.00, 0.00, 0.25, 0.6260, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.9200, 0.00, 0.00, 0.25, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.7600, 0.00, 0.00, 0.00, 0.25, 0.7528, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2600, 0.00, 0.00, 0.25, 0.00, 0.2553, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9400, 0.00, 0.00, 0.25, 0.00, 0.8098, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.7400, 0.00, 0.00, 0.00, 0.25, 0.7154, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.1400, 0.25, 0.00, 0.00, 0.00, 0.3252, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.5600, 0.00, 0.00, 0.25, 0.00, 0.4992, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.3600, 0.00, 0.00, 0.00, 0.25, 0.5398, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.6800, 0.00, 0.00, 0.00, 0.25, 0.6146, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8600, 0.00, 0.00, 0.25, 0.00, 0.7740, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7800, 0.00, 0.00, 0.25, 0.00, 0.7382, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5600, 0.00, 0.25, 0.00, 0.00, 0.5252, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.8800, 0.25, 0.00, 0.00, 0.00, 0.7561, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.7400, 0.00, 0.00, 0.00, 0.25, 0.6894, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.0800, 0.00, 0.00, 0.25, 0.00, 0.1203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.6400, 0.00, 0.00, 0.00, 0.25, 0.6927, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.2800, 0.00, 0.00, 0.00, 0.25, 0.3496, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.0600, 0.00, 0.00, 0.25, 0.00, 0.2488, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.5200, 0.00, 0.25, 0.00, 0.00, 0.5154, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5600, 0.00, 0.00, 0.00, 0.25, 0.5106, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8800, 0.00, 0.25, 0.00, 0.00, 0.8033, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7800, 0.00, 0.00, 0.00, 0.25, 0.7496, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.9800, 0.00, 0.00, 0.00, 0.25, 0.9024, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.2276, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.7000, 0.00, 0.00, 0.00, 0.25, 0.6472, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.5610, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.5600, 0.00, 0.25, 0.00, 0.00, 0.5203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.0400, 0.00, 0.00, 0.00, 0.25, 0.1593, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.4000, 0.00, 0.00, 0.00, 0.25, 0.5398, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6400, 0.00, 0.25, 0.00, 0.00, 0.6228, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.3610, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.3089, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1600, 0.00, 0.25, 0.00, 0.00, 0.3268, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.8000, 0.25, 0.00, 0.00, 0.00, 0.8195, 0.3333, 0.0000, 0.0000
0.5, 0.75, 0.5000, 0.00, 0.00, 0.00, 0.25, 0.4504, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.5600, 0.25, 0.00, 0.00, 0.00, 0.7171, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.8400, 0.25, 0.00, 0.00, 0.00, 0.8358, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.4800, 0.25, 0.00, 0.00, 0.00, 0.4650, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.7600, 0.00, 0.00, 0.25, 0.00, 0.5870, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.8800, 0.00, 0.25, 0.00, 0.00, 0.7480, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.6400, 0.25, 0.00, 0.00, 0.00, 0.7236, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5800, 0.00, 0.00, 0.00, 0.25, 0.5154, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9800, 0.00, 0.25, 0.00, 0.00, 0.9772, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.4400, 0.00, 0.00, 0.25, 0.00, 0.4894, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.4800, 0.00, 0.25, 0.00, 0.00, 0.4569, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.9200, 0.25, 0.00, 0.00, 0.00, 0.8407, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.5800, 0.25, 0.00, 0.00, 0.00, 0.6244, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.5400, 0.00, 0.25, 0.00, 0.00, 0.5285, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.3350, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.4000, 0.25, 0.00, 0.00, 0.00, 0.4569, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7400, 0.00, 0.00, 0.25, 0.00, 0.6455, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.6553, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.3000, 0.25, 0.00, 0.00, 0.00, 0.3366, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.3200, 0.00, 0.00, 0.25, 0.00, 0.3041, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1800, 0.00, 0.25, 0.00, 0.00, 0.2179, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2800, 0.00, 0.25, 0.00, 0.00, 0.3317, 0.0000, 0.3333, 0.0000
0.5, 0.75, 0.4800, 0.00, 0.00, 0.00, 0.25, 0.4341, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.1200, 0.00, 0.00, 0.25, 0.00, 0.3252, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.4800, 0.00, 0.25, 0.00, 0.00, 0.4878, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.1252, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.6600, 0.00, 0.25, 0.00, 0.00, 0.6130, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.7400, 0.00, 0.25, 0.00, 0.00, 0.7024, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.4472, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.3171, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.9800, 0.00, 0.25, 0.00, 0.00, 0.8341, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.5400, 0.00, 0.00, 0.25, 0.00, 0.4829, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6000, 0.25, 0.00, 0.00, 0.00, 0.5772, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.1400, 0.00, 0.25, 0.00, 0.00, 0.3041, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9800, 0.25, 0.00, 0.00, 0.00, 0.9431, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.3800, 0.00, 0.00, 0.25, 0.00, 0.3528, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.2800, 0.25, 0.00, 0.00, 0.00, 0.3642, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6000, 0.25, 0.00, 0.00, 0.00, 0.5967, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9600, 0.00, 0.00, 0.25, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.8600, 0.25, 0.00, 0.00, 0.00, 0.8081, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.8000, 0.00, 0.00, 0.25, 0.00, 0.7902, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.0200, 0.25, 0.00, 0.00, 0.00, 0.0602, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.4000, 0.00, 0.00, 0.25, 0.00, 0.3691, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.1800, 0.25, 0.00, 0.00, 0.00, 0.2618, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.4800, 0.25, 0.00, 0.00, 0.00, 0.4504, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8400, 0.25, 0.00, 0.00, 0.00, 0.8293, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.1800, 0.00, 0.00, 0.25, 0.00, 0.2358, 0.3333, 0.0000, 0.0000
0.5, 0.75, 0.2200, 0.00, 0.25, 0.00, 0.00, 0.2732, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.5000, 0.25, 0.00, 0.00, 0.00, 0.5919, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.6000, 0.25, 0.00, 0.00, 0.00, 0.5919, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.1800, 0.00, 0.00, 0.25, 0.00, 0.1480, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.5675, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.1000, 0.00, 0.25, 0.00, 0.00, 0.0976, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.3600, 0.00, 0.25, 0.00, 0.00, 0.5317, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.9200, 0.00, 0.00, 0.25, 0.00, 0.8488, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.2200, 0.00, 0.00, 0.25, 0.00, 0.1577, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.3000, 0.25, 0.00, 0.00, 0.00, 0.4715, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9600, 0.00, 0.25, 0.00, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.0600, 0.00, 0.00, 0.25, 0.00, 0.2276, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.1800, 0.25, 0.00, 0.00, 0.00, 0.2016, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.1870, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.2600, 0.25, 0.00, 0.00, 0.00, 0.4602, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.3600, 0.00, 0.00, 0.25, 0.00, 0.3366, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6200, 0.00, 0.25, 0.00, 0.00, 0.5756, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.2000, 0.25, 0.00, 0.00, 0.00, 0.2943, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.5000, 0.00, 0.00, 0.25, 0.00, 0.5902, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.5600, 0.00, 0.25, 0.00, 0.00, 0.6260, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.7800, 0.25, 0.00, 0.00, 0.00, 0.8049, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.6800, 0.00, 0.00, 0.25, 0.00, 0.6358, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.2600, 0.00, 0.00, 0.25, 0.00, 0.3772, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.7400, 0.25, 0.00, 0.00, 0.00, 0.6780, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.6400, 0.25, 0.00, 0.00, 0.00, 0.5870, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6000, 0.00, 0.25, 0.00, 0.00, 0.5789, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.0800, 0.00, 0.00, 0.25, 0.00, 0.2309, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.8200, 0.00, 0.00, 0.25, 0.00, 0.7545, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.3200, 0.25, 0.00, 0.00, 0.00, 0.3659, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.9200, 0.25, 0.00, 0.00, 0.00, 0.9252, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2200, 0.00, 0.00, 0.25, 0.00, 0.2146, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.3200, 0.00, 0.25, 0.00, 0.00, 0.3724, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.8600, 0.25, 0.00, 0.00, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.9200, 0.00, 0.00, 0.25, 0.00, 0.8260, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.3415, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.9000, 0.00, 0.25, 0.00, 0.00, 0.8179, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.2200, 0.00, 0.25, 0.00, 0.00, 0.3203, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.6600, 0.25, 0.00, 0.00, 0.00, 0.6894, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.1200, 0.00, 0.00, 0.25, 0.00, 0.2829, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.6000, 0.00, 0.25, 0.00, 0.00, 0.6049, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.1154, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.0000, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.2911, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.0400, 0.00, 0.00, 0.25, 0.00, 0.2358, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.2200, 0.00, 0.00, 0.25, 0.00, 0.2065, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.5200, 0.00, 0.00, 0.25, 0.00, 0.6943, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.9400, 0.00, 0.00, 0.25, 0.00, 1.0000, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.7600, 0.25, 0.00, 0.00, 0.00, 0.7057, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.6800, 0.00, 0.00, 0.25, 0.00, 0.6195, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.2200, 0.00, 0.25, 0.00, 0.00, 0.4602, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5800, 0.00, 0.25, 0.00, 0.00, 0.6276, 0.0000, 0.3333, 0.0000
0.5, 0.50, 1.0000, 0.25, 0.00, 0.00, 0.00, 0.8504, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2600, 0.00, 0.00, 0.25, 0.00, 0.2553, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8600, 0.00, 0.25, 0.00, 0.00, 0.6862, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.0200, 0.00, 0.25, 0.00, 0.00, 0.0195, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.4000, 0.00, 0.00, 0.25, 0.00, 0.3691, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.1600, 0.25, 0.00, 0.00, 0.00, 0.3577, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.8600, 0.00, 0.25, 0.00, 0.00, 0.7659, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.4400, 0.25, 0.00, 0.00, 0.00, 0.4260, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.6200, 0.25, 0.00, 0.00, 0.00, 0.7301, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.7600, 0.25, 0.00, 0.00, 0.00, 0.7675, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.6000, 0.00, 0.25, 0.00, 0.00, 0.7431, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6800, 0.25, 0.00, 0.00, 0.00, 0.5854, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.1545, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.7600, 0.00, 0.00, 0.25, 0.00, 0.6341, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.6800, 0.00, 0.25, 0.00, 0.00, 0.7171, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.0000, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.8000, 0.25, 0.00, 0.00, 0.00, 0.7463, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.4200, 0.00, 0.25, 0.00, 0.00, 0.5659, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.5600, 0.25, 0.00, 0.00, 0.00, 0.6927, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.4400, 0.00, 0.25, 0.00, 0.00, 0.4211, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.8400, 0.25, 0.00, 0.00, 0.00, 0.8520, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.3600, 0.00, 0.25, 0.00, 0.00, 0.3317, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.5203, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.2000, 0.25, 0.00, 0.00, 0.00, 0.1789, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.7200, 0.00, 0.00, 0.25, 0.00, 0.6878, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.6600, 0.25, 0.00, 0.00, 0.00, 0.6650, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.2800, 0.00, 0.25, 0.00, 0.00, 0.4195, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.7400, 0.25, 0.00, 0.00, 0.00, 0.6894, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.0959, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.3000, 0.00, 0.00, 0.25, 0.00, 0.2764, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.2200, 0.00, 0.25, 0.00, 0.00, 0.4211, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.9400, 0.25, 0.00, 0.00, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5000, 0.00, 0.25, 0.00, 0.00, 0.5057, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.7200, 0.00, 0.25, 0.00, 0.00, 0.7236, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.8600, 0.00, 0.25, 0.00, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.6800, 0.00, 0.25, 0.00, 0.00, 0.7041, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.2400, 0.00, 0.25, 0.00, 0.00, 0.2146, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.1805, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.5800, 0.00, 0.00, 0.25, 0.00, 0.6358, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.4200, 0.00, 0.25, 0.00, 0.00, 0.4472, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5800, 0.00, 0.00, 0.25, 0.00, 0.5154, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.6200, 0.25, 0.00, 0.00, 0.00, 0.6228, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9000, 0.00, 0.00, 0.25, 0.00, 0.7659, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.2400, 0.25, 0.00, 0.00, 0.00, 0.3073, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.8600, 0.00, 0.00, 0.25, 0.00, 0.8016, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.5800, 0.00, 0.00, 0.25, 0.00, 0.6244, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.2400, 0.00, 0.00, 0.25, 0.00, 0.2309, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.6600, 0.00, 0.00, 0.25, 0.00, 0.6130, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.1200, 0.25, 0.00, 0.00, 0.00, 0.3008, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.6200, 0.25, 0.00, 0.00, 0.00, 0.7187, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.9600, 0.00, 0.00, 0.25, 0.00, 0.8813, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.9400, 0.25, 0.00, 0.00, 0.00, 0.9203, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.5600, 0.00, 0.25, 0.00, 0.00, 0.6130, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5400, 0.00, 0.00, 0.25, 0.00, 0.5122, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.5800, 0.25, 0.00, 0.00, 0.00, 0.7041, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.3984, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.7800, 0.00, 0.00, 0.25, 0.00, 0.7967, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.0400, 0.25, 0.00, 0.00, 0.00, 0.1366, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.3400, 0.25, 0.00, 0.00, 0.00, 0.3756, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.8600, 0.00, 0.00, 0.25, 0.00, 0.7593, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.2600, 0.00, 0.00, 0.25, 0.00, 0.2764, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.0081, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.1600, 0.00, 0.00, 0.25, 0.00, 0.1447, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.2000, 0.25, 0.00, 0.00, 0.00, 0.2618, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.8200, 0.00, 0.00, 0.25, 0.00, 0.7984, 0.0000, 0.0000, 0.3333
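
The normalized and encoded values above appear to follow a simple scheme that can be checked against the first few raw rows. The sketch below is in Python rather than the demo's JavaScript. The function name encode_person and the lookup tables are hypothetical, and the scale factors (0.25 per height step, 0.25 for the State one-hot slot, 0.3333 for the politics one-hot slot) are inferred from the data listings, not taken from the original preprocessing code.

```python
# Sketch of the encoding inferred from people_raw.txt -> people_240.txt.
# All names here are illustrative, not from the original demo.

HEIGHTS = {"short": 0.25, "medium": 0.50, "tall": 0.75}
STATES = ["arkansas", "colorado", "delaware", "illinois"]
POLITICS = ["conservative", "moderate", "liberal"]
AGE_MIN, AGE_MAX = 18, 68        # from the file header comments
INC_MIN, INC_MAX = 20300, 81800  # from the file header comments

def encode_person(line):
    sex, height, age, state, income, politics = line.split()
    row = [0.0 if sex == "M" else 0.5, HEIGHTS[height]]
    # min-max normalize age
    row.append((int(age) - AGE_MIN) / (AGE_MAX - AGE_MIN))
    # scaled one-hot for State (four slots)
    row += [0.25 if s == state else 0.0 for s in STATES]
    # min-max normalize income
    row.append((int(income) - INC_MIN) / (INC_MAX - INC_MIN))
    # scaled one-hot for politics (three slots)
    row += [1.0 / 3 if p == politics else 0.0 for p in POLITICS]
    return [round(v, 4) for v in row]

first = encode_person("F  short   24  arkansas  29500  liberal")
second = encode_person("M  tall    39  delaware  51200  moderate")
print(first)
print(second)
```

The two printed rows can be compared with the first two lines of the normalized data listing.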

Multi-Class Classification Example Using LightGBM (Light Gradient Boosting Machine)

Early one Sunday morning, while I was waiting for the dog path to dry off from the evening rain so that I could walk my mutts, I figured I’d take a look at multi-class classification using the LightGBM (light gradient boosting machine) system. LightGBM is a sophisticated tree-based system that can perform classification, regression, and ranking.

There are several interfaces to LightGBM. I like the easy-to-use Python scikit-learn API. LightGBM isn’t installed by default with the Anaconda Python distribution I use, so I installed it with the command “pip install lightgbm”.

For my demo, I used one of my standard synthetic datasets. The goal is to predict political leaning from sex, age, State, and income. The 240-item tab-delimited raw data looks like:

F   24   michigan   29500.00   liberal
M   39   oklahoma   51200.00   moderate
F   63   nebraska   75800.00   conservative
M   36   michigan   44500.00   moderate
F   27   nebraska   28600.00   liberal
. . .

For LightGBM, it’s best to use ordinal encoding for categorical predictor variables. I encoded the sex variable as M = 0 and F = 1. I encoded State as Michigan = 0, Nebraska = 1, Oklahoma = 2. I encoded politics as conservative = 0, moderate = 1, liberal = 2.
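
The ordinal encoding described above can be sketched in a few lines. This is a minimal illustration, not the preprocessing script used for the demo; the function name encode_line and the lookup tables are hypothetical, and only the code values themselves come from the text.

```python
# Sketch of ordinal encoding for one raw data line.
# The lookup tables mirror the codes stated in the text;
# the function name is illustrative.

SEX = {"M": 0, "F": 1}
STATE = {"michigan": 0, "nebraska": 1, "oklahoma": 2}
POLITICS = {"conservative": 0, "moderate": 1, "liberal": 2}

def encode_line(line):
    # split() with no argument handles tabs or runs of spaces
    sex, age, state, income, politics = line.split()
    fields = [str(SEX[sex]), age, str(STATE[state]), income,
              str(POLITICS[politics])]
    return ", ".join(fields)

raw = "F\t24\tmichigan\t29500.00\tliberal"
encoded = encode_line(raw)
print(encoded)
```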

Because LightGBM is tree-based, it’s not necessary to normalize numeric data. If you do normalize numeric data, the LGBM classification results will almost always be the same as those for the non-normalized data.
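
The reason normalization rarely matters can be seen with a single split. A decision tree picks a threshold that partitions the rows, and min-max scaling preserves the ordering of the values, so the same partition is available before and after scaling. The sketch below is a hypothetical one-feature stump that uses Gini impurity. It is not LightGBM's actual histogram-based split finder, and best_split is an illustrative helper name.

```python
# Toy illustration: the best single split partitions the rows the
# same way whether or not the feature is min-max normalized.
# best_split() is a hypothetical helper, not part of LightGBM.

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    # Try a threshold midway between each pair of adjacent distinct
    # values; return the set of row indices sent to the left child.
    order = sorted(set(xs))
    best_score, best_left = None, None
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2.0
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        score = (len(left) * gini([ys[i] for i in left]) +
                 len(right) * gini([ys[i] for i in right])) / len(xs)
        if best_score is None or score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left

incomes = [29500.0, 51200.0, 75800.0, 44500.0, 28600.0]
politics = [2, 1, 0, 1, 2]  # encoded as in the text

lo, hi = min(incomes), max(incomes)
scaled = [(v - lo) / (hi - lo) for v in incomes]

same = best_split(incomes, politics) == best_split(scaled, politics)
print(same)
```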

I split the encoded data into a 200-item set of training data and a 40-item set of test data. The resulting comma-delimited encoded data looks like:

1, 24, 0, 29500.00, 2
0, 39, 2, 51200.00, 1
1, 63, 1, 75800.00, 0
0, 36, 0, 44500.00, 1
1, 27, 1, 28600.00, 2
. . .
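
A 200/40 split like the one described can be sketched as follows. The seeded shuffle is an assumption, because the post does not say how rows were assigned to each set, and split_lines is a hypothetical helper name.

```python
# Sketch of a 200/40 train-test split of 240 encoded lines.
# A seeded shuffle is an assumption; the original post does not
# say how rows were assigned to each set.
import random

def split_lines(lines, n_train, seed=0):
    idx = list(range(len(lines)))
    random.Random(seed).shuffle(idx)  # reproducible ordering
    train = [lines[i] for i in idx[:n_train]]
    test = [lines[i] for i in idx[n_train:]]
    return train, test

# Stand-in rows; real use would read the encoded file instead.
lines = ["row %d" % i for i in range(240)]
train, test = split_lines(lines, n_train=200)
print(len(train), len(test))
```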

The key statements of my demo program are:

import numpy as np
import lightgbm as lgbm  # scikit API

train_x = np.loadtxt(train_file, usecols=[0,1,2,3],
  delimiter=",", comments="#", dtype=np.float64)
train_y = np.loadtxt(train_file, usecols=4,
  delimiter=",", comments="#", dtype=np.int64)

params = {
  # 'objective': 'multiclass',  # not needed
  'boosting_type': 'gbdt',  # default
  'num_leaves': 31,  # default
  'max_depth':-1,  # default (unlimited) 
  'n_estimators': 50,  # default = 100
  'learning_rate': 0.05,  # default = 0.10
  'min_data_in_leaf': 5,  # default = 20
  'random_state': 0,
  'verbosity': -1  # only fatal. default = 1 error, warn
}
model = lgbm.LGBMClassifier(**params) 
model.fit(train_x, train_y)

The main challenge when using LightGBM is wading through the dozens of parameters. The LGBMClassifier class/object has 19 parameters (num_leaves, max_depth, etc.) and there are 57 Learning Control Parameters (min_data_in_leaf, bagging_fraction, etc.), for a total of 76 parameters to deal with. Here are the 19 model parameters:

boosting_type='gbdt', 
num_leaves=31,
max_depth=-1,
learning_rate=0.1,
n_estimators=100,
subsample_for_bin=200000,
objective=None,
class_weight=None,
min_split_gain=0.0,
min_child_weight=0.001,
min_child_samples=20,
subsample=1.0,
subsample_freq=0,
colsample_bytree=1.0,
reg_alpha=0.0,
reg_lambda=0.0,
random_state=None,
n_jobs=None,
importance_type='split',
**kwargs

Because the number of parameters is not manageable, you must rely on the default values and then try to find the handful of parameters that will create a good model. For my demo, I changed n_estimators (the number of trees) from the default 100 to 50, the learning_rate from the default 0.10 to 0.05, the random_state from the default None to an arbitrary value of 0 (to get reproducible results), and min_data_in_leaf (an alias of the min_child_samples model parameter) from the default of 20 to 5. The min_data_in_leaf change had a big effect. I also set verbosity to -1 to suppress all but fatal error messages, but in a non-demo scenario you really want to see all system warning and error messages too. The near-impossibility of fully understanding all the LightGBM parameters and their interactions is the biggest disadvantage of using LightGBM.
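
One way to organize the search for that handful of parameters is to enumerate a small grid of candidate settings on top of the defaults. The sketch below only builds the candidate dictionaries. The grid values are arbitrary illustrations, and the commented-out lines show where each candidate would be passed to lgbm.LGBMClassifier and scored, assuming a held-out validation set exists.

```python
# Build a small grid of candidate LightGBM settings.
# The specific values tried here are illustrative assumptions.
from itertools import product

base = {"random_state": 0, "verbosity": -1}
grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.10],
    "min_data_in_leaf": [5, 20],
}

names = list(grid)
candidates = [dict(base, **dict(zip(names, combo)))
              for combo in product(*(grid[n] for n in names))]

print(len(candidates))  # 2 * 2 * 2 combinations
print(candidates[0])

# for params in candidates:
#   model = lgbm.LGBMClassifier(**params).fit(train_x, train_y)
#   ... score on validation data, keep the best params ...
```

Even three parameters with two values each already give eight models to train, which hints at why an exhaustive search over all the parameters is impractical.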

The LightGBM model predicted political leaning for the 40-item test data with 82.5% accuracy (33 out of 40 correct). This is roughly comparable accuracy to that achieved by a neural network multi-class classifier. When LightGBM works, it often works very well. Tree-based systems are highly susceptible to overfitting, but the LightGBM system does a lot to mitigate overfitting.



My synthetic demo data has a political leaning column, but I have very little interest in politics. The kind of people who are attracted to politics generally have none of the personality characteristics I admire, and many of the characteristics I dislike, notably dishonesty. A Google search for “state senator arrested” returned dozens of results, which didn’t really surprise me. Here are three samples. From left to right: New Jersey, New York, Missouri.


Demo program:

# people_politics_lgbm.py
# predict politics from sex, age, State, income
# Anaconda3-2023.09-0  Python 3.11.5  LightGBM 4.3.0

import numpy as np
import lightgbm as lgbm

# -----------------------------------------------------------

def accuracy(model, data_x, data_y):
  # simple
  preds = model.predict(data_x)  # all predicted values
  n_correct = np.sum(preds == data_y)
  result = n_correct / len(data_x)
  return result
  
# -----------------------------------------------------------

def show_accuracy(model, data_x, data_y, n_classes):
  # more details
  n_corrects = np.zeros(n_classes, dtype=np.int64)
  n_wrongs = np.zeros(n_classes, dtype=np.int64)
  for i in range(len(data_x)):
    x = data_x[i].reshape(1, -1)  # batch it
    trgt = data_y[i]  # scalar like 2
    pred = model.predict(x)  # array like [2]
    pred = pred[0]  # like 2
    if pred == trgt:
      n_corrects[trgt] += 1
    else:
      n_wrongs[trgt] += 1

  accs = n_corrects / (n_corrects + n_wrongs)
  counts = n_corrects + n_wrongs

  overall_acc = np.sum(n_corrects) / len(data_x)  # all classes pooled
  print("Overall accuracy = %8.4f" % overall_acc)

  for c in range(n_classes):
    print("class %d : " % c, end ="")
    print(" ct = %3d " % counts[c], end="")
    print(" correct = %3d " % n_corrects[c], end ="")
    print(" wrong = %3d " % n_wrongs[c], end ="")
    print(" acc = %7.4f " % accs[c])

# -----------------------------------------------------------

def confusion_matrix_multi(model, data_x, data_y, n_classes):
  # assumes n_classes is 3 or greater
  cm = np.zeros((n_classes,n_classes), dtype=np.int64)
  for i in range(len(data_x)):
    x = data_x[i].reshape(1, -1)  # batch it
    trgt_y = data_y[i]  # scalar like 2
    pred_y = model.predict(x)  # array like [2]
    pred_y = pred_y[0]  # like 2
    cm[trgt_y][pred_y] += 1
  return cm

# -----------------------------------------------------------

def show_confusion(cm):
  # cm created using confusion_matrix_multi()
  dim = len(cm)
  mx = np.max(cm)             # largest count in cm
  wid = len(str(mx)) + 1      # width to print
  fmt = "%" + str(wid) + "d"  # like "%3d"
  for i in range(dim):
    print("actual   ", end="")
    print("%3d:" % i, end="")
    for j in range(dim):
      print(fmt % cm[i][j], end="")
    print("")
  print("------------")
  print("predicted    ", end="")
  for j in range(dim):
    print(fmt % j, end="")
  print("")

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin People predict politics using LightGBM ")
  print("Predict politics from sex, age, State, income ")
  np.random.seed(1)

  # 1. load data that looks like:
  # sex, age, State, income, politics
  # 1, 24, 0, 29500.00, 2
  # 0, 39, 2, 51200.00, 1
  # . . .
  print("\nLoading train and test data ")
  train_file = ".\\Data\\people_train.txt"
  train_x = np.loadtxt(train_file, usecols=[0,1,2,3],
    delimiter=",", comments="#", dtype=np.float64)
  train_y = np.loadtxt(train_file, usecols=4,
    delimiter=",", comments="#", dtype=np.int64)

  test_file = ".\\Data\\people_test.txt"
  test_x = np.loadtxt(test_file, usecols=[0,1,2,3],
    delimiter=",", comments="#", dtype=np.float64)
  test_y = np.loadtxt(test_file, usecols=4,
    delimiter=",", comments="#", dtype=np.int64)

  np.set_printoptions(precision=0, suppress=True,
    floatmode='fixed')
  print("\nFirst few train data: ")
  for i in range(3):
    print(train_x[i], end="")
    print("  | " + str(train_y[i]))
  print(". . . ")

  # 2. create and train model
  print("\nCreating and training LGBM multi-class model ")
  # model params:
  # https://lightgbm.readthedocs.io/en/latest/pythonapi/
  #   lightgbm.LGBMClassifier.html
  # core params: 
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html
  params = {
    # 'objective': 'multiclass',  # not needed
    'boosting_type': 'gbdt',  # default
    'num_leaves': 31,  # default
    'max_depth':-1,  # default (unlimited) 
    'n_estimators': 50,  # default = 100
    'learning_rate': 0.05,  # default = 0.10
    'min_data_in_leaf': 5,  # default = 20
    'random_state': 0,
    'verbosity': -1  # only fatal. default = 1 error, warn
  }
  model = lgbm.LGBMClassifier(**params)  # scikit API
  model.fit(train_x, train_y)
  print("Done ")

  # 3. evaluate model
  print("\nEvaluating model ")

  # 3a. using a coarse function
  train_acc = accuracy(model, train_x, train_y)
  print("\nAccuracy on training data = %0.4f " % train_acc)
  test_acc = accuracy(model, test_x, test_y)
  print("Accuracy on test data = %0.4f " % test_acc)

  # 3b. using a detailed function
  print("\nAccuracy on test data: ")
  show_accuracy(model, test_x, test_y, n_classes=3)

  # 3c. using a confusion matrix
  print("\nConfusion matrix for test data: ")
  cm = confusion_matrix_multi(model, test_x,
    test_y, n_classes=3)
  show_confusion(cm)

  # # confusion matrix using scikit module
  # from sklearn.metrics import confusion_matrix
  # pred_y = model.predict(test_x)  # all predicteds
  # cm = confusion_matrix(test_y, pred_y)
  # print(cm)

  # # detailed report using scikit
  # from sklearn.metrics import classification_report
  # pred_y = model.predict(test_x)  # all predicteds
  # report = classification_report(test_y, pred_y,
  #  labels=[0, 1, 2])
  # print(report)

  # 4. use model
  print("\nPredicting politics for M 35 Oklahoma $55,000 ")
  print("(0 = conservative, 1 = moderate, 2 = liberal) ")
  x = np.array([[0, 35, 2, 55000.00]], dtype=np.float64)
  pred = model.predict(x)
  print("\nPredicted politics = " + str(pred[0]))

  # 5. save model
  import pickle
  print("\nSaving model ")
  pth = ".\\Models\\politics_model.pkl"
  with open(pth, "wb") as f:
    pickle.dump(model, f)

  # with open(pth, "rb") as f:
  #   model2 = pickle.load(f)
  #
  # x = np.array([[0, 35, 2, 55000.00]], dtype=np.float64)
  # pred = model2.predict(x)
  # print("\nPredicted politics = " + str(pred[0]))

  print("\nEnd demo ")

if __name__ == "__main__":
  main()

Training data:

# people_train.txt
# sex (M = 0, F = 1)
# age
# State (Michigan = 0, Nebraska = 1, Oklahoma = 2)
# income
# politics (conservative = 0, moderate = 1, liberal = 2)
#
1, 24, 0, 29500.00, 2
0, 39, 2, 51200.00, 1
1, 63, 1, 75800.00, 0
0, 36, 0, 44500.00, 1
1, 27, 1, 28600.00, 2
1, 50, 1, 56500.00, 1
1, 50, 2, 55000.00, 1
0, 19, 2, 32700.00, 0
1, 22, 1, 27700.00, 1
0, 39, 2, 47100.00, 2
1, 34, 0, 39400.00, 1
0, 22, 0, 33500.00, 0
1, 35, 2, 35200.00, 2
0, 33, 1, 46400.00, 1
1, 45, 1, 54100.00, 1
1, 42, 1, 50700.00, 1
0, 33, 1, 46800.00, 1
1, 25, 2, 30000.00, 1
0, 31, 1, 46400.00, 0
1, 27, 0, 32500.00, 2
1, 48, 0, 54000.00, 1
0, 64, 1, 71300.00, 2
1, 61, 1, 72400.00, 0
1, 54, 2, 61000.00, 0
1, 29, 0, 36300.00, 0
1, 50, 2, 55000.00, 1
1, 55, 2, 62500.00, 0
1, 40, 0, 52400.00, 0
1, 22, 0, 23600.00, 2
1, 68, 1, 78400.00, 0
0, 60, 0, 71700.00, 2
0, 34, 2, 46500.00, 1
0, 25, 2, 37100.00, 0
0, 31, 1, 48900.00, 1
1, 43, 2, 48000.00, 1
1, 58, 1, 65400.00, 2
0, 55, 1, 60700.00, 2
0, 43, 1, 51100.00, 1
0, 43, 2, 53200.00, 1
0, 21, 0, 37200.00, 0
1, 55, 2, 64600.00, 0
1, 64, 1, 74800.00, 0
0, 41, 0, 58800.00, 1
1, 64, 2, 72700.00, 0
0, 56, 2, 66600.00, 2
1, 31, 2, 36000.00, 1
0, 65, 2, 70100.00, 2
1, 55, 2, 64300.00, 0
0, 25, 0, 40300.00, 0
1, 46, 2, 51000.00, 1
0, 36, 0, 53500.00, 0
1, 52, 1, 58100.00, 1
1, 61, 2, 67900.00, 0
1, 57, 2, 65700.00, 0
0, 46, 1, 52600.00, 1
0, 62, 0, 66800.00, 2
1, 55, 2, 62700.00, 0
0, 22, 2, 27700.00, 1
0, 50, 0, 62900.00, 0
0, 32, 1, 41800.00, 1
0, 21, 2, 35600.00, 0
1, 44, 1, 52000.00, 1
1, 46, 1, 51700.00, 1
1, 62, 1, 69700.00, 0
1, 57, 1, 66400.00, 0
0, 67, 2, 75800.00, 2
1, 29, 0, 34300.00, 2
1, 53, 0, 60100.00, 0
0, 44, 0, 54800.00, 1
1, 46, 1, 52300.00, 1
0, 20, 1, 30100.00, 1
0, 38, 0, 53500.00, 1
1, 50, 1, 58600.00, 1
1, 33, 1, 42500.00, 1
0, 33, 1, 39300.00, 1
1, 26, 1, 40400.00, 0
1, 58, 0, 70700.00, 0
1, 43, 2, 48000.00, 1
0, 46, 0, 64400.00, 0
1, 60, 0, 71700.00, 0
0, 42, 0, 48900.00, 1
0, 56, 2, 56400.00, 2
0, 62, 1, 66300.00, 2
0, 50, 0, 64800.00, 1
1, 47, 2, 52000.00, 1
0, 67, 1, 80400.00, 2
0, 40, 2, 50400.00, 1
1, 42, 1, 48400.00, 1
1, 64, 0, 72000.00, 0
0, 47, 0, 58700.00, 2
1, 45, 1, 52800.00, 1
0, 25, 2, 40900.00, 0
1, 38, 0, 48400.00, 0
1, 55, 2, 60000.00, 1
0, 44, 0, 60600.00, 1
1, 33, 0, 41000.00, 1
1, 34, 2, 39000.00, 1
1, 27, 1, 33700.00, 2
1, 32, 1, 40700.00, 1
1, 42, 2, 47000.00, 1
0, 24, 2, 40300.00, 0
1, 42, 1, 50300.00, 1
1, 25, 2, 28000.00, 2
1, 51, 1, 58000.00, 1
0, 55, 1, 63500.00, 2
1, 44, 0, 47800.00, 2
0, 18, 0, 39800.00, 0
0, 67, 1, 71600.00, 2
1, 45, 2, 50000.00, 1
1, 48, 0, 55800.00, 1
0, 25, 1, 39000.00, 1
0, 67, 0, 78300.00, 1
1, 37, 2, 42000.00, 1
0, 32, 0, 42700.00, 1
1, 48, 0, 57000.00, 1
0, 66, 2, 75000.00, 2
1, 61, 0, 70000.00, 0
0, 58, 2, 68900.00, 1
1, 19, 0, 24000.00, 2
1, 38, 2, 43000.00, 1
0, 27, 0, 36400.00, 1
1, 42, 0, 48000.00, 1
1, 60, 0, 71300.00, 0
0, 27, 2, 34800.00, 0
1, 29, 1, 37100.00, 0
0, 43, 0, 56700.00, 1
1, 48, 0, 56700.00, 1
1, 27, 2, 29400.00, 2
0, 44, 0, 55200.00, 0
1, 23, 1, 26300.00, 2
0, 36, 1, 53000.00, 2
1, 64, 2, 72500.00, 0
1, 29, 2, 30000.00, 2
0, 33, 0, 49300.00, 1
0, 66, 1, 75000.00, 2
0, 21, 2, 34300.00, 0
1, 27, 0, 32700.00, 2
1, 29, 0, 31800.00, 2
0, 31, 0, 48600.00, 1
1, 36, 2, 41000.00, 1
1, 49, 1, 55700.00, 1
0, 28, 0, 38400.00, 0
0, 43, 2, 56600.00, 1
0, 46, 1, 58800.00, 1
1, 57, 0, 69800.00, 0
0, 52, 2, 59400.00, 1
0, 31, 2, 43500.00, 1
0, 55, 0, 62000.00, 2
1, 50, 0, 56400.00, 1
1, 48, 1, 55900.00, 1
0, 22, 2, 34500.00, 0
1, 59, 2, 66700.00, 0
1, 34, 0, 42800.00, 2
0, 64, 0, 77200.00, 2
1, 29, 2, 33500.00, 2
0, 34, 1, 43200.00, 1
0, 61, 0, 75000.00, 2
1, 64, 2, 71100.00, 0
0, 29, 0, 41300.00, 0
1, 63, 1, 70600.00, 0
0, 29, 1, 40000.00, 0
0, 51, 0, 62700.00, 1
0, 24, 2, 37700.00, 0
1, 48, 1, 57500.00, 1
1, 18, 0, 27400.00, 0
1, 18, 0, 20300.00, 2
1, 33, 1, 38200.00, 2
0, 20, 2, 34800.00, 0
1, 29, 2, 33000.00, 2
0, 44, 2, 63000.00, 0
0, 65, 2, 81800.00, 0
0, 56, 0, 63700.00, 2
0, 52, 2, 58400.00, 1
0, 29, 1, 48600.00, 0
0, 47, 1, 58900.00, 1
1, 68, 0, 72600.00, 2
1, 31, 2, 36000.00, 1
1, 61, 1, 62500.00, 2
1, 19, 1, 21500.00, 2
1, 38, 2, 43000.00, 1
0, 26, 0, 42300.00, 0
1, 61, 1, 67400.00, 0
1, 40, 0, 46500.00, 1
0, 49, 0, 65200.00, 1
1, 56, 0, 67500.00, 0
0, 48, 1, 66000.00, 1
1, 52, 0, 56300.00, 2
0, 18, 0, 29800.00, 0
0, 56, 2, 59300.00, 2
0, 52, 1, 64400.00, 1
0, 18, 1, 28600.00, 1
0, 58, 0, 66200.00, 2
0, 39, 1, 55100.00, 1
0, 46, 0, 62900.00, 1
0, 40, 1, 46200.00, 1
0, 60, 0, 72700.00, 2
1, 36, 1, 40700.00, 2
1, 44, 0, 52300.00, 1
1, 28, 0, 31300.00, 2
1, 54, 2, 62600.00, 0

Test data:

# people_test.txt
#
0, 51, 0, 61200.00, 1
0, 32, 1, 46100.00, 1
1, 55, 0, 62700.00, 0
1, 25, 2, 26200.00, 2
1, 33, 2, 37300.00, 2
0, 29, 1, 46200.00, 0
1, 65, 0, 72700.00, 0
0, 43, 1, 51400.00, 1
0, 54, 1, 64800.00, 2
1, 61, 1, 72700.00, 0
1, 52, 1, 63600.00, 0
1, 30, 1, 33500.00, 2
1, 29, 0, 31400.00, 2
0, 47, 2, 59400.00, 1
1, 39, 1, 47800.00, 1
1, 47, 2, 52000.00, 1
0, 49, 0, 58600.00, 1
0, 63, 2, 67400.00, 2
0, 30, 0, 39200.00, 0
0, 61, 2, 69600.00, 2
0, 47, 2, 58700.00, 1
1, 30, 2, 34500.00, 2
0, 51, 2, 58000.00, 1
0, 24, 0, 38800.00, 1
0, 49, 0, 64500.00, 1
1, 66, 2, 74500.00, 0
0, 65, 0, 76900.00, 0
0, 46, 1, 58000.00, 0
0, 45, 2, 51800.00, 1
0, 47, 0, 63600.00, 0
0, 29, 0, 44800.00, 0
0, 57, 2, 69300.00, 2
0, 20, 0, 28700.00, 2
0, 35, 0, 43400.00, 1
0, 61, 2, 67000.00, 2
0, 31, 2, 37300.00, 1
1, 18, 0, 20800.00, 2
1, 26, 2, 29200.00, 2
0, 28, 0, 36400.00, 2
0, 59, 2, 69400.00, 2

“Data Anomaly Detection Using a Neural Autoencoder with C#” in Visual Studio Magazine

I wrote an article titled “Data Anomaly Detection Using a Neural Autoencoder with C#” in the April 2024 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/Articles/2024/04/15/data-anomaly-detection.aspx.

Data anomaly detection is the process of examining a set of source data to find data items that are different in some way from the majority of the source items. My article explains how to use a neural autoencoder implemented using raw C# to find anomalous data items.

My demo program uses a synthetic dataset that has 240 items. The raw data looks like:

F  24  michigan  29500.00  liberal
M  39  oklahoma  51200.00  moderate
F  63  nebraska  75800.00  conservative
M  36  michigan  44500.00  moderate
F  27  nebraska  28600.00  liberal
. . .

Each line of data represents a person. The fields are sex (male, female), age, State (Michigan, Nebraska, Oklahoma), income, and political leaning (conservative, moderate, liberal).

The result is that the data item that has the largest reconstruction error is (M, 36, nebraska, $53000.00, liberal), which has encoded and normalized form (0.00000, 0.36000, 0.00000, 1.00000, 0.00000, 0.53000, 0.00000, 0.00000, 1.00000).
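The encoded and normalized form can be reproduced with a few lines of code. This is a minimal sketch of an encoding that is consistent with the single example above, not necessarily the exact scheme used in the article: the division of age by 100 and income by 100,000, the F = 1 value, the ordering of the one-hot categories, and the encode_person() helper name are all assumptions.

```python
# Assumed encoding, inferred from the one example in the text:
# sex M = 0 (F = 1 assumed), age / 100, income / 100,000,
# State and politics one-hot encoded.
STATES = ["michigan", "nebraska", "oklahoma"]        # order assumed
POLITICS = ["conservative", "moderate", "liberal"]   # order assumed

def one_hot(value, categories):
    # 1.0 in the slot of the matching category, 0.0 elsewhere
    return [1.0 if value == c else 0.0 for c in categories]

def encode_person(sex, age, state, income, politics):
    vec = [0.0 if sex == "M" else 1.0]
    vec.append(age / 100.0)
    vec.extend(one_hot(state, STATES))
    vec.append(income / 100_000.0)
    vec.extend(one_hot(politics, POLITICS))
    return vec

encoded = encode_person("M", 36, "nebraska", 53000.00, "liberal")
print(["%0.5f" % v for v in encoded])
```

Under these assumptions the nine printed values agree with the encoded item shown above.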

The predicted output is (-0.00122, 0.40366, -0.00134, 0.99657, 0.00477, 0.49658, 0.01607, -0.01048, 0.99440). This indicates that the anomalous data item has an age value that’s a bit too small (actual 36 versus a predicted of 40) and an income value that’s a bit too large (actual $53,000 versus a predicted of $49,658).
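To see which fields drive the reconstruction error, the squared difference can be computed per field. The sketch below uses the actual and predicted vectors quoted above; the field labels are illustrative, and the scale factors used to decode age and income (100 and 100,000) are assumptions inferred from the numbers in the text.

```python
# Labels are illustrative only; the vectors are copied from the text.
FIELDS = ["sex", "age", "state_0", "state_1", "state_2",
          "income", "politics_0", "politics_1", "politics_2"]

actual = [0.00000, 0.36000, 0.00000, 1.00000, 0.00000,
          0.53000, 0.00000, 0.00000, 1.00000]
predicted = [-0.00122, 0.40366, -0.00134, 0.99657, 0.00477,
             0.49658, 0.01607, -0.01048, 0.99440]

# Squared error for each field, largest first.
sq_errs = [(a - p) ** 2 for a, p in zip(actual, predicted)]
ranked = sorted(zip(FIELDS, sq_errs), key=lambda t: t[1], reverse=True)
for name, err in ranked[:3]:
    print("%-12s squared error = %0.6f" % (name, err))

# Decode the two numeric fields back to raw units (assumed scaling).
pred_age = predicted[1] * 100
pred_income = predicted[5] * 100_000
print("predicted age = %0.1f  predicted income = $%0.0f" %
      (pred_age, pred_income))
```

The age and income fields account for most of the error, which matches the interpretation given above.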

The neural autoencoder anomaly detection technique presented in the article is just one of many ways to look for data anomalies. The technique assumes you are working with tabular data, such as log files. Image data, time series data, and natural language data all require more specialized techniques.



In many science fiction movies, acting intelligent is anomalous behavior.

Left: In “Deep Blue Sea” (1999), scientists sedate a super intelligent, genetically enhanced shark. Choice A = Leave it alone. Choice B = Go poke it to see if it’s really sedated or just pretending.

Center: In “Alien” (1979), a space crew finds an abandoned alien ship with a cargo full of creepy, menacing egg-like pods. Choice A = Get away quickly. Choice B = Go poke one, and when it slowly opens, stick your helmet with an incredibly fragile glass faceplate directly in front of the pod.

Right: In “Life” (2017), a space station crew retrieves a probe returning from Mars that carries an unknown life form. Choice A = Assume it might be dangerous, keep it isolated, and leave it alone until it can be transferred to a secure facility. Choice B = Assume it’s friendly, give it a cute name, and poke it with your hand covered only by a cheap plastic glove.



PyTorch TransformerEncoder Reconstruction Error Anomaly Detection for Ordered Data

A fairly well known anomaly detection technique uses a neural encoder-decoder (aka autoencoder) combined with reconstruction error. A few weeks ago, I experimented by inserting a TransformerEncoder module into such a system and the results seem promising.

However, transformer architecture is really designed for input vectors that have an inherent ordering — typically sentences. So, I created some synthetic medical data that has order. I made synthetic patient data that looks like:

0.1668, 0.2881, 0.1000, 0.4209, 0.2587, 0.6369, 0.5745, 0.6382, 0.4587, 0.3155, 0.1677, 0.3741
0.0818, 0.3512, 0.1110, 0.5682, 0.3669, 0.8235, 0.5562, 0.5792, 0.6203, 0.4873, 0.1254, 0.3769
0.3506, 0.3578, 0.1340, 0.3156, 0.2679, 0.9513, 0.5393, 0.6684, 0.6832, 0.3133, 0.2768, 0.2262
. . .

Each line of the 200-item dataset represents a patient. The 12 values on each line are some sort of hypothetical reading taken every hour for 12 hours (or every 2 hours for 24 hours, etc.). The idea of using synthetic medical data came from my colleague Paige R.

Next, I put together a PyTorch program to create an encoder-decoder network that predicts its input. Data items that aren’t reconstructed closely are anomalies, at least according to the model.

The heart of the program is:

class Transformer_Net(T.nn.Module):
  def __init__(self):
    # 12 numeric inputs: no exact word embedding equivalent
    # pseudo embed_dim = 4
    # seq_len = 12
    super(Transformer_Net, self).__init__()

    self.fc1 = T.nn.Linear(12, 12*4)  # pseudo-embedding

    self.pos_enc = \
      PositionalEncoding(4, dropout=0.00)  # positional

    self.enc_layer = T.nn.TransformerEncoderLayer(d_model=4,
      nhead=2, dim_feedforward=100, 
      batch_first=True)  # d_model divisible by nhead

    self.trans_enc = T.nn.TransformerEncoder(self.enc_layer,
      num_layers=6)

    self.dec1 = T.nn.Linear(48, 18)
    self.dec2 = T.nn.Linear(18, 12)

    # use default weight initialization

  def forward(self, x):
    # x is Size([bs, 12])
    z = T.tanh(self.fc1(x))   # [bs, 48]
    z = z.reshape(-1, 12, 4)  # [bs, 12, 4] 
    z = self.pos_enc(z)       # [bs, 12, 4]
    z = self.trans_enc(z)     # [bs, 12, 4]

    z = z.reshape(-1, 48)              # [bs, 48]
    z = T.tanh(self.dec1(z))           # [bs, 18]
    z = self.dec2(z)  # no activation  # [bs, 12]
  
    return z

The architecture is very complicated. Briefly, each numeric input is mapped to a pseudo-embedding vector with 4 values. Then positional encoding is added so the transformer knows the order of the inputs. The data is converted to 3D to accommodate the TransformerEncoder requirement. The output of the TransformerEncoder is reshaped back to 2D and then fed to two Linear fully connected layers, designed so that the final output shape matches the input shape. Whew!
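The positional encoding step can be looked at in isolation. The sketch below computes the same sinusoidal values as the PositionalEncoding class in the demo program, for 12 positions and a pseudo-embedding size of 4, using only the Python standard library. The sinusoidal_encoding() function name is hypothetical, and the sketch ignores dropout and batching.

```python
import math

def sinusoidal_encoding(seq_len, d_model):
    # Returns seq_len rows, each holding d_model encoding values.
    pe = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # frequency falls off as the embedding index increases
            freq = math.exp(i * (-math.log(10_000.0) / d_model))
            row[i] = math.sin(pos * freq)          # even index: sine
            if i + 1 < d_model:
                row[i + 1] = math.cos(pos * freq)  # odd index: cosine
        pe.append(row)
    return pe

pe = sinusoidal_encoding(seq_len=12, d_model=4)
for pos in range(3):
    print(pos, ["%0.4f" % v for v in pe[pos]])
```

Each of the 12 positions gets a distinct 4-value pattern, and adding that pattern to the pseudo-embedding at the same position is what lets the transformer tell the first reading from the last.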

One architecture alternative I want to explore concerns the numeric embedding where each input reading maps to four values. My implementation really isn’t an embedding because I use a Linear layer, which is fully connected. I want to try a true embedding layer. See https://jamesmccaffrey.wordpress.com/2023/04/20/anomaly-detection-for-tabular-data-using-a-pytorch-transformer-with-numeric-embedding/.

After the model has been trained, I invoke an analyze_error() function that feeds each of the 200 data items to the model, fetches the output, and measures the difference between input and output. I used a custom error function that is the sum of squared differences divided by the number of features, which is close to but not quite Euclidean distance.

The result looks like:

Analyzing data for largest reconstruction error

Largest reconstruction idx: [140]

Largest reconstruction item:
[ 0.0362  0.0516  0.1421  0.3691  0.2506  0.9113
  0.5158  0.5966  0.6516  0.4894  0.2422  0.4905]

Largest reconstruction error: 0.0248

Its reconstruction =
[ 0.1870  0.2014  0.3200  0.5255  0.4023  0.7735
  0.6971  0.7262  0.4979  0.2906  0.2078  0.2887]
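The reported error can be checked by hand from the printed item and its reconstruction. This sketch recomputes the normalized sum of squared differences and, for comparison, the plain Euclidean distance. The inputs are the four-decimal values shown above, so the result is only approximate.

```python
import math

# Values copied from the output above (rounded to four decimals).
item = [0.0362, 0.0516, 0.1421, 0.3691, 0.2506, 0.9113,
        0.5158, 0.5966, 0.6516, 0.4894, 0.2422, 0.4905]
recon = [0.1870, 0.2014, 0.3200, 0.5255, 0.4023, 0.7735,
         0.6971, 0.7262, 0.4979, 0.2906, 0.2078, 0.2887]

sse = sum((x - y) ** 2 for x, y in zip(item, recon))
norm_sse = sse / len(item)  # the demo's error: SSE / number of features
euclid = math.sqrt(sse)     # Euclidean distance, for comparison

print("normalized SSE     = %0.4f" % norm_sse)
print("Euclidean distance = %0.4f" % euclid)
```

The normalized value comes out at about 0.0248, in agreement with the reported error, while the Euclidean distance for the same pair is about 0.546.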

This technique seems very promising, but there are a lot of questions that need to be explored.



Putting together a PyTorch program is like putting together a jigsaw puzzle — it’s difficult to make all the pieces fit together. Jigsaw puzzle manufacturers use the same cutting template for different puzzle images. This means you can combine jigsaw puzzles if you have a lot of patience.


Demo program.

# medical_trans_anomaly.py
# Transformer based reconstruction error anomaly detection
# PyTorch 2.2.1-CPU Anaconda3-2023.09-0  Python 3.11.5
# Windows 10/11

import numpy as np
import torch as T

device = T.device('cpu') 
T.set_num_threads(1)

# -----------------------------------------------------------

class PatientDataset(T.utils.data.Dataset):
  # 12 columns
  def __init__(self, src_file):
    tmp_x = np.loadtxt(src_file, usecols=range(0,12),
      delimiter=",", comments="#", dtype=np.float32)
    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx, :]  # row idx, all cols
    sample = { 'predictors' : preds }  # as Dictionary
    return sample  

# -----------------------------------------------------------

class PositionalEncoding(T.nn.Module):  # documentation code
  def __init__(self, d_model: int, dropout: float=0.1,
   max_len: int=5000):
    super(PositionalEncoding, self).__init__()  # old syntax
    self.dropout = T.nn.Dropout(p=dropout)
    pe = T.zeros(max_len, d_model)  # like 10x4
    position = \
      T.arange(0, max_len, dtype=T.float).unsqueeze(1)
    div_term = T.exp(T.arange(0, d_model, 2).float() * \
      (-np.log(10_000.0) / d_model))
    pe[:, 0::2] = T.sin(position * div_term)
    pe[:, 1::2] = T.cos(position * div_term)
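    pe = pe.unsqueeze(0).transpose(0, 1)  # [max_len, 1, d_model]
    self.register_buffer('pe', pe)  # allows state-save

  def forward(self, x):
    # documentation code assumes x is [seq_len, bs, d_model];
    # here x is [bs, seq_len, d_model] (batch_first), so the
    # slice below is selected by batch index, not by position
    x = x + self.pe[:x.size(0), :]
    return self.dropout(x)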

# -----------------------------------------------------------

class Transformer_Net(T.nn.Module):
  def __init__(self):
    # 12 numeric inputs: no exact word embedding equivalent
    # pseudo embed_dim = 4
    # seq_len = 12
    super(Transformer_Net, self).__init__()

    self.fc1 = T.nn.Linear(12, 12*4)  # pseudo-embedding

    self.pos_enc = \
      PositionalEncoding(4, dropout=0.00)  # positional

    self.enc_layer = T.nn.TransformerEncoderLayer(d_model=4,
      nhead=2, dim_feedforward=100, 
      batch_first=True)  # d_model divisible by nhead

    self.trans_enc = T.nn.TransformerEncoder(self.enc_layer,
      num_layers=6)

    self.dec1 = T.nn.Linear(48, 18)
    self.dec2 = T.nn.Linear(18, 12)

    # use default weight initialization

  def forward(self, x):
    # x is Size([bs, 12])
    z = T.tanh(self.fc1(x))   # [bs, 48]
    z = z.reshape(-1, 12, 4)  # [bs, 12, 4] 
    z = self.pos_enc(z)       # [bs, 12, 4]
    z = self.trans_enc(z)     # [bs, 12, 4]

    z = z.reshape(-1, 48)              # [bs, 48]
    z = T.tanh(self.dec1(z))           # [bs, 18]
    z = self.dec2(z)  # no activation  # [bs, 12]
  
    return z

# -----------------------------------------------------------

def analyze_error(model, ds):
  largest_err = 0.0
  worst_x = None
  worst_y = None
  worst_idx = 0
  n_features = len(ds[0]['predictors'])

  for i in range(len(ds)):
    X = ds[i]['predictors']
    with T.no_grad():
      Y = model(X)  # should be same as X
    err = T.sum((X-Y)*(X-Y)).item()  # SSE all features
    err = err / n_features           # sort of norm'ed SSE 

    if err > largest_err:
      largest_err = err
      worst_x = X
      worst_y = Y
      worst_idx = i

  np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
  print("\nLargest reconstruction idx: " + str(worst_idx))
  print("\nLargest reconstruction item: ")
  print(worst_x.numpy())
  print("\nLargest reconstruction error: %0.4f" % largest_err)
  print("\nIts reconstruction = " )
  print(worst_y.numpy())

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin patient transformer-based anomaly detect ")
  T.manual_seed(0)
  np.random.seed(0)
  
  # 1. create DataLoader objects
  print("\nCreating Patient Dataset ")

  data_file = ".\\Data\\medical_data_200.txt"
  data_ds = PatientDataset(data_file)  # 200 rows

  bat_size = 10
  data_ldr = T.utils.data.DataLoader(data_ds,
    batch_size=bat_size, shuffle=True)

  # 2. create network
  print("\nCreating Transformer encoder-decoder network ")
  net = Transformer_Net().to(device)

# -----------------------------------------------------------

  # 3. train autoencoder model
  max_epochs = 100
  ep_log_interval = 10
  # lrn_rate = 0.005
  lrn_rate = 0.010

  loss_func = T.nn.MSELoss()
  optimizer = T.optim.Adam(net.parameters(), lr=lrn_rate)

  print("\nbat_size = %3d " % bat_size)
  print("loss = " + str(loss_func))
  print("optimizer = Adam")
  print("lrn_rate = %0.3f " % lrn_rate)
  print("max_epochs = %3d " % max_epochs)
  
  print("\nStarting training")
  net.train()
  for epoch in range(0, max_epochs):
    epoch_loss = 0  # for one full epoch

    for (batch_idx, batch) in enumerate(data_ldr):
      X = batch['predictors'] 
      Y = batch['predictors'] 

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % ep_log_interval == 0:
      print("epoch = %4d  |  loss = %0.4f" % \
       (epoch, epoch_loss))
  print("Done ")

# -----------------------------------------------------------

  # 4. find item with largest reconstruction error
  print("\nAnalyzing data for largest reconstruction error ")
  net.eval()
  analyze_error(net, data_ds)

  print("\nEnd transformer autoencoder anomaly demo ")

if __name__ == "__main__":
  main()

Synthetic data:

# medical_data_200.txt
#
0.1668, 0.2881, 0.1000, 0.4209, 0.2587, 0.6369, 0.5745, 0.6382, 0.4587, 0.3155, 0.1677, 0.3741
0.0818, 0.3512, 0.1110, 0.5682, 0.3669, 0.8235, 0.5562, 0.5792, 0.6203, 0.4873, 0.1254, 0.3769
0.3506, 0.3578, 0.1340, 0.3156, 0.2679, 0.9513, 0.5393, 0.6684, 0.6832, 0.3133, 0.2768, 0.2262
0.2746, 0.3339, 0.1073, 0.6001, 0.5955, 0.8993, 0.6122, 0.8157, 0.3413, 0.2792, 0.3634, 0.2174
0.1151, 0.0520, 0.1077, 0.5715, 0.2847, 0.7062, 0.6966, 0.5213, 0.5296, 0.1587, 0.2357, 0.3799
0.0409, 0.1656, 0.3778, 0.4657, 0.2200, 0.8144, 0.7655, 0.7060, 0.6778, 0.3346, 0.3614, 0.1550
0.0557, 0.3230, 0.2591, 0.3661, 0.5710, 0.7391, 0.8003, 0.7904, 0.6533, 0.3495, 0.3004, 0.2396
0.1080, 0.3584, 0.2712, 0.6859, 0.4654, 0.8487, 0.5459, 0.8798, 0.4800, 0.3314, 0.1633, 0.1948
0.3614, 0.2295, 0.1011, 0.5469, 0.3307, 0.8108, 0.8544, 0.6429, 0.6634, 0.3493, 0.0063, 0.4718
0.2764, 0.3989, 0.1689, 0.3549, 0.5730, 0.8787, 0.5264, 0.8022, 0.6016, 0.4692, 0.2846, 0.1497
0.0080, 0.0105, 0.1113, 0.3985, 0.5440, 0.8155, 0.7211, 0.8368, 0.3497, 0.2117, 0.2343, 0.4878
0.2244, 0.0075, 0.4203, 0.3932, 0.5228, 0.7551, 0.8454, 0.7988, 0.5225, 0.1546, 0.0240, 0.1485
0.0178, 0.0430, 0.1903, 0.5852, 0.4239, 0.6050, 0.5288, 0.8869, 0.5272, 0.1813, 0.1009, 0.3975
0.0782, 0.2325, 0.4880, 0.6387, 0.2959, 0.7975, 0.7480, 0.8316, 0.3627, 0.1074, 0.0280, 0.2945
0.2425, 0.2275, 0.2269, 0.6954, 0.4319, 0.7521, 0.7204, 0.7981, 0.5677, 0.2060, 0.0265, 0.2480
0.2519, 0.0841, 0.4011, 0.3266, 0.3041, 0.9219, 0.5774, 0.7558, 0.5099, 0.4699, 0.1053, 0.1264
0.2940, 0.3089, 0.4631, 0.6728, 0.2056, 0.6937, 0.7467, 0.8796, 0.6801, 0.3227, 0.3662, 0.3566
0.1560, 0.1944, 0.3417, 0.5198, 0.5705, 0.9675, 0.6580, 0.8853, 0.3696, 0.1505, 0.0540, 0.3023
0.0086, 0.3792, 0.4308, 0.3060, 0.2705, 0.7328, 0.5524, 0.8238, 0.4379, 0.4760, 0.2328, 0.4515
0.3379, 0.3622, 0.2840, 0.5185, 0.5194, 0.7143, 0.6961, 0.7396, 0.3062, 0.3374, 0.1735, 0.4229
0.1261, 0.3572, 0.3311, 0.3736, 0.5152, 0.8448, 0.5216, 0.6681, 0.5716, 0.4674, 0.0002, 0.4907
0.1506, 0.3895, 0.3419, 0.6315, 0.4299, 0.8512, 0.6142, 0.7347, 0.6000, 0.4433, 0.3020, 0.3792
0.3458, 0.1291, 0.3683, 0.4803, 0.3528, 0.7643, 0.6606, 0.6270, 0.5488, 0.2721, 0.3895, 0.3711
0.0794, 0.1707, 0.2373, 0.6191, 0.5520, 0.9615, 0.7651, 0.6081, 0.4009, 0.4420, 0.2111, 0.4209
0.2290, 0.2933, 0.3076, 0.6084, 0.4275, 0.7863, 0.6371, 0.5273, 0.4512, 0.1319, 0.3931, 0.1726
0.3247, 0.3500, 0.3754, 0.5278, 0.2644, 0.7868, 0.6381, 0.5900, 0.5370, 0.2249, 0.3665, 0.4639
0.1028, 0.0444, 0.1772, 0.4998, 0.4914, 0.6833, 0.5992, 0.8407, 0.4663, 0.3467, 0.0935, 0.1408
0.2063, 0.1909, 0.1611, 0.5487, 0.4176, 0.8617, 0.5578, 0.8006, 0.3888, 0.3077, 0.3141, 0.1089
0.1297, 0.3492, 0.4379, 0.5154, 0.5466, 0.9799, 0.8306, 0.8416, 0.3395, 0.3605, 0.2814, 0.3441
0.3198, 0.0138, 0.4081, 0.5927, 0.3039, 0.7028, 0.7529, 0.6381, 0.6186, 0.2785, 0.3131, 0.4962
0.1201, 0.0572, 0.4605, 0.5166, 0.5899, 0.8546, 0.8976, 0.7184, 0.5106, 0.1542, 0.1423, 0.1105
0.0642, 0.2983, 0.1122, 0.4466, 0.5449, 0.8771, 0.7764, 0.5755, 0.4768, 0.3326, 0.3959, 0.1816
0.0991, 0.1049, 0.4001, 0.4828, 0.2228, 0.8034, 0.5848, 0.8194, 0.4189, 0.1110, 0.2374, 0.4375
0.1524, 0.2999, 0.3045, 0.5164, 0.5838, 0.9216, 0.5129, 0.7838, 0.4860, 0.4790, 0.0886, 0.2068
0.0326, 0.1714, 0.1436, 0.5535, 0.5212, 0.8787, 0.8065, 0.6370, 0.6383, 0.2715, 0.3296, 0.3506
0.0574, 0.0314, 0.1073, 0.3267, 0.3834, 0.6453, 0.5111, 0.8019, 0.4579, 0.3988, 0.1810, 0.2800
0.1912, 0.1896, 0.4213, 0.4610, 0.5619, 0.6148, 0.8095, 0.5503, 0.5474, 0.1041, 0.2155, 0.1012
0.3805, 0.3622, 0.4184, 0.6661, 0.2582, 0.6631, 0.5751, 0.7490, 0.6623, 0.4960, 0.2844, 0.3927
0.3637, 0.1603, 0.1999, 0.3694, 0.2478, 0.9250, 0.5587, 0.6057, 0.6276, 0.2242, 0.3930, 0.2067
0.2135, 0.1258, 0.4643, 0.4466, 0.3734, 0.8049, 0.8756, 0.5124, 0.5868, 0.4564, 0.0109, 0.3088
0.1304, 0.3438, 0.3234, 0.5761, 0.3811, 0.8513, 0.6160, 0.5037, 0.5307, 0.2246, 0.2069, 0.4666
0.1706, 0.0990, 0.2485, 0.6727, 0.5747, 0.9377, 0.8681, 0.5912, 0.3350, 0.1909, 0.1258, 0.1699
0.2428, 0.1654, 0.4265, 0.3741, 0.4808, 0.6961, 0.7297, 0.6396, 0.3228, 0.1915, 0.2656, 0.2989
0.2076, 0.0699, 0.3283, 0.6987, 0.5267, 0.8377, 0.8904, 0.8606, 0.5382, 0.1130, 0.0374, 0.1261
0.1807, 0.1502, 0.4901, 0.3672, 0.5891, 0.9070, 0.8297, 0.7530, 0.5675, 0.2908, 0.0053, 0.2412
0.1968, 0.2920, 0.2875, 0.4830, 0.2551, 0.6044, 0.8033, 0.6280, 0.6938, 0.1881, 0.1355, 0.3096
0.3020, 0.1855, 0.1499, 0.4250, 0.4018, 0.8695, 0.8081, 0.5521, 0.3092, 0.3076, 0.3240, 0.1050
0.2690, 0.2747, 0.2797, 0.6659, 0.4577, 0.6021, 0.6938, 0.8437, 0.6322, 0.3597, 0.2695, 0.3314
0.1096, 0.2242, 0.3687, 0.4410, 0.5423, 0.6780, 0.7989, 0.6158, 0.6095, 0.2711, 0.3231, 0.2414
0.0855, 0.3069, 0.2235, 0.5933, 0.4978, 0.6886, 0.5856, 0.5796, 0.3570, 0.2508, 0.0107, 0.1444
0.2698, 0.3199, 0.1322, 0.3927, 0.2831, 0.9669, 0.7845, 0.7216, 0.4218, 0.4339, 0.1741, 0.4694
0.2824, 0.1912, 0.1505, 0.6904, 0.2639, 0.6810, 0.6725, 0.6617, 0.3587, 0.3917, 0.0755, 0.3576
0.3017, 0.0843, 0.3404, 0.5996, 0.4553, 0.8389, 0.6182, 0.7926, 0.6781, 0.2702, 0.3129, 0.1225
0.3341, 0.0769, 0.2580, 0.4200, 0.2320, 0.9619, 0.6481, 0.7123, 0.4976, 0.1529, 0.0826, 0.1305
0.2032, 0.1046, 0.2428, 0.3432, 0.5150, 0.6426, 0.8943, 0.5709, 0.5290, 0.1179, 0.3148, 0.1758
0.2112, 0.2960, 0.1600, 0.5204, 0.2866, 0.9037, 0.7892, 0.5706, 0.6448, 0.1079, 0.3441, 0.3236
0.1613, 0.3035, 0.3868, 0.6949, 0.3112, 0.6015, 0.8736, 0.8432, 0.5915, 0.3067, 0.2828, 0.4122
0.1500, 0.3081, 0.4002, 0.5453, 0.3607, 0.8789, 0.5012, 0.8100, 0.6586, 0.1957, 0.0483, 0.1881
0.1208, 0.3532, 0.3173, 0.4147, 0.2553, 0.7161, 0.7455, 0.6297, 0.4829, 0.2776, 0.3313, 0.2705
0.1383, 0.2700, 0.1886, 0.4869, 0.3259, 0.8507, 0.8509, 0.6791, 0.6138, 0.2828, 0.2625, 0.1527
0.1732, 0.3637, 0.3422, 0.6067, 0.4019, 0.7992, 0.8372, 0.5271, 0.5293, 0.4771, 0.2071, 0.1778
0.3392, 0.1007, 0.3803, 0.5161, 0.5795, 0.8497, 0.8352, 0.5032, 0.6957, 0.1311, 0.1289, 0.4785
0.0036, 0.3291, 0.4445, 0.4759, 0.3023, 0.9211, 0.6911, 0.5537, 0.6711, 0.4584, 0.1966, 0.4427
0.1674, 0.2734, 0.2592, 0.5023, 0.2758, 0.9860, 0.6177, 0.5414, 0.3577, 0.1056, 0.2864, 0.3258
0.3178, 0.2028, 0.4167, 0.5783, 0.5111, 0.7626, 0.7591, 0.5719, 0.4287, 0.1690, 0.1635, 0.1966
0.1628, 0.3901, 0.2281, 0.6930, 0.4545, 0.7500, 0.8430, 0.7478, 0.4008, 0.4171, 0.1732, 0.2430
0.1321, 0.2789, 0.2075, 0.6233, 0.3181, 0.8176, 0.6952, 0.8421, 0.6554, 0.1738, 0.2341, 0.4593
0.1784, 0.3687, 0.2116, 0.5435, 0.4730, 0.6913, 0.5055, 0.6667, 0.6754, 0.2372, 0.3119, 0.1699
0.1368, 0.0578, 0.3867, 0.5797, 0.4754, 0.7014, 0.7769, 0.5909, 0.4699, 0.2488, 0.1421, 0.1231
0.2527, 0.2829, 0.3454, 0.5593, 0.2680, 0.6598, 0.7057, 0.8501, 0.3736, 0.2851, 0.1716, 0.2989
0.0646, 0.1370, 0.2048, 0.6378, 0.5201, 0.7707, 0.7428, 0.5582, 0.5038, 0.2188, 0.3439, 0.3686
0.2534, 0.0499, 0.2882, 0.6946, 0.5793, 0.8580, 0.5607, 0.7557, 0.5263, 0.2875, 0.1712, 0.3397
0.3400, 0.3004, 0.3317, 0.6699, 0.2259, 0.9965, 0.5212, 0.5798, 0.4691, 0.1430, 0.2495, 0.1192
0.1138, 0.0244, 0.3814, 0.5674, 0.3514, 0.6753, 0.7988, 0.6362, 0.6181, 0.2952, 0.2103, 0.1114
0.2577, 0.1403, 0.1917, 0.4736, 0.3530, 0.7879, 0.8918, 0.6458, 0.6098, 0.3211, 0.3557, 0.2420
0.0982, 0.3644, 0.1174, 0.6803, 0.4226, 0.7505, 0.8980, 0.5233, 0.5067, 0.1124, 0.2285, 0.1722
0.2524, 0.3924, 0.4500, 0.4807, 0.4834, 0.9110, 0.6979, 0.7114, 0.3603, 0.2478, 0.0569, 0.3908
0.1908, 0.1796, 0.4544, 0.5110, 0.3636, 0.7076, 0.5288, 0.6673, 0.3103, 0.2165, 0.2014, 0.4864
0.0438, 0.2692, 0.3000, 0.6108, 0.2574, 0.6333, 0.6597, 0.8188, 0.3767, 0.4071, 0.1161, 0.1868
0.0067, 0.1595, 0.2524, 0.5637, 0.2284, 0.6610, 0.5066, 0.5455, 0.5607, 0.2611, 0.1284, 0.3232
0.3974, 0.3338, 0.3798, 0.6673, 0.2159, 0.6281, 0.6896, 0.6397, 0.6749, 0.2958, 0.2159, 0.4581
0.1787, 0.3508, 0.2014, 0.4095, 0.3313, 0.8190, 0.5881, 0.7686, 0.3571, 0.1376, 0.3481, 0.1947
0.1544, 0.2286, 0.3103, 0.3304, 0.5497, 0.9805, 0.8250, 0.6135, 0.5111, 0.2358, 0.2219, 0.4898
0.1247, 0.2675, 0.2304, 0.6098, 0.3303, 0.9559, 0.8007, 0.8051, 0.4878, 0.1843, 0.0166, 0.2287
0.0148, 0.2775, 0.3681, 0.4722, 0.5071, 0.8144, 0.5159, 0.5539, 0.3774, 0.2343, 0.0209, 0.3420
0.2048, 0.2470, 0.2729, 0.6391, 0.3816, 0.6062, 0.8492, 0.7625, 0.6292, 0.4807, 0.0204, 0.1940
0.0253, 0.1687, 0.4455, 0.3326, 0.3892, 0.6502, 0.8092, 0.8366, 0.3173, 0.2946, 0.0958, 0.4810
0.3776, 0.2456, 0.4894, 0.4379, 0.5591, 0.7738, 0.5943, 0.8763, 0.5737, 0.1260, 0.3482, 0.3806
0.2420, 0.2929, 0.2014, 0.5402, 0.5258, 0.6216, 0.5522, 0.8370, 0.5473, 0.3125, 0.0993, 0.2180
0.3491, 0.1687, 0.1258, 0.6588, 0.2814, 0.9305, 0.8527, 0.6947, 0.5394, 0.3109, 0.2499, 0.4420
0.1129, 0.3535, 0.3271, 0.3460, 0.2908, 0.8384, 0.5958, 0.5526, 0.3647, 0.4379, 0.2409, 0.4854
0.1383, 0.2383, 0.3396, 0.5463, 0.2237, 0.9001, 0.8793, 0.7139, 0.3770, 0.4012, 0.0029, 0.2313
0.3670, 0.2353, 0.4421, 0.5419, 0.5290, 0.9518, 0.6284, 0.5492, 0.5885, 0.2761, 0.0507, 0.3359
0.0144, 0.0801, 0.4153, 0.3048, 0.3213, 0.6086, 0.8990, 0.7328, 0.4174, 0.4716, 0.2028, 0.2819
0.2351, 0.1057, 0.2221, 0.4487, 0.2978, 0.8338, 0.7783, 0.5288, 0.6884, 0.4012, 0.3225, 0.4007
0.0320, 0.1927, 0.2783, 0.5690, 0.3795, 0.8817, 0.7727, 0.7789, 0.5474, 0.1604, 0.3043, 0.4124
0.3616, 0.0935, 0.1707, 0.4564, 0.3282, 0.9262, 0.7454, 0.8040, 0.4711, 0.1398, 0.0460, 0.2494
0.0775, 0.3283, 0.3399, 0.5755, 0.3964, 0.6353, 0.5940, 0.6846, 0.3794, 0.1102, 0.2918, 0.3900
0.1322, 0.3374, 0.2714, 0.6459, 0.4628, 0.8324, 0.5803, 0.7118, 0.6578, 0.2220, 0.3484, 0.4635
0.1319, 0.2732, 0.4597, 0.3303, 0.5514, 0.6763, 0.8399, 0.7668, 0.4377, 0.1606, 0.2541, 0.4391
0.3287, 0.2513, 0.4825, 0.5360, 0.2791, 0.7717, 0.6347, 0.8968, 0.4521, 0.4971, 0.2075, 0.1689
0.0298, 0.1481, 0.1494, 0.5538, 0.3656, 0.9964, 0.8720, 0.5597, 0.4580, 0.2846, 0.2244, 0.4121
0.1949, 0.1680, 0.2048, 0.6643, 0.2089, 0.9284, 0.5754, 0.7743, 0.4421, 0.4897, 0.0491, 0.1750
0.3558, 0.2334, 0.2237, 0.3003, 0.2910, 0.6582, 0.5814, 0.8585, 0.6492, 0.3801, 0.1882, 0.4309
0.1983, 0.1454, 0.2102, 0.6699, 0.3548, 0.7972, 0.6018, 0.8540, 0.4533, 0.2190, 0.2870, 0.1763
0.0473, 0.3348, 0.3977, 0.5362, 0.2972, 0.8493, 0.7553, 0.6310, 0.3270, 0.4522, 0.1840, 0.4055
0.1016, 0.2366, 0.2715, 0.4528, 0.2507, 0.6977, 0.5317, 0.6211, 0.5967, 0.3460, 0.2690, 0.1034
0.2714, 0.2013, 0.1924, 0.3700, 0.2740, 0.9377, 0.8930, 0.8655, 0.4389, 0.4121, 0.2186, 0.4266
0.1935, 0.2360, 0.4149, 0.3401, 0.4148, 0.7464, 0.7417, 0.8835, 0.4571, 0.2572, 0.3163, 0.3580
0.1576, 0.2756, 0.2616, 0.3544, 0.3803, 0.7338, 0.5872, 0.8703, 0.5759, 0.3395, 0.2987, 0.3168
0.2802, 0.3722, 0.4450, 0.3670, 0.3053, 0.6286, 0.8915, 0.5946, 0.5642, 0.1359, 0.0843, 0.3011
0.0420, 0.1555, 0.3152, 0.4357, 0.4224, 0.8147, 0.6562, 0.7785, 0.5714, 0.3749, 0.2246, 0.2432
0.2452, 0.3743, 0.3388, 0.6918, 0.3764, 0.8958, 0.5150, 0.8059, 0.5073, 0.1021, 0.1109, 0.3139
0.3072, 0.0212, 0.3196, 0.6204, 0.4598, 0.9726, 0.5299, 0.6107, 0.6677, 0.4060, 0.2399, 0.4332
0.3584, 0.3891, 0.4994, 0.3559, 0.2282, 0.6294, 0.5059, 0.8887, 0.3379, 0.4367, 0.2741, 0.2950
0.1387, 0.1415, 0.2015, 0.6644, 0.4903, 0.6104, 0.6846, 0.6125, 0.3116, 0.4539, 0.3084, 0.2319
0.3186, 0.1299, 0.2232, 0.6712, 0.5908, 0.8094, 0.8808, 0.8552, 0.5072, 0.2491, 0.2841, 0.2823
0.2421, 0.3962, 0.4096, 0.4337, 0.2356, 0.6740, 0.7107, 0.6668, 0.6203, 0.4733, 0.0711, 0.4440
0.3830, 0.3838, 0.1151, 0.3230, 0.2023, 0.7251, 0.5223, 0.6175, 0.4424, 0.4815, 0.1908, 0.2113
0.2012, 0.2573, 0.1445, 0.6032, 0.5408, 0.9377, 0.8562, 0.6527, 0.4775, 0.1406, 0.0903, 0.4885
0.1138, 0.3630, 0.4562, 0.6690, 0.2625, 0.9519, 0.7528, 0.5843, 0.4380, 0.1640, 0.1777, 0.1335
0.0661, 0.0789, 0.3764, 0.5295, 0.5493, 0.6979, 0.7518, 0.5138, 0.5065, 0.4453, 0.3937, 0.1039
0.1011, 0.0895, 0.1190, 0.3041, 0.3961, 0.6182, 0.6113, 0.7560, 0.4179, 0.2079, 0.2362, 0.2522
0.2810, 0.1984, 0.3533, 0.4414, 0.3148, 0.6533, 0.8748, 0.8219, 0.6752, 0.1742, 0.3733, 0.4743
0.1134, 0.1099, 0.3206, 0.3741, 0.3768, 0.6741, 0.8790, 0.6975, 0.6856, 0.3580, 0.1937, 0.4871
0.0573, 0.2529, 0.3648, 0.4735, 0.2237, 0.7971, 0.6861, 0.8227, 0.4027, 0.2566, 0.0961, 0.3746
0.3960, 0.0710, 0.4286, 0.3924, 0.2232, 0.6555, 0.8741, 0.8554, 0.4157, 0.4791, 0.3400, 0.2739
0.1872, 0.2519, 0.1632, 0.3059, 0.3062, 0.6062, 0.7698, 0.7206, 0.4287, 0.4121, 0.0583, 0.1980
0.1169, 0.0784, 0.1352, 0.6480, 0.2353, 0.8735, 0.5482, 0.5043, 0.5229, 0.4628, 0.3442, 0.2354
0.0109, 0.3203, 0.4224, 0.6474, 0.4679, 0.9231, 0.8590, 0.6815, 0.5231, 0.3025, 0.2768, 0.3732
0.2081, 0.3314, 0.3023, 0.6299, 0.3127, 0.6714, 0.8879, 0.7968, 0.4039, 0.3325, 0.3821, 0.1323
0.0334, 0.2477, 0.1898, 0.6061, 0.4273, 0.8665, 0.5431, 0.5337, 0.5500, 0.2639, 0.0349, 0.2484
0.2689, 0.0758, 0.4583, 0.6799, 0.5846, 0.8920, 0.6625, 0.7975, 0.4152, 0.2258, 0.2424, 0.3379
0.3515, 0.1018, 0.4065, 0.6764, 0.2004, 0.7904, 0.7628, 0.8373, 0.3735, 0.4425, 0.1457, 0.4569
0.0111, 0.0342, 0.4926, 0.5438, 0.3671, 0.6677, 0.7600, 0.5148, 0.4246, 0.2294, 0.2430, 0.3603
0.3384, 0.3710, 0.3642, 0.5313, 0.3595, 0.9866, 0.5616, 0.8580, 0.4244, 0.3194, 0.2728, 0.1946
0.0671, 0.2034, 0.4167, 0.5770, 0.2476, 0.9603, 0.6919, 0.8787, 0.5214, 0.1339, 0.0810, 0.4417
0.2824, 0.3580, 0.2317, 0.5112, 0.4602, 0.8377, 0.5926, 0.6707, 0.3992, 0.4382, 0.3947, 0.1266
0.2856, 0.1320, 0.3502, 0.4033, 0.4604, 0.7485, 0.6179, 0.8647, 0.6770, 0.3454, 0.0858, 0.4758
0.3002, 0.3004, 0.1847, 0.6321, 0.2960, 0.8526, 0.7843, 0.7970, 0.4561, 0.4039, 0.3612, 0.3350
0.0362, 0.0516, 0.1421, 0.3691, 0.2506, 0.9113, 0.5158, 0.5966, 0.6516, 0.4894, 0.2422, 0.4905
0.0174, 0.3794, 0.2245, 0.6196, 0.5243, 0.9440, 0.6834, 0.8723, 0.4032, 0.4738, 0.2476, 0.4942
0.0131, 0.2901, 0.3262, 0.4970, 0.3037, 0.7307, 0.5998, 0.5877, 0.6199, 0.3010, 0.0333, 0.4108
0.2135, 0.2958, 0.3062, 0.4600, 0.5945, 0.6113, 0.8731, 0.8723, 0.4564, 0.1858, 0.2477, 0.1712
0.3213, 0.1016, 0.2163, 0.6664, 0.5612, 0.8142, 0.8451, 0.6410, 0.6992, 0.2735, 0.1179, 0.1166
0.3959, 0.1860, 0.3938, 0.5563, 0.4892, 0.6204, 0.8680, 0.8707, 0.5208, 0.4856, 0.1124, 0.3409
0.1586, 0.2512, 0.2203, 0.6277, 0.2279, 0.6168, 0.5198, 0.5602, 0.4581, 0.4822, 0.0443, 0.3590
0.2110, 0.1413, 0.1793, 0.3882, 0.2175, 0.8853, 0.7615, 0.6775, 0.5876, 0.1440, 0.3755, 0.4391
0.2636, 0.1515, 0.2666, 0.4929, 0.5741, 0.9454, 0.6912, 0.7218, 0.6502, 0.4797, 0.2557, 0.3994
0.1406, 0.3672, 0.4347, 0.5208, 0.5471, 0.9399, 0.8234, 0.5523, 0.5144, 0.4603, 0.3083, 0.2683
0.3912, 0.1003, 0.2334, 0.6817, 0.5235, 0.9601, 0.5046, 0.6519, 0.3942, 0.2184, 0.2952, 0.3896
0.1847, 0.1461, 0.3339, 0.5135, 0.4202, 0.8462, 0.6583, 0.8087, 0.4005, 0.3623, 0.3842, 0.1014
0.2893, 0.0436, 0.3175, 0.5508, 0.2972, 0.9655, 0.7489, 0.5927, 0.6081, 0.1422, 0.2221, 0.1380
0.2324, 0.1012, 0.3598, 0.5863, 0.4097, 0.8630, 0.8253, 0.8230, 0.5479, 0.2804, 0.1632, 0.4499
0.2812, 0.0740, 0.3263, 0.6635, 0.2603, 0.9382, 0.8281, 0.6997, 0.3150, 0.1590, 0.3691, 0.1177
0.1490, 0.2476, 0.1788, 0.6924, 0.2624, 0.8159, 0.7472, 0.5924, 0.4865, 0.2964, 0.3067, 0.3537
0.1870, 0.2668, 0.2814, 0.5103, 0.4634, 0.6725, 0.8152, 0.7411, 0.6921, 0.1890, 0.1865, 0.4583
0.2406, 0.2904, 0.2655, 0.5737, 0.5784, 0.6801, 0.8229, 0.8878, 0.3058, 0.1861, 0.0470, 0.4070
0.0325, 0.3994, 0.2558, 0.6621, 0.2754, 0.7763, 0.8379, 0.7381, 0.4849, 0.3051, 0.2296, 0.2761
0.0774, 0.0456, 0.1605, 0.3210, 0.4235, 0.9391, 0.6436, 0.5301, 0.4703, 0.3408, 0.1702, 0.4243
0.3768, 0.3480, 0.3816, 0.4881, 0.4731, 0.6054, 0.8930, 0.5688, 0.6625, 0.4228, 0.3201, 0.2076
0.0877, 0.1715, 0.4397, 0.4802, 0.5742, 0.7411, 0.7478, 0.8923, 0.5574, 0.1182, 0.1854, 0.2339
0.2129, 0.1139, 0.1489, 0.5673, 0.4725, 0.9469, 0.5530, 0.8194, 0.6307, 0.3586, 0.0820, 0.2046
0.1674, 0.2167, 0.2910, 0.4870, 0.5359, 0.9480, 0.8267, 0.8511, 0.5284, 0.4856, 0.2426, 0.3416
0.1277, 0.2725, 0.1208, 0.3538, 0.2521, 0.8621, 0.5701, 0.6365, 0.3177, 0.1935, 0.3857, 0.3037
0.0604, 0.2092, 0.4774, 0.6463, 0.3568, 0.7135, 0.8047, 0.5962, 0.4017, 0.1336, 0.3457, 0.2792
0.2247, 0.2947, 0.4186, 0.4790, 0.2737, 0.9315, 0.5124, 0.8787, 0.5308, 0.4502, 0.2434, 0.2007
0.1185, 0.2132, 0.4848, 0.3738, 0.4040, 0.7375, 0.8079, 0.8211, 0.4698, 0.1816, 0.0268, 0.1795
0.1090, 0.2395, 0.4492, 0.3513, 0.5833, 0.8733, 0.7968, 0.8932, 0.4664, 0.3126, 0.2717, 0.3052
0.1196, 0.0422, 0.2140, 0.6075, 0.4563, 0.9205, 0.7068, 0.5928, 0.5512, 0.2216, 0.0118, 0.4613
0.1681, 0.1705, 0.3965, 0.6794, 0.2290, 0.6691, 0.6289, 0.5994, 0.4982, 0.4298, 0.3416, 0.3383
0.0961, 0.0746, 0.3974, 0.3949, 0.4158, 0.8997, 0.5913, 0.5698, 0.3159, 0.1590, 0.3406, 0.1419
0.0895, 0.0131, 0.3705, 0.3927, 0.3654, 0.6557, 0.6509, 0.6698, 0.4892, 0.2691, 0.0411, 0.4090
0.0549, 0.1697, 0.2088, 0.4204, 0.4690, 0.8071, 0.5760, 0.6877, 0.4358, 0.3818, 0.0904, 0.4380
0.2105, 0.2002, 0.2015, 0.3745, 0.3887, 0.9927, 0.5532, 0.6134, 0.6203, 0.3659, 0.1115, 0.2259
0.1680, 0.2411, 0.3877, 0.6429, 0.4724, 0.6948, 0.8703, 0.8125, 0.4230, 0.2220, 0.3525, 0.1504
0.2534, 0.1111, 0.4309, 0.4458, 0.4933, 0.6770, 0.6926, 0.8214, 0.4588, 0.1646, 0.2596, 0.4013
0.1121, 0.1805, 0.4602, 0.6488, 0.4829, 0.8364, 0.8270, 0.8631, 0.3010, 0.2589, 0.2246, 0.3936
0.0350, 0.1599, 0.2118, 0.4694, 0.3992, 0.8640, 0.6985, 0.7482, 0.5330, 0.2713, 0.0020, 0.1778
0.1281, 0.0588, 0.3395, 0.5446, 0.4000, 0.7283, 0.7613, 0.5761, 0.3024, 0.3940, 0.1774, 0.3791
0.0740, 0.0250, 0.2512, 0.5784, 0.2411, 0.6783, 0.6816, 0.7485, 0.6000, 0.1439, 0.2498, 0.2549
0.2680, 0.0814, 0.3112, 0.3689, 0.2075, 0.7948, 0.5737, 0.7553, 0.5146, 0.4100, 0.1572, 0.4958
0.2061, 0.1915, 0.3998, 0.5291, 0.3450, 0.7957, 0.5757, 0.6574, 0.3120, 0.2850, 0.1098, 0.3107
0.0117, 0.2220, 0.2172, 0.5310, 0.4931, 0.7761, 0.7653, 0.5956, 0.6994, 0.1972, 0.3763, 0.1869
0.1990, 0.3285, 0.3866, 0.5822, 0.3762, 0.9017, 0.8680, 0.6765, 0.5112, 0.1264, 0.1563, 0.2869
0.3581, 0.0442, 0.3925, 0.5182, 0.4426, 0.6119, 0.5587, 0.6136, 0.3019, 0.3677, 0.3481, 0.3188
0.2173, 0.2463, 0.2209, 0.4467, 0.4300, 0.9237, 0.5806, 0.6310, 0.5972, 0.2364, 0.0190, 0.1625
0.0775, 0.1980, 0.3540, 0.6521, 0.5610, 0.7229, 0.8014, 0.6130, 0.4474, 0.2171, 0.1655, 0.1859
0.1276, 0.0488, 0.4852, 0.5016, 0.5692, 0.8985, 0.6831, 0.8018, 0.5512, 0.2215, 0.2087, 0.3849
0.1221, 0.0050, 0.2073, 0.6187, 0.5720, 0.8501, 0.5531, 0.8030, 0.5108, 0.4015, 0.3434, 0.4790
0.2618, 0.3417, 0.3970, 0.5908, 0.5435, 0.9692, 0.8608, 0.6583, 0.3336, 0.4318, 0.2156, 0.3168
0.3301, 0.0128, 0.4512, 0.3139, 0.4773, 0.8350, 0.7567, 0.6496, 0.4102, 0.3038, 0.3543, 0.3261
0.2653, 0.1766, 0.4889, 0.5970, 0.3420, 0.8614, 0.7170, 0.8536, 0.4100, 0.1432, 0.0765, 0.2548
0.3416, 0.1083, 0.3505, 0.5494, 0.3632, 0.6201, 0.7979, 0.6183, 0.4594, 0.2509, 0.2654, 0.1345
0.2200, 0.0062, 0.4932, 0.5394, 0.3536, 0.6587, 0.7788, 0.8623, 0.4272, 0.4066, 0.1150, 0.4829
0.2809, 0.2500, 0.4723, 0.4076, 0.5694, 0.8712, 0.8085, 0.7287, 0.6336, 0.3793, 0.0586, 0.3450
0.1117, 0.3664, 0.1793, 0.4143, 0.2191, 0.7790, 0.7230, 0.7294, 0.6622, 0.2390, 0.1790, 0.4405
0.2307, 0.2616, 0.2113, 0.4950, 0.4484, 0.6534, 0.5132, 0.5454, 0.4910, 0.1096, 0.2505, 0.1390
0.1004, 0.1706, 0.1463, 0.4082, 0.2084, 0.9940, 0.7446, 0.6513, 0.3106, 0.2559, 0.1810, 0.4724
0.1114, 0.2459, 0.3661, 0.3744, 0.4023, 0.9146, 0.5386, 0.7424, 0.3104, 0.1028, 0.0238, 0.2926

Anomaly Detection for Mixed Numeric and Categorical Data Using DBSCAN Clustering with C#

Data clustering with the DBSCAN (density-based spatial clustering of applications with noise) algorithm can be easily used to identify anomalous data items. DBSCAN clustering assigns each data item of the source data to a cluster ID, except for data items that are not near other items. Those far-away items are labeled with -1, indicating “noise” — these are anomalous items.

DBSCAN clustering uses Euclidean distance between data items and so the implication is that DBSCAN applies only to strictly numeric data. But I’ve been experimenting with an encoding technique for categorical data that I call one-over-n-hot encoding. For example, if a data column Color has three possible values, then one-over-n-hot encoding is red = (0.3333, 0, 0), blue = (0, 0.3333, 0), green = (0, 0, 0.3333).

For categorical items that have an inherent ordering, I use equal-interval encoding. For example, for Height, short = 0.25, medium = 0.50, tall = 0.75.
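The two categorical encodings described above can be sketched in a few lines. This is a Python illustration rather than the C# of the demo, and the function names one_over_n_hot() and equal_interval() are my own, not from the demo program. The equal-interval formula (i + 1) / (n + 1) is an assumption that reproduces the short = 0.25, medium = 0.50, tall = 0.75 example.

```python
import math


def one_over_n_hot(value, categories):
    """Return a vector with 1/n in the slot for `value` and 0 elsewhere."""
    n = len(categories)
    return [1.0 / n if c == value else 0.0 for c in categories]


def equal_interval(value, ordered_categories):
    """Map the i-th (0-based) of n ordered categories to (i + 1) / (n + 1)."""
    n = len(ordered_categories)
    return (ordered_categories.index(value) + 1) / (n + 1)


colors = ["red", "blue", "green"]        # order taken from the Color example
heights = ["short", "medium", "tall"]    # inherent ordering

blue_vec = one_over_n_hot("blue", colors)
medium_val = equal_interval("medium", heights)

print([round(v, 4) for v in blue_vec])   # [0.0, 0.3333, 0.0]
print(medium_val)                        # 0.5

# Euclidean distance between two different colors
red_vec = one_over_n_hot("red", colors)
print(round(math.dist(red_vec, blue_vec), 4))   # 0.4714
```

One effect of the 1/n scaling is that the distance between two different categories is sqrt(2) / n rather than the sqrt(2) of ordinary one-hot encoding, so a categorical column stays on roughly the same scale as a numeric column normalized to [0, 1].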

I put together a demo using the C# language. I made a 240-item set of synthetic data that looks like:

F  short   24  arkansas  29500  liberal
M  tall    39  delaware  51200  moderate
F  short   63  colorado  75800  conservative
M  medium  36  illinois  44500  moderate
F  short   27  colorado  28600  liberal
. . .

Each line represents a person. The fields are sex, height, age, State, income, political leaning.

I used min-max normalization on the age (min = 18, max = 68) and income (min = $20,300, max = $81,800) columns. I used one-over-n-hot encoding on the sex, State, and political leaning columns; because sex has only two values, it collapses to a single column with F = 0.5 and M = 0.0. I used equal-interval encoding for the height column.

The resulting normalized and encoded data looks like:

0.5, 0.25, 0.1200, 0.25, 0.00, 0.00, 0.00, 0.1496, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.4200, 0.00, 0.00, 0.25, 0.00, 0.5024, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.9000, 0.00, 0.25, 0.00, 0.00, 0.9024, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.3600, 0.00, 0.00, 0.00, 0.25, 0.3935, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1800, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.0000, 0.3333
. . .
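To make the mapping from raw to encoded data concrete, here is a Python sketch (not the C# of the demo) that encodes one raw line. The category orderings and the helper name encode_person() are assumptions: I inferred the column positions from the five sample rows shown above, not from the program that actually produced people_240.txt.

```python
# Orderings inferred from the sample rows (assumption, not from the demo code)
STATES = ["arkansas", "colorado", "delaware", "illinois"]
POLITICS = ["conservative", "moderate", "liberal"]
HEIGHTS = ["short", "medium", "tall"]
AGE_MIN, AGE_MAX = 18, 68
INC_MIN, INC_MAX = 20300, 81800


def min_max(x, lo, hi):
    """Min-max normalization: lo maps to 0.0 and hi maps to 1.0."""
    return (x - lo) / (hi - lo)


def encode_person(line):
    """Encode 'sex height age state income politics' into 11 numeric columns."""
    sex, height, age, state, income, politics = line.split()
    row = [0.5 if sex == "F" else 0.0]                 # binary, one column
    row.append((HEIGHTS.index(height) + 1) / (len(HEIGHTS) + 1))
    row.append(min_max(int(age), AGE_MIN, AGE_MAX))
    row += [1.0 / len(STATES) if s == state else 0.0 for s in STATES]
    row.append(min_max(int(income), INC_MIN, INC_MAX))
    row += [1.0 / len(POLITICS) if p == politics else 0.0 for p in POLITICS]
    return row


encoded = encode_person("F  short   24  arkansas  29500  liberal")
print([round(v, 4) for v in encoded])
# [0.5, 0.25, 0.12, 0.25, 0.0, 0.0, 0.0, 0.1496, 0.0, 0.0, 0.3333]
```

The printed values agree with the first encoded row above: age 24 becomes (24 - 18) / (68 - 18) = 0.12 and income 29500 becomes (29500 - 20300) / (81800 - 20300) = 0.1496.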

When using DBSCAN clustering, you don’t explicitly specify the number of clusters. Instead, you specify an epsilon value and a min_points value. These implicitly determine the resulting number of clusters. DBSCAN clustering is extremely sensitive to the values of epsilon and min_points. After a lot of trial and error, I used epsilon = 0.4790 and min_points = 24.
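The sensitivity to epsilon and min_points is easy to see on a tiny example. The following is a minimal Python sketch of DBSCAN (not the C# demo code) that follows the same label conventions as the demo: -2 for unprocessed, -1 for noise. The nine 2D points and the parameter values are hypothetical, chosen only to make the effect visible.

```python
import math

NOISE, UNSEEN = -1, -2


def region_query(data, p, eps):
    """Indices of all items (including p itself) closer than eps to data[p]."""
    return [i for i, q in enumerate(data) if math.dist(data[p], q) < eps]


def dbscan(data, eps, min_pts):
    labels = [UNSEEN] * len(data)
    cid = -1
    for i in range(len(data)):
        if labels[i] != UNSEEN:
            continue                      # already processed
        neighbors = region_query(data, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE             # may be claimed by a cluster later
            continue
        cid += 1
        labels[i] = cid
        k = 0
        while k < len(neighbors):         # the list grows while it is scanned
            pn = neighbors[k]
            if labels[pn] == NOISE:
                labels[pn] = cid          # border item
            elif labels[pn] == UNSEEN:
                labels[pn] = cid
                more = region_query(data, pn, eps)
                if len(more) >= min_pts:  # pn is a core item, keep expanding
                    neighbors.extend(more)
            k += 1
    return labels


def summarize(labels):
    """Return (number of clusters, number of noise items)."""
    return max(labels) + 1, labels.count(NOISE)


# hypothetical data: two tight groups of four and one item in between
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
          (0.55, 0.55)]

print(summarize(dbscan(points, eps=0.2, min_pts=3)))   # (2, 1)
print(summarize(dbscan(points, eps=0.7, min_pts=3)))   # (1, 0)
print(summarize(dbscan(points, eps=0.2, min_pts=5)))   # (0, 9)
```

With the same data, a larger epsilon merges everything into one cluster with no noise, and a larger min_points turns every item into noise. This is why the parameter search takes trial and error.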

The result was three clusters, plus 12 anomalous items in the noise cluster. Each noise item is examined by counting its near neighbors: the number of other data items whose Euclidean distance to the noise item is less than the epsilon value:

number clusters =  3
cluster counts
0 : 116
1 : 89
2 : 23

number noise items = 12

[  17] : F  tall    25  delaware  30000  moderate       :  near neighbors = 1
[  50] : M  tall    36  illinois  53500  conservative   :  near neighbors = 8
[  58] : M  tall    50  illinois  62900  conservative   :  near neighbors = 3
[  75] : F  short   26  colorado  40400  conservative   :  near neighbors = 3
[ 124] : F  tall    29  colorado  37100  conservative   :  near neighbors = 0
[ 169] : M  short   44  delaware  63000  conservative   :  near neighbors = 3
[ 170] : M  tall    65  delaware  81800  conservative   :  near neighbors = 1
[ 175] : F  medium  68  arkansas  72600  liberal        :  near neighbors = 0
[ 226] : M  tall    65  arkansas  76900  conservative   :  near neighbors = 3
[ 227] : M  short   46  colorado  58000  conservative   :  near neighbors = 6
[ 229] : M  short   47  arkansas  63600  conservative   :  near neighbors = 5
[ 232] : M  medium  20  arkansas  28700  liberal        :  near neighbors = 1

In this example, the most anomalous data items are [124] and [175] because they have zero near neighbors. The next most anomalous data items are [17], [170], and [232] because they have only one near neighbor. And so on. In a non-demo scenario, the anomalous data items would be examined closely to try to determine why they’re anomalous.
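The near-neighbor ranking can be sketched as follows. This is a Python illustration (not the C# demo code) on hypothetical toy data. The labels are hand-assigned, but they match what DBSCAN with epsilon = 0.25 and min_points = 5 would give for these eight points.

```python
import math


def near_neighbor_counts(data, labels, eps):
    """For each noise item (label -1), count the other items closer than eps."""
    counts = {}
    for i, lab in enumerate(labels):
        if lab != -1:
            continue
        counts[i] = sum(1 for j, q in enumerate(data)
                        if j != i and math.dist(data[i], q) < eps)
    return counts


# hypothetical data: one tight cluster of five, a stray pair, one isolated item
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
        (0.8, 0.8), (0.9, 0.8),
        (0.5, 0.0)]
labels = [0, 0, 0, 0, 0, -1, -1, -1]

counts = near_neighbor_counts(data, labels, eps=0.25)
ranked = sorted(counts, key=lambda i: counts[i])   # fewest neighbors first

for i in ranked:
    print(f"[{i:4d}] : near neighbors = {counts[i]}")
# [   7] : near neighbors = 0
# [   5] : near neighbors = 1
# [   6] : near neighbors = 1
```

Sorting by the count puts the most isolated items first, which is the order in which I would examine them.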

Two other clustering-based anomaly detection techniques are k-means clustering anomaly detection and self-organizing maps clustering anomaly detection. I suspect that the three clustering anomaly techniques give different results, but I haven’t explored this question thoroughly.



I loved the “Freddy the Pig” series of books when I was a young man. Freddy is the lead character in 26 books written between 1927 and 1958 by Walter R. Brooks with illustrations by Kurt Wiese. The books focus on the adventures of a group of animals living on a rural farm. The animals can talk to each other and humans — an anomaly that is remarked upon by humans but never really questioned other than a comment like, “The animals can talk — that’s odd.”

#26 Freddy and the Dragon (1958) – Freddy and his sidekick, Jinx the cat, defeat a gang of criminals, and help a traveling circus.

#3 Freddy the Detective (1932) – Freddy and his friends solve a series of mysterious crimes on the Bean family farm — Simon the rat and his gang are the culprits. The first one of the series I read and so it has a special place in my memory.

#14 Freddy the Magician (1947) – Freddy and his farmyard friends deal with Zingo, a criminal magician.


Demo code. Wherever a quoted “lt” (less than), “gt”, “lte”, “gte”, or “and” appears in the listing, replace it with the corresponding operator symbol: <, >, <=, >=, or &&.

using System;
using System.IO;
using System.Collections.Generic;

namespace AnomalyDBSCAN
{
  internal class AnomalyDBSCANProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin anomaly detection" +
        " using DBSCAN clustering ");

      // 1. load data
      Console.WriteLine("\nLoading 240-item" +
        " synthetic People subset ");

      string rf = 
        "..\\..\\..\\Data\\people_raw.txt";
      string[] rawFileArray =
        AnomalyDBSCAN.FileLoad(rf, "#");
      Console.WriteLine("\nFirst three rows" +
        " of raw data: ");
      for (int i = 0; i < 3; ++i)
        Console.WriteLine("[" + i.ToString().
          PadLeft(3) + "]  " + rawFileArray[i]);

      string fn = "..\\..\\..\\Data\\people_240.txt";
      double[][] X = AnomalyDBSCAN.MatLoad(fn,
        new int[] { 0, 1, 2, 3, 4, 5, 6,
          7, 8, 9, 10 }, ',', "#");
      Console.WriteLine("\nFirst three rows" +
        " of normalized and encoded data: ");
      AnomalyDBSCAN.MatShow(X, 4, 8, 3, true);

      // 2. create AnomalyDBSCAN object and cluster
      double epsilon = 0.479;
      int minPoints = 24;  // gives 12 noise items

      Console.WriteLine("\nSetting epsilon = " +
          epsilon.ToString("F4"));
      Console.WriteLine("Setting minPoints = " +
        minPoints);
      Console.WriteLine("\nClustering with DBSCAN ");
      AnomalyDBSCAN dbscan =
        new AnomalyDBSCAN(epsilon, minPoints);
      int[] clustering = dbscan.Cluster(X);
      Console.WriteLine("Done ");

      // Console.WriteLine("\nClustering results: ");
      // AnomalyDBSCAN.VecShow(clustering, 4);

      Console.WriteLine("\nAnalyzing");
      dbscan.Analyze(rawFileArray);

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main

  } // Program

  public class AnomalyDBSCAN
  {
    public double eps;
    public int minPts;
    public double[][] data;  // supplied in Cluster()
    public int[] labels;  // computed in Cluster()

    public AnomalyDBSCAN(double eps, int minPts)
    {
      this.eps = eps;
      this.minPts = minPts;
    }

    public void Analyze(string[] rawFileArray)
    {
      // assumes Cluster() has been called so that
      // this.labels[] is computed

      int maxClusterID = -1;
      int numNoise = 0;
      for (int i = 0; i < this.labels.Length; ++i)
      {
        if (this.labels[i] == -1)
        {
          ++numNoise;
        }
        if (this.labels[i] > maxClusterID)
        {
          maxClusterID = this.labels[i];
        }
      }

      int numClusters = maxClusterID + 1;
      Console.WriteLine("\nnumber clusters =  " +
        numClusters);

      int[] clusterCounts = new int[numClusters];
      for (int i = 0; i < this.labels.Length; ++i)
      {
        int clusterID = this.labels[i];
        if (clusterID != -1)
          ++clusterCounts[clusterID];
      }
      Console.WriteLine("\ncluster counts ");
      for (int cid = 0; cid < clusterCounts.Length;
        ++cid)
      {
        Console.WriteLine(cid + " : " +
          clusterCounts[cid]);
      }

      Console.WriteLine("\nnumber noise items = " +
        numNoise + "\n");
      for (int i = 0; i < this.labels.Length; ++i)
      {
        if (this.labels[i] == -1) // noise
        {
          Console.Write("[" + i.ToString().
            PadLeft(4) + "] : " +
            rawFileArray[i].ToString().
            PadRight(46)); // associated raw data

          double[] distances = 
            new double[this.data.Length];
          int countLessThanEpsilon = 0;
          for (int j = 0; j < this.data.Length; ++j)
          {
            distances[j] = 
              AnomalyDBSCAN.EucDistance(this.data[i],
              this.data[j]);
            if (j != i && distances[j] < this.eps)
            {
              ++countLessThanEpsilon;
            }
          }
          Console.WriteLine(" :  near neighbors = " +
            countLessThanEpsilon);
        } // noise item
      } // i
    } // Analyze()

    public int[] Cluster(double[][] data)
    {
      this.data = data;  // by reference
      this.labels = new int[this.data.Length];
      for (int i = 0; i < labels.Length; ++i)
        this.labels[i] = -2;  // unprocessed

      int cid = -1;  // offset the start
      for (int i = 0; i < this.data.Length; ++i)
      {
        if (this.labels[i] != -2)  
          continue;  // item has been processed

        List<int> neighbors = this.RegionQuery(i);
        if (neighbors.Count < this.minPts)
        {
          this.labels[i] = -1;  // noise
        }
        else
        {
          ++cid;
          this.Expand(i, neighbors, cid);
        }
      }

      return this.labels;
    }

    private List<int> RegionQuery(int p)
    {
      // List of idxs close to data[p]
      List<int> result = new List<int>();
      for (int i = 0; i < this.data.Length; ++i)
      {
        double dist = EucDistance(this.data[p],
          this.data[i]);
        if (dist < this.eps)
          result.Add(i);
      }
      return result;
    }

    private void Expand(int p, List<int> neighbors,
      int cid)
    {
      this.labels[p] = cid;
      //int i = 0;
      //while(i < neighbors.Count)
      for (int i = 0; i < neighbors.Count; ++i)
      {
        int pn = neighbors[i];
        if (this.labels[pn] == -1)  // noise
          this.labels[pn] = cid;
        else if (this.labels[pn] == -2)  // unprocessed
        {
          this.labels[pn] = cid;
          List<int> newNeighbors =
            this.RegionQuery(pn);
          // neighbors grows while the for loop scans it
          if (newNeighbors.Count >= this.minPts)
            neighbors.AddRange(newNeighbors); 
        }
        //++i;
      }
    }

    private static double EucDistance(double[] x1,
      double[] x2)
    {
      int dim = x1.Length;
      double sum = 0.0;
      for (int i = 0; i < dim; ++i)
        sum += (x1[i] - x2[i]) * (x1[i] - x2[i]);
      return Math.Sqrt(sum);
    }

    // ------------------------------------------------------

    // misc. public utility functions for convenience
    // MatLoad(), FileLoad, VecLoad(), MatShow(),
    // VecShow(), ListShow()

    // ------------------------------------------------------

    public static double[][] MatLoad(string fn,
      int[] usecols, char sep, string comment)
    {
      // count number of non-comment lines
      int nRows = 0;
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
        if (line.StartsWith(comment) == false)
          ++nRows;
      sr.Close(); ifs.Close();

      // make result matrix
      int nCols = usecols.Length;
      double[][] result = new double[nRows][];
      for (int r = 0; r < nRows; ++r)
        result[r] = new double[nCols];

      line = "";
      string[] tokens = null;
      ifs = new FileStream(fn, FileMode.Open);
      sr = new StreamReader(ifs);

      int i = 0;
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        tokens = line.Split(sep);
        for (int j = 0; j < nCols; ++j)
        {
          int k = usecols[j];  // into tokens
          result[i][j] = double.Parse(tokens[k]);
        }
        ++i;
      }
      sr.Close(); ifs.Close();
      return result;
    }

    // ------------------------------------------------------

    public static string[] FileLoad(string fn,
      string comment)
    {
      List<string> lst = new List<string>();
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      string line = "";
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment)) continue;
        line = line.Trim();
        lst.Add(line);
      }
      sr.Close(); ifs.Close();
      string[] result = lst.ToArray();
      return result;
    }

    // ------------------------------------------------------

    public static int[] VecLoad(string fn, int usecol,
      string comment)
    {
      char dummySep = ',';
      double[][] tmp = MatLoad(fn, new int[] { usecol },
        dummySep, comment);
      int n = tmp.Length;
      int[] result = new int[n];
      for (int i = 0; i < n; ++i)
        result[i] = (int)tmp[i][0];
      return result;
    }

    // ------------------------------------------------------

    public static void MatShow(double[][] M, int dec,
      int wid, int numRows, bool showIndices)
    {
      double small = 1.0 / Math.Pow(10, dec);
      for (int i = 0; i < numRows; ++i)
      {
        if (showIndices == true)
        {
          int pad = M.Length.ToString().Length;
          Console.Write("[" + i.ToString().
            PadLeft(pad) + "]");
        }
        for (int j = 0; j < M[0].Length; ++j)
        {
          double v = M[i][j];
          if (Math.Abs(v) < small) v = 0.0;
          Console.Write(v.ToString("F" + dec).
            PadLeft(wid));
        }
        Console.WriteLine("");
      }
      if (numRows < M.Length)
        Console.WriteLine(". . . ");
    }

    // ------------------------------------------------------

    public static void VecShow(int[] vec, int wid)
    {
      int n = vec.Length;
      for (int i = 0; i < n; ++i)
      {
        if (i > 0 && i % 20 == 0) Console.WriteLine("");
        Console.Write(vec[i].ToString().PadLeft(wid));
      }
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    public static void VecShow(double[] vec, int decimals,
      int wid)
    {
      int n = vec.Length;
      for (int i = 0; i < n; ++i)
        Console.Write(vec[i].ToString("F" + decimals).
          PadLeft(wid));
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    public static void ListShow(List<int> lst)
    {
      int n = lst.Count;
      for (int i = 0; i < n; ++i)
      {
        Console.Write(lst[i] + " ");
      }
      Console.WriteLine("");
    }

  } // AnomalyDBSCAN

} // ns

Raw data:

# people_raw.txt
#
F  short   24  arkansas  29500  liberal
M  tall    39  delaware  51200  moderate
F  short   63  colorado  75800  conservative
M  medium  36  illinois  44500  moderate
F  short   27  colorado  28600  liberal
F  short   50  colorado  56500  moderate
F  medium  50  illinois  55000  moderate
M  tall    19  delaware  32700  conservative
F  short   22  illinois  27700  moderate
M  tall    39  delaware  47100  liberal
F  short   34  arkansas  39400  moderate
M  medium  22  illinois  33500  conservative
F  medium  35  delaware  35200  liberal
M  tall    33  colorado  46400  moderate
F  short   45  colorado  54100  moderate
F  short   42  illinois  50700  moderate
M  tall    33  colorado  46800  moderate
F  tall    25  delaware  30000  moderate
M  medium  31  colorado  46400  conservative
F  short   27  arkansas  32500  liberal
F  short   48  illinois  54000  moderate
M  tall    64  illinois  71300  liberal
F  medium  61  colorado  72400  conservative
F  short   54  illinois  61000  conservative
F  short   29  arkansas  36300  conservative
F  short   50  delaware  55000  moderate
F  medium  55  illinois  62500  conservative
F  medium  40  illinois  52400  conservative
F  short   22  arkansas  23600  liberal
F  short   68  colorado  78400  conservative
M  tall    60  illinois  71700  liberal
M  tall    34  delaware  46500  moderate
M  medium  25  delaware  37100  conservative
M  short   31  illinois  48900  moderate
F  short   43  delaware  48000  moderate
F  short   58  colorado  65400  liberal
M  tall    55  illinois  60700  liberal
M  tall    43  colorado  51100  moderate
M  tall    43  delaware  53200  moderate
M  medium  21  arkansas  37200  conservative
F  short   55  delaware  64600  conservative
F  short   64  colorado  74800  conservative
M  tall    41  illinois  58800  moderate
F  medium  64  delaware  72700  conservative
M  medium  56  illinois  66600  liberal
F  short   31  delaware  36000  moderate
M  tall    65  delaware  70100  liberal
F  tall    55  illinois  64300  conservative
M  short   25  arkansas  40300  conservative
F  short   46  delaware  51000  moderate
M  tall    36  illinois  53500  conservative
F  short   52  illinois  58100  moderate
F  short   61  delaware  67900  conservative
F  short   57  delaware  65700  conservative
M  tall    46  colorado  52600  moderate
M  tall    62  arkansas  66800  liberal
F  short   55  illinois  62700  conservative
M  medium  22  delaware  27700  moderate
M  tall    50  illinois  62900  conservative
M  tall    32  illinois  41800  moderate
M  short   21  delaware  35600  conservative
F  medium  44  colorado  52000  moderate
F  short   46  illinois  51700  moderate
F  short   62  colorado  69700  conservative
F  short   57  illinois  66400  conservative
M  medium  67  illinois  75800  liberal
F  short   29  arkansas  34300  liberal
F  short   53  illinois  60100  conservative
M  tall    44  arkansas  54800  moderate
F  medium  46  colorado  52300  moderate
M  tall    20  illinois  30100  moderate
M  medium  38  illinois  53500  moderate
F  short   50  colorado  58600  moderate
F  short   33  colorado  42500  moderate
M  tall    33  colorado  39300  moderate
F  short   26  colorado  40400  conservative
F  short   58  arkansas  70700  conservative
F  tall    43  illinois  48000  moderate
M  medium  46  arkansas  64400  conservative
F  short   60  arkansas  71700  conservative
M  tall    42  arkansas  48900  moderate
M  tall    56  delaware  56400  liberal
M  short   62  colorado  66300  liberal
M  short   50  arkansas  64800  moderate
F  short   47  illinois  52000  moderate
M  tall    67  colorado  80400  liberal
M  tall    40  delaware  50400  moderate
F  short   42  colorado  48400  moderate
F  short   64  arkansas  72000  conservative
M  medium  47  arkansas  58700  liberal
F  medium  45  colorado  52800  moderate
M  tall    25  delaware  40900  conservative
F  short   38  arkansas  48400  conservative
F  short   55  delaware  60000  moderate
M  tall    44  arkansas  60600  moderate
F  medium  33  arkansas  41000  moderate
F  short   34  delaware  39000  moderate
F  short   27  colorado  33700  liberal
F  short   32  colorado  40700  moderate
F  tall    42  illinois  47000  moderate
M  short   24  delaware  40300  conservative
F  short   42  colorado  50300  moderate
F  short   25  delaware  28000  liberal
F  short   51  colorado  58000  moderate
M  medium  55  colorado  63500  liberal
F  short   44  arkansas  47800  liberal
M  short   18  arkansas  39800  conservative
M  tall    67  colorado  71600  liberal
F  short   45  delaware  50000  moderate
F  short   48  arkansas  55800  moderate
M  short   25  colorado  39000  moderate
M  tall    67  arkansas  78300  moderate
F  short   37  delaware  42000  moderate
M  short   32  arkansas  42700  moderate
F  short   48  arkansas  57000  moderate
M  tall    66  delaware  75000  liberal
F  tall    61  arkansas  70000  conservative
M  medium  58  delaware  68900  moderate
F  short   19  arkansas  24000  liberal
F  short   38  delaware  43000  moderate
M  medium  27  arkansas  36400  moderate
F  short   42  arkansas  48000  moderate
F  short   60  arkansas  71300  conservative
M  tall    27  delaware  34800  conservative
F  tall    29  colorado  37100  conservative
M  medium  43  arkansas  56700  moderate
F  medium  48  arkansas  56700  moderate
F  medium  27  delaware  29400  liberal
M  tall    44  arkansas  55200  conservative
F  short   23  colorado  26300  liberal
M  tall    36  colorado  53000  liberal
F  short   64  delaware  72500  conservative
F  short   29  delaware  30000  liberal
M  short   33  arkansas  49300  moderate
M  tall    66  colorado  75000  liberal
M  medium  21  delaware  34300  conservative
F  short   27  arkansas  32700  liberal
F  short   29  arkansas  31800  liberal
M  tall    31  arkansas  48600  moderate
F  short   36  delaware  41000  moderate
F  short   49  colorado  55700  moderate
M  short   28  arkansas  38400  conservative
M  medium  43  delaware  56600  moderate
M  medium  46  colorado  58800  moderate
F  short   57  arkansas  69800  conservative
M  short   52  delaware  59400  moderate
M  tall    31  delaware  43500  moderate
M  tall    55  arkansas  62000  liberal
F  short   50  arkansas  56400  moderate
F  short   48  colorado  55900  moderate
M  medium  22  delaware  34500  conservative
F  short   59  delaware  66700  conservative
F  short   34  arkansas  42800  liberal
M  tall    64  arkansas  77200  liberal
F  short   29  delaware  33500  liberal
M  medium  34  colorado  43200  moderate
M  medium  61  arkansas  75000  liberal
F  short   64  delaware  71100  conservative
M  short   29  arkansas  41300  conservative
F  short   63  colorado  70600  conservative
M  medium  29  colorado  40000  conservative
M  tall    51  arkansas  62700  moderate
M  tall    24  delaware  37700  conservative
F  medium  48  colorado  57500  moderate
F  short   18  arkansas  27400  conservative
F  short   18  arkansas  20300  liberal
F  short   33  colorado  38200  liberal
M  medium  20  delaware  34800  conservative
F  short   29  delaware  33000  liberal
M  short   44  delaware  63000  conservative
M  tall    65  delaware  81800  conservative
M  tall    56  arkansas  63700  liberal
M  medium  52  delaware  58400  moderate
M  medium  29  colorado  48600  conservative
M  tall    47  colorado  58900  moderate
F  medium  68  arkansas  72600  liberal
F  short   31  delaware  36000  moderate
F  short   61  colorado  62500  liberal
F  short   19  colorado  21500  liberal
F  tall    38  delaware  43000  moderate
M  tall    26  arkansas  42300  conservative
F  short   61  colorado  67400  conservative
F  short   40  arkansas  46500  moderate
M  medium  49  arkansas  65200  moderate
F  medium  56  arkansas  67500  conservative
M  short   48  colorado  66000  moderate
F  short   52  arkansas  56300  liberal
M  tall    18  arkansas  29800  conservative
M  tall    56  delaware  59300  liberal
M  medium  52  colorado  64400  moderate
M  medium  18  colorado  28600  moderate
M  tall    58  arkansas  66200  liberal
M  tall    39  colorado  55100  moderate
M  tall    46  arkansas  62900  moderate
M  medium  40  colorado  46200  moderate
M  medium  60  arkansas  72700  liberal
F  short   36  colorado  40700  liberal
F  short   44  arkansas  52300  moderate
F  short   28  arkansas  31300  liberal
F  short   54  delaware  62600  conservative
M  medium  51  arkansas  61200  moderate
M  short   32  colorado  46100  moderate
F  short   55  arkansas  62700  conservative
F  short   25  delaware  26200  liberal
F  medium  33  delaware  37300  liberal
M  medium  29  colorado  46200  conservative
F  short   65  arkansas  72700  conservative
M  tall    43  colorado  51400  moderate
M  short   54  colorado  64800  liberal
F  short   61  colorado  72700  conservative
F  short   52  colorado  63600  conservative
F  short   30  colorado  33500  liberal
F  short   29  arkansas  31400  liberal
M  tall    47  delaware  59400  moderate
F  short   39  colorado  47800  moderate
F  short   47  delaware  52000  moderate
M  medium  49  arkansas  58600  moderate
M  tall    63  delaware  67400  liberal
M  medium  30  arkansas  39200  conservative
M  tall    61  delaware  69600  liberal
M  medium  47  delaware  58700  moderate
F  short   30  delaware  34500  liberal
M  medium  51  delaware  58000  moderate
M  medium  24  arkansas  38800  moderate
M  short   49  arkansas  64500  moderate
F  medium  66  delaware  74500  conservative
M  tall    65  arkansas  76900  conservative
M  short   46  colorado  58000  conservative
M  tall    45  delaware  51800  moderate
M  short   47  arkansas  63600  conservative
M  tall    29  arkansas  44800  conservative
M  tall    57  delaware  69300  liberal
M  medium  20  arkansas  28700  liberal
M  medium  35  arkansas  43400  moderate
M  tall    61  delaware  67000  liberal
M  short   31  delaware  37300  moderate
F  short   18  arkansas  20800  liberal
F  medium  26  delaware  29200  liberal
M  medium  28  arkansas  36400  liberal
M  tall    59  delaware  69400  liberal

Normalized and encoded data:

# people_240.txt
#
# sex (M = 0.0, F = 0.5)
# height (short, medium, tall)
# age (min = 18, max = 68)
# State (Arkansas, Colorado, Delaware, Illinois)
# income (min = $20,300, max = $81,800)
# political leaning (conservative, moderate, liberal)
#
0.5, 0.25, 0.1200, 0.25, 0.00, 0.00, 0.00, 0.1496, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.4200, 0.00, 0.00, 0.25, 0.00, 0.5024, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.9000, 0.00, 0.25, 0.00, 0.00, 0.9024, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.3600, 0.00, 0.00, 0.00, 0.25, 0.3935, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1800, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.6400, 0.00, 0.25, 0.00, 0.00, 0.5886, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.6400, 0.00, 0.00, 0.00, 0.25, 0.5642, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.0200, 0.00, 0.00, 0.25, 0.00, 0.2016, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.0800, 0.00, 0.00, 0.00, 0.25, 0.1203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.4200, 0.00, 0.00, 0.25, 0.00, 0.4358, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.3200, 0.25, 0.00, 0.00, 0.00, 0.3106, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.0800, 0.00, 0.00, 0.00, 0.25, 0.2146, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.3400, 0.00, 0.00, 0.25, 0.00, 0.2423, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.4244, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5400, 0.00, 0.25, 0.00, 0.00, 0.5496, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.4800, 0.00, 0.00, 0.00, 0.25, 0.4943, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.4309, 0.0000, 0.3333, 0.0000
0.5, 0.75, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.1577, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.2600, 0.00, 0.25, 0.00, 0.00, 0.4244, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.1800, 0.25, 0.00, 0.00, 0.00, 0.1984, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.6000, 0.00, 0.00, 0.00, 0.25, 0.5480, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9200, 0.00, 0.00, 0.00, 0.25, 0.8293, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.8600, 0.00, 0.25, 0.00, 0.00, 0.8472, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7200, 0.00, 0.00, 0.00, 0.25, 0.6618, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.2602, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.6400, 0.00, 0.00, 0.25, 0.00, 0.5642, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.7400, 0.00, 0.00, 0.00, 0.25, 0.6862, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.4400, 0.00, 0.00, 0.00, 0.25, 0.5220, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.0800, 0.25, 0.00, 0.00, 0.00, 0.0537, 0.0000, 0.0000, 0.3333
0.5, 0.25, 1.0000, 0.00, 0.25, 0.00, 0.00, 0.9447, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.8400, 0.00, 0.00, 0.00, 0.25, 0.8358, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.3200, 0.00, 0.00, 0.25, 0.00, 0.4260, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.2732, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.2600, 0.00, 0.00, 0.00, 0.25, 0.4650, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5000, 0.00, 0.00, 0.25, 0.00, 0.4504, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8000, 0.00, 0.25, 0.00, 0.00, 0.7333, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.7400, 0.00, 0.00, 0.00, 0.25, 0.6569, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.5000, 0.00, 0.25, 0.00, 0.00, 0.5008, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.5000, 0.00, 0.00, 0.25, 0.00, 0.5350, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.0600, 0.25, 0.00, 0.00, 0.00, 0.2748, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7400, 0.00, 0.00, 0.25, 0.00, 0.7203, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.9200, 0.00, 0.25, 0.00, 0.00, 0.8862, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.4600, 0.00, 0.00, 0.00, 0.25, 0.6260, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.9200, 0.00, 0.00, 0.25, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.7600, 0.00, 0.00, 0.00, 0.25, 0.7528, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2600, 0.00, 0.00, 0.25, 0.00, 0.2553, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9400, 0.00, 0.00, 0.25, 0.00, 0.8098, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.7400, 0.00, 0.00, 0.00, 0.25, 0.7154, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.1400, 0.25, 0.00, 0.00, 0.00, 0.3252, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.5600, 0.00, 0.00, 0.25, 0.00, 0.4992, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.3600, 0.00, 0.00, 0.00, 0.25, 0.5398, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.6800, 0.00, 0.00, 0.00, 0.25, 0.6146, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8600, 0.00, 0.00, 0.25, 0.00, 0.7740, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7800, 0.00, 0.00, 0.25, 0.00, 0.7382, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5600, 0.00, 0.25, 0.00, 0.00, 0.5252, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.8800, 0.25, 0.00, 0.00, 0.00, 0.7561, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.7400, 0.00, 0.00, 0.00, 0.25, 0.6894, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.0800, 0.00, 0.00, 0.25, 0.00, 0.1203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.6400, 0.00, 0.00, 0.00, 0.25, 0.6927, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.2800, 0.00, 0.00, 0.00, 0.25, 0.3496, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.0600, 0.00, 0.00, 0.25, 0.00, 0.2488, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.5200, 0.00, 0.25, 0.00, 0.00, 0.5154, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5600, 0.00, 0.00, 0.00, 0.25, 0.5106, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8800, 0.00, 0.25, 0.00, 0.00, 0.8033, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7800, 0.00, 0.00, 0.00, 0.25, 0.7496, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.9800, 0.00, 0.00, 0.00, 0.25, 0.9024, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.2276, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.7000, 0.00, 0.00, 0.00, 0.25, 0.6472, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.5610, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.5600, 0.00, 0.25, 0.00, 0.00, 0.5203, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.0400, 0.00, 0.00, 0.00, 0.25, 0.1593, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.4000, 0.00, 0.00, 0.00, 0.25, 0.5398, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6400, 0.00, 0.25, 0.00, 0.00, 0.6228, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.3610, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.3089, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1600, 0.00, 0.25, 0.00, 0.00, 0.3268, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.8000, 0.25, 0.00, 0.00, 0.00, 0.8195, 0.3333, 0.0000, 0.0000
0.5, 0.75, 0.5000, 0.00, 0.00, 0.00, 0.25, 0.4504, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.5600, 0.25, 0.00, 0.00, 0.00, 0.7171, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.8400, 0.25, 0.00, 0.00, 0.00, 0.8358, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.4800, 0.25, 0.00, 0.00, 0.00, 0.4650, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.7600, 0.00, 0.00, 0.25, 0.00, 0.5870, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.8800, 0.00, 0.25, 0.00, 0.00, 0.7480, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.6400, 0.25, 0.00, 0.00, 0.00, 0.7236, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5800, 0.00, 0.00, 0.00, 0.25, 0.5154, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9800, 0.00, 0.25, 0.00, 0.00, 0.9772, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.4400, 0.00, 0.00, 0.25, 0.00, 0.4894, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.4800, 0.00, 0.25, 0.00, 0.00, 0.4569, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.9200, 0.25, 0.00, 0.00, 0.00, 0.8407, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.5800, 0.25, 0.00, 0.00, 0.00, 0.6244, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.5400, 0.00, 0.25, 0.00, 0.00, 0.5285, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.3350, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.4000, 0.25, 0.00, 0.00, 0.00, 0.4569, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.7400, 0.00, 0.00, 0.25, 0.00, 0.6455, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.6553, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.3000, 0.25, 0.00, 0.00, 0.00, 0.3366, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.3200, 0.00, 0.00, 0.25, 0.00, 0.3041, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1800, 0.00, 0.25, 0.00, 0.00, 0.2179, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2800, 0.00, 0.25, 0.00, 0.00, 0.3317, 0.0000, 0.3333, 0.0000
0.5, 0.75, 0.4800, 0.00, 0.00, 0.00, 0.25, 0.4341, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.1200, 0.00, 0.00, 0.25, 0.00, 0.3252, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.4800, 0.00, 0.25, 0.00, 0.00, 0.4878, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.1252, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.6600, 0.00, 0.25, 0.00, 0.00, 0.6130, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.7400, 0.00, 0.25, 0.00, 0.00, 0.7024, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.4472, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.3171, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.9800, 0.00, 0.25, 0.00, 0.00, 0.8341, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.5400, 0.00, 0.00, 0.25, 0.00, 0.4829, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6000, 0.25, 0.00, 0.00, 0.00, 0.5772, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.1400, 0.00, 0.25, 0.00, 0.00, 0.3041, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9800, 0.25, 0.00, 0.00, 0.00, 0.9431, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.3800, 0.00, 0.00, 0.25, 0.00, 0.3528, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.2800, 0.25, 0.00, 0.00, 0.00, 0.3642, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6000, 0.25, 0.00, 0.00, 0.00, 0.5967, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9600, 0.00, 0.00, 0.25, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.8600, 0.25, 0.00, 0.00, 0.00, 0.8081, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.8000, 0.00, 0.00, 0.25, 0.00, 0.7902, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.0200, 0.25, 0.00, 0.00, 0.00, 0.0602, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.4000, 0.00, 0.00, 0.25, 0.00, 0.3691, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.1800, 0.25, 0.00, 0.00, 0.00, 0.2618, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.4800, 0.25, 0.00, 0.00, 0.00, 0.4504, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8400, 0.25, 0.00, 0.00, 0.00, 0.8293, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.1800, 0.00, 0.00, 0.25, 0.00, 0.2358, 0.3333, 0.0000, 0.0000
0.5, 0.75, 0.2200, 0.00, 0.25, 0.00, 0.00, 0.2732, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.5000, 0.25, 0.00, 0.00, 0.00, 0.5919, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.6000, 0.25, 0.00, 0.00, 0.00, 0.5919, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.1800, 0.00, 0.00, 0.25, 0.00, 0.1480, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.5675, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.1000, 0.00, 0.25, 0.00, 0.00, 0.0976, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.3600, 0.00, 0.25, 0.00, 0.00, 0.5317, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.9200, 0.00, 0.00, 0.25, 0.00, 0.8488, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.2200, 0.00, 0.00, 0.25, 0.00, 0.1577, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.3000, 0.25, 0.00, 0.00, 0.00, 0.4715, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9600, 0.00, 0.25, 0.00, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.0600, 0.00, 0.00, 0.25, 0.00, 0.2276, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.1800, 0.25, 0.00, 0.00, 0.00, 0.2016, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.1870, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.2600, 0.25, 0.00, 0.00, 0.00, 0.4602, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.3600, 0.00, 0.00, 0.25, 0.00, 0.3366, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6200, 0.00, 0.25, 0.00, 0.00, 0.5756, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.2000, 0.25, 0.00, 0.00, 0.00, 0.2943, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.5000, 0.00, 0.00, 0.25, 0.00, 0.5902, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.5600, 0.00, 0.25, 0.00, 0.00, 0.6260, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.7800, 0.25, 0.00, 0.00, 0.00, 0.8049, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.6800, 0.00, 0.00, 0.25, 0.00, 0.6358, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.2600, 0.00, 0.00, 0.25, 0.00, 0.3772, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.7400, 0.25, 0.00, 0.00, 0.00, 0.6780, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.6400, 0.25, 0.00, 0.00, 0.00, 0.5870, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6000, 0.00, 0.25, 0.00, 0.00, 0.5789, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.0800, 0.00, 0.00, 0.25, 0.00, 0.2309, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.8200, 0.00, 0.00, 0.25, 0.00, 0.7545, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.3200, 0.25, 0.00, 0.00, 0.00, 0.3659, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.9200, 0.25, 0.00, 0.00, 0.00, 0.9252, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2200, 0.00, 0.00, 0.25, 0.00, 0.2146, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.3200, 0.00, 0.25, 0.00, 0.00, 0.3724, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.8600, 0.25, 0.00, 0.00, 0.00, 0.8894, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.9200, 0.00, 0.00, 0.25, 0.00, 0.8260, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.3415, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.9000, 0.00, 0.25, 0.00, 0.00, 0.8179, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.2200, 0.00, 0.25, 0.00, 0.00, 0.3203, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.6600, 0.25, 0.00, 0.00, 0.00, 0.6894, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.1200, 0.00, 0.00, 0.25, 0.00, 0.2829, 0.3333, 0.0000, 0.0000
0.5, 0.50, 0.6000, 0.00, 0.25, 0.00, 0.00, 0.6049, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.1154, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.0000, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.3000, 0.00, 0.25, 0.00, 0.00, 0.2911, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.0400, 0.00, 0.00, 0.25, 0.00, 0.2358, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.2200, 0.00, 0.00, 0.25, 0.00, 0.2065, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.5200, 0.00, 0.00, 0.25, 0.00, 0.6943, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.9400, 0.00, 0.00, 0.25, 0.00, 1.0000, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.7600, 0.25, 0.00, 0.00, 0.00, 0.7057, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.6800, 0.00, 0.00, 0.25, 0.00, 0.6195, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.2200, 0.00, 0.25, 0.00, 0.00, 0.4602, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5800, 0.00, 0.25, 0.00, 0.00, 0.6276, 0.0000, 0.3333, 0.0000
0.5, 0.50, 1.0000, 0.25, 0.00, 0.00, 0.00, 0.8504, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2600, 0.00, 0.00, 0.25, 0.00, 0.2553, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.8600, 0.00, 0.25, 0.00, 0.00, 0.6862, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.0200, 0.00, 0.25, 0.00, 0.00, 0.0195, 0.0000, 0.0000, 0.3333
0.5, 0.75, 0.4000, 0.00, 0.00, 0.25, 0.00, 0.3691, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.1600, 0.25, 0.00, 0.00, 0.00, 0.3577, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.8600, 0.00, 0.25, 0.00, 0.00, 0.7659, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.4400, 0.25, 0.00, 0.00, 0.00, 0.4260, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.6200, 0.25, 0.00, 0.00, 0.00, 0.7301, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.7600, 0.25, 0.00, 0.00, 0.00, 0.7675, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.6000, 0.00, 0.25, 0.00, 0.00, 0.7431, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.6800, 0.25, 0.00, 0.00, 0.00, 0.5854, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.1545, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.7600, 0.00, 0.00, 0.25, 0.00, 0.6341, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.6800, 0.00, 0.25, 0.00, 0.00, 0.7171, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.0000, 0.00, 0.25, 0.00, 0.00, 0.1350, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.8000, 0.25, 0.00, 0.00, 0.00, 0.7463, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.4200, 0.00, 0.25, 0.00, 0.00, 0.5659, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.5600, 0.25, 0.00, 0.00, 0.00, 0.6927, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.4400, 0.00, 0.25, 0.00, 0.00, 0.4211, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.8400, 0.25, 0.00, 0.00, 0.00, 0.8520, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.3600, 0.00, 0.25, 0.00, 0.00, 0.3317, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.5200, 0.25, 0.00, 0.00, 0.00, 0.5203, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.2000, 0.25, 0.00, 0.00, 0.00, 0.1789, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.7200, 0.00, 0.00, 0.25, 0.00, 0.6878, 0.3333, 0.0000, 0.0000
0.0, 0.50, 0.6600, 0.25, 0.00, 0.00, 0.00, 0.6650, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.2800, 0.00, 0.25, 0.00, 0.00, 0.4195, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.7400, 0.25, 0.00, 0.00, 0.00, 0.6894, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.1400, 0.00, 0.00, 0.25, 0.00, 0.0959, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.3000, 0.00, 0.00, 0.25, 0.00, 0.2764, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.2200, 0.00, 0.25, 0.00, 0.00, 0.4211, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.9400, 0.25, 0.00, 0.00, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5000, 0.00, 0.25, 0.00, 0.00, 0.5057, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.7200, 0.00, 0.25, 0.00, 0.00, 0.7236, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.8600, 0.00, 0.25, 0.00, 0.00, 0.8520, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.6800, 0.00, 0.25, 0.00, 0.00, 0.7041, 0.3333, 0.0000, 0.0000
0.5, 0.25, 0.2400, 0.00, 0.25, 0.00, 0.00, 0.2146, 0.0000, 0.0000, 0.3333
0.5, 0.25, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.1805, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.5800, 0.00, 0.00, 0.25, 0.00, 0.6358, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.4200, 0.00, 0.25, 0.00, 0.00, 0.4472, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.5800, 0.00, 0.00, 0.25, 0.00, 0.5154, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.6200, 0.25, 0.00, 0.00, 0.00, 0.6228, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.9000, 0.00, 0.00, 0.25, 0.00, 0.7659, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.2400, 0.25, 0.00, 0.00, 0.00, 0.3073, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.8600, 0.00, 0.00, 0.25, 0.00, 0.8016, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.5800, 0.00, 0.00, 0.25, 0.00, 0.6244, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.2400, 0.00, 0.00, 0.25, 0.00, 0.2309, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.6600, 0.00, 0.00, 0.25, 0.00, 0.6130, 0.0000, 0.3333, 0.0000
0.0, 0.50, 0.1200, 0.25, 0.00, 0.00, 0.00, 0.3008, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.6200, 0.25, 0.00, 0.00, 0.00, 0.7187, 0.0000, 0.3333, 0.0000
0.5, 0.50, 0.9600, 0.00, 0.00, 0.25, 0.00, 0.8813, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.9400, 0.25, 0.00, 0.00, 0.00, 0.9203, 0.3333, 0.0000, 0.0000
0.0, 0.25, 0.5600, 0.00, 0.25, 0.00, 0.00, 0.6130, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.5400, 0.00, 0.00, 0.25, 0.00, 0.5122, 0.0000, 0.3333, 0.0000
0.0, 0.25, 0.5800, 0.25, 0.00, 0.00, 0.00, 0.7041, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.2200, 0.25, 0.00, 0.00, 0.00, 0.3984, 0.3333, 0.0000, 0.0000
0.0, 0.75, 0.7800, 0.00, 0.00, 0.25, 0.00, 0.7967, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.0400, 0.25, 0.00, 0.00, 0.00, 0.1366, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.3400, 0.25, 0.00, 0.00, 0.00, 0.3756, 0.0000, 0.3333, 0.0000
0.0, 0.75, 0.8600, 0.00, 0.00, 0.25, 0.00, 0.7593, 0.0000, 0.0000, 0.3333
0.0, 0.25, 0.2600, 0.00, 0.00, 0.25, 0.00, 0.2764, 0.0000, 0.3333, 0.0000
0.5, 0.25, 0.0000, 0.25, 0.00, 0.00, 0.00, 0.0081, 0.0000, 0.0000, 0.3333
0.5, 0.50, 0.1600, 0.00, 0.00, 0.25, 0.00, 0.1447, 0.0000, 0.0000, 0.3333
0.0, 0.50, 0.2000, 0.25, 0.00, 0.00, 0.00, 0.2618, 0.0000, 0.0000, 0.3333
0.0, 0.75, 0.8200, 0.00, 0.00, 0.25, 0.00, 0.7984, 0.0000, 0.0000, 0.3333
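
The mapping from a raw line to an encoded line can be sketched as follows. This is a minimal sketch, not the original preprocessing script: the function name encodeLine() is hypothetical, and the constants (0.25 / 0.50 / 0.75 for height, 0.25 and 0.3333 for the one-hot "on" values) are read off the file header and the sample rows above.

```javascript
// Sketch: encode one raw line such as
// "M  tall    59  delaware  69400  liberal".
// All constants are inferred from the people_240.txt header and rows.
const HEIGHTS = { short: 0.25, medium: 0.50, tall: 0.75 };
const STATES = ["arkansas", "colorado", "delaware", "illinois"];
const POLITICS = ["conservative", "moderate", "liberal"];
const AGE_MIN = 18, AGE_MAX = 68;
const INC_MIN = 20300, INC_MAX = 81800;

function encodeLine(line) {
  const [sex, height, age, state, income, politics] =
    line.trim().split(/\s+/);
  const out = [];
  out.push((sex === "M" ? 0.0 : 0.5).toFixed(1));
  out.push(HEIGHTS[height].toFixed(2));
  // min-max normalization of age
  out.push(((Number(age) - AGE_MIN) / (AGE_MAX - AGE_MIN)).toFixed(4));
  // one-hot state, "on" value 0.25
  for (const s of STATES)
    out.push((s === state ? 0.25 : 0.0).toFixed(2));
  // min-max normalization of income
  out.push(((Number(income) - INC_MIN) / (INC_MAX - INC_MIN)).toFixed(4));
  // one-hot political leaning, "on" value 0.3333
  for (const p of POLITICS)
    out.push((p === politics ? 0.3333 : 0.0).toFixed(4));
  return out.join(", ");
}

console.log(encodeLine("M  tall    59  delaware  69400  liberal"));
// 0.0, 0.75, 0.8200, 0.00, 0.00, 0.25, 0.00, 0.7984,
// 0.0000, 0.0000, 0.3333  (printed on one line)
```

The example input is the last raw line of the listing, and the result matches the last encoded line.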

A Lightweight Five-Card Poker Library Using JavaScript

One evening, I just couldn’t fall asleep. So I decided to implement a lightweight five-card poker library using JavaScript. My library has a Card class, a Hand class, and a SingleDeck class. The three main functions are: 1.) classify a hand (like “FullHouse”), 2.) compare two hands to determine which hand is better, 3.) deal a hand from a deck of 52 cards.

I didn’t implement my poker library starting from nothing — I refactored my existing C# poker library. The C# poker library took many hours to create, but my JavaScript version only took about four hours of work.

There are two ways to create a Card object:

  let c1 = Card.fromInts(14,3); // Ace of spades
  console.log(c1.toString());

  let c2 = Card.fromStr("Td");  // Ten of diamonds
  console.log(c2.toString());

The first pseudo-constructor accepts a rank and a suit as integers/numbers. The rank values are 2 = Two, 3 = Three, . . 10 = Ten, 11 = Jack, 12 = Queen, 13 = King, 14 = Ace. Rank values of 0 and 1 are not used. The suit values are 0 = clubs, 1 = diamonds, 2 = hearts, 3 = spades. The second pseudo-constructor accepts a string like “Td”. Because JavaScript doesn’t allow function/method overloading, to simulate overloading I defined two static methods.

There are three main ways to create a five-card Hand object:

  let h1 = Hand.fromStr("7cTsJc8d9h");
  console.log(h1.toString());  // 7c8d9hTsJc
  
  let h2 = Hand.fromCards(Card.fromStr("6s"),
    Card.fromStr("Ah"), Card.fromStr("6h"),
    Card.fromStr("Ac"), Card.fromStr("6d"));
  console.log(h2.toString());  // 6d6h6sAcAh
  
  let lst = [];
  lst.push(Card.fromStr("5c")); lst.push(Card.fromStr("5d"));
  lst.push(Card.fromStr("9c")); lst.push(Card.fromStr("9d"));
  lst.push(Card.fromStr("Qh"));
  let h3 = Hand.fromList(lst);
  console.log(h3.toString());  // 5c5d9c9dQh

The first pseudo-constructor accepts an easy-to-interpret string such as “7cTsJc8d9h”. The second pseudo-constructor accepts five individual Card objects. The third pseudo-constructor accepts a List of five Card objects.

Hand objects are sorted from low card (“2c”) to high card (“As”). The sorting makes a hand easier to interpret, and much easier to classify and compare.

There are two methods to classify a Hand object. The getHandTypeStr() method returns one of ten strings: “HighCard”, “OnePair”, “TwoPair”, “ThreeKind”, “Straight”, “Flush”, “FullHouse”, “FourKind”, “StraightFlush”, “RoyalFlush”. The getHandTypeInt() method returns an integer from 0 (high card) through 9 (royal flush).

  console.log(h1.getHandTypeStr())  // Straight
  console.log(h1.getHandTypeInt().toString())  // 4

  console.log(h2.getHandTypeStr())  // FullHouse
  console.log(h2.getHandTypeInt().toString())  // 6

  console.log(h3.getHandTypeStr())  // TwoPair
  console.log(h3.getHandTypeInt().toString())  // 2
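
The demo code further down tests each hand type with its own is...() function. As a compact alternative view of the same classification, here is a sketch that works from rank counts instead. It is not the library's code: classifyHand() is a hypothetical standalone function that takes a ten-character hand string, assumes the string is valid, and returns the same 0 through 9 codes.

```javascript
// Sketch: classify a five-card string like "7cTsJc8d9h" into 0..9.
// classifyHand() is a hypothetical name, not part of the library.
const RANKS = "23456789TJQKA";  // index + 2 = rank value

function classifyHand(str) {
  const ranks = [], suits = [];
  for (let i = 0; i < 10; i += 2) {
    ranks.push(RANKS.indexOf(str.charAt(i)) + 2);
    suits.push(str.charAt(i + 1));
  }
  ranks.sort((a, b) => a - b);

  const flush = suits.every(s => s === suits[0]);

  // five distinct ranks spanning 4, or the Ace-low case 2,3,4,5,A
  const distinct = new Set(ranks).size === 5;
  const aceLow = ranks.join(",") === "2,3,4,5,14";
  const straight = distinct && (ranks[4] - ranks[0] === 4 || aceLow);

  // rank multiplicities, largest first: "3,2" is a full house
  const counts = {};
  for (const r of ranks) counts[r] = (counts[r] || 0) + 1;
  const pattern = Object.values(counts).sort((a, b) => b - a).join(",");

  if (straight && flush) return (ranks[0] === 10) ? 9 : 8;
  if (pattern === "4,1") return 7;
  if (pattern === "3,2") return 6;
  if (flush) return 5;
  if (straight) return 4;
  if (pattern === "3,1,1") return 3;
  if (pattern === "2,2,1") return 2;
  if (pattern === "2,1,1,1") return 1;
  return 0;
}

console.log(classifyHand("7cTsJc8d9h"));  // 4 (Straight)
console.log(classifyHand("6sAh6hAc6d"));  // 6 (FullHouse)
console.log(classifyHand("5c5d9c9dQh"));  // 2 (TwoPair)
```

The three calls use the same cards as h1, h2 and h3 above and give the same integer codes.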

There is a static Hand.compare(h1, h2) function. It returns -1 if h1 is less than h2, returns +1 if h1 is greater than h2, returns 0 if h1 equals h2.

  let cmp1 = Hand.compare(h1, h2);  // -1: Straight < FullHouse
  console.log(cmp1.toString());

  let cmp2 = Hand.compare(h2, h3);  // +1: FullHouse > TwoPair
  console.log("\nHand.compare(h2, h3) = ");
  console.log(cmp2.toString());

The SingleDeck class has a dealHand() method and a dealListCards() method. The dealHand() method returns a Hand object containing five Card objects. The dealListCards(n) method returns a List/Array of n Card objects.

  let d1 = new SingleDeck(1);
  d1.shuffle();
  d1.show();

  let h4 = d1.dealHand();
  console.log(h4.toString());

  let listCards = d1.dealListCards(38);
  console.log("Deck is now: ");  // 9 cards left
  d1.show();

To shuffle the deck, I implemented a poor man’s random number generator using the decimal part of the sin() function.
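
The idea can be sketched as below. This is only an illustration of the technique, not the SingleDeck implementation: the class name SinRandom, the way the seed advances, and the 10000 multiplier are all assumptions. A generator like this is fine for demos but is not statistically strong.

```javascript
// Sketch of a sin()-based pseudo-random generator. The class name,
// seed handling and the 10000 multiplier are illustrative assumptions.
class SinRandom {
  constructor(seed) {
    this.seed = seed;
  }

  next() {  // value in [0.0, 1.0)
    const x = Math.sin(this.seed++) * 10000;
    const frac = x - Math.floor(x);  // decimal part, also for negative x
    return frac < 1 ? frac : 0;      // guard against rounding up to 1.0
  }

  nextInt(lo, hi) {  // integer in [lo, hi)
    return lo + Math.floor(this.next() * (hi - lo));
  }
}

// Fisher-Yates shuffle driven by the generator above
function shuffle(arr, rnd) {
  for (let i = arr.length - 1; i > 0; --i) {
    const j = rnd.nextInt(0, i + 1);
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

// 52 card indices 0..51, shuffled reproducibly from seed 1
const deckIdx = shuffle([...Array(52).keys()], new SinRandom(1));
console.log(deckIdx.slice(0, 5));
```

Because the sequence depends only on the seed, the same seed always gives the same shuffled order, which is handy when debugging.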

The JavaScript poker library can be used in several ways. You can compute the probabilities of different hands using a simulation. You can find the best five-card hand from seven cards. And so on. I’ll post some examples at some point if I run into another sleepless night.
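
As a rough sketch of the seven-card idea: there are only 21 ways to choose five cards from seven, so it is practical to enumerate them all and keep the best. The function names combinations() and bestFiveOfSeven() are hypothetical, and the comparator used in the demo is a deliberately crude stand-in (sum of ranks). With the real library you would pass a function that builds two hands with Hand.fromList() and calls Hand.compare().

```javascript
// Sketch: all k-element subsets of arr, in index order.
function combinations(arr, k) {
  const result = [];
  const pick = (start, chosen) => {
    if (chosen.length === k) { result.push(chosen.slice()); return; }
    for (let i = start; i < arr.length; ++i) {
      chosen.push(arr[i]);
      pick(i + 1, chosen);
      chosen.pop();
    }
  };
  pick(0, []);
  return result;
}

// compareFn(a, b) is assumed to behave like Hand.compare():
// -1, 0 or +1 for two five-card arrays.
function bestFiveOfSeven(sevenCards, compareFn) {
  let best = null;
  for (const five of combinations(sevenCards, 5)) {
    if (best === null || compareFn(five, best) > 0) best = five;
  }
  return best;
}

// Stand-in comparator for demonstration only: sum of ranks.
// It does NOT rank poker hands correctly.
const rankOf = (c) => "23456789TJQKA".indexOf(c.charAt(0)) + 2;
const sumCompare = (a, b) => {
  const sa = a.reduce((s, c) => s + rankOf(c), 0);
  const sb = b.reduce((s, c) => s + rankOf(c), 0);
  return Math.sign(sa - sb);
};

const seven = ["2c", "9d", "Ks", "4h", "As", "7d", "Qc"];
console.log(combinations(seven, 5).length);  // 21
console.log(bestFiveOfSeven(seven, sumCompare).join(""));
```

The same enumeration loop also works for simulations: deal many random hands, classify each one, and count how often each hand type appears.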



I have loved cards and card games for as long as I can remember. Here are three examples I found on the Internet that I remember using when I was a young man in the 1960s. Left: The Lane company was a leading maker of plastic coated cards in the 1950s and 60s. The oriental theme seemed exotic and mysterious to me. I don’t think Lane is still around. Center: The KEM company was another leading maker of high-quality plastic coated cards. KEM is still in existence. I liked the geometry and colors of this set of two decks. Right: The Fournier company wasn’t as popular as Lane and KEM, but Fournier made some interesting and offbeat cards. Fournier is still in existence too. I think my family’s Fournier deck came from my grandfather on my mother’s side. Fournier is a Spanish company. My grandfather was French and always brought us interesting gifts from Europe.


Demo code. Replace “lt” (less-than), “gt”, “lte”, “gte”, “and” with Boolean operator symbols.

// poker.js
// ES6  node.js

// ----------------------------------------------------------

class Card
{
  constructor()
  {
    // returns dummy Card for fromInts(), fromStr()
    this.rank = -1;  // 2 = Two, . . 14 = Ace
    this.suit = -1;  // 0=clubs, diamonds, hearts, 3=spades
  }

  static fromInts(rnk, sut) {
    let result = new Card();
    result.rank = rnk;
    result.suit = sut;
    return result;
  }

  static fromStr(str) {
    let result = new Card();
    let rnk = str.charAt(0);
    let sut = str.charAt(1);

    if (rnk == 'A') result.rank = 14;
    else if (rnk == 'K') result.rank = 13;
    else if (rnk == 'Q') result.rank = 12;
    else if (rnk == 'J') result.rank = 11;
    else if (rnk == 'T') result.rank = 10;
    else result.rank = parseInt(rnk);
   
    if (sut == 'c') result.suit = 0;
    else if (sut == 'd') result.suit = 1;
    else if (sut == 'h') result.suit = 2;
    else if (sut == 's') result.suit = 3;

    return result;
  }

  toString() {
    let rnk = ""; let sut = "";
    if (this.rank == 10) rnk = "T";
    else if (this.rank == 11) rnk = "J";
    else if (this.rank == 12) rnk = "Q";
    else if (this.rank == 13) rnk = "K";
    else if (this.rank == 14) rnk = "A";
    else rnk = this.rank.toString();

    if (this.suit == 0) sut = "c";
    else if (this.suit == 1) sut = "d";
    else if (this.suit == 2) sut = "h";
    else if (this.suit == 3) sut = "s";

    return rnk + sut;
  }

} // class Card

// ----------------------------------------------------------

class Hand
{
  constructor()
  {
    this.cards = [];  // make dummy 2c, 3c, 4c, 5c, 6c
    for (let i = 0; i < 5; ++i)
      this.cards[i] = Card.fromInts(i+2, 0);
  }

  static fromStr(str) {  // like "Js3h7d7cAd"
    let result = new Hand();  // dummy hand
    result.cards[0] = Card.fromStr(str.substring(0,2));
    result.cards[1] = Card.fromStr(str.substring(2,4));
    result.cards[2] = Card.fromStr(str.substring(4,6));
    result.cards[3] = Card.fromStr(str.substring(6,8));
    result.cards[4] = Card.fromStr(str.substring(8,10));

    // sort the Hand low to high by rank then by suit
    result.cards.sort((a,b) => a.rank - b.rank ||
      a.suit - b.suit);
    return result;
  }

  static fromCards(c0, c1, c2, c3, c4) {
    let result = new Hand();  // dummy hand
    result.cards[0] = c0;
    result.cards[1] = c1;
    result.cards[2] = c2;
    result.cards[3] = c3;
    result.cards[4] = c4;
    result.cards.sort((a,b) => a.rank - b.rank ||
      a.suit - b.suit);
    return result;
  }

  static fromList(lst) {
    let result = new Hand();  // dummy hand
    result.cards[0] = lst[0];
    result.cards[1] = lst[1];
    result.cards[2] = lst[2];
    result.cards[3] = lst[3];
    result.cards[4] = lst[4];
    result.cards.sort((a,b) => a.rank - b.rank ||
      a.suit - b.suit);
    return result;
  }

  toString() {
    let result = "";
    for (let i = 0; i < 5; ++i)
      result += this.cards[i].toString();
    return result;
  }

  // Hand type functions
  // getHandTypeStr(), getHandTypeInt(),
  //
  // isRoyalFlush(), isStraightFlush(), 
  // isFourKind(), isFullHouse(), isFlush(),
  // isStraight(), isThreeKind(), isTwoPair(),
  // isOnePair(), isHighCard()
  //
  // helpers: hasFlush(), hasStraight()

  // --------------------------------------------------------

  getHandTypeStr() {
    if (Hand.isRoyalFlush(this) == true)
      return "RoyalFlush";
    else if (Hand.isStraightFlush(this) == true)
      return "StraightFlush";
    else if (Hand.isFourKind(this) == true)
      return "FourKind";
    else if (Hand.isFullHouse(this) == true)
      return "FullHouse";
    else if (Hand.isFlush(this) == true)
      return "Flush";
    else if (Hand.isStraight(this) == true)
      return "Straight";
    else if (Hand.isThreeKind(this) == true)
      return "ThreeKind";
    else if (Hand.isTwoPair(this) == true)
      return "TwoPair";
    else if (Hand.isOnePair(this) == true)
      return "OnePair";
    else if (Hand.isHighCard(this) == true)
      return "HighCard";
    else
      return "Unknown";
  }

  // --------------------------------------------------------

  getHandTypeInt() {
    if (Hand.isRoyalFlush(this) == true)
      return 9;
    else if (Hand.isStraightFlush(this) == true)
      return 8;
    else if (Hand.isFourKind(this) == true)
      return 7;
    else if (Hand.isFullHouse(this) == true)
      return 6;
    else if (Hand.isFlush(this) == true)
      return 5;
    else if (Hand.isStraight(this) == true)
      return 4;
    else if (Hand.isThreeKind(this) == true)
      return 3;
    else if (Hand.isTwoPair(this) == true)
      return 2;
    else if (Hand.isOnePair(this) == true)
      return 1;
    else if (Hand.isHighCard(this) == true)
      return 0;
    else
      return -1;
  }

  // --------------------------------------------------------

  static hasFlush(h) {
    if ((h.cards[0].suit == h.cards[1].suit) "and"
      (h.cards[1].suit == h.cards[2].suit) "and"
      (h.cards[2].suit == h.cards[3].suit) "and"
      (h.cards[3].suit == h.cards[4].suit))
      return true;

    return false;
  }

  // --------------------------------------------------------

  static hasStraight(h) {
    // check special case of Ace-low straight
    // 2, 3, 4, 5, A when sorted
    if (h.cards[0].rank == 2 "and"
      h.cards[1].rank == 3 "and"
      h.cards[2].rank == 4 "and"
      h.cards[3].rank == 5 "and"
      h.cards[4].rank == 14)
      return true;

    // otherwise, check for 5 consecutive
    if ((h.cards[0].rank == h.cards[1].rank - 1) "and"
      (h.cards[1].rank == h.cards[2].rank - 1) "and"
      (h.cards[2].rank == h.cards[3].rank - 1) "and"
      (h.cards[3].rank == h.cards[4].rank - 1))
      return true;

    return false;
  }

  // --------------------------------------------------------

  static isRoyalFlush(h) {
    if (Hand.hasStraight(h) == true "and" 
      Hand.hasFlush(h) == true "and"
      h.cards[0].rank == 10)
      return true;
    else
      return false;
  }

  // --------------------------------------------------------

  static isStraightFlush(h) {
    if (Hand.hasStraight(h) == true "and"
      Hand.hasFlush(h) == true "and"
      h.cards[0].rank != 10)
      return true;
    else
      return false;
  }

  // --------------------------------------------------------

  static isFourKind(h) {
    // AAAA B or B AAAA if sorted
    if ((h.cards[0].rank == h.cards[1].rank) "and"
      (h.cards[1].rank == h.cards[2].rank) "and"
      (h.cards[2].rank == h.cards[3].rank) "and"
      (h.cards[3].rank != h.cards[4].rank))
      return true;

    if ((h.cards[1].rank == h.cards[2].rank) "and"
      (h.cards[2].rank == h.cards[3].rank) "and"
      (h.cards[3].rank == h.cards[4].rank) "and"
      (h.cards[0].rank != h.cards[1].rank))
      return true;

    return false;
  }

  // --------------------------------------------------------

  static isFullHouse(h) {
    // AAA BB or BB AAA if sorted
    if ((h.cards[0].rank == h.cards[1].rank) "and"
      (h.cards[1].rank == h.cards[2].rank) "and"
      (h.cards[3].rank == h.cards[4].rank) "and"
      (h.cards[2].rank != h.cards[3].rank))
      return true;

    // BB AAA
    if ((h.cards[0].rank == h.cards[1].rank) "and"
      (h.cards[2].rank == h.cards[3].rank) "and"
      (h.cards[3].rank == h.cards[4].rank) "and"
      (h.cards[1].rank != h.cards[2].rank))
      return true;

    return false;
  }

  // --------------------------------------------------------

  static isFlush(h) {
    if (Hand.hasFlush(h) == true "and" 
      Hand.hasStraight(h) == false)
      return true; // no StraightFlush or RoyalFlush
    else
      return false;
  }

  // --------------------------------------------------------

  static isStraight(h) {
    if (Hand.hasStraight(h) == true "and" 
      Hand.hasFlush(h) == false) // no SF or RF
      return true;
    else
      return false;
  }

  // --------------------------------------------------------

  static isThreeKind(h) {
    // AAA B C or B AAA C or B C AAA if sorted
    if ((h.cards[0].rank == h.cards[1].rank) "and"
      (h.cards[1].rank == h.cards[2].rank) "and"
      (h.cards[2].rank != h.cards[3].rank) "and"
      (h.cards[3].rank != h.cards[4].rank))
      return true;

    if ((h.cards[1].rank == h.cards[2].rank) "and"
      (h.cards[2].rank == h.cards[3].rank) "and"
      (h.cards[0].rank != h.cards[1].rank) "and"
      (h.cards[3].rank != h.cards[4].rank))
      return true;

    if ((h.cards[2].rank == h.cards[3].rank) "and"
      (h.cards[3].rank == h.cards[4].rank) "and"
      (h.cards[0].rank != h.cards[1].rank) "and"
      (h.cards[1].rank != h.cards[2].rank))
      return true;

    return false;
  }

  // --------------------------------------------------------

  static isTwoPair(h) {
    // AA BB C or AA C BB or C AA BB if sorted
    if ((h.cards[0].rank == h.cards[1].rank) "and"
      (h.cards[2].rank == h.cards[3].rank) "and"
      (h.cards[1].rank != h.cards[2].rank) "and"
      (h.cards[3].rank != h.cards[4].rank))
      return true;  // AA BB C

    if ((h.cards[0].rank == h.cards[1].rank) "and"
      (h.cards[3].rank == h.cards[4].rank) "and"
      (h.cards[1].rank != h.cards[2].rank) "and"
      (h.cards[2].rank != h.cards[3].rank))
      return true;  // AA C BB

    if ((h.cards[1].rank == h.cards[2].rank) "and"
      (h.cards[3].rank == h.cards[4].rank) "and"
      (h.cards[0].rank != h.cards[1].rank) "and"
      (h.cards[2].rank != h.cards[3].rank))
      return true;  // C AA BB

    return false;
  }

  // --------------------------------------------------------

  static isOnePair(h) {
    // AA B C D or B AA C D or B C AA D or B C D AA
    if ((h.cards[0].rank == h.cards[1].rank) "and"
      (h.cards[1].rank != h.cards[2].rank) "and"
      (h.cards[2].rank != h.cards[3].rank) "and"
      (h.cards[3].rank != h.cards[4].rank))
      return true;  // AA B C D

    if ((h.cards[1].rank == h.cards[2].rank) "and"
      (h.cards[0].rank != h.cards[1].rank) "and"
      (h.cards[2].rank != h.cards[3].rank) "and"
      (h.cards[3].rank != h.cards[4].rank))
      return true;  // B AA C D

    if ((h.cards[2].rank == h.cards[3].rank) "and"
      (h.cards[0].rank != h.cards[1].rank) "and"
      (h.cards[1].rank != h.cards[2].rank) "and"
      (h.cards[3].rank != h.cards[4].rank))
      return true;  // B C AA D

    if ((h.cards[3].rank == h.cards[4].rank) "and"
      (h.cards[0].rank != h.cards[1].rank) "and"
      (h.cards[1].rank != h.cards[2].rank) "and"
      (h.cards[2].rank != h.cards[3].rank))
      return true;  // B C D AA

    return false;
  }

  // --------------------------------------------------------

  static isHighCard(h) {
    if (Hand.hasFlush(h) == true)
      return false;
    else if (Hand.hasStraight(h) == true)
      return false;
    else  {
      // any two adjacent equal ranks means at least one pair
      if ((h.cards[0].rank == h.cards[1].rank) ||
        (h.cards[1].rank == h.cards[2].rank) ||
        (h.cards[2].rank == h.cards[3].rank) ||
        (h.cards[3].rank == h.cards[4].rank))
        return false;
    }

    return true;
  }

  // --------------------------------------------------------

  // Hand comparison methods
  // Hand.compare() calls:
  // breakTieStraightFlush(), breakTieFourKind(),
  // breakTieFullHouse(), breakTieFlush(),
  // breakTieStraight(), breakTieThreeKind(),
  // breakTieTwoPair(), breakTieOnePair(),
  // breakTieHighCard()

  // --------------------------------------------------------

  static compare(h1, h2) {
    // -1 if h1 "lt" h2, +1 if h1 "gt" h2, 0 if h1 == h2

    let h1Idx = h1.getHandTypeInt();  // like 6
    let h2Idx = h2.getHandTypeInt();

    // different hand types - easy
    if (h1Idx "lt" h2Idx)
      return -1;
    else if (h1Idx "gt" h2Idx)
      return +1;
    else // same hand types so break tie
    {
      let h1HandType = h1.getHandTypeStr();
      let h2HandType = h2.getHandTypeStr();

      if (h1HandType != h2HandType)
        console.log("Logic error in Hand.compare() ");

      if (h1HandType == "RoyalFlush")
        return 0; // two Royal Flush always tie
      else if (h1HandType == "StraightFlush")
        return Hand.breakTieStraightFlush(h1, h2);
      else if (h1HandType == "FourKind")
        return Hand.breakTieFourKind(h1, h2);
      else if (h1HandType == "FullHouse")
        return Hand.breakTieFullHouse(h1, h2);
      else if (h1HandType == "Flush")
        return Hand.breakTieFlush(h1, h2);
      else if (h1HandType == "Straight")
        return Hand.breakTieStraight(h1, h2);
      else if (h1HandType == "ThreeKind")
        return Hand.breakTieThreeKind(h1, h2);
      else if (h1HandType == "TwoPair")
        return Hand.breakTieTwoPair(h1, h2);
      else if (h1HandType == "OnePair")
        return Hand.breakTieOnePair(h1, h2);
      else if (h1HandType == "HighCard")
        return Hand.breakTieHighCard(h1, h2);
    }
    return -2;  // error
  }

  // --------------------------------------------------------

  static breakTieStraightFlush(h1, h2) {
    // check special case of Ace-low straight flush
    // (one or both hands may be Ace-low)
    // h1 is Ace-low, h2 not Ace-low, so h1 is less
    if ((h1.cards[0].rank == 2 "and"
      h1.cards[4].rank == 14) "and"  // because sorted!
      !(h2.cards[0].rank == 2 "and"
      h2.cards[4].rank == 14))
      return -1;
 
    // h1 not Ace-low, h2 is Ace-low, so h1 is better
    else if (!(h1.cards[0].rank == 2 "and"
      h1.cards[4].rank == 14) "and"
      (h2.cards[0].rank == 2 "and"
      h2.cards[4].rank == 14))
      return +1;
    //  two Ace-low hands
    else if ((h1.cards[0].rank == 2 "and"
      h1.cards[4].rank == 14) "and"  // Ace-low
      (h2.cards[0].rank == 2 "and"
      h2.cards[4].rank == 14))  // Ace-low
      return 0;

    //  no Ace-low straight flush so check high cards
    if (h1.cards[4].rank "lt" h2.cards[4].rank)
      return -1;
    else if (h1.cards[4].rank "gt" h2.cards[4].rank)
      return 1;
    else
      return 0;
  }

  // --------------------------------------------------------

  static breakTieFourKind(h1, h2) {
    // AAAA-B or B-AAAA
    // the off-card is at [0] or at [4] (hand is sorted)
    // find h1 four-card and off-card ranks
    let h1FourRank; let h1OffRank;
    if (h1.cards[0].rank == h1.cards[1].rank) {
      // 1st two cards same so off-rank at [4]
      h1FourRank = h1.cards[0].rank;
      h1OffRank = h1.cards[4].rank;
    }
    else {
      // 1st two cards diff so off-rank at [0]
      h1FourRank = h1.cards[4].rank;
      h1OffRank = h1.cards[0].rank;
    }

    let h2FourRank; let h2OffRank;
    if (h2.cards[0].rank == h2.cards[1].rank) {
      h2FourRank = h2.cards[0].rank;
      h2OffRank = h2.cards[4].rank;
    }
    else {
      h2FourRank = h2.cards[4].rank;
      h2OffRank = h2.cards[0].rank;
    }

    if (h1FourRank "lt" h2FourRank) // like 4K, 4A
      return -1;
    else if (h1FourRank "gt" h2FourRank)
      return +1;
    else { // both hands have same four-kind (mult. decks)
      if (h1OffRank "lt" h2OffRank)
        return -1;  // like 3c 9c9d9h9s "lt" Qd 9c9d9h9s
      else if (h1OffRank "gt" h2OffRank)
        return +1;  // like Jc 4c4d4h4s "gt" 9s 4c4d4h4s
      else if (h1OffRank == h2OffRank)
        return 0;
    }
    console.log("Fatal logic in breakTieFourKind");
  }

  // --------------------------------------------------------

  static breakTieFullHouse(h1, h2) {
    // determine high rank (3 kind) and low rank (2 kind)
    // AAA BB or AA BBB
    // if [1] == [2] 3 kind at [0][1][2]
    // if [1] != [2] 3 kind at [2][3][4]
    let h1ThreeRank; let h1TwoRank;
    if (h1.cards[1].rank == h1.cards[2].rank) {
      // if [1] == [2] 3 kind at [0][1][2]
      h1ThreeRank = h1.cards[0].rank;
      h1TwoRank = h1.cards[4].rank;
    }
    else  {
      // if [1] != [2] 3 kind at [2][3][4]
      h1ThreeRank = h1.cards[4].rank;
      h1TwoRank = h1.cards[0].rank;
    }

    let h2ThreeRank; let h2TwoRank;
    if (h2.cards[1].rank == h2.cards[2].rank) {
      // if [1] == [2] 3 kind at [0][1][2]
      h2ThreeRank = h2.cards[0].rank;
      h2TwoRank = h2.cards[4].rank;
    }
    else {
      // if [1] != [2] 3 kind at [2][3][4]
      h2ThreeRank = h2.cards[4].rank;
      h2TwoRank = h2.cards[0].rank;
    }

    if (h1ThreeRank "lt" h2ThreeRank)
      return -1;
    else if (h1ThreeRank "gt" h2ThreeRank)
      return +1;
    else { // both hands same three-kind (mult. decks)
      if (h1TwoRank "lt" h2TwoRank)
        return -1;  // like 3c3d 9c9d9h "lt" QdQs 9c9d9h
      else if (h1TwoRank "gt" h2TwoRank)
        return +1;  // like 3c3d 9c9d9h "gt" 2d2s 9c9d9h
      else if (h1TwoRank == h2TwoRank)
        return 0;
    }
    console.log("Fatal logic in breakTieFullHouse");
  }

  // --------------------------------------------------------

  static breakTieFlush(h1, h2) {
    // compare rank of high cards
    if (h1.cards[4].rank "lt" h2.cards[4].rank)
      return -1;
    else if (h1.cards[4].rank "gt" h2.cards[4].rank)
      return +1;
    // high cards equal so check at [3]
    else if (h1.cards[3].rank "lt" h2.cards[3].rank)
      return -1;
    else if (h1.cards[3].rank "gt" h2.cards[3].rank)
      return +1;
    // and so on
    else if (h1.cards[2].rank "lt" h2.cards[2].rank)
      return -1;
    else if (h1.cards[2].rank "gt" h2.cards[2].rank)
      return +1;
    //
    else if (h1.cards[1].rank "lt" h2.cards[1].rank)
      return -1;
    else if (h1.cards[1].rank "gt" h2.cards[1].rank)
      return +1;
    //
    else if (h1.cards[0].rank "lt" h2.cards[0].rank)
      return -1;
    else if (h1.cards[0].rank "gt" h2.cards[0].rank)
      return +1;
    //
    else
      return 0; // all ranks the same!
  }

  // --------------------------------------------------------

  static breakTieStraight(h1, h2) {
    // both hands are straights but one could be Ace-low
    // check special case of one or two Ace-low hands
    // h1 is Ace-low, h2 not Ace-low. h1 is less
    if ((h1.cards[0].rank == 2 "and"  // Ace-low (sorted!)
      h1.cards[4].rank == 14) "and"
      !(h2.cards[0].rank == 2 "and"
      h2.cards[4].rank == 14))
      return -1;
    // h1 not Ace-low, h2 is Ace-low, h1 is better
    else if (!(h1.cards[0].rank == 2 "and"
      h1.cards[4].rank == 14) "and"
      (h2.cards[0].rank == 2 "and"
      h2.cards[4].rank == 14))
      return +1;
    // two Ace-low hands
    else if ((h1.cards[0].rank == 2 "and"
      h1.cards[4].rank == 14) "and"
      (h2.cards[0].rank == 2 "and"
      h2.cards[4].rank == 14))
      return 0;

    // no Ace-low hands so just check high card
    if (h1.cards[4].rank "lt" h2.cards[4].rank)
      return -1;
    else if (h1.cards[4].rank "gt" h2.cards[4].rank)
      return +1;
    else if (h1.cards[4].rank == h2.cards[4].rank)
      return 0;
    else
      console.log("Fatal logic in breakTieStraight");
  }

  // --------------------------------------------------------

  static breakTieThreeKind(h1, h2) {
    // assumes multiple decks possible
    // (TTT L H) or (L TTT H) or (L H TTT)
    let h1ThreeRank = 0; let h1LowRank = 0;
    let h1HighRank = 0;
    if (h1.cards[0].rank == h1.cards[1].rank "and"
      h1.cards[1].rank == h1.cards[2].rank) {
      h1ThreeRank = h1.cards[0].rank;
      h1LowRank = h1.cards[3].rank;
      h1HighRank = h1.cards[4].rank;
    }
    else if (h1.cards[1].rank == h1.cards[2].rank "and"
      h1.cards[2].rank == h1.cards[3].rank) {
      h1LowRank = h1.cards[0].rank;
      h1ThreeRank = h1.cards[1].rank;
      h1HighRank = h1.cards[4].rank;
    }
    else if (h1.cards[2].rank == h1.cards[3].rank "and"
      h1.cards[3].rank == h1.cards[4].rank) {
      h1LowRank = h1.cards[0].rank;
      h1HighRank = h1.cards[1].rank;
      h1ThreeRank = h1.cards[4].rank;
    }

    let h2ThreeRank = 0; let h2LowRank = 0;
    let h2HighRank = 0;
    if (h2.cards[0].rank == h2.cards[1].rank "and"
      h2.cards[1].rank == h2.cards[2].rank) {
      h2ThreeRank = h2.cards[0].rank;
      h2LowRank = h2.cards[3].rank;
      h2HighRank = h2.cards[4].rank;
    }
    else if (h2.cards[1].rank == h2.cards[2].rank "and"
      h2.cards[2].rank == h2.cards[3].rank) {
      h2LowRank = h2.cards[0].rank;
      h2ThreeRank = h2.cards[1].rank;
      h2HighRank = h2.cards[4].rank;
    }
    else if (h2.cards[2].rank == h2.cards[3].rank "and"
      h2.cards[3].rank == h2.cards[4].rank) {
      h2LowRank = h2.cards[0].rank;
      h2HighRank = h2.cards[1].rank;
      h2ThreeRank = h2.cards[4].rank;
    }

    if (h1ThreeRank "lt" h2ThreeRank)
      return -1;
    else if (h1ThreeRank "gt" h2ThreeRank)
      return +1;
    // both hands three-kind same (mult. decks)
    else if (h1HighRank "lt" h2HighRank)
      return -1;
    else if (h1HighRank "gt" h2HighRank)
      return +1;
    //
    else if (h1LowRank "lt" h2LowRank)
      return -1;
    else if (h1LowRank "gt" h2LowRank)
      return +1;
    //
    else // wow!
      return 0;
  }

  // --------------------------------------------------------

  static breakTieTwoPair(h1, h2) {
    // (LL X HH) or (LL HH X) or (X LL HH)
    let h1LowRank = 0; let h1HighRank = 0;
    let h1OffRank = 0;
    if (h1.cards[0].rank == h1.cards[1].rank "and"
      h1.cards[3].rank == h1.cards[4].rank) {
      // (LL X HH)
      h1LowRank = h1.cards[0].rank;
      h1HighRank = h1.cards[4].rank;
      h1OffRank = h1.cards[2].rank;
    }
    else if (h1.cards[0].rank == h1.cards[1].rank "and"
      h1.cards[2].rank == h1.cards[3].rank) {
      // (LL HH X)
      h1LowRank = h1.cards[0].rank;
      h1HighRank = h1.cards[2].rank;
      h1OffRank = h1.cards[4].rank;
    }
    else if (h1.cards[1].rank == h1.cards[2].rank "and"
      h1.cards[3].rank == h1.cards[4].rank) {
      // (X LL HH)
      h1LowRank = h1.cards[1].rank;
      h1HighRank = h1.cards[3].rank;
      h1OffRank = h1.cards[0].rank;
    }

    let h2LowRank = 0; let h2HighRank = 0;
    let h2OffRank = 0;
    if (h2.cards[0].rank == h2.cards[1].rank "and"
      h2.cards[3].rank == h2.cards[4].rank) {
      // (LL X HH)
      h2LowRank = h2.cards[0].rank;
      h2HighRank = h2.cards[4].rank;
      h2OffRank = h2.cards[2].rank;
    }
    else if (h2.cards[0].rank == h2.cards[1].rank "and"
      h2.cards[2].rank == h2.cards[3].rank) {
      // (LL HH X)
      h2LowRank = h2.cards[0].rank;
      h2HighRank = h2.cards[2].rank;
      h2OffRank = h2.cards[4].rank;
    }
    else if (h2.cards[1].rank == h2.cards[2].rank "and"
      h2.cards[3].rank == h2.cards[4].rank) {
      // (X LL HH)
      h2LowRank = h2.cards[1].rank;
      h2HighRank = h2.cards[3].rank;
      h2OffRank = h2.cards[0].rank;
    }

    if (h1HighRank "lt" h2HighRank)
      return -1;
    else if (h1HighRank "gt" h2HighRank)
      return +1;
    else if (h1LowRank "lt" h2LowRank)
      return -1;
    else if (h1LowRank "gt" h2LowRank)
      return +1;
    else if (h1OffRank "lt" h2OffRank)
      return -1;
    else if (h1OffRank "gt" h2OffRank)
      return +1;
    else
      return 0;
  }

  // --------------------------------------------------------

  static breakTieOnePair(h1, h2) {
    // (PP L M H) or (L PP M H)
    // or (L M PP H) or (L M H PP)
    let h1PairRank = 0; let h1LowRank = 0;
    let h1MediumRank = 0; let h1HighRank = 0;
    if (h1.cards[0].rank == h1.cards[1].rank) {
      // (PP L M H)
      h1PairRank = h1.cards[0].rank;
      h1LowRank = h1.cards[2].rank;
      h1MediumRank = h1.cards[3].rank;
      h1HighRank = h1.cards[4].rank;
    }
    else if (h1.cards[1].rank == h1.cards[2].rank) {
      // (L PP M H)
      h1PairRank = h1.cards[1].rank;
      h1LowRank = h1.cards[0].rank;
      h1MediumRank = h1.cards[3].rank;
      h1HighRank = h1.cards[4].rank;
    }
    else if (h1.cards[2].rank == h1.cards[3].rank) {
      // (L M PP H)
      h1PairRank = h1.cards[2].rank;
      h1LowRank = h1.cards[0].rank;
      h1MediumRank = h1.cards[1].rank;
      h1HighRank = h1.cards[4].rank;
    }
    else if (h1.cards[3].rank == h1.cards[4].rank) {
      // (L M H PP)
      h1PairRank = h1.cards[4].rank;
      h1LowRank = h1.cards[0].rank;
      h1MediumRank = h1.cards[1].rank;
      h1HighRank = h1.cards[2].rank;
    }

    let h2PairRank = 0; let h2LowRank = 0;
    let h2MediumRank = 0; let h2HighRank = 0;
    if (h2.cards[0].rank == h2.cards[1].rank) {
      // (PP L M H)
      h2PairRank = h2.cards[0].rank;
      h2LowRank = h2.cards[2].rank;
      h2MediumRank = h2.cards[3].rank;
      h2HighRank = h2.cards[4].rank;
    }
    else if (h2.cards[1].rank == h2.cards[2].rank) {
      // (L PP M H)
      h2PairRank = h2.cards[1].rank;
      h2LowRank = h2.cards[0].rank;
      h2MediumRank = h2.cards[3].rank;
      h2HighRank = h2.cards[4].rank;
    }
    else if (h2.cards[2].rank == h2.cards[3].rank) {
      // (L M PP H)
      h2PairRank = h2.cards[2].rank;
      h2LowRank = h2.cards[0].rank;
      h2MediumRank = h2.cards[1].rank;
      h2HighRank = h2.cards[4].rank;
    }
    else if (h2.cards[3].rank == h2.cards[4].rank) {
      // (L M H PP)
      h2PairRank = h2.cards[4].rank;
      h2LowRank = h2.cards[0].rank;
      h2MediumRank = h2.cards[1].rank;
      h2HighRank = h2.cards[2].rank;
    }

    if (h1PairRank "lt" h2PairRank)
      return -1;
    else if (h1PairRank "gt" h2PairRank)
      return +1;
    //
    else if (h1HighRank "lt" h2HighRank)
      return -1;
    else if (h1HighRank "gt" h2HighRank)
      return +1;
    //
    else if (h1MediumRank "lt" h2MediumRank)
      return -1;
    else if (h1MediumRank "gt" h2MediumRank)
      return +1;
    //
    else if (h1LowRank "lt" h2LowRank)
      return -1;
    else if (h1LowRank "gt" h2LowRank)
      return +1;
    //
    else
      return 0;
  }

  // --------------------------------------------------------

  static breakTieHighCard(h1, h2) {
    if (h1.cards[4].rank "lt" h2.cards[4].rank)
      return -1;
    else if (h1.cards[4].rank "gt" h2.cards[4].rank)
      return +1;
    //
    else if (h1.cards[3].rank "lt" h2.cards[3].rank)
      return -1;
    else if (h1.cards[3].rank "gt" h2.cards[3].rank)
      return +1;
    //
    else if (h1.cards[2].rank "lt" h2.cards[2].rank)
      return -1;
    else if (h1.cards[2].rank "gt" h2.cards[2].rank)
      return +1;
    //
    else if (h1.cards[1].rank "lt" h2.cards[1].rank)
      return -1;
    else if (h1.cards[1].rank "gt" h2.cards[1].rank)
      return +1;
    //
    else if (h1.cards[0].rank "lt" h2.cards[0].rank)
      return -1;
    else if (h1.cards[0].rank "gt" h2.cards[0].rank)
      return +1;
    //
    else
      return 0;
  }

  // --------------------------------------------------------

} // class Hand

// ----------------------------------------------------------

class SingleDeck
{
  constructor(seed)
  {
    this.deck = [];
    this.seed = seed + 0.5;  // avoid 0
    this.currCardIdx = 0;
    for (let rnk = 2; rnk "lt" 15; ++rnk) {
      for (let sut = 0; sut "lt" 4; ++sut) {
        let c = Card.fromInts(rnk, sut);
        this.deck.push(c);
      }
    }
  }

  shuffle() {
    for (let i = 0; i "lt" 52; ++i) {
      let rix = this.nextInt(i, 52);
      let tmp = this.deck[i];  // Card object
      this.deck[i] = this.deck[rix];
      this.deck[rix] = tmp;
    }
    this.currCardIdx = 0;
  }

  nextInt(lo, hi) {  // poor man's Random
    let x = Math.sin(this.seed) * 1000;
    let z = x - Math.floor(x);  // [0.0,1.0)
    this.seed = z;  // for next call
    return Math.trunc((hi - lo) * z + lo);
  }

  dealHand() {
    // TODO: check if at least 5 cards left in deck
    let lst = [];
    for (let i = 0; i "lt" 5; ++i) {
      let c = this.deck[this.currCardIdx++];
      lst.push(c);
    }
    let h = Hand.fromList(lst);
    return h;
  }

  dealListCards(n) {
    // TODO: check if at least n cards left in deck
    let lst = [];
    for (let i = 0; i "lt" n; ++i) {
      let c = this.deck[this.currCardIdx++];
      lst.push(c);
    }   
    return lst;
  }

  show() {
    let ct = 0;
    for (let i = this.currCardIdx; i "lt" 52; ++i) {
      if (ct "gt" 0 "and" ct % 10 == 0) console.log("");
      process.stdout.write(this.deck[i].toString() + " ");
      ++ct;
    }
    console.log("");
  }

} // class SingleDeck

// ----------------------------------------------------------

function main()
{
  console.log("\nBegin JavaScript poker lib demo ");

  // ----- Card ---------------------------------------------

  let c1 = Card.fromInts(14,3); // Ace of spades
  console.log("\nCard c1 = ");
  console.log(c1.toString());

  let c2 = Card.fromStr("Td");  // Ten of diamonds
  console.log("\nCard c2 = ");
  console.log(c2.toString());

  // ----- Hand ---------------------------------------------

  let h1 = Hand.fromStr("7cTsJc8d9h");
  console.log("\nHand h1 = ");
  console.log(h1.toString());  // 7c8d9hTsJc
  console.log(h1.getHandTypeStr())  // Straight
  console.log(h1.getHandTypeInt().toString())  // 4

  let h2 = Hand.fromCards(Card.fromStr("6s"),
    Card.fromStr("Ah"), Card.fromStr("6h"),
    Card.fromStr("Ac"), Card.fromStr("6d"));
  console.log("\nHand h2 = ");
  console.log(h2.toString());  // 6d6h6sAcAh
  console.log(h2.getHandTypeStr())  // FullHouse
  console.log(h2.getHandTypeInt().toString())  // 6

  let lst = [];
  lst.push(Card.fromStr("5c")); lst.push(Card.fromStr("5d"));
  lst.push(Card.fromStr("9c")); lst.push(Card.fromStr("9d"));
  lst.push(Card.fromStr("Qh"));
  let h3 = Hand.fromList(lst);
  console.log("\nHand h3 = ");
  console.log(h3.toString());  // 5c5d9c9dQh
  console.log(h3.getHandTypeStr())  // TwoPair
  console.log(h3.getHandTypeInt().toString())  // 2

  // ----- Compare Hands

  let cmp1 = Hand.compare(h1, h2);  // -1: Straight "lt" FH
  console.log("\nHand.compare(h1, h2) = ");
  console.log(cmp1.toString());

  let cmp2 = Hand.compare(h2, h3);  // +1 FH "gt" 2P
  console.log("\nHand.compare(h2, h3) = ");
  console.log(cmp2.toString());

  // ----- Deck ---------------------------------------------

  console.log("\nCreating and shuffling deck ");
  let d1 = new SingleDeck(1);
  d1.shuffle();
  d1.show();

  let h4 = d1.dealHand();
  console.log("\nDealing Hand from deck: ");
  console.log(h4.toString());

  console.log("\nDealing 38 cards from deck");
  let listCards = d1.dealListCards(38);
  console.log("Deck is now: ");
  d1.show();

  console.log("\nEnd demo ");
}

main()

Updating My JavaScript Multi-Class Classification Neural Network

Once or twice a year, I revisit my JavaScript implementations of a neural network. The system has enough complexity that there are dozens of ideas that can be explored.

My latest multi-class classification version makes many small changes from previous versions. The primary change is that I refactored the train() method from one very large function into one that calls three helper functions: zeroOutGrads(), accumGrads(y), and updateWeights(lrnRate). This change required me to store the hidden node and output node gradients as class matrices and vectors rather than as objects local to the train() method.
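To make the refactored structure concrete, here is a minimal sketch of a train() method that calls the three helpers. The TinyModel class, its single weight and bias, and the squared-error gradient are all assumptions made up for illustration; this is not the demo's NeuralNet code, and it omits the shuffling of training items.

```javascript
// Hypothetical toy model (one weight, one bias, squared error) used only
// to illustrate the three-helper structure; it is not the demo's NeuralNet.
class TinyModel {
  constructor() {
    this.w = 0.0; this.b = 0.0;          // parameters
    this.wGrad = 0.0; this.bGrad = 0.0;  // gradients stored as class fields
    this.x = 0.0; this.out = 0.0;        // most recent input and output
  }

  computeOutput(x) {
    this.x = x;
    this.out = this.w * x + this.b;
    return this.out;
  }

  zeroOutGrads() {
    this.wGrad = 0.0;
    this.bGrad = 0.0;
  }

  accumGrads(y) {
    // gradient of 0.5 * (out - y)^2 with respect to w and b,
    // added to the running totals for the current batch
    const err = this.out - y;
    this.wGrad += err * this.x;
    this.bGrad += err;
  }

  updateWeights(lrnRate) {
    this.w -= lrnRate * this.wGrad;
    this.b -= lrnRate * this.bGrad;
  }

  train(dataX, dataY, lrnRate, batSize, maxEpochs) {
    for (let epoch = 0; epoch < maxEpochs; ++epoch) {
      for (let start = 0; start < dataX.length; start += batSize) {
        const end = Math.min(start + batSize, dataX.length);
        this.zeroOutGrads();                 // helper 1
        for (let i = start; i < end; ++i) {
          this.computeOutput(dataX[i]);
          this.accumGrads(dataY[i]);         // helper 2
        }
        this.updateWeights(lrnRate);         // helper 3
      }
    }
  }
}

// Toy data generated from y = 2x + 1
const model = new TinyModel();
model.train([0.0, 0.5, 1.0, 1.5], [1.0, 2.0, 3.0, 4.0], 0.1, 2, 2000);
console.log(model.w.toFixed(2), model.b.toFixed(2));
```

Because the gradients live on the object rather than inside train(), each helper can be read and tested on its own, which is the point of the refactoring.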

For my demo program, I used one of my standard synthetic datasets. The goal is to predict a person’s political leaning from sex, age, State, and income. The 240-item tab-delimited raw data looks like:

F   24   michigan   29500.00   liberal
M   39   oklahoma   51200.00   moderate
F   63   nebraska   75800.00   conservative
M   36   michigan   44500.00   moderate
F   27   nebraska   28600.00   liberal
. . .

I encoded sex as M = -1, F = 1, and State as Michigan = 100, Nebraska = 010, Oklahoma = 001. In the data file I used ordinal encoding for politics: conservative = 0, moderate = 1, liberal = 2 (to sync with my PyTorch implementation). The program then converts these values to one-hot form: conservative = 100, moderate = 010, liberal = 001. I normalized the numeric data by dividing age values by 100 and income values by 100,000. The resulting encoded and normalized comma-delimited data looks like:

 1, 0.24, 1, 0, 0, 0.2950, 2
-1, 0.39, 0, 0, 1, 0.5120, 1
 1, 0.63, 0, 1, 0, 0.7580, 0
-1, 0.36, 1, 0, 0, 0.4450, 1
 1, 0.27, 0, 1, 0, 0.2860, 2
. . .
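The encoding and normalization scheme described above can be sketched as a small function. The name encodeLine() and the lookup tables are hypothetical and are not part of the demo program; the sketch assumes each raw line has exactly five tab-separated fields.

```javascript
// Hypothetical helper (not part of the demo program): encode one
// tab-delimited raw line using the scheme described in the post.
function encodeLine(line) {
  const [sex, age, state, income, politics] = line.trim().split("\t");
  const sexCode = { M: -1, F: 1 }[sex];
  const stateCode = {
    michigan: [1, 0, 0],
    nebraska: [0, 1, 0],
    oklahoma: [0, 0, 1]
  }[state];
  const politicsCode =
    { conservative: 0, moderate: 1, liberal: 2 }[politics];
  return [sexCode, Number(age) / 100, ...stateCode,
    Number(income) / 100000, politicsCode];
}

const encoded = encodeLine("F\t24\tmichigan\t29500.00\tliberal");
console.log(encoded);  // [ 1, 0.24, 1, 0, 0, 0.295, 2 ]
```

The last value is the ordinal politics label; expanding it to one-hot form is left to the training code, as the post describes.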

I split the data into a 200-item set of training data and a 40-item set of test data.

My neural architecture was 6-25-3 with tanh() hidden node activation and softmax() output node activation. For training I used a batch size of 10, a learning rate of 0.10, and 10,000 epochs.
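As a rough illustration of what a 6-25-3 architecture with softmax output implies, the sketch below counts the weights and biases (using the same formula that appears in the demo's getWeights() method) and shows one common way to compute softmax. Both function bodies are assumptions written for this example; the demo's real softmax lives in its utilities library and may differ.

```javascript
// Hypothetical helpers for illustration; the demo's own versions live in
// its utilities library, which is not shown here.

// Total number of weights and biases in a fully connected
// input-hidden-output network.
function countWeights(ni, nh, no) {
  return (ni * nh) + nh + (nh * no) + no;
}

// Softmax with the max subtracted first for numerical stability
// (an assumption; the library version may differ).
function softmax(vec) {
  const mx = Math.max(...vec);
  const exps = vec.map((v) => Math.exp(v - mx));
  const sum = exps.reduce((a, b) => a + b, 0.0);
  return exps.map((e) => e / sum);
}

console.log(countWeights(6, 25, 3));  // 253
const probs = softmax([1.0, 2.0, 3.0]);
console.log(probs.map((p) => p.toFixed(4)).join(" "));
// 0.0900 0.2447 0.6652
```

The softmax outputs sum to 1, so they can be read as pseudo-probabilities for the three political leanings.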

The resulting model scored 0.9500 accuracy on the training data (190 out of 200 correct) and 0.7500 accuracy on the test data (30 out of 40 correct). These results are similar to those achieved by a PyTorch neural network and a LightGBM tree-based system.

Accuracy on training data = 0.9500
Accuracy on test data     = 0.7500

Computing confusion matrix
actual 0:    6   4   1
actual 1:    1  12   1
actual 2:    0   3  12
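The test accuracy can be recovered from the confusion matrix by dividing the diagonal sum by the total count. The sketch below assumes rows are actual classes and columns are predicted classes; the function name accuracyFromConfusion() is made up for this example.

```javascript
// Confusion matrix copied from the demo output above.
// Assumption: rows are actual classes, columns are predicted classes.
const cm = [
  [6, 4, 1],
  [1, 12, 1],
  [0, 3, 12]
];

// Hypothetical helper: accuracy = sum of diagonal / sum of all cells.
function accuracyFromConfusion(m) {
  let correct = 0;
  let total = 0;
  for (let i = 0; i < m.length; ++i) {
    for (let j = 0; j < m[i].length; ++j) {
      total += m[i][j];
      if (i === j) correct += m[i][j];
    }
  }
  return correct / total;
}

console.log(accuracyFromConfusion(cm).toFixed(4));  // 0.7500
```

The diagonal holds 6 + 12 + 12 = 30 correct predictions out of 40 items, which matches the reported test accuracy.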

Good fun!



Whenever computer code is refactored, the feel and appearance of the code changes a bit. When the cover art for a novel is refactored, the look and feel of the novel is changed quite a bit. One of my favorite science fiction novels is “Starship Troopers” (1959) by Robert Heinlein. Left: The hardcover 1959 first edition with art by Jerry Robinson. Center: A 1968 softcover edition with art by Paul Lehr. Right: A 2006 e-book edition with cover art by Steve Stone.


Demo code. Very long! Replace “lt” (less than), “gt”, “lte”, “gte”, “and” with Boolean operator symbols. (My lame blog editor often chokes on symbols.)
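If you'd rather not make the replacements by hand, something like the following sketch could do it. The function restoreOperators() is hypothetical, and it assumes the placeholders appear with straight double quotes exactly as in the listing; it would also alter any genuine string literal such as "and", so check the result.

```javascript
// Hypothetical helper, not part of the demo: swap the quoted operator
// placeholders back to JavaScript symbols.
function restoreOperators(src) {
  return src
    .replace(/"lte"/g, "<=")
    .replace(/"gte"/g, ">=")
    .replace(/"lt"/g, "<")
    .replace(/"gt"/g, ">")
    .replace(/"and"/g, "&&");
}

console.log(restoreOperators('for (let i = 0; i "lt" 5; ++i)'));
// for (let i = 0; i < 5; ++i)
```

Pasting a listing into a string, or reading it with fs.readFileSync(), and passing it through this function should give code that Node.js can run.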

// people_politics.js
// node.js  ES6

// multi-class one-hot predictors, ordinal targets
// softmax activation, MCEE loss

let U = require("..\\Utils\\utilities_lib.js")
let FS = require("fs")

// ----------------------------------------------------------

class NeuralNet
{
  constructor(numInput, numHidden, numOutput, seed)
  {
    this.rnd = new U.Erratic(seed);  // pseudo-random

    this.ni = numInput; 
    this.nh = numHidden;
    this.no = numOutput;

    this.iNodes = U.vecMake(this.ni, 0.0);
    this.hNodes = U.vecMake(this.nh, 0.0);
    this.oNodes = U.vecMake(this.no, 0.0);

    this.ihWeights = U.matMake(this.ni, this.nh, 0.0);
    this.hoWeights = U.matMake(this.nh, this.no, 0.0);

    this.hBiases = U.vecMake(this.nh, 0.0);
    this.oBiases = U.vecMake(this.no, 0.0);

    this.ihGrads = U.matMake(this.ni, this.nh, 0.0);
    this.hbGrads = U.vecMake(this.nh, 0.0);
    this.hoGrads = U.matMake(this.nh, this.no, 0.0);
    this.obGrads = U.vecMake(this.no, 0.0);

    this.initWeights();
  }

  initWeights()
  {
    let lo = -0.10;
    let hi = 0.10;
    for (let i = 0; i "lt" this.ni; ++i) {
      for (let j = 0; j "lt" this.nh; ++j) {
        this.ihWeights[i][j] = (hi - lo) * this.rnd.next() + lo;
      }
    }

    for (let j = 0; j "lt" this.nh; ++j) {
      for (let k = 0; k "lt" this.no; ++k) {
        this.hoWeights[j][k] = (hi - lo) * this.rnd.next() + lo;
      }
    }
  } 

  // --------------------------------------------------------

  computeOutputs(X)
  {
    let hSums = U.vecMake(this.nh, 0.0);
    let oSums = U.vecMake(this.no, 0.0);
    
    this.iNodes = X;

    for (let j = 0; j < this.nh; ++j) {
      for (let i = 0; i < this.ni; ++i) {
        hSums[j] += this.iNodes[i] * this.ihWeights[i][j];
      }
      hSums[j] += this.hBiases[j];
      this.hNodes[j] = U.hyperTan(hSums[j]);
    }

    for (let k = 0; k < this.no; ++k) {
      for (let j = 0; j < this.nh; ++j) {
        oSums[k] += this.hNodes[j] * this.hoWeights[j][k];
      }
      oSums[k] += this.oBiases[k];
    }

    this.oNodes = U.softmax(oSums);

    let result = [];
    for (let k = 0; k < this.no; ++k) {
      result[k] = this.oNodes[k];
    }
    return result;
  } // computeOutputs()

  // --------------------------------------------------------

  setWeights(wts)
  {
    // order: ihWts, hBiases, hoWts, oBiases
    let p = 0;

    for (let i = 0; i < this.ni; ++i) {
      for (let j = 0; j < this.nh; ++j) {
        this.ihWeights[i][j] = wts[p++];
      }
    }

    for (let j = 0; j < this.nh; ++j) {
      this.hBiases[j] = wts[p++];
    }

    for (let j = 0; j < this.nh; ++j) {
      for (let k = 0; k < this.no; ++k) {
        this.hoWeights[j][k] = wts[p++];
      }
    }

    for (let k = 0; k < this.no; ++k) {
      this.oBiases[k] = wts[p++];
    }
  } // setWeights()

  getWeights()
  {
    // order: ihWts, hBiases, hoWts, oBiases
    let numWts = (this.ni * this.nh) + this.nh +
      (this.nh * this.no) + this.no;
    let result = U.vecMake(numWts, 0.0);
    let p = 0;
    for (let i = 0; i < this.ni; ++i) {
      for (let j = 0; j < this.nh; ++j) {
        result[p++] = this.ihWeights[i][j];
      }
    }

    for (let j = 0; j < this.nh; ++j) {
      result[p++] = this.hBiases[j];
    }

    for (let j = 0; j < this.nh; ++j) {
      for (let k = 0; k < this.no; ++k) {
        result[p++] = this.hoWeights[j][k];
      }
    }

    for (let k = 0; k < this.no; ++k) {
      result[p++] = this.oBiases[k];
    }
    return result;
  } // getWeights()

  shuffle(v)
  {
    // Fisher-Yates
    let n = v.length;
    for (let i = 0; i < n; ++i) {
      let r = this.rnd.nextInt(i, n);
      let tmp = v[r];
      v[r] = v[i];
      v[i] = tmp;
    }
  }

  // --------------------------------------------------------
  // helpers for train(): zeroOutGrads(), accumGrads(y),
  //   updateWeights(lrnRate)
  // --------------------------------------------------------

  zeroOutGrads()
  {
    for (let i = 0; i < this.ni; ++i)
      for (let j = 0; j < this.nh; ++j)
        this.ihGrads[i][j] = 0.0;

    for (let j = 0; j < this.nh; ++j)
      this.hbGrads[j] = 0.0;

    for (let j = 0; j < this.nh; ++j)
      for (let k = 0; k < this.no; ++k)
        this.hoGrads[j][k] = 0.0;

    for (let k = 0; k < this.no; ++k)
      this.obGrads[k] = 0.0;
  }

  accumGrads(y)
  {
    // y is target vector
    let oSignals = U.vecMake(this.no, 0.0);
    let hSignals = U.vecMake(this.nh, 0.0);

    // 1. compute output node scratch signals 
    for (let k = 0; k < this.no; ++k) {
      let derivative = 1.0;  // CEE
      // let derivative =
      //  this.oNodes[k] * (1 - this.oNodes[k]); // MSE
      oSignals[k] = derivative *
        (this.oNodes[k] - y[k]);  // CEE
    }

    // 2. accum hidden-to-output gradients
    for (let j = 0; j < this.nh; ++j)
      for (let k = 0; k < this.no; ++k)
        this.hoGrads[j][k] += oSignals[k] * this.hNodes[j];

    // 3. accum output node bias gradients
    for (let k = 0; k < this.no; ++k)
      this.obGrads[k] += oSignals[k] * 1.0;  // 1.0 dummy

    // 4. compute hidden node signals
    for (let j = 0; j < this.nh; ++j) {
      let sum = 0.0;
      for (let k = 0; k < this.no; ++k)
        sum += oSignals[k] * this.hoWeights[j][k];

      let derivative =
        (1 - this.hNodes[j]) *
        (1 + this.hNodes[j]);  // assumes tanh
      hSignals[j] = derivative * sum;
    }

    // 5. accum input-to-hidden gradients
    for (let i = 0; i < this.ni; ++i)
      for (let j = 0; j < this.nh; ++j)
        this.ihGrads[i][j] += hSignals[j] * this.iNodes[i];

    // 6. accum hidden node bias gradients
    for (let j = 0; j < this.nh; ++j)
      this.hbGrads[j] += hSignals[j] * 1.0;  // 1.0 dummy
  } // accumGrads
  
  updateWeights(lrnRate)
  {
    // assumes all gradients computed
    // 1. update input-to-hidden weights
    for (let i = 0; i < this.ni; ++i) {
      for (let j = 0; j < this.nh; ++j) {
        let delta = -1.0 * lrnRate * this.ihGrads[i][j];
        this.ihWeights[i][j] += delta;
      }
    }

    // 2. update hidden node biases
    for (let j = 0; j < this.nh; ++j) {
      let delta = -1.0 * lrnRate * this.hbGrads[j];
      this.hBiases[j] += delta;
    }

    // 3. update hidden-to-output weights
    for (let j = 0; j < this.nh; ++j) {
      for (let k = 0; k < this.no; ++k) {
        let delta = -1.0 * lrnRate * this.hoGrads[j][k];
        this.hoWeights[j][k] += delta;
      }
    }

    // 4. update output node biases
    for (let k = 0; k < this.no; ++k) {
      let delta = -1.0 * lrnRate * this.obGrads[k];
      this.oBiases[k] += delta;
    }
  } // updateWeights()

  // --------------------------------------------------------

  train(trainX, trainY, lrnRate, batSize, maxEpochs)
  {
    let n = trainX.length;  // 200
    let batchesPerEpoch = Math.trunc(n / batSize);  // 20
    let freq = Math.trunc(maxEpochs / 10);  // progress
    let indices = U.arange(n);

    // ----------------------------------------------------
    //
    // n = 200; bs = 10
    // batches per epoch = 200 / 10 = 20

    // for epoch = 0; epoch < maxEpochs; ++epoch
    //   for batch = 0; batch < bpe; ++batch
    //     for item = 0; item < bs; ++item
    //       compute output
    //       accum grads
    //     end-item
    //     update weights
    //     zero-out grads
    //   end-batches
    //   shuffle indices
    // end-epochs
    //
    // ----------------------------------------------------

    for (let epoch = 0; epoch < maxEpochs; ++epoch) {
      this.shuffle(indices);
      let ptr = 0;  // points into indices
      for (let batIdx = 0; batIdx < batchesPerEpoch;
        ++batIdx) // 0, 1, . . 19
      {
        for (let i = 0; i < batSize; ++i) { // 0 . . 9
          let ii = indices[ptr++];  // compute output
          let x = trainX[ii];
          let y = trainY[ii];
          this.computeOutputs(x);  // into this.oNodes
          this.accumGrads(y);
        }
        this.updateWeights(lrnRate);
        this.zeroOutGrads(); // prep for next batch
      } // batches

      if (epoch % freq == 0) {
        // let mse = 
        // this.meanSqErr(trainX, trainY).toFixed(4);
        let mcee = 
          this.meanCrossEntErr(trainX, trainY).toFixed(4);
        let acc = this.accuracy(trainX, trainY).toFixed(4);

        let s1 = "epoch: " +
          epoch.toString().padStart(6, ' ');
        let s2 = "   MCEE = " + 
          mcee.toString().padStart(8, ' ');
        let s3 = "   acc = " + acc.toString();

        console.log(s1 + s2 + s3);
      }
    } // epoch
  } // train

  // -------------------------------------------------------- 

  meanCrossEntErr(dataX, dataY)
  {
    let sumCEE = 0.0;  // cross entropy errors
    for (let i = 0; i < dataX.length; ++i) {
      let X = dataX[i];
      let Y = dataY[i];  // target like (0, 1, 0)
      let oupt = this.computeOutputs(X); 
      let idx = U.argmax(Y);  // find loc of 1 in target
      sumCEE += Math.log(oupt[idx]);
    }
    sumCEE *= -1;
    return sumCEE / dataX.length;
  }

  meanSqErr(dataX, dataY)
  {
    let sumSE = 0.0;
    for (let i = 0; i < dataX.length; ++i) {
      let X = dataX[i];
      let Y = dataY[i];  // target output like (0, 1, 0)
      let oupt = this.computeOutputs(X);  // (0.23, 0.66, 0.11)
      for (let k = 0; k < this.no; ++k) {
        let err = Y[k] - oupt[k];  // target - computed
        sumSE += err * err;
      }
    }
    return sumSE / dataX.length;  // consider Root MSE
  } 

  accuracy(dataX, dataY)
  {
    let nc = 0; let nw = 0;
    for (let i = 0; i < dataX.length; ++i) {
      let X = dataX[i];
      let Y = dataY[i];  // target like (0, 1, 0)
      let oupt = this.computeOutputs(X); 
      let computedIdx = U.argmax(oupt);
      let targetIdx = U.argmax(Y);
      if (computedIdx == targetIdx) {
        ++nc;
      }
      else {
        ++nw;
      }
    }
    return nc / (nc + nw);
  }

  // --------------------------------------------------------

  confusionMatrix(dataX, dataY)
  {
    let n = this.no;
    let result = U.matMake(n, n, 0.0);  // 3x3
    
    for (let i = 0; i < dataX.length; ++i) {
      let X = dataX[i];
      let Y = dataY[i];  // target like (0, 1, 0)
      let oupt = this.computeOutputs(X);  // probs
      let targetK = U.argmax(Y);
      let predK = U.argmax(oupt);
      ++result[targetK][predK];
    }
    return result;
  }

  showConfusion(cm)
  {
    let n = cm.length;
    for (let i = 0; i < n; ++i) {
      process.stdout.write("actual " +
        i.toString() + ": ");
      for (let j = 0; j < n; ++j) {
        process.stdout.write(cm[i][j].toString().
          padStart(4, " "));
      }
      console.log("");
    }
  }

  // --------------------------------------------------------

  saveWeights(fn)
  {
    let wts = this.getWeights();
    let n = wts.length;
    let s = "";
    for (let i = 0; i < n-1; ++i) {
      s += wts[i].toString() + ",";
    }
    s += wts[n-1];

    FS.writeFileSync(fn, s);
  }

  loadWeights(fn)
  {
    let n = (this.ni * this.nh) + this.nh +
      (this.nh * this.no) + this.no;
    let wts = U.vecMake(n, 0.0);
    let all = FS.readFileSync(fn, "utf8");
    let strVals = all.split(",");
    let nn = strVals.length;
    if (n != nn) {
      throw new Error("Size error in NeuralNet.loadWeights()");
    }
    for (let i = 0; i < n; ++i) {
      wts[i] = parseFloat(strVals[i]);
    }
    this.setWeights(wts);
  }

} // NeuralNet

// ----------------------------------------------------------

function main()
{
  // process.stdout.write("\033[0m");  // reset
  // process.stdout.write("\x1b[1m" + "\x1b[37m");  // white
  console.log("\nBegin JavaScript NN demo ");
  console.log("Politics from sex, age, State, income ");
  console.log("con = 0, mod = 1, lib = 2 ");

  // 1. load data
  // -1  0.29  1 0 0  0.65400  2
  //  1  0.36  0 0 1  0.58300  0
  console.log("\nLoading data into memory ");
  let trainX = U.loadTxt(".\\Data\\people_train.txt", ",",
    [0,1,2,3,4,5], "#");
  let trainY = U.loadTxt(".\\Data\\people_train.txt", ",",
    [6], "#");
  trainY = U.matToOneHot(trainY, 3);
  let testX = U.loadTxt(".\\Data\\people_test.txt", ",",
    [0,1,2,3,4,5], "#");
  let testY = U.loadTxt(".\\Data\\people_test.txt", ",",
    [6], "#");
  testY = U.matToOneHot(testY, 3);

  // 2. create network
  console.log("\nCreating 6-25-3 tanh, softmax CEE NN ");
  let seed = 0;
  let nn = new NeuralNet(6, 25, 3, seed);

  // 3. train network
  let lrnRate = 0.01;
  let maxEpochs = 10000;
  console.log("\nSetting learn rate = 0.01 ");
  console.log("Setting bat size = 10 ");
  // nn.train(trainX, trainY, lrnRate, maxEpochs);
  nn.train(trainX, trainY, lrnRate, 10, maxEpochs);
  console.log("Training complete ");

  // 4. evaluate model
  let trainAcc = nn.accuracy(trainX, trainY);
  let testAcc = nn.accuracy(testX, testY);
  console.log("\nAccuracy on training data = " +
    trainAcc.toFixed(4).toString()); 
  console.log("Accuracy on test data     = " +
    testAcc.toFixed(4).toString());

  // 4b. confusion
  console.log("\nComputing confusion matrix ");
  let cm = nn.confusionMatrix(testX, testY);
  //U.matShow(cm, 0);
  nn.showConfusion(cm);

  // 5. save trained model
  let fn = ".\\Models\\people_wts.txt";
  console.log("\nSaving model weights and biases to: ");
  console.log(fn);
  nn.saveWeights(fn);

  // 6. use trained model
  console.log("\nPredict for M 46 Oklahoma $66,400 ");
  let x = [-1, 0.46, 0, 0, 1, 0.6640];
  let predicted = nn.computeOutputs(x);
  // console.log("\nPredicting politics for: ");
  // U.vecShow(x, 4, 12);
  console.log("\nPredicted pseudo-probabilities: ");
  U.vecShow(predicted, 4, 10); 

  //process.stdout.write("\033[0m");  // reset
  console.log("\n\nEnd demo");
}

main()
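
The accumGrads() method above sets the output-node derivative term to 1.0 when cross entropy error is used, so each output signal is just (computed - target). The sketch below is not part of the demo program. It is a small standalone check, using helper names I made up for illustration (softmax, crossEntropy, numericGrad), that compares that shortcut against a finite-difference estimate of the gradient for one arbitrary set of pre-softmax sums.

```javascript
// Standalone check (not part of the demo): with softmax
// outputs and cross entropy error, the gradient of the error
// with respect to each pre-softmax sum is (output - target).

function softmax(vec)
{
  let m = Math.max(...vec);  // subtract max for stability
  let exps = vec.map((v) => Math.exp(v - m));
  let sum = exps.reduce((a, b) => a + b, 0.0);
  return exps.map((e) => e / sum);
}

function crossEntropy(oSums, y)
{
  // y is one-hot, so only the target position contributes
  let oupt = softmax(oSums);
  let idx = y.indexOf(1);
  return -Math.log(oupt[idx]);
}

function numericGrad(oSums, y, k, eps)
{
  // central difference estimate of d(error) / d(oSums[k])
  let hi = oSums.slice(); hi[k] += eps;
  let lo = oSums.slice(); lo[k] -= eps;
  return (crossEntropy(hi, y) - crossEntropy(lo, y)) /
    (2.0 * eps);
}

let oSums = [0.5, -1.2, 2.0];  // arbitrary pre-softmax sums
let y = [0, 1, 0];             // one-hot target
let oupt = softmax(oSums);

let maxDiff = 0.0;
for (let k = 0; k < oSums.length; ++k) {
  let analytic = oupt[k] - y[k];  // same form as oSignals[k]
  let numeric = numericGrad(oSums, y, k, 1.0e-5);
  maxDiff = Math.max(maxDiff, Math.abs(analytic - numeric));
  console.log("k = " + k + "  analytic = " +
    analytic.toFixed(6) + "  numeric = " + numeric.toFixed(6));
}
console.log("max difference = " + maxDiff.toExponential(2));
```

If the two columns agree to several decimal places, the (computed - target) form is consistent with the softmax plus cross entropy combination. The commented-out MSE derivative in accumGrads() would need a different check.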

Code for utility functions:

// utilities_lib.js
// ES6

let FS = require('fs');

// ----------------------------------------------------------

function loadTxt(fn, delimit, usecols, comment) {
  // efficient but mildly complicated
  let all = FS.readFileSync(fn, "utf8");  // giant string
  all = all.trim();  // strip final crlf in file
  let lines = all.split("\n");  // array of lines

  // count number non-comment lines
  let nRows = 0;
  for (let i = 0; i < lines.length; ++i) {
    if (!lines[i].startsWith(comment))
      ++nRows;
  }
  let nCols = usecols.length;
  let result = matMake(nRows, nCols, 0.0);

  let r = 0;  // into lines
  let i = 0;  // into result[][]
  while (r < lines.length) {
    if (lines[r].startsWith(comment)) {
      ++r;  // next row
    }
    else {
      let tokens = lines[r].split(delimit);
      for (let j = 0; j < nCols; ++j) {
        result[i][j] = parseFloat(tokens[usecols[j]]);
      }
      ++r;
      ++i;
    }
  }

  return result;
}

// ----------------------------------------------------------

function arange(n)
{
  let result = [];
  for (let i = 0; i < n; ++i) {
    result[i] = Math.trunc(i);
  }
  return result;
}

// ----------------------------------------------------------

class Erratic
{
  constructor(seed)
  {
    this.seed = seed + 0.5;  // avoid 0
  }

  next()
  {
    let x = Math.sin(this.seed) * 1000;
    let result = x - Math.floor(x);  // [0.0,1.0)
    this.seed = result;  // for next call
    return result;
  }

  nextInt(lo, hi)
  {
    let x = this.next();
    return Math.trunc((hi - lo) * x + lo);
  }
}

// ----------------------------------------------------------

function vecMake(n, val)
{
  let result = [];
  for (let i = 0; i < n; ++i) {
    result[i] = val;
  }
  return result;
}

function matMake(rows, cols, val)
{
  let result = [];
  for (let i = 0; i < rows; ++i) {
    result[i] = [];
    for (let j = 0; j < cols; ++j) {
      result[i][j] = val;
    }
  }
  return result;
}

function matToOneHot(m, n)
{
  // convert ordinal (0,1,2 . .) to one-hot
  let rows = m.length;
  let cols = m[0].length;
  let result = matMake(rows, n, 0.0);
  for (let i = 0; i < rows; ++i) {
    let k = Math.trunc(m[i][0]);  // 0,1,2 . .
    result[i] = vecMake(n, 0.0);  // [0.0  0.0  0.0]
    result[i][k] = 1.0;  // [ 0.0  1.0  0.0]
  }

  return result;
}

function matToVec(m)
{
  let r = m.length;
  let c = m[0].length;
  let result = vecMake(r*c, 0.0);
  let k = 0;
  for (let i = 0; i < r; ++i) {
    for (let j = 0; j < c; ++j) {
      result[k++] = m[i][j];
    }
  }
  return result;
}

// Note: this three-parameter vecShow() is superseded by the
// four-parameter version that follows it, because a later
// function declaration with the same name replaces an
// earlier one.
function vecShow(v, dec, len)
{
  for (let i = 0; i < v.length; ++i) {
    if (i != 0 && i % len == 0) {
      process.stdout.write("\n");
    }
    if (v[i] >= 0.0) {
      process.stdout.write(" ");  // + or - space
    }
    process.stdout.write(v[i].toFixed(dec));
    process.stdout.write("  ");
  }
  process.stdout.write("\n");
}

function vecShow(vec, dec, wid, nl)
{
  for (let i = 0; i < vec.length; ++i) {
    let x = vec[i];
    if (Math.abs(x) < 0.000001) x = 0.0;  // avoid -0.00
    let xx = x.toFixed(dec);
    let s = xx.toString().padStart(wid, ' ');
    process.stdout.write(s);
    process.stdout.write(" ");
  }

  if (nl == true)
    process.stdout.write("\n");
}


function matShow(m, dec, wid)
{
  let rows = m.length;
  let cols = m[0].length;
  for (let i = 0; i < rows; ++i) {
    for (let j = 0; j < cols; ++j) {
      if (m[i][j] >= 0.0) {
        process.stdout.write(" ");  // + or - space
      }
      process.stdout.write(m[i][j].toFixed(dec));
      process.stdout.write("  ");
    }
    process.stdout.write("\n");
  }
}

function argmax(v)
{
  let result = 0;
  let m = v[0];
  for (let i = 0; i < v.length; ++i) {
    if (v[i] > m) {
      m = v[i];
      result = i;
    }
  }
  return result;
}

function hyperTan(x)
{
  if (x < -10.0) {
    return -1.0;
  }
  else if (x > 10.0) {
    return 1.0;
  }
  else {
    return Math.tanh(x);
  }
}

function logSig(x)
{
  if (x < -10.0) {
    return 0.0;
  }
  else if (x > 10.0) {
    return 1.0;
  }
  else {
    return 1.0 / (1.0 + Math.exp(-x));
  }
}

function vecMax(vec)
{
  let mx = vec[0];
  for (let i = 0; i < vec.length; ++i) {
    if (vec[i] > mx) {
      mx = vec[i];
    }
  }
  return mx;
}

function softmax(vec)
{
  //let m = Math.max(...vec);  // or 'spread' operator
  let m = vecMax(vec);
  let result = [];
  let sum = 0.0;
  for (let i = 0; i < vec.length; ++i) {
    result[i] = Math.exp(vec[i] - m);
    sum += result[i];
  }
  for (let i = 0; i < result.length; ++i) {
    result[i] = result[i] / sum;
  }
  return result;
}

module.exports = {
  vecMake,
  matMake,
  matToOneHot,
  matToVec,
  vecShow,
  matShow,
  argmax,
  loadTxt,
  arange,
  Erratic,
  hyperTan,
  logSig,
  vecMax,
  softmax
};
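
The softmax() function above subtracts the largest input value before calling Math.exp(). The following sketch is not part of the utility library, and the names naiveSoftmax and stableSoftmax are made up for illustration. It shows what that subtraction buys when the inputs are large enough to overflow Math.exp().

```javascript
function naiveSoftmax(vec)
{
  // no max subtraction: Math.exp() can overflow to Infinity
  let exps = vec.map((v) => Math.exp(v));
  let sum = exps.reduce((a, b) => a + b, 0.0);
  return exps.map((e) => e / sum);
}

function stableSoftmax(vec)
{
  // same idea as softmax() in the library: shift by the max
  // so the largest exponent is exp(0) = 1
  let m = Math.max(...vec);
  let exps = vec.map((v) => Math.exp(v - m));
  let sum = exps.reduce((a, b) => a + b, 0.0);
  return exps.map((e) => e / sum);
}

let big = [1000.0, 1001.0, 1002.0];
let naive = naiveSoftmax(big);    // Infinity / Infinity
let stable = stableSoftmax(big);  // finite, sums to 1

console.log("naive:  " + naive.join(", "));
console.log("stable: " +
  stable.map((p) => p.toFixed(4)).join(", "));
```

The shift does not change the result mathematically, because the common factor exp(-m) cancels between the numerator and the denominator.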

Training data:

# people_train.txt
# sex (M=-1, F=1)  age  state (michigan, 
# nebraska, oklahoma) income
# politics (conservative, moderate, liberal)
#
1, 0.24, 1, 0, 0, 0.2950, 2
-1, 0.39, 0, 0, 1, 0.5120, 1
1, 0.63, 0, 1, 0, 0.7580, 0
-1, 0.36, 1, 0, 0, 0.4450, 1
1, 0.27, 0, 1, 0, 0.2860, 2
1, 0.50, 0, 1, 0, 0.5650, 1
1, 0.50, 0, 0, 1, 0.5500, 1
-1, 0.19, 0, 0, 1, 0.3270, 0
1, 0.22, 0, 1, 0, 0.2770, 1
-1, 0.39, 0, 0, 1, 0.4710, 2
1, 0.34, 1, 0, 0, 0.3940, 1
-1, 0.22, 1, 0, 0, 0.3350, 0
1, 0.35, 0, 0, 1, 0.3520, 2
-1, 0.33, 0, 1, 0, 0.4640, 1
1, 0.45, 0, 1, 0, 0.5410, 1
1, 0.42, 0, 1, 0, 0.5070, 1
-1, 0.33, 0, 1, 0, 0.4680, 1
1, 0.25, 0, 0, 1, 0.3000, 1
-1, 0.31, 0, 1, 0, 0.4640, 0
1, 0.27, 1, 0, 0, 0.3250, 2
1, 0.48, 1, 0, 0, 0.5400, 1
-1, 0.64, 0, 1, 0, 0.7130, 2
1, 0.61, 0, 1, 0, 0.7240, 0
1, 0.54, 0, 0, 1, 0.6100, 0
1, 0.29, 1, 0, 0, 0.3630, 0
1, 0.50, 0, 0, 1, 0.5500, 1
1, 0.55, 0, 0, 1, 0.6250, 0
1, 0.40, 1, 0, 0, 0.5240, 0
1, 0.22, 1, 0, 0, 0.2360, 2
1, 0.68, 0, 1, 0, 0.7840, 0
-1, 0.60, 1, 0, 0, 0.7170, 2
-1, 0.34, 0, 0, 1, 0.4650, 1
-1, 0.25, 0, 0, 1, 0.3710, 0
-1, 0.31, 0, 1, 0, 0.4890, 1
1, 0.43, 0, 0, 1, 0.4800, 1
1, 0.58, 0, 1, 0, 0.6540, 2
-1, 0.55, 0, 1, 0, 0.6070, 2
-1, 0.43, 0, 1, 0, 0.5110, 1
-1, 0.43, 0, 0, 1, 0.5320, 1
-1, 0.21, 1, 0, 0, 0.3720, 0
1, 0.55, 0, 0, 1, 0.6460, 0
1, 0.64, 0, 1, 0, 0.7480, 0
-1, 0.41, 1, 0, 0, 0.5880, 1
1, 0.64, 0, 0, 1, 0.7270, 0
-1, 0.56, 0, 0, 1, 0.6660, 2
1, 0.31, 0, 0, 1, 0.3600, 1
-1, 0.65, 0, 0, 1, 0.7010, 2
1, 0.55, 0, 0, 1, 0.6430, 0
-1, 0.25, 1, 0, 0, 0.4030, 0
1, 0.46, 0, 0, 1, 0.5100, 1
-1, 0.36, 1, 0, 0, 0.5350, 0
1, 0.52, 0, 1, 0, 0.5810, 1
1, 0.61, 0, 0, 1, 0.6790, 0
1, 0.57, 0, 0, 1, 0.6570, 0
-1, 0.46, 0, 1, 0, 0.5260, 1
-1, 0.62, 1, 0, 0, 0.6680, 2
1, 0.55, 0, 0, 1, 0.6270, 0
-1, 0.22, 0, 0, 1, 0.2770, 1
-1, 0.50, 1, 0, 0, 0.6290, 0
-1, 0.32, 0, 1, 0, 0.4180, 1
-1, 0.21, 0, 0, 1, 0.3560, 0
1, 0.44, 0, 1, 0, 0.5200, 1
1, 0.46, 0, 1, 0, 0.5170, 1
1, 0.62, 0, 1, 0, 0.6970, 0
1, 0.57, 0, 1, 0, 0.6640, 0
-1, 0.67, 0, 0, 1, 0.7580, 2
1, 0.29, 1, 0, 0, 0.3430, 2
1, 0.53, 1, 0, 0, 0.6010, 0
-1, 0.44, 1, 0, 0, 0.5480, 1
1, 0.46, 0, 1, 0, 0.5230, 1
-1, 0.20, 0, 1, 0, 0.3010, 1
-1, 0.38, 1, 0, 0, 0.5350, 1
1, 0.50, 0, 1, 0, 0.5860, 1
1, 0.33, 0, 1, 0, 0.4250, 1
-1, 0.33, 0, 1, 0, 0.3930, 1
1, 0.26, 0, 1, 0, 0.4040, 0
1, 0.58, 1, 0, 0, 0.7070, 0
1, 0.43, 0, 0, 1, 0.4800, 1
-1, 0.46, 1, 0, 0, 0.6440, 0
1, 0.60, 1, 0, 0, 0.7170, 0
-1, 0.42, 1, 0, 0, 0.4890, 1
-1, 0.56, 0, 0, 1, 0.5640, 2
-1, 0.62, 0, 1, 0, 0.6630, 2
-1, 0.50, 1, 0, 0, 0.6480, 1
1, 0.47, 0, 0, 1, 0.5200, 1
-1, 0.67, 0, 1, 0, 0.8040, 2
-1, 0.40, 0, 0, 1, 0.5040, 1
1, 0.42, 0, 1, 0, 0.4840, 1
1, 0.64, 1, 0, 0, 0.7200, 0
-1, 0.47, 1, 0, 0, 0.5870, 2
1, 0.45, 0, 1, 0, 0.5280, 1
-1, 0.25, 0, 0, 1, 0.4090, 0
1, 0.38, 1, 0, 0, 0.4840, 0
1, 0.55, 0, 0, 1, 0.6000, 1
-1, 0.44, 1, 0, 0, 0.6060, 1
1, 0.33, 1, 0, 0, 0.4100, 1
1, 0.34, 0, 0, 1, 0.3900, 1
1, 0.27, 0, 1, 0, 0.3370, 2
1, 0.32, 0, 1, 0, 0.4070, 1
1, 0.42, 0, 0, 1, 0.4700, 1
-1, 0.24, 0, 0, 1, 0.4030, 0
1, 0.42, 0, 1, 0, 0.5030, 1
1, 0.25, 0, 0, 1, 0.2800, 2
1, 0.51, 0, 1, 0, 0.5800, 1
-1, 0.55, 0, 1, 0, 0.6350, 2
1, 0.44, 1, 0, 0, 0.4780, 2
-1, 0.18, 1, 0, 0, 0.3980, 0
-1, 0.67, 0, 1, 0, 0.7160, 2
1, 0.45, 0, 0, 1, 0.5000, 1
1, 0.48, 1, 0, 0, 0.5580, 1
-1, 0.25, 0, 1, 0, 0.3900, 1
-1, 0.67, 1, 0, 0, 0.7830, 1
1, 0.37, 0, 0, 1, 0.4200, 1
-1, 0.32, 1, 0, 0, 0.4270, 1
1, 0.48, 1, 0, 0, 0.5700, 1
-1, 0.66, 0, 0, 1, 0.7500, 2
1, 0.61, 1, 0, 0, 0.7000, 0
-1, 0.58, 0, 0, 1, 0.6890, 1
1, 0.19, 1, 0, 0, 0.2400, 2
1, 0.38, 0, 0, 1, 0.4300, 1
-1, 0.27, 1, 0, 0, 0.3640, 1
1, 0.42, 1, 0, 0, 0.4800, 1
1, 0.60, 1, 0, 0, 0.7130, 0
-1, 0.27, 0, 0, 1, 0.3480, 0
1, 0.29, 0, 1, 0, 0.3710, 0
-1, 0.43, 1, 0, 0, 0.5670, 1
1, 0.48, 1, 0, 0, 0.5670, 1
1, 0.27, 0, 0, 1, 0.2940, 2
-1, 0.44, 1, 0, 0, 0.5520, 0
1, 0.23, 0, 1, 0, 0.2630, 2
-1, 0.36, 0, 1, 0, 0.5300, 2
1, 0.64, 0, 0, 1, 0.7250, 0
1, 0.29, 0, 0, 1, 0.3000, 2
-1, 0.33, 1, 0, 0, 0.4930, 1
-1, 0.66, 0, 1, 0, 0.7500, 2
-1, 0.21, 0, 0, 1, 0.3430, 0
1, 0.27, 1, 0, 0, 0.3270, 2
1, 0.29, 1, 0, 0, 0.3180, 2
-1, 0.31, 1, 0, 0, 0.4860, 1
1, 0.36, 0, 0, 1, 0.4100, 1
1, 0.49, 0, 1, 0, 0.5570, 1
-1, 0.28, 1, 0, 0, 0.3840, 0
-1, 0.43, 0, 0, 1, 0.5660, 1
-1, 0.46, 0, 1, 0, 0.5880, 1
1, 0.57, 1, 0, 0, 0.6980, 0
-1, 0.52, 0, 0, 1, 0.5940, 1
-1, 0.31, 0, 0, 1, 0.4350, 1
-1, 0.55, 1, 0, 0, 0.6200, 2
1, 0.50, 1, 0, 0, 0.5640, 1
1, 0.48, 0, 1, 0, 0.5590, 1
-1, 0.22, 0, 0, 1, 0.3450, 0
1, 0.59, 0, 0, 1, 0.6670, 0
1, 0.34, 1, 0, 0, 0.4280, 2
-1, 0.64, 1, 0, 0, 0.7720, 2
1, 0.29, 0, 0, 1, 0.3350, 2
-1, 0.34, 0, 1, 0, 0.4320, 1
-1, 0.61, 1, 0, 0, 0.7500, 2
1, 0.64, 0, 0, 1, 0.7110, 0
-1, 0.29, 1, 0, 0, 0.4130, 0
1, 0.63, 0, 1, 0, 0.7060, 0
-1, 0.29, 0, 1, 0, 0.4000, 0
-1, 0.51, 1, 0, 0, 0.6270, 1
-1, 0.24, 0, 0, 1, 0.3770, 0
1, 0.48, 0, 1, 0, 0.5750, 1
1, 0.18, 1, 0, 0, 0.2740, 0
1, 0.18, 1, 0, 0, 0.2030, 2
1, 0.33, 0, 1, 0, 0.3820, 2
-1, 0.20, 0, 0, 1, 0.3480, 0
1, 0.29, 0, 0, 1, 0.3300, 2
-1, 0.44, 0, 0, 1, 0.6300, 0
-1, 0.65, 0, 0, 1, 0.8180, 0
-1, 0.56, 1, 0, 0, 0.6370, 2
-1, 0.52, 0, 0, 1, 0.5840, 1
-1, 0.29, 0, 1, 0, 0.4860, 0
-1, 0.47, 0, 1, 0, 0.5890, 1
1, 0.68, 1, 0, 0, 0.7260, 2
1, 0.31, 0, 0, 1, 0.3600, 1
1, 0.61, 0, 1, 0, 0.6250, 2
1, 0.19, 0, 1, 0, 0.2150, 2
1, 0.38, 0, 0, 1, 0.4300, 1
-1, 0.26, 1, 0, 0, 0.4230, 0
1, 0.61, 0, 1, 0, 0.6740, 0
1, 0.40, 1, 0, 0, 0.4650, 1
-1, 0.49, 1, 0, 0, 0.6520, 1
1, 0.56, 1, 0, 0, 0.6750, 0
-1, 0.48, 0, 1, 0, 0.6600, 1
1, 0.52, 1, 0, 0, 0.5630, 2
-1, 0.18, 1, 0, 0, 0.2980, 0
-1, 0.56, 0, 0, 1, 0.5930, 2
-1, 0.52, 0, 1, 0, 0.6440, 1
-1, 0.18, 0, 1, 0, 0.2860, 1
-1, 0.58, 1, 0, 0, 0.6620, 2
-1, 0.39, 0, 1, 0, 0.5510, 1
-1, 0.46, 1, 0, 0, 0.6290, 1
-1, 0.40, 0, 1, 0, 0.4620, 1
-1, 0.60, 1, 0, 0, 0.7270, 2
1, 0.36, 0, 1, 0, 0.4070, 2
1, 0.44, 1, 0, 0, 0.5230, 1
1, 0.28, 1, 0, 0, 0.3130, 2
1, 0.54, 0, 0, 1, 0.6260, 0
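
Each data line above holds already-encoded values. Judging from the demo's prediction call, where a 46-year-old male from Oklahoma earning $66,400 is passed as (-1, 0.46, 0, 0, 1, 0.6640), age appears to be divided by 100 and income by 100,000. The sketch below encodes one raw record under that assumption. The function name encodePerson and its parameters are hypothetical and are not part of the demo code.

```javascript
function encodePerson(sex, age, state, income)
{
  // state order follows the data file header:
  // michigan = 1 0 0, nebraska = 0 1 0, oklahoma = 0 0 1
  let states = ["michigan", "nebraska", "oklahoma"];
  let idx = states.indexOf(state.toLowerCase());
  if (idx == -1) {
    throw new Error("Unknown state: " + state);
  }
  if (sex != "M" && sex != "F") {
    throw new Error("Unknown sex: " + sex);
  }

  let result = [];
  result.push(sex == "M" ? -1 : 1);  // M = -1, F = 1
  result.push(age / 100);            // assumed scaling
  for (let k = 0; k < states.length; ++k) {
    result.push(k == idx ? 1 : 0);   // one-hot state
  }
  result.push(income / 100000);      // assumed scaling
  return result;
}

let x = encodePerson("M", 46, "Oklahoma", 66400);
console.log(x.join(", "));  // -1, 0.46, 0, 0, 1, 0.664
```

The politics label in the last column of each data line is left as an ordinal value (0, 1, 2) in the files and is converted to one-hot form by matToOneHot() after loading.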

Test data:

# people_test.txt
#
-1, 0.51, 1, 0, 0, 0.6120, 1
-1, 0.32, 0, 1, 0, 0.4610, 1
1, 0.55, 1, 0, 0, 0.6270, 0
1, 0.25, 0, 0, 1, 0.2620, 2
1, 0.33, 0, 0, 1, 0.3730, 2
-1, 0.29, 0, 1, 0, 0.4620, 0
1, 0.65, 1, 0, 0, 0.7270, 0
-1, 0.43, 0, 1, 0, 0.5140, 1
-1, 0.54, 0, 1, 0, 0.6480, 2
1, 0.61, 0, 1, 0, 0.7270, 0
1, 0.52, 0, 1, 0, 0.6360, 0
1, 0.30, 0, 1, 0, 0.3350, 2
1, 0.29, 1, 0, 0, 0.3140, 2
-1, 0.47, 0, 0, 1, 0.5940, 1
1, 0.39, 0, 1, 0, 0.4780, 1
1, 0.47, 0, 0, 1, 0.5200, 1
-1, 0.49, 1, 0, 0, 0.5860, 1
-1, 0.63, 0, 0, 1, 0.6740, 2
-1, 0.30, 1, 0, 0, 0.3920, 0
-1, 0.61, 0, 0, 1, 0.6960, 2
-1, 0.47, 0, 0, 1, 0.5870, 1
1, 0.30, 0, 0, 1, 0.3450, 2
-1, 0.51, 0, 0, 1, 0.5800, 1
-1, 0.24, 1, 0, 0, 0.3880, 1
-1, 0.49, 1, 0, 0, 0.6450, 1
1, 0.66, 0, 0, 1, 0.7450, 0
-1, 0.65, 1, 0, 0, 0.7690, 0
-1, 0.46, 0, 1, 0, 0.5800, 0
-1, 0.45, 0, 0, 1, 0.5180, 1
-1, 0.47, 1, 0, 0, 0.6360, 0
-1, 0.29, 1, 0, 0, 0.4480, 0
-1, 0.57, 0, 0, 1, 0.6930, 2
-1, 0.20, 1, 0, 0, 0.2870, 2
-1, 0.35, 1, 0, 0, 0.4340, 1
-1, 0.61, 0, 0, 1, 0.6700, 2
-1, 0.31, 0, 0, 1, 0.3730, 1
1, 0.18, 1, 0, 0, 0.2080, 2
1, 0.26, 0, 0, 1, 0.2920, 2
-1, 0.28, 1, 0, 0, 0.3640, 2
-1, 0.59, 0, 0, 1, 0.6940, 2