## Particle Swarm Optimization using C#

I wrote an article titled “Particle Swarm Optimization using C#” which appears in the November 2013 issue of Visual Studio Magazine. See http://visualstudiomagazine.com/articles/2013/11/01/particle-swarm-optimization.aspx.

Particle Swarm Optimization (PSO) is a technique based on group behavior such as bird flocking. PSO can be used to find an approximate solution to a numerical optimization problem in situations where classical techniques like those based on Calculus derivatives don’t work or aren’t feasible. Training a neural network is an example of such an optimization problem; the goal is to find the set of values for a neural network’s weights and biases so that the error between computed outputs and known outputs on a collection of training data is minimized.

In the machine learning community, by far the most common technique used to train a neural network is called back-propagation. However, I generally prefer to use PSO.

Because PSO is conceptually quite different from most traditional algorithms, the VSM article doesn't demonstrate using PSO to train a neural network; instead, it shows how to use PSO to solve a dummy benchmark problem: finding the values of x0 and x1 that minimize the function:

```
z = x0 * exp( -(x0^2 + x1^2) )
```

The graph of this function, and a screenshot of a demo program solving the minimization problem are shown in the images below. I intend to follow up the November article with an article that shows exactly how to use PSO to train a neural network.
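To give a concrete feel for the technique, here is a minimal, self-contained PSO sketch in C# for this benchmark function. This is not the article's demo code; the swarm size, iteration count, and the inertia and cognitive/social weights (0.729 and 1.49445) are commonly used defaults, not values taken from the article.

```csharp
using System;

class PsoSketch
{
  // the benchmark function to minimize
  static double Z(double x0, double x1)
  {
    return x0 * Math.Exp(-(x0 * x0 + x1 * x1));
  }

  static void Main()
  {
    Random rnd = new Random(0);
    int numParticles = 10;
    int maxIterations = 1000;
    double w = 0.729;    // inertia weight
    double c1 = 1.49445; // cognitive (particle) weight
    double c2 = 1.49445; // social (swarm) weight

    double[][] pos = new double[numParticles][];  // current positions
    double[][] vel = new double[numParticles][];  // current velocities
    double[][] best = new double[numParticles][]; // best position seen by each particle
    double[] bestZ = new double[numParticles];
    double[] swarmBest = new double[2];           // best position seen by any particle
    double swarmBestZ = double.MaxValue;

    for (int p = 0; p < numParticles; ++p)
    {
      pos[p] = new double[] { rnd.NextDouble() * 8 - 4,
        rnd.NextDouble() * 8 - 4 }; // random point in [-4, +4] x [-4, +4]
      vel[p] = new double[] { 0.0, 0.0 };
      best[p] = (double[])pos[p].Clone();
      bestZ[p] = Z(pos[p][0], pos[p][1]);
      if (bestZ[p] < swarmBestZ)
      {
        swarmBestZ = bestZ[p];
        swarmBest = (double[])pos[p].Clone();
      }
    }

    for (int iter = 0; iter < maxIterations; ++iter)
    {
      for (int p = 0; p < numParticles; ++p)
      {
        // move each particle toward its own best and the swarm's best
        for (int d = 0; d < 2; ++d)
        {
          vel[p][d] = w * vel[p][d] +
            c1 * rnd.NextDouble() * (best[p][d] - pos[p][d]) +
            c2 * rnd.NextDouble() * (swarmBest[d] - pos[p][d]);
          pos[p][d] += vel[p][d];
        }
        double z = Z(pos[p][0], pos[p][1]);
        if (z < bestZ[p])
        {
          bestZ[p] = z;
          best[p] = (double[])pos[p].Clone();
        }
        if (z < swarmBestZ)
        {
          swarmBestZ = z;
          swarmBest = (double[])pos[p].Clone();
        }
      }
    }

    Console.WriteLine("best x0 = {0:F4} x1 = {1:F4} z = {2:F4}",
      swarmBest[0], swarmBest[1], swarmBestZ);
  }
}
```

A run like this should land very close to the true minimum, z ≈ -0.4289 at (x0, x1) ≈ (-0.7071, 0), which you can verify with calculus.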

## Reading the MNIST Data Set with C#

The MNIST data set is a well-known collection of images of handwritten digits (0-9) that is used to benchmark machine learning pattern recognition algorithms. The MNIST data is stored in four binary files, which can be awkward to deal with directly, so I decided to write a C# program to access the data.

Each digit image is a 28 x 28 grid of pixels, where each pixel value is between 0 (white) and 255 (completely black); intermediate values are shades of gray.

There are 60,000 training digits and 10,000 test digits. Each of the two sets is stored in two binary files, one containing the pixel data and the other containing the corresponding label (0-9). The data files are available at http://yann.lecun.com/exdb/mnist/ in gzip form. I installed the free 7-Zip utility to unzip the files (I find WinZip increasingly annoying with their advertising).

The screenshot below shows a snapshot of the program reading the 10,000 test images. The program approximates each image by using a blank space for white, a period for gray, and an ‘O’ for black. The associated label is displayed below the image representation.

The program code is below. The main idea is to define a DigitImage class that has the pixels and the label of one digit. I open both files and read 28 x 28 bytes from the image file and one byte from the label file, and then combine them. Each file has some header ints (4 for the image data and 2 for the label data) that are read and discarded.

```
using System;
using System.IO;

namespace ReadMnistData // namespace name is mine; lost from the original listing
{
  class Program
  {
    static void Main(string[] args)
    {
      try
      {
        Console.WriteLine("\nBegin\n");

        FileStream ifsLabels =
          new FileStream(@"C:\t10k-labels.idx1-ubyte",
          FileMode.Open); // test labels
        FileStream ifsImages =
          new FileStream(@"C:\t10k-images.idx3-ubyte",
          FileMode.Open); // test images

        BinaryReader brLabels =
          new BinaryReader(ifsLabels);
        BinaryReader brImages =
          new BinaryReader(ifsImages);

        // read and discard the header ints
        // (4 in the image file, 2 in the label file)
        int magic1 = brImages.ReadInt32();
        int numImages = brImages.ReadInt32();
        int numRows = brImages.ReadInt32();
        int numCols = brImages.ReadInt32();

        int magic2 = brLabels.ReadInt32();
        int numLabels = brLabels.ReadInt32();

        byte[][] pixels = new byte[28][];
        for (int i = 0; i < pixels.Length; ++i)
          pixels[i] = new byte[28];

        // each test image
        for (int di = 0; di < 10000; ++di)
        {
          for (int i = 0; i < 28; ++i)
          {
            for (int j = 0; j < 28; ++j)
            {
              byte b = brImages.ReadByte();
              pixels[i][j] = b;
            }
          }

          byte lbl = brLabels.ReadByte();

          DigitImage dImage =
            new DigitImage(pixels, lbl);
          Console.WriteLine(dImage.ToString());
        } // each image

        brImages.Close();
        ifsImages.Close();
        brLabels.Close();
        ifsLabels.Close();

        Console.WriteLine("\nEnd\n");
      }
      catch (Exception ex)
      {
        Console.WriteLine(ex.Message);
      }
    } // Main
  } // Program

  public class DigitImage
  {
    public byte[][] pixels;
    public byte label;

    public DigitImage(byte[][] pixels,
      byte label)
    {
      this.pixels = new byte[28][];
      for (int i = 0; i < this.pixels.Length; ++i)
        this.pixels[i] = new byte[28];

      for (int i = 0; i < 28; ++i)
        for (int j = 0; j < 28; ++j)
          this.pixels[i][j] = pixels[i][j];

      this.label = label;
    }

    public override string ToString()
    {
      string s = "";
      for (int i = 0; i < 28; ++i)
      {
        for (int j = 0; j < 28; ++j)
        {
          if (this.pixels[i][j] == 0)
            s += " "; // white
          else if (this.pixels[i][j] == 255)
            s += "O"; // black
          else
            s += "."; // gray
        }
        s += "\n";
      }
      s += this.label.ToString();
      return s;
    } // ToString

  }
} // ns

```

## Five Microsoft Technology Software Developer Conferences in 2014

It’s surprisingly difficult to find a list of software developer conferences. In my case, I am most interested in conferences that target Microsoft technologies and are in the United States, especially the West Coast.

Currently, the five conferences that are most relevant to me are two conferences put on by Microsoft (Build and TechEd), and three conferences put on by non-Microsoft companies with Microsoft sponsorship (DevConnections, DevIntersection, Visual Studio Live). I’ve attended and spoken at all five of these events in the past, and in general, can recommend all of them.

As far as I can tell, here is how 2014 is shaping up. Often, a particular conference is offered twice per year, once on the West Coast and once on the East Coast. Because I live on the West Coast, here are the potential events for me:

1. Visual Studio Live, March 10-14, Las Vegas
2. Microsoft TechEd, May 12-15, Houston
3. Microsoft Build, unknown date, unknown city
4. DevConnections, September 15-19, Las Vegas
5. DevIntersection, November 9-12(?), Las Vegas

Visual Studio Live – This conference has been around for many years. VS Live tends to have a broad range of topics and targets a wide range of skill sets, but mostly intermediate-level developers I’d say. In 2014, VS Live will be in Las Vegas in March, Chicago in May, Redmond in August, and Orlando in November. Speakers come from both Microsoft and other companies. Visual Studio Live is put on by 1105 Media which does a lot of other kinds of conferences, and also publishes Visual Studio Magazine. Highly recommended for intermediate-level developers but beginners and advanced developers can find some interesting talks too.

Microsoft TechEd – In 2014 TechEd swallows the Microsoft Management Summit. In the past TechEd emphasized training for developers and IT people, and MMS emphasized training and products for IT people. Each year there was more overlap so combining the events makes sense. In 2014 TechEd will be in Houston – an unusual choice of venue. Highly recommended for IT pros and enterprise developers.

Microsoft Build – Build targets Web, system, desktop, mobile, and embedded software developers. Build is a combination of the old PDC (Professional Developer Conference) for traditional software developers and MIX (originally stood for “Meet, Interact, eXplore”) for Web developers. The dates and location of Build have not been announced but my wild guess is that Build will be in October in Las Vegas. Highly recommended for developers of all skill levels.

DevConnections – Another long-running conference that’s been around for at least 10 years. DevConnections has gone through some management changes recently, and 2013 was the first event put on by the new team. I wasn’t there but some of my friends say the event was similar to previous years, focusing on intermediate-level developers. DevConnections is a bit broader in scope than Visual Studio Live and targets developers and SQL people and IT people. The conference is run by Penton Media, a big company that does many events and publishes magazines including Windows IT Pro and SQL Server Pro. The 2014 event is scheduled for September 15-19 in Las Vegas. Recommended for developers with intermediate and beginning level skills.

DevIntersection – DevIntersection held its first event in 2012. DevIntersection is a spin-off from DevConnections; despite the similar names, the two events are no longer related organizationally, but they are similar in the sense that they both target a broad audience. The people who now run DevIntersection used to run DevConnections, and I always thought those events were very nice. The Spring 2014 event will be April 13-15 in Orlando, Florida (too far away for me). The Fall 2014 event dates and location have not been announced, but my guess is early November in Las Vegas. Highly recommended for developers with beginning and intermediate level skills.

There are many other conferences for software developers who use the MS technology stack, but I can recommend the five here on the basis of personal experience. All these conferences are a bit pricey (well, to me anyway). For example, the TechEd conference, not including hotel and travel, is about \$2000. The Visual Studio Live conference is about \$1600. It's a tough sell to get your company to foot the bill for one of these conferences, but maybe you can convince your management that the knowledge you'll gain, your improved morale, and the increased energy and productivity you'll have after returning are worth the price.

Posted in Machine Learning

## Getting Data into Memory with Excel Add-In Interop

To extend the functionality of Excel (for example, adding a machine learning operation such as data clustering), you can write an Excel add-in. The basic add-in typically handles the UI, but to do anything meaningful you usually need to use Excel Interop to read worksheet contents into memory and then, after doing some processing, write values in memory back to the worksheet. I described the add-in creation process in a previous blog post at http://jamesmccaffrey.wordpress.com/2013/07/08/analyzing-an-excel-2013-spreadsheet-programmatically-using-an-add-in/.

When I need to read the contents of a worksheet into memory, I typically use one of three approaches. One way is like so:

```
using Excel = Microsoft.Office.Interop.Excel;
using Tools = Microsoft.Office.Tools.Excel;
using Office = Microsoft.Office.Core;
. . .
Excel.Worksheet worksheet =
  Globals.ThisAddIn.Application.ActiveSheet
  as Excel.Worksheet; // the active sheet, in a VSTO add-in
Excel.Range usedRange = worksheet.UsedRange;
object[,] allData = usedRange.Value2;
// trim empty rows or columns
```

I use the UsedRange to get a reference to all cells that have values, and then the Value2 property to store those values into a two-dimensional matrix of type object. Unfortunately, the UsedRange property returns all cells that either currently have contents or had contents at some time in the past, so you might get many empty rows or columns. This requires some post-processing of the object[,] matrix in memory.

A second approach can be used when the add-in has some UI that allows users to specify the range of data. For example:

```
string upperLeftCell = "A1";  // placeholder; get from UI
string lowerRightCell = "C9"; // placeholder; get from UI
// or get one string in "A1:B2" form
Excel.Range specifiedRange =
  worksheet.get_Range(upperLeftCell, lowerRightCell);
object[,] someData = specifiedRange.Value2;
```

A third approach gets a user-selected (via the mouse) range of data. For example:

```
Excel.Range selectedRange =
  Globals.ThisAddIn.Application.Selection
  as Excel.Range; // whatever the user highlighted with the mouse
object[,] selectedData = selectedRange.Value2;
listBox1.Items.Add("First cell of user-selected data is " +
  selectedData[1, 1]);
```

In all cases, annoyingly, the object[,] matrix is 1-based rather than 0-based. This can easily lead to off-by-one bugs, so much so that in many cases I transfer the contents to a normal 0-based matrix.

```
Excel.Range usedRange = worksheet.UsedRange;

// get_Resize(1, 1) shrinks a range to just its upper-left cell
Excel.Range upperLeftRange = usedRange.get_Resize(1, 1);
```
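As a sketch of that 1-based to 0-based transfer (assuming, hypothetically, that every used cell holds a numeric value), something like this works:

```csharp
// copy the 1-based Interop matrix into an ordinary 0-based array
object[,] allData = usedRange.Value2;
int numRows = allData.GetLength(0);
int numCols = allData.GetLength(1);

double[][] data = new double[numRows][];
for (int i = 0; i < numRows; ++i)
{
  data[i] = new double[numCols];
  for (int j = 0; j < numCols; ++j)
    data[i][j] = Convert.ToDouble(allData[i + 1, j + 1]); // note the +1
}
```

After this, all further processing can use normal 0-based indexing and never touch the Interop matrix again.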

Working with Excel add-ins is interesting but not always intuitive.

Posted in Machine Learning, Software Test Automation

## Why You Should Use Cross-Entropy Error Instead Of Classification Error Or Mean Squared Error For Neural Network Classifier Training

When using a neural network to perform classification and prediction, it is usually better to use cross-entropy error than classification error, and somewhat better to use cross-entropy error than mean squared error to evaluate the quality of the neural network. Let me explain. The basic idea is simple but there are a lot of related issues that greatly confuse the main idea. First, let me make it clear that we are dealing only with a neural network that is used to classify data, such as predicting a person’s political party affiliation (democrat, republican, other) from independent data such as age, sex, annual income, and so on. We are not dealing with a neural network that does regression, where the value to be predicted is numeric, or a time series neural network, or any other kind of neural network.

Now suppose you have just three training data items. Your neural network uses softmax activation for the output neurons so that there are three output values that can be interpreted as probabilities. For example suppose the neural network’s computed outputs, and the target (aka desired) values are as follows:

```
computed       | targets              | correct?
-----------------------------------------------
0.3  0.3  0.4  | 0  0  1 (democrat)   | yes
0.3  0.4  0.3  | 0  1  0 (republican) | yes
0.1  0.2  0.7  | 1  0  0 (other)      | no
```

This neural network has classification error of 1/3 = 0.33, or equivalently a classification accuracy of 2/3 = 0.67. Notice that the NN just barely gets the first two training items correct and is way off on the third training item. But now consider the following neural network:

```
computed       | targets              | correct?
-----------------------------------------------
0.1  0.2  0.7  | 0  0  1 (democrat)   | yes
0.1  0.7  0.2  | 0  1  0 (republican) | yes
0.3  0.4  0.3  | 1  0  0 (other)      | no
```

This NN also has a classification error of 1/3 = 0.33. But this second NN is better than the first because it nails the first two training items and just barely misses the third training item. To summarize, classification error is a very crude measure of error.

Now consider cross-entropy error. The cross-entropy error for the first training item in the first neural network above is:

```
-( (ln(0.3)*0) + (ln(0.3)*0) + (ln(0.4)*1) ) = -ln(0.4)
```

Notice that in the case of neural network classification, the computation is a bit odd because all terms but one will go away. (There are several good explanations of how to compute cross-entropy on the Internet.) So, the average cross-entropy error (ACE) for the first neural network is computed as:

```
-(ln(0.4) + ln(0.4) + ln(0.1)) / 3 = 1.38
```

The average cross-entropy error for the second neural network is:

```
-(ln(0.7) + ln(0.7) + ln(0.3)) / 3 = 0.64
```

Notice that the average cross-entropy error for the second, superior neural network is smaller than the ACE error for the first neural network. The ln() function in cross-entropy takes into account the closeness of a prediction and is a more granular way to compute error.
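The arithmetic above is easy to verify with a few lines of C#. This is just a check of the numbers, not part of any neural network implementation:

```csharp
using System;

class AceCheck
{
  static void Main()
  {
    // per-item cross-entropy is -ln(probability assigned to the true class)
    double ace1 = -(Math.Log(0.4) + Math.Log(0.4) + Math.Log(0.1)) / 3.0;
    double ace2 = -(Math.Log(0.7) + Math.Log(0.7) + Math.Log(0.3)) / 3.0;
    Console.WriteLine("ACE first NN  = {0:F2}", ace1); // 1.38
    Console.WriteLine("ACE second NN = {0:F2}", ace2); // 0.64
  }
}
```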

By the way, you can also measure neural network quality by using mean squared error but this has problems too. The squared error term for the first item in the first neural network would be:

```
(0.3 - 0)^2 + (0.3 - 0)^2 + (0.4 - 1)^2 = 0.09 + 0.09 + 0.36 = 0.54
```

And so the mean squared error for the first neural network is:

```
(0.54 + 0.54 + 1.34) / 3 = 0.81
```

The mean squared error for the second, better, neural network is:

```
(0.14 + 0.14 + 0.74) / 3 = 0.34
```
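Again, the numbers are easy to check with a small C# snippet (a sketch for verification only; the helper name is mine):

```csharp
using System;

class MseCheck
{
  // squared error for one item: sum over outputs of (computed - target)^2
  static double SqErr(double[] computed, double[] targets)
  {
    double sum = 0.0;
    for (int i = 0; i < computed.Length; ++i)
      sum += (computed[i] - targets[i]) * (computed[i] - targets[i]);
    return sum;
  }

  static void Main()
  {
    double mse1 = (SqErr(new[] { 0.3, 0.3, 0.4 }, new[] { 0.0, 0.0, 1.0 })
      + SqErr(new[] { 0.3, 0.4, 0.3 }, new[] { 0.0, 1.0, 0.0 })
      + SqErr(new[] { 0.1, 0.2, 0.7 }, new[] { 1.0, 0.0, 0.0 })) / 3.0;
    double mse2 = (SqErr(new[] { 0.1, 0.2, 0.7 }, new[] { 0.0, 0.0, 1.0 })
      + SqErr(new[] { 0.1, 0.7, 0.2 }, new[] { 0.0, 1.0, 0.0 })
      + SqErr(new[] { 0.3, 0.4, 0.3 }, new[] { 1.0, 0.0, 0.0 })) / 3.0;
    Console.WriteLine("MSE first NN  = {0:F2}", mse1); // 0.81
    Console.WriteLine("MSE second NN = {0:F2}", mse2); // 0.34
  }
}
```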

MSE isn’t a hideously bad approach but if you think about how MSE is computed you’ll see that, compared to ACE, MSE gives too much emphasis to the incorrect outputs. It might also be possible to compute a modified MSE that uses only the values associated with the 1s in the target, but I have never seen that approach used or discussed.

So, I think this example explains why using cross-entropy error is clearly preferable to using classification error. Somewhat unfortunately there are some additional issues here. The discussion above refers to computing error during the training process. After training, to get an estimate of the effectiveness of the neural network, classification error is usually preferable to MSE or ACE. The idea is that classification error is ultimately what you’re interested in.

Suppose you are using back-propagation for training. The back-propagation algorithm computes gradient values which are derived from some implicit measure of error. Typically the implicit error is mean squared error, which gives a particular gradient equation that involves the calculus derivative of the softmax output activation function. But you can use implicit cross-entropy error instead of implicit mean squared error. This approach changes the back-propagation equation for the gradients. I have never seen research which directly addresses the question of whether to use cross-entropy error for both the implicit training measure of error and neural network quality evaluation, or to use cross-entropy just for quality evaluation. Such research may (and in fact probably does) exist, but I've been unable to track any papers down.

To summarize, for a neural network classifier, during training you can use mean squared error or average cross-entropy error, and average cross-entropy error is considered slightly better. If you are using back-propagation, the choice of MSE or ACE affects the computation of the gradient. After training, to estimate the effectiveness of the neural network it’s better to use classification error.

Posted in Machine Learning

## My Top 10 Favorite New Wave Songs of the 1980s

Most of my blog posts are purely technical but sometimes, just for fun, I’ll do a top-10 list. Like this.

The 1980s had some really interesting music. I distinctly remember the very first times I heard songs by Adam Ant, Flock of Seagulls, and other new wave bands. It’s kind of difficult to describe exactly what a new wave song is so I won’t try. Here are my top 10 favorite new wave songs. I don’t mean these are the best songs, I mean that if I was going on a road trip and could only take ten new wave songs, these would be the ones.

1. “I Ran”, A Flock of Seagulls (1982). This song is almost a cliche for 1980s new wave excess and bad hair, but it’s still an excellent, catchy song. Great use of synthesizer. http://www.youtube.com/watch?v=BJ7NVjZ-Eyg

2. “Do You Wanna Hold Me”, Bow Wow Wow (1983). One of my top 10 songs of all time. Combination of great guitar work and incredible vocals by Annabella Lwin. A.L. was really beautiful and I think she was only 16 or 17 when she sang this song. http://www.youtube.com/watch?v=l7BwRL2yhGQ

3. “Public Image”, Public Image Ltd (1979). This is really a post-punk, pre-new-wave song but this is my list so I can do what I want. Really a simple song but John Lydon (aka Johnny Rotten when he was in the Sex Pistols) has a really interesting, distinctive voice. http://www.youtube.com/watch?v=ylOCIP54PIQ

4. “Invisible Sun”, The Police (1981). The Police are probably on most people’s top-10 songs of the 80s list, but usually not for this song. It’s always been my favorite from this band. http://www.youtube.com/watch?v=NIylUcGDi-Y

5. “The Cutter”, Echo and the Bunnymen (1983). This is a very complex song musically that has held up well over time. http://www.youtube.com/watch?v=VM6j14DDtGI

6. “Whisper to a Scream (Birds Fly)”, Icicle Works (1983). Not really sure why I like this song so much but it’d be on my road-trip list anytime. Nice balance of vocals, drums, guitar and synthesizer. http://www.youtube.com/watch?v=NVQCpI4GbKQ

7. “Major Tom”, Peter Schilling (1983). I’m almost embarrassed to put this song on my list because it’s kind of silly but I like it. On some other day I’d substitute one of the honorable mention songs listed below. Totally weird lyrics/subject (even if you know the reference to Bowie’s “Space Oddity”) but a very catchy melody. http://www.youtube.com/watch?v=N1Hs2AQwDgA

8. “Roam”, The B-52s (1989). Kate Pierson and Cindy Wilson have two of the best voices of the 80s and together they’re fantastic. I pick “Roam” over some other B-52s great songs, “Rock Lobster” in particular. http://www.youtube.com/watch?v=IWEfmCvu8R8

9. “Church of the Poison Mind”, Culture Club (1983). I’m not really a big Culture Club fan but I really think this song complements the others on my list nicely. Boy George was too weird and creepy but he could sure sing. http://www.youtube.com/watch?v=HVzAH0FtNwg

10. “The Promise”, When in Rome (1988). I normally don’t like relatively slow songs but the addictive melody of this song puts it on my top-10 list. http://www.youtube.com/watch?v=5HI_xFQWiYU

Notes:

“Take on Me” (A-Ha) is a good song that I remember for the very clever video. There are a zillion songs that remind me of 1980s movies like “Sixteen Candles” and “Breakfast Club”: “Don’t You Forget About Me” (Simple Minds), “True” (Spandau Ballet), and on and on, but they don’t make it on my road-trip list. On some days I’d put “Lies” or “Hold Me Now” by the Thompson Twins on my list. I’m a bit surprised that no Duran Duran songs quite made my list. I like “We Got the Beat” (The Go-Go’s) but it just missed the cut. “Dancing with Myself” (Billy Idol) is almost a good song but too primitive musically. Two of my favorite slow songs of the 1980s are “Wishful Thinking” (China Crisis) http://www.youtube.com/watch?v=oj20LKdg8-8 and “Dance Away” (Roxy Music) http://www.youtube.com/watch?v=7lLcZPhTvFE but I don’t like slow for road trips. “One Way or Another” (Blondie) is good but not top-10 for me.

Posted in Uncategorized

## K-Fold Cross-Validation for Neural Networks

I wrote an article “Understanding and Using K-Fold Cross-Validation for Neural Networks” that appears in the October 2013 issue of Visual Studio Magazine. See http://visualstudiomagazine.com/articles/2013/10/01/understanding-and-using-kfold.aspx. Exactly what k-fold cross-validation is, and why it is used, are somewhat difficult to explain clearly. Let me try. The main technical challenge when working with a neural network is training the network, which means finding values for the network’s many weights and biases so that for a given set of input values, the network’s computed output values closely match known outputs of a set of training data.

Because neural networks are universal function approximators, given enough time, it is always possible (in theory) to find a set of weights and biases so that computed outputs exactly match training data outputs. But if you use those weights and bias values on new, previously unseen data, your neural network will predict very poorly. This is called over-fitting. (I breezed through this but over-fitting is a very deep concept).

OK, so the problem is over-fitting. There are many ways to deal with over-fitting. K-fold cross-validation is one. The idea is to break the training data into k subsets, where k is usually 10. Then you run your training algorithm (the three most common approaches are back-propagation, particle swarm optimization, and genetic algorithm optimization) 10 times. On the first training run you use 9/10 of the training data to train, and then compute the network's accuracy using the remaining 1/10 of the data. This process is repeated so that each 1/10 subset is used exactly once as the validation set. When finished, you take the average of the 10 accuracies and use it as the overall estimate of the accuracy of the network. In short, k-fold cross-validation gives you an estimate of a neural network's accuracy when the network was constructed using particular values for the number of hidden nodes and the training parameters.

How does this help? Well, if you do k-fold cross-validation repeatedly, and during the training phase use different values for the training technique's parameters (different techniques have different parameters – back-prop needs learning rate and momentum; particle swarm needs inertia, cognitive and social weights; and so on) and also try different numbers of hidden nodes, you can find the best values for the number of hidden nodes and the training parameters. Then, with these in hand, you can finally train your network using all your data, with the best number of hidden nodes and training parameters.

In pseudo-code:

```
loop "many" times
  pick a number of hidden nodes
  pick training parameters (learning rate, etc.)

  // k-fold
  divide train data into 10 parts
  for i = 1 to 10
    train network using 9 parts
    compute accuracy using 1 part
  end for
  compute average accuracy of the 10 runs

  if avg accuracy best found so far
    save number hidden nodes used
    save training parameters used
    save best average accuracy value
  end if
end loop

train network using all data
(using best number hidden nodes,
 and best training parameters)
estimated accuracy is best accuracy found above
```
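The "divide train data into 10 parts" step is usually done by assigning each training item a fold index. Here is a hedged sketch in C# of one way to do that; the names and details are mine, not from the article:

```csharp
using System;

class FoldDemo
{
  // assign each of n training items to one of k folds, round-robin
  // after a random shuffle, so every item is a validation item exactly once
  static int[] MakeFolds(int n, int k, Random rnd)
  {
    int[] indices = new int[n];
    for (int i = 0; i < n; ++i) indices[i] = i;
    for (int i = 0; i < n; ++i) // Fisher-Yates shuffle
    {
      int r = rnd.Next(i, n);
      int tmp = indices[r]; indices[r] = indices[i]; indices[i] = tmp;
    }
    int[] folds = new int[n];
    for (int i = 0; i < n; ++i)
      folds[indices[i]] = i % k; // folds end up nearly equal in size
    return folds;
  }

  static void Main()
  {
    int[] folds = MakeFolds(25, 10, new Random(0));
    // on run f, items where folds[i] == f are the validation set
    // and all other items are the training set
    for (int f = 0; f < 10; ++f)
    {
      Console.Write("fold " + f + " validation items: ");
      for (int i = 0; i < folds.Length; ++i)
        if (folds[i] == f) Console.Write(i + " ");
      Console.WriteLine();
    }
  }
}
```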

Anyway, k-fold cross-validation was difficult for me to grasp because there are so many inter-related issues, but after I thought the process over enough times, it finally made sense.

Posted in Machine Learning