Decision Tree Regression From Scratch Using C# That Explicitly Handles Mixed Numeric and Categorical Data

I recently refactored my decision tree classifier (implemented from scratch using C#) so that it could explicitly handle both numeric data and categorical data. While the ideas were still fresh in my mind, I figured I’d refactor my decision tree regression code (predict a single numeric value) so that it can handle mixed numeric and categorical data too.

I changed the regression decision tree constructor to accept a string array that tells the tree whether each column is numeric/ordinal or categorical. I put together a demo using one of my standard synthetic datasets. The data looks like:

0, 0.24, 0, 0.2950, 2
1, 0.39, 2, 0.5120, 1
0, 0.63, 1, 0.7580, 0
1, 0.36, 0, 0.4450, 1
. . .

Each line represents a person. The fields are sex (male = 0, female = 1), age (divided by 100), State (Michigan = 0, Nebraska = 1, Oklahoma = 2), income (divided by $100,000), and political leaning (conservative = 0, moderate = 1, liberal = 2). The goal is to predict income from sex, age, State, and political leaning. (Note: it’s not necessary to normalize numeric data for decision trees, but I like to do so anyway so that the data can be used by ML prediction systems that do need normalization: linear (ridge) regression, k-nearest neighbors regression, kernel ridge regression, Gaussian process regression, neural network regression, and so on.)
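For example, a 28-year-old male from Oklahoma who is a political liberal and earns $43,200.00 per year (made-up values, just to illustrate the encoding) would be stored as the line:

0, 0.28, 2, 0.4320, 2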

The key calling code is:

// load train and test data here
string[] columnKind = new string[] { "C", "N", "C", "C" };
DecisionTree dt = new DecisionTree(63, columnKind);
dt.BuildTree(trainX, trainY);

Console.WriteLine("Predicting for male, 34," +
  " Oklahoma, moderate ");
double[] x = new double[] { 0, 0.34, 2, 1 };
double predY = dt.Predict(x, verbose: true);

The “C” kind means categorical and the “N” kind means numeric. The tree constructor creates a balanced binary tree with 63 nodes, some of them empty. The root node is Node 0. The second level is Nodes 1 and 2. The third level is Nodes 3, 4, 5, and 6. And so on.
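Because the tree is stored as a flat list of nodes, there are no child pointers; parent-child relationships are implicit in the node IDs. A minimal sketch of the index arithmetic (these helper methods are just for illustration; the demo code computes the indices inline):

static int LeftChild(int i) { return (2 * i) + 1; }   // children of Node 0 are Nodes 1 and 2
static int RightChild(int i) { return (2 * i) + 2; }  // children of Node 4 are Nodes 9 and 10
static int Parent(int i) { return (i - 1) / 2; }      // integer division

A tree with 63 nodes has six levels (1 + 2 + 4 + 8 + 16 + 32 = 63), so a prediction visits at most six nodes.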

If a node splits on a numeric column, the tree branches to the left for less-than or to the right for greater-than-or-equal. For example, “If Age < 0.34” goes left and “If Age >= 0.34” goes right. If a node splits on a categorical column, the tree branches to the left for not-equal or to the right for is-equal. For example, “If State != 2 (any State other than Oklahoma)” goes left and “If State == 2 (Oklahoma)” goes right.
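As a sketch, the branch test at a single node looks like this, where splitCol and splitVal belong to the node and x is the input vector (this just restates the logic of the Predict() method in the demo code below):

bool goLeft;
if (columnKind[splitCol] == "N")               // numeric: threshold test
  goLeft = x[splitCol] < splitVal;             // less-than goes left
else                                           // categorical: equality test
  goLeft = (int)x[splitCol] != (int)splitVal;  // not-equal goes left
int nextNodeID = goLeft ? (2 * nodeID) + 1
  : (2 * nodeID) + 2;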

For decision tree classifiers, the splitting criterion is weighted Gini impurity, which drives leaf nodes toward holding items that are mostly of the same target class. This idea doesn’t work for regression trees because the target values are continuous rather than discrete classes.

For my decision tree regression system I used weighted statistical variance so that leaf nodes have target y values that are very similar. The predicted value is a simple average of the target y values in the terminal node.
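For example, if a candidate split sends four rows with target values (0.30, 0.32, 0.50, 0.52) to the left and two rows with (0.70, 0.72) to the right (made-up values), the split score is (4/6) * Var(left) + (2/6) * Var(right), and the candidate split with the lowest score wins. A sketch of the scoring computation, which mirrors the MeanVariability() helper in the demo code:

int n1 = lessRows.Count;
int n2 = greaterRows.Count;
double wt1 = (n1 * 1.0) / (n1 + n2);  // weight of left partition
double wt2 = (n2 * 1.0) / (n1 + n2);  // weight of right partition
double score = (wt1 * var1) + (wt2 * var2);  // lower is better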

My Predict() method has a verbose parameter that generates output like:

curr node id = 0
Column kind = N
Comparing 0.43 in column 1 with 0.34
attempting move left to Node 1
new node id = 1

curr node id = 1
Column kind = N
Comparing 0.31 in column 1 with 0.34
attempting move right to Node 4
new node id = 4

curr node id = 4
Column kind = N
Comparing 0.38 in column 1 with 0.34
attempting move left to Node 9
new node id = 9

curr node id = 9
Column kind = C
Comparing 1 in column 0 with 0
attempting move left to Node 19
new node id = 19

curr node id = 19
Column kind = N
Comparing 0.32 in column 1 with 0.34
attempting move right to Node 40
new node id = 40

curr node id = 40
Column kind = C
Comparing 1 in column 3 with 1
attempting move right to Node 82

IF (*) AND (column 1 < 0.43) AND
 (column 1 >= 0.31) AND (column 1 < 0.38) AND
 (column 0 != 1) AND (column 1 >= 0.32)

Predicted Y = 0.40227

The wonky rule, where “if age less than 0.43” is later followed by “if age greater-than-or-equal to 0.31”, is one of the characteristic quirks of decision trees: the same column can be split at several different values along a single path.

Decision trees are paradoxically simple and complex. They’re simple because there’s almost no mathematics involved other than computing the Gini value (for tree classifiers) or a measure of variability (for tree regression) to split data in the tree. But trees are complex because they have very tricky data structures and lots of tossing data around.



Machine learning systems are sort of software automata. I’ve always been fascinated by mechanical automata — entertaining machines that have complex movements. I stumbled across a wonderful example named “The Alchemyst’s Clocktower” by a man named Thomas Kuntz. It’s a five foot tall clock building where all kinds of things on the outside move, and then the front opens up and a 12″ wizard talks and does tricks. Absolutely beautiful.


Demo code. Very long! Not thoroughly tested, so do not use in production.

using System;
using System.Collections.Generic;
using System.IO;

namespace DecisionTreeRegression
{
  internal class DecisionTreeRegressionProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nDecision tree regression ");
      Console.WriteLine("Predict income from sex," +
        " age, State, political leaning ");

      // ------------------------------------------------------

      string trainFile =
        "..\\..\\..\\Data\\people_train_tree.txt";
      // sex, age, State, income, politics
      double[][] trainX = Utils.MatLoad(trainFile,
        new int[] { 0, 1, 2, 4 }, ',', "#");
      double[] trainY = 
        Utils.MatToVec(Utils.MatLoad(trainFile,
        new int[] { 3 }, ',', "#"));

      string testFile =
        "..\\..\\..\\Data\\people_test_tree.txt";
      double[][] testX = Utils.MatLoad(testFile,
        new int[] { 0, 1, 2, 4 }, ',', "#");
      double[] testY = 
        Utils.MatToVec(Utils.MatLoad(testFile,
        new int[] { 3 }, ',', "#"));

      Console.WriteLine("\nFirst three X data: ");
      for (int i = 0; i "lt" 3; ++i)
        Utils.VecShow(trainX[i], 4, 9, true);

      Console.WriteLine("\nFirst three target Y : ");
      for (int i = 0; i "lt" 3; ++i)
        Console.WriteLine(trainY[i].ToString("F4"));

      // sex, age, State, politics
      string[] columnKind = 
        new string[] { "C", "N", "C", "C" };
      DecisionTree dt = new DecisionTree(63, columnKind);
      dt.BuildTree(trainX, trainY);

      Console.WriteLine("\nTree snapshot: ");
      //dt.ShowTree();  // show all nodes in tree
      dt.ShowNode(0);
      dt.ShowNode(30);

      Console.WriteLine("\nComputing model accuracy:");
      double trainAcc = dt.Accuracy(trainX, trainY, 0.10);
      Console.WriteLine("Train data accuracy = " +
        trainAcc.ToString("F4"));

      double testAcc = dt.Accuracy(testX, testY, 0.10);
      Console.WriteLine("Test data accuracy = " +
        testAcc.ToString("F4"));

      Console.WriteLine("\nPredicting for male, 34," +
        " Oklahoma, moderate ");
      double[] x = new double[] { 0, 0.34, 2, 1 };
      double predY = dt.Predict(x, verbose: true);
      // Console.WriteLine("Predicted income = " +
      // predY.ToString("F5"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main

  } // Program

  // ----------------------------------------------------------

  class DecisionTree
  {
    public int numNodes;
    public List<Node> tree;
    public string[] columnKind;  // "N" or "C" per column

    // ---------- nested classes

    public class Node
    {
      public int nodeID;
      public List<int> rows;  // source data rows
      public int splitCol;
      public double splitVal;
      public double[] targetValues;
      public double predictedY;  // avg of targets
    }

    public class SplitInfo  // helper struct
    {
      public int splitCol;
      public double splitVal;
      public List<int> lessRows;
      public List<int> greaterRows;
    }

    // ----------

    public DecisionTree(int numNodes, string[] colKind)
    {
      this.numNodes = numNodes;
      this.tree = new List<Node>();
      for (int i = 0; i < numNodes; ++i)
        this.tree.Add(new Node());
      this.columnKind = colKind;  // by ref
    } // ctor

    // -------------------------------------------------------

    public void BuildTree(double[][] trainX,
      double[] trainY)
    {
      // prep the list and the root node
      int n = trainX.Length;

      List"lt"int"gt" allRows = new List"lt"int"gt"();
      for (int i = 0; i "lt" n; ++i)
        allRows.Add(i);

      this.tree[0].rows = new List"lt"int"gt"(allRows);

      for (int i = 0; i "lt" this.numNodes; ++i)
      {
        this.tree[i].nodeID = i; 

        SplitInfo si = GetSplitInfo(trainX, trainY,
          this.tree[i].rows, this.columnKind);
        this.tree[i].splitCol = si.splitCol;
        this.tree[i].splitVal = si.splitVal;

        //Utils.ListShow(this.tree[i].rows, 4);
        //Console.ReadLine();

        this.tree[i].targetValues = 
          GetTargets(trainY, this.tree[i].rows);
        this.tree[i].predictedY = 
          Average(tree[i].targetValues);

        int leftChild = (2 * i) + 1;
        int rightChild = (2 * i) + 2;

        if (leftChild "lt" numNodes)
          tree[leftChild].rows =
            new List"lt"int"gt"(si.lessRows);
        if (rightChild "lt" numNodes)
          tree[rightChild].rows =
            new List"lt"int"gt"(si.greaterRows);
      } // i

    } // BuildTree()

    // -------------------------------------------------------

    private static double[] GetTargets(double[] trainY,
      List<int> rows)
    {
      int n = rows.Count;
      double[] result = new double[n];

      for (int i = 0; i < n; ++i)
      {
        int r = rows[i];
        double target = trainY[r];
        result[i] = target;
      }
      return result;
    }

    // -------------------------------------------------------
    
    private static double Average(double[] targets)
    {
      int n = targets.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" n; ++i)
        sum += targets[i];
      return sum / n;
    }

    // -------------------------------------------------------

    private static SplitInfo GetSplitInfo(double[][]
      trainX, double[] trainY, List<int> rows,
      string[] colKind)
    {
      // given a set of parent rows, find the column,
      // split value, and less-rows and greater-rows of
      // the partition that gives the lowest resulting
      // mean variance/variability
      int nCols = trainX[0].Length;
      SplitInfo result = new SplitInfo();

      int bestSplitCol = 0;
      double bestSplitVal = 0.0;
      double bestVariability = double.MaxValue;
      List<int> bestLessRows = new List<int>();
      List<int> bestGreaterRows = new List<int>();

      foreach (int i in rows)
      {
        for (int j = 0; j < nCols; ++j)
        {
          string kind = colKind[j];  // "N"um or "C"at

          double splitVal = trainX[i][j];
          List<int> lessRows = new List<int>();
          List<int> greaterRows = new List<int>();
          foreach (int ii in rows)  // walk column
          {
            if (kind == "N")  // numeric
            {
              if (trainX[ii][j] < splitVal)
                lessRows.Add(ii);
              else
                greaterRows.Add(ii);
            }
            else if (kind == "C")  // categorical
            {
              if ((int)trainX[ii][j] != (int)splitVal)
                lessRows.Add(ii);
              else
                greaterRows.Add(ii);
            }
          } // ii

          double meanVariability =
            MeanVariability(trainY,
            lessRows, greaterRows);
          if (meanVariability < bestVariability)
          {
            bestVariability = meanVariability;
            bestSplitCol = j;
            bestSplitVal = splitVal;

            bestLessRows =
              new List<int>(lessRows);
            bestGreaterRows =
              new List<int>(greaterRows);
          }
        } // j
      } // i

      result.splitCol = bestSplitCol;
      result.splitVal = bestSplitVal;
      result.lessRows =
        new List<int>(bestLessRows);
      result.greaterRows =
        new List<int>(bestGreaterRows);

      return result;
    }

    // -------------------------------------------------------

    private static double Variability(double[]
      trainY, List<int> rows)
    {
      // lower variability is better
      int n = rows.Count;
      if (n == 0) return 0.0;  // empty partition

      double sum = 0.0;  // compute mean
      for (int i = 0; i < rows.Count; ++i)
      {
        int idx = rows[i];
        double target = trainY[idx];
        sum += target;
      }
      double mean = sum / n;

      // use mean to compute variance
      sum = 0.0;  // reset accumulator
      for (int i = 0; i < rows.Count; ++i)
      {
        int idx = rows[i];  // pts into trainY
        double target = trainY[idx];
        sum += (target - mean) * (target - mean);
      }
      return sum / n;  // variance
    }

    // -------------------------------------------------------

    private static double MeanVariability(double[] trainY,
      List<int> rows1, List<int> rows2)
    {
      // weighted by number items in rows
      if (rows1.Count == 0 && rows2.Count == 0)
        return 0.0;  // FIX

      // 0.0 if either rows Count is 0:
      double variability1 = Variability(trainY, rows1);
      double variability2 = Variability(trainY, rows2);
      int count1 = rows1.Count;
      int count2 = rows2.Count;
      double wt1 = (count1 * 1.0) / (count1 + count2);
      double wt2 = (count2 * 1.0) / (count1 + count2);
      double result = (wt1 * variability1) +
        (wt2 * variability2);
      return result;
    }

    // -------------------------------------------------------

    public void ShowTree()  // show all nodes in tree
    {
      for (int i = 0; i "lt" this.numNodes; ++i)
        ShowNode(i);
    }

    // -------------------------------------------------------

    public void ShowNode(int nodeID)
    {
      Console.WriteLine("\n==========");
      Console.WriteLine("Node ID: " +
        this.tree[nodeID].nodeID);
      Console.WriteLine("Node split column: " +
        this.tree[nodeID].splitCol);
      Console.WriteLine("Node split value: " +
        this.tree[nodeID].splitVal.ToString("F2"));

      Console.WriteLine("Node target values: ");
      for (int i = 0; i "lt" this.tree[nodeID].
        targetValues.Length; ++i)
      {
        if (i "gt" 0 && i % 10 == 0) Console.WriteLine("");
        Console.Write(this.tree[nodeID].
          targetValues[i].ToString("F4").PadLeft(8));
      }
      Console.WriteLine("");

      Console.WriteLine("\nSource rows: ");
      for (int i = 0; i "lt" this.tree[nodeID].rows.Count;
        ++i)
      {
        if (i "gt" 0 && i % 10 == 0) Console.WriteLine("");
        Console.Write(this.tree[nodeID].rows[i].
          ToString().PadLeft(4) + " ");
      }
      Console.WriteLine("");
      Console.WriteLine("Node predicted y: " +
        this.tree[nodeID].predictedY.ToString("F4"));
      Console.WriteLine("==========");
    }

    // -------------------------------------------------------

    public double Predict(double[] x, bool verbose)
    {
      bool vb = verbose;
      double result = -1.0;
      int currNodeID = 0;
      int newNodeID = 0;
      string rule = "IF (*)";  // if any  . . 
      while (true)
      {
        if (this.tree[currNodeID].rows.Count == 0)
          break; // at an empty node

        if (vb) Console.WriteLine("\ncurr node id = " +
          currNodeID);

        int sc = this.tree[currNodeID].splitCol;

        string scKind = this.columnKind[sc];  // "N" or "C"
        if (vb) Console.WriteLine("Column kind = " +
          scKind);

        // ----------------------------------------------------

        if (scKind == "N")  // a Numeric column
        {

          double sv = this.tree[currNodeID].splitVal;
          double v = x[sc];
          if (vb) Console.WriteLine("Comparing " + sv +
            " in column " + sc + " with " + v);

          if (v "lt" sv)
          {
            newNodeID = (2 * currNodeID) + 1;
            if (vb) Console.WriteLine("attempting move" +
              " left to Node " + newNodeID);
            if (newNodeID "gte" this.tree.Count)
              break;  // attempt to fall out of tree
            if (this.tree[newNodeID].rows.Count == 0)
              break;  // move to invalid Node

            currNodeID = newNodeID;
            result = this.tree[currNodeID].predictedY;
            rule += " AND (column " + sc +
              " "lt" " + sv + ")";
          }
          else if (v "gte" sv)
          {
            newNodeID = (2 * currNodeID) + 2;
            if (vb) Console.WriteLine("attempting move" +
              " right to Node = " + newNodeID);
            if (newNodeID "gte" this.tree.Count)
              break;
            if (this.tree[newNodeID].rows.Count == 0)
              break;

            currNodeID = newNodeID;
            result = this.tree[currNodeID].predictedY;
            rule += " AND (column " + sc +
              " "gte" " + sv + ")";
          }
          else
          {
            if (vb) Console.WriteLine("Logic Error: " +
              "Unable to move left or right");
          }

          if (vb) Console.WriteLine("new node id = " +
            currNodeID);
        }

        // ----------------------------------------------------

        else if (scKind == "C")  // Categorical column
        {

          int sv = (int)this.tree[currNodeID].splitVal;
          int v = (int)x[sc];
          if (vb) Console.WriteLine("Comparing " + sv +
            " in column " + sc + " with " + v);

          if (v != sv)
          {
            newNodeID = (2 * currNodeID) + 1;
            if (vb) Console.WriteLine("attempting move" +
              " left to Node " + newNodeID);
            if (newNodeID >= this.tree.Count)
              break;
            if (this.tree[newNodeID].rows.Count == 0)
              break;

            currNodeID = newNodeID;
            result = this.tree[currNodeID].predictedY;
            rule += " AND (column " + sc +
              " != " + sv + ")";
          }
          else  // v == sv
          {
            newNodeID = (2 * currNodeID) + 2;
            if (vb) Console.WriteLine("attempting move" +
              " right to Node " + newNodeID);
            if (newNodeID >= this.tree.Count)
              break;
            if (this.tree[newNodeID].rows.Count == 0)
              break;

            currNodeID = newNodeID;
            result = this.tree[currNodeID].predictedY;
            rule += " AND (column " +
              sc + " == " + sv + ")";
          }

          if (vb) Console.WriteLine("new node id = " +
            currNodeID);
        }

      } // while

      if (vb) Console.WriteLine("\n" + rule);
      if (vb) Console.WriteLine("Predicted Y = " +
        result.ToString("F5"));

      return result;
    } // Predict

    // -------------------------------------------------------

    public double Accuracy(double[][] dataX,
      double[] dataY, double pctClose)
    {
      int numCorrect = 0;
      int numWrong = 0;
      for (int i = 0; i "lt" dataX.Length; ++i)
      {
        double predY = Predict(dataX[i], verbose: false);
        double actualY = dataY[i];

        if (Math.Abs(predY - actualY) <
          Math.Abs(pctClose * actualY))  // within pctClose of actual
        {
          ++numCorrect;
        }
        else
        {
          ++numWrong;
        }
      }
      return (numCorrect * 1.0) / (numWrong + numCorrect);
    }

  } // DecisionTree class

  // ----------------------------------------------------------

  public class Utils
  {
    public static double[][] VecToMat(double[] vec,
      int rows, int cols)
    {
      // vector to row vec/matrix
      double[][] result = MatCreate(rows, cols);
      int k = 0;
      for (int i = 0; i "lt" rows; ++i)
        for (int j = 0; j "lt" cols; ++j)
          result[i][j] = vec[k++];
      return result;
    }

    public static double[][] MatCreate(int rows,
      int cols)
    {
      double[][] result = new double[rows][];
      for (int i = 0; i "lt" rows; ++i)
        result[i] = new double[cols];
      return result;
    }

    static int NumNonCommentLines(string fn,
      string comment)
    {
      int ct = 0;
      string line = "";
      FileStream ifs = new FileStream(fn,
        FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
        if (line.StartsWith(comment) == false)
          ++ct;
      sr.Close(); ifs.Close();
      return ct;
    }

    public static double[][] MatLoad(string fn,
      int[] usecols, char sep, string comment)
    {
      // count number of non-comment lines
      int nRows = NumNonCommentLines(fn, comment);
      int nCols = usecols.Length;
      double[][] result = MatCreate(nRows, nCols);
      string line = "";
      string[] tokens = null;
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);

      int i = 0;
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        tokens = line.Split(sep);
        for (int j = 0; j "lt" nCols; ++j)
        {
          int k = usecols[j];  // into tokens
          result[i][j] = double.Parse(tokens[k]);
        }
        ++i;
      }
      sr.Close(); ifs.Close();
      return result;
    }

    // -------------------------------------------------------

    public static double[] MatToVec(double[][] m)
    {
      int rows = m.Length;
      int cols = m[0].Length;
      double[] result = new double[rows * cols];
      int k = 0;
      for (int i = 0; i "lt" rows; ++i)
        for (int j = 0; j "lt" cols; ++j)
          result[k++] = m[i][j];

      return result;
    }

    public static void MatShow(double[][] m,
      int dec, int wid)
    {
      for (int i = 0; i "lt" m.Length; ++i)
      {
        for (int j = 0; j "lt" m[0].Length; ++j)
        {
          double v = m[i][j];
          if (Math.Abs(v) "lt" 1.0e-8) v = 0.0; // hack
          Console.Write(v.ToString("F" +
            dec).PadLeft(wid));
        }
        Console.WriteLine("");
      }
    }

    public static void VecShow(int[] vec, int wid)
    {
      for (int i = 0; i "lt" vec.Length; ++i)
        Console.Write(vec[i].ToString().PadLeft(wid));
      Console.WriteLine("");
    }

    public static void ListShow(List<int> list,
      int wid)
    {
      if (list.Count == 0)
        Console.WriteLine("EMPTY LIST ");
      for (int i = 0; i "lt" list.Count; ++i)
        Console.Write(list[i].ToString().PadLeft(wid));
      Console.WriteLine("");
    }

    public static void VecShow(double[] vec,
      int dec, int wid, bool newLine)
    {
      for (int i = 0; i "lt" vec.Length; ++i)
      {
        double x = vec[i];
        if (Math.Abs(x) "lt" 1.0e-8) x = 0.0; 
        Console.Write(x.ToString("F" +
          dec).PadLeft(wid));
      }
      if (newLine == true)
        Console.WriteLine("");
    }

  } // Utils class

} // ns

Demo training data:

# people_train_tree.txt
#
0, 0.24, 0, 0.2950, 2
1, 0.39, 2, 0.5120, 1
0, 0.63, 1, 0.7580, 0
1, 0.36, 0, 0.4450, 1
0, 0.27, 1, 0.2860, 2
0, 0.50, 1, 0.5650, 1
0, 0.50, 2, 0.5500, 1
1, 0.19, 2, 0.3270, 0
0, 0.22, 1, 0.2770, 1
1, 0.39, 2, 0.4710, 2
0, 0.34, 0, 0.3940, 1
1, 0.22, 0, 0.3350, 0
0, 0.35, 2, 0.3520, 2
1, 0.33, 1, 0.4640, 1
0, 0.45, 1, 0.5410, 1
0, 0.42, 1, 0.5070, 1
1, 0.33, 1, 0.4680, 1
0, 0.25, 2, 0.3000, 1
1, 0.31, 1, 0.4640, 0
0, 0.27, 0, 0.3250, 2
0, 0.48, 0, 0.5400, 1
1, 0.64, 1, 0.7130, 2
0, 0.61, 1, 0.7240, 0
0, 0.54, 2, 0.6100, 0
0, 0.29, 0, 0.3630, 0
0, 0.50, 2, 0.5500, 1
0, 0.55, 2, 0.6250, 0
0, 0.40, 0, 0.5240, 0
0, 0.22, 0, 0.2360, 2
0, 0.68, 1, 0.7840, 0
1, 0.60, 0, 0.7170, 2
1, 0.34, 2, 0.4650, 1
1, 0.25, 2, 0.3710, 0
1, 0.31, 1, 0.4890, 1
0, 0.43, 2, 0.4800, 1
0, 0.58, 1, 0.6540, 2
1, 0.55, 1, 0.6070, 2
1, 0.43, 1, 0.5110, 1
1, 0.43, 2, 0.5320, 1
1, 0.21, 0, 0.3720, 0
0, 0.55, 2, 0.6460, 0
0, 0.64, 1, 0.7480, 0
1, 0.41, 0, 0.5880, 1
0, 0.64, 2, 0.7270, 0
1, 0.56, 2, 0.6660, 2
0, 0.31, 2, 0.3600, 1
1, 0.65, 2, 0.7010, 2
0, 0.55, 2, 0.6430, 0
1, 0.25, 0, 0.4030, 0
0, 0.46, 2, 0.5100, 1
1, 0.36, 0, 0.5350, 0
0, 0.52, 1, 0.5810, 1
0, 0.61, 2, 0.6790, 0
0, 0.57, 2, 0.6570, 0
1, 0.46, 1, 0.5260, 1
1, 0.62, 0, 0.6680, 2
0, 0.55, 2, 0.6270, 0
1, 0.22, 2, 0.2770, 1
1, 0.50, 0, 0.6290, 0
1, 0.32, 1, 0.4180, 1
1, 0.21, 2, 0.3560, 0
0, 0.44, 1, 0.5200, 1
0, 0.46, 1, 0.5170, 1
0, 0.62, 1, 0.6970, 0
0, 0.57, 1, 0.6640, 0
1, 0.67, 2, 0.7580, 2
0, 0.29, 0, 0.3430, 2
0, 0.53, 0, 0.6010, 0
1, 0.44, 0, 0.5480, 1
0, 0.46, 1, 0.5230, 1
1, 0.20, 1, 0.3010, 1
1, 0.38, 0, 0.5350, 1
0, 0.50, 1, 0.5860, 1
0, 0.33, 1, 0.4250, 1
1, 0.33, 1, 0.3930, 1
0, 0.26, 1, 0.4040, 0
0, 0.58, 0, 0.7070, 0
0, 0.43, 2, 0.4800, 1
1, 0.46, 0, 0.6440, 0
0, 0.60, 0, 0.7170, 0
1, 0.42, 0, 0.4890, 1
1, 0.56, 2, 0.5640, 2
1, 0.62, 1, 0.6630, 2
1, 0.50, 0, 0.6480, 1
0, 0.47, 2, 0.5200, 1
1, 0.67, 1, 0.8040, 2
1, 0.40, 2, 0.5040, 1
0, 0.42, 1, 0.4840, 1
0, 0.64, 0, 0.7200, 0
1, 0.47, 0, 0.5870, 2
0, 0.45, 1, 0.5280, 1
1, 0.25, 2, 0.4090, 0
0, 0.38, 0, 0.4840, 0
0, 0.55, 2, 0.6000, 1
1, 0.44, 0, 0.6060, 1
0, 0.33, 0, 0.4100, 1
0, 0.34, 2, 0.3900, 1
0, 0.27, 1, 0.3370, 2
0, 0.32, 1, 0.4070, 1
0, 0.42, 2, 0.4700, 1
1, 0.24, 2, 0.4030, 0
0, 0.42, 1, 0.5030, 1
0, 0.25, 2, 0.2800, 2
0, 0.51, 1, 0.5800, 1
1, 0.55, 1, 0.6350, 2
0, 0.44, 0, 0.4780, 2
1, 0.18, 0, 0.3980, 0
1, 0.67, 1, 0.7160, 2
0, 0.45, 2, 0.5000, 1
0, 0.48, 0, 0.5580, 1
1, 0.25, 1, 0.3900, 1
1, 0.67, 0, 0.7830, 1
0, 0.37, 2, 0.4200, 1
1, 0.32, 0, 0.4270, 1
0, 0.48, 0, 0.5700, 1
1, 0.66, 2, 0.7500, 2
0, 0.61, 0, 0.7000, 0
1, 0.58, 2, 0.6890, 1
0, 0.19, 0, 0.2400, 2
0, 0.38, 2, 0.4300, 1
1, 0.27, 0, 0.3640, 1
0, 0.42, 0, 0.4800, 1
0, 0.60, 0, 0.7130, 0
1, 0.27, 2, 0.3480, 0
0, 0.29, 1, 0.3710, 0
1, 0.43, 0, 0.5670, 1
0, 0.48, 0, 0.5670, 1
0, 0.27, 2, 0.2940, 2
1, 0.44, 0, 0.5520, 0
0, 0.23, 1, 0.2630, 2
1, 0.36, 1, 0.5300, 2
0, 0.64, 2, 0.7250, 0
0, 0.29, 2, 0.3000, 2
1, 0.33, 0, 0.4930, 1
1, 0.66, 1, 0.7500, 2
1, 0.21, 2, 0.3430, 0
0, 0.27, 0, 0.3270, 2
0, 0.29, 0, 0.3180, 2
1, 0.31, 0, 0.4860, 1
0, 0.36, 2, 0.4100, 1
0, 0.49, 1, 0.5570, 1
1, 0.28, 0, 0.3840, 0
1, 0.43, 2, 0.5660, 1
1, 0.46, 1, 0.5880, 1
0, 0.57, 0, 0.6980, 0
1, 0.52, 2, 0.5940, 1
1, 0.31, 2, 0.4350, 1
1, 0.55, 0, 0.6200, 2
0, 0.50, 0, 0.5640, 1
0, 0.48, 1, 0.5590, 1
1, 0.22, 2, 0.3450, 0
0, 0.59, 2, 0.6670, 0
0, 0.34, 0, 0.4280, 2
1, 0.64, 0, 0.7720, 2
0, 0.29, 2, 0.3350, 2
1, 0.34, 1, 0.4320, 1
1, 0.61, 0, 0.7500, 2
0, 0.64, 2, 0.7110, 0
1, 0.29, 0, 0.4130, 0
0, 0.63, 1, 0.7060, 0
1, 0.29, 1, 0.4000, 0
1, 0.51, 0, 0.6270, 1
1, 0.24, 2, 0.3770, 0
0, 0.48, 1, 0.5750, 1
0, 0.18, 0, 0.2740, 0
0, 0.18, 0, 0.2030, 2
0, 0.33, 1, 0.3820, 2
1, 0.20, 2, 0.3480, 0
0, 0.29, 2, 0.3300, 2
1, 0.44, 2, 0.6300, 0
1, 0.65, 2, 0.8180, 0
1, 0.56, 0, 0.6370, 2
1, 0.52, 2, 0.5840, 1
1, 0.29, 1, 0.4860, 0
1, 0.47, 1, 0.5890, 1
0, 0.68, 0, 0.7260, 2
0, 0.31, 2, 0.3600, 1
0, 0.61, 1, 0.6250, 2
0, 0.19, 1, 0.2150, 2
0, 0.38, 2, 0.4300, 1
1, 0.26, 0, 0.4230, 0
0, 0.61, 1, 0.6740, 0
0, 0.40, 0, 0.4650, 1
1, 0.49, 0, 0.6520, 1
0, 0.56, 0, 0.6750, 0
1, 0.48, 1, 0.6600, 1
0, 0.52, 0, 0.5630, 2
1, 0.18, 0, 0.2980, 0
1, 0.56, 2, 0.5930, 2
1, 0.52, 1, 0.6440, 1
1, 0.18, 1, 0.2860, 1
1, 0.58, 0, 0.6620, 2
1, 0.39, 1, 0.5510, 1
1, 0.46, 0, 0.6290, 1
1, 0.40, 1, 0.4620, 1
1, 0.60, 0, 0.7270, 2
0, 0.36, 1, 0.4070, 2
0, 0.44, 0, 0.5230, 1
0, 0.28, 0, 0.3130, 2
0, 0.54, 2, 0.6260, 0

Test data:

# people_test_tree.txt
#
1, 0.51, 0, 0.6120, 1
1, 0.32, 1, 0.4610, 1
0, 0.55, 0, 0.6270, 0
0, 0.25, 2, 0.2620, 2
0, 0.33, 2, 0.3730, 2
1, 0.29, 1, 0.4620, 0
0, 0.65, 0, 0.7270, 0
1, 0.43, 1, 0.5140, 1
1, 0.54, 1, 0.6480, 2
0, 0.61, 1, 0.7270, 0
0, 0.52, 1, 0.6360, 0
0, 0.30, 1, 0.3350, 2
0, 0.29, 0, 0.3140, 2
1, 0.47, 2, 0.5940, 1
0, 0.39, 1, 0.4780, 1
0, 0.47, 2, 0.5200, 1
1, 0.49, 0, 0.5860, 1
1, 0.63, 2, 0.6740, 2
1, 0.30, 0, 0.3920, 0
1, 0.61, 2, 0.6960, 2
1, 0.47, 2, 0.5870, 1
0, 0.30, 2, 0.3450, 2
1, 0.51, 2, 0.5800, 1
1, 0.24, 0, 0.3880, 1
1, 0.49, 0, 0.6450, 1
0, 0.66, 2, 0.7450, 0
1, 0.65, 0, 0.7690, 0
1, 0.46, 1, 0.5800, 0
1, 0.45, 2, 0.5180, 1
1, 0.47, 0, 0.6360, 0
1, 0.29, 0, 0.4480, 0
1, 0.57, 2, 0.6930, 2
1, 0.20, 0, 0.2870, 2
1, 0.35, 0, 0.4340, 1
1, 0.61, 2, 0.6700, 2
1, 0.31, 2, 0.3730, 1
0, 0.18, 0, 0.2080, 2
0, 0.26, 2, 0.2920, 2
1, 0.28, 0, 0.3640, 2
1, 0.59, 2, 0.6940, 2