I came across an interesting dataset recently. The dataset has 308 items. Each item has seven numbers that correspond to a boat hull shape. The first six numbers are predictor variables and the seventh number is a measure of resistance. The idea is to predict resistance from the first six values.
The data looks like:
-2.3 0.568 4.78 3.99 3.17 0.125 0.11
-2.3 0.568 4.78 3.99 3.17 0.150 0.27
. . .
-5.0 0.600 4.78 4.24 3.15 0.125 0.06
. . .
The predictor variables are:
1. Longitudinal position of the center of buoyancy.
2. Prismatic coefficient.
3. Length-displacement ratio.
4. Beam-draught ratio.
5. Length-beam ratio.
6. Froude number.
Just for kicks, on my lunch break, I decided to take a crack at creating a deep neural regression model. I used the CNTK library. My first step was to normalize all the data. I used min-max normalization.
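Min-max normalization rescales each column to the range 0 to 1 using that column's smallest and largest values. Here is a minimal plain-Python sketch of the idea, not my actual CNTK preprocessing code. The function name min_max_normalize is made up for illustration, and the three sample rows stand in for the full 308-item file, so the min and max come from those three rows only.

```python
def min_max_normalize(rows):
    """Scale each column of `rows` to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    normed = []
    for row in rows:
        normed.append([
            # A constant column has hi == lo; map it to 0.0 to avoid dividing by zero.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return normed


# Three sample rows from the data shown above (six predictors, then resistance).
rows = [
    [-2.3, 0.568, 4.78, 3.99, 3.17, 0.125, 0.11],
    [-2.3, 0.568, 4.78, 3.99, 3.17, 0.150, 0.27],
    [-5.0, 0.600, 4.78, 4.24, 3.15, 0.125, 0.06],
]
normed = min_max_normalize(rows)
for row in normed:
    print(["%.3f" % v for v in row])
```

One thing to keep in mind: if the resistance column is normalized too, the model's predictions come out in normalized units and have to be mapped back to get real resistance values.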
I was surprised at how fast I got a prediction model up and running: about 15 minutes. In retrospect, though, I already had all the CNTK code from previous experiments, so all I really had to do was spend some time adjusting the neural network hyperparameters. For the run shown in the image below, I used two hidden layers of 5 nodes each, tanh activation, a fixed learning rate of 0.005, and the Adam optimization algorithm.
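To make the 6-(5-5)-1 shape concrete, here is a rough plain-Python sketch of the forward pass for a network like this one. It is not CNTK code and it does no training (no Adam, no learning rate). The weights are random placeholders, the helper names make_layer, dense, and predict are made up for illustration, and the linear (no activation) output node is my assumption here, chosen because it is the usual setup for regression.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random placeholder values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Compute activation(W x + b) for one fully connected layer."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def predict(x, layers):
    """Forward pass: two tanh hidden layers, then a single linear output node."""
    h1 = dense(x, layers[0], math.tanh)
    h2 = dense(h1, layers[1], math.tanh)
    return dense(h2, layers[2])[0]


rng = random.Random(0)
layers = [make_layer(6, 5, rng), make_layer(5, 5, rng), make_layer(5, 1, rng)]

# Total trainable values: weights plus biases in every layer.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print("weights and biases:", n_params)

x = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # one made-up, already normalized item
y = predict(x, layers)
print("untrained prediction:", y)
```

A network this size has only 71 weights and biases, which is part of why training on 308 items is quick.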
In regression problems, where the goal is to predict a numeric value (resistance in this case), you have to define what it means for a predicted value to be correct. In my little demo, the program counts a prediction correct if it is within 20% of the true value. With that criterion, my model got about 74.7% accuracy (230 of 308 correct, 78 wrong).
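The within-20% rule is easy to express in code. Below is a minimal plain-Python sketch of that kind of accuracy check, not the code from my demo. The function name accuracy_within and the predicted values are made up for illustration, and the sketch doesn't settle whether the comparison should be done on normalized or raw resistance values.

```python
def accuracy_within(predicted, actual, pct=0.20):
    """Count predictions that fall within pct (as a fraction) of the true value."""
    correct = 0
    for p, a in zip(predicted, actual):
        # A true value of 0 makes the allowed error 0, so only an exact match counts.
        if abs(p - a) <= pct * abs(a):
            correct += 1
    wrong = len(actual) - correct
    return correct, wrong, correct / len(actual)


actual = [0.11, 0.27, 0.06]     # resistance values from the sample rows
predicted = [0.12, 0.20, 0.06]  # made-up predictions for illustration
correct, wrong, acc = accuracy_within(predicted, actual)
print(correct, wrong, round(acc * 100, 2))
```

Because the tolerance is relative, an item with a tiny true resistance has to be predicted very closely in absolute terms to count as correct.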