The Yacht Hull Hydrodynamics Dataset

I came across an interesting dataset recently. The dataset has 308 items. Each item has seven numbers that correspond to a boat hull shape. The first six numbers are predictor variables and the seventh number is a measure of resistance (the residuary resistance per unit weight of displacement). The idea is to predict resistance from the first six values.

The data looks like:

-2.3 0.568 4.78 3.99 3.17 0.125 0.11
-2.3 0.568 4.78 3.99 3.17 0.150 0.27
. . .
-5.0 0.600 4.78 4.24 3.15 0.125 0.06
. . .

The predictor variables are:

1. Longitudinal position of the center of buoyancy.
2. Prismatic coefficient.
3. Length-displacement ratio.
4. Beam-draught ratio.
5. Length-beam ratio.
6. Froude number.
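To make the layout concrete, here is a minimal sketch of reading rows like the ones above into predictors and a target. It assumes the data is whitespace-separated text with no header row; the column names and the parse_rows helper are hypothetical, made up for this illustration.

```python
# Hypothetical column names; the raw data has no header row.
COLUMNS = [
    "buoyancy_position", "prismatic_coeff", "length_displacement",
    "beam_draught", "length_beam", "froude_number", "resistance",
]

def parse_rows(text):
    """Split whitespace-separated text into (predictors, target) pairs."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank lines and anything without seven fields
        values = [float(f) for f in fields]
        rows.append((values[:6], values[6]))  # first six predict the seventh
    return rows

sample = """-2.3 0.568 4.78 3.99 3.17 0.125 0.11
-2.3 0.568 4.78 3.99 3.17 0.150 0.27
-5.0 0.600 4.78 4.24 3.15 0.125 0.06"""

rows = parse_rows(sample)
for predictors, resistance in rows:
    print(predictors, "->", resistance)
```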

Just for kicks, on my lunch break, I decided to take a crack at creating a deep neural regression model. I used the CNTK library. My first step was to normalize all the data. I used min-max normalization.
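Min-max normalization rescales each column so its smallest value maps to 0 and its largest to 1. The sketch below is plain Python, not the code from my demo; the min_max_normalize name is hypothetical, and mapping a constant column to 0.0 is an assumption to avoid dividing by zero.

```python
def min_max_normalize(rows):
    """Scale each column to [0, 1] using that column's own min and max."""
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    normed = []
    for r in rows:
        normed.append([
            # Assumption: a constant column (max == min) maps to 0.0.
            (r[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_cols)
        ])
    return normed, mins, maxs

data = [
    [-2.3, 0.568, 4.78, 3.99, 3.17, 0.125, 0.11],
    [-2.3, 0.568, 4.78, 3.99, 3.17, 0.150, 0.27],
    [-5.0, 0.600, 4.78, 4.24, 3.15, 0.125, 0.06],
]
normed, mins, maxs = min_max_normalize(data)
for row in normed:
    print(row)
```

Keeping mins and maxs around matters: a new hull has to be scaled with the same values before it is fed to the model.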

I was surprised at how fast I got a prediction model up and running: about 15 minutes. But in retrospect, I already had all the CNTK code from previous experiments, so all I really had to do was spend some time adjusting the neural network hyperparameters. For the demo run, I used two hidden layers of 5 nodes each, tanh activation, a fixed learning rate of 0.005, and the Adam optimization algorithm.
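To show the shape of that network, here is a rough sketch of a 6-5-5-1 forward pass in plain Python. This is not the CNTK code from the demo. The weights are small random values rather than trained ones, the linear output node is an assumption, and the helper names are hypothetical. Training with Adam is left out entirely.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Return (weights, biases); weights[i][j] links input i to node j."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, activation=None):
    """Weighted sum plus bias for each node, then an optional activation."""
    weights, biases = layer
    out = []
    for j in range(len(biases)):
        s = biases[j] + sum(x[i] * weights[i][j] for i in range(len(x)))
        out.append(activation(s) if activation else s)
    return out

def predict(x, layers):
    h1 = dense(x, layers[0], math.tanh)   # first hidden layer, 5 tanh nodes
    h2 = dense(h1, layers[1], math.tanh)  # second hidden layer, 5 tanh nodes
    return dense(h2, layers[2])[0]        # single output node (assumed linear)

rng = random.Random(0)  # fixed seed so the untrained weights are repeatable
layers = [make_layer(6, 5, rng), make_layer(5, 5, rng), make_layer(5, 1, rng)]
y = predict([0.5] * 6, layers)
print(y)
```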

In regression problems, where the goal is to predict a numeric value (resistance in this case), you have to define what it means for a predicted value to be correct. In my little demo, the program counts a prediction as correct if it is within 20% of the true value. With that criterion, my model got 74.68% accuracy (230 of 308 correct, 78 wrong).
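The within-20% rule is easy to express in code. This is a minimal sketch, not the demo program itself; the function name is hypothetical, and measuring the tolerance relative to the true value is my reading of the criterion. One consequence of that reading is that a true value of exactly 0 only counts as correct on an exact match.

```python
def within_tolerance_accuracy(predicted, actual, pct=0.20):
    """Count a prediction correct if it is within pct of the true value."""
    n_correct = 0
    for p, a in zip(predicted, actual):
        if abs(p - a) <= pct * abs(a):
            n_correct += 1
    n_wrong = len(actual) - n_correct
    return n_correct / len(actual), n_correct, n_wrong

# Made-up predictions against three true resistance values from the data.
predicted = [0.10, 0.30, 0.09]
actual = [0.11, 0.27, 0.06]
acc, n_correct, n_wrong = within_tolerance_accuracy(predicted, actual)
print(f"accuracy = {acc:.4f} ({n_correct} correct, {n_wrong} wrong)")
```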

Good fun!

The USS Sequoia yacht was used by U.S. presidents Herbert Hoover, Franklin Roosevelt, Dwight Eisenhower, John F. Kennedy, Richard Nixon, and Gerald Ford.
