## Using scikit Linear Discriminant Analysis (LDA) for Dimensionality Reduction

The idea of dimensionality reduction is to convert a vector with multiple values down to a vector with fewer values (often 2). For example, suppose you are looking at the Wheat Seeds dataset.

The raw data is at archive.ics.uci.edu/ml/datasets/seeds and looks like:

```
15.26  14.84  0.871   5.763  3.312  2.221  5.22   1
14.88  14.57  0.8811  5.554  3.333  1.018  4.956  1
. . .
17.63  15.98  0.8673  6.191  3.561  4.076  6.06   2
16.84  15.67  0.8623  5.998  3.484  4.675  5.877  2
. . .
11.84  13.21  0.8521  5.175  2.836  3.598  5.044  3
12.3   13.34  0.8684  5.243  2.974  5.637  5.063  3

---------------------------------------------------
10.59  12.41  0.8081  4.899  2.63   0.765  4.519 (min values)
21.18  17.25  0.9183  6.675  4.033  8.456  6.55  (max values)
```

There are 210 data items. Each represents one of three species of wheat seeds: Kama = 1, Rosa = 2, Canadian = 3. There are 70 of each species. The first 7 values on each line are the predictors: area, perimeter, compactness, length, width, asymmetry, groove. The eighth value is the one-based encoded species.

If you could reduce each data item that has 7 predictors down to a vector with just 2 values, then you could plot the data on a graph to see if there’s anything interesting. The classical technique to reduce dimensionality is principal component analysis (PCA). I noticed a scikit-learn library (aka scikit or sklearn) documentation example that shows dimensionality reduction using linear discriminant analysis (LDA). I hadn’t thought about this idea before so I figured I’d investigate.

Before I go any further, let me note that in many scenarios, using PCA for dimensionality reduction is a better option than using LDA because LDA requires labeled data but PCA does not. And using a neural network autoencoder for dimensionality reduction is often better than either PCA or LDA because PCA and LDA make several assumptions that are often unrealistic, such as that the predictor variables are Gaussian distributed. (But a neural autoencoder is more difficult to tune.)
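To make the labeled-versus-unlabeled difference concrete, here is a minimal sketch that uses synthetic data as a stand-in (it is not the Wheat Seeds data, and the names `centers`, `pca_2d` and `lda_2d` are just illustrative). PCA is fit on the predictors alone, while LDA also needs the class labels. Note that with three classes, LDA can produce at most two components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# synthetic stand-in: 3 classes, 30 items each, 7 predictors
centers = rng.uniform(0.0, 1.0, size=(3, 7))
X = np.vstack([c + 0.05 * rng.standard_normal((30, 7)) for c in centers])
y = np.repeat([0, 1, 2], 30)

# PCA is unsupervised: only the predictors are passed in
pca_2d = PCA(n_components=2).fit_transform(X)

# LDA is supervised: the class labels are required
lda = LinearDiscriminantAnalysis(n_components=2)
lda_2d = lda.fit_transform(X, y)

print(pca_2d.shape, lda_2d.shape)
```

Both results have one row per data item and two columns, so either one can be dropped straight into a scatter plot.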

Although normalization isn’t necessary for LDA, I decided to normalize the Wheat Seeds data. The ranges of the raw predictor values vary significantly. To normalize, I used the divide-by-constant technique. I dropped the raw data into an Excel spreadsheet and divided the seven predictor columns by (25, 20, 1, 10, 10, 10, 10), respectively. I also re-coded the target class labels from 1-based to 0-based. The resulting 210 items looked like:

```
0.6104  0.7420  0.8710  0.5763  0.3312  0.2221  0.5220  0
0.5952  0.7285  0.8811  0.5554  0.3333  0.1018  0.4956  0
. . .
0.7052  0.7990  0.8673  0.6191  0.3561  0.4076  0.6060  1
0.6736  0.7835  0.8623  0.5998  0.3484  0.4675  0.5877  1
. . .
0.5048  0.6835  0.8481  0.5410  0.2911  0.3306  0.5231  2
0.5104  0.6690  0.8964  0.5073  0.3155  0.2828  0.4830  2
```
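I did the normalization in Excel, but the same divide-by-constant step could be done in NumPy. The following is a minimal sketch, assuming the raw data has been read into an array with seven predictor columns followed by the 1-based label. The three rows are copied from the raw data shown earlier, and the names `divisors`, `X_norm` and `y_zero` are just illustrative.

```python
import numpy as np

# three raw rows: 7 predictors followed by the 1-based species label
raw = np.array([
    [15.26, 14.84, 0.871,  5.763, 3.312, 2.221, 5.22,  1],
    [14.88, 14.57, 0.8811, 5.554, 3.333, 1.018, 4.956, 1],
    [17.63, 15.98, 0.8673, 6.191, 3.561, 4.076, 6.06,  2],
], dtype=np.float64)

# one constant per predictor column
divisors = np.array([25, 20, 1, 10, 10, 10, 10], dtype=np.float64)

X_norm = raw[:, :7] / divisors            # broadcasting divides column by column
y_zero = raw[:, 7].astype(np.int64) - 1   # re-code labels from 1-based to 0-based

np.set_printoptions(precision=4, suppress=True)
print(X_norm)
print(y_zero)
```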

I split the 210-item normalized data into a 180-item training set and a 30-item test set. I used the first 60 of each target class for training and the last 10 of each target class for testing. I didn’t use the test data but I wanted to have it for other types of problems such as multi-class classification.
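The split can be sketched in a few lines. This assumes the 210 rows are ordered by class with 70 consecutive rows per class, as in the raw file. The helper name `split_by_class` and the placeholder array are hypothetical, used only to show the slicing.

```python
import numpy as np

def split_by_class(data, n_per_class=70, n_train=60):
    """First n_train rows of each class block go to train, the rest to test."""
    train_parts, test_parts = [], []
    for start in range(0, len(data), n_per_class):
        block = data[start:start + n_per_class]
        train_parts.append(block[:n_train])
        test_parts.append(block[n_train:])
    return np.vstack(train_parts), np.vstack(test_parts)

# placeholder data: 210 rows, 7 dummy predictors, last column is the class label
labels = np.repeat([0, 1, 2], 70)
data = np.column_stack([np.zeros((210, 7)), labels])

train, test = split_by_class(data)
print(train.shape, test.shape)
```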

I loaded the training data, used the data to create an LDA model, converted the data to its reduced form with two components, and then plotted the data. The graph shows that a couple of the Kama items (blue) fall among the Canadian (orange) data items. For example, the blue Kama dot at approximately x = 2.72, y = -0.04 (I put the mouse pointer over the dot) looks like it belongs to the orange Canadian group. A logical next step would be to search the transformed points, get the index of the anomalous data point, and then use the index to get the full vector of the anomalous point.
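One way to do that search is to find the transformed point closest to the coordinates read off the graph. This is a minimal sketch with made-up stand-in arrays in place of the real `reduced` and `X`. The helper name `nearest_index` is hypothetical.

```python
import numpy as np

def nearest_index(reduced, target_xy):
    """Return the row index of the reduced point closest to target_xy."""
    dists = np.linalg.norm(reduced - np.asarray(target_xy), axis=1)
    return int(np.argmin(dists))

# made-up stand-ins for the real transformed points and source predictors
reduced = np.array([[-3.10, 0.50],
                    [ 2.70, -0.05],
                    [ 2.90, 1.20]])
X = np.arange(21, dtype=np.float64).reshape(3, 7)

idx = nearest_index(reduced, (2.72, -0.04))
print(idx)      # index of the closest transformed point
print(X[idx])   # the full 7-value source vector for that point
```

Because the coordinates read from the graph are only approximate, a nearest-point search is safer than testing for exact equality.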

Well, doing dimensionality reduction using LDA was an interesting experiment. Maybe it’s not the most useful technique but I learned a few things and so I’m satisfied.

Actor Kerwin Mathews (1926-2007) is little remembered today but he starred in three old fantasy movies I like. Each movie features a character dimension reduction via special effects.

Left: In “The 7th Voyage of Sinbad” (1958), Mathews played Sinbad the Sailor, who rescues Princess Parisa after she is shrunk by the evil wizard Sokurah. Special effects by Ray Harryhausen.

Center: In “Jack the Giant Killer” (1962), Mathews played farmer Jack, who rescues Princess Elaine from the evil wizard Pendragon, who used a puppet that was a giant in disguise. Special effects by Jim Danforth.

Right: In “The 3 Worlds of Gulliver” (1960), Mathews played Lemuel Gulliver, who is shipwrecked in Lilliput where everyone is tiny relative to him. Special effects by Ray Harryhausen.

Demo code. The data can be found at https://jamesmccaffrey.wordpress.com/2023/04/04/the-wheat-seeds-dataset-problem-using-pytorch/.

```
# wheat_dim_reduce_lda.py

# Wheat Seeds dimensionality reduction using
# scikit LDA

# Anaconda3-2022.10  Python 3.9.13  scikit 1.0.2
# Windows 10/11

import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import \
    LinearDiscriminantAnalysis

# ---------------------------------------------------------

def main():
    # 0. prepare
    print("\nBegin Wheat Seeds dim reduce using scikit LDA ")
    np.set_printoptions(precision=4, suppress=True)
    np.random.seed(1)

    # 1. load data (assumes whitespace-delimited text:
    # 7 predictor columns, then the 0-based class label)
    print("\nLoading 180-item training data ")
    train_file = ".\\Data\\wheat_train_k.txt"  # 180 items
    data = np.loadtxt(train_file, comments="#", dtype=np.float64)
    X = data[:, 0:7]                 # predictors
    y = data[:, 7].astype(np.int64)  # class labels 0, 1, 2

    # 2. create LDA model
    print("\nCreating and fitting LDA model ")
    model = LinearDiscriminantAnalysis(n_components=2)
    model.fit(X, y)

    # 3. transform
    print("\nFetching the reduced version of predictors ")
    reduced = model.transform(X)  # shape (180, 2)
    # print(reduced)

    # 4. plot
    print("\nDisplaying data ")
    colors = ["blue", "red", "orange"]
    labels_num = [0, 1, 2]