“Multi-Class Classification Using LightGBM” in Visual Studio Magazine

I wrote an article titled “Multi-Class Classification Using LightGBM” in the May 2024 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/Articles/2024/05/02/LightGBM-multi-class-classification.aspx.

A multi-class classification problem is one where the goal is to predict a discrete variable that has three or more possible values. For example, you might want to predict a person’s political leaning (conservative, moderate, liberal) from sex, age, state of residence and annual income. There are many machine learning techniques for multi-class classification. One of the most powerful techniques is to use the LightGBM (lightweight gradient boosting machine) system.

LightGBM is a sophisticated, open-source, tree-based system that was introduced in 2017. LightGBM can perform multi-class classification, binary classification (predict one of two possible values), regression (predict a single numeric value) and ranking.

My article presents a complete end-to-end demo. LightGBM has three programming language interfaces — C, Python and R. The demo program uses the Python language API.

I used one of my standard synthetic datasets. The raw data looks like:

F  24  michigan  29500.00  liberal
M  39  oklahoma  51200.00  moderate
F  63  nebraska  75800.00  conservative
M  36  michigan  44500.00  moderate
F  27  nebraska  28600.00  liberal
. . .

When using LightGBM, you encode categorical data using ordinal encoding. Unlike most machine learning classification techniques, you don't need to normalize the numeric data. Here, sex is encoded as M = 0, F = 1; the states are encoded alphabetically as michigan = 0, nebraska = 1, oklahoma = 2; and the class labels are conservative = 0, moderate = 1, liberal = 2. The encoded data looks like:

1, 24, 0, 29500.00, 2
0, 39, 2, 51200.00, 1
1, 63, 1, 75800.00, 0
0, 36, 0, 44500.00, 1
1, 27, 1, 28600.00, 2
. . .
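
The ordinal encoding can be done with a short helper script. Here is a minimal sketch, assuming the raw data lines have the format shown above (the encode_line function name and the mapping dictionaries are mine, not from the article):

  # assumed encodings, consistent with the data shown above
  sex_map = {'M': 0, 'F': 1}
  state_map = {'michigan': 0, 'nebraska': 1, 'oklahoma': 2}
  politics_map = {'conservative': 0, 'moderate': 1, 'liberal': 2}

  def encode_line(line):
    # line looks like "F  24  michigan  29500.00  liberal"
    sex, age, state, income, politics = line.split()
    return [sex_map[sex], int(age), state_map[state],
      float(income), politics_map[politics]]

  # encode_line("F  24  michigan  29500.00  liberal")
  # returns [1, 24, 0, 29500.0, 2]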

My article explains how to install Python and LightGBM for readers who are new to both.

The key statements that create and train the demo LightGBM multi-class classifier are:

  print("Creating and training LGBM multi-class model ")
  params = {
    # 'objective': 'multiclass',  # not needed
    'boosting_type': 'gbdt',  # default
    'num_leaves': 31,  # default
    'max_depth':-1,  # default (unlimited) 
    'n_estimators': 50,  # default = 100
    'learning_rate': 0.05,  # default = 0.10
    'min_data_in_leaf': 5,  # default = 20
    'random_state': 0,
    'verbosity': -1  # -1 = fatal msgs only; default = 1 (info)
  }
  model = lgbm.LGBMClassifier(**params)  # scikit API
  model.fit(train_x, train_y)
  print("Done ")

The main challenge when using LightGBM is wading through the dozens of parameters. The LGBMClassifier class/object has 19 parameters (num_leaves, max_depth and so on) and behind the scenes there are 57 Learning Control Parameters (min_data_in_leaf, bagging_fraction and so on), for a total of 76 parameters to deal with. A large section of my article explains which parameters to change and which to leave alone.
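
One way to keep track of the parameters is to print them from the model object. A minimal sketch using the scikit-style get_params() method:

  # list the scikit-API parameters and their current values
  for name, val in model.get_params().items():
    print(name, "=", val)

This lists only the class-level parameters; the behind-the-scenes Learning Control Parameters that are left at their default values aren't shown.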

Arguably, the two most powerful techniques for multi-class classification on non-trivial datasets are neural networks and tree boosting. In some recent multi-class classification challenges, LightGBM entries have dominated the contest leader board. This may be due, in part, to the fact that LightGBM can be used out-of-the-box, which leaves a lot of time for hyperparameter fine-tuning. Using a neural network classifier requires significantly more background knowledge and effort.



It’s sometimes difficult to classify films because several different genres can be represented. One of my favorite classifications is science fiction mystery movies.

Left: Dark City (1998) – A man wakes up next to a murdered prostitute. He can’t remember who he is or how he got there. He is pursued by the creepy Strangers. Who are they? Why is it always night time?

Center: Gog (1954) – A government investigator (Richard Egan) is sent to a super-secret underground desert laboratory complex to solve a series of bizarre deaths. Which of the many suspects is responsible?

Right: The Power (1968) – One of a group of six scientists, including one played by George Hamilton, has super mind control powers and is murdering the others in the group one by one. Who is it and how can he/she be stopped?

