An Introduction to Decision Trees

Decision Trees are a machine learning technique which exhibit excellent classification performance. Unfortunately, they are not without their fair share of shortcomings.

What are decision trees?

A decision tree is a graph that allows you to reach an outcome (a decision) by repeatedly evaluating conditions. Once a condition has been evaluated, you follow the corresponding branch to the next node. This node may either be another condition to be evaluated or it may be a leaf containing a decision.

What are Decision Trees used for?

In the same way that humans use their past experiences to influence their future decisions, a decision tree uses training data to make a decision on what the outcome should be. For example, a simple one level tree may ask if a user has liked Facebook pages about cycling to decide whether or not to show them an advert for a new bicycle.

Step 1: Preprocessing The Data

We must sculpt our dataset to make it suitable for the construction of a tree.
One of the most common ways of preprocessing data is dimensionality reduction.

Dimensionality Reduction

What is high-dimensional data?

High-dimensional data sets are those that are defined by a large number of distinct variables. For example, the current health status of a patient can contain over 100+ different parameters, including immune system status, genetic background, nutrition and so on. It is essential that we focus only on the parameters that are strictly necessary to solve the problem we are tackling, therefore it is necessary to eliminate some of these parameters whilst retaining relationships within the data.

How can we reduce dimensionality?

One of the foremost algorithms for dimensionality reduction was put forward in the paper Nonlinear Dimensionality Reduction by Locally Linear Embedding (Roweis and Saul, 2000). Locally Linear Embedding (LLE) takes a high-dimensional dataset as an input and maps it to a single, global coordinate system of lower dimensionality. It is particularly adept at identifying complex non-linear groupings of data in high dimensions and maintaining these groupings whilst mapping data points to a lower dimensional space.

Step 2: Building The Tree

After our dataset has been prepared, we must now convert it into a decision tree using the ID3 algorithm, which takes the data and does ABC before doing XYZ.

Building Decision Trees

Issues with building trees

One of the main issues with decision trees is overfitting. This is when the decision tree has been built to exactly fit the training data. As such, when the tree is exposed to some new, real-world data, it gives poor outputs.

The ID3 tree-building algorithm

The ID3 algorithm was first proposed by Ross Quinlan in 1986. Despite its age, and being improved upon by C4.5 and C5.0, it remains in common use today, as the underlying concept is simple and easy to implement.

Step 3: Postprocessing

This step involves analysis of how our tree performs in order to increase effectiveness by adjusting the model. To do this we will consider common weaknesses with decision trees.

Measuring Performance

In order to tweak our model to increase performance, we need to know how it performs in the first place. Here we look at common issues that are often exacerbated by poor implementation so that we know where to start when fixing them, as well as best practices for testing the effectiveness of our model.

Random Forests

This extension to decision trees attempts to combat overfitting by reducing variance. This is particularly useful when we are building trees from very large, but ‘dirty’ or noisy datasets (those with many outliers). These kind of datasets are common in big data, where the masses of data being collected make it near impossible to clean at a preprocessing level.


Adaboost and RobustBoost are boosting methods for machine learning algorithms, which combine multiple, weak classifiers and weight them according to their performance. This weighting based upon performance results in a stronger classifier when taken as a whole. Use of negative weights even allows bad classifiers to contribute to improving performance


Digits Case Study

If you have had enough theoretical talk, please click below for a real-life case study on decision trees for a handwritten digit dataset, with an implementation written by us in Python.

Digits Case Study

The Team

Working on this project has been a privilege.
We hope you enjoyed our work.

Rohan Padmanabhan

Rohan studies BEng Computing at Imperial College London, where he is an elected representative for first year undergraduates. He was responsible for writing much of the content on decision trees and the ID3 algorithm. In addition to this, Rohan wrote many of the summary paragraphs on this very home page.

Rohan Pritchard

Rohan studies MEng Computing with Artificial Intelligence at Imperial College London. He was responsible for writing the content on random forests, as well as utilising machine learning libraries (such as SciKit in Python) to implement some of the tools we discussed within these pages.

Zeshan Amjad

Zeshan studies MEng Computing at Imperial College London. He was responsible for writing the content on preprocessing datasets and dimensionality reduction. In addition to this, he designed, structured and coded much of the current website in strong collaboration with other team members.

George Zhelev

George studies MEng Computing at Imperial College London. He was responsible for writing the section on post-processing boosting of decision trees and other classification algorithms, as well as providing support for other team members during the research phase.