To use a decision tree to classify data, we must have a set of existing data from which we can build a tree. For example, if we have a data set of basketball players that includes their height, sprint speed, free throw average, playing position, etc., we could build a decision tree to help decide where each player should play (1). Using this tree, we can then decide where new recruits should play based on the same attributes (excluding playing position, of course). Another example: if we had existing data on video streaming users (age, gender, previous views, star ratings), we could use a decision tree to recommend a new TV show to another user.
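To make this concrete, here is a minimal hand-coded sketch of the basketball example. All attribute names, thresholds, and positions below are illustrative assumptions, not real scouting rules; a tree learned from data would choose its own questions and cut-offs.

```python
# A tiny fixed decision tree for the hypothetical basketball example.
# Every threshold here is an assumption chosen for illustration only.

def suggest_position(player):
    """Walk the tree from the root, asking one question per node,
    until we reach a leaf (a playing position)."""
    if player["height_cm"] >= 205:            # very tall players
        return "center"
    if player["height_cm"] >= 195:
        if player["free_throw_avg"] >= 0.75:  # tall and accurate shooter
            return "forward"
        return "center"
    if player["sprint_speed"] >= 8.0:         # shorter but fast (m/s, assumed)
        return "guard"
    return "forward"

# Classifying a new recruit using the same attributes (minus position):
recruit = {"height_cm": 198, "sprint_speed": 7.4, "free_throw_avg": 0.81}
print(suggest_position(recruit))  # prints "forward"
```

Each `if` corresponds to one internal node of the tree, and each `return` to a leaf, so classifying a player costs at most three questions here.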
When making a decision tree, we want the tree to be as short as possible on average. This can be thought of as the average number of questions (conditions evaluated) needed to go from the start (the root node) all the way to reaching a decision. To understand this, we will look at the simple example below.
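The "average number of questions" can be written as an expected path length: the depth of each leaf weighted by the fraction of examples that reach it. The following sketch (with made-up leaf probabilities) shows that a perfectly balanced tree is not always the shortest on average; when some outcomes are much more common than others, a lopsided tree that answers the common cases quickly can win.

```python
def average_questions(leaves):
    """Expected number of questions asked: for each leaf, multiply the
    probability of reaching it by its depth, then sum over all leaves.
    Each leaf is a (probability, depth) pair."""
    return sum(p * depth for p, depth in leaves)

# A balanced tree: four equally likely leaves, each at depth 2.
balanced = [(0.25, 2), (0.25, 2), (0.25, 2), (0.25, 2)]

# A lopsided tree: one very common outcome settled after a single
# question, rarer outcomes pushed deeper (probabilities assumed).
lopsided = [(0.5, 1), (0.25, 2), (0.125, 3), (0.125, 3)]

print(average_questions(balanced))  # 2.0 questions on average
print(average_questions(lopsided))  # 1.75 questions on average
```

So the quantity we are minimising depends both on the shape of the tree and on how the data is distributed over its leaves.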
Say you work for a large food retailer and you have a product you would like to advertise to your pregnant customers. However, the information you have from your loyalty card scheme does not include pregnancy status, so you first want to establish whether a customer is pregnant based on their age, gender, and shopping history. You have manually collected pregnancy data on some of your customers, but not all (say, from customers who have signed up to a newsletter). To do this, you could ask questions about whether they have recently bought baby formula, baby clothes, baby bottles, et cetera (2). However, if after asking all these questions you then ask whether the customer is male or female, and the answer is male, then all the previous questions have been wasted, and the average number of questions that must be asked is far greater. This makes the tree bigger and slower to traverse. As such, we can see that the order in which the questions are asked makes a great difference to how efficient the tree is.
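The effect of question order can be sketched with some back-of-the-envelope arithmetic. Assume (purely for illustration) that half of all customers are male, so the gender question alone settles their case, and that there are three shopping-history questions.

```python
def expected_questions(ask_gender_first, n_product_questions=3):
    """Expected number of questions per customer, assuming half of
    customers are male and so need no further questions once gender
    is known. All numbers are illustrative assumptions."""
    if ask_gender_first:
        # Half the customers are done after one question; the other
        # half go on to answer all the product questions as well.
        return 0.5 * 1 + 0.5 * (1 + n_product_questions)
    # Gender asked last: everyone answers every product question first.
    return n_product_questions + 1

print(expected_questions(True))   # 2.5 questions on average
print(expected_questions(False))  # 4 questions on average
```

Asking the most discriminating question first nearly halves the average number of questions in this toy setting, which is exactly the inefficiency the paragraph above describes.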
In general, the most efficient way to construct a tree is to order the questions such that if a question has n possible outcomes, it splits the group into as close to n equal parts as possible. To see why this is the case, and to learn about entropy, read the section on decision trees.
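The usual way this "split into equal parts" intuition is made precise is Shannon entropy: a question yields the most information, on average, when its outcomes are equally likely. A minimal sketch, using the standard formula (this anticipates the entropy section rather than reproducing it):

```python
import math

def entropy(proportions):
    """Shannon entropy in bits of a split whose groups contain the
    given proportions of the data (proportions sum to 1)."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# A question with n = 2 outcomes is most informative when it splits
# the group into two equal halves:
print(entropy([0.5, 0.5]))  # 1.0 bit, the maximum for two outcomes
print(entropy([0.9, 0.1]))  # about 0.47 bits
print(entropy([1.0, 0.0]))  # 0.0 bits: the answer was already known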