Beyond regression models, supervised machine learning algorithms include decision trees, random forests and support vector machines. While the examples here focus on a case with a binary outcome variable, these methods scale to a larger number of classes as well. Variants of these methods may even be used with a continuous outcome variable, similar to regression models.
Decision trees are based on a set of questions whose answer is yes or no. Each node poses a question and, based on the answer, one moves on to the next node; the terminal nodes (leaves) give the predicted category. Figure 3.3 illustrates a decision tree predicting how happy people are in Sweden based on World Values Survey data.3.1 The first question is whether leisure time is very important to the respondent or not (question V6, measured in the survey on a four-step scale, where 1 is very important and 4 is not important at all). If it is, we further check whether family is very important (the next node, question V4) and whether work is very important (the last node, question V8); if so, the respondent is most likely very happy. If the respondent finds family very important but work only somewhat important or less, then the person is likely rather happy. Furthermore, as this approach asks true-or-false questions of the data, it can determine clear-cut cut-off points (like responding 3 or less, or 2 or less, to survey questions).
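To make this concrete, below is a minimal sketch of such a tree using scikit-learn's DecisionTreeClassifier. The three questions (V4, V6, V8) mirror the survey items in Figure 3.3, but the handful of responses and their happiness labels are invented here purely for illustration.

```python
# A minimal sketch of a decision tree like the one in Figure 3.3.
# The toy data below are invented; real World Values Survey responses
# would be loaded from the actual survey files.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical answers on the 1-4 importance scale
# (1 = very important, 4 = not important at all).
X = [
    [1, 1, 1],  # columns: V4 family, V6 leisure, V8 work
    [1, 1, 3],
    [2, 1, 2],
    [1, 3, 1],
    [3, 4, 3],
    [2, 2, 4],
]
y = ["very happy", "rather happy", "rather happy",
     "rather happy", "not happy", "not happy"]

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Show the yes/no questions the tree has learnt, e.g. `V6 <= 1.5'.
print(export_text(tree, feature_names=["V4", "V6", "V8"]))

# Classify a new respondent who finds all three very important.
print(tree.predict([[1, 1, 1]]))
```

The printed output makes the cut-off points discussed above visible: each learnt question is a threshold on one survey variable.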
Decision trees are built by choosing the best split at each node. This is done by going through all possible variables (features) at each node and measuring how well using that variable as the question would separate the categories. The best variable and question are chosen for each node. This continues until a node cleanly determines a category and becomes a leaf. A single decision tree is built once, and errors such as overfitting may occur. Better results may emerge when the tree-building process is run several times, which counters overfitting. This is where random forests are useful: they are decision trees on steroids. Random forests run the decision tree generation process multiple times on random subsets of the data and `average' the results into a single model; for classification, the trees' predictions are combined by majority vote.
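The sketch below contrasts a single tree with a random forest on synthetic data; the generated data set, the 0.3 test split and the 100 trees are arbitrary choices for illustration, not values from this chapter.

```python
# A minimal sketch comparing one decision tree with a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with two classes and ten features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_train, y_train)

# A single fully grown tree often overfits the training data;
# averaging many trees usually generalises better to held-out data.
print("tree:  ", tree.score(X_test, y_test))
print("forest:", forest.score(X_test, y_test))
```

The held-out accuracy scores illustrate the overfitting point: the forest's aggregated vote tends to outperform the single tree.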
Support vector machines similarly seek to divide the data into classes. Instead of looking at individual variables one at a time, support vector machines seek to `draw lines' in the data that separate the different classes well. Figure 3.4a illustrates data entries from two classes, men (blue) and women (red). They are positioned on the diagram based on their height (vertical y-axis) and voice pitch (horizontal x-axis). If we were asked to draw a line separating these two groups, we could do it in unlimited ways. Figure 3.4b shows three possibilities for drawing such lines. The importance of choosing the best line emerges only when new, unclassified data are added to the data set (see Figure 3.4c).
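A minimal sketch of this line-drawing with a linear support vector machine follows; the height and pitch numbers are invented to mimic the two groups in Figure 3.4 and are not real measurements.

```python
# A minimal sketch of separating two classes with a linear
# support vector machine, as in Figure 3.4.
from sklearn.svm import SVC

# Invented data: [voice pitch in Hz, height in cm].
X = [[120, 182], [110, 178], [130, 175], [125, 185],   # men
     [210, 165], [220, 170], [200, 160], [215, 168]]   # women
y = ["man", "man", "man", "man",
     "woman", "woman", "woman", "woman"]

clf = SVC(kernel="linear").fit(X, y)

# Classify a new, previously unseen person (cf. Figure 3.4c).
print(clf.predict([[150, 172]]))
```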
Support vector machines solve this line-drawing problem by maximising the margin around the line. The method tries different lines and chooses the one with the maximal margin. Therefore, from Figure 3.4b, the method would choose the black line because it has the largest free area on both its left and right sides. Beyond linear (straight) lines, support vector machines can also use polynomial or radial (Gaussian) approaches to drawing these lines. These are formally known as the kernels of the support vector machine. Similarly, real data are usually not two-dimensional as in Figure 3.4; they can be multidimensional, with more than two independent variables (or features) in the data set. The process works in the same way with higher-dimensional data and non-linear kernels. Instead of writing code for all of these cases, scholars use data analysis libraries for these tasks (see further Section 8.2). These examples seek only to provide intuition on how these systems work.
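In such libraries, switching kernels is a one-line change. Below is a minimal sketch, assuming scikit-learn and a generated circular toy data set where a straight line cannot separate the classes but a radial (Gaussian) kernel can.

```python
# A minimal sketch of trying different support vector machine kernels.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: not separable by a straight line.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

for kernel in ["linear", "poly", "rbf"]:   # rbf = radial basis (Gaussian)
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))
```

Running this shows the linear kernel scoring near chance on the rings while the radial kernel separates them almost perfectly, which is the intuition behind choosing a non-linear kernel.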