Another direction of algorithmic data analysis is to provide an algorithm with examples and ask it to derive rules from those examples, which can later be used for classification. Regression analysis is a good example of this process. There are examples of `correct' classification, $y$, and for each there is a set of related variables $x_1, \ldots, x_n$. Regression analysis determines coefficients $\beta_1, \ldots, \beta_n$ so that by seeing $x_1, \ldots, x_n$ we can estimate the value of $y$. This idea is also communicated as an equation $y = \beta_1 x_1 + \ldots + \beta_n x_n$, stating that the value of $y$ depends on variables $x_1, \ldots, x_n$.
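As a minimal sketch of how coefficients can be determined from examples, the following fits a straight line to a handful of made-up data points using ordinary least squares. It assumes a single input variable and an intercept term; the function name `fit_line` and the data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ b0 + b1 * x with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx            # slope: how much y changes per unit of x
    b0 = mean_y - b1 * mean_x  # intercept: value of y when x is 0
    return b0, b1

# Made-up examples in which y happens to equal 2 * x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_line(xs, ys)

# Use the learned coefficients to estimate y for an unseen x.
estimate = b0 + b1 * 5.0
```

The coefficients are not supplied by anyone; they fall out of the examples, and can then be applied to inputs that were not among them.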
The major difference between an example-driven approach to data analysis and a rule-based approach is how the coefficients (the $\beta$ values) are determined. In a rule-based approach these values are based on expert evaluation and consideration. In the case of text analysis, both `happy' and `joyful' had the same value because an expert decided that they presented the same level of positive sentiment. Therefore, the sentiment of the sentence was determined by counting the occurrences of the words `happy', `joyful', `sad' and `unhappy' in the sentence. Put mathematically, the sentiment was $\beta_1 x_{\mathrm{happy}} + \beta_2 x_{\mathrm{joyful}} + \beta_3 x_{\mathrm{sad}} + \beta_4 x_{\mathrm{unhappy}}$, where each $x$ is the number of times the word occurs in the sentence. The values of the $\beta$s are fixed in advance by the expert. Learning the rules from examples determines these values not by expert evaluation but by looking at evidence. This means that they could be different for each word. In practice, these values would be determined by going through a large number of sentences, each of which is labelled with a correct sentiment value. These labels are known as ground truth values: the values treated as correct, against which the machine learning model is developed. Based on the ground truth values, the impact of each word (or variable) can be evaluated.
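The contrast between expert-chosen and learned values can be sketched as follows. Everything here is an assumption made for illustration: the expert weights of +1 and -1, the labelled sentences and their ground truth values, and the use of a simple stochastic gradient descent loop on squared error as the learning procedure. It is one possible way to learn such values, not a prescribed method.

```python
WORDS = ["happy", "joyful", "sad", "unhappy"]

def count_words(sentence):
    """Turn a sentence into counts of the four sentiment words."""
    tokens = sentence.lower().split()
    return [tokens.count(word) for word in WORDS]

def sentiment(weights, sentence):
    """Weighted sum of word counts, one weight per word."""
    return sum(w * c for w, c in zip(weights, count_words(sentence)))

# Rule-based: weights chosen by an expert (illustrative values, assumed here).
expert_weights = [1.0, 1.0, -1.0, -1.0]

# Example-driven: sentences paired with made-up ground truth sentiment values.
labelled = [
    ("i am happy today", 1.0),
    ("what a joyful day", 2.0),
    ("i feel sad", -1.0),
    ("she is unhappy", -3.0),
    ("happy but sad", 0.0),
    ("joyful not unhappy", -1.0),
    ("happy happy joyful", 4.0),
]

def learn_weights(examples, learning_rate=0.05, epochs=500):
    """Adjust weights step by step to shrink the error on each example."""
    weights = [0.0] * len(WORDS)
    for _ in range(epochs):
        for sentence, truth in examples:
            counts = count_words(sentence)
            predicted = sum(w * c for w, c in zip(weights, counts))
            error = truth - predicted
            # Move each weight in proportion to its word's count and the error.
            weights = [w + learning_rate * error * c
                       for w, c in zip(weights, counts)]
    return weights

learned_weights = learn_weights(labelled)
```

With these made-up labels the learned weight for `joyful` ends up larger than the one for `happy`, which the expert had treated as equal. The evidence, not the expert, decides how much each word matters.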
This approach is known as supervised machine learning. It is always based on example data that includes the variables used in model building (independent variables) and the variable(s) describing the outcomes (the ground truth, or dependent variables). Several different algorithms can be used for supervised machine learning problems.
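To illustrate that the same kind of labelled example data can be handed to different algorithms, here is a minimal sketch of a nearest-neighbour classifier, which predicts by copying the outcome of the most similar example instead of estimating coefficients. The data, the two-count representation of a sentence and the function name are all hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the training example closest to `query`."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best_index = min(range(len(train_x)),
                     key=lambda i: squared_distance(train_x[i], query))
    return train_y[best_index]

# Input variables: [count of positive words, count of negative words] per sentence.
train_x = [[2, 0], [1, 0], [0, 1], [0, 3]]
# Outcome variable: the ground truth label for each sentence (made up).
train_y = ["positive", "positive", "negative", "negative"]

prediction = nearest_neighbour_predict(train_x, train_y, [3, 1])
```

The structure of the data is the same as in regression: input variables on one side and a ground truth outcome on the other. Only the way the algorithm turns examples into predictions differs.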