While we initially focused on categorising data, algorithmic data analysis methods can also be used to examine associations between variables. The output is not explicitly a categorisation or a prediction, but it informs us about connections in the data set. The process and output of finding associations resemble qualitative comparative analysis (ref): both examine under which conditions there is strong evidence for a specific outcome. The most familiar case is a shopping recommendation: ‘Customers who bought this also bought these items’. In this case, if we know that the customer has milk in the shopping basket, we ask the ‘reverse’ question: what did people who ended up buying milk also buy? We then recommend those items to the customer. When conducted computationally, this is called association rule learning. In the social sciences, finding such rules, or conditions that lead to a specific outcome, offers some opportunities. Motivated by qualitative comparative analysis, the examination of different cases may be an area where these methods can be used extensively.
For example, Jurek and Scime (2014) used association rules to examine under which kinds of leadership countries have democratic and free societies. The conditions for democratic societies are an age-old question in political science, and they raise further questions, such as: What is a democratic society, and what is freedom? Using association rule mining, they identify, for example, that if the leader of a country has been in office for 5 to 10 years, the country is likely to be free, but if the leader has been in office for longer than 10 years, the country is less likely to be free. Such conditions would help us to categorise similar cases that we have not previously seen. However, thanks to its helpful output of rules, association rule learning can also be used to investigate the phenomenon itself and to understand which conditions explain an outcome well.
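To make the idea of a rule more concrete, the shopping-basket case can be sketched in a few lines of Python. This is a minimal illustration rather than a full association rule learning algorithm: the baskets are invented for the example, the function name `rules_for` and the `min_confidence` threshold are our own assumptions, and only rules with a single item on each side are considered. Two standard measures are computed: support (the share of all baskets that contain both items) and confidence (the share of baskets with the first item that also contain the second).

```python
from collections import Counter

# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"milk", "bread", "cereal"},
]


def rules_for(antecedent, baskets, min_confidence=0.5):
    """Return (item, support, confidence) for rules {antecedent} -> {item}."""
    # Keep only the baskets that contain the item we start from.
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return []

    # Count how often every other item appears alongside the antecedent.
    counts = Counter(
        item for b in with_antecedent for item in b if item != antecedent
    )

    rules = []
    for item, n in counts.items():
        support = n / len(baskets)             # share of all baskets
        confidence = n / len(with_antecedent)  # share of antecedent baskets
        if confidence >= min_confidence:
            rules.append((item, support, confidence))

    # Strongest rules first; ties are broken alphabetically.
    return sorted(rules, key=lambda r: (-r[2], r[0]))


milk_rules = rules_for("milk", baskets)
for item, support, confidence in milk_rules:
    print(f"milk -> {item}: support={support:.2f}, confidence={confidence:.2f}")
```

With these invented baskets, bread is the strongest recommendation for a customer who has milk in the basket. In a social science setting, the baskets would be cases (for example, countries) and the items would be the conditions and outcomes recorded for each case.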
Another approach to machine learning is semi-supervised machine learning. If the data set does not include ground truth values, adding them manually is costly and time-consuming, which limits the opportunities to use supervised machine learning. The challenge with unsupervised machine learning, in turn, is that the resulting groups cannot be predefined; they are established using statistical processes with minimal human involvement. Semi-supervised machine learning strikes a balance between these two approaches: to train the model, it uses data both with and without ground truth values. Using such training data sometimes leads to improved model accuracy.
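One simple way to combine the two kinds of data is self-training, which is only one of several semi-supervised techniques and is used here purely as an illustration. The sketch below assumes a handful of invented two-dimensional data points, a nearest-centroid classifier, and a distance threshold (`max_distance`) that stands in for model confidence; all of these are our own assumptions. The model starts from the few labelled points, assigns a label to the unlabelled point it is most certain about, and then recomputes the class centres with that point included.

```python
import math

# Hypothetical 2-D data, invented for illustration: two points carry a
# ground truth label, the remaining six do not.
labelled = [((1.0, 1.0), "a"), ((9.0, 9.0), "b")]
unlabelled = [(1.5, 2.0), (2.5, 2.5), (8.0, 8.5), (7.0, 7.5), (3.5, 3.0), (6.5, 6.0)]


def centroids(examples):
    """Mean point of each class in the currently labelled set."""
    sums = {}
    for (x, y), label in examples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def self_train(labelled, unlabelled, max_distance=3.0):
    """Repeatedly pseudo-label the unlabelled point closest to a class centroid."""
    labelled = list(labelled)
    remaining = list(unlabelled)
    while remaining:
        cents = centroids(labelled)
        # The most confident prediction is the smallest point-to-centroid distance.
        dist, point, label = min(
            (math.dist(p, c), p, lab) for p in remaining for lab, c in cents.items()
        )
        if dist > max_distance:
            break  # nothing left that the model is confident enough about
        labelled.append((point, label))
        remaining.remove(point)
    return labelled, remaining


final, leftover = self_train(labelled, unlabelled)
for point, label in final:
    print(point, label)
```

In this toy data, the points far from the two original labelled examples only receive a label after the class centres have shifted towards them, which is how the unlabelled data contributes to training. Points that never come within the threshold stay unlabelled and are returned separately.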