The first perspective emphasises novel data sources and their transformative effect on social inquiry. Such sources allow researchers to ask questions that social science research has so far left unanswered. This perspective is often advocated by computer science and data science researchers. In their influential agenda-setting paper, Lazer et al. (2009) exemplify this perspective. The paper highlights how versatile digital data sets, including emails, smartphone sensor logs and social media data, can be used for social science (Lazer et al., 2009; see also Golder and Macy, 2014; Lewis, 2015, for discussion). The paper suggests that these types of data sources act as a microscope: they allow scientists to investigate society at a new level of detail. This perspective links computational social science with the collection and analysis of digital data sets.
Adamic and Glance (2005) illustrate this data-driven approach to computational social science. Their work focuses on the U.S. presidential election of 2004 and studies how liberal and conservative blogs link to other blogs. The gist of their work is that liberal and conservative blogs rarely cross-link; instead, authors tend to link to blogs that share their own perspective. Thus, their work shows that political blogs are polarised, with clear divisions along party lines. What makes this work a representative example of a data-driven approach is both the data collection and the analysis approach applied. Adamic and Glance (2005, 7) highlight the size of their data, stating that `BlogPulse currently monitors over 5.5 million weblogs and indexes 450K weblog posts per day.' The analysis is worded to show that data, rather than theory, drive it. For example, `The set of informative phrases was extracted using a phrase-finding algorithm that identifies phrases that are most informative with respect to a background model of term frequencies in weblog data' (Adamic and Glance, 2005, 10).
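To make the idea of cross-linking concrete, the following is a minimal sketch of how one might compute the share of links that stay within a political leaning versus those that cross it. It is not the method of Adamic and Glance (2005); the blog names, the labels and the function `link_shares` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy data: each blog and its political leaning.
# These labels are invented and do not come from Adamic and Glance (2005).
leaning = {
    "blog_a": "liberal",
    "blog_b": "liberal",
    "blog_c": "conservative",
    "blog_d": "conservative",
}

# Hypothetical directed hyperlinks as (source blog, target blog) pairs.
links = [
    ("blog_a", "blog_b"),
    ("blog_b", "blog_a"),
    ("blog_c", "blog_d"),
    ("blog_d", "blog_c"),
    ("blog_a", "blog_c"),
]


def link_shares(links, leaning):
    """Return the share of links within a leaning and across leanings."""
    counts = Counter(
        "within" if leaning[src] == leaning[dst] else "across"
        for src, dst in links
    )
    total = sum(counts.values())
    if total == 0:
        # No links at all: report zero shares instead of dividing by zero.
        return {"within": 0.0, "across": 0.0}
    return {kind: counts[kind] / total for kind in ("within", "across")}


shares = link_shares(links, leaning)
print(shares)
```

In this toy network, four of the five links connect blogs with the same leaning. A high within-group share of this kind is one simple, descriptive signal of the polarisation discussed above, although a real analysis would also need to account for the size of each group.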
Another example of a data-driven approach is using data from digital sensors to understand society. Eagle and Pentland (2006) explored how mobile phones can act as sensors to study complex social systems, an approach they call reality mining. Eagle and Pentland (2006, 255) argue that surveys, an established data collection method for social scientists, are `plagued with issues such as bias, sparsity of data, and lack of continuity between discrete questionnaires'. Therefore, they are interested in data available through phones, such as location and nearby people. Since this work, the capacity of smartphones has increased. Nowadays, location can be collected more precisely, and phones can also record data on movement style or even heartbeats. Eagle and Pentland (2006) used their data set to understand routines and organisational rhythms. Among their observations, they claim that `the data these devices have returned to us is unprecedented in both magnitude and depth' and call for further investigations across a wide range of research areas.
As these examples show, the data-driven perspective focuses on analysing digital data sources, such as user-generated content (blog posts and their links) or data sets from novel sensors that collect information about our surroundings. I agree that digital data sets have opened novel avenues for researchers. Digital traces have allowed researchers to study online behaviour from perspectives such as social contagion and diffusion, social exchange and collective action (Golder and Macy, 2014). Yet, there are challenges as well. Often, digital trace data are not created for research purposes, which raises methodological and ethical concerns (Salganik, 2017). I will address these concerns more extensively in the chapters to come.
Clearly, digital data sets are beneficial to social research. However, it is not enough to consider computational social science as a discipline that exists only to use digital trace data and other digital data sets. In this book, I argue that computational social science is more than that. I am not alone in this view. Wallach (2018), a computer scientist at Microsoft Research, highlights that computational social science ought to do more than use digital data about social phenomena. She calls for more integrated collaboration with social science researchers to understand these digital data sources in more detail. Furthermore, a researcher with a computer science perspective is often focused on tasks like predicting variables from the data. This has not traditionally been the focus of social science research, where the aim of scholarship is to explain a phenomenon rather than to predict variables in the data sets (Wallach, 2018).