Above, we adopted a narrow scope of ethics by examining research ethics as a relationship between research subject(s) and researcher(s): how we seek to limit various harms caused by research and to balance them correctly against the research's benefits. There is, however, a set of topics in computational social science where we face ethical challenges not directly related to human subjects: legislative limitations on data collection and analysis, and the wider impacts of our work. For some, computational social science emphasises the large-scale data collected through online spaces. Novel ethical questions emerge regarding who has access to these data and on what terms. For example, researchers' access to data is often facilitated through the same tools that serve commercial interests. Therefore, choices that these companies make about access can severely limit scholarly activities (Bruns, 2019). To some degree, it is possible to circumvent many of these restrictions with extra work (Lomborg and Bechmann, 2014; Freelon, 2018). However, the online terms of service (often just called the TOS) explicitly prohibit such activities. Sometimes the terms of service even limit particular approaches to studying a phenomenon. For example, analysing tweets that users have posted and then deleted is against Twitter's terms of service, as developers are supposed to delete tweets they observe users have deleted (Meeks, 2016).
These terms exist to protect the companies that have created these services: they may have a business interest in not allowing other commercial entities to copy their service or data. The TOS also set certain protections for their users. However, as academics work in the public interest, the question is more complex. If research is needed and, in particular, if it is critical of powerful actors, should we not seek to break the rules? Discussion about breaking the rules has been active and has even gained tentative support from some (e.g. Freelon, 2018; Bruns, 2019; Venturini and Rogers, 2019). There are no clear commercial reasons to disallow academic research, but researchers are bound by the same legal framework as ordinary citizens. Therefore, companies might ask scholars to remove data collected in an unauthorised manner, which would naturally challenge any scholarly work. While breaking the TOS might not be critical for researchers, there are also laws that govern how research is conducted. If content created or owned by others is stored, copyright questions may be relevant; if personally identifiable information (PII) is stored, privacy laws and regulations might apply; and particular ways of collecting data could be seen as interfering with telecommunications systems. Therefore, one must ask whether the research is worth the potential legal implications for researchers who break laws and regulations (Freelon, 2018).
Beyond digital data collection and its challenges, legal aspects increasingly impact any research with human subjects because of privacy and data protection regulations. In Europe, the General Data Protection Regulation (GDPR) has given people new rights with regard to their personal data. Universities and researchers have attempted to interpret the implications the new regulation has for research practices; however, there does not yet seem to be extensive consensus. In response to the GDPR, other countries may also renew their regulatory frameworks. Thus, these developments can shape how computational social science is conducted today and in the future.
Beyond highlighting the differences between ethical considerations and legislative settings, computational social science ethics are also important when considering research outputs: not only the results but also the code. In computational social science, the outputs can serve various purposes. They can be algorithmic systems that classify people into categories, visualisations that reveal connections between people not otherwise visible, simulation models used to inform policy decisions, or interactive system ideas that can later be developed into a full service. What are researchers' responsibilities in understanding how their research is used in society?
Scholars and scientists have asked the question of ethical responsibility for outcomes before. The most famous example is the Manhattan Project, the development of the atomic bomb in the United States during World War II. Scientists who participated in the project have, both during and after it, questioned whether their work could be justified. For example, after the project, Oppenheimer, the project leader, campaigned against further development of such weapons and highlighted how scientists should think about their responsibilities when doing their work. Obviously, many of us are not working on weapons of mass destruction. However, as computational social scientists we work on weapons of math destruction (O'Neil, 2016): algorithms. Algorithmic systems can hold power in our society (e.g. Seaver, 2017; Gillespie, 2012; Bucher, 2012; Kitchin, 2017).
For example, in Chapter 3 we briefly mentioned Wang and Kosinski (2018) as an example of examining images. Their use case for image recognition was to examine whether machine learning models can accurately detect sexual orientation from images. The result: yes. In their work, they show that machine learning models become more accurate than humans. (I read these results with a grain of salt; it remains unclear whether the observed differences relate more to different styles of presenting oneself on the dating websites where the data were gathered.) Once such a model is developed, what is the authors' role in its use further down the pipeline? Detecting sexual orientation could open new business opportunities for targeted advertising or, more worryingly, governments could use these technologies to oppress people. The authors themselves address whether such findings should have been made public at all:
Some people may wonder if such findings should be made public lest they inspire the very application that we are warning against. We share this concern. However, as governments and companies seem to be already deploying face-based classifiers aimed at detecting intimate traits (Chin & Lin, 2017; Lubin, 2016), there is an urgent need for making policymakers, the general public and gay communities aware of the risks that they might be facing already. Delaying or abandoning the publication of these findings could deprive individuals of the chance to take preventive measures and policymakers the ability to introduce legislation to protect people. Moreover, this work does not offer any advantage to those who may be developing or deploying classification algorithms, apart from emphasising the ethical implications of their work. We used widely available off-the-shelf tools, publicly available data and methods well known to computer vision practitioners. We did not create a privacy-invading tool but rather showed that basic and widely used methods pose serious privacy threats. We hope that our findings will inform the public and policymakers and inspire them to design technologies and write policies that reduce the risks faced by homosexual communities across the world (Wang and Kosinski, 2018, 255).

The quote demonstrates the difficulty of research ethics when it focuses on outcomes. The authors argue that their research effort does more good for society by making it explicit that such analysis can be done and by asking people to address this in their behaviour and in regulation. Similarly, it is still debated whether the atomic bombings of Hiroshima and Nagasaki, with their massive loss of life, could be justified by the benefits they brought: a quicker end to the war between the Allies and Japan, which saved a significant number of lives.
Ethical questions often do not have a single clear answer; the answer depends on the assumptions one makes about ethics.
The challenges of the further use of technologies are not unique to algorithmic data analysis but are present in many areas of computational social science. In an earlier chapter we highlighted how simulation models can help in policymaking. In the spring of 2020, when I was writing this chapter, much of the discussion concerned the COVID-19 pandemic and the different models used to develop policies for decision making. In Finland, simulation models have been used to predict how economic policies impact underemployment or the distribution of wealth. These simulations are not only tools for scientists but are also put to use and interpreted by politicians, the media and even the public. Sometimes this means there are interests in understanding, or even seeking to influence, parameters or simulation rules so that the results lean in the politically expected direction. For example, it can be claimed that a simulation overestimates (or underestimates) some of the dynamic economic impacts of an economic policy and thus does not give an accurate prediction. Similarly, I have heard of a case (not in Finland) where researchers estimated that a programme had a transition rate of 10% from underemployed to employed. The rate of 10% was not based on a hunch; the researchers had examined the available data. However, political leaders examined the model during its development, as part of workshops intended to ensure that the policy model would benefit them. On hearing about this detail, they argued that their experience suggested the rate was closer to 15% than 10%, implying that the model was overly pessimistic. I do not recall how the story ended: was the model developed with a 10% or a 15% transition rate? Nevertheless, it illustrates the interest in shaping tools so that they align better with political aims. How should one respond to such requests, especially given that estimating some parameter values, even after examining data, is sometimes closer to magic than science?
For example, during the COVID-19 epidemic many fundamental parameter values were not known precisely. As many countries had limited testing capabilities, even the mortality rate is an estimate. The differing results of modelling groups relate in part to differences in these kinds of parameters. Therefore, additional informants can provide justifiable and good insights. However, at the same time, one should not forget that political interests may be embedded in the development of computational social science.
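To make the parameter-sensitivity point concrete, here is a minimal, purely hypothetical sketch (the function, cohort size and period count are my own illustrative choices, not the actual policy model from the anecdote above) showing how much the contested transition-rate parameter alone changes a simulation's headline outcome:

```python
# Hypothetical sketch: a simple cohort simulation where a fixed share of the
# underemployed move into employment each period. This is NOT the model from
# the anecdote; it only illustrates sensitivity to one disputed parameter.

def simulate_transitions(cohort_size: int, transition_rate: float, periods: int) -> int:
    """Return how many people are employed after the given number of periods,
    assuming a constant per-period transition probability from underemployment."""
    underemployed = float(cohort_size)
    employed = 0.0
    for _ in range(periods):
        moved = underemployed * transition_rate   # expected movers this period
        underemployed -= moved
        employed += moved
    return round(employed)

# The two contested estimates from the anecdote: 10% (data-based) vs 15% (claimed).
for rate in (0.10, 0.15):
    outcome = simulate_transitions(cohort_size=10_000, transition_rate=rate, periods=12)
    print(f"rate={rate:.0%}: {outcome} employed after 12 periods")
```

With everything else held fixed, moving the rate from 10% to 15% shifts the projected outcome by roughly 1,400 people in a cohort of 10,000 over twelve periods: exactly the kind of difference that matters when a model feeds into policy debates.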
I think a chapter about research ethics would not be complete without some personal reflections. I was involved in the development and maintenance of a hate speech detection tool in Finland during the 2017 Finnish municipal election (my colleagues and I have sought to disseminate these observations, e.g. Laaksonen et al., 2020; Haapoja et al., 2020). The aim was never to analyse ordinary citizens; instead, we analysed candidates and their social media activity, with the overarching aim of limiting the use of hate speech rhetoric in political campaigns. To ensure delicacy and discretion, we shared the results of the analysis only with the public officials with whom we were collaborating. Public discussion emerged in response to the activity, focusing in part on how we classified the concept of hate speech and whether such a normative concept could be used at all. (We had discussed this in our team and understood that classifications matter (Bowker and Star, 2000b). In our analysis, hate speech referred to explicit references to violence towards a group based on some protected attribute, such as ethnic background or political status. However, when communicating about the work we failed to properly acknowledge these reflections and used a politically sensitive term to describe our activities. Reflecting on it now, I think we should have emphasised not hate speech but civilised discussion, and also explicitly said that we think discussion on difficult and sensitive topics is essential and can even be strongly formulated, as long as people avoid suggesting that other people should be killed. Beyond a better marketing strategy, I would also collaborate with political parties to legitimise the classification framework.) However, beyond the data engineering and machine learning activities, the tool itself could be seen as a probe we developed and sent out into the world. Haapoja et al.
(2020) develop this direction of thinking further by showing how the hate speech monitoring tool became a pawn in a game between different political interests in Finland. Doing such research is far from traditional, value-neutral research and brings out the idea of research as activist work. One reason I participated in the development of this tool was to steer political campaigning towards civilised political discussion instead of the promotion of violence. For me, these are important aspects of a functioning democracy: ensuring social stability and inclusion in political processes. However, I understand anyone who raised their eyebrows while reading about this work. There are important values on the other side of the table, freedom of expression being the most prominent. Therefore, when we work on our weapons of math destruction, it is clear that our personal perspectives and values drive forward ideas about what to research and how to approach the research topic (further on the latter, see Nelimarkka et al., 2017, 4542). The ethical questions thus focus not only on what we do and its potential implications but can also encompass how we envision society to be.