In business settings, software projects are known to fail: fewer than half are finished on time, on budget and with the intended scope. As code is an essential tool for computational social scientists, we should understand why industry performs so poorly and what antidotes it has developed. Given the extensive interest in doing better, scholars have written a great deal about software projects. Fairley and Willshire (2003) illustrate the challenges using the analogy of Vasa, the Royal Swedish Navy's flagship launched in 1628: Vasa sank after sailing only about 1,300 metres, well under a nautical mile. Petrillo et al. (2009) examined how game developers themselves describe the major reasons for project failures, and further commentaries exist as well (Charette, 2005). What follows is an attempt to integrate and summarise these observations and contextualise them for the computational social science domain.
A common challenge in software projects is scope: feature creep (the continual addition of new ideas throughout the project), changing requirements or unrealistic goals. In scientific software, similar challenges arise from the complexity of the analysis process. It is common to adopt ever finer-grained analysis approaches, apply more advanced tools or accommodate novel insights as the work unfolds. For example, a data analysis can easily drift towards adding more theoretical nuance to the model, which complicates both the project and the code. That said, feature creep may be familiar to all research activities. The gap between what one initially envisions and what is finally presented in the published work is something I believe every senior researcher can relate to. The same struggle is real in software as well.
The second challenge is extensive innovation: seeking to use, or even extend beyond, the state of the art. In software projects, this can lead to unforeseeable technological problems. In scientific work, it could mean testing novel tools or pushing boundaries, which is the main purpose of many research projects. My hunch, which should be taken with a grain of salt, is that it is difficult to do cutting-edge empirical research and push methodological boundaries at the same time. If novel methods produce interesting novel empirical findings, are those findings artefacts of the methods or are they really there? More broadly, cutting-edge work is always a risk, a feature many scholars will recognise whatever they are doing.
The final, broader set of challenges relates to scheduling. With software, it is often difficult to estimate how long a task will take. While writing this book, I have also been trying to pre-process some data for further analysis. Some of my collaborators are frustrated, as it has now taken more than a month. Initially we hit some software environment problems, but since then the pre-processing itself simply seems to take a long time. (These delays concern not writing the code but merely the steps needed before it can be run.) Scheduling pressures can lead to smelly code (Tufano et al., 2017), which in turn leads to further complications down the line.
Recommendations for avoiding these challenges include having a clear goal for what the code is expected to achieve, increasing transparency within project teams, prioritising the most important aims and having proper plans in place for the software and its functions. Similar practices may help in scientific projects that involve coding. What I particularly emphasise is that software created for scientific analysis should always be written as if it will be used for a long time. While this is not always the case, various forms of reuse can happen. Recognising that software can be problematic, and is often a non-trivial component of a project, suggests that allocating sufficient resources to it is worthwhile.