Clarity helps everyone, including your future self returning to the code later, to understand what the code is supposed to do. Deissenboeck and Pizka (2006) analysed three large-scale software products and determined that around 60-70% of the code consisted of identifiers, such as the names of variables, functions and source code files. When unclear identifiers are used, it becomes harder to understand what the software does. Unclear identifiers can obfuscate the code: they make it difficult to see what the programmer aimed to achieve, what role a variable has in the code, or even whether a piece of code is relevant to the changes being worked on (Lawrie et al., 2006; Deissenboeck and Pizka, 2006). There are easy tricks to help address these challenges.
First, use full words or clear abbreviations when writing code. Simple variables are often only a few characters long, which rarely provides enough detail to work with (Deissenboeck and Pizka, 2006). Short names hinder comprehension and decrease both the speed of writing code and its quality (Lawrie et al., 2006; Hofmeister et al., 2019). One reason for using short names may be that, while writing the code, the idea is clear in one's mind. However, after I started to teach programming, I observed that students seemed to understand better when I wrote variables with full words, and I have since tried to improve my code-writing habits.
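The difference can be illustrated with a minimal sketch. The function and variable names below are hypothetical and chosen only for illustration; both functions perform exactly the same computation, but only the second one tells the reader what is being computed.

```python
# Terse version: the reader has to guess what f, a, n and r stand for.
def f(a, n):
    r = sum(a) / n
    return r


# The same computation with full words as identifiers.
def mean_age(ages, participant_count):
    average = sum(ages) / participant_count
    return average


ages = [23, 35, 41]
print(mean_age(ages, len(ages)))  # 33.0
```

The longer names cost a few extra keystrokes when writing, but they remove the guesswork when the code is read again later.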
Second, identifiers should have high consistency and clear conceptual links (Deissenboeck and Pizka, 2006). When working with software code, we present our thinking through identifiers. They map elements of the code onto something outside it, in the real-world problem. These could be, for example, theoretical constructs, operationalisations of theories into variables and procedures, or data stored for further processing. These links should be clearly visible in the identifiers' names: age, sex and location are much easier to understand than a, s and l. The other important aspect is consistency. The same idea should be named in the same way throughout the code. Synonyms should be avoided: there should not be several identifiers referring to the same idea. In the examples, we have used the variable line consistently to refer to lines read from files, even in cases where we have read several different lines at the same time.
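A minimal sketch of these two principles follows. The data and the name survey_file are hypothetical; an in-memory text buffer stands in for a real file so that the example can be run as it is. The variable line always refers to a line read from the file, and the names age, sex and location mirror the concepts they represent.

```python
import io

# Hypothetical in-memory "file" so the sketch runs without real data.
survey_file = io.StringIO("34,female,Helsinki\n27,male,Tampere\n")

participants = []
for line in survey_file:
    # Each identifier names the real-world concept it stores.
    age, sex, location = line.strip().split(",")
    participants.append({"age": int(age), "sex": sex, "location": location})

print(participants[0]["location"])  # Helsinki
```

Note that the list is called participants throughout; switching between, say, participants, respondents and people for the same idea would force the reader to check whether they really are the same thing.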
Code clarity can also be improved by checking that the code is formatted consistently: lines are at the correct indentation levels, and blank lines are used to organise the code into coherent blocks that reflect its logical steps (see Code Example 8.9, where the stages of data processing are separated by extra line breaks), so that the code is visually tidy. Furthermore, code can be commented by adding notes that are not executed by the computer but can be read by the programmer to quickly grasp what the code does and what the purpose of a variable is. In Python, comments are marked with #, and everything after it on the same line is not executed by the program; in R, the symbol is also #. An important use for comments is also to mark down unusual solutions to a problem, such as workarounds or 'temporary fixes' that should be rethought. Comments are not limited to the code itself. They can also be used as reminders of operationalisations, theoretical concepts and other thinking about the research that has been formulated in the code.
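The following sketch shows how blank lines and comments might be combined. The data, the "n/a" coding for missing answers and the 1-5 scale are assumptions made for this illustration only; they do not come from any data set discussed in this book.

```python
raw_scores = ["4", "5", "n/a", "3"]

# Stage 1: clean the data.
# Workaround: missing answers are coded as "n/a" in this hypothetical
# data set; dropping them is a temporary fix that should be rethought.
valid_scores = [int(score) for score in raw_scores if score != "n/a"]

# Stage 2: compute the summary statistic.
# Operationalisation: satisfaction is the mean of the item scores (1-5).
mean_satisfaction = sum(valid_scores) / len(valid_scores)

print(mean_satisfaction)  # 4.0
```

The comments here record three different things: the stage of processing, a workaround that needs revisiting, and the research decision behind the computation.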
Some clarity-related decisions are more fundamental and relate to problems in the solution strategy itself. Among software professionals, these problems are known as code smells (for reviews, see Sharma and Spinellis, 2018; Zhang et al., 2011). These smells include:
These code smells are often caused by high workload and time pressure, which lead to poorly designed solutions (Tufano et al., 2017). One approach to avoiding bad code is refactoring: going through the code and revising the decisions that make it smelly. This is not a silver bullet for all issues, but it may be a practice that helps to manage code and increase its clarity.
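As a small illustration of refactoring, the sketch below removes duplicated code, one commonly discussed smell, by moving a repeated cleaning step into a single function. The function normalise_name and the example values are hypothetical, and this is only one of many possible ways to refactor such code.

```python
# Before refactoring: the same cleaning steps are written out twice.
clean_first_name_before = "  ada ".strip().capitalize()
clean_last_name_before = " LOVELACE  ".strip().capitalize()


# After refactoring: the shared logic lives in one named function.
def normalise_name(name):
    """Trim surrounding whitespace and capitalise the first letter only."""
    return name.strip().capitalize()


clean_first_name = normalise_name("  ada ")
clean_last_name = normalise_name(" LOVELACE  ")

print(clean_first_name, clean_last_name)  # Ada Lovelace
```

The behaviour is unchanged, but if the cleaning rule needs to change later, it now has to be changed in only one place.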