In previous chapters we focused on techniques such as for loops, lists, and dictionaries to avoid repetition in code. There is another way as well: using functions (sometimes also called methods). You have already used various functions without knowing it. For example, the length of a list (len(special_list) in Python, length(special_list) in R) is computed by a function that determines how many items the list stored in that variable has. Because this logic is packaged as a function, you can reuse it by name and do not need to copy and paste the code each time (Code Example 8.1 shows how you would manually compute the length of a list). We cannot use the previous techniques to avoid copy-pasting here because, while the code looks similar, the name of the list variable can change. The code is otherwise exactly the same; only the variable we are iterating over, or whose length we are computing, changes. If you copy-pasted the code to determine the length of a list, you would also need to make small changes to it each time to examine the length of special_list or another_list.
This is where functions excel. A function consists of input variables, code that is executed using those input variables, and an outcome variable, often called the function output or the value returned by the function. In the case of list length, the function takes a list as input (stored in a variable). The code creates a new temporary variable, length_of_list, and runs a for loop. The outcome is available in length_of_list, which is returned at the end of the function to the code that called it, where the value can be used. By temporary variable we mean that variables created in the function are visible only within the function. Once the execution of the function is complete, the variable length_of_list is destroyed, and the returned outcome is stored in a separate variable. This is crucial because it allows us to easily compute outcomes based on different inputs: the variables used in the computation are relevant only within that function.
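As a minimal sketch of the idea above (not necessarily the same code as Code Example 8.1), the list-length computation could be wrapped in a function as follows. The function name list_length, the parameter name items, and the two example lists are assumptions made for illustration.

```python
def list_length(items):
    # length_of_list is a temporary (local) variable: it exists only
    # while the function is running and is not visible outside it.
    length_of_list = 0
    for _ in items:
        length_of_list += 1
    # Hand the outcome back to the code that called the function.
    return length_of_list


# Hypothetical example lists; the same function works for both.
special_list = ["a", "b", "c"]
another_list = [10, 20]

# The returned values are stored in separate variables.
special_length = list_length(special_list)
another_length = list_length(another_list)
print(special_length, another_length)
```

The same function body serves both lists; only the input passed to it changes.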
Functions help in executing the same repeatable task across different inputs. Data cleaning is a typical case, as the same commands are applied to a large number of variables. Table 8.1 shows salary data: monthly salary and months employed. To calculate yearly income, we would multiply these. However, the data are formatted oddly. The variables are separated by a space, and the salaries use the letter D as the decimal separator instead of the dot we would expect. To transform these into numbers, we first need to replace D with a dot and then cast the variable into a number. This is done in exactly the same way for both monthly salary and months employed. Instead of writing the code twice, we can create a function that does this. Code Example 8.2 shows how the function is defined and what its code looks like. The function has a name, fix_format, and an input variable text, also called the parameter of the function. The function computes the outcome variable number from the input value. The outcome value is returned at the end of the function, which makes it accessible to the program that called the function. Naturally, functions can be more complicated than the two lines used in this case.
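A function matching this description might look like the sketch below. The names fix_format, text, and number come from the description above; the sample input strings are hypothetical and are not taken from Table 8.1.

```python
def fix_format(text):
    # Replace the letter D with a dot, then cast the result to a number.
    number = text.replace("D", ".")
    number = float(number)
    return number


# Hypothetical inputs for illustration only.
print(fix_format("2500D50"))
print(fix_format("12"))
```

A value without a D, such as the number of months, passes through the replacement unchanged and is simply converted to a number.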
After creating the function, we need to use it to solve our problem: calculating the yearly income based on the file in Table 8.1. As Code Example 8.3 shows, the solution strategy is similar to the usual approach for working through such files: read the file line by line and separate the monthly salary from the months employed for the calculation. To clean the numbers, we call the cleaning function on each value and store the outcome values in variables. Finally, we multiply the two to compute the final answer.
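The strategy described above can be sketched as follows. To keep the sketch self-contained, the lines are held in a list rather than read from a file, and their contents are hypothetical values, not the data in Table 8.1; the variable names are likewise assumptions.

```python
def fix_format(text):
    # Replace the letter D with a dot and cast the result to a number.
    return float(text.replace("D", "."))


# Hypothetical lines standing in for the file: "monthly_salary months_employed".
lines = ["2500D50 12", "3100D00 6"]

yearly_incomes = []
for line in lines:
    # The two variables are separated by a space.
    monthly_text, months_text = line.split()
    # Call the same cleaning function for both values.
    monthly_salary = fix_format(monthly_text)
    months_employed = fix_format(months_text)
    # Multiply to get the yearly income for this line.
    yearly_incomes.append(monthly_salary * months_employed)

print(yearly_incomes)
```

With a real file, the list would be replaced by iterating over the opened file and stripping the newline from each line.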
The example demonstrated how a function works as a simple machine that transforms an input into an outcome variable. This allows us to create helper functions for common repetitive tasks and to write the code only once. Functions can also help to divide a problem into sub-goals. In these cases, the narrower sub-goals make the larger problem easier to understand and solve. It is also often easier to write several short functions than to solve every problem in one large block of code. Some sub-goals may require more than one input variable, but this is not a problem: functions can take more than one input as parameters. Although a function returns a single output value, that value can be a dictionary or a list, which makes it possible to return more than one thing. It is also not mandatory for a function to return an output. For example, a function might print data in a preferred, easy-to-read format but not return anything.
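These three points (several parameters, a dictionary as the returned value, and a function that returns nothing) can be illustrated with a short sketch. The function names income_summary and print_summary, the dictionary keys, and the input values are all hypothetical.

```python
def income_summary(monthly_salary, months_employed):
    # Two parameters; one returned value that bundles several results.
    yearly_income = monthly_salary * months_employed
    return {"months_employed": months_employed, "yearly_income": yearly_income}


def print_summary(summary):
    # Prints the data in a readable format and has no return statement.
    print(f"Months employed: {summary['months_employed']}")
    print(f"Yearly income: {summary['yearly_income']:.2f}")


summary = income_summary(2500.5, 12)
result = print_summary(summary)
# In Python, a function without a return statement gives back None.
print(result is None)
```

In R, by contrast, a function without an explicit return gives back the value of its last evaluated expression, so the behaviour of "returning nothing" differs slightly between the two languages.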
Code Example 8.4 demonstrates how differences between two experimental groups in a between-subjects experiment could be inspected. As is common practice, we do this by computing the mean and variance of an experimental score for each of the two groups. We divide the task into three sub-goals, each achieved through its own function: