I recently got badly bitten by functions with default arguments.
When writing code myself, I usually don't introduce default arguments. Until now, the only case where I found them acceptable was an existing function that needed additional features but was already used in many places, so that changing it without default arguments would have meant touching every call site.
Now I've changed my mind. Note to self: forget default arguments, they don't exist. Really, forget them...
Do you ever read the documentation when using functions you did not write? Do you understand what each argument does and what the impact is? I definitely don't.
The story started with the simple task of writing a CSV file.
df_1 <- data.frame(
  stringsAsFactors = FALSE,
  x = c(NA_character_, letters[1:3])
)
out_file <- tempfile(fileext = ".csv")
# I naively expect to need only the "x" and "file" arguments,
# I should not care about the rest... (big mistake, see below)
utils::write.csv(
  x = df_1,
  file = out_file
)
# I'm again very naive and assume I only need to set the "file"
# argument, everything else should just work...
df_2 <- utils::read.csv(file = out_file)
# If my naive assumptions above were right, the input and output should be
# the same
waldo::compare(df_1, df_2)
# `old` is length 1
# `new` is length 2
#
# `names(old)`: "x"
# `names(new)`: "X" "x"
#
# `old$X` is absent
# `new$X` is an integer vector (1, 2, 3, 4)
dplyr::glimpse(df_1)
# Rows: 4
# Columns: 1
# $ x <chr> NA, "a", "b", "c"
dplyr::glimpse(df_2)
# Rows: 4
# Columns: 2
# $ X <int> 1, 2, 3, 4
# $ x <chr> NA, "a", "b", "c"
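For the record, being explicit does make the round trip work. A minimal sketch (assuming R >= 4.0, where read.csv() no longer converts strings to factors by default):
# spelling out row.names avoids the extra "X" column
utils::write.csv(
  x = df_1,
  file = out_file,
  row.names = FALSE
)
df_3 <- utils::read.csv(file = out_file)
waldo::compare(df_1, df_3)
# should report no differences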
Surely I was just unlucky and picked the wrong tool; let's see whether other tools do better.
dt_1 <- data.table::data.table(
  stringsAsFactors = FALSE,
  x = c(NA_character_, letters[1:3])
)
out_file <- tempfile(fileext = ".csv")
# again, I expect the "x" and "file" arguments to be enough
data.table::fwrite(x = dt_1, file = out_file)
dt_2 <- data.table::fread(file = out_file)
# compare the input and output, to make sure my naive assumptions were right
waldo::compare(dt_1, dt_2)
# old vs new
# x
# - old[1, ] NA
# + new[1, ]
# old[2, ] a
# old[3, ] b
# old[4, ] c
#
# `old$x`: NA "a" "b" "c"
# `new$x`: "" "a" "b" "c"
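Here, too, spelling things out fixes the round trip. A sketch of one option, assuming I read the docs right: fwrite() writes NA as "" by default, so tell fread() via its na.strings argument to map empty fields back to NA:
dt_3 <- data.table::fread(file = out_file, na.strings = "")
waldo::compare(dt_1, dt_3)
# should report no differences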
As the examples show, naive assumptions unfortunately produce wrong results. Worse, the same naive assumptions produce different results depending on the tool used.
The bug I mentioned at the beginning came from a call to utils::write.csv(), whose default argument is na = "NA".
This produced a file containing the literal string "NA" for missing values. Since the exported file was read by another program outside R, which simply expected strings, the problem was noticed too late, after the values had already been imported into other systems...
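What would have avoided it, sketched here with the toy data from above: state the missing-value representation explicitly instead of relying on the default.
# write missing values as empty strings, not the literal "NA"
utils::write.csv(
  x = df_1,
  file = out_file,
  na = "",
  row.names = FALSE
)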
It seems like a simple mistake, but it triggered big problems... There are plenty of such examples with significant consequences, e.g. the UK air traffic control meltdown from August 2023.
This is not to blame the authors who wrote those functions; I'm sure they did their best, and most of the time I benefit from their work. It's just a reminder to myself that I have to learn to use these tools carefully, and not rely on naive expectations...
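One habit that helps: print a function's full signature, defaults included, before trusting it. For example (output abbreviated; exact defaults may vary across R versions):
args(utils::read.csv)
# function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
#     fill = TRUE, comment.char = "", ...)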
Make a promise. Show up. Do the work. Repeat.