One step to write good is to write code that is readable. "Readable" in the sense that the person reading your code (and don't forget, this person might be you in a few months) can quickly understand it, or at least see the picture, i.e. the little details might not be obvious.

What's "readable" is a matter of taste, but if in doubt, you can find recommendations from more experienced people, see for example The art of readable code.

Below I describe a few things I found useful in my coding, maybe you can pick something for yourself.

A while ago, a beginner in R asked me what's the point in writing functions in R At that time I was trying to convince him not to write hundreds of lines of code in an R script, but to extract them into functions, and call the functions instead. I admit I did not have a good answer, apart from: believe me, it's better this way. This is because functions seemed so natural to me, that I never questioned them.

I believe I have a better answer now. Science tells us that we can only keep in our short term memory about 4 chunks  (isn't that nice when I can blame science for my shortcomings?).

As examples I use pieces of code from testthat, a package that I find very useful and use extensively.

There are many ways to skin a cat, so let's see some of them. Here a few functions, let's concentrate on show_status first.

colourise <- function(text, as = c("success", "skip", "warning", "failure", "error")) {
  crayon::style(text, style(as))
}

style <- function(type = c("success", "skip", "warning", "failure", "error")) {
  type <- match.arg(type)

  c(
    success = "green",
    skip = "blue",
    warning = "magenta",
    failure = "orange",
    error = "orange"
  )[[type]]
}

show_status <-function(data, complete = TRUE, pad = FALSE, should_update = FALSE) {
  if (complete) {
    if (data$n_fail > 0) {
      status <- crayon::red(cli::symbol$cross)
    } else {
      status <- crayon::green(cli::symbol$tick)
    }
  } else {
    if (!should_update) {
      return()
    }
    
    if (data$n_fail > 0) {
      status <- colourise(status, "failure")
    } else if (data$n_warn > 0) {
      status <- colourise(status, "warning")
    }
  }

  col_format <- function(n, type) {
    if (n == 0) {
      " "
    } else {
      n
    }
  }

  message <- paste0(
    status, " | ", sprintf("%3d", data$n_ok), " ",
    col_format(data$n_fail, "fail"), " ",
    col_format(data$n_warn, "warn"), " ",
    col_format(data$n_skip, "skip"), " | ",
    data$name
  )

  width <- 3

  if (pad) {
    message <- stringr::str_pad(message, width)
    message <- crayon::col_substr(message, 1, width)
  }

  if (!complete) {
    message <- stringr::str_pad(message, width)
    cat("\r", message)
  } else {
    cat("\r", message)
  }
}

A few things that a first-time reader of the function show_message  observes:

  • the function is long, you need to scroll, so you cannot immediately understand what it does
  • after some reading, one realizes the name of the function is somewhat misleading: it shows indeed the status message, but for this it first needs to build it and customize it (set colors), so it obviously does more than the name says
  • what about its arguments? Well, data is a list, with 4 named elements.

What happens if I call the function with an empty list?

show_status(data = list())
# Error in if (data$n_fail > 0) { : argument is of length zero

Well, this did not go well. One way to make this function more robust, is to make in the beginning that the input arguments have the expected format, for example:

validate_data <- function(data) {
  checkmate::assertList(
    x = data,
    len = 4,
    names = "named"
  )

  expected_names <- c("n_ok", "n_fail", "n_warn", "n_skip")
  checkmate::assertSubset(
    x = names(data),
    choices = expected_names,
    empty.ok = FALSE
  )
}

Now that we know more, let's call the function with these arguments:

data <- list(n_fail = 0, n_warn = 3, n_skip = 1, n_ok = 10)
complete <- TRUE
pad <- FALSE
should_update <- FALSE

show_status(data, complete = complete, pad = pad, should_update = should_update)

# ✓ |  10   3 1 | 

This works now, we get our inputs printed.

Can we improve more the show_status function? Well, we noticed in the beginning that it does more that it says, let's first just extract the code which actually creates the status in a separate function:

build_status <- function(data, complete) {
  if (complete) {
    if (data$n_fail > 0) {
      status <- crayon::red(cli::symbol$cross)
    } else {
      status <- crayon::green(cli::symbol$tick)
    }
  } else {
    if (!should_update) {
      return()
    }

    if (data$n_fail > 0) {
      status <- colourise(status, "failure")
    } else if (data$n_warn > 0) {
      status <- colourise(status, "warning")
    }
  }

  return(status)
}

This is kind of short, but is it easy to read? I usually try to avoid long if/else statements, and to return early from the function, if possible:

build_status <- function(data, complete) {
  if (complete) {
    if (data$n_fail > 0) {
      status <- crayon::red(cli::symbol$cross)
      return(status)
    }
    status <- crayon::green(cli::symbol$tick)
    return(status)
  }
  if (!should_update) {
    return(invisible(NULL))
  }

  if (data$n_fail > 0) {
    status <- colourise(status, "failure")
  } else if (data$n_warn > 0) {
    status <- colourise(status, "warning")
  }

  return(status)
}

While this second version is not shorter, I find it easier to reason about. Let's see what we now have:

show_status <-function(data, complete = TRUE, pad = FALSE, should_update = FALSE) {
  validate_data(data)
  status <- build_status(data = data, complete = complete)

  col_format <- function(n, type) {
    if (n == 0) {
      " "
    } else {
      n
    }
  }

  message <- paste0(
    status, " | ", sprintf("%3d", data$n_ok), " ",
    col_format(data$n_fail, "fail"), " ",
    col_format(data$n_warn, "warn"), " ",
    col_format(data$n_skip, "skip"), " | ",
    data$name
  )

  width <- 3

  if (pad) {
    message <- stringr::str_pad(message, width)
    message <- crayon::col_substr(message, 1, width)
  }

  if (!complete) {
    message <- stringr::str_pad(message, width)
    cat("\r", message)
  } else {
    cat("\r", message)
  }
}

Our function is now a bit safer (since we validate one of the inputs) and a bit more readable, since we extracting the part where we build the status in a separate function. Can we do more? I find functions in function difficult to read, I would move the col_format function outside. If you pay attention, you'll notice that the cat statements are repeated. If you tell yourself: I would not repeat statement, believe me: it happens very easily when you make modifications to your code.

show_status <-function(data, complete = TRUE, pad = FALSE, should_update = FALSE) {
  validate_data(data)
  status <- build_status(data = data, complete = complete)

  message <- paste0(
    status, " | ", sprintf("%3d", data$n_ok), " ",
    col_format(data$n_fail, "fail"), " ",
    col_format(data$n_warn, "warn"), " ",
    col_format(data$n_skip, "skip"), " | ",
    data$name
  )

  width <- 3

  if (pad) {
    message <- stringr::str_pad(message, width)
    message <- crayon::col_substr(message, 1, width)
  }

  if (!complete) {
    message <- stringr::str_pad(message, width)
  }

  cat("\r", message)
}

Anything else we can improve?

  • The boolean arguments could be named to make their purpose more visible. The general advice is to add prefixes like is, has, can, should to make booleans more clear. For example, pad could become should_pad.
  • The variable width: in what units? It's usually advisable to add the units in the name of the variable.

What about the other functions?

style <- function(type = c("success", "skip", "warning", "failure", "error")) {
  type <- match.arg(type)
  
  message("type: ", type)
  
  out <- c(
    success = "green",
    skip = "blue",
    warning = "magenta",
    failure = "orange",
    error = "orange"
  )[[type]]
  
  return(out)
}

The default values of the argument type is a perfectly valid way to indicate to the user what the expected values are:

style(type = "dummy")
#Error in match.arg(type) : 
#  'arg' should be one of “success”, “skip”, “warning”, “failure”, “error” 

style(type = c("success", "skip"))
#  Error in match.arg(type) : 'arg' must be of length 1 

However, the devil is as usual in the detail. What do you think will happen if type is NULL?

style(type = NULL)
# type: success
# [1] "green"

Was this what you expected? If you read the help for match.arg, you'll see that in case of NULL, the first choice is returned. This might be want you want, but I prefer to make things more obvious. Here is the modified version of the function:

style <- function(type = c("success", "skip", "warning", "failure", "error")) {
  checkmate::assertCharacter(x = type, len = 1)
  checkmate::assertSubset(
    x = type,
    choices = c("success", "skip", "warning", "failure", "error"),
    empty.ok = FALSE
  )
  
  ll <- list(
    success = "green",
    skip = "blue",
    warning = "magenta",
    failure = "orange",
    error = "orange"
  )
 
  return(ll[type])
}

Now, if you try to call it with a NULL argument:

style(type = NULL)
#Error in style(type = NULL) : 
#  Assertion on 'type' failed: Must be of type 'character', not 'NULL'. 

Also, the function name could be improved. Style is can interpreted as a verb or a noun. A more suitable name would for example be get_text_color.

This was all for today. The code can be found here. Hope you found this useful.

Happy coding and stay healthy

 

Make a promise. Show up. Do the work. Repeat.