Some time ago I needed to see how fast the data from some newly created tables from a database are loaded. 

I chose the microbenchmark package for this task. As I had a hard time finding how to use the list argument, here is an example for the future me.

https://xkcd.com/627/

xkcd tech support cheat sheet

The sources that I consulted when building the example:

https://stackoverflow.com/questions/32950881/how-to-use-list-argument-in-microbenchmark

https://github.com/joshuaulrich/microbenchmark/issues/4

The key is the usage of the bquote function:

db_table <- paste0("table_", 1:3)

job_list <- purrr::map(
  .x = db_table,
  .f = function(x) {
    # note the usage of the bquote function: the term wrapped in .() is evaluated
    # in the specified where environment (by default: parent.frame())
    bquote({
      sql <- paste("select * from", .(x))
    })
    
    sleep_time_seconds <- switch(
      x,
      "table_1" = sample(x = 2, size = 1),
      "table_2" = sample(x = 4, size = 1),
      "table_3" = sample(x = 6, size = 1),
    )
    
    Sys.sleep(sleep_time_seconds)
  }
)

names(job_list) <- db_table

obj <- microbenchmark::microbenchmark(
  list = job_list,
  times = 3,
  unit = "seconds"
)

An example of tabulating the results and highlighting in red the rows over a given threshold:

.runtime_table <- function(obj, maximum_allowed_run_seconds) {
  class(obj) <- "data.frame"
  
  `%>%` <- magrittr::`%>%`
  
  tmp <- obj %>%
    dplyr::mutate(
      time_seconds = time,
      time_minutes = time_seconds/60
    ) %>%
    dplyr::group_by(expr) %>%
    dplyr::summarise(
      average_time_seconds = round(mean(time_seconds), 2),
      average_time_minutes = round(mean(time_minutes), 2)
    ) %>%
    dplyr::rename(table = expr)
  
  out <- tmp %>%
    dplyr::arrange(-average_time_minutes) %>%
    gt::gt(.) %>%
    gtExtras::gt_theme_538() %>%
    gt::tab_options(
      data = .,
      column_labels.background.color = "#FCF3CF",
      column_labels.font.weight = "bold",
      column_labels.border.lr.style = "solid",
      column_labels.vlines.style = "solid"
    ) %>%
    gt::opt_row_striping(.) %>%
    gt::tab_style(
      data = .,
      style = gt::cell_fill(color = "red"),
      locations = gt::cells_body(
        columns = tidyselect::all_of(
          c("average_time_minutes", "table", "average_time_seconds")
        ),
        rows = average_time_seconds > maximum_allowed_run_seconds
      )
    ) 
    
  return(out)
}

.runtime_table(obj = obj, maximum_allowed_run_seconds = 10)

The resulting table:

microbenchmark tabulated results

 

Make a promise. Show up. Do the work. Repeat.