Yet another story from wandering through the R land... A few days ago I had to do with an R script, in which data was loaded from a database for a given date, nothing wild, something like:

glue::glue(
  "select * 
    from the_table
    where date_column = {the_date}"
)

My job was to modify the script to run for a list of dates. In theory, an easy job. In practice: it was a Hadoop db, accessed via RJDBC, which happened to be quite slow. It turned out it was faster to load all the data in R and do the filtering for dates on the R side (rather than in the SQL query):

df %>% 
  dplyr::filter(date_column %in% the_dates)

The result came to me as a suprize: there were no rows left. (I skip the boring details: it was a long script, which loaded all kinds of data, and the problem appeared in the last step, meaning about one hour of running each time...)

A bit of digging revealed several issues. On the one side: all the columns retrieved from the database were of type character. It turns out this is "a known issue". You might think that's not a big deal. But this is what happens:

df <- tibble::tribble(
  ~x,
  "2022-01-02",
  "2022-01-03",
  "2022-01-04"
)

valid_date <- c(
  "2022-01-02",
  "2022-01-03"
) |>
  as.Date()

#~~~~~~~~~~~~~~~~~~~~~~~~~~~
# is character %in% date? #####
# see also 
# help(`%in%`)
df |>
  dplyr::filter(x %in% valid_date)
# A tibble: 0 × 1
# … with 1 variable: x <chr>
# ℹ Use `colnames()` to see all variable names

When comparing character and dates using the the "%in%" operator, no rows are left. Funny enough, the equality operator works as expected, I guess there both left and right sides are coerced to strings:

# is character equal to a date? #####
df |>
  dplyr::filter(x == valid_date[1])
# A tibble: 1 × 1
# x         
# <chr>     
# 1 2022-01-02

The solution was to convert my dates into character:

df |>
  dplyr::filter(x %in% as.character(valid_date))
# A tibble: 2 × 1
# x         
# <chr>     
#  1 2022-01-02
#  2 2022-01-03

As usual in R (or in general in programming), things mostly go well, until they don't. Or in the words of a former colleague, who was talking about a suite of Jenkins tests: everything is so long green until it turns red....

"https://xkcd.com/1537/"

xkcd cartoon about programming data types

Make a promise. Show up. Do the work. Repeat.