Yet another story from wandering through the R land... A few days ago I had to do with an R script, in which data was loaded from a database for a given date, nothing wild, something like:
glue::glue(
"select *
from the_table
where date_column = {the_date}"
)
My job was to modify the script to run for a list of dates. In theory, an easy job. In practice: it was a Hadoop db, accessed via RJDBC, which happened to be quite slow. It turned out it was faster to load all the data in R and do the filtering for dates on the R side (rather than in the SQL query):
df %>%
dplyr::filter(date_column %in% the_dates)
The result came to me as a suprize: there were no rows left. (I skip the boring details: it was a long script, which loaded all kinds of data, and the problem appeared in the last step, meaning about one hour of running each time...)
A bit of digging revealed several issues. On the one side: all the columns retrieved from the database were of type character. It turns out this is "a known issue". You might think that's not a big deal. But this is what happens:
df <- tibble::tribble(
~x,
"2022-01-02",
"2022-01-03",
"2022-01-04"
)
valid_date <- c(
"2022-01-02",
"2022-01-03"
) |>
as.Date()
#~~~~~~~~~~~~~~~~~~~~~~~~~~~
# is character %in% date? #####
# see also
# help(`%in%`)
df |>
dplyr::filter(x %in% valid_date)
# A tibble: 0 × 1
# … with 1 variable: x <chr>
# ℹ Use `colnames()` to see all variable names
When comparing character and dates using the the "%in%" operator, no rows are left. Funny enough, the equality operator works as expected, I guess there both left and right sides are coerced to strings:
# is character equal to a date? #####
df |>
dplyr::filter(x == valid_date[1])
# A tibble: 1 × 1
# x
# <chr>
# 1 2022-01-02
The solution was to convert my dates into character:
df |>
dplyr::filter(x %in% as.character(valid_date))
# A tibble: 2 × 1
# x
# <chr>
# 1 2022-01-02
# 2 2022-01-03
As usual in R (or in general in programming), things mostly go well, until they don't. Or in the words of a former colleague, who was talking about a suite of Jenkins tests: everything is so long green until it turns red....
Make a promise. Show up. Do the work. Repeat.