Here are a few things I realized over the last few days that I did not know, and which cost a few colleagues and me too much time. Maybe they can help you too.

R: vectorized grepl

I use the R data.table package a lot. A colleague asked me how to get only the rows for which the string in one column contains the string from another column. My answer was: just use the "%like%" operator. What I did not know was that this operator uses grepl, which works only with patterns of length 1:

dat <- data.table::data.table(
  word1 = c("abc", "efg", "xyz"),
  word2 = c("a", "pqr", "xyz")
  )

`%like%` <- data.table::`%like%`
dat[word1 %like% word2]

# Warning message:
#  In grepl(pattern, vector, ignore.case = ignore.case, fixed = fixed) :
#  argument 'pattern' has length > 1 and only the first element will be used

This is a bit surprising to me. After all, we are used to many R functions working nicely with vectors (length > 1) as arguments. One can of course write some not so nice lapply code:

dat[, row_nr := .I]
dat[, is_like := lapply(X = word1, FUN = grepl, pattern = word2), by = row_nr]
dat[is_like == TRUE]

or one can use the vectorized function from the stringr package:

dat[stringr::str_detect(string = word1, pattern = word2)]
#    word1 word2
# 1:   abc     a
# 2:   xyz   xyz

While I usually pay attention when adding package dependencies, this is a case where I opt for stringr instead of lapply.
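
If adding a dependency is not an option, a pairwise match can also be sketched in base R with mapply, which calls grepl once per row with the pattern and the string from that row. This is only a minimal sketch on plain vectors; inside a data.table the same logical vector should be usable in `i`. Note that the patterns are still treated as regular expressions here.

```r
# Base-R sketch: apply grepl() pairwise, so that each pattern is matched
# only against the string in the same position.
word1 <- c("abc", "efg", "xyz")
word2 <- c("a", "pqr", "xyz")

is_like <- mapply(
  FUN = grepl,
  pattern = word2,
  x = word1,
  USE.NAMES = FALSE  # return a plain, unnamed logical vector
)

# Keep only the rows where word1 contains word2.
result <- data.frame(word1, word2)[is_like, ]
print(result)
```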

 

R: misspelled function arguments in ellipsis

The ellipsis (...) in R can be very useful. However, spelling mistakes in argument names might go undetected when using it.

The ellipsis package has a few handy functions to solve some of these problems.

print_labels <- function(...)
{
  ll <- list(...)
  message("labels: ", ll$labels)
}

do_something <- function()
{
  print_labels(lables = "yellow")
}

do_something()
# labels: 

Have you noticed the misspelled "lables"? This can be turned into an error:

print_labels <- function(..., labels)
{
  ellipsis::check_dots_unnamed()
  ll <- list(...)
  ll$labels <- labels
  message("labels: ", ll$labels)
}

do_something <- function()
{
  print_labels(lables = "yellow")
}

do_something()
# Error: 1 components of `...` had unexpected names.
#
# We detected these problematic arguments:
#   * `lables`
#
# Did you misspecify an argument?
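
For illustration, the same idea can be sketched without any package. The helper `stop_if_dots_named` below is hypothetical (it is not part of ellipsis or any other package) and only checks whether anything passed through `...` carries a name; it is a rough stand-in under that assumption, not a replacement for the package function.

```r
# Hypothetical helper: stop if `...` contains named arguments, which
# usually indicates a misspelled argument name.
stop_if_dots_named <- function(...) {
  dot_names <- names(list(...))
  named <- dot_names[nzchar(dot_names)]
  if (length(named) > 0) {
    stop(
      "unexpected named arguments in `...`: ",
      paste(named, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

print_labels <- function(..., labels) {
  stop_if_dots_named(...)
  message("labels: ", labels)
}

# Correct spelling: the message is printed.
print_labels(labels = "yellow")

# Misspelled argument: the error is caught here only to show its message.
result <- tryCatch(
  print_labels(lables = "yellow"),
  error = function(e) conditionMessage(e)
)
print(result)
```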

 

R/RStudio on Linux: initialization error 

If your RStudio session refuses to open with an "initialization error" and you recently modified your .bashrc, move the .bashrc out of the way (for example by renaming it) and see if the RStudio session can be started. (It cost me a few hours of trial and error to realize this: while I could start a new bash shell without a problem, the RStudio session could not be initialized.)

 

R packages/github

If you want to browse public R packages on GitHub with a VSCode look, you might want to try github1s. All you need to do is replace "github.com" in the URL with "github1s.com" (that is, the digit 1 followed by the letter s). Example URL for rlang: https://github1s.com/r-lib/rlang.

 

PostgreSQL: a NULL is never equal to a NULL, so watch out when joining on columns which can be NULL

In PostgreSQL, NULL indicates a missing value (as NA does in R), and a NULL is never equal to a NULL: the comparison NULL = NULL evaluates to NULL rather than true, so a join condition based on it does not match. As a consequence, when joining on columns which can be NULL, you have to additionally include the "both are NULL" condition if you want to be sure that you get all the rows.

Here is an example (you can run it online).

drop table if exists todos;
create table todos (
  id bigint generated always as identity,
  task varchar
  );
insert into todos(task) values ('wake up');
insert into todos(task) values (null);
insert into todos(task) values ('look at the sky');
select * from todos;
                    
drop table if exists dones;
create table dones (task varchar);
insert into dones(task) values ('wake up');
insert into dones(task) values(null);
                    
select dones.task
  from dones 
  inner join todos
  on dones.task = todos.task;
                    
select dones.task
  from dones 
  inner join todos
  on (
      dones.task = todos.task or
      (dones.task is null and todos.task is null)
      );

If you are wondering why I used identity instead of serial, see here. And here is a comparison of NULL in C/C++, Java and PostgreSQL. Spoiler: you can directly compare NULLs in C/C++ and Java, so it is worth being aware of the differences.
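
Since R was mentioned as behaving similarly, here is a small sketch of the same three-valued logic in R, with NA playing the role of NULL. It compares every pair of values from the two task lists by hand (this is an assumption made for illustration; it is not how a database performs a join) and counts the matches for both join conditions.

```r
# Same data as in the SQL example, with NA standing in for NULL.
dones <- c("wake up", NA)
todos <- c("wake up", NA, "look at the sky")

# All combinations of the two columns, like the candidate rows of a join.
pairs <- expand.grid(dones = dones, todos = todos, stringsAsFactors = FALSE)

# Plain equality: NA whenever at least one side is missing.
plain_match <- pairs$dones == pairs$todos

# Equality plus the explicit "both are missing" condition.
null_safe_match <- (!is.na(plain_match) & plain_match) |
  (is.na(pairs$dones) & is.na(pairs$todos))

n_plain <- sum(plain_match, na.rm = TRUE)
n_null_safe <- sum(null_safe_match)
print(c(plain = n_plain, null_safe = n_null_safe))
```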

 

git stash apply

I knew about git stashes, but I only ever used them via 'git stash pop'. Until recently I wasn't aware of 'git stash apply', which applies the stashed changes but keeps the entry in the stash list. In a simplified view:
 

# git stash pop = git stash apply + git stash drop

Here is a small list of git commands related to stashes:

# stash with message ('git stash save' is deprecated in favor of 'git stash push')
git stash push -m "use a more meaningful message instead"
# stash untracked files as well
git stash push -u
# view the list of stashes
git stash list
# git stash pop on a new branch
git stash branch <name> stash@{1}
# clear all stashes - use with care
git stash clear

For more explanations, see for example here.

 

Make a promise. Show up. Do the work. Repeat.