Today I stumbled upon some older code where data.tables were merged using the form X[Y]:

``````dt1 <- data.table::data.table(
id = letters[1:3],
x = 1:3
)
dt1
#    id x
# 1:  a 1
# 2:  b 2
# 3:  c 3

dt2 <- data.table::data.table(
id = letters[1:2],
y = 10:11,
z = 100:101
)
dt2
#    id  y   z
# 1:  a 10 100
# 2:  b 11 101

data.table::setkey(dt1, id)
data.table::setkey(dt2, id)
dt1[dt2]
#    id x  y   z
# 1:  a 1 10 100
# 2:  b 2 11 101

dt2[dt1]
#    id  y   z x
# 1:  a 10 100 1
# 2:  b 11 101 2
# 3:  c NA  NA 3``````

I find it difficult to remember what kind of merge this is (inner or outer) and which is the left/right table, so I have to look it up every time. The way I understand the documentation, data.table FAQ on the difference between X[Y] and merge(X, Y)?, the form X[Y] is useful if you only do the merge because you are interested in some values calculating from columns from the second table, for example:

``````dt1[dt2, w := x + z]
#    id x   w
# 1:  a 1 101
# 2:  b 2 103
# 3:  c 3  NA``````

If you would use the merge function, you would need additional operations in order to keep only the columns you are interested in and you would waste some memory:

``````dt1 <- data.table::data.table(
id = letters[1:3],
x = 1:3
)
dt1
#    id x
# 1:  a 1
# 2:  b 2
# 3:  c 3

dt2 <- data.table::data.table(
id = letters[1:2],
y = 10:11,
z = 100:101
)
dt2
#    id  y   z
# 1:  a 10 100
# 2:  b 11 101

dt1 <- merge(
x = dt1,
y = dt2,
by = "id",
all.x = TRUE
)
dt1
#    id x  y   z
# 1:  a 1 10 100
# 2:  b 2 11 101
# 3:  c 3 NA  NA

dt1[, w := x + z]
dt1[, y := NULL]``````

If you anyway need all the columns from both tables, I would stick to the merge function, which I find more readable.

One case which requires the X[Y] is for rolling joins.

Here is a stackoverflow discussion about different types of data.table joins.

Now I'm sure I belong to the 80% people whose data.table understanding is above average....

https://sketchplanations.com/the-lake-wobegon-effect

Make a promise. Show up. Do the work. Repeat.