Today I stumbled upon some older code where data.tables were merged using the form X[Y]:
dt1 <- data.table::data.table( id = letters[1:3], x = 1:3 ) dt1 # id x # 1: a 1 # 2: b 2 # 3: c 3 dt2 <- data.table::data.table( id = letters[1:2], y = 10:11, z = 100:101 ) dt2 # id y z # 1: a 10 100 # 2: b 11 101 data.table::setkey(dt1, id) data.table::setkey(dt2, id) dt1[dt2] # id x y z # 1: a 1 10 100 # 2: b 2 11 101 dt2[dt1] # id y z x # 1: a 10 100 1 # 2: b 11 101 2 # 3: c NA NA 3
I find it difficult to remember what kind of merge this is (inner or outer) and which is the left/right table, so I have to look it up every time. The way I understand the documentation, data.table FAQ on the difference between X[Y] and merge(X, Y)?, the form X[Y] is useful if you only do the merge because you are interested in some values calculating from columns from the second table, for example:
dt1[dt2, w := x + z] # id x w # 1: a 1 101 # 2: b 2 103 # 3: c 3 NA
If you would use the merge function, you would need additional operations in order to keep only the columns you are interested in and you would waste some memory:
dt1 <- data.table::data.table( id = letters[1:3], x = 1:3 ) dt1 # id x # 1: a 1 # 2: b 2 # 3: c 3 dt2 <- data.table::data.table( id = letters[1:2], y = 10:11, z = 100:101 ) dt2 # id y z # 1: a 10 100 # 2: b 11 101 dt1 <- merge( x = dt1, y = dt2, by = "id", all.x = TRUE ) dt1 # id x y z # 1: a 1 10 100 # 2: b 2 11 101 # 3: c 3 NA NA dt1[, w := x + z] dt1[, y := NULL]
If you anyway need all the columns from both tables, I would stick to the merge function, which I find more readable.
One case which requires the X[Y] is for rolling joins.
Here is a stackoverflow discussion about different types of data.table joins.
Now I'm sure I belong to the 80% people whose data.table understanding is above average....
Make a promise. Show up. Do the work. Repeat.