Coming back to work after the holidays, the first line of code I got to see made me wonder whether I had forgotten all the R I know...
df[] <- Map(function(x, y) get(y)(x), df, paste0("as.", names(col_types)))
What does this mean? What does the "[" operator do there? And what in the world do the many brackets here "get(y)(x)" do?
Here are the lessons I learned while trying to understand that code. The author's intention was to create a data.frame with no rows, but with columns of given names and data types. His search on Stack Overflow resulted in this (simplified version):
col_types <- list(
Date = "a_date",
character = "a_char",
POSIXct = "a_posixct",
numeric = "a_numeric"
)
df <- data.frame(
a_date = logical(0),
a_char = logical(0),
a_posixct = logical(0),
a_numeric = logical(0)
)
#https://stackoverflow.com/questions/57775371/how-to-convert-data-type-according-to-a-given-list
tmp <- Map(
f = function(x, y) get(y)(x),
x = df,
y = paste0("as.", names(col_types))
)
str(tmp)
# List of 4
# $ a_date : 'Date' num(0)
# $ a_char : chr(0)
# $ a_posixct: 'POSIXct' num(0)
# $ a_numeric: num(0)
df[] <- tmp
str(df)
# 'data.frame': 0 obs. of 4 variables:
# $ a_date : 'Date' num(0)
# $ a_char : chr
# $ a_posixct: 'POSIXct' num(0)
# $ a_numeric: num
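Read step by step, the Map() call above amounts to a loop over the columns of df, with one conversion function looked up by name per column. The following is only a minimal sketch of that reading, not the code from the Stack Overflow answer; the names fun_names, tmp_loop and convert are made up for illustration.

```r
col_types <- list(
  Date = "a_date",
  character = "a_char",
  POSIXct = "a_posixct",
  numeric = "a_numeric"
)
df <- data.frame(
  a_date = logical(0),
  a_char = logical(0),
  a_posixct = logical(0),
  a_numeric = logical(0)
)

# names of the conversion functions: "as.Date", "as.character", ...
fun_names <- paste0("as.", names(col_types))

# Map() walks over the columns of df and over fun_names in parallel;
# written out as an explicit loop (tmp_loop is a hypothetical name):
tmp_loop <- vector("list", length(df))
names(tmp_loop) <- names(df)
for (i in seq_along(df)) {
  convert <- get(fun_names[i])       # get() returns the function with that name
  tmp_loop[[i]] <- convert(df[[i]])  # the second pair of brackets calls it
}
str(tmp_loop)
```

Note that the pairing is purely positional: the i-th column of df is converted with the i-th name of col_types, so both have to be listed in the same order.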
Shame on me, it took me a few hours to simplify the code and understand what it actually does (the output was obvious, but it was unclear to me how the code works).
Here are my poor explanations:
get("as.Date") # this returns the function
# function (x, ...)
# UseMethod("as.Date")
# <bytecode: 0x56023802d8d0>
# <environment: namespace:base>
x <- list(a = 1, b = 2)
df <- data.frame(a = "1", b = "2")
# here the interesting part for me
df[] <- x
str(df)
# 'data.frame': 1 obs. of 2 variables:
# $ a: num 1
# $ b: num 2
What is this line "df[] <- x" actually doing? A little bit of reading and playing around gave the answer:
# see help for the "[" operator
?`[`
# "[" is the extraction or the replacement operator
# the default usage is:
# df[rows, columns]
# if rows and columns are not given, all columns of df are extracted,
# or replaced with the given value (df itself stays a data.frame)
df <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
df[] <- list(1, 3)
str(df)
# 'data.frame': 1 obs. of 2 variables:
# $ a: num 1
# $ b: num 3
df <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
# note: if the value has only one element, and the df has more columns,
# all columns are given that value
df[] <- list(1)
str(df)
# 'data.frame': 1 obs. of 2 variables:
# $ a: num 1
# $ b: num 1
df <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
# note: you should not use a value with more elements than the number of columns
df[] <- list(1, 2, 3)
# Warning message:
# In `[<-.data.frame`(`*tmp*`, , value = list(1, 2, 3)) :
# provided 3 variables to replace 2 variables
df
# a b
# 1 1 2
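To see why the empty brackets matter, it may help to compare them with a plain assignment. A small sketch (df1 and df2 are names chosen only for this comparison):

```r
x <- list(a = 1, b = 2)

# with empty brackets: the columns are replaced, the data.frame stays
df1 <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
df1[] <- x
class(df1)
# [1] "data.frame"

# without brackets: the variable is simply rebound to the list
df2 <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
df2 <- x
class(df2)
# [1] "list"
```

So "df[] <- tmp" in the original code keeps df a data.frame and only swaps the contents of its columns.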
Was it worth spending all the time on this? All the trouble above could have easily been avoided... The standard way to create a data.frame in R with no rows and columns of given data types is this:
df <- data.frame(
a_date = as.Date(numeric(0), origin = "1970-01-01"),
a_char = character(0),
a_posixct = as.POSIXct(character(0)),
a_numeric = numeric(0),
stringsAsFactors = FALSE
)
str(df)
# 'data.frame': 0 obs. of 4 variables:
# $ a_date : 'Date' num(0)
# $ a_char : chr
# $ a_posixct: 'POSIXct' num(0)
# - attr(*, "tzone")= chr ""
# $ a_numeric: num
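If the column names and types really have to come from a specification, as in the original code, the same idea can still be written in a readable way. The following is only a sketch under assumptions: empty_df() and prototypes are hypothetical names, not part of base R, and the list covers just the four types used in this post.

```r
# zero-length prototype for each supported type (an assumed, incomplete set)
prototypes <- list(
  Date = as.Date(numeric(0), origin = "1970-01-01"),
  character = character(0),
  POSIXct = as.POSIXct(character(0)),
  numeric = numeric(0)
)

# empty_df() is a hypothetical helper: spec maps column names to type names
empty_df <- function(spec) {
  unknown <- setdiff(spec, names(prototypes))
  if (length(unknown) > 0) {
    stop("unsupported type(s): ", paste(unknown, collapse = ", "))
  }
  columns <- prototypes[spec]    # pick one prototype per column
  names(columns) <- names(spec)  # label them with the column names
  data.frame(columns, stringsAsFactors = FALSE)
}

df <- empty_df(c(
  a_date = "Date",
  a_char = "character",
  a_posixct = "POSIXct",
  a_numeric = "numeric"
))
str(df)
```

Here each column name sits right next to its type, so nothing depends on two lists being in the same order, and an unknown type fails with a clear message.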
"https://stackoverflow.com/" might be useful in many cases. Copy/pasting code from there without understanding it and without adjusting it for your purposes definitely isn't.
In case you wonder, I kindly asked my colleague to explain what the code does. He told me what the output is, but he was not able to actually explain how it was done.
This is not to make anybody feel bad. The person in question is actually a quick learner. This is just a hint of how not to do it and why. I can see the apparent advantage of copying some code which looks like it solves your problem. Although the proposed solution might be the right one in a given context, this does not automatically mean that this is the right solution for you too.
I tried to explain to my colleague that I needed a few hours to simplify and understand what the code does (and I have about 4 years of experience in daily programming with R). If he's away, and that part of the code needs to be modified, other colleagues would most probably (I might be wrong, of course) also need significant time to understand and modify it. I'm pretty sure that the final price for this quick solution would not be as small as originally thought...
There are other people who have similar thoughts:
- why you should not copy paste from stackoverflow
- things you don't learn when you copy and paste from stackoverflow
Unexpected, complicated code is not only a common source of bugs (see "https://sketchplanations.com/fixing-bugs"); it also makes later changes harder.
I recently had to deal with some code which seemed unnecessarily complicated to me. A colleague's reaction on seeing what I was doing: "I had a look at that part too, but it seemed so complicated to me that I preferred to just close the file..." It was meant to be ironic, but the lesson is obvious: even if the code works, i.e. has no obvious bugs, adding features could still be a nightmare, to the point that a programmer would rather rewrite the whole thing. I'm pretty sure that will make that piece of code more expensive than needed...
My advice at the end: search less, browse more.