Coming back to work after holidays, the first line of code that I get to see makes me wonder if I forgot all the R code I know...
    df[] <- Map(function(x, y) get(y)(x), df, paste0("as.", names(col_types)))
What does this mean? What does the "[" operator do there? And what in the world do the many brackets here "get(y)(x)" do?
Here are the lessons I learned while trying to understand that code. The author's intention was to create a data.frame with no rows, but with columns of given names and given data types. His search on Stack Overflow resulted in this (simplified version):
    col_types <- list(
      Date = "a_date",
      character = "a_char",
      POSIXct = "a_posixct",
      numeric = "a_numeric"
    )

    df <- data.frame(
      a_date = logical(0),
      a_char = logical(0),
      a_posixct = logical(0),
      a_numeric = logical(0)
    )

    # https://stackoverflow.com/questions/57775371/how-to-convert-data-type-according-to-a-given-list
    tmp <- Map(
      f = function(x, y) get(y)(x),
      x = df,
      y = paste0("as.", names(col_types))
    )
    str(tmp)
    # List of 4
    #  $ a_date   : 'Date' num(0)
    #  $ a_char   : chr(0)
    #  $ a_posixct: 'POSIXct' num(0)
    #  $ a_numeric: num(0)

    df[] <- tmp
    str(df)
    # 'data.frame': 0 obs. of 4 variables:
    #  $ a_date   : 'Date' num(0)
    #  $ a_char   : chr
    #  $ a_posixct: 'POSIXct' num(0)
    #  $ a_numeric: num
Shame on me, it took me a few hours to simplify the code and understand what it actually does (the output was obvious, but it was unclear to me how the code works).
Here are my poor explanations:
    get("as.Date")
    # this returns the function
    # function (x, ...)
    # UseMethod("as.Date")
    # <bytecode: 0x56023802d8d0>
    # <environment: namespace:base>

    x <- list(a = 1, b = 2)
    df <- data.frame(a = "1", b = "2")

    # here the interesting part for me
    df[] <- x
    str(df)
    # 'data.frame': 1 obs. of 2 variables:
    #  $ a: num 1
    #  $ b: num 2
What is this line "df[] <- x" actually doing? A little bit of reading and playing around gave the answer:
    # see help for the "[" operator
    ?`[`
    # "[" is the extraction or the replacement operator
    # the default usage is:
    # df[rows, columns]
    # if rows and columns are not given, the whole df is extracted
    # or replaced with the given value

    df <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
    df[] <- list(1, 3)
    str(df)
    # 'data.frame': 1 obs. of 2 variables:
    #  $ a: num 1
    #  $ b: num 3

    df <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
    # note: if the value has only one element, and the df has more columns,
    # all columns are given that value
    df[] <- list(1)
    str(df)
    # 'data.frame': 1 obs. of 2 variables:
    #  $ a: num 1
    #  $ b: num 1

    df <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
    # note: you should not use a value with more elements than the number of columns
    df[] <- list(1, 2, 3)
    # Warning message:
    # In `[<-.data.frame`(`*tmp*`, , value = list(1, 2, 3)) :
    #   provided 3 variables to replace 2 variables
    df
    #   a b
    # 1 1 2
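Putting the pieces together, the original one-liner can be read as three separate steps. The following is only a minimal sketch with made-up data (the two-column `df` and the `converters` vector are my own illustration, not the colleague's code), and `match.fun()` is shown just as a base R alternative to `get()`:

```r
# Step 1: get() looks a function up by its name (a string),
# so get(y)(x) means "find the function called y, then call it on x".
convert <- get("as.numeric")
is.function(convert)          # TRUE
get("as.numeric")("3")        # 3

# match.fun() does a similar lookup, but only ever returns functions.
match.fun("as.character")(3)  # "3"

# Step 2: Map() walks over the columns and the converter names in parallel.
# The result is a plain list, not a data.frame.
df <- data.frame(a = "1", b = "2", stringsAsFactors = FALSE)
converters <- c("as.numeric", "as.character")  # illustrative names only
converted <- Map(function(x, y) get(y)(x), df, converters)
class(converted)              # "list"

# Step 3: df[] <- ... replaces the contents of all columns,
# but keeps df itself a data.frame.
df[] <- converted
class(df)                     # "data.frame"
```

Without the empty brackets, `df <- converted` would simply overwrite `df` with the list from step 2.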
Was it worth spending all the time on this? All the trouble above could have easily been avoided... To my knowledge, the standard way to create a data.frame in R with no rows and columns of given data types is this:
    df <- data.frame(
      a_date = as.Date(numeric(0), origin = "1970-01-01"),
      a_char = character(0),
      a_posixct = as.POSIXct(character(0)),
      a_numeric = numeric(0),
      stringsAsFactors = FALSE
    )
    str(df)
    # 'data.frame': 0 obs. of 4 variables:
    #  $ a_date   : 'Date' num(0)
    #  $ a_char   : chr
    #  $ a_posixct: 'POSIXct' num(0)
    #   - attr(*, "tzone")= chr ""
    #  $ a_numeric: num
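If the column names and types are only known at run time (which may be why a named list was used in the first place), the explicit style can still be kept. Here is one possible sketch. The names `empty_vectors` and `empty_df` are hypothetical, and, unlike `col_types` above, the specification here maps column name to type:

```r
# Hypothetical lookup table: one zero-length vector per supported type.
empty_vectors <- list(
  Date      = as.Date(numeric(0), origin = "1970-01-01"),
  character = character(0),
  POSIXct   = as.POSIXct(character(0)),
  numeric   = numeric(0)
)

# Hypothetical helper: `spec` is a named character vector,
# names are the column names, values are the type names.
empty_df <- function(spec) {
  unknown <- setdiff(spec, names(empty_vectors))
  if (length(unknown) > 0) {
    stop("unsupported type(s): ", paste(unknown, collapse = ", "))
  }
  # Pick one empty vector per requested column and name it after the column.
  columns <- empty_vectors[spec]
  names(columns) <- names(spec)
  as.data.frame(columns, stringsAsFactors = FALSE)
}

df <- empty_df(c(
  a_date = "Date",
  a_char = "character",
  a_posixct = "POSIXct",
  a_numeric = "numeric"
))
str(df)
```

The supported types are spelled out in one place, and an unknown type fails with a readable error instead of a failed `get()` lookup.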
https://stackoverflow.com/ might be useful in many cases. Copy/pasting code from there without understanding it and without adjusting it for your purposes definitely isn't.
In case you wonder, I kindly asked my colleague to explain what the code does. He told me what the output is, but he was not able to actually explain how it was done. This is not to make anybody feel bad. The person in question is actually a quick learner. This is just a hint of how not to do it and why.
I can see the apparent advantage of copying some code which looks like it solves your problem. Although the proposed solution might be the right one in a given context, this does not automatically mean that it is the right solution for you too. I tried to explain to my colleague that I needed a few hours to simplify and understand what the code does (and I have more than three years of experience in daily programming with R). If he's away, and that part of the code needs to be modified, other colleagues would most probably (I might be wrong, of course) also need significant time to understand and modify it. I'm pretty sure that the final price for this quick solution would not be as small as originally thought...
There are other people who have similar thoughts:
I admit, I was cheating. Now the final thoughts. There may be many reasons to use a ready-made solution from the internet. The one I hear most often: "but it works...". (In my bubble, code that works is not necessarily good code.) The second most frequent: "but it got through the review, so I assumed the code is ok". (The reviewer might have been under time pressure, and probably only the final result was checked, not every line of code. In my bubble, it is not the reviewer who is responsible for the quality of my code; I am. I might be lucky, and the reviewer might give hints which can improve the code considerably. But it's not the reviewer's job to ensure that I write the best code I can. That's my job.)
Whatever the reasons, stop.
Make a promise. Show up. Do the work. Repeat.