Yet another story about the price of ignorance/lack of knowledge...

The story goes like this: a few days ago, a colleague asked me whether a function I had recommended to him for dispatching an R script might not work with parallel code (i.e. code using "foreach" statements).

My first reaction: I doubt it (wait, ignorance is bliss... not really). I mean, we have been using that function for ages, and I'm sure we have scripts running "foreach" code, so I'm pretty confident that we would have noticed something like this...

You see it coming, don't you? The more I felt the need to explain, the more insecure I became... Ok, I say, let me quickly test it; it must be a piece of cake to prove that it works, right?

So I type the following lines and save them in a file called "script.R":

# script.R
cl <- parallel::makeCluster(2, type = "SOCK")
on.exit(expr = parallel::stopCluster(cl), add = TRUE, after = FALSE)

doParallel::registerDoParallel(cl)

`%dopar%` <- foreach::`%dopar%`
out <- foreach::foreach(i = 1:3) %dopar% {exp(i)}
print(str(out))

First test:

Rscript script.R

List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1

Ok, no surprise here. A quick glance at the original code and I realize that we actually source the script. No big deal, let's see:

Rscript -e 'source("script.R")'
Error in summary.connection(connection) : invalid connection
Calls: source ... sendData.SOCKnode -> serialize -> summary -> summary.connection
Execution halted

Wait, what? Why? And how many?

As it happens, it was late in the afternoon, my colleague needed a solution and I was out of ideas. I promised to think about it and get back to him the next day. Which I did.

Want to know what my "solution" was? I'm ashamed to reveal it... Something like:

script <- readLines("script.R")
eval(parse(text = script))

Somewhere on Stack Overflow I found a sentence like "if eval parse is the answer, then you are asking the wrong question" (unfortunately I can't find the link anymore). This should have been a warning to me... So there I was in the morning, making the change, releasing the package with the "fix", and telling my colleague that this should work now.

Do you think it worked? Well, it did. It still does. But I still needed to understand the problem in the first place. Searching the internet did not turn up any solution; it looked like I was the only one in the world noticing the "problem". This should also have been a warning...

The internet search showed me another way of stopping the cluster: doParallel::stopImplicitCluster(). So I modified the file script.R to look like this:

#script.R
cl <- parallel::makeCluster(2, type = "SOCK")
on.exit(expr = doParallel::stopImplicitCluster(), add = TRUE, after = FALSE)

doParallel::registerDoParallel(cl)

`%dopar%` <- foreach::`%dopar%`
out <- foreach::foreach(i = 1:3) %dopar% {exp(i)}
print(str(out))

Let's see:

Rscript -e 'source("script.R")'
List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1

Now it works. I'm lost. What on earth...? I would normally debug with debugonce or something similar, but let's first try the brute-force method of debugging via printing. It looks like it has something to do with the function called in on.exit. Let's see if this is true. script.R now looks like this:

# script.R
in_the_end <- function()
{
  message("in the end")
  doParallel::stopImplicitCluster()
}
cl <- parallel::makeCluster(2, type = "SOCK")
on.exit(expr = in_the_end(), add = TRUE, after = FALSE)

doParallel::registerDoParallel(cl)

`%dopar%` <- foreach::`%dopar%`
out <- foreach::foreach(i = 1:3) %dopar% {exp(i)}
print(str(out))

When sourcing:

Rscript -e 'source("script.R")'
in the end
List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1

Ok, so "in_the_end" is called, and it still works. What if I don't source the script?

Rscript script.R
List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1

Oh, wait again. Why is "on.exit" not called? Then it hit me: there is no function to exit from, so on.exit never fires. (Does this mean that my clusters stay alive?)

When I source the script, well, "source" is a function, so "on.exit" is called. Now I'm getting somewhere. Let's make sure there is a function in both cases by modifying script.R to look like this:

# script.R
in_the_end <- function()
{
  message("in the end")
  doParallel::stopImplicitCluster()
}

run_me <- function()
{
  cl <- parallel::makeCluster(2, type = "SOCK")
  on.exit(expr = in_the_end(), add = TRUE, after = FALSE)

  doParallel::registerDoParallel(cl)

  `%dopar%` <- foreach::`%dopar%`
  out <- foreach::foreach(i = 1:3) %dopar% {exp(i)}
  print(str(out))
}

run_me()

When the script is sourced:

Rscript -e 'source("script.R")'
List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1
in the end

Aha, "on.exit" is called, as expected.

When the script is executed directly:

Rscript script.R
List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1
in the end
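The behaviour relied on here can be checked in isolation. The following is a minimal sketch, not part of the original script: run_job, note and events are hypothetical names, and no cluster is involved. It only illustrates that an expression registered with on.exit inside a function runs when that function exits, whether it returns normally or stops with an error.

```r
# Hypothetical helpers for this illustration only.
events <- character()
note <- function(x) events <<- c(events, x)

run_job <- function(fail = FALSE) {
  note("start")
  # Registered now, executed when run_job() exits (normally or via an error).
  on.exit(note("cleanup"), add = TRUE)
  if (fail) stop("something went wrong")
  note("done")
  invisible(NULL)
}

run_job()                                # normal exit
try(run_job(fail = TRUE), silent = TRUE) # exit through an error

print(events)
```

The first call records "start", "done", "cleanup"; the failing call records "start" and "cleanup" but never reaches "done", which is the reason for putting the cluster shutdown into on.exit in the first place.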

Now both runs produce the same output. Let's go back to the original function for stopping the cluster (parallel::stopCluster instead of doParallel::stopImplicitCluster):

# script.R
in_the_end <- function(cl)
{
  message("in the end")
  parallel::stopCluster(cl)
}

run_me <- function()
{
  cl <- parallel::makeCluster(2, type = "SOCK")
  on.exit(expr = in_the_end(cl), add = TRUE, after = FALSE)

  doParallel::registerDoParallel(cl)

  `%dopar%` <- foreach::`%dopar%`
  out <- foreach::foreach(i = 1:3) %dopar% {exp(i)}
  print(str(out))
}

run_me()

Same steps as before:

Rscript script.R
List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1
in the end

This works, as before. And the surprise:

Rscript -e 'source("script.R")'
List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1
in the end

Oh, so now it also works when the script is sourced. Kind of obvious by now...

The original code would also have worked if there had been no call to on.exit:

# script.R
cl <- parallel::makeCluster(2, type = "SOCK")

doParallel::registerDoParallel(cl)

`%dopar%` <- foreach::`%dopar%`
out <- foreach::foreach(i = 1:3) %dopar% {exp(i)}
print(str(out))

parallel::stopCluster(cl)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rscript -e 'source("script.R")'
List of 3
 $ : num 2.72
 $ : num 7.39
 $ : num 20.1

Have you paid attention? Remember this? script.R looks like this:

# script.R
in_the_end <- function(cl)
{
  message("in the end")
  parallel::stopCluster(cl)
  message("in the end: done")
}

cl <- parallel::makeCluster(2, type = "SOCK")
on.exit(expr = in_the_end(cl), add = TRUE, after = FALSE)

doParallel::registerDoParallel(cl)

`%dopar%` <- foreach::`%dopar%`
out <- foreach::foreach(i = 1:3) %dopar% {exp(i)}
print(str(out))

And the output when sourced looks like this:

Rscript -e 'source("script.R")'
in the end
Error in summary.connection(connection) : invalid connection
Calls: source ... sendData.SOCKnode -> serialize -> summary -> summary.connection
Execution halted

See how the error message appears to come from the function called in on.exit? At least it looks like it... That's of course a lie.

Here is the traceback:

rlang::trace_back()
     █
  1. ├─base::source("script.R")
  2. │ ├─base::withVisible(eval(ei, envir))
  3. │ └─base::eval(ei, envir)
  4. │   └─base::eval(ei, envir)
  5. ├─`%dopar%`(...) script.R:13:0
  6. │ └─e$fun(obj, substitute(ex), parent.frame(), e$data)
  7. │   └─parallel::clusterCall(...)
  8. │     └─parallel:::sendCall(cl[[i]], fun, list(...))
  9. │       └─base::source("script.R")
 10. │         ├─base::withVisible(eval(ei, envir))
 11. │         └─base::eval(ei, envir)
 12. │           └─base::eval(ei, envir)
 13. └─`%dopar%`(...) script.R:13:0
 14.   └─e$fun(obj, substitute(ex), parent.frame(), e$data)
 15.     └─parallel::clusterCall(...)

You might notice that by the time %dopar% is called, there seems to be no cluster anymore... Here are more printouts to show the problem:

# script.R
in_the_end <- function(cl)
{
  message("in the end")
  parallel::stopCluster(cl)
  message("in the end: done")
}

cl <- parallel::makeCluster(2, type = "SOCK")
on.exit(expr = in_the_end(cl), add = TRUE, after = FALSE)

doParallel::registerDoParallel(cl)

message("hey, cluster")

`%dopar%` <- foreach::`%dopar%`
out <- foreach::foreach(i = 1:3) %dopar% {exp(i)}
print(str(out))

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rscript -e 'source("script.R")'
in the end
in the end: done
hey, cluster
Error in summary.connection(connection) : invalid connection
Calls: source ... sendData.SOCKnode -> serialize -> summary -> summary.connection
Execution halted
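The error message itself is not specific to clusters. As a small sketch (using a plain text connection as a stand-in for the socket connection to a worker, which is an assumption made for illustration), asking for the summary of a connection that has already been closed should fail in the same way:

```r
# A text connection stands in for the worker socket (illustration only).
con <- textConnection("hello")
close(con)  # comparable to what stopCluster() does to the worker connections

# Any later use of the closed connection raises an error; capture its message.
msg <- tryCatch(
  {
    summary(con)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

This matches the printouts above: the cluster is stopped first, and only afterwards does foreach try to serialize data to the now closed connections.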

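The expression registered with on.exit is NOT run at the end of the script, as I naively expected. It runs as soon as source() has finished evaluating the on.exit line itself: as the traceback above shows, source() evaluates the script one expression at a time through eval(ei, envir), and on.exit fires when that eval call exits. So the cluster is already stopped by the time foreach wants to do something with it: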

# script.R
in_the_end <- function()
{
  message("in the end")
}

message("start")
on.exit(expr = in_the_end(), add = TRUE, after = FALSE)
message("done")

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rscript -e 'source("script.R")'
start
in the end
done

Compare this to the case where we have a function instead:

# script.R
in_the_end <- function()
{
  message("in the end")
}

run_me <- function()
{
  message("start")
  on.exit(expr = in_the_end(), add = TRUE, after = FALSE)
  message("done")
}

run_me()

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rscript -e 'source("script.R")'
start
done
in the end
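
To convince myself, here is a minimal sketch of the difference, with no cluster involved. It assumes that source() essentially parses the file and calls eval() on each expression separately, which is what the traceback suggests; run_script, note, events and script_lines are names made up for this illustration, not anything from the real code.

```r
# Hypothetical stand-in for the three-line script.R shown above.
script_lines <- c(
  'note("start")',
  'on.exit(note("in the end"), add = TRUE)',
  'note("done")'
)

# Evaluate the script either one expression at a time (roughly what source()
# does) or as a single expression vector (what eval(parse(text = ...)) does).
run_script <- function(lines, one_by_one) {
  events <- character()
  env <- new.env()
  env$note <- function(x) events <<- c(events, x)

  exprs <- parse(text = lines)
  if (one_by_one) {
    # One eval() call per expression: on.exit fires when *that* call exits.
    for (e in exprs) eval(e, envir = env)
  } else {
    # A single eval() call for everything: on.exit fires after the last line.
    eval(exprs, envir = env)
  }
  events
}

print(run_script(script_lines, one_by_one = TRUE))
# "start" "in the end" "done"

print(run_script(script_lines, one_by_one = FALSE))
# "start" "done" "in the end"
```

If this model is right, it would also explain why my eval(parse(text = script)) "fix" worked: the whole expression vector goes through a single eval call, so on.exit only fires after the last line of the script.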

That's all. Trivial, as you can see. I still have to take this class though...

https://xkcd.com/843/

Make a promise. Show up. Do the work. Repeat.