Somewhere I read the following question: how would we behave if, every time we sent an email, we had to pay for each email address we use? Notice that the question was not about what the price would be, but about the value (perceived or not) of the things we do.

In short: if you use Rscript with the argument "-e", be aware that the total length of the expressions passed this way is limited to 10,000 bytes. If your expression is longer, you don't get an error, just a warning, and nothing else happens: the expression is not evaluated. The rest is just details.

You probably know that you can run R expressions from the command line using Rscript:

Rscript -e 'writeLines(text = "fooo", con = "/tmp/delme.txt")'
cat /tmp/delme.txt 
# fooo
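The 10,000-byte limit applies to "-e" expressions. Rscript can also run a script file, and a file is read as ordinary R source, so longer code can be written to a temporary file first. Below is a minimal Bash sketch of that idea. The function name rscript_file and the RSCRIPT override variable are hypothetical names introduced here for illustration, not part of R:

```shell
#!/usr/bin/env bash
# Sketch: run an R expression from a temporary script file instead of -e,
# so the length limit on -e expressions does not apply.
# RSCRIPT is a hypothetical override for the interpreter (defaults to Rscript).
rscript_file() {
  local expr="$1"
  local tmp status
  # Create a private temporary file holding the R code.
  tmp=$(mktemp "${TMPDIR:-/tmp}/rscript.XXXXXX") || return 1
  printf '%s\n' "$expr" > "$tmp"
  # Run the script and remember its exit status (safe under `set -e`).
  status=0
  "${RSCRIPT:-Rscript}" "$tmp" || status=$?
  # Clean up and pass the interpreter's exit status through to the caller.
  rm -f "$tmp"
  return "$status"
}
```

Usage would look like `rscript_file 'writeLines(text = "fooo", con = "/tmp/delme.txt")'`. Because the exit status of Rscript is returned unchanged, a failing script is visible to the calling shell.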

Some time ago I spent (lost) a few hours trying to understand the behavior of some R code that was triggered via Rscript. In many cases it worked fine, but in some cases it did not, and without an obvious error. The context was complicated and it took me a while to break down the problem: when the R expression contained a large list, things no longer behaved as expected. Here is what happens.

Let's generate some long random text (100,000 bytes) and then try to write it to a file:

TEXT=$(base64 /dev/urandom | head -c 100000)
echo "\"writeLines(text=$TEXT, con = '/tmp/delme.txt')\"" | xargs -0 Rscript -e
WARNING: '-e "writeLines(text=toKwj0+dhXoJQmiPIQCC0eiKwS60k5NojVj/jaADnvWg6Eyacn+MDWzZwcuU0LdpOlzHTLiN2yyR~n~OmX1oH7oirHWtmJMyksKh1cIUF8oSwMN9wDKzG4628uKP3nO5OLSYBgopi1978zDcEw/NJwRlw9S~n~tGuCTqCCeAD550VqZFbifI5hnK5DkyxVONBrNmbHKm/D7eccCr35cOswU+iJ4sXyyTIcgIR+Izuz~n~OQxMP32v1hMVsvGrV1G8lULER6lJA//tO7aDMVhMvq0fQ8XR6zNL0ox0JpE81zXsf691k0VP6+Bv~n~lKESbeQ5rASvJ2clzRBnM80RiFQECGmF18dvyDVLGzBkiB8Xi7Hnq5c6fT04VGiqiZQ18hc35MG1~n~dgNwrxPnts5ZTqQQYdaYPjx4GQgACvMBV7EPRr1C8/bBTrZBWwuw9BiojfrRwj0Owl9j2aXEG2HO~n~nRatOAZJ/nSS0Ty5xMWAlwisyh7ui0ptB2NPZ7N5mfj7WO41/F/twGbBhwg7VZjEcurThVehJe5A~n~dPUSD+ktI1xK0HVfVVc61bYMEjP4oP7FixKTpKRC1CxaQhXYvMUPouscQ3GLlLCqlK4n2soyw2Uw~n~XrsnvEBMHspJudYl/Qe+EbjrvsGNZxllCjxIPrLxUKAfQGkIsns1XFYkNJk3jQekfPJUg/e5cMgn~n~WhTehI7i+W0J6zBfMgC5lc4ear+5xgKIuLIjJ+p63Nkx1pvN8AdyHoBW4NsZcaV00Sz0QVT8i8x4~n~xDpO8F99ZfS3sTeIpozJloJFcXLRc0Bsd5a0y65n4T0LBl+JtwhkfD+Wdesyg2KKhgch6w0rU4g9~n~BEX9L6OWKrcP1/VFZZsfgUsSR4p9Jo+CQ5RhWwvEeheKO9SuCwOR3wseMLFuEOKwQW9q1btIsrVP~n~tI2bkwQfBKWNKxv2T1krYTBTTNbPwP8Uf/qVftzHZMek1

cat /tmp/delme.txt 
# fooo

A warning is issued, but the message is truncated, so I have no idea what the warning is about. And /tmp/delme.txt still contains "fooo" from the first example: the expression was simply not evaluated. The bad thing is not that this does not work as naively expected; I understand that there are technical limitations. What I fail to understand is why the command above returns with a success status instead of an error.
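Until R itself fails loudly here, one defensive option is to measure the expression before handing it to Rscript. Below is a minimal Bash sketch of a hypothetical wrapper, rscript_e (my name, not part of R). It assumes a single "-e" expression, and its threshold of 9,000 bytes is an arbitrary safety margin below the 10,000-byte limit, not a value taken from R:

```shell
#!/usr/bin/env bash
# Sketch: a hypothetical wrapper around `Rscript -e` that refuses expressions
# likely to hit the length limit and returns a non-zero status, instead of
# letting R skip the expression with only a warning.
rscript_e() {
  local expr="$1"
  # Arbitrary safety margin below the ~10,000-byte limit (an assumption).
  local limit=9000
  local len
  # Count bytes (not characters) of the expression.
  len=$(printf '%s' "$expr" | wc -c)
  len=$((len))  # normalise: some wc implementations pad the number with spaces
  if [ "$len" -gt "$limit" ]; then
    printf 'rscript_e: expression is %d bytes, above the %d-byte limit; not running it\n' \
      "$len" "$limit" >&2
    return 1
  fi
  # Short enough: run it; Rscript's exit status becomes the function's status.
  Rscript -e "$expr"
}
```

With this, something like `rscript_e "$CODE" || echo "R code was not run" >&2` fails visibly both when the expression is too long and when R itself reports an error.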

An internet search revealed that somebody noticed the problem many years ago, and it is still a problem today. I wonder if the developers involved were fighting (other?) alligators instead of fixing this... With a proactive attitude, I would post to the R mailing list and ask why there is no error message in this case. (In an ideal world I would have the skills to fix this myself.)

The truth is, I have given up lately. For many reasons. Here are just a few I encountered recently:

  • Somebody asked how to solve a specific problem in R. I asked for a minimal example and received a screenshot of an Excel table. My reply: I was actually thinking of a few lines of R code that I can copy/paste to reproduce the problem. The answer: I don't have any R code, that is exactly what I am looking for... No, I did not go on to explain what "minimal example" means for me.
  • I attended a short talk presenting the results of a survey. On the same slide there were two bar charts using two colors: red and blue. I usually associate red with danger. I know, there are cultural differences in the symbolism of colors, and I can understand that. Where I was completely lost: in the first chart, red indeed had a negative meaning (a feature got a bad score). But in the second chart (note: on the same slide), red had a positive meaning (another feature got a good score). Why on earth would one do this? To me, using colors consistently across charts is a basic thing. Obviously only in my bubble...
  • I was asked why I use the bang bang operator with testthat: it makes little sense, because bang bang is just a logical operator. I was totally confused by the question, so I asked for details. Here is the "minimal reproducible example":
    x <- 42
    testthat::expect_gt(!!x, 43)
    The person executed the bang bang part on its own:
    !!x
    # TRUE
    Since the result was a logical value, the conclusion was that the bang bang operator is a logical operator, so what is the point of using it with testthat? To be clear, the person used the term "bang bang operator", so he knew (something) about it. In plain R, !!x is indeed just a double negation (!42 is FALSE, !FALSE is TRUE), which explains the TRUE. I tried to explain that inside a testthat expectation it is not a logical operator but the unquoting operator, and showed its impact:
    # without bang bang
    x <- 42
    testthat::expect_gt(x, 43)
    # Error: `x` is not strictly more than 43. Difference: -1
    # with bang bang a clearer error message is generated:
    x <- 42
    testthat::expect_gt(!!x, 43)
    # Error: 42 is not strictly more than 43. Difference: -1
    The usage of the unquoting operator (pronounced "bang bang") with testthat is documented here.

Here is a nice sketch that resonated with me while I was pondering these issues:

Sketchplanations: "Context is king"

Unfortunately, it only increases my confusion. In the situations above: was I in the letters context, and the others in the numbers context? Or the other way around? How many contexts are there? What else am I missing? How do I make the jump to another context?

From time to time I watch this.

Maybe all these questions are just alligator fighting...

Make a promise. Show up. Do the work. Repeat.