7 Glossary
This glossary contains the vocabulary necessary to work with tidy evaluation and, more generally, with expressions. The definitions in rlang are generally consistent with base R. When they differ, both definitions are presented so you can navigate between these two worlds more easily.
7.1 Data structures
7.1.1 TODO Data mask
7.1.2 Expression
An expression is a piece of R code that represents a value or a computation:
12 # Value
12 / 3 # Computation
12 / (1 + 2) # Nested computations
Expressions are normally transient. They are computed (or evaluated) when you source a file or call a function. You can only observe:
- The final value of the outermost expression.
- Their side effects, such as the console output of a print() expression inside a loop.
In R, however, it is possible to suspend the normal evaluation of expressions with the quotation mechanism. In a way, quotation causes expressions to freeze in place:
# Evaluated expression
12 / 3
#> [1] 4
# Quoted expression
expr(12 / 3)
#> 12/3
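To make the "freeze in place" idea concrete, here is a minimal sketch that stores a quoted expression and evaluates it later. It assumes the rlang package is installed; the variable name quoted is only illustrative.

```r
library(rlang)

# Quote the computation instead of running it
quoted <- expr(12 / 3)

# The result is a call object, not the number 4
is_call(quoted)
#> [1] TRUE

# The frozen expression can be evaluated later, on demand
eval_tidy(quoted)
#> [1] 4
```

Nothing is computed until eval_tidy() is called, which is what lets tools inspect or modify the expression first.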
Technically, an expression is any R object that can be created by parsing R code:
- Constants like NULL, 1, "foo", TRUE, NA, etc.
- Symbols like height or weight
- Calls like c() or list()
Unlike constants, symbols and calls are symbolic objects: their value depends on the environment.
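The three kinds of expressions listed above can be told apart with rlang predicates. This is a small sketch, assuming rlang is installed; the symbol height is never evaluated, so it does not need to exist.

```r
library(rlang)

# Constants are not symbolic: they evaluate to themselves
is_syntactic_literal(expr("foo"))
#> [1] TRUE
is_symbolic(expr("foo"))
#> [1] FALSE

# Symbols and calls are symbolic objects
is_symbol(expr(height))
#> [1] TRUE
is_call(expr(list()))
#> [1] TRUE
is_symbolic(expr(height))
#> [1] TRUE
is_symbolic(expr(list()))
#> [1] TRUE
```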
7.1.3 Expression (base)
In base R, “expression” refers to a special type of vector that contains quoted expressions in the rlang sense:
base::expression(key <- "foo", toupper(key))
#> expression(key <- "foo", toupper(key))
You’ll most likely encounter this rare data structure as the return value of base::parse():
code <- "key <- 'foo'; toupper(key)"
parse(text = code)
#> expression(key <- "foo", toupper(key))
The only advantage of expression vectors compared to lists is that they include source references. Expression vectors with source references are printed with whitespace and comments preserved:
code <- "{
# Interesting comment
weird <- whitespace
}"
parse(text = code, keep.source = TRUE)
#> expression({
#> # Interesting comment
#> weird <- whitespace
#> })
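As a rough illustration of where the source references live, the sketch below parses the same code with and without keep.source and inspects the "srcref" attribute of the result. The variable names with_refs and without_refs are only illustrative.

```r
code <- "{
  # Interesting comment
  weird <- whitespace
}"

with_refs <- parse(text = code, keep.source = TRUE)
without_refs <- parse(text = code, keep.source = FALSE)

# The references are stored as an attribute of the expression vector
is.null(attr(with_refs, "srcref"))
#> [1] FALSE
is.null(attr(without_refs, "srcref"))
#> [1] TRUE

# The first reference gives back the original source lines
as.character(attr(with_refs, "srcref")[[1]])
```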
Source references are mostly useful for debugging and development tools. They don’t play any computational role and tidy evaluation doesn’t make use of references. Consequently the parsing tools in rlang return normal lists of expressions (in the rlang sense) instead of expression vectors:
rlang::parse_exprs(code)
#> [[1]]
#> {
#> weird <- whitespace
#> }
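The difference between the two return types can be checked directly. The following sketch, which assumes rlang is installed, parses the earlier two-statement example both ways and then evaluates the parsed expressions one after the other in a fresh environment.

```r
code <- "key <- 'foo'; toupper(key)"

exprs_base <- parse(text = code)
exprs_rlang <- rlang::parse_exprs(code)

# base::parse() returns an expression vector, rlang returns a plain list
typeof(exprs_base)
#> [1] "expression"
typeof(exprs_rlang)
#> [1] "list"

# Both contain the same two quoted expressions
length(exprs_base)
#> [1] 2
length(exprs_rlang)
#> [1] 2

# Evaluate each expression in turn and keep the last value
env <- new.env()
for (e in exprs_rlang) {
  result <- eval(e, env)
}
result
#> [1] "FOO"
```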
7.1.4 TODO Symbol
7.2 Programming Concepts
7.2.1 Constant versus symbolic
Constants, also called “literals”, always have the same value no matter the context. On the other hand, symbols and calls are symbolic expressions: their value depends on an environment and what kind of objects are defined there.
For instance, the string "mickey" always represents the same string no matter the environment and what objects are defined there:
# Here's a string:
"mickey"
#> [1] "mickey"
mickey <- "mouse"
# Still the same string:
"mickey"
#> [1] "mickey"
In contrast, symbols depend on current definitions:
# We've defined `mickey` as "mouse"
mickey
#> [1] "mouse"
mickey <- "mickey"
# Now `mickey` is "mickey"
mickey
#> [1] "mickey"
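The same contrast can be shown without redefining anything, by evaluating one quoted symbol and one quoted constant in two different environments. This is a sketch assuming rlang is installed; env_a and env_b are hypothetical environments created for the illustration.

```r
library(rlang)

sym_expr <- expr(mickey)      # a symbol
const_expr <- expr("mickey")  # a constant

# Two environments that define `mickey` differently
env_a <- env(mickey = "mouse")
env_b <- env(mickey = "mickey")

# The symbol's value follows the environment
eval_tidy(sym_expr, env = env_a)
#> [1] "mouse"
eval_tidy(sym_expr, env = env_b)
#> [1] "mickey"

# The constant is the same everywhere
eval_tidy(const_expr, env = env_a)
#> [1] "mickey"
eval_tidy(const_expr, env = env_b)
#> [1] "mickey"
```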
One source of problems when you’re working with quoted expressions is that they might be evaluated in arbitrary places, where objects have potentially been redefined to something different than expected. This is a common issue with tidyverse grammars because they evaluate quoted expressions in a data mask. Say you’d like to divide a column by a factor defined in the current environment:
factor <- 100
starwars %>% mutate(height / factor) %>% pull()
#> [1] 1.72 1.67 0.96 2.02 1.50 1.78 1.65 0.97 1.83 1.82 1.88 1.80 2.28 1.80
#> [15] 1.73 1.75 1.70 1.80 0.66 1.70 1.83 2.00 1.90 1.77 1.75 1.80 1.50 NA
#> [29] 0.88 1.60 1.93 1.91 1.70 1.96 2.24 2.06 1.83 1.37 1.12 1.83 1.63 1.75
#> [43] 1.80 1.78 0.94 1.22 1.63 1.88 1.98 1.96 1.71 1.84 1.88 2.64 1.88 1.96
#> [57] 1.85 1.57 1.83 1.83 1.70 1.66 1.65 1.93 1.91 1.83 1.68 1.98 2.29 2.13
#> [71] 1.67 0.79 0.96 1.93 1.91 1.78 2.16 2.34 1.88 1.78 2.06 NA NA NA
#> [85] NA NA 1.65
This works fine, but what if the data frame contains a column called factor? The expression will be evaluated with the parasite definition:
# Derive a data frame that contains a `factor` column
starwars2 <- starwars %>% mutate(factor = 1:n())
# Oh no! We're now dividing `height` by the new column!
starwars2 %>% mutate(height / factor) %>% pull()
#> [1] 172.0 83.5 32.0 50.5 30.0 29.7 23.6 12.1 20.3 18.2 17.1
#> [12] 15.0 17.5 12.9 11.5 10.9 10.0 10.0 3.5 8.5 8.7 9.1
#> [23] 8.3 7.4 7.0 6.9 5.6 NA 3.0 5.3 6.2 6.0 5.2
#> [34] 5.8 6.4 5.7 4.9 3.6 2.9 4.6 4.0 4.2 4.2 4.0
#> [45] 2.1 2.7 3.5 3.9 4.0 3.9 3.4 3.5 3.5 4.9 3.4
#> [56] 3.5 3.2 2.7 3.1 3.0 2.8 2.7 2.6 3.0 2.9 2.8
#> [67] 2.5 2.9 3.3 3.0 2.4 1.1 1.3 2.6 2.5 2.3 2.8
#> [78] 3.0 2.4 2.2 2.5 NA NA NA NA NA 1.9
Masking is generally not a problem in scripts because you know which columns are in your data frame. However, as soon as your code becomes more general, for instance when you write a reusable function, you can no longer make assumptions about what’s in the data.
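Here is a small sketch of how the problem can surface in a reusable function. To stay self-contained it uses rlang::eval_tidy() with a data mask rather than dplyr, and the helper scale_height() and the two toy data frames are hypothetical.

```r
library(rlang)

factor <- 100

# Hypothetical helper: it has no way of knowing which columns `data` contains
scale_height <- function(data) {
  eval_tidy(quote(height / factor), data = data)
}

ok <- data.frame(height = c(172, 167))
bad <- data.frame(height = c(172, 167), factor = 1:2)

# No `factor` column: the symbol resolves to the value defined above
scale_height(ok)
#> [1] 1.72 1.67

# A `factor` column masks that value
scale_height(bad)
#> [1] 172.0  83.5
```

The function body is identical in both calls; only the contents of the data change the result.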
Fortunately, with quasiquotation it is easy to solve masking issues by replacing symbols with constants. The unquoting operator !! allows you to inline constant values deep inside expressions. With qq_show() we can observe the inlining:
vector <- 1:3
# Without inlining, the expression depends on the value of `vector`:
rlang::qq_show(list(vector))
#> list(vector)
# Let's inline the current value of `vector` by unquoting it:
rlang::qq_show(list(!!vector))
#> list(<int: 1L, 2L, 3L>)
Because constants have the same value in any environment, the data mask can never take over with parasite definitions:
rlang::qq_show(starwars2 %>% mutate(height / !!factor) %>% pull())
#> starwars2 %>% mutate(height / 100) %>% pull()
starwars2 %>% mutate(height / !!factor) %>% pull()
#> [1] 1.72 1.67 0.96 2.02 1.50 1.78 1.65 0.97 1.83 1.82 1.88 1.80 2.28 1.80
#> [15] 1.73 1.75 1.70 1.80 0.66 1.70 1.83 2.00 1.90 1.77 1.75 1.80 1.50 NA
#> [29] 0.88 1.60 1.93 1.91 1.70 1.96 2.24 2.06 1.83 1.37 1.12 1.83 1.63 1.75
#> [43] 1.80 1.78 0.94 1.22 1.63 1.88 1.98 1.96 1.71 1.84 1.88 2.64 1.88 1.96
#> [57] 1.85 1.57 1.83 1.83 1.70 1.66 1.65 1.93 1.91 1.83 1.68 1.98 2.29 2.13
#> [71] 1.67 0.79 0.96 1.93 1.91 1.78 2.16 2.34 1.88 1.78 2.06 NA NA NA
#> [85] NA NA 1.65
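The same effect can be reproduced on a toy data frame without dplyr. This sketch assumes rlang is installed and uses rlang::eval_tidy() as a stand-in for a data-masking function; the data frame bad and the names masked and inlined are hypothetical.

```r
library(rlang)

factor <- 100
bad <- data.frame(height = c(172, 167), factor = 1:2)

# Symbolic: `factor` is looked up in the data mask first
masked <- expr(height / factor)

# Inlined: `!!` replaces the symbol with the constant 100 at quotation time
inlined <- expr(height / !!factor)

# The divisor is now a number stored in the call, not a symbol
is_symbol(masked[[3]])
#> [1] TRUE
is_symbol(inlined[[3]])
#> [1] FALSE

eval_tidy(masked, data = bad)
#> [1] 172.0  83.5
eval_tidy(inlined, data = bad)
#> [1] 1.72 1.67
```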