March 4, 2015

The Package

magrittr implements pipes (as in F#) to:

  • decrease development time
  • improve readability and maintainability of code

%>% the forward pipe

Pipe a value forward into an expression or function call, e.g.

cars$speed %>% sort %>% head
## [1] 4 4 7 7 8 9
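
As a minimal sketch of what the operator does (assuming magrittr is installed; cars is the data set that ships with R), the piped chain can be checked against the equivalent nested call. The names piped and nested are only illustrative:

```r
library(magrittr)

# x %>% f is evaluated as f(x), so the chain reads left to right
piped  <- cars$speed %>% sort %>% head

# The same computation written inside-out
nested <- head(sort(cars$speed))

identical(piped, nested)
## [1] TRUE
```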

%>% makes calls more intuitive and readable

Ville de Montréal Food Safety Offenders:

etablissement                 montant  ville
RESTAURANT MAN LI             1250 $   Pierrefonds H8Y 3E3
RESTAURANT HUNG SING          1250 $   Montréal H1N 1C1
BELLES DIONYSIA               750 $    St-Laurent H4S 1L8
RESTO BAR NICKELS ST-LAURENT  1500 $   St-Laurent H4R 1K4
RESTAURANT CARVELI            2000 $   Montréal H4V 1H5

Suppose you wish to print the maximum fine with comma-separated thousands. You could nest the function calls:

format(max(as.numeric(gsub(" \\$","",offenders$montant))),big.mark=',')
## [1] "7,500"

Or you could reuse a temporary variable for each intermediate step:

max.fine = gsub(" \\$","",offenders$montant)
max.fine = as.numeric(max.fine)
max.fine = max(max.fine)
format(max.fine,big.mark=',')
## [1] "7,500"

With pipes it becomes:

offenders$montant %>% gsub(" \\$","",.) %>% as.numeric %>% max %>% 
  format(big.mark=',')
## [1] "7,500"
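
To try the chain without the full data set, here is a sketch that assumes a small stand-in for offenders built only from the five rows listed earlier. Because it holds just those rows, the maximum here is 2,000 rather than the 7,500 reported for the complete data; max_fine is an illustrative name.

```r
library(magrittr)

# Hypothetical stand-in for `offenders`: only the five rows shown earlier
offenders <- data.frame(
  etablissement = c("RESTAURANT MAN LI", "RESTAURANT HUNG SING",
                    "BELLES DIONYSIA", "RESTO BAR NICKELS ST-LAURENT",
                    "RESTAURANT CARVELI"),
  montant = c("1250 $", "1250 $", "750 $", "1500 $", "2000 $"),
  stringsAsFactors = FALSE
)

max_fine <- offenders$montant %>%
  gsub(" \\$", "", .) %>%     # strip the trailing " $"
  as.numeric %>%              # "1250" -> 1250
  max %>%                     # largest fine in the sample
  format(big.mark = ",")      # add the thousands separator

max_fine
## [1] "2,000"
```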

Changing one step in the workflow (here, replacing max with summary) means editing the middle of the nested call:

format(summary(as.numeric(gsub(" \\$",""
  ,offenders$montant)),digits=Inf),big.mark=',')

Compare with magrittr, where only one link of the chain changes:

offenders$montant %>% gsub(" \\$","",.) %>% as.numeric %>% 
  summary(digits=Inf) %>% format(big.mark=',')

The dot placeholder

By default the LHS value is passed as the first argument of the RHS. Use the "." placeholder to pass it in another position, as in the gsub call below, where it is the third argument:

offenders$montant %>% gsub(" \\$","",.) %>% as.numeric %>% 
  summary(digits=Inf) %>% format(big.mark=',')
##        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
## "  250.000" "1,100.000" "1,500.000" "1,513.438" "1,750.000" "7,500.000"
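
The placement rules can be seen in isolation with a few one-liners. This is a sketch with made-up inputs; the variable names are illustrative only.

```r
library(magrittr)

# No dot: the LHS becomes the first argument, i.e. head(letters, 3)
first_arg <- letters %>% head(3)

# Dot in a later position: evaluated as gsub(" \\$", "", "1250 $")
third_arg <- "1250 $" %>% gsub(" \\$", "", .)

# Dot as a named argument: evaluated as rep("ab", times = 2)
named_arg <- 2 %>% rep("ab", times = .)

first_arg
## [1] "a" "b" "c"
third_arg
## [1] "1250"
named_arg
## [1] "ab" "ab"
```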

Piping into anonymous functions

library(stringr)    # str_extract
library(googleVis)  # gvisMap

offenders$ville[grepl("PIZZA", offenders$etablissement)] %>% 
  str_extract("[A-Z][0-9][A-Z] [0-9][A-Z][0-9]$") %>% unique %>%
  c("H3A 2J5") %>% (
  function(postal.codes) {
    # one column for the location, one for the tooltip text
    data.frame("locationvar" = postal.codes, "tipvar" = postal.codes)
  }) %>% gvisMap %>% plot
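
The key detail is that the anonymous function is wrapped in parentheses so the pipe can call it with the LHS. A stripped-down sketch that needs neither stringr nor googleVis, using hypothetical postal codes:

```r
library(magrittr)

postal_codes <- c("H1N 1C1", "H4S 1L8", "H1N 1C1")  # hypothetical sample values

locations <- postal_codes %>%
  unique %>%
  (function(codes) {
    # Same two-column layout as in the map example
    data.frame(locationvar = codes, tipvar = codes, stringsAsFactors = FALSE)
  })

nrow(locations)
## [1] 2
```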

Building (unary) functions

Example from the author:

f <- . %>% cos %>% sin 
# is equivalent to 
f <- function(.) sin(cos(.)) 
f(0)
## [1] 0.841471
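
The same idea gives a reusable version of the cleaning steps used in the earlier slides. A sketch, where clean_fine is a hypothetical name and the input strings are made up:

```r
library(magrittr)

# A chain that starts with the dot is a function of one argument
clean_fine <- . %>% gsub(" \\$", "", .) %>% as.numeric

clean_fine(c("1250 $", "750 $"))
## [1] 1250  750
```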

%<>% compound assignment

offenders$montant = offenders$montant %>% gsub(" \\$","",.) %>% as.numeric

is equivalent to:

offenders$montant %<>% gsub(" \\$","",.) %>% as.numeric
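
A self-contained sketch of the same update on a plain vector (fines is a hypothetical variable with made-up values):

```r
library(magrittr)

fines <- c("1250 $", "750 $")  # hypothetical sample values

# Pipe `fines` through the whole chain, then assign the result back to `fines`
fines %<>% gsub(" \\$", "", .) %>% as.numeric

fines
## [1] 1250  750
```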

%T>% The "tee" operator

The “tee” operator, %T>%, works like %>% but returns the LHS value instead of the RHS result, so a branch can run for its side effects (e.g. plots) without interrupting the main flow.

iris[,1:4]  %T>% { scale(.) %>% t %>% dist %>% cmdscale %>% plot } %>% 
  cor %>% abs %>% min %>% round(3)

## [1] 0.118
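
A sketch of the same pattern without graphics, using made-up numbers: the side branch prints its input, and the main flow carries on with the original value.

```r
library(magrittr)

total <- c(4, 9, 16) %T>%
  { cat("values seen by the side branch:", ., "\n") } %>%  # result is discarded
  sqrt %>%
  sum

total
## [1] 9
```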

%$% exposition pipe operator

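This operator exposes the names in the LHS (e.g. the columns of a data frame) to the RHS expression, which helps with functions that have no data argument: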

iris[,c(1,3)] %T>% plot  %$% lm(Sepal.Length ~ Petal.Length)$coef

##  (Intercept) Petal.Length 
##    4.3066034    0.4089223
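
A smaller sketch with hypothetical toy data: cor() has no data argument, so the columns are exposed with %$%.

```r
library(magrittr)

df <- data.frame(x = 1:4, y = c(2, 4, 6, 8))  # hypothetical toy data

# Evaluated much like with(df, cor(x, y))
r <- df %$% cor(x, y)

r
## [1] 1
```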

%>% Performance

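In a 2014 blog post, Kun Ren (author of pipeR) discusses the cost of pipes. In the timings below, the piped version takes roughly 15 times as long as the same steps written with intermediate assignments: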

system.time({ lapply(1:50000, function(i) {
  sample(letters, 6, replace = TRUE) %>%
    paste(collapse = "") %>% "=="("rstats") }) })
##    user  system elapsed 
##   5.924   0.012   5.947
system.time({ lapply(1:50000, function(i) {
  x = sample(letters, 6, replace = TRUE)
  x = paste(x, collapse = "")
  x == "rstats" }) })
##    user  system elapsed 
##   0.373   0.018   0.393
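
To repeat the comparison on your own machine, here is a sketch that wraps both versions in functions (piped and plain are hypothetical names) and first checks that they agree. The iteration count is reduced to keep it quick, and the timings will vary with hardware and with the installed magrittr version, so no expected figures are shown.

```r
library(magrittr)

n <- 1000  # far fewer iterations than the 50,000 above

piped <- function() {
  sample(letters, 6, replace = TRUE) %>%
    paste(collapse = "") %>% "=="("rstats")
}

plain <- function() {
  x <- sample(letters, 6, replace = TRUE)
  x <- paste(x, collapse = "")
  x == "rstats"
}

# Same seed, same random draws: both versions must return the same results
set.seed(1)
res_piped <- vapply(seq_len(n), function(i) piped(), logical(1))
set.seed(1)
res_plain <- vapply(seq_len(n), function(i) plain(), logical(1))
identical(res_piped, res_plain)

# Compare the elapsed times of the two versions
system.time(for (i in seq_len(n)) piped())
system.time(for (i in seq_len(n)) plain())
```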