March 4, 2015
magrittr implements pipes (as in F#) to:
Pipe a value forward into an expression or function call, e.g.
cars$speed %>% sort %>% head
## [1] 4 4 7 7 8 9
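As a minimal sketch of what the pipe does here, the chain can be compared with the equivalent nested call. It assumes only that magrittr is installed; `piped` and `nested` are illustrative names, and `cars` is the data set that ships with R.

```r
library(magrittr)

# The pipe feeds each result into the next function...
piped <- cars$speed %>% sort %>% head

# ...which is the same as nesting the calls inside out.
nested <- head(sort(cars$speed))

identical(piped, nested)
```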
Ville de Montréal Food Safety Offenders:
etablissement | montant | ville |
---|---|---|
RESTAURANT MAN LI | 1250 $ | Pierrefonds H8Y 3E3 |
RESTAURANT HUNG SING | 1250 $ | Montréal H1N 1C1 |
BELLES DIONYSIA | 750 $ | St-Laurent H4S 1L8 |
RESTO BAR NICKELS ST-LAURENT | 1500 $ | St-Laurent H4R 1K4 |
RESTAURANT CARVELI | 2000 $ | Montréal H4V 1H5 |
Suppose you wish to print the maximum fine, with comma-separated thousands. You could nest the function calls:
format(max(as.numeric(gsub(" \\$","",offenders$montant))),big.mark=',')
## [1] "7,500"
or with an intermediary temporary variable:
max.fine = gsub(" \\$","",offenders$montant)
max.fine = as.numeric(max.fine)
max.fine = max(max.fine)
format(max.fine,big.mark=',')
## [1] "7,500"
With pipes it becomes
offenders$montant %>% gsub(" \\$","",.) %>% as.numeric %>% max %>% format(big.mark=',')
## [1] "7,500"
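To try these variants without the full data set, the sketch below rebuilds only the five rows shown in the table above. That is an assumption made for illustration: the maximum in this excerpt is 2,000, not the 7,500 found in the complete data. The names `nested` and `piped` are illustrative.

```r
library(magrittr)

# Only the five rows shown in the table above, not the full data set.
offenders <- data.frame(
  etablissement = c("RESTAURANT MAN LI", "RESTAURANT HUNG SING",
                    "BELLES DIONYSIA", "RESTO BAR NICKELS ST-LAURENT",
                    "RESTAURANT CARVELI"),
  montant = c("1250 $", "1250 $", "750 $", "1500 $", "2000 $"),
  ville = c("Pierrefonds H8Y 3E3", "Montréal H1N 1C1", "St-Laurent H4S 1L8",
            "St-Laurent H4R 1K4", "Montréal H4V 1H5"),
  stringsAsFactors = FALSE
)

# Nested calls, read from the inside out.
nested <- format(max(as.numeric(gsub(" \\$", "", offenders$montant))),
                 big.mark = ",")

# The same steps as a pipeline, read from left to right.
piped <- offenders$montant %>%
  gsub(" \\$", "", .) %>%
  as.numeric %>%
  max %>%
  format(big.mark = ",")

identical(nested, piped)
```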
Changing one step in the workflow:
format(summary(as.numeric(gsub(" \\$","" ,offenders$montant)),digits=Inf),big.mark=',')
Compare to magrittr:
offenders$montant %>% gsub(" \\$","",.) %>% as.numeric %>% summary(digits=Inf) %>% format(big.mark=',')
By default the LHS value is passed as the first argument of the RHS. Use the "." placeholder when the value belongs in a different argument position, as in the gsub call here:
offenders$montant %>% gsub(" \\$","",.) %>% as.numeric %>% summary(digits=Inf) %>% format(big.mark=',')
##        Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
## "  250.000" "1,100.000" "1,500.000" "1,513.438" "1,750.000" "7,500.000"
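The difference between the default first-argument rule and the placeholder is easiest to see on a tiny input. This is a small sketch with made-up values; `x`, `first` and `second` are illustrative names.

```r
library(magrittr)

x <- c("a", "b")

# No placeholder: x becomes the first argument, i.e. paste(x, "!").
first <- x %>% paste("!")

# Placeholder: x goes where the dot is, i.e. paste("!", x).
second <- x %>% paste("!", .)

first   # "a !" "b !"
second  # "! a" "! b"
```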
# str_extract() comes from stringr; gvisMap() comes from googleVis
offenders$ville[grepl("PIZZA",offenders$etablissement)] %>%
  str_extract("[A-Z][0-9][A-Z] [0-9][A-Z][0-9]$") %>%
  unique %>%
  c("H3A 2J5") %>%
  (function(postal.codes){
    data.frame("locationvar"=postal.codes,"tipvar"=postal.codes)
  }) %>%
  gvisMap %>%
  plot
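The anonymous-function step in that pipeline can be tried on its own, without stringr or googleVis. The sketch below uses two postal codes taken from this post as stand-in input, and also shows magrittr's brace form, where the value is available as "." and is not inserted as a first argument. The names `codes`, `via_function` and `via_braces` are illustrative.

```r
library(magrittr)

codes <- c("H3A 2J5", "H1N 1C1")

# A parenthesized anonymous function receives the LHS as its argument.
via_function <- codes %>%
  (function(postal.codes) {
    data.frame(locationvar = postal.codes, tipvar = postal.codes)
  })

# A braced expression does the same job, referring to the LHS as ".".
via_braces <- codes %>% {
  data.frame(locationvar = ., tipvar = .)
}

identical(via_function, via_braces)
```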
Example from the author:
f <- . %>% cos %>% sin
# is equivalent to
f <- function(.) sin(cos(.))
f(0)
## [1] 0.841471
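A pipeline that starts with "." is an ordinary function, so it can be stored and reused on other inputs. The sketch below checks it against a hand-written equivalent; `g` and `x` are illustrative names.

```r
library(magrittr)

# Functional sequence: a pipeline with "." as its starting point.
f <- . %>% cos %>% sin

# Hand-written equivalent for comparison.
g <- function(x) sin(cos(x))

x <- c(0, pi / 2, pi)
all.equal(f(x), g(x))
```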
offenders$montant = offenders$montant %>% gsub(" \\$","",.) %>% as.numeric
is equivalent to:
offenders$montant %<>% gsub(" \\$","",.) %>% as.numeric
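The compound assignment operator can be tried on a plain vector. This sketch uses three of the fines from the table above; the stand-alone `montant` vector is an assumption for illustration.

```r
library(magrittr)

montant <- c("1250 $", "750 $", "2000 $")

# %<>% runs the pipeline and assigns the result back to montant.
montant %<>% gsub(" \\$", "", .) %>% as.numeric

montant  # now numeric: 1250 750 2000
```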
The “tee” operator, %T>%, branches off the main flow to perform a side effect (e.g. a plot) and then passes the original LHS value, not the result of the side effect, on to the next step.
iris[,1:4] %T>%
  { scale(.) %>% t %>% dist %>% cmdscale %>% plot } %>%
  cor %>% abs %>% min %>% round(3)
## [1] 0.118
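To see why the tee matters, the sketch below uses cat() as a stand-in side effect, since cat() returns NULL. With %T>% the original vector continues down the pipeline; with a plain %>% the NULL does. The names `teed` and `plain` are illustrative.

```r
library(magrittr)

# %T>% runs the side effect, then forwards the original LHS to sum().
teed <- c(3, 1, 2) %T>% { cat("range:", range(.), "\n") } %>% sum

# %>% forwards the result of cat(), which is NULL, so sum() sees no data.
plain <- c(3, 1, 2) %>% { cat("range:", range(.), "\n") } %>% sum

teed   # 6
plain  # 0
```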
The exposition operator, %$%, exposes the names of the LHS to the RHS:
iris[,c(1,3)] %T>% plot %$% lm(Sepal.Length ~ Petal.Length)$coef
##  (Intercept) Petal.Length
##    4.3066034    0.4089223
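The %$% operator is handy for functions such as cor() that take vectors and have no data argument. A minimal sketch, comparing it with explicit `$` access and with base R's with(); the names `exposed`, `explicit` and `using_with` are illustrative.

```r
library(magrittr)

# Column names of iris are visible inside the RHS call.
exposed <- iris %$% cor(Sepal.Length, Petal.Length)

# The same computation written out by hand.
explicit <- cor(iris$Sepal.Length, iris$Petal.Length)
using_with <- with(iris, cor(Sepal.Length, Petal.Length))

all.equal(exposed, explicit)
```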
In a 2014 blog post, Kun Ren (author of pipeR) discusses the cost of pipes:
system.time({
  lapply(1:50000, function(i) {
    sample(letters,6,replace = T) %>% paste(collapse = "") %>% "=="("rstats")
  })
})
##    user  system elapsed
##   5.924   0.012   5.947
system.time({
  lapply(1:50000, function(i) {
    x = sample(letters,6,replace = T)
    x = paste(x, collapse = "")
    x == "rstats"
  })
})
##    user  system elapsed
##   0.373   0.018   0.393
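To repeat the comparison on your own machine, a sketch along these lines may help. It wraps both variants in functions, fixes the random seed so they can be checked for identical results, and uses a much smaller iteration count than the post (`n` is a hypothetical value chosen to keep the run short). Timings depend on the machine and on the installed magrittr version, so no expected numbers are given.

```r
library(magrittr)

n <- 200  # hypothetical; the post uses 50000

piped <- function() {
  set.seed(1)
  vapply(seq_len(n), function(i) {
    sample(letters, 6, replace = TRUE) %>%
      paste(collapse = "") %>%
      `==`("rstats")
  }, logical(1))
}

plain <- function() {
  set.seed(1)
  vapply(seq_len(n), function(i) {
    x <- sample(letters, 6, replace = TRUE)
    x <- paste(x, collapse = "")
    x == "rstats"
  }, logical(1))
}

# Same seed, same work: the two variants should agree exactly.
identical(piped(), plain())

# Compare the elapsed times on your own setup.
system.time(piped())
system.time(plain())
```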