0.1 Object oriented programming

  • A programming paradigm
  • Work with objects
  • Easy to model things
  • An object is an instance of a class, and contains data and methods
  • OOP systems in R: S3, S4, RC and R6.
    • An S3 object is one of R's base types with the class attribute set. For example, a factor built from c("a", "b", "c") is an integer vector with class "factor" and a levels attribute.
    • To create an S3 object, simply set the class attribute on a data structure:
    • qf <- structure(list(), class = "genomic_features")
    • Easy to change class of objects - be careful!
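  • To create objects of your own S3 class, write a constructor that builds the object, a validator that checks its values, and a helper function that calls both and is the one users see
  • To seal an S4 class so that it cannot be redefined: setClass("class_name", sealed = TRUE), together with the slot definitions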
  • The variables inside an S4 class are stored in slots
    • To retrieve data, use my.gene@name or slot(my.gene, "name")
  • R6 is implemented to be fast and lightweight; its classes behave much like those in Java or C++ (objects are modified in place)
  • The class itself contains functions needed for changing parameters in the class, like hair colour or age
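
The constructor, validator and helper pattern for S3, and slot access for S4, can be sketched as follows. This is a minimal illustration and not the course's own solution: the fields chr and start, the function names, and the "gene" class with its slots are all assumptions made up for the example.

```r
library(methods)  # needed for S4 when the code is run with Rscript

# Constructor: builds the object and checks only the types (hypothetical fields).
new_genomic_features <- function(chr = character(), start = integer()) {
  stopifnot(is.character(chr), is.integer(start))
  structure(list(chr = chr, start = start), class = "genomic_features")
}

# Validator: checks that the values make sense together.
validate_genomic_features <- function(x) {
  if (length(x$chr) != length(x$start)) {
    stop("`chr` and `start` must have the same length", call. = FALSE)
  }
  if (anyNA(x$start) || any(x$start < 1L)) {
    stop("`start` positions must be positive and not missing", call. = FALSE)
  }
  x
}

# Helper: the user-facing function; coerces the input, then validates it.
genomic_features <- function(chr, start) {
  validate_genomic_features(
    new_genomic_features(as.character(chr), as.integer(start))
  )
}

# A print() method, dispatched on the class attribute.
print.genomic_features <- function(x, ...) {
  cat("<genomic_features> with", length(x$chr), "feature(s)\n")
  invisible(x)
}

gf <- genomic_features(c("chr1", "chr2"), c(100, 2500))
print(gf)

# Nothing stops you from mislabelling an object, hence "be careful":
fake <- structure(list(), class = "lm")
inherits(fake, "lm")

# S4 for comparison: data live in slots, read with @ or slot().
# The class "gene" and its slots are invented for this example.
setClass("gene", slots = c(name = "character", coords = "numeric"))
my.gene <- new("gene", name = "geneA", coords = c(100, 2500))
my.gene@name
slot(my.gene, "coords")
```

Keeping the cheap type checks in the constructor and the more expensive value checks in the validator means that code which already trusts its input can call the constructor alone.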

0.1.1 Tasks

0.1.1.1 1 S3 classes

  • What is the class of the object returned by the lm() function?
x <- lm(speed ~ dist, cars)
class(x)
## [1] "lm"
  • What basic data type is it built upon?
typeof(x)
## [1] "list"
  • What attributes does the object of the lm class have?
attributes(x)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"        
## 
## $class
## [1] "lm"
  • What is the structure of the lm object?
str(x)
## List of 12
##  $ coefficients : Named num [1:2] 8.284 0.166
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "dist"
##  $ residuals    : Named num [1:50] -4.62 -5.94 -1.95 -4.93 -2.93 ...
##   ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:50] -108.894 29.866 -0.501 -3.945 -1.797 ...
##   ..- attr(*, "names")= chr [1:50] "(Intercept)" "dist" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:50] 8.62 9.94 8.95 11.93 10.93 ...
##   ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##   ..$ qr   : num [1:50, 1:2] -7.071 0.141 0.141 0.141 0.141 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:50] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:2] "(Intercept)" "dist"
##   .. ..- attr(*, "assign")= int [1:2] 0 1
##   ..$ qraux: num [1:2] 1.14 1.15
##   ..$ pivot: int [1:2] 1 2
##   ..$ tol  : num 1e-07
##   ..$ rank : int 2
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 48
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = speed ~ dist, data = cars)
##  $ terms        :Classes 'terms', 'formula'  language speed ~ dist
##   .. ..- attr(*, "variables")= language list(speed, dist)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "speed" "dist"
##   .. .. .. ..$ : chr "dist"
##   .. ..- attr(*, "term.labels")= chr "dist"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(speed, dist)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "speed" "dist"
##  $ model        :'data.frame':   50 obs. of  2 variables:
##   ..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
##   ..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language speed ~ dist
##   .. .. ..- attr(*, "variables")= language list(speed, dist)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "speed" "dist"
##   .. .. .. .. ..$ : chr "dist"
##   .. .. ..- attr(*, "term.labels")= chr "dist"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(speed, dist)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "speed" "dist"
##  - attr(*, "class")= chr "lm"
  • Does the lm class implement its own str() method?
str(unclass(x))
## List of 12
##  $ coefficients : Named num [1:2] 8.284 0.166
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "dist"
##  $ residuals    : Named num [1:50] -4.62 -5.94 -1.95 -4.93 -2.93 ...
##   ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:50] -108.894 29.866 -0.501 -3.945 -1.797 ...
##   ..- attr(*, "names")= chr [1:50] "(Intercept)" "dist" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:50] 8.62 9.94 8.95 11.93 10.93 ...
##   ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##   ..$ qr   : num [1:50, 1:2] -7.071 0.141 0.141 0.141 0.141 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:50] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:2] "(Intercept)" "dist"
##   .. ..- attr(*, "assign")= int [1:2] 0 1
##   ..$ qraux: num [1:2] 1.14 1.15
##   ..$ pivot: int [1:2] 1 2
##   ..$ tol  : num 1e-07
##   ..$ rank : int 2
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 48
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = speed ~ dist, data = cars)
##  $ terms        :Classes 'terms', 'formula'  language speed ~ dist
##   .. ..- attr(*, "variables")= language list(speed, dist)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "speed" "dist"
##   .. .. .. ..$ : chr "dist"
##   .. ..- attr(*, "term.labels")= chr "dist"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(speed, dist)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "speed" "dist"
##  $ model        :'data.frame':   50 obs. of  2 variables:
##   ..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
##   ..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language speed ~ dist
##   .. .. ..- attr(*, "variables")= language list(speed, dist)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "speed" "dist"
##   .. .. .. .. ..$ : chr "dist"
##   .. .. ..- attr(*, "term.labels")= chr "dist"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(speed, dist)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "speed" "dist"
# No: apart from the class attribute, the output is identical with and without the class
  • What is the class of a tibble?
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.5
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
df <- as_tibble(mtcars)
class(df)
## [1] "tbl_df"     "tbl"        "data.frame"
  • What is the underlying data type?
typeof(df)
## [1] "list"
  • Is the str function used by tibbles the standard str function?
str(df)
## Classes 'tbl_df', 'tbl' and 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
str(unclass(df))
## List of 11
##  $ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num [1:32] 160 160 108 258 360 ...
##  $ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
##  $ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
##  - attr(*, "row.names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
# No
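
Comparing two long str() printouts by eye is error-prone. A more direct check is to ask R whether a method is registered for any of the object's classes. The sketch below uses only base R; the helper name has_s3_method is made up for this example. The same helper could be applied to a tibble once the tibble package is attached.

```r
# Return the classes of `object` for which an S3 method of `generic` exists.
# An empty result means the default method would be used.
has_s3_method <- function(generic, object) {
  found <- vapply(
    class(object),
    function(cl) !is.null(utils::getS3method(generic, cl, optional = TRUE)),
    logical(1)
  )
  names(found)[found]
}

fit <- lm(speed ~ dist, cars)

has_s3_method("str", fit)      # no str() method for class "lm"
has_s3_method("print", fit)    # print() does have an "lm" method
has_s3_method("str", mtcars)   # data frames have their own str() method
```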

0.2 R-Scripts

  • In RStudio (or any R console): source("scriptfile.R")
  • In the terminal: Rscript scriptfile.R
  • As executable: path/scriptfile.R if:
    • Script is executable: chmod +x scriptfile.R
    • First line in script is a hashbang, e.g. #!/usr/bin/env Rscript
  • Can provide arguments to an R script
    • ./scriptfile.R inputfile outputfile
    • Packages are available that support long and short flags (e.g. --output and -o)
    • Use commandArgs() to access the arguments passed to R at launch
      • Use trailingOnly = TRUE to return only the arguments that follow the script name
    • packages: getopt, optparse, etc
  • To read a text stream piped in from another script, open a connection to standard input (file("stdin")), read from it line by line, and close it
    • To write results to a stream, use standard output (print, cat, etc.), which can be piped to a new process or redirected to a file
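
The points above about arguments and streams can be combined in one small script. This is a sketch under assumptions: the function name main, the meaning of the first argument (a prefix) and the demo input are invented for illustration. The input is passed in as a connection so the example can be run without a real pipe.

```r
#!/usr/bin/env Rscript

# Read lines from `input`, upper-case them, add an optional prefix taken
# from the first command line argument, and write them to standard output.
main <- function(args = commandArgs(trailingOnly = TRUE),
                 input = file("stdin")) {
  on.exit(close(input))                 # always release the connection
  prefix <- if (length(args) >= 1) args[1] else ""
  lines <- readLines(input)             # read everything that was piped in
  out <- paste0(prefix, toupper(lines))
  cat(out, sep = "\n")                  # stdout, so it can be piped onwards
  invisible(out)
}

# Demonstration with an in-memory connection instead of a real pipe.
demo <- main(args = "> ", input = textConnection(c("hello", "world")))
```

In an executable script the last line would simply be main(), so that something like cat input.txt | ./scriptfile.R "> " reads from the pipe and writes to the terminal or to the next process.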

0.2.1 Tasks

0.2.1.1 Executing R Scripts

  • Make an R script and source it here
source("norm_test.R")
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -3.306946 -0.725706  0.007249  0.001318  0.681208  2.998722

0.2.1.2 Passing and parsing arguments

0.3 R Packages

0.3.1 What is an R package?

  • A bunch of files and folders, in a specific structure
    • DESCRIPTION
    • NAMESPACE
    • R
    • typicalr.Rproj
  • R packages make it easy to share code
  • Improves quality and rigor
  • Reproducibility
  • Package name may contain only letters, numbers and periods; it must start with a letter and can't end with a period
    • Make it googleable!

0.3.2 Possible package states

  • Package states:
    • Source: Development version of your package
    • Bundled: Compressed tar.gz file (files listed in .Rbuildignore are left out)
    • Binary: Bundle that is built for a certain architecture, parsed format
    • Installed: a binary package decompressed into a package library
    • In-memory: after loading the package with library(package)
  • Better to call functions directly with packagename::function() than to attach multiple packages, which exposes a lot of functions that aren't used
  • library() and require() both attach a package and do nothing more if it is already attached. If the package is not installed, library() stops with an error, while require() gives a warning and returns FALSE.
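
The difference between library(), require() and the :: operator can be seen in a short session. The package name below is made up and assumed not to be installed; everything else is base R.

```r
# 'notInstalledPkg123' is a hypothetical name, assumed not to be installed.
pkg <- "notInstalledPkg123"

# require() returns FALSE (normally with a warning) when the package is missing.
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
ok

# library() stops with an error instead; catch it to keep the script alive.
res <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "library() failed"
)
res

# Calling a function through its namespace needs no attaching at all.
stats::median(c(1, 5, 3))

# requireNamespace() tests whether a package is available without attaching it.
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats
```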

0.3.3 Package structure

  • Large functions are to be written in their own R files
  • Utility functions in their own file
  • Description file:
    • Title, version, authors, description, dependencies, license, encoding, lazydata, url, bugreports, etc.
  • Version number:
    • 4 numbers (0.0.0.0, major.minor.patch.dev), three numbers after release (removes dev)
    • Major changes: Large changes, not always backwards compatible
    • Minor: Bug fixes and new features
    • Patches: Small bugfixes etc.
    • Dev: Always start with 9000 (0.0.0.9000)
  • Manual directory (man):
    • Contains documentation on the package (.Rd files)
    • Created automatically by roxygen2
  • Vignette is a more complete guide to your package
    • Usually done in Rmarkdown
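    • devtools::use_vignette("vignette-name") (moved to usethis::use_vignette() in current releases)
    • The vignette title in the YAML header and in \VignetteIndexEntry should match
  • Namespace makes sure that the correct function from the correct package is used, regardless of which packages are attached (in case of overlapping function names). Functions imported in NAMESPACE can be called inside the package without ::.
  • Data folder (data/) contains datasets that ship with the package
  • Source folder (src/) contains source code that needs compiling (C, C++, Fortran)
  • Install packages from github: devtools::install_github()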
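
The version number scheme described above can be checked with base R's package_version(), which compares versions component by component rather than as text. The version strings here are arbitrary examples.

```r
dev     <- package_version("0.1.0.9000")  # development version
release <- package_version("0.1.0")       # the release it builds on

# A development version sorts after the release it was branched from ...
dev > release

# ... and before the next patch, minor or major release.
package_version("0.1.1") > dev

# Components are compared as numbers, so 0.10.0 is newer than 0.9.0,
# which a plain string comparison would get wrong.
numeric_cmp <- package_version("0.10.0") > package_version("0.9.0")
string_cmp  <- "0.10.0" > "0.9.0"
c(numeric = numeric_cmp, string = string_cmp)
```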

0.3.4 Tasks

0.3.4.1 See separate files

  • To create a new package, click on file, new project, new directory, R package
  • Use the roxygen2 package to create basic documentation etc.
  • Use devtools::load_all() to reload the package after every change
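
As an illustration of what roxygen2 works from, here is a documented function as it might appear in a file under R/. The function rescale01 is a made-up example; the tags used (@param, @return, @examples, @export) are standard roxygen2 tags, which roxygen2 turns into an .Rd file in man/ and an export entry in NAMESPACE.

```r
#' Rescale a numeric vector to the range 0 to 1
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be ignored when finding the range?
#' @return A numeric vector of the same length as `x`.
#' @examples
#' rescale01(c(2, 4, 6))
#' @export
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(2, 4, 6))
```

The roxygen comments are ordinary R comments, so the file still runs as plain R; they only take effect when the documentation is generated.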

0.4 Knitr: Some lesser known features (Yihui Xie)

  • Book: Dynamic Documents with R and knitr
  • R Markdown is also compatible with Python and other languages
  • Access variables from a Python code chunk in an R code chunk (py$x, via the reticulate package)
  • knitr::spin() compiles an R script to a report
    • Use #+ to start a code chunk (with chunk options) and #' for text
  • Use include = FALSE to run a code chunk but leave its code and output out of the document
  • To live preview HTML documents, use xaringan::inf_mr('your.rmd')
  • knitr::knit_watch() will watch an input file continuously, and knit it when it is updated
  • Animations in Rplots:
    • FFmpeg
  • Change style in rmarkdown: Docco creates a changeable look on markdown documents (Docco classic style)
  • knitr::imgur_upload() uploads an image to imgur.com and returns the url
    • The reprex package uses this function to automatically embed plots in the output of reproducible examples
  • knitr::knit_expand() is used to combine values and text, like this:
    • knit_expand(text = "the value of pi is {{pi}}.")
    • This will create a line with the actual value of pi
  • To show plots later in the document, use knitr::fig_chunk()
    • ![a figure moved here](`r knitr::fig_chunk('cars_plot', 'png')`)
  • knitr::write_bib("knitr") creates a bibliography database for the knitr package
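
For the knitr::spin() point above, a script written for spinning could look like the sketch below. The chunk labels and the content are invented for illustration. Because the markup lives in comments, the file is also valid as an ordinary R script.

```r
#' # Stopping distances
#'
#' Lines that start with `#'` are turned into Markdown text by `knitr::spin()`.

#+ cars-summary
summary(cars)

#' A line starting with `#+` opens a new chunk and can carry chunk options.

#+ cars-model, echo = FALSE
fit <- lm(dist ~ speed, data = cars)
round(coef(fit), 2)
```

Saved as a hypothetical file report.R, it could be compiled with knitr::spin("report.R").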