Serial Computing

Most (legacy) software is written for serial computation:

  • Problem broken into discrete set of instructions
  • Instructions executed sequentially on a single processor
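
As a minimal sketch of the serial pattern, the loop below handles one input at a time on a single processor. The function slow_square() and its inputs are hypothetical stand-ins for a real analysis step, not something taken from the original material.

```r
# Serial computation: each iteration starts only after the previous one ends.
# slow_square() is a hypothetical placeholder for a real analysis step.
slow_square <- function(x) {
  Sys.sleep(0.05)  # pretend the work takes a while
  x^2
}

inputs <- 1:5
results <- numeric(length(inputs))

# One instruction stream, one processor, one input at a time
for (i in seq_along(inputs)) {
  results[i] <- slow_square(inputs[i])
}

results
```

Total run time grows roughly linearly with the number of inputs, because no two iterations ever overlap.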



Parallel Computing

  • Problem divided into discrete parts that can be solved concurrently
  • Instructions executed simultaneously on different processors
  • Overall control/coordination mechanism
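
The three points above can be sketched with the built-in parallel package. This is one possible illustration under stated assumptions: the two-worker cluster size and the slow_square() function are arbitrary, hypothetical choices.

```r
library(parallel)

# Hypothetical placeholder for a real analysis step
slow_square <- function(x) {
  Sys.sleep(0.05)
  x^2
}

# Coordination: the main R session creates the worker processes...
cl <- makeCluster(2)  # two workers is an arbitrary choice

# ...splits the inputs among them so the parts run concurrently...
par_results <- parLapply(cl, 1:6, slow_square)

# ...and shuts the workers down once all results have been returned.
stopCluster(cl)

unlist(par_results)
```

The main session plays the role of the control mechanism: it distributes the discrete parts, waits, and collects the results in their original order.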


Flynn's taxonomy

A classification of computer architectures (Flynn, 1972)

Four Categories

  1. Single Instruction, Single Data (SISD)
    • No parallelization
  2. Single Instruction, Multiple Data (SIMD)
    • Run the same code/analysis on different datasets
    • Examples:
      • different species in species distribution model
      • same species under different climates

  3. Multiple Instruction, Single Data (MISD)
    • Run different code/analyses on the same data
    • Examples:
      • One species, multiple models
  4. Multiple Instruction, Multiple Data (MIMD)
    • Run different code/analyses on different data
    • Examples:
      • Different species & different models
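
The species and model examples above might look like the following in R. This is only a sketch: the built-in iris data stands in for per-species datasets, and the two regression formulas are hypothetical models chosen for illustration.

```r
# Assumption: iris stands in for separate per-species datasets.
by_species <- split(iris, iris$Species)

# Assumption: two simple regressions stand in for "different models".
formulas <- list(
  linear    = Sepal.Length ~ Petal.Length,
  quadratic = Sepal.Length ~ poly(Petal.Length, 2)
)

# SIMD (as used above): same analysis, different datasets
simd_fits <- lapply(by_species, function(d) lm(formulas$linear, data = d))

# MISD: different analyses, same dataset
misd_fits <- lapply(formulas, function(f) lm(f, data = by_species$setosa))

# MIMD: every model on every dataset
combos <- expand.grid(species = names(by_species), model = names(formulas),
                      stringsAsFactors = FALSE)
mimd_fits <- Map(function(s, m) lm(formulas[[m]], data = by_species[[s]]),
                 combos$species, combos$model)
```

In each case the list elements are independent of one another, which is what makes them candidates for running in parallel.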


Our focus: Single Instruction, Multiple Data (SIMD)

  1. Parallel functions within an R script
    • starts on a single processor
    • runs loop iterations on multiple 'worker' processors
    • returns the results of all iterations to the original instance
    • example packages: foreach, multicore, plyr, raster
  2. Alternative: run many separate instances of R in parallel with Rscript
    • needs a separate step to combine the results
    • preferable for long, complex jobs
    • not covered in this session
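
The first approach can be sketched with foreach as follows. Note the assumptions: the doParallel package is used here as the backend (it is not named in the list above), and the two-worker cluster is an arbitrary choice.

```r
library(foreach)
library(doParallel)  # assumption: doParallel chosen as the backend

# The script starts in a single R process, which creates the workers
cl <- parallel::makeCluster(2)  # two workers is an arbitrary choice
registerDoParallel(cl)

# Each iteration of the loop body runs on one of the worker processes
squares <- foreach(i = 1:4) %dopar% {
  i^2
}

# All results are now back in the original R session, as a list
parallel::stopCluster(cl)

unlist(squares)
```

By default foreach returns a list with one element per iteration, in iteration order, so no separate combining step is needed.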

R Packages

There are many R packages for parallelization; see the CRAN Task View on High-Performance and Parallel Computing for an overview. For example:

  • Rmpi: Built on MPI (Message Passing Interface), a de facto standard in parallel computing.
  • snow: Simple Network of Workstations; can use several standards (PVM, MPI, NWS).
  • parallel: built-in R package (since R 2.14.0).
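
Because parallel ships with R, it can be tried without installing anything. The sketch below uses mclapply(), which relies on forking and therefore runs in parallel only on Unix-like systems; falling back to a single core on Windows is an assumption made here so the example runs everywhere.

```r
library(parallel)

# Number of cores R can see on this machine (may be NA on some platforms)
n_cores <- detectCores()

# mclapply() forks the current session; forking is unavailable on Windows,
# so use one core there (assumption made for portability of this sketch)
workers <- if (.Platform$OS.type == "windows") 1L else 2L

# Drop-in replacement for lapply(): same arguments plus mc.cores
res <- mclapply(1:4, function(x) x * 10, mc.cores = workers)

unlist(res)
```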

Foreach Package

In this session we'll focus on the foreach package, which has numerous advantages including:

  • intuitive for() loop-like syntax
  • flexibility of parallel 'backends' from laptops to supercomputers (multicore, parallel, snow, Rmpi, etc.)
  • nice options for combining output from parallelized jobs
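
To illustrate the last two points, the sketch below uses the .combine argument to control how per-iteration results are merged. It registers foreach's sequential backend with registerDoSEQ() so that it runs without any extra packages; registering a parallel backend instead should leave the loops themselves unchanged.

```r
library(foreach)

# Sequential backend shipped with foreach; swap in a parallel backend
# (for example one built on parallel or snow) without editing the loops
registerDoSEQ()

# Combine results into a vector
as_vector <- foreach(i = 1:3, .combine = c) %dopar% {
  i^2
}

# Combine results row-wise into a matrix
as_matrix <- foreach(i = 1:3, .combine = rbind) %dopar% {
  c(i = i, sq = i^2)
}

# Reduce results to a single number by adding them up
as_sum <- foreach(i = 1:3, .combine = "+") %dopar% {
  i^2
}
```

Without .combine, foreach returns a plain list; choosing a combine function up front avoids a separate post-processing step.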

Documentation for foreach: