“The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood, and verified.”
Max Kuhn, CRAN Task View: Reproducible Research
Remember: the data and code are real; the products (tables, figures) are ephemeral…
Peng 2011, Science 334(6060) pp. 1226-1227
“An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.”
Claerbout and Karrenbach, Proceedings of the 62nd Annual International Meeting of the Society of Exploration Geophysicists, 1992
Payoffs
Costs
Everything in Git is checksummed before it is stored and is then referred to by that checksum.
It’s impossible to change the contents of any file or directory without Git knowing. You can’t lose information in transit or get file corruption without Git being able to detect it.
A checksum (or hash) reduces any piece of digital information to a short, practically unique ID:
A 40-character hexadecimal SHA-1 hash: 24b9da6552252987aa493b52f8696cd6d3b00373
Git identifies file contents by this hash, not by filenames, extensions, etc. It’s the information that matters…
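To see that the ID depends only on the content, the sketch below hashes the same bytes two ways: with Git’s own `git hash-object`, and by hand with `sha1sum`. It relies on the object format described in the Pro Git book (SHA-1 over a header `blob <size>`, a NUL byte, then the contents) and assumes Git and GNU coreutils’ `sha1sum` are installed (macOS ships `shasum` instead).

```shell
# Hash a piece of content the way Git does; no repository is needed
# because -w (write to the object database) is not used.
git_id=$(echo 'test content' | git hash-object --stdin)

# Recompute it by hand: SHA-1 over "blob <size>\0<content>".
# 'test content' plus the trailing newline from echo is 13 bytes.
manual_id=$(printf 'blob 13\0test content\n' | sha1sum | cut -d' ' -f1)

echo "$git_id"
echo "$manual_id"
```

Both lines print `d670460b4b4aece5915caf5c68d12f560a9fe3e4`. Renaming the file would not change this ID; editing a single byte would.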
The important stuff (the full history, every stored object, and the configuration) is hidden in the .git folder.
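A quick way to look inside, assuming Git and `mktemp` are available; the repository name `demo` is an arbitrary example:

```shell
# Create a throwaway repository in a temporary directory.
repo=$(mktemp -d)/demo
git init "$repo"

# Everything Git tracks lives in the hidden .git folder:
# HEAD (current branch), config, objects/ (the hashed content), refs/ (branches).
ls -A "$repo/.git"
```

The working files sit next to `.git`; deleting `.git` deletes the history, while the files themselves remain.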
Instead of GitHub, you can host your own Git server or use another company, such as Bitbucket.
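Whichever host you pick, a local repository only knows about it through a named remote. A minimal sketch, assuming Git is installed; `origin` is Git’s usual name for the main remote, and the Bitbucket URL below is a hypothetical placeholder, not a real repository:

```shell
repo=$(mktemp -d)/project
git init "$repo"
cd "$repo"

# Register the hosted copy. Substitute the URL your host (GitHub,
# Bitbucket, your own server) gives you; this one is made up.
git remote add origin https://bitbucket.org/example-user/project.git

# The remote is stored as an ordinary configuration setting.
git config --get remote.origin.url
git remote -v
```

No network access is needed for these steps; Git contacts the host only on commands such as `git push` or `git pull`.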
Git tracks all changes to files inside a repository.
Select which changed files (added, deleted, or edited) you want to commit.
Add a commit message and click commit.
Push: click the green arrow to sync with GitHub.
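The same stage, commit, and push cycle can be run from a terminal. A minimal sketch, assuming Git is installed; to keep it self-contained, a local bare repository stands in for GitHub, and the file name, commit message, and identity are hypothetical:

```shell
set -e
work=$(mktemp -d)

# A local bare repository plays the role of GitHub in this sketch.
git init --bare "$work/remote.git"
git init "$work/project"
cd "$work/project"

# Identity recorded in each commit (example values).
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo 'x <- rnorm(10)' > analysis.R   # hypothetical analysis script
git status --short                   # changed files, as listed in the Git pane
git add analysis.R                   # tick the file's "Staged" box
git commit -m "Add analysis script"  # write a message and click Commit
git push origin HEAD                 # click the green Push arrow
```

`git push origin HEAD` pushes whatever branch is checked out, so the sketch works whether the default branch is called `master` or `main`.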
Git has many, many more features…
RStudio’s Git pane exposes only a limited subset of them.
$ git help <verb>
$ git <verb> --help
$ man git-<verb>
For example, you can get the manpage help for the config command by running git help config.
Similar to the info in the Git tab in RStudio
git config --list
shows you all the Git configuration settings, for example:
user.email
remote.origin.url (e.g. to connect to GitHub)
Branches are used to develop features in isolation from each other.
Default: the master branch (newer setups, including GitHub, often call it main). Use other branches for development or collaboration, and merge them back upon completion.
$ git checkout -b devel   # create a new branch and switch to it
$ git checkout master     # switch back to master
$ git merge devel         # merge in changes from the devel branch
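The three commands above can be tried end to end in a scratch repository. A minimal sketch, assuming Git is installed; the repository, file names, and identity are hypothetical, and the default branch’s name is looked up rather than assumed to be `master`, since newer configurations may call it `main`:

```shell
set -e
repo=$(mktemp -d)/branching
git init "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -m "Initial commit"
# Remember the default branch's name (master or main, depending on setup).
base=$(git rev-parse --abbrev-ref HEAD)

git checkout -b devel              # create new branch and switch to it
echo "new feature" >> notes.txt
git commit -am "Add feature on devel"

git checkout "$base"               # switch back to the default branch
git merge devel                    # merge in changes from devel
cat notes.txt
```

Because nothing changed on the default branch while `devel` was being worked on, `git merge` simply fast-forwards it, and `notes.txt` ends with the line added on `devel`.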
But we won’t do much with branching in this course…
Check out the (free) book Pro Git
Or the cheatsheet.
Slides adapted from Dr. Çetinkaya-Rundel and Ben Marwick’s presentation to the UW Center for Statistics and Social Sciences (12 March 2014)