

planes$tailnum is a primary key because it uniquely identifies each plane in the planes table.flights$tailnum is a foreign key because it appears in the flights table where it matches each flight to a unique plane.
Matches pairs of observations whenever their keys are equal:
left_join keeps all observations in xright_join keeps all observations in yfull_join keeps all observations in x and y



semi_join(x, y) keeps all observations in x that have a match in y.
anti_join(x, y) drops all observations in x that have a match in y.
Anti-joins are useful for diagnosing join mismatches.
merge()base::merge() can perform all four types of joins:
| dplyr | merge | 
|---|---|
inner_join(x, y) | 
merge(x, y) | 
left_join(x, y) | 
merge(x, y, all.x = TRUE) | 
right_join(x, y) | 
merge(x, y, all.y = TRUE) | 
full_join(x, y) | 
merge(x, y, all.x = TRUE, all.y = TRUE) | 
SQL is the inspiration for dplyr’s conventions, so the translation is straightforward:
| dplyr | SQL | 
|---|---|
inner_join(x, y, by = "z") | 
SELECT * FROM x INNER JOIN y USING (z) | 
left_join(x, y, by = "z") | 
SELECT * FROM x LEFT OUTER JOIN y USING (z) | 
right_join(x, y, by = "z") | 
SELECT * FROM x RIGHT OUTER JOIN y USING (z) | 
full_join(x, y, by = "z") | 
SELECT * FROM x FULL OUTER JOIN y USING (z) | 
intersect(x, y): return only observations in both x and y.union(x, y): return unique observations in x and y.setdiff(x, y): return observations in x, but not in y.