planes$tailnum
is a primary key because it uniquely identifies each plane in the planes table.flights$tailnum
is a foreign key because it appears in the flights table where it matches each flight to a unique plane.Matches pairs of observations whenever their keys are equal:
left_join
keeps all observations in xright_join
keeps all observations in yfull_join
keeps all observations in x and ysemi_join(x, y)
keeps all observations in x that have a match in y.anti_join(x, y)
drops all observations in x that have a match in y.Anti-joins are useful for diagnosing join mismatches.
merge()
base::merge()
can perform all four types of joins:
dplyr | merge |
---|---|
inner_join(x, y) |
merge(x, y) |
left_join(x, y) |
merge(x, y, all.x = TRUE) |
right_join(x, y) |
merge(x, y, all.y = TRUE) |
full_join(x, y) |
merge(x, y, all.x = TRUE, all.y = TRUE) |
SQL is the inspiration for dplyr’s conventions, so the translation is straightforward:
dplyr | SQL |
---|---|
inner_join(x, y, by = "z") |
SELECT * FROM x INNER JOIN y USING (z) |
left_join(x, y, by = "z") |
SELECT * FROM x LEFT OUTER JOIN y USING (z) |
right_join(x, y, by = "z") |
SELECT * FROM x RIGHT OUTER JOIN y USING (z) |
full_join(x, y, by = "z") |
SELECT * FROM x FULL OUTER JOIN y USING (z) |
intersect(x, y)
: return only observations in both x and y.union(x, y)
: return unique observations in x and y.setdiff(x, y)
: return observations in x, but not in y.