Inner Join
By Robert Laing
Relational algebra uses various bowtie symbols to describe binary operations on two tables, which are anologous to combining two sets as I’ll try to explain in this diagram.
The above illustrates what relational algebra calls a full outer join which combines three parts, the inner or natural join P ⋈ Q which equates to P ∩ Q, and what this section covers, flanked by left and right antijoins P ▷ Q which equats to P - Q, and Q ▷ P which equats to Q - P.
Since outer joins involve doing a union, as in a left outer join P ⟕ Q is (P - Q) ∪ (P ∩ Q), I’ll cover them as a subsection of disjunction.
SQL has
join types
to use in the FROM
clause, which is probably better style since it’s closer to relational algebra.
But joins can also be done using AND
and OR
in the WHERE
clause.
Three-way inner join
Lets create a single view of the three tables in the college by doing a three-way inner join Student ⋈ Apply ⋈ College
sid | sname | gpa | sizehs | major | decision | cname | state | enrollment |
---|---|---|---|---|---|---|---|---|
123 | Amy | 3.9 | 1000 | CS | Y | Berkeley | CA | 36000 |
123 | Amy | 3.9 | 1000 | EE | Y | Cornell | NY | 21000 |
123 | Amy | 3.9 | 1000 | EE | N | Stanford | CA | 15000 |
123 | Amy | 3.9 | 1000 | CS | Y | Stanford | CA | 15000 |
234 | Bob | 3.6 | 1500 | biology | N | Berkeley | CA | 36000 |
345 | Craig | 3.5 | 500 | bioengineering | Y | MIT | MA | 10000 |
345 | Craig | 3.5 | 500 | EE | N | Cornell | NY | 21000 |
345 | Craig | 3.5 | 500 | bioengineering | N | Cornell | NY | 21000 |
345 | Craig | 3.5 | 500 | CS | Y | Cornell | NY | 21000 |
543 | Craig | 3.4 | 2000 | CS | N | MIT | MA | 10000 |
678 | Fay | 3.8 | 200 | history | Y | Stanford | CA | 15000 |
765 | Jay | 2.9 | 1500 | history | N | Cornell | NY | 21000 |
765 | Jay | 2.9 | 1500 | history | Y | Stanford | CA | 15000 |
765 | Jay | 2.9 | 1500 | psychology | Y | Cornell | NY | 21000 |
876 | Irene | 3.9 | 400 | CS | N | Stanford | CA | 15000 |
876 | Irene | 3.9 | 400 | biology | Y | MIT | MA | 10000 |
876 | Irene | 3.9 | 400 | marine biology | N | MIT | MA | 10000 |
987 | Helen | 3.7 | 800 | CS | Y | Stanford | CA | 15000 |
987 | Helen | 3.7 | 800 | CS | Y | Berkeley | CA | 36000 |
Note students {456, 567, 789, 654} aren’t in the inner join. How to include them will be covered in [/classical-logic/disjunction/outer-join/]
We have various options to get the above.
Natural Join
The natural join is arguably one of the most important operators since it is the relational counterpart of the logical AND operator. Note that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value (this is a consequence of the idempotence of the logical AND).— Wikipedia article on relational algebra
Since College and Apply have a shared cName column, and Apply and Student have a shared sID column, these can be natural joins.
A subtletly with natural joins or alternatively USING(shared_column)
is we don’t have to resolve
ambiguities over shared_column
by using tablename.shared_column
in the SELECT
and
ORDER BY
clauses as would be required with AND table1.shared_column = table2.shared_column
.
SELECT sID, sName, GPA, sizeHS, major, decision, cName, state, enrollment
FROM Student NATURAL INNER JOIN Apply NATURAL INNER JOIN College;
Inner Join with ON
If the name column in College was called Name and the ID column in Student simply ID while Apply’s corresponding columns retained their names cName and sID, a more verbose version of the above query would be needed:
SELECT Student.sID, sName, GPA, sizeHS, major, decision, College.cName, state, enrollment
FROM Student INNER JOIN Apply ON Student.sID = Apply.sID
INNER JOIN College ON Apply.cName = College.cName;
Inner Join with USING
Rather than ON Student.sID = Apply.sID
, SQL allows USING(sID)
.
In this case it’s equivalent to NATURAL
. Again, we can
use sID
and cName
rather than Student.sID
and College.cName
, which
is important for outer joins.
SELECT sID, sName, GPA, sizeHS, major, decision, cName, state, enrollment
FROM Student INNER JOIN Apply USING(sID) INNER JOIN College USING(cName);
Plain Vanilla SQL
SELECT Student.sID, sName, GPA, sizeHS, major, decision, College.cName, state, enrollment
FROM Student, Apply, College
WHERE Student.sID = Apply.sID AND Apply.cName = College.cName;
Prolog
Inner joins in Prolog simply require using the same variable names for the joining columns in their respective predicates.
findall(view(SID, SName, GPA, SizeHS, Major, Decision, CName, State, Enrollment), (
student(SID, SName, GPA, SizeHS),
apply(SID, CName, Major, Decision),
college(CName, State, Enrollment)
), List).
[ view(123,'Amy',3.9,1000,'CS','Y','Stanford','CA',15000)
, view(123,'Amy',3.9,1000,'EE','N','Stanford','CA',15000)
, view(123,'Amy',3.9,1000,'CS','Y','Berkeley','CA',36000)
, view(123,'Amy',3.9,1000,'EE','Y','Cornell','NY',21000)
, view(234,'Bob',3.6,1500,biology,'N','Berkeley','CA',36000)
, view(345,'Craig',3.5,500,bioengineering,'Y','MIT','MA',10000)
, view(345,'Craig',3.5,500,bioengineering,'N','Cornell','NY',21000)
, view(345,'Craig',3.5,500,'CS','Y','Cornell','NY',21000)
, view(345,'Craig',3.5,500,'EE','N','Cornell','NY',21000)
, view(678,'Fay',3.8,200,history,'Y','Stanford','CA',15000)
, view(678,'Fay',3.8,200,psychology,'Y','Stanford','CA',15000)
, view(987,'Helen',3.7,800,'CS','Y','Stanford','CA',15000)
, view(987,'Helen',3.7,800,'CS','Y','Berkeley','CA',36000)
, view(876,'Irene',3.9,400,'CS','N','Stanford','CA',15000)
, view(876,'Irene',3.9,400,biology,'Y','MIT','MA',10000)
, view(876,'Irene',3.9,400,'marine biology','N','MIT','MA',10000)
, view(765,'Jay',2.9,1500,history,'Y','Stanford','CA',15000)
, view(765,'Jay',2.9,1500,history,'N','Cornell','NY',21000)
, view(765,'Jay',2.9,1500,psychology,'Y','Cornell','NY',21000)
, view(543,'Craig',3.4,2000,'CS','N','MIT','MA',10000)
]