Frontier Software

Inner Join

By Robert Laing

Relational algebra uses various bowtie symbols to describe binary operations on two tables, which are anologous to combining two sets as I’ll try to explain in this diagram.

P - Q P ∩ Q Q - P P Q

The above illustrates what relational algebra calls a full outer join which combines three parts, the inner or natural join P ⋈ Q which equates to P ∩ Q, and what this section covers, flanked by left and right antijoins P ▷ Q which equats to P - Q, and Q ▷ P which equats to Q - P.

Since outer joins involve doing a union, as in a left outer join P ⟕ Q is (P - Q) ∪ (P ∩ Q), I’ll cover them as a subsection of disjunction.

SQL has join types to use in the FROM clause, which is probably better style since it’s closer to relational algebra. But joins can also be done using AND and OR in the WHERE clause.

Three-way inner join

Lets create a single view of the three tables in the college by doing a three-way inner join Student ⋈ Apply ⋈ College

sid sname gpa sizehs major decision cname state enrollment
123 Amy 3.9 1000 CS Y Berkeley CA 36000
123 Amy 3.9 1000 EE Y Cornell NY 21000
123 Amy 3.9 1000 EE N Stanford CA 15000
123 Amy 3.9 1000 CS Y Stanford CA 15000
234 Bob 3.6 1500 biology N Berkeley CA 36000
345 Craig 3.5 500 bioengineering Y MIT MA 10000
345 Craig 3.5 500 EE N Cornell NY 21000
345 Craig 3.5 500 bioengineering N Cornell NY 21000
345 Craig 3.5 500 CS Y Cornell NY 21000
543 Craig 3.4 2000 CS N MIT MA 10000
678 Fay 3.8 200 history Y Stanford CA 15000
765 Jay 2.9 1500 history N Cornell NY 21000
765 Jay 2.9 1500 history Y Stanford CA 15000
765 Jay 2.9 1500 psychology Y Cornell NY 21000
876 Irene 3.9 400 CS N Stanford CA 15000
876 Irene 3.9 400 biology Y MIT MA 10000
876 Irene 3.9 400 marine biology N MIT MA 10000
987 Helen 3.7 800 CS Y Stanford CA 15000
987 Helen 3.7 800 CS Y Berkeley CA 36000

Note students {456, 567, 789, 654} aren’t in the inner join. How to include them will be covered in [/classical-logic/disjunction/outer-join/]

We have various options to get the above.

Natural Join

The natural join is arguably one of the most important operators since it is the relational counterpart of the logical AND operator. Note that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value (this is a consequence of the idempotence of the logical AND). Wikipedia article on relational algebra

Since College and Apply have a shared cName column, and Apply and Student have a shared sID column, these can be natural joins.

A subtletly with natural joins or alternatively USING(shared_column) is we don’t have to resolve ambiguities over shared_column by using tablename.shared_column in the SELECT and ORDER BY clauses as would be required with AND table1.shared_column = table2.shared_column.

SELECT sID, sName, GPA, sizeHS, major, decision, cName, state, enrollment
FROM Student NATURAL INNER JOIN Apply NATURAL INNER JOIN College;

Inner Join with ON

If the name column in College was called Name and the ID column in Student simply ID while Apply’s corresponding columns retained their names cName and sID, a more verbose version of the above query would be needed:

SELECT Student.sID, sName, GPA, sizeHS, major, decision, College.cName, state, enrollment
FROM Student INNER JOIN Apply ON Student.sID = Apply.sID
  INNER JOIN College ON Apply.cName = College.cName;

Inner Join with USING

Rather than ON Student.sID = Apply.sID, SQL allows USING(sID). In this case it’s equivalent to NATURAL. Again, we can use sID and cName rather than Student.sID and College.cName, which is important for outer joins.

SELECT sID, sName, GPA, sizeHS, major, decision, cName, state, enrollment
FROM Student INNER JOIN Apply USING(sID) INNER JOIN College USING(cName);

Plain Vanilla SQL

SELECT Student.sID, sName, GPA, sizeHS, major, decision, College.cName, state, enrollment
FROM Student, Apply, College 
WHERE Student.sID = Apply.sID AND Apply.cName = College.cName;

Prolog

Inner joins in Prolog simply require using the same variable names for the joining columns in their respective predicates.

findall(view(SID, SName, GPA, SizeHS, Major, Decision, CName, State, Enrollment), (
student(SID, SName, GPA, SizeHS),
apply(SID, CName, Major, Decision),
college(CName, State, Enrollment)
), List).
[ view(123,'Amy',3.9,1000,'CS','Y','Stanford','CA',15000)
, view(123,'Amy',3.9,1000,'EE','N','Stanford','CA',15000)
, view(123,'Amy',3.9,1000,'CS','Y','Berkeley','CA',36000)
, view(123,'Amy',3.9,1000,'EE','Y','Cornell','NY',21000)
, view(234,'Bob',3.6,1500,biology,'N','Berkeley','CA',36000)
, view(345,'Craig',3.5,500,bioengineering,'Y','MIT','MA',10000)
, view(345,'Craig',3.5,500,bioengineering,'N','Cornell','NY',21000)
, view(345,'Craig',3.5,500,'CS','Y','Cornell','NY',21000)
, view(345,'Craig',3.5,500,'EE','N','Cornell','NY',21000)
, view(678,'Fay',3.8,200,history,'Y','Stanford','CA',15000)
, view(678,'Fay',3.8,200,psychology,'Y','Stanford','CA',15000)
, view(987,'Helen',3.7,800,'CS','Y','Stanford','CA',15000)
, view(987,'Helen',3.7,800,'CS','Y','Berkeley','CA',36000)
, view(876,'Irene',3.9,400,'CS','N','Stanford','CA',15000)
, view(876,'Irene',3.9,400,biology,'Y','MIT','MA',10000)
, view(876,'Irene',3.9,400,'marine biology','N','MIT','MA',10000)
, view(765,'Jay',2.9,1500,history,'Y','Stanford','CA',15000)
, view(765,'Jay',2.9,1500,history,'N','Cornell','NY',21000)
, view(765,'Jay',2.9,1500,psychology,'Y','Cornell','NY',21000)
, view(543,'Craig',3.4,2000,'CS','N','MIT','MA',10000)
]