Fill Values
By Robert Laing
The problem of how to deal with unknown values for logical variables led to the development of three-valued logic, which SQL implemented with NULL, and Prolog with unset variables.
Which students have not made any college applications yet?
In preparation of outer joins, lets here do the portion called an antijoin
Student ▷ Apply
Our basic goal here is to find the set {456, 567, 654, 789}.
In the example so far involving both the Student
and Apply
tables —
“Which students have not applied for CS?” — I sidestepped the problem
that the student table has columns sID, sName, GPA
and sizeHS
while
Apply has columns sID, cName, major
and decision
by
projecting both to their single common column, SID.
To explain the problem, lets look at the type (ie columns or attributes) of the two tables we want to think of as sets P and Q, thereby allowing P - Q:
sid | sname | gpa | sizehs |
---|---|---|---|
123 | Amy | 3.9 | 1000 |
234 | Bob | 3.6 | 1500 |
345 | Craig | 3.5 | 500 |
456 | Doris | 3.9 | 1000 |
567 | Edward | 2.9 | 2000 |
678 | Fay | 3.8 | 200 |
789 | Gary | 3.4 | 800 |
987 | Helen | 3.7 | 800 |
876 | Irene | 3.9 | 400 |
765 | Jay | 2.9 | 1500 |
654 | Amy | 3.9 | 1000 |
543 | Craig | 3.4 | 2000 |
sid | cname | major | decision |
---|---|---|---|
123 | Stanford | CS | Y |
123 | Stanford | EE | N |
123 | Berkeley | CS | Y |
123 | Cornell | EE | Y |
234 | Berkeley | biology | N |
345 | MIT | bioengineering | Y |
345 | Cornell | bioengineering | N |
345 | Cornell | CS | Y |
345 | Cornell | EE | N |
678 | Stanford | history | Y |
987 | Stanford | CS | Y |
987 | Berkeley | CS | Y |
876 | Stanford | CS | N |
876 | MIT | biology | Y |
876 | MIT | marine biology | N |
765 | Stanford | history | Y |
765 | Cornell | history | N |
765 | Cornell | psychology | Y |
543 | MIT | CS | N |
SQL
Here’s the error I get if I try to take the difference of these two tables in SQL as is:
dbclass=> SELECT SID, SName, GPA, SizeHS FROM Student
EXCEPT
SELECT SID, CName, Major, Decision FROM Apply;
ERROR: EXCEPT types real and text cannot be matched
LINE 3: SELECT SID, CName, Major, Decision FROM Apply;
To get the rows of the Student table to match Apply’s column, I need to fill in the missing
values with NULL
. This is the first time I’m using relational algebra’s
rename operator ρ,
which in SQL is AS
.
SELECT sID, NULL AS cName, NULL AS major, NULL AS decision
FROM Student
EXCEPT
SELECT sID, NULL, NULL, NULL
FROM Apply
ORDER BY sID;
When psql outputs the table, it represents the NULL
values as blanks.
sid | cname | major | decision |
---|---|---|---|
456 | |||
567 | |||
654 | |||
789 |
Prolog
Prolog doesn’t throw an error when asked to do Student - Apply. Instead, it simply ignores the missing apply column values while keeping the known student column values:
?- student(SID, SName, GPA, SizeHS), \+ apply(SID, CName, Major, Decision).
SID = 456,
SName = 'Doris',
GPA = 3.9,
SizeHS = 1000 ;
SID = 567,
SName = 'Edward',
GPA = 2.9,
SizeHS = 2000 ;
SID = 789,
SName = 'Gary',
GPA = 3.4,
SizeHS = 800 ;
SID = 654,
SName = 'Amy',
GPA = 3.9,
SizeHS = 1000 ;
false.
Prolog doesn’t have the equivalent of a NULL
. Lets use the Template
argument of
findall(+Template, :Goal, -Bag)
to project the four sID values for which there are no CName, Major, Decision
:
findall(apply(SID, CName, Major, Decision), (
student(SID, _, _, _),
\+ apply(SID, CName, Major, Decision)
), List).
The result is:
List = [
apply(456, _, _, _),
apply(567, _, _, _),
apply(789, _, _, _),
apply(654, _, _, _)
].
Instead of NULL
, Prolog uses unset variables as placeholders.