Fill Values

February 6, 2022

By Robert Laing

The problem of how to deal with unknown values for logical variables led to the development of three-valued logic, which SQL implemented with NULL, and Prolog with unset variables.

Which students have not made any college applications yet?

In preparation of outer joins, lets here do the portion called an antijoin

Student ▷ Apply

Our basic goal here is to find the set {456, 567, 654, 789}.

In the example so far involving both the Student and Apply tables — “Which students have not applied for CS?” — I sidestepped the problem that the student table has columns sID, sName, GPA and sizeHS while Apply has columns sID, cName, major and decision by projecting both to their single common column, SID.

To explain the problem, lets look at the type (ie columns or attributes) of the two tables we want to think of as sets P and Q, thereby allowing P - Q:

Student Table
sid	sname	gpa	sizehs
123	Amy	3.9	1000
234	Bob	3.6	1500
345	Craig	3.5	500
456	Doris	3.9	1000
567	Edward	2.9	2000
678	Fay	3.8	200
789	Gary	3.4	800
987	Helen	3.7	800
876	Irene	3.9	400
765	Jay	2.9	1500
654	Amy	3.9	1000
543	Craig	3.4	2000

Apply Table
sid	cname	major	decision
123	Stanford	CS	Y
123	Stanford	EE	N
123	Berkeley	CS	Y
123	Cornell	EE	Y
234	Berkeley	biology	N
345	MIT	bioengineering	Y
345	Cornell	bioengineering	N
345	Cornell	CS	Y
345	Cornell	EE	N
678	Stanford	history	Y
987	Stanford	CS	Y
987	Berkeley	CS	Y
876	Stanford	CS	N
876	MIT	biology	Y
876	MIT	marine biology	N
765	Stanford	history	Y
765	Cornell	history	N
765	Cornell	psychology	Y
543	MIT	CS	N

SQL

Here’s the error I get if I try to take the difference of these two tables in SQL as is:

dbclass=> SELECT SID, SName, GPA, SizeHS FROM Student
EXCEPT
SELECT SID, CName, Major, Decision FROM Apply;
ERROR:  EXCEPT types real and text cannot be matched
LINE 3: SELECT SID, CName, Major, Decision FROM Apply;

To get the rows of the Student table to match Apply’s column, I need to fill in the missing values with NULL. This is the first time I’m using relational algebra’s rename operator ρ, which in SQL is AS.

SELECT sID, NULL AS cName, NULL AS major, NULL AS decision
FROM Student
EXCEPT
SELECT sID, NULL, NULL, NULL
FROM Apply
ORDER BY sID;

When psql outputs the table, it represents the NULL values as blanks.

Student ▷ Apply
sid	cname	major	decision
456
567
654
789

Prolog

Prolog doesn’t throw an error when asked to do Student - Apply. Instead, it simply ignores the missing apply column values while keeping the known student column values:

?- student(SID, SName, GPA, SizeHS), \+ apply(SID, CName, Major, Decision).
SID = 456,
SName = 'Doris',
GPA = 3.9,
SizeHS = 1000 ;
SID = 567,
SName = 'Edward',
GPA = 2.9,
SizeHS = 2000 ;
SID = 789,
SName = 'Gary',
GPA = 3.4,
SizeHS = 800 ;
SID = 654,
SName = 'Amy',
GPA = 3.9,
SizeHS = 1000 ;
false.

Prolog doesn’t have the equivalent of a NULL. Lets use the Template argument of findall(+Template, :Goal, -Bag) to project the four sID values for which there are no CName, Major, Decision:

findall(apply(SID, CName, Major, Decision), (
  student(SID, _, _, _), 
  \+ apply(SID, CName, Major, Decision)
), List).

The result is:

List = [
apply(456, _, _, _), 
apply(567, _, _, _), 
apply(789, _, _, _), 
apply(654, _, _, _)
].

Instead of NULL, Prolog uses unset variables as placeholders.