Frontier Software

Not

By Robert Laing

Lets turn the query “Which students applied for ‘CS’?” around, again using Stanford University dean Jennifer Widom’s basic SQL examples using her college applications data.

Which students have not applied for CS?

A trap many people will fall into is to rewrite the σMajor = ‘CS’(Apply) selection to not equal, editing it into something like σMajor != ‘CS’(Apply).

That would return the set for not ‘CS’ as {123, 234, 345, 678, 765, 876}. Recalling that the students who applied for ‘CS’ were {123, 345, 543, 876, 987}, we see we shouldn’t have {123, 345, 876} in our result. The reason those students match Major <> 'CS' is student 123 also applied for ‘EE’, 345 applied for ‘EE’ and ‘bioengineering’ besides ‘CS’, while 876 applied for ‘biology’ and ‘marine biology’ besides ‘CS’.

But {234, 678, 765} also isn’t the correct answer because it leaves out {456, 567, 789, 654}.

college-classes.svg

There are broadly two ways of writing this type of query: using set difference, which for SQL means using EXCEPT as we’ll do in the next section, or thinking in terms of not a member of as we’ll do now. In relational algebra, the query we want to translate into SQL and Prolog looks something like:

σsID ∉ {ΠsIDMajor = ‘CS’(Apply))}(Student)

SQL

SQL offers NOT IN, as a subquery in the WHERE clause.

SELECT sid 
FROM Student
WHERE sid NOT IN (SELECT sid FROM Apply WHERE major = 'CS')
ORDER BY sid;

For those who enjoy their logic the Victorian syllogistic way, a synonymous way of writing the above is:

SELECT sid 
FROM Student
WHERE sid <> ALL(SELECT sid FROM Apply WHERE major = 'CS')
ORDER BY sid;

SQL, or at least Postgresql, has two versions of ALL, one which expects its input argument to be a subquery, and another which accepts an array.

SELECT sid 
FROM Student
WHERE sid <> ALL(ARRAY[123, 345, 543, 876, 987])
ORDER BY sid;

Prolog

The non list way of doing this is:

student(SID, _, _, _),
\+ apply(SID, _, 'CS', _).

If we allow lists, we can use Prolog’s handy builtin memberchk(?Elem, +List) to check for member inclusion and exclusion:

findall(CID, apply(CID, _, 'CS', _), CIDs),
findall(SID, (student(SID, _, _, _), \+ memberchk(SID, CIDs)), SIDs),
sort(SIDs, OSIDs).

Which returns OSIDs = [234, 456, 567, 654, 678, 765, 789].

By thinking of ordered lists as sets and jumping ahead to det differences, we can alternatively use subtract(+Set, +Delete, -Result).

findall(SID, student(SID, _, _, _), SIDs),
sort(SIDs, OSIDs),
findall(CID, apply(CID, _, 'CS', _), CIDs),
sort(CIDs, OCIDs),
subtract(OSIDs, OCIDs, Diff).

Which returns Diff = [234, 456, 567, 654, 678, 765, 789].

Next in difference, we’ll redo this example using SQL’s EXCEPT operator.