Frontier Software

Truth

By Robert Laing

What is true?

For our purposes, any garbage is true, provided it’s in the database.

Prolog possibly makes that clearer by saying you assert(+Term) “facts” into the database as opposed to SQL’s INSERT.

Furthermore, truth is mutable, something which upset ancient Greek philosopher Cratylus so much, he stopped speaking and would only communicate by wagging his finger.

To back up the assertion I made in the intro that all boolean functions (aka predicates) boil down to either a ≤ b or a = b which can then be strung together with and, or, and not, I’d like to quickly run through the the introductory SQL examples from Stanford University dean Jennifer Widom’s Relational Databases and SQL online course. I’ve put her original SQL college tables example below in SQL followed by my translation into Prolog, which I originally put on SWI Prolog webfrontend SWISH.

I’ve included the original SQL database creation script and my Prolog translation below: While doing this page, discovered psql’s \H command whereby select * from student; ouputs this:

Student Table
sid sname gpa sizehs
123 Amy 3.9 1000
234 Bob 3.6 1500
345 Craig 3.5 500
456 Doris 3.9 1000
567 Edward 2.9 2000
678 Fay 3.8 200
789 Gary 3.4 800
987 Helen 3.7 800
876 Irene 3.9 400
765 Jay 2.9 1500
654 Amy 3.9 1000
543 Craig 3.4 2000

select * from apply;

Apply Table
sid cname major decision
123 Stanford CS Y
123 Stanford EE N
123 Berkeley CS Y
123 Cornell EE Y
234 Berkeley biology N
345 MIT bioengineering Y
345 Cornell bioengineering N
345 Cornell CS Y
345 Cornell EE N
678 Stanford history Y
987 Stanford CS Y
987 Berkeley CS Y
876 Stanford CS N
876 MIT biology Y
876 MIT marine biology N
765 Stanford history Y
765 Cornell history N
765 Cornell psychology Y
543 MIT CS N

select * from college;

College Table
cname state enrollment
Stanford CA 15000
Berkeley CA 36000
MIT MA 10000
Cornell NY 21000

Basic College example

SQL

drop table if exists College;
drop table if exists Student;
drop table if exists Apply;

create table College(cName text, state text, enrollment int);
create table Student(sID int, sName text, GPA real, sizeHS int);
create table Apply(sID int, cName text, major text, decision text);

insert into Student values (123, 'Amy', 3.9, 1000);
insert into Student values (234, 'Bob', 3.6, 1500);
insert into Student values (345, 'Craig', 3.5, 500);
insert into Student values (456, 'Doris', 3.9, 1000);
insert into Student values (567, 'Edward', 2.9, 2000);
insert into Student values (678, 'Fay', 3.8, 200);
insert into Student values (789, 'Gary', 3.4, 800);
insert into Student values (987, 'Helen', 3.7, 800);
insert into Student values (876, 'Irene', 3.9, 400);
insert into Student values (765, 'Jay', 2.9, 1500);
insert into Student values (654, 'Amy', 3.9, 1000);
insert into Student values (543, 'Craig', 3.4, 2000);

insert into College values ('Stanford', 'CA', 15000);
insert into College values ('Berkeley', 'CA', 36000);
insert into College values ('MIT', 'MA', 10000);
insert into College values ('Cornell', 'NY', 21000);

insert into Apply values (123, 'Stanford', 'CS', 'Y');
insert into Apply values (123, 'Stanford', 'EE', 'N');
insert into Apply values (123, 'Berkeley', 'CS', 'Y');
insert into Apply values (123, 'Cornell', 'EE', 'Y');
insert into Apply values (234, 'Berkeley', 'biology', 'N');
insert into Apply values (345, 'MIT', 'bioengineering', 'Y');
insert into Apply values (345, 'Cornell', 'bioengineering', 'N');
insert into Apply values (345, 'Cornell', 'CS', 'Y');
insert into Apply values (345, 'Cornell', 'EE', 'N');
insert into Apply values (678, 'Stanford', 'history', 'Y');
insert into Apply values (987, 'Stanford', 'CS', 'Y');
insert into Apply values (987, 'Berkeley', 'CS', 'Y');
insert into Apply values (876, 'Stanford', 'CS', 'N');
insert into Apply values (876, 'MIT', 'biology', 'Y');
insert into Apply values (876, 'MIT', 'marine biology', 'N');
insert into Apply values (765, 'Stanford', 'history', 'Y');
insert into Apply values (765, 'Cornell', 'history', 'N');
insert into Apply values (765, 'Cornell', 'psychology', 'Y');
insert into Apply values (543, 'MIT', 'CS', 'N');

Prolog

My translation into Prolog looks as follows:

%! college(?CName:text, ?State:text, ?Enrollment:integer) is nondet
college('Stanford', 'CA', 15000).
college('Berkeley', 'CA', 36000).
college('MIT',      'MA', 10000).
college('Cornell',  'NY', 21000).

%! student(?SID:text, ?SName:text, ?GPA:float, ?SizeHS:integer) is nondet 
student(123, 'Amy',    3.9, 1000).
student(234, 'Bob',    3.6, 1500).
student(345, 'Craig',  3.5, 500).
student(456, 'Doris',  3.9, 1000).
student(567, 'Edward', 2.9, 2000).
student(678, 'Fay',    3.8, 200).
student(789, 'Gary',   3.4, 800).
student(987, 'Helen',  3.7, 800).
student(876, 'Irene',  3.9, 400).
student(765, 'Jay',    2.9, 1500).
student(654, 'Amy',    3.9, 1000).
student(543, 'Craig',  3.4, 2000).

%! apply(?SID:integer, ?CName:text, ?Major:text, ?Decision:text) is nondet
apply(123, 'Stanford', 'CS',             'Y').
apply(123, 'Stanford', 'EE',             'N').
apply(123, 'Berkeley', 'CS',             'Y').
apply(123, 'Cornell',  'EE',             'Y').
apply(234, 'Berkeley', 'biology',        'N').
apply(345, 'MIT',      'bioengineering', 'Y').
apply(345, 'Cornell',  'bioengineering', 'N').
apply(345, 'Cornell',  'CS',             'Y').
apply(345, 'Cornell',  'EE',             'N').
apply(678, 'Stanford', 'history',        'Y').
apply(987, 'Stanford', 'CS',             'Y').
apply(987, 'Berkeley', 'CS',             'Y').
apply(876, 'Stanford', 'CS',             'N').
apply(876, 'MIT',      'biology',        'Y').
apply(876, 'MIT',      'marine biology', 'N').
apply(765, 'Stanford', 'history',        'Y').
apply(765, 'Cornell',  'history',        'N').
apply(765, 'Cornell',  'psychology',     'Y').
apply(543, 'MIT',      'CS',             'N').

Please note the tablename apply is completely unrelated to Prolog’s builtin apply(:Goal, +List). I cut ’n pasted Widom’s tablename apply without remembering the word already has a common usage.

Widom’s example database has a strong bias to ‘CS’ and a slight to the major I took way back in the last century, ‘EE’, which I only discovered writing these notes. Let’s start with this simple query:

Which students have applied to major in ‘CS’?

In SQL, this query looks like:

dbclass=> select * from apply where major = 'CS';
 sid |  cname   | major | decision 
-----+----------+-------+----------
 123 | Stanford | CS    | Y
 123 | Berkeley | CS    | Y
 345 | Cornell  | CS    | Y
 987 | Stanford | CS    | Y
 987 | Berkeley | CS    | Y
 876 | Stanford | CS    | N
 543 | MIT      | CS    | N
(7 rows)

In Prolog, the query would look like this:

apply(SID, CName, 'CS', Decision).

Something that may confuse Prolog novices is I wrote apply(SID, CName, 'CS', Decision). which is shorthand for apply(SID, CName, Major, Decision), Major == 'CS'.

I can only use the “matching in the head” trick for a = b predicates. The a ≤ b family have to be written as separate comma separated statements. By the way, and in Prolog is a comma, and don’t forget the final full stop to end your query (something I often do).

Unlike Postgresql’s repl, swipl requires repeated pressing of ; (a semicolon means or in Prolog, and a comma and) to see all the results.

A way of getting Prolog to pretty-print all the rows found is to first put them in a list using one of its Finding all Solutions to a Goal builtins, of which I’m biased toward findall(+Template, :Goal, -Bag), because I don’t understand the caret notation used by setof(+Template, +Goal, -Set).

Once in a list, I can use format(+Format, :Arguments) combined with maplist(:Goal, ?List1) like so:

?- findall([SID, CName, 'CS', Decision], apply(SID, CName, 'CS', Decision), L),  maplist(format("~w ~w ~w ~w~n"), L).
123 Stanford CS Y
123 Berkeley CS Y
345 Cornell CS Y
987 Stanford CS Y
987 Berkeley CS Y
876 Stanford CS N
543 MIT CS N

In SQL, the equivalent of findall is array_agg ( anynonarray ), or if I want a JSON array jsonb_agg( anyelement ) or object jsonb_object_agg ( key "any", value "any" ) .

It hadn’t occured to me before that putting the results of database queries into lists (aka arrays) is a form of aggregation.

When dealing with some or other aggregate of database rows, in SQL there’s the need to expand the basic SELECT... FROM... WHEN syntax to include HAVING and GROUP BY.

Prolog also distinguishes between aggregated results and row results, using jargon that seems strange to me. Prolog’s aggregators are called deterministic, shortened to det in the documentation, and ones which return whatever number of matching rows are called non-deterministic or nondet.

If a deterministic predicate sometimes returns false, it’s called semidet. I’ll elaborate on that later explaining a subtle difference between findall (which is det) and setof (which is semidet).

The same jargon appears in automata, where non-deterministic refers to ambiguity because the same label can lead to more than one next state, requiring them to be grouped into sets… a totally different use of the word to Prolog.

Which students have not applied to major in ‘CS’?

To forshadow what I’ll cover in detail in negation, ponder how you would write the above query.

You might think apply(SID, CName, Major, Decision), Major \== 'CS'. but you’d get the wrong result because it would include those students who applied for ‘CS’ and an alternative. (Spoiler alert: everyone who applied to major in ‘EE’ also applied for ‘CS’).

What seems an easy query actually illustrates the subtleties of negation nicely, which I’ll delve into in that section.

For those who don’t know @Term1 \== @Term2 is Prolog for SQL’s a <> b or a != b.

Which students have applied to major in ‘CS’ at Stanford?

One way of looking at this query is as a simple form of AND, aka conjunction.

In SQL:

select * from apply where major='CS' and cname='Stanford';

And in Prolog

apply(SID, 'Stanford', 'CS', Decision).

Another way of thinking of this query is as what SQL calls a subquery in the FROM clause, which can be done either as:

select *
from (select * from apply where major = 'CS') as subset
where cname = 'Stanford';

which thanks to logical operators and and or, like their arithmetic counterparts multiplication and addition, being commutative, is exactly the same as.

select *
from (select * from apply where cname = 'Stanford') as subset
where major = 'CS';

In my view, when it comes to thinking of the two logical aggregators in terms of workflow, their commutative and associative properties break down, but that doesn’t affect us here.

Yet another way to write this in SQL is as a subquery in the WHERE clause:

select *
from apply
where major = 'CS' and sid in (select sid from apply where cname = 'Stanford');

Which students have applied to major in ‘CS’ and ‘EE’?

In Prolog:

apply(SID, CName1, 'CS', Decision1), apply(SID, CName2, 'EE', Decision2).

Note that by using the same variable name SID twice, I achieve what SQL calls a self-join. I need to alter the variables used for the college name and decision for the ‘CS’ and ‘EE’ queries, else Prolog will insist on matching those too.

If I only want the student IDs of those who’ve applied for both ‘CS’ and ‘EE’ (as I’ve done in the following SQL examples), I’d use Prolog’s underscore for don’t care variables.

apply(SID, _, 'CS', _), apply(SID, _, 'EE', _).

The above query yields 123 four times since that student applied to both Stanford and Berkely for ‘CS’, and both Stanford and Cornell for ‘EE’. A trick to strip out duplicates and get results in alphanumeric order is to combine findall(+Template, :Goal, -Bag) with sort(+List, -Sorted) like so:

?- findall(SID, (apply(SID, _, 'CS', _), apply(SID, _, 'EE', _)), K), sort(K, L).
K = [123, 123, 123, 123, 345],
L = [123, 345].

Prolog has a builtin setof(+Template, +Goal, -Set) which produces a sorted list, but introduces the caret symbol ^ to mark variables you don’t want in the output instead of using the underscore, and I find the caret minilanguage used by setof and bagof (the equivalent of findall) incomprehensible.

I’ll explain a subtle difference between findall(T, G, K), sort(K,L) and setof(T, G, L) after doing some SQL examples of this query.

In SQL, the most common way of doing this would be with a self-join. I found it a bit weird at first that I have to create separate instances of the apply table to do this query:

select distinct apply1.sid
from apply as apply1, apply as apply2
where apply1.sid = apply2.sid
 and apply1.major = 'CS'
 and apply2.major = 'EE';

As I mentioned in the intro, we can think of and and intersection being the same operation on different scales, making the ∧ symbol a handy mnemonic for ∩.

So an alternative way of writing this in SQL uses its INTERSECT operator:

select sid from apply where major = 'CS'
intersect
select sid from apply where major = 'EE';

If I try to match all the columns in the apply table by using SELECT * the intersection produces nothing since we hit the snag I circumvented in Prolog by using different variable names for the columns I don’t want matched. SQL’s set operators require both tables to have matching columns.

Again, another alternative is using a subquery in the FROM clause:

select distinct apply.sid
from apply, (select sid from apply where major = 'CS') as subset
where apply.major = 'EE' and apply.sid = subset.sid;

Or again as a subquery in WHERE:

select distinct sid
from apply
where major = 'EE' and sid in (select sid from apply where major = 'CS');

A problem with joins is they change the shape of the table by requiring additional columns, in this case for the college name the decision. To keep the original columns of the apply table in SQL I find a bit query I find a bit convoluted:

dbclass=> select * from apply 
where sid in (select sid from apply where major = 'CS'
              intersect
              select sid from apply where major = 'EE')
and major = any(ARRAY['CS','EE']);
 sid |  cname   | major | decision 
-----+----------+-------+----------
 123 | Stanford | CS    | Y
 123 | Stanford | EE    | N
 123 | Berkeley | CS    | Y
 123 | Cornell  | EE    | Y
 345 | Cornell  | CS    | Y
 345 | Cornell  | EE    | N
(6 rows)

The last line and major = any(ARRAY['CS','EE']) could have alternatively been written and major = 'CS' or major = 'EE', so membership of an array is a form of disjunction. For some reason, SQL (or at least PostgreSQL), reverts to Victorian syllogism for predicates involving arrays.

In Prolog, I could write this query using member(?Elem, ?List). Again I’ll use findall to listify the matched rows and maplist to pretty print them

findall([SID, CName, Major, Decision],
  (apply(SID, _, 'CS', _), apply(SID, _, 'EE', _), 
   apply(SID, CName, Major, Decision), member(Major, ['CS', 'EE'])), K),
sort(K, L),
maplist(format("~w ~w ~w ~w~n"), L).

Which students have applied to major in logic?

As pointed out in this youtube video titled “Panel: Since logic languages are so good, why aren’t they pervasive?”, that’s a trick question. Logic has fallen out of school and college syllabuses. I blame Victorian syllogism and Aristotle.

But I digress. Note the subtle difference betwen how findall and setof handle no result.

?- findall(SID, apply(SID, _, logic, _), L).
L = [].

?- setof(SID, apply(SID, _, logic, _), L).
false.

In SQL:

dbclass=> select * from apply where Major = 'logic';
 sid | cname | major | decision 
-----+-------+-------+----------
(0 rows)

A common snafu when querying SQL from another programing language is what’s actually returned when no matches are found is sometimes NULL, and sometimes an empty list.

When testing for implication (ie if P ⊆ Q) to figure out if P and Q are equal by testing that (P ⊆ Q) ∩ (Q ⊆ P) is not the empty set, treating no results as false is handy.