In this article we have another look at the proof system for separation logic that is introduced in the first author’s PhD thesis: New Foundations for Separation Logic [5] (publicly defended on Thursday, May 23rd, 2024).
By separation logic we mean the logic behind the assertion language of Reynolds’ logic, the program logic for reasoning about the correctness of pointer programs introduced in 2002 by J.C. Reynolds [9]. In that article, Reynolds introduces his program logic and axiomatizes the assertion language by several axioms, but writes:
Finally, we give axiom schemata for the predicate \(\mapsto\).
(Regrettably, these are far from complete.)
In 2021, completeness of quantifier-free separation logic was established [3], and three years later completeness of the full language of separation logic followed [5].
The purpose of this article is to present the novel proof system of [5] in a straightforward way. The new proof system can be used to prove all valid formulas, including formulas that existing automatic and interactive tools for separation logic cannot prove. In Section 2 we quickly revisit the formulas of separation logic, in Section 3 we introduce the proof system, and in Section 4 we look at a number of example proofs. We then continue with the discussion that motivates the design of the proof system: in Section 5 we discuss referential transparency and the binding structure of separation logic, and in Section 6 we discuss issues such as univalence, well-foundedness, and finiteness.
This article is part one of a series of articles about the new proof system for separation logic. In this article, we focus on the syntax of the proof system. In Section 7, the conclusion, we discuss the topics of the next parts of this series, namely semantics and the soundness and completeness of the proof system.
The syntax of formulas of separation logic is defined as follows: \[\phi,\psi \Coloneqq \bot \mid (x\hookrightarrow y) \mid P(x_1,\ldots,x_n) \mid (\phi\to\psi) \mid (\forall x)\phi \mid (\phi\mathrel{*}\psi) \mid (\phi\mathrel{\mkern4.5mu *}\psi)\] where we assume there is a countably infinite set of variables \(V\) with typical examples \(x,y,z\) (possibly with subscripts), and we have a signature with a countably infinite set of non-logical symbols, each assigned a fixed arity; \(P\) is a typical example, with arity \(n\). We have the usual logical symbols: \(\bot\) stands for falsity and \((\phi\to\psi)\) stands for logical implication. From these two symbols we can derive all other propositional connectives, such as negation \(\lnot\phi\), verum \(\top\), logical conjunction \((\phi\land\psi)\), and logical disjunction \((\phi\lor\psi)\). We have universal quantification \((\forall x)\phi\), where the variable \(x\) is bound in the usual way, and we can define existential quantification \((\exists x)\phi\) as the dual \(\lnot(\forall x)\lnot\phi\). By \(FV(\phi)\) we mean the set of free variables of \(\phi\). Quantification is first-order in the sense that it ranges over individuals. Finally, we also have equality \((x = y)\) as a non-logical symbol, but with a fixed meaning. (Our treatment of parentheses and resolution of ambiguity is standard: we may leave parentheses out as long as the result is unambiguous.)
What is different in separation logic compared to classical first-order logic are the following so-called separation symbols (distinguished from the logical and non-logical symbols). The primitive formula \((x \hookrightarrow y)\) is called points to (as in ‘\(x\) points to \(y\)’) or a reference (as in ‘\(x\) is a reference to \(y\)’). As complex formulas, two separating connectives are given: \((\phi\mathrel{*}\psi)\) is a separating conjunction, and \((\phi\mathrel{\mkern4.5mu *}\psi)\) is a separating implication. The latter connective is also called the magic wand by some authors. Having ‘points to’ \(\hookrightarrow\) as primitive allows us to define ‘strict points to’ \(\mapsto\) as follows, where we take \((x\mapsto y)\) to abbreviate \[(x \hookrightarrow y)\land (\forall z,w.\,(z \hookrightarrow w) \to x = z).\] The intention is that \((x \hookrightarrow y)\) expresses that location \(x\) has value \(y\), whereas \((x \mapsto y)\) expresses furthermore that \(x\) is the only location allocated. We also have the abbreviations \((x \hookrightarrow )\) and \((x \mapsto )\), where we immediately existentially quantify away the value. These express, respectively, that \(x\) is allocated (possibly among other locations) and that \(x\) is the only allocated location. By \(\mathbf{emp}\) we mean that nothing is allocated, so it abbreviates \(\forall x(\lnot(x\hookrightarrow ))\), i.e. no location \(x\) points to any value, or equivalently \(\forall x,y.\,\lnot(x\hookrightarrow y)\).
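To make these abbreviations concrete, here is a small sketch in Python that models a heap as a finite relation, i.e. a set of (location, value) pairs. This is only an illustration; the function names (`points_to`, `strict_points_to`, `emp`) are our own and not part of the article's formalism.

```python
# A heap modeled as a finite set of (location, value) pairs.
# Function names are illustrative, not from the article.

def points_to(heap, x, y):
    # (x ↪ y): location x has value y, possibly among other allocations
    return (x, y) in heap

def strict_points_to(heap, x, y):
    # (x ↦ y): x points to y, and x is the only allocated location
    return points_to(heap, x, y) and all(z == x for (z, w) in heap)

def emp(heap):
    # emp: nothing is allocated
    return len(heap) == 0

h = {(1, 7), (1, 8)}               # univalence is not imposed: 1 has two values
assert points_to(h, 1, 7)
assert strict_points_to(h, 1, 7)   # 1 is still the only *location* allocated
assert emp(set()) and not emp(h)
```

Note that the example heap deliberately maps location 1 to two values: as discussed later in the article, univalence of the heap is not assumed.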
In this section we introduce a novel proof system for separation logic. In this article we look at the proof system from a purely syntactical point of view. In the next article of this series, we give the standard semantics of separation logic.
The first device we introduce is a special \(\mathbf{let}\) construct, of the following form: \[\mathbf{let}\ (x \hookrightarrow y) := \psi(x,y)\ \mathbf{in}\ \phi.\] This construct allows us to change the meaning of ‘points to’ in \(\phi\), by assigning it the meaning denoted by \(\psi(x,y)\). Intuitively speaking, to evaluate whether \(\mathbf{let}\ (x \hookrightarrow y) := \psi(x,y)\ \mathbf{in}\ \phi\) holds, we first consider the heap denoted by \(\psi\) (with free variables \(x\) and \(y\)) and then evaluate whether \(\phi\) holds in the heap described by \(\psi\). We must be careful not to interpret \(\mathbf{let}\) too naïvely: we cannot simply replace the occurrences of \((x\hookrightarrow y)\) in \(\phi\) by \(\psi(x,y)\), because the separating connectives are referentially opaque (this is explained in more detail in Section 5). The purpose of our proof system is to reason about this \(\mathbf{let}\) construct in a formal way.
Working with \(\mathbf{let}\) takes up a lot of space, so instead we use the shorthand notation \(\phi\mathord{@}_{x,y}\psi\). Thus, the objects of our proof system are not just the formulas of separation logic, but those of an extended language (called extended separation logic) in which we add this special construct: \[\phi,\psi \Coloneqq \ldots \mid (\phi\mathord{@}_{x,y}\psi)\]
Next, we introduce a proof system whose objects are the formulas of extended separation logic. This proof system allows us to deduce formulas: a deduction is also called a proof, and we shall give a number of example proofs. Recall that we have a signature with a countably infinite supply of non-logical symbols. For any formula of extended separation logic, its parameters are the predicate symbols of the signature that occur somewhere in the formula. In particular, we shall make use of so-called ‘bookkeeping devices’, which are binary predicate symbols \(R\) (possibly with quotes or subscripts) from the signature. Sometimes we have the side-condition that our bookkeeping devices are ‘fresh’, in the sense that they do not appear as parameters of formulas in the context.
We present the proof system as a Hilbert-style axiom system, but nothing prevents us from also giving the proof system in the style of natural deduction. We have the usual proof rules and axioms of classical logic (but instantiated with formulas of extended separation logic), together with the following axioms:
\(\phi\leftrightarrow(\phi\mathord{@}(x\hookrightarrow y))\) (Lookup)
\(((x'\hookrightarrow y')\mathord{@}\psi)\leftrightarrow \psi[x,y := x',y']\) (Replace)
\((\mathbf{false}\mathord{@}\phi)\leftrightarrow\mathbf{false}\) (\(@\bot\))
\(((\phi\to\psi)\mathord{@}\chi)\leftrightarrow(\phi\mathord{@}\chi\to (\psi\mathord{@}\chi))\) (\(@\!\!\to\))
\(((\forall x\phi)\mathord{@}\psi)\leftrightarrow (\forall x)(\phi\mathord{@}\psi)\) if \(x\not\in FV(\psi)\) (\(@\forall\))
\((\phi\mathord{@}(\psi\mathord{@}\chi))\leftrightarrow ((\phi\mathord{@}\psi)\mathord{@}\chi)\) (Assoc)
\((\forall x,y(\psi\leftrightarrow\chi))\to ((\phi\mathord{@}\psi)\leftrightarrow(\phi\mathord{@}\chi))\) (Extent)
\(((\phi\mathrel{*}\psi)\mathord{@}\chi)\to (\chi=R_1\uplus R_2 \to (\phi\mathord{@}R_1)\to (\psi\mathord{@}R_2)\to \xi)\to\xi\) (\(\mathrel{*}\)E)
\(\chi=\chi_1\uplus \chi_2 \to (\phi\mathord{@}\chi_1)\to (\psi\mathord{@}\chi_2)\to((\phi\mathrel{*}\psi)\mathord{@}\chi)\) (\(\mathrel{*}\)I)
\(((\phi\mathrel{\mkern4.5mu *}\psi)\mathord{@}\chi)\to(\chi\perp\chi')\to(\phi\mathord{@}\chi')\to((\psi\mathord{@}(\chi\lor\chi'))\to \xi)\to\xi\) (\(\mathrel{\mkern4.5mu *}\)E)
\((\chi\perp R\to (\phi\mathord{@}R)\to (\psi\mathord{@}(\chi\lor R(x,y))))\to ((\phi\mathrel{\mkern4.5mu *}\psi)\mathord{@}\chi)\) (\(\mathrel{\mkern4.5mu *}\)I)
We have the side-condition in the rule (\(\mathrel{*}\)E) that the symbols \(R_1\) and \(R_2\) are fresh, i.e. are not parameters of \(\phi,\psi,\chi,\xi\). Similarly, we have the side-condition in the rule (\(\mathrel{\mkern4.5mu *}\)I) that the symbol \(R\) is fresh, i.e. is not a parameter of \(\phi,\psi,\chi\). We use \(@\) without subscripts instead of \(@_{x,y}\) to reduce notational clutter. To avoid confusion, we may write \(\mathbf{false}\) instead of \(\bot\) and \(\mathbf{true}\) instead of \(\top\).
By \(\psi[x,y := x',y']\) we denote the result of simultaneously substituting \(x'\) for \(x\) and \(y'\) for \(y\). The substitution operator \(\phi[x := x']\) is defined compositionally as usual, and has the following specification for the new connective: \[(\phi\mathord{@}_{x,y}\psi)[z := z'] = (\phi[z := z']\mathord{@}_{x,y}\psi[z := z'])\] where \(x,y,z\) are all distinct. If \(z\) is the same variable as either \(x\) or \(y\), then the substitution is not pushed down on the right side. A similar definition can be given for simultaneous substitution of distinct variables.
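As a small worked instance of (Replace) and this substitution (our own example, not taken from the article), choose \(\psi(x,y)\) to be \(x = z \land y = w\). Then the axiom gives \[((x'\hookrightarrow y')\mathord{@}\psi) \leftrightarrow (x = z \land y = w)[x,y := x',y'] = (x' = z \land y' = w),\] so under the heap described by \(\psi\), the reference \((x'\hookrightarrow y')\) holds exactly when \(x'\) equals \(z\) and \(y'\) equals \(w\), as one would expect of a heap containing only the association from \(z\) to \(w\).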
We let \(\chi=\chi_1\uplus \chi_2\) abbreviate the formula \[(\chi \equiv \chi_1\cup \chi_2) \land (\chi_1\perp \chi_2)\] and let \({\chi\equiv \chi_1\cup \chi_2}\) abbreviate the formula \[\forall x,y(\chi(x,y)\leftrightarrow \chi_1(x,y) \lor \chi_2(x,y))\] and let \({\chi_1 \perp \chi_2}\) abbreviate the formula \[\forall x,y\bigl(\chi_1(x,y) \to \forall z.\lnot \chi_2(x,z)\bigr).\] These abbreviations universally quantify \(x,y\): we let these quantifiers, on purpose, capture the free variables \(x\) and \(y\) of \(\chi,\chi_1,\chi_2\). When \(\chi_1\) and \(\chi_2\) are just the binary predicate symbols \(R_1\) and \(R_2\), we mean the formulas \(R_1(x,y)\) and \(R_2(x,y)\). One can also use set-builder notation to make the intention clearer. Note that in the last abbreviation, \(\chi_1\perp\chi_2\), we require the stronger notion of disjointness of the domains of the relations, not the weaker notion of disjointness of the two sets of pairs related by each relation.
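These abbreviations can be mirrored on concrete heaps (again a Python sketch with names of our own choosing), which also shows why disjointness of domains is strictly stronger than disjointness of the sets of pairs:

```python
# Heaps as finite sets of (location, value) pairs; names are our own.

def union_eq(chi, chi1, chi2):
    # χ ≡ χ1 ∪ χ2: every pair of χ is in χ1 or χ2, and vice versa
    return chi == chi1 | chi2

def disjoint(chi1, chi2):
    # χ1 ⊥ χ2: the *domains* (sets of allocated locations) are disjoint
    return not ({x for (x, _) in chi1} & {x for (x, _) in chi2})

def splits(chi, chi1, chi2):
    # χ = χ1 ⊎ χ2
    return union_eq(chi, chi1, chi2) and disjoint(chi1, chi2)

a, b = {(1, 2)}, {(1, 3)}
assert not (a & b)            # disjoint as sets of pairs...
assert not disjoint(a, b)     # ...but not domain-disjoint: both allocate 1
assert splits({(1, 2), (5, 6)}, {(1, 2)}, {(5, 6)})
```

The pair `a, b` is exactly the situation the last remark warns about: the two relations share no pair, yet both allocate location 1, so they do not count as disjoint in the sense of \(\chi_1\perp\chi_2\).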
Further, a useful result in practical reasoning is that we can replace equivalent subformulas in any formula. Moreover, the deduction theorem also holds for our proof system, so we can apply the axioms under any context. We shall furthermore use the above proof system in natural deduction style.
Let us now have a look at a number of example proofs. We shall write \(\vdash\phi\) to mean that \(\phi\) is demonstrable in the proof system given above without any premises, and \(\Gamma\vdash \phi\) to mean that \(\phi\) is demonstrable using the premises in \(\Gamma\).
The first example is given in Figure 1. The statement we want to prove has the following intuitive meaning: in the heap described by \(\bot\) we have that \(\mathbf{emp}\) is satisfied. The argument is the following: the heap described by \(\bot\) is the empty graph (no location is mapped to any value), so evaluating \(\mathbf{emp}\) in that heap indeed yields a true formula. In the proof that follows, we do not explicitly write down how to do classical reasoning, and instead we focus on the application of the new axioms.
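To give a flavour of how the axioms combine (this is our own sketch and need not match the article's Figure 1 line by line), the statement \(\mathbf{emp}\mathord{@}\bot\) can be derived by unfolding \(\mathbf{emp}\) as \(\forall x,y.\,\lnot(x\hookrightarrow y)\) and pushing \(@\bot\) inwards: \[\begin{array}{ll} \mathbf{emp}\mathord{@}\bot \;\leftrightarrow\; (\forall x,y.\,\lnot(x\hookrightarrow y))\mathord{@}\bot & \\ \quad\leftrightarrow\; \forall x,y.\,(\lnot(x\hookrightarrow y)\mathord{@}\bot) & (@\forall)\\ \quad\leftrightarrow\; \forall x,y.\,((x\hookrightarrow y)\mathord{@}\bot \to \bot\mathord{@}\bot) & (@\!\!\to)\\ \quad\leftrightarrow\; \forall x,y.\,(\bot \to \bot) & (\text{Replace},\ @\bot)\\ \quad\leftrightarrow\; \mathbf{true} & \end{array}\] Here (Replace) turns \((x\hookrightarrow y)\mathord{@}\bot\) into \(\bot[x,y := x,y] = \bot\), reflecting the intuition that no location points to anything in the empty graph.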
The second example is given in Figure 2. We prove that for any (extended) separation logic formulas \(\phi\) and \(\psi\), their separating conjunction is commutative. The proof proceeds in two parts. In step 9, we show how to swap the two separated formulas relative to the heap \((x\hookrightarrow y)\). But this heap description has the same extension as the ‘outer’ heap, hence we obtain the non-relative result in step 10! As such, we can obtain the result simply by putting the given formula under this \(@\)-connective. We add formulas to the context by means of opening a box, so at step 6 we have established: \[(\phi\mathrel{*}\psi)\mathord{@}(x\hookrightarrow y),(x\hookrightarrow y) = R_1\uplus R_2,\phi\mathord{@}R_1,\psi\mathord{@}R_2\vdash (\psi\mathrel{*}\phi)\mathord{@}(x\hookrightarrow y).\]
See Figure 3 and Figure 4 for the third and fourth examples. Figure 3 is a generalization of the result in Figure 1. Note that in step 5 of Figure 4 we use the result proven in Figure 3. What should be obvious by now is that the proofs are not very difficult: we use our set-theoretic intuition for dealing with heaps. Both Figure 2 and Figure 4 show that (\(\mathrel{*}\)E) simply adds fresh parameters \(R_1,R_2\) and the corresponding assumptions to the context. This shows that the separating connectives behave almost like quantifiers, when compared with the way first-order quantification works (as in Figure 3).
The reader can now try and write down the proofs for the following formulas:
\(\vdash (\phi\lor\psi)\mathrel{*}\chi \leftrightarrow \phi\mathrel{*}\chi \lor \psi\mathrel{*}\chi\),
\(\vdash (\phi\land\psi)\mathrel{*}\chi \to \phi\mathrel{*}\chi \land \psi\mathrel{*}\chi\),
\(\vdash (\exists x \phi(x))\mathrel{*}\psi \leftrightarrow \exists x (\phi(x)\mathrel{*}\psi)\) where \(x\not\in FV(\psi)\),
\(\vdash (\forall x \phi(x))\mathrel{*}\psi \to \forall x (\phi(x)\mathrel{*}\psi)\) where \(x\not\in FV(\psi)\),
\(\vdash \phi\mathrel{*}(\phi\mathrel{\mkern4.5mu *}\psi)\to\psi\).
Finally, we have the following non-trivial properties:
\(\vdash (x\hookrightarrow y) \leftrightarrow (x\mapsto y)\mathrel{*}\top\),
\(\vdash \lnot(x\hookrightarrow ) \to (((x\mapsto y) \mathrel{\mkern4.5mu *}(x\mapsto y)\mathrel{*}\phi)\leftrightarrow \phi)\),
\(\vdash ((\exists x(x\hookrightarrow y))\mathrel{*}(\exists x(x\hookrightarrow y))) \leftrightarrow (\exists x((x\hookrightarrow y)\land \exists z(z\neq x\land (z\hookrightarrow y))))\).
The last property is very important: it shows that separation logic can be used to express cardinality properties of the universe. It is the separation logic counterpart of the classical expression of the property ‘there are at least two elements’. When we scale this property to ‘there are at least \(n\) elements’, the separation logic formula grows linearly, but the classical equivalent grows drastically faster: quadratically! This is the essence of the scalability argument motivating the use of separation logic.
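To make the growth rates explicit (our own rendering of the scaling, generalizing the last property): ‘there are at least \(n\) locations pointing to \(y\)’ can be written with \(n\) separating conjuncts, whereas the classical version needs a disequality for every pair of witnesses: \[\underbrace{(\exists x(x\hookrightarrow y))\mathrel{*}\cdots\mathrel{*}(\exists x(x\hookrightarrow y))}_{n\text{ conjuncts}} \quad\text{versus}\quad \exists x_1,\ldots,x_n.\,\bigwedge_{1\leq i\leq n}(x_i\hookrightarrow y)\land \bigwedge_{1\leq i<j\leq n} x_i\neq x_j.\] The left formula has size linear in \(n\); the right one contains \(\binom{n}{2}\) disequalities, hence grows quadratically.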
Our proof system is able to prove this equivalence. However, existing proof systems for separation logic still lack the ability to prove this elementary fact. We have investigated whether the equivalence of these formulas can be proven in an interactive tool for reasoning about separation logic: the Iris project [6]. In current versions of that system, it is not possible to show the equivalence of these assertions, at least not without adding additional axioms.
The last example is a demonstration of the following equivalence: \[\begin{array}{c} (x \hookrightarrow ) \land ((x = y \land z = w) \lor (x \neq y \land (y\hookrightarrow z)))\\ \leftrightarrow\\ (x \mapsto ) \mathrel{*}((x \mapsto w) \mathrel{\mkern4.5mu *}(y \hookrightarrow z)). \end{array}\] This equivalence is expressed in quantifier-free separation logic, for which a complete axiomatization was already known [3]. We can also give a proof, see Figure 5. Surprisingly, this already exceeds the capability of all the automated separation logic provers in the benchmark competition SL-COMP. In fact, only the CVC4-SL tool [8] supports the fragment of separation logic that includes the separating implication. However, in our own experiments with that tool, we found that it produces an incorrect counterexample, and we reported this as a bug to one of the maintainers of the project. In fact, the latest version, CVC5-SL, reports the same input as ‘unknown’, indicating that the tool is incomplete.
So far, we have seen several valid formulas of separation logic which we are actually able to prove in the novel proof system. This alone already shows that our proof system goes beyond the abilities of existing tools for reasoning about separation logic! The novelty of the proof system lies in the addition of a \(\mathbf{let}\) binding construct, written in shorthand using the \(@\)-connective, which relativizes the heap with respect to which a formula is interpreted.
In this section we discuss the binding structure of separation logic, and how the concept of referential transparency applies. Referential transparency is a general concept in formal languages and as such applies to both logical and programming languages. Although Whitehead and Russell already speak of it, Quine is often credited with introducing the term in his book Word and Object [7], originally published in 1960. In the case of separation logic, we shall see that the separating connectives fail referential transparency!
Separating connectives capture references, the ‘points to’ constructs, that occur in subformulas. In the binding structure of first-order logic, one can resolve unintentional capturing by means of a so-called ‘capture-avoiding’ substitution operator that renames quantified variables before actually performing a substitution. In separation logic, however, one cannot define such a capture-avoiding substitution operator, since there is only a single heap in scope, which cannot be renamed.
First, we make some general remarks about the binding structure of separation logic formulas. A formula is pure if no separation symbol \(\hookrightarrow,\mathrel{*},\mathrel{\mkern4.5mu *},\mathord{@}\) occurs in it. In that case the meaning of the formula does not depend on the heap, viz. the interpretation of \(\hookrightarrow\). A formula is semi-pure if the only separation symbol occurring in it is \(\hookrightarrow\). A formula in which one of the separating connectives occurs is called a separating formula. We have the usual notions of free and bound variable occurrences, as our notion of variable binding is the same as in first-order logic. But in separation logic we also have another binding structure, namely that of references: the meaning of ‘points to’ is different under the separating connectives.
To see why separation logic fails referential transparency, consider the reference to ‘the value of location \(y\)’ in the proposition ‘the value of location \(y\) has property \(P\)’. To avoid ‘the value of location \(y\)’ being ill-defined, when speaking of the value one implicitly intends that there exists a unique value. Moreover, linguistically speaking, a reference is free if we can replace it by any other expression that is equal to it, without affecting the truth value of the proposition after replacement compared to the proposition before replacement. Often this is called the principle of substitutivity^{1}. For example, given that ‘the value of location \(y\)’ equals ‘the value of location \(z\)’, when we replace a reference of the former by the latter in the expression ‘the value of location \(y\) has property \(P\)’ to obtain ‘the value of location \(z\) has property \(P\)’, we obtain an equivalent proposition: so the reference ‘the value of location \(y\)’ occurs free. A context is said to be referentially transparent whenever it preserves the free references: every free reference remains a free reference under the given context. Otherwise, the context is referentially opaque.
In classical logic all propositional connectives are referentially transparent; only the quantifiers can be referentially opaque, under specific circumstances. This is easy to see for a given formula \(P(x)\) with a free variable \(x\). Suppose \(x = 5\); then by substitutivity we know that \(P(5)\) is equivalent to \(P(x)\). However, quantifiers can fail referential transparency: for example, in the formula \(\exists x(P(x))\) we can no longer naïvely replace \(x\) with \(5\) when we know \(x = 5\). If the quantified variable is not the same as one of the free variables (either in the subformula or in the expression being substituted), we do maintain referential transparency. To ensure referential transparency there is the convention of keeping bound and free variables separate, analogous to the so-called Barendregt variable convention [4, Sect. 5.2].
In separation logic, however, many contexts involving separating connectives are referentially opaque. For example, in the context of a separating conjunction it is not always the case that we can freely replace references by equivalent expressions. An example is where the value of location \(y\) is equal to the value of location \(z\), and where we also separate the locations \(y\) and \(z\). Formally, we have the equality on the left, and the separation on the right: \[(\forall x_1.\,(y\hookrightarrow x_1) \to \forall x_2.\,(z\hookrightarrow x_2) \to x_1 = x_2) \land ((\exists x.\,(y \hookrightarrow x)) \mathrel{*}(\exists x.\,(z \hookrightarrow x))).\] Although we know that locations \(y\) and \(z\) have the same value, we cannot literally replace \(y\) by \(z\) in the left component of the separating conjunction without also performing the reverse replacement (replacing \(z\) by \(y\)) in the other component. Thus the reference ‘the value of location \(y\)’ is no longer free when it is nested under a separating conjunction: separating conjunction is referentially opaque!
To understand the binding structure of separation logic, we introduce the notions of direct and indirect binding. A reference (a ‘points to’ construct) or a separating connective is directly bound to the separating connective under which it is nested, without any other separating connective in between. Thus, here by nesting we only have to look at separating connectives in the immediate context, not at the logical connectives. A reference or a separating connective is said to be free whenever it is not directly bound. A reference or a separating connective is indirectly bound to all the separating connectives under which it is nested, but not immediately nested. In a sense, indirect binding is the transitive but irreflexive closure of direct binding.
Another example is the following formula involving the magic wand: \[\forall y. (\forall z (z\mkern 2mu\not\mkern2mu\hookrightarrow y)) \to ((x\hookrightarrow y) \mathrel{*}(y \hookrightarrow x) \mathrel{\mkern4.5mu *}(\forall w.(w\hookrightarrow y) \leftrightarrow w = x))\] which expresses the following: for every value \(y\) that the heap does not refer to, if we were to extend the heap with a cycle between the locations \(x\) and \(y\), then in the resulting heap the location \(x\) is the only location which has value \(y\). What does the binding structure of this formula look like? Syntactically, there are four references (‘points to’ constructs) in this formula and two separating connectives. Each of these entities is either bound or free. The leftmost reference is free, and the other three references are bound. These three references are nested under the magic wand, so they are directly or indirectly bound to that magic wand. The magic wand itself is free. The rightmost reference is directly bound to the magic wand. The other two references are directly bound to the separating conjunction. See Figure 6 for a graphical depiction of the parse tree and of how references and separating connectives are bound to separating connectives.
There is a difference with the variable binding structure of first-order logic: if a variable is bound to a quantifier, then it no longer necessarily has a relationship with the free variables of the same name. Quantifiers thus introduce a so-called scope for each variable. This is different in separation logic: although a reference can be directly bound to a separating connective, there can still be a necessary relationship with references that occur outside the connective to which it is bound. For example, in Figure 6 both the free reference and the magic wand speak about the same heap (the ‘outer’ heap), but the rightmost reference under the magic wand also speaks about (part of) that outer heap: namely, for every \(z\neq x\) we also have \(\lnot(z\hookrightarrow y)\) due to the equivalence on the right-hand side of the magic wand.
The moral is that separation logic has ‘leaky scopes’. But it is also possible to define constructions in separation logic that do have proper scopes. For example, the formula \(\blacksquare\phi\) has the intuitive meaning that \(\phi\) holds for all heaps (its formal definition is given in the next section). It thus acts as a universal quantifier for heaps. We can also define \(\blacklozenge\phi\) as the dual \(\lnot\blacksquare\lnot \phi\), which acts as an existential quantifier for heaps. Just like quantifiers in first-order logic, \(\blacksquare\phi\) and \(\blacklozenge\phi\) introduce a proper scope for the ‘points to’ construct inside \(\phi\), which is different from the ‘points to’ construct outside.
The formula \(\blacksquare\phi\) is a so-called heap independent formula: a formula whose truth value does not depend on the ‘current’ heap in which it is evaluated. For example, the pure fragment of separation logic, comprising the formulas without separation symbols, is heap independent. But \(\blacksquare\phi\) is also heap independent, even when \(\phi\) contains ‘points to’ constructs and separating connectives. All references and connectives under \(\blacksquare\) are bound and the scope is closed: no ‘leaky scope’ for the black box.
We now introduce the modality \(\blacksquare\phi\) as the abbreviation \[\mathbf{true}\mathrel{*}(\mathbf{emp}\land (\mathbf{true}\mathrel{\mkern4.5mu *}\phi)).\] We also have the dual \(\blacklozenge\phi\), defined as \(\lnot\blacksquare\lnot\phi\). Both modalities bind as strongly as classical negation. The intuitive reading of the modal operators is that \(\blacksquare\phi\) holds in a given ‘current’ heap whenever \(\phi\) holds for all heaps (including the current heap), and \(\blacklozenge\phi\) holds in a given current heap whenever \(\phi\) holds in some heap (which may be different from the current heap). As such, these modal operators change the heap with respect to which a formula is evaluated.
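The intuitive reading can be experimented with in a tiny finite model (a Python sketch with names of our own choosing; in the article the universe is infinite, so this is only an illustration), where we enumerate all heaps over a two-element universe:

```python
from itertools import combinations

U = [0, 1]                                   # tiny universe, for illustration only
PAIRS = [(x, y) for x in U for y in U]
HEAPS = [frozenset(c) for r in range(len(PAIRS) + 1)
         for c in combinations(PAIRS, r)]    # all 16 heaps over U

def black_box(phi, heap):
    # ■φ: φ holds in *every* heap; the current heap is irrelevant
    return all(phi(h) for h in HEAPS)

def black_diamond(phi, heap):
    # ◆φ = ¬■¬φ: φ holds in *some* heap
    return any(phi(h) for h in HEAPS)

emp = lambda h: len(h) == 0
current = frozenset(PAIRS)                   # a non-empty current heap
assert black_diamond(emp, current)           # some heap (the empty one) satisfies emp
assert not black_box(emp, current)           # but not every heap does
assert black_box(emp, current) == black_box(emp, frozenset())  # heap independence
```

The last assertion illustrates heap independence: the truth value of \(\blacksquare\phi\) does not change when we change the current heap.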
In fact, we have the following valid formulas involving these modalities:
\(\vdash \blacksquare (\phi \to \psi) \to \blacksquare\phi \to \blacksquare\psi\),
\(\vdash \blacksquare \phi \to \phi\),
\(\vdash \blacksquare \phi \to (\phi@\psi)\),
\(\vdash \blacksquare(\phi\to \phi') \to \blacksquare(\psi\to\psi') \to (\phi\mathrel{*}\psi) \to (\phi'\mathrel{*}\psi')\),
\(\vdash \blacksquare(\phi'\to \phi) \to \blacksquare(\psi\to\psi') \to (\phi\mathrel{\mkern4.5mu *}\psi) \to (\phi'\mathrel{\mkern4.5mu *}\psi')\).
These formulas can all be proven in our novel proof system (their proofs are great exercises for the reader). We also have that the rule of necessitation is admissible, in the sense that \(\vdash \phi\) implies \(\vdash \blacksquare \phi\), but whether this rule is also effectively derivable is not known to us.
We now consider an example of using the black box modality \(\blacksquare\). In our treatment of separation logic, we do not necessarily impose so-called ‘functionality’ or ‘univalence’ of the heap. This means that it is possible that \[(x\hookrightarrow y) \land (x\hookrightarrow z) \land y\neq z\] is true in some situation. We thus treat \(\hookrightarrow\) as a relation symbol. One intuitive way to interpret the ‘points to’ relation comes from object-oriented programming, where the object \(x\) has a reference to the object \(y\) through one of its fields, but we abstract away which field that is. It is not difficult to obtain univalence by restricting ourselves to those situations where there is at most one value, by means of the property: \[\forall x,y,z.\,(x\hookrightarrow y) \land (x\hookrightarrow z) \to y = z.\] That all heaps are univalent can simply be expressed by: \[\blacksquare(\forall x,y,z.\,(x\hookrightarrow y)\land (x\hookrightarrow z)\to y = z).\]
We also have the following modality \(\Box\phi\), introduced as the abbreviation \[\lnot(\top\mathrel{*}\lnot\phi).\] Also the dual \(\Diamond\phi\) is defined as \(\lnot\Box\lnot\phi\). The intuitive reading of these modal operators is different, in the sense that \(\Box\phi\) holds in a given heap whenever \(\phi\) holds for all subheaps of the given heap. Similarly, \(\Diamond\phi\) holds in a given heap whenever \(\phi\) holds for some subheap of the given heap.
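These subheap modalities can be sketched in the same style (Python, our own names), where \(\Box\) quantifies over the subheaps of the current heap rather than over all heaps. We split by location, matching the domain-disjoint union \(\uplus\) used by the proof system:

```python
from itertools import combinations

def subheaps(h):
    # subheaps obtained by keeping a subset of the allocated locations
    # (splitting is by location, matching the domain-disjoint union)
    locs = sorted({x for (x, _) in h})
    return [frozenset((x, y) for (x, y) in h if x in chosen)
            for r in range(len(locs) + 1)
            for chosen in combinations(locs, r)]

def box(phi, h):
    # □φ = ¬(⊤ * ¬φ): φ holds in every subheap of h
    return all(phi(s) for s in subheaps(h))

def diamond(phi, h):
    # ◇φ = ¬□¬φ: φ holds in some subheap of h
    return any(phi(s) for s in subheaps(h))

h = frozenset({(1, 2), (3, 4)})
assert box(lambda s: len(s) <= 2, h)       # every subheap has at most 2 pairs
assert diamond(lambda s: len(s) == 0, h)   # ◇emp holds: the empty subheap exists
```

Unlike \(\blacksquare\), the truth of \(\Box\phi\) does depend on the current heap `h`, since the subheaps range over its parts only.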
An example of the \(\Box\) modality is the following. We say that a value \(x\) is reachable if there is a location \(y\) which refers to it, i.e. \(\exists y.(y\hookrightarrow x)\). Conversely, a location \(y\) is allocated whenever it refers to a value, i.e. \(\exists x.(y\hookrightarrow x)\). Allocated locations can also be used as values, so a location can be both allocated and reachable. This way, we can form chains, so-called traversals: \[x_0\hookrightarrow x_1\hookrightarrow x_2\hookrightarrow \ldots \hookrightarrow x_n\] which abbreviates the conjunction of \(x_i\hookrightarrow x_{i+1}\) for \(0\leq i < n\). Whenever \(x_n\) is not allocated, the traversal has reached a dead end. However, whenever the first and last location of a traversal are the same, we have a cycle: it is then possible to keep on traversing the heap indefinitely.
We say that a heap is well-founded whenever for every non-empty subheap there is some allocated but unreachable location. This is expressed formally as: \[\Box(\lnot\mathbf{emp}\to \exists x.(x\hookrightarrow )\land \forall y.(y\mkern 2mu\not\mkern2mu\hookrightarrow x)).\] The claim is now that there are no cycles in a well-founded heap. To see why, suppose towards contradiction we have a well-founded heap (in which the above formula is true) in which there exists a cycle \[x_0\hookrightarrow x_1\hookrightarrow x_2\hookrightarrow \ldots \hookrightarrow x_n\hookrightarrow x_0.\] Then take the subheap which consists precisely of the locations \(\{x_0,\ldots,x_n\}\), that is, we ignore all the locations not visited as part of the cycle. This subheap is non-empty. But we cannot take any \(x_i\) as witness, since every location in this subheap is reachable! This is a contradiction.
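Continuing the sketch (Python, our own names), we can check the well-foundedness condition by brute force over subheaps, and confirm that a cyclic heap fails it while a simple chain satisfies it:

```python
from itertools import combinations

def subheaps(h):
    # subheaps obtained by keeping a subset of the allocated locations
    locs = sorted({x for (x, _) in h})
    return [frozenset((x, y) for (x, y) in h if x in chosen)
            for r in range(len(locs) + 1)
            for chosen in combinations(locs, r)]

def wellfounded(h):
    # □(¬emp → ∃x. x allocated but unreachable), checked by enumeration
    return all(
        len(s) == 0
        or any(all(w != x for (_, w) in s) for (x, _) in s)  # some allocated x no pair reaches
        for s in subheaps(h)
    )

assert not wellfounded(frozenset({(0, 1), (1, 0)}))  # a cycle: 0 ↪ 1 ↪ 0
assert wellfounded(frozenset({(0, 1), (1, 2)}))      # a chain: 0 ↪ 1 ↪ 2
```

In the cyclic heap, the full subheap has no allocated-but-unreachable witness, exactly as in the contradiction argument above; in the chain, the head location 0 always serves as a witness.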
When speaking of modal operators, it is useful to speak of the ‘current’ heap (with respect to which any formula in separation logic is evaluated), the ‘outer’ heap (which is the heap with respect to which an enclosing formula is evaluated) and the ‘inner’ heap (which is the current heap while evaluating a subformula). This terminology is also useful when speaking about the separating connective \((\phi\mathrel{*}\psi)\), where we would speak of the ‘outer’ heap with respect to which the entire formula is evaluated, and two ‘inner’ heaps corresponding to the evaluation of \(\phi\) and \(\psi\).
The point of the discussion above is that we can now understand more clearly what happens with the \(@\)connective. Suppose now that \(\psi\) is pure, so it does not have any (free) references. Then we have that \((\phi\mathord{@}\psi)\) and the formula \[\blacksquare((\forall x,y.\, (x\hookrightarrow y)\leftrightarrow \psi(x,y))\to \phi)\] are equivalent. (We discuss this and related formulas in more detail below.) Clearly, this is a heap independent formula, due to the black box! However, when \(\psi\) is not pure, the formula \((\phi\mathord{@}\psi)\) is not heap independent. In the \(@\)connective, the crux is that the ‘points to’ symbol in \(\psi\) is relevant and its meaning depends on the ‘outer’ heap, whereas the ‘points to’ symbol in \(\phi\) is intentionally captured by the \(@\)connective where its denotation is described by \(\psi\). The \(@\)connective thus changes what is the ‘current’ heap when evaluating \(\phi\). This is similar to what the modal operator \(\Box\phi\) does, in which also we have an ‘inner’ and ‘outer’ heap, but where the former is a subheap of the latter heap. In the \(@\)connective the ‘inner’ heap is described by \(\psi\), which may depend on the ‘outer’ heap when it is not a heap independent formula.
Existence of the empty heap, where nothing is allocated, is expressed by: \[\blacklozenge(\forall x,y.\, (x\mkern 2mu\not\mkern2mu\hookrightarrow y)).\] But what about the opposite, the existence of a heap in which every location is allocated? Could the formula \[\blacklozenge(\forall x\exists y.\, (x\hookrightarrow y))\] be true? Or what about the existence of a heap in which every value is reachable? Could the formula \[\blacklozenge(\forall y\exists x.\, (x\hookrightarrow y))\] be true? No, in the standard interpretation of separation logic, based on the integers, these formulas are false because heaps are finitely-based partial functions!
Suppose we work with the standard integers \(\mathbb{Z}\), and we have in our signature the usual arithmetical symbols. If we want to ensure we only deal with finite, univalent heaps, then we should take the following formulas as axioms: \[\begin{array}{c} \blacksquare(\forall x,y,z.\,(x\hookrightarrow y)\land (x\hookrightarrow z)\to y = z)\\ \blacksquare(\exists x_0,x_1.\,\forall x,y.\, (x\hookrightarrow y)\to x_0\leq x \leq x_1) \end{array}\] The first axiom expresses univalence. The second axiom expresses boundedness, that is, for every heap there is a bound on the domain: there is a maximum and a minimum location. Every finitely-based partial function satisfies these properties (a finitely-based partial function can be seen as a finite list of location-value associations, and the maximum and minimum can be computed). Conversely, every heap that satisfies both axioms can be represented by a finitely-based partial function: there are only finitely many locations between the minimum and maximum location (due to boundedness) that can be assigned at most one value (due to univalence).
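The correspondence with finitely-based partial functions can be illustrated with a small Rust sketch (the representation and names are our own) that checks univalence and boundedness of a heap given as a raw association list:

```rust
/// A "heap" given as a raw list of location/value associations.
/// Univalence: no location is associated with two different values.
fn univalent(assocs: &[(i64, i64)]) -> bool {
    assocs
        .iter()
        .all(|&(x, y)| assocs.iter().all(|&(x2, y2)| x != x2 || y == y2))
}

/// Boundedness: a finite association list always yields a minimum and
/// maximum allocated location (None for the empty heap).
fn bounds(assocs: &[(i64, i64)]) -> Option<(i64, i64)> {
    let min = assocs.iter().map(|&(x, _)| x).min()?;
    let max = assocs.iter().map(|&(x, _)| x).max()?;
    Some((min, max))
}

fn main() {
    let h = [(3, 10), (7, 20), (5, 30)];
    assert!(univalent(&h));
    assert_eq!(bounds(&h), Some((3, 7)));
    // The same location mapped to two different values violates univalence.
    assert!(!univalent(&[(3, 10), (3, 11)]));
}
```

A list passing both checks determines exactly one finitely-based partial function, matching the two axioms above.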
Note that in the standard interpretation of separation logic on the integers, we never treat the heap as a total map, where every location must have a value. It thus always remains a possibility for a location to be unallocated, i.e. the location \(x\) is unallocated in a situation whenever \[\forall z(x\mkern 2mu\not\mkern2mu\hookrightarrow z)\] holds—which expresses that there is no value to which \(x\) points. In nonstandard interpretations of separation logic, we do have the possibility of an infinite heap.
The proof system we introduce makes use of a new \(@\)-connective which allows us to interpret the points-to relation in terms of a logical description. It bears some relation to hybrid logic [1], which features so-called nominals and satisfaction operators. Temporally, the nominals describe when ‘now’ is, and the satisfaction operator allows us to evaluate a formula with respect to a given nominal, thereby changing when ‘now’ is. As such, hybrid logic can express more than modal logic: an example is “At 6 o’clock, the church bells ring six times.” This sentence is more time-specific than the usual modal operators for expressing ‘always’ or ‘sometimes’. Comparing with the \(@\)-connective, we see that \(@\) is an even more general notion than what a satisfaction operator provides, since we introduce it as a connective between formulas. This means that formulas can now also take the place of the nominals in hybrid logic, and this allows us to describe a situation, that is, the ‘current’ heap, by means of a formula.
An important result is that our new proof system allows us to show many more equivalences than existing proof systems for separation logic. Thus we go beyond the capability of many existing tools for (automatic or interactive) reasoning about separation logic! It is quite surprising that some of our particular equivalences cannot be verified by any of the existing tools. We think this is due to the abstract description of the separating connectives in terms of cancellative partially commutative monoids (cf. separation algebras [2]). How to combine this abstract description with a set-theoretical interpretation of the points-to relation is problematic. This seems to suggest we should start developing new kinds of tools for automatic or interactive reasoning about separation logic, or adjust the existing tools to work around their current limitations.
The presented proof system is sound and complete. This will be elaborated upon in following blog posts. In part two we study standard and nonstandard interpretations of separation logic, and give the main argument of relative completeness of the novel proof system. Relative completeness is a completeness argument relative to an oracle. This approach is necessary since absolute completeness for standard separation logic is not possible due to the failure of compactness. Other topics that we will discuss in this series of articles concern the impact on Reynolds’ program logic [9], the expressivity of separation logic, separation logic as an intermediate logic between first-order logic and second-order logic, and intuitionistic separation logic.
You can cite this article using BibTeX:
@misc{drheap2024sl1,
title={A sound and complete proof system for separation logic (part 1)},
author={Hiep, {Hans-Dieter} A. and de Boer, Frank S.},
howpublished={{\bf dr.\,heap}},
month={June},
year={2024},
DOI={10.59350/2gkd1c0k49}
}
Or by including the following attribution:
Hans-Dieter A. Hiep and Frank S. de Boer. A sound and complete proof system for separation logic (part 1). dr. heap, June 2024. DOI: 10.59350/2gkd1c0k49.
What is called a ‘free reference’ here comes from Quine’s ‘purely referential position’. But we already use the word ‘pure’ in a different sense, namely that any formula that involves the ‘points to’ construct is itself not pure. Hence we use instead the term ‘free reference’.↩︎
How can you explain important concepts from program correctness in a simple and intuitive manner? In this blog post, we shall have a look at some puzzles and analyze them from the perspective of program correctness. This way we can nicely explain and demonstrate the usefulness of two important concepts, namely invariants and inconsistent specifications.
The puzzles we study here come from the book Algorithmic Puzzles [1] by Anany and Maria Levitin, published by Oxford University Press in 2011. This book presents 150 puzzles that are good candidates for applying analytical and logical thinking skills (the puzzles can also be used as challenging interview questions). We make a small selection of the puzzles, and we will see them answered from the perspective of program correctness. In program correctness, we consider a program to be correct with respect to a given program specification. A program specification is a specific formulation of a requirement. For example, a specification of what the output of a program must be given some input. More specifically (no pun intended), we can rephrase the puzzles in such a way that a puzzle can be seen as a program specification, and proving that there exists a program that is correct with respect to that specification would then solve the puzzle in question. Or, alternatively, we show that there is no solution to the puzzle, by arguing there cannot be a correct program in the first place.
First we briefly revisit the preliminaries (Section 2). This article does assume the reader is already somewhat familiar with the basics of programming and program correctness, but we nevertheless quickly revisit the basic concepts. For a thorough introduction to program correctness, one could take a look at one of the following books (in order of appearance):
Then we shed light on the concept of an invariant by discussing the 5th puzzle of the book, ‘Row and Column Exchanges’ (Section 3). We also look at why declarative specifications are useful by discussing the 12th puzzle of the book, ‘Questionable Tiling’ (Section 4). But we also discuss more generally the importance of invariants and formulating consistent specifications (Section 5).
We shall restrict our attention to a simple imperative programming language: \[\begin{gathered} S \Coloneqq x \coloneqq a \mid S_1;S_2 \mid \mathbf{if}\ b\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi} \mid \mathbf{while}\ b\ \mathbf{do}\ S\ \mathbf{od}\end{gathered}\] where we use not only \(x\) as a variable but also \(y,z,\ldots\) (possibly with subscripts), where the terms \(a\) of the language are the usual arithmetical expressions: \[\begin{gathered} a \Coloneqq 0 \mid 1 \mid x \mid {-a} \mid (a_1 + a_2) \mid (a_1 \times a_2)\end{gathered}\] and where the terms \(b\) of the language are the Boolean expressions: \[\begin{gathered} b \Coloneqq (a_1 = a_2) \mid (a_1 < a_2) \mid (b_1 \land b_2) \mid (b_1 \lor b_2) \mid \lnot b\end{gathered}\] We also have the usual abbreviations, such as \((a_1 \leq a_2)\), that abbreviate more complex expressions, such as \((a_1 < a_2) \lor (a_1 = a_2)\), respectively. The numerals \(2,3,4,\ldots\) are also abbreviations of complex expressions \((1+1), (1+2), (1+3),\ldots\)
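To fix intuitions, here is a sketch of the arithmetical expressions as a small Rust data type with an evaluator (a deep embedding; the names are our own, and we assume the grammar's unary minus production \(-a\)):

```rust
use std::collections::HashMap;

// A deep embedding of the arithmetical expressions a of the grammar above.
enum AExp {
    Zero,
    One,
    Var(&'static str),
    Neg(Box<AExp>),            // -a
    Add(Box<AExp>, Box<AExp>), // (a1 + a2)
    Mul(Box<AExp>, Box<AExp>), // (a1 * a2)
}

// Evaluate an expression in a state sigma (an assignment of variables to integers).
fn eval(a: &AExp, sigma: &HashMap<&str, i64>) -> i64 {
    match a {
        AExp::Zero => 0,
        AExp::One => 1,
        AExp::Var(x) => sigma[x],
        AExp::Neg(a) => -eval(a, sigma),
        AExp::Add(a1, a2) => eval(a1, sigma) + eval(a2, sigma),
        AExp::Mul(a1, a2) => eval(a1, sigma) * eval(a2, sigma),
    }
}

fn main() {
    let sigma: HashMap<&str, i64> = [("x", 5)].into_iter().collect();
    // The abbreviation x - 1, written out as (x + -1).
    let e = AExp::Add(
        Box::new(AExp::Var("x")),
        Box::new(AExp::Neg(Box::new(AExp::One))),
    );
    assert_eq!(eval(&e, &sigma), 4);
    // The numeral 2 as the abbreviation (1 + 1), used in x * 2.
    let two = AExp::Add(Box::new(AExp::One), Box::new(AExp::One));
    let e2 = AExp::Mul(Box::new(AExp::Var("x")), Box::new(two));
    assert_eq!(eval(&e2, &sigma), 10);
}
```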
We also have first-order formulas, captured by the following syntax: \[\begin{gathered} \phi,\psi \Coloneqq b \mid (\phi \to \psi) \mid (\forall x)\phi\end{gathered}\] Other logical connectives, such as \((\phi \land \psi)\) and \(\lnot\phi\), can be seen as abbreviations. First-order logic involves first-order universal quantification \((\forall x)\phi\), and we have first-order existential quantification \((\exists x)\phi\) as an abbreviation of its dual \(\lnot(\forall x)\lnot\phi\). Quantification only ranges over individuals, so in our case integers.
Now let us consider semantics. Let \(\sigma\) be a state (an assignment of variables to integer values). We have the usual semantics for arithmetical expressions \(a\) and Boolean expressions \(b\): \([\![a ]\!]_\sigma\) denotes an integer value and \([\![b]\!]_\sigma\) denotes a Boolean value. Note that an expression depends only on finitely many variables, and we only deal with pure expressions in our simple language. Each statement \(S\) of our programming language denotes a transition relation of states: \[[\![S]\!]\subseteq \Sigma\times\Sigma\] where \(\Sigma\) is the set of states (with typical element \(\sigma\)), and \(\Sigma\times\Sigma\) is the set of pairs of states. A statement denotes a binary relation between initial and final states. Each formula \(\phi\) denotes a set of states: \[[\![\phi]\!]\subseteq \Sigma\] in the sense that in each state \(\sigma\in[\![\phi]\!]\) the formula \(\phi\) is true, also written \(\sigma\models\phi\).
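As an illustration of this semantics, here is a minimal Rust sketch (our own shallow/deep hybrid encoding: statements are a data type, expressions are Rust functions on states) in which running a statement transforms a state, mirroring the transition relation \([\![S]\!]\) on the terminating executions:

```rust
use std::collections::HashMap;

type State = HashMap<String, i64>;

// A deep embedding of statements S, with arithmetical and Boolean
// expressions shallowly embedded as Rust function pointers on states.
enum Stmt {
    Assign(&'static str, fn(&State) -> i64),      // x := a
    Seq(Box<Stmt>, Box<Stmt>),                    // S1 ; S2
    If(fn(&State) -> bool, Box<Stmt>, Box<Stmt>), // if b then S1 else S2 fi
    While(fn(&State) -> bool, Box<Stmt>),         // while b do S od
}

// [[S]] restricted to terminating executions: `run` takes the initial
// state to the final state (and diverges when S does).
fn run(s: &Stmt, sigma: &mut State) {
    match s {
        Stmt::Assign(x, a) => {
            let v = a(sigma);
            sigma.insert(x.to_string(), v);
        }
        Stmt::Seq(s1, s2) => {
            run(s1, sigma);
            run(s2, sigma);
        }
        Stmt::If(b, s1, s2) => {
            if b(sigma) { run(s1, sigma) } else { run(s2, sigma) }
        }
        Stmt::While(b, body) => {
            while b(sigma) { run(body, sigma) }
        }
    }
}

fn main() {
    // x := 1; while x <= z do y := y + x; x := x + 1 od
    let prog = Stmt::Seq(
        Box::new(Stmt::Assign("x", |_| 1)),
        Box::new(Stmt::While(
            |s| s["x"] <= s["z"],
            Box::new(Stmt::Seq(
                Box::new(Stmt::Assign("y", |s| s["y"] + s["x"])),
                Box::new(Stmt::Assign("x", |s| s["x"] + 1)),
            )),
        )),
    );
    let mut sigma: State = [("y".to_string(), 0), ("z".to_string(), 4)].into_iter().collect();
    run(&prog, &mut sigma);
    assert_eq!(sigma["y"], 10); // 1 + 2 + 3 + 4
}
```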
In program correctness we combine two languages: a programming language and a specification language. The programming language is already given above. As specification language we take the above first-order language. Note that the variables of formulas in the assertion language are the same variables we use in the programming language. We can now form the Hoare triples: \[\{\phi\}\ S\ \{\psi\}\] where \(\phi\) is called the precondition and \(\psi\) is called the postcondition. A Hoare triple is correct when the statement \(S\) satisfies the input/output specification given by the precondition \(\phi\) and the postcondition \(\psi\), and a Hoare triple is incorrect otherwise. Note that the (global) variables of \(S\) and the (free) variables of the formulas \(\phi\) and \(\psi\) are bound to each other. Formally, we define \[\models \{\phi\}\ S\ \{\psi\} \text{ if and only if } [\![S]\!]([\![\phi]\!]) \subseteq [\![\psi]\!]\] where \(R(X)\) is the left-restriction of the binary relation \(R\) by the set \(X\), that is, \(R(X) = \{y \mid xRy\text{ for some }x\in X\}\). Unpacking this formal definition gives us \[\models \{\phi\}\ S\ \{\psi\} \text{ if and only if } \sigma\in[\![\phi]\!]\text{ and } (\sigma,\tau)\in[\![S]\!]\text{ implies } \tau\in[\![\psi]\!].\]
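For a bounded range of initial states, this definition of \(\models \{\phi\}\ S\ \{\psi\}\) can be checked by enumeration. The following Rust sketch (restricted to single-variable states, so it is a test over sampled states rather than a proof) makes the definition executable:

```rust
// |= {pre} prog {post}, checked over the sampled initial states -50..=50:
// every initial state satisfying pre must, after running prog, satisfy post.
// The postcondition may also mention the initial value (a "freeze variable").
fn holds(
    pre: impl Fn(i64) -> bool,
    prog: impl Fn(i64) -> i64,
    post: impl Fn(i64, i64) -> bool,
) -> bool {
    (-50..=50).filter(|&x| pre(x)).all(|x0| post(x0, prog(x0)))
}

fn main() {
    // {x >= 0} x := x + 1 {x >= 1}: holds on all sampled states.
    assert!(holds(|x| x >= 0, |x| x + 1, |_x0, x| x >= 1));
    // {true} x := x * x {x > 0}: refuted by the initial state x = 0.
    assert!(!holds(|_| true, |x| x * x, |_x0, x| x > 0));
}
```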
Incorrectness means that \(S\) has a bug. Suppose we start in some initial state \(\sigma\) which satisfies the precondition \(\phi\), and we execute \(S\) from that state, and that execution results in some final state \(\tau\). If the final state \(\tau\) does not satisfy \(\psi\), then we have found a bug! Formally, \[\not\models \{\phi\}\ S\ \{\psi\} \text{ if and only if } \sigma\models\phi \text{ and }(\sigma,\tau)\in[\![S]\!]\text{ and }\tau\not\models\psi\text{ for some }\sigma,\tau.\]
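Refuting a Hoare triple only takes one witness, which suggests a simple search. The following Rust sketch (the bug is deliberately planted by us: the summation program from below with the wrong increment) searches initial states for a counterexample:

```rust
// The summation program {y = 0} S {2y = z(z+1)}, with a deliberate bug:
// x is incremented by 2 instead of 1.
fn buggy(z: i64) -> i64 {
    let (mut x, mut y) = (1, 0);
    while x <= z {
        y += x;
        x += 2; // bug: should be x += 1
    }
    y
}

fn main() {
    // A single failing initial state suffices to refute the triple.
    let cex = (0..).find(|&z| 2 * buggy(z) != z * (z + 1));
    assert_eq!(cex, Some(2)); // z = 0 and z = 1 happen to work; z = 2 fails
}
```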
Hoare logic is a formal system in which Hoare triples can be derived, in which case one writes \(\vdash \{\phi\}\ S\ \{\psi\}\). Hoare logic is sound and (relatively) complete, meaning that we have \[\vdash \{\phi\}\ S\ \{\psi\}\text{ if and only if }\models \{\phi\}\ S\ \{\psi\}\] under some reasonable assumptions.^{1} See one of the books mentioned in the introduction for a presentation of Hoare logic, or Wikipedia.
A quick example is the following Hoare triple. Is it correct or not? \[\{y = 0\}\ x \coloneqq 1; \mathbf{while}\ x \leq z\ \mathbf{do}\ y \coloneqq y + x; x \coloneqq x + 1\ \mathbf{od}\ \{2\times y = z\times(z+1)\}\] To verify the loop, we need to come up with a so-called loop invariant: a condition that holds at four control points: (1) before entering the loop, (2) before the loop body begins, (3) after the loop body ends, and (4) after the loop is exited. Finding loop invariants is difficult, and often it takes multiple tries until one finds a suitable invariant. In the above example, one can take: \[1 \leq x \leq z + 1 \land 2\times y = (x - 1)\times x\] where \((x - 1)\) abbreviates \((x + {-1})\) and the chain of inequalities is conjunctive.
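We can at least test this invariant mechanically. The following Rust sketch mirrors the program directly and checks the invariant at the loop head, together with the pre- and postcondition, for a range of initial values of \(z\) (a test over sampled states, not a proof):

```rust
// Mirror the program and check, for precondition y = 0, that the invariant
// 1 <= x <= z+1  &&  2y = (x-1)x holds at the loop head, and that the
// postcondition 2y = z(z+1) holds on exit.
fn check(z: i64) -> bool {
    let mut y = 0; // precondition: y = 0
    let mut x = 1;
    loop {
        // the invariant must hold every time control reaches the loop head
        if !(1 <= x && x <= z + 1 && 2 * y == (x - 1) * x) {
            return false;
        }
        if x > z {
            break;
        }
        y += x;
        x += 1;
    }
    2 * y == z * (z + 1) // postcondition
}

fn main() {
    assert!((0..=100).all(check));
}
```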
In this section we discuss a puzzle in which invariants play a prominent role. The 5th puzzle of the book Algorithmic Puzzles is ‘Row and Column Exchanges’:
Can you transform the left table into the right table of Figure 1 by exchanging its rows and columns?
(It is recommended that the reader first tries out solving this puzzle herself!)
To get a sense of what the puzzle asks for, let us perform the operations of swapping rows and columns in a table. An example of a sequence of successive applications of these operations is shown in Figure 2.
This figure shows:
The first table shows the initial table of Figure 1, our starting point in this puzzle.
After the first step, we have exchanged the first and last row. So we swapped the values \(1,2,3,4\) and \(13,14,15,16\).
After the second step, we have also exchanged the first and last column. So we swapped the values \(13,5,9,1\) and \(16,8,12,4\).
Notice that we have now obtained a table, in which the last row coincides with the values of the final table we wish to obtain (the last row is ‘correct’ with respect to the desired final table). To get closer to the final table, we can continue the series of operations as in Figure 3, where we perform two additional steps: we swap the first and third row, and we swap the second and third row.
In the resulting table, we have colored the cells that have values in the right place when comparing it to the final table in Figure 1. This particular example shows that we are not there yet. Click here for a Rust implementation of this example.
Just giving this single example, where we have not solved the puzzle (since the final table is not ‘correct’), is not a solution to the puzzle! But one may wonder whether there exists a solution at all. If there exists a solution, then we have not yet found it. But, if there is no solution to be found, then just showing this single counterexample is not sufficient proof.
Imagine that these tables are representations of state, where the state is an assignment of integers to variables (each cell in the table is modeled by its own variable, sixteen in total). There are two primitive operations that work on this state:
to swap two columns \(C(j,j')\), and
to swap two rows \(R(i,i')\).
The puzzle can be rephrased by asking whether we can come up with a program that is composed out of these two primitive operations. Instead of our simple programming language given above, where the only primitive operation is the assignment \(x \coloneqq a\), we consider the programming language with only these two primitive operations. In this way we realize encapsulation, in the sense that the program may not directly modify the state by means of an assignment, only indirectly through the exposed operations.
This may remind the reader of object-oriented programming. Each table could be seen as an instance of a class of objects, which has an encapsulated internal state. The class of objects exposes a number of operations, viz. it has a well-defined interface. We ask ourselves now: does there exist a client, which can only work with the interface and not directly modify the internal state, that solves our puzzle?
What does it mean to solve the puzzle? We can formulate the Hoare triple \[\{\!\!\!\bigwedge_{i\in[1,4]}\bigwedge_{j\in[1,4]}\!\!\!x_{i,j} = (i - 1) \times 4 + j\}\ S\ \{x_{1,1} = 12 \land \ldots \land x_{4,4} = 1\}\] where \(x_{1,1}\) through \(x_{4,4}\) are the sixteen variables corresponding to the cells of the table.^{2} Note that in the postcondition we simply require the variables to have the proper values, as indicated in Figure 1. If we can find a program \(S\) that is composed of only these primitive operations, and prove it correct, we have solved the puzzle!
To understand the meaning of the primitive operations, we give a set of Hoare triples that we take as axioms (technically, we give an axiom scheme). This approach is also known as the ‘axiomatic approach’, where we abstract from the exact semantics of the primitive operations. Here we go (assuming metavariables \(j\in[1,4]\) and \(j'\in[1,4]\)): \[\{\!\!\!\bigwedge_{i\in[1,4]}\!\!\!x_{i,j} = y_i \land \!\!\!\bigwedge_{i\in[1,4]}\!\!\!x_{i,j'} = z_i\}\ C(j,j')\ \{\!\!\!\bigwedge_{i\in[1,4]}\!\!\!x_{i,j} = z_i \land \!\!\!\bigwedge_{i\in[1,4]}\!\!\!x_{i,j'} = y_i\}\] The ‘freeze variables’ \(y_1,\ldots,y_4\) capture the old values at column \(j\), and \(z_1,\ldots,z_4\) capture the old values at column \(j'\). In the postcondition, we use the (unchanged) freeze variables to refer to the old values at the beginning of the swapping operation. This argument crucially relies on the fact that the operation \(C(j,j')\) only changes the variables in the set \(\{x_{i,j},x_{i,j'}\mid i\in[1,4]\}\). By Hoare’s invariance rule, we know that any property about the other variables thus remains invariant. A similar axiom scheme can be given for swapping the rows.
We could think of an object invariant: a property that holds of the internal state of the object, that must be preserved by every operation that is performed by any client. Note that object invariants may be temporarily broken in the implementation of an operation, as long as the object invariant is restored before the implementation terminates.
The beauty of invariants is that they are a powerful tool for answering these kinds of puzzle questions. When we are able to find some invariant, that is true for the initial table but false for the final table, then we must know: the final table cannot be obtained by means of these operations only, since all the operations preserve the object invariant!
An example of an object invariant in this case would be the property: the table has the values \(\{1,\ldots,16\}\). In other words, every value in the table is in \(\{1,\ldots,16\}\) and every value of \(\{1,\ldots,16\}\) is somewhere in the table. Let’s formalize it (Equation 1): \[\{x_{i,j} \mid 1\leq i,j \leq 4\} = \{1,\ldots,16\}.\] The set comprehension on the left collects all values in the table in a set. The set expression on the right is the finite set consisting of the integers \(1\) up to and including \(16\). The property now expresses that these two sets are identical, i.e. have precisely the same members. This property holds for the initial state of the object, and it also is preserved by every operation: swapping two rows, or swapping two columns, does not introduce any new values and thus does not invalidate this property. Hence, this property is an object invariant.
The final table of Figure 1 also satisfies the object invariant of Equation 1. So this invariant, while nice to know, is not useful in answering the puzzle question. We can only prove that there is no solution to the puzzle when we find an invariant, that holds of the initial state and is preserved by the operations, but does not hold in the final state.
Just as finding loop invariants (to show the correctness of a program) is a difficult problem, finding object invariants (to show there can be no correct program) is also a difficult problem. Finding invariants may require several tries. Let us try another invariant. Consider that we not only have a set of values, but in fact we have a set of sets of values: \[\bigl\{\{x_{i,j} \mid 1\leq j\leq 4\}\mid 1\leq i \leq 4\bigr\} = \bigl\{\{1,2,3,4\},\ldots,\{13,14,15,16\}\bigr\}\] The outer set consists of the sets corresponding to the values one finds at each row, and each inner set consists of the values present at one row. If we swap two rows, the invariant is preserved because the outer set does not care about the order of its elements (sets of integers). If we swap two columns, then the invariant is preserved because the set of values at each row remains the same when we have swapped two columns.
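This invariant is easy to check mechanically. The following Rust sketch (our own encoding of tables as 4-by-4 arrays) computes the set of row sets and confirms that it is preserved by the two primitive operations:

```rust
use std::collections::BTreeSet;

type Table = [[i64; 4]; 4];

// The two primitive operations exposed by the interface.
fn swap_rows(t: &mut Table, i: usize, i2: usize) {
    t.swap(i, i2);
}

fn swap_cols(t: &mut Table, j: usize, j2: usize) {
    for row in t.iter_mut() {
        row.swap(j, j2);
    }
}

// The object invariant: the set of sets of values per row.
fn row_sets(t: &Table) -> BTreeSet<BTreeSet<i64>> {
    t.iter().map(|row| row.iter().copied().collect()).collect()
}

fn main() {
    // The initial table: x[i][j] = i * 4 + j + 1 (values 1 through 16).
    let mut t: Table = [[0; 4]; 4];
    for i in 0..4 {
        for j in 0..4 {
            t[i][j] = (i as i64) * 4 + (j as i64) + 1;
        }
    }
    let inv = row_sets(&t);
    // Any sequence of the two primitive operations preserves the invariant.
    swap_rows(&mut t, 0, 3);
    swap_cols(&mut t, 0, 3);
    swap_rows(&mut t, 0, 2);
    assert_eq!(row_sets(&t), inv);
}
```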
Now, looking at Figure 1 we see that the initial table satisfies this property. However, if we look at the final table we see that it does not satisfy this property. The final table has as set of sets of integers: \[\bigl\{\{12,10,11,9\},\{16,14,5,13\},\{8,6,7,15\},\{4,3,2,1\}\bigr\}.\] The first and last rows are correct, so we can focus on comparing the sets \[\bigl\{\{16,14,5,13\},\{8,6,7,15\}\bigr\} \text{ and } \bigl\{\{16,14,15,13\},\{8,6,7,5\}\bigr\}\] which cannot be equal because both sets contain values that are not contained in the other set. Hence the final table does not satisfy the invariant, which finally proves that there is no solution! (We shall further discuss this problem in Section 5.)
We have a look at the 12th puzzle of the Algorithmic Puzzles book, ‘Questionable Tiling’ (with a slightly different phrasing):
Is it possible to tile an 8-by-8 board with dominoes (2-by-1 tiles) such that no two dominoes lie next to each other in parallel?
(Again, the reader should first try to solve this puzzle herself!)
Before even beginning to solve the problem, we should first try to get an exact understanding of the puzzle by examining each part of the question:
Suppose we formalize the 8by8 board, again by means of a table. Each cell of the table is again understood to be represented by the variables \(x_{i,j}\) where \(i\) is the row counted from the top and \(j\) is the column counted from the left. But what do the values of these variables mean? We could devise the following encoding:
If a variable \(x_{i,j}\) has value \(0\) it means that the cell is empty.
If a variable \(x_{i,j}\) has some positive value, then that positive value identifies a domino piece.
For example, see Figure 4 where we have a table that encodes an 8-by-8 board with only four dominoes. Note that in this and the following pictures, only the numbers in the cells are significant and not the colors. Colors are only for visual aid. Further, what is shown in Figure 4 is not a tiling yet: it is a partial tiling, on its way to becoming a complete tiling.
One fruitful approach would be looking for patterns. A pattern is, figuratively speaking, a small ‘frame’ or ‘scope’ that you could locally observe in the picture. These patterns are ‘timeless’ and observed in the outcome, and thus do not care about the intermediary states one has passed through to obtain the outcome. Finding patterns is a useful ability of a declarative programmer.
One can observe already the following properties:
(Number of dominoes in tiling) In an 8-by-8 table, a complete tiling has exactly \(\frac{8\times 8}{2}=32\) numbers identifying domino pieces.
(Size of single domino) Every number identifying a domino piece occurs at most twice.
(Dominoes line up) Given a cell of a table that contains a number identifying a domino, consider its immediate neighborhood (the cells on the top, right, bottom, and left, but not the diagonal cells). We observe that the following must hold: a cell above, on the left, below, or on the right of the given cell exists and has the same domino identifying number. The other neighboring cells must have a different value. See also Figure 5 for a picture, but note that these patterns only work for interior cells. For cells on the border, the pattern need not check outside bounds.
Now, consider completing the tiling in Figure 4. What domino do we place on the left of the domino identified by number three (the blue one)? It will form a 2-by-2 square. We also form a 2-by-2 square if we were to place another domino directly below and in parallel with the domino identified by number four (the purple one). These are undesired according to the puzzle.
We end up with the following property:
(No parallel dominoes) In each 2-by-2 square, there are not exactly two dominoes. See Figure 6 for the two forbidden patterns.
We can now formalize the properties, and obtain a program specification.
(Number of dominoes in tiling) \[|D| = \frac{8\times 8}{2} = 32\] where \(D\) is the set of domino identifying numbers that occur somewhere in the table, that is, \(D = \{x_{i,j} \mid 1 \leq i,j\leq 8\}\cap\{n\mid n > 0\}\).
(Size of single domino) \[\bigl|\{(i,j) \mid k = x_{i,j}\text{ and }1\leq i,j\leq 8\}\bigr| = 2\text{ for each }k\in D.\]
(Dominoes line up) For every \(1\leq i,j\leq 8\) there is some \(\ell\in K(i,j)\) such that \[x_{i,j} = x_\ell \land \bigwedge_{k\in K(i,j)\setminus\{\ell\}} x_{i,j}\neq x_{k}\] where \(K(i,j)\) is the set of neighboring coordinates within bounds: \[\{(i+1,j),(i,j+1),(i-1,j),(i,j-1)\} \cap \{(i',j')\mid 1\leq i',j'\leq 8\}.\]
Technically, we have that \(x_{(i,j)}\) is defined to be equal to \(x_{i,j}\) so we can use the coordinates to refer to a particular subscripted variable.
(No parallel dominoes) For every \(1 \leq i,j < 8\) we have \[\bigl|\{x_{(i,j)},x_{(i+1,j)},x_{(i,j+1)},x_{(i+1,j+1)}\}\cap\{n\mid n > 0\}\bigr| \neq 2.\]
These properties can be abbreviated to \(P1,P2,P3,P4\), respectively. Now the puzzle amounts to finding a program \(S\) that changes the variables \(x_{1,1},\ldots,x_{8,8}\) such that we can prove \[\{\mathbf{true}\}\ S\ \{P1\land P2\land P3\land P4\}.\] Consider a program that assigns the cells’ values according to Figure 7. We can now verify whether the program indeed satisfies the specification, by checking whether all properties hold.
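The properties \(P1\), \(P2\) and \(P4\) are straightforward to implement as checks on a concrete table. The following Rust sketch (our own encoding; \(P3\) is omitted for brevity) runs the checks against the naive all-horizontal tiling, which satisfies \(P1\) and \(P2\) but, as expected, violates \(P4\):

```rust
use std::collections::{HashMap, HashSet};

type Board = [[i64; 8]; 8];

// P1 and P2: exactly 32 distinct domino numbers, each occurring exactly twice.
fn p1_p2(b: &Board) -> bool {
    let mut count: HashMap<i64, usize> = HashMap::new();
    for row in b {
        for &v in row {
            if v > 0 {
                *count.entry(v).or_insert(0) += 1;
            }
        }
    }
    count.len() == 32 && count.values().all(|&c| c == 2)
}

// P4: no 2-by-2 square contains exactly two distinct dominoes.
fn p4(b: &Board) -> bool {
    for i in 0..7 {
        for j in 0..7 {
            let s: HashSet<i64> = [b[i][j], b[i + 1][j], b[i][j + 1], b[i + 1][j + 1]]
                .into_iter()
                .filter(|&v| v > 0)
                .collect();
            if s.len() == 2 {
                return false;
            }
        }
    }
    true
}

fn main() {
    // The naive all-horizontal tiling: dominoes laid left to right, row by row.
    let mut b: Board = [[0; 8]; 8];
    let mut id = 0;
    for i in 0..8 {
        for j in (0..8).step_by(2) {
            id += 1;
            b[i][j] = id;
            b[i][j + 1] = id;
        }
    }
    assert!(p1_p2(&b));
    assert!(!p4(&b)); // stacked parallel dominoes everywhere: P4 fails
}
```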
\(P1\) holds because there are exactly 32 dominoes in the final state assigned to the variables. \(P2\) holds since every number identifying a domino piece occurs exactly twice. Also \(P3\) holds, and this can easily be seen by the different colors. However, checking \(P4\) shows that the property is violated (see the center).
If we slightly generalize the problem, then we see there is a solution. Suppose the board is infinite, i.e. we have an \(\infty\)-by-\(\infty\) board, which we start tiling from the top-left corner. Then the following brick laying pattern can be continued indefinitely (see Figure 8):
Start horizontally with the dominoes \(A_1,A_2,\ldots\) and lay the next on the right of the previous one until the entire first row is covered with dominoes.
Continue vertically with the dominoes \(B_1,B_2,\ldots\) and lay the next below the previous one until the entire first column is also covered with dominoes.
We are now in the same situation as before: we want to fill an \(\infty\)-by-\(\infty\) board, so we repeat the strategy of first laying \(C_1,C_2,\ldots\) horizontally and then laying \(D_1,D_2,\ldots\) vertically.
Such an infinite board would then satisfy these properties:
The number of dominoes on the infinite board is also infinite.
If we make sure that each domino is represented by a different number, then each such number occurs only twice. For example, we could take the numbering scheme where for each domino that lies on the coordinates \((i,j)\) and \((i',j')\) we take as identifier \(\min(2^i3^j,2^{i'}3^{j'})\).
The dominoes are placed correctly, as can be observed from the coloring.
There are no parallel dominoes, since each 2-by-2 square has exactly three dominoes.
Note that we avoided the occurrence of four dominoes within a 2-by-2 square, as shown in Figure 9.
More generally, we never have any of the patterns in Figure 10 occurring. These patterns are called the top-right corner and bottom-left corner. Note that these patterns do occur in Figure 9, so already from knowing the absence of these two corners we also know that there can be no four different dominoes within a 2-by-2 square.
Now suppose we cut off the board of Figure 8 so as to obtain an 8-by-8 board. We then see problems occurring at the boundaries, with dominoes sticking out. Here are two instances:
On the first column, we see that the domino \(B_4\) falls out of bounds. Hence the only way to lay down that domino is by turning it 90 degrees.
On the second row, we see that the domino \(C_4\) also falls out of bounds. Also here we would need to lay down that domino turned by 90 degrees.
What we thus see, is that whenever the board is finite, it must have one of the corners of Figure 10. We shall now argue that it is impossible to satisfy \(P4\), the property that no dominoes are parallel, whenever we have the (necessary) top-left corner and also either the bottom-left or the top-right corner on the board. We make a number of simplifying assumptions, but these do not hurt our demonstration (that is to say, these assumptions are without loss of generality):
we assume we work on an arbitrary \(n\)-by-\(n\) board where \(n\) is even,
we assume we start with the same type of top-left corner and top-right corner where the horizontal domino is on top,
we assume that both corners occur on the same height.
Now consider the situation of Figure 11. Consider that, if we were to satisfy all properties \(P1\) through \(P4\), it is impossible to place a domino vertically next to \(B_1\), nor is it possible to place a domino vertically next to \(C_n\). If we were to place a domino horizontally at the lower end of \(B_1\) (thus forming a bottom-left corner), then we would need to place another domino on top that violates \(P4\). Hence the only dominoes that are possible are depicted in Figure 12. We end up with the other type of corner (where the vertical domino is on the side) and we can again analyze where to place the next domino in the corner next to \(B_1\) and below \(C_1\), and next to \(C_n\) and below \(C_{n-1}\). After analyzing the possibilities and ruling out those that violate \(P4\), we end up with the situation depicted in Figure 13.
After continuing this way, we see that we construct two ‘lines’, one originating from each corner. It is necessarily the case that these two lines will intersect!
In Figure 14 we see the two lines coming diagonally out of the top-left corner and the top-right corner intersect. The way this plays out is as follows: we start with the corner consisting of dominoes \(\{1,2\}\) (the top-left corner) and dominoes \(\{A,B\}\) (the top-right corner). Then we necessarily place domino \(3\), but this takes the same place as we would take when we would place a domino in the other corner. We now have two corners, but they share a domino, namely the dominoes \(\{2,3\}\) (the top-left corner) and the dominoes \(\{3,B\}\) (the top-right corner). We then place \(4\) and \(D\) in the only way possible inside these corners, but we see that this gives us a parallel pair of dominoes in a 2-by-2 square.
Summarizing, the argument goes as follows. If there are two corners on the board that induce intersecting ‘diagonal lines’, this must give rise to a pair of parallel dominoes. Hence we cannot have both a top-left corner and a top-right corner on the board. However, for every \(n\)-by-\(n\) board tiling it is necessary to have both a top-left corner and a top-right corner. Hence we cannot have a tiling of the 8-by-8 board that also has no parallel dominoes.
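The same conclusion can be cross-checked mechanically on a smaller instance of the problem. The following Rust sketch (our own, not from the book) exhaustively enumerates all domino tilings of a 4-by-4 board by backtracking and counts those without parallel dominoes; the count comes out as zero, in line with the argument above:

```rust
// Exhaustively enumerate all domino tilings of an N-by-N board and count
// those in which no 2-by-2 square contains exactly two dominoes (P4).
const N: usize = 4;

// P4 on a completely tiled board (every cell nonzero).
fn no_parallel(b: &[[i64; N]; N]) -> bool {
    for i in 0..N - 1 {
        for j in 0..N - 1 {
            let mut s = vec![b[i][j], b[i + 1][j], b[i][j + 1], b[i + 1][j + 1]];
            s.sort();
            s.dedup();
            if s.len() == 2 {
                return false;
            }
        }
    }
    true
}

// Backtracking: place a domino (horizontally or vertically) on the first
// empty cell, recurse, undo. Complete tilings are tested against P4.
fn search(b: &mut [[i64; N]; N], id: i64, found: &mut u64) {
    let mut pos = None;
    'outer: for i in 0..N {
        for j in 0..N {
            if b[i][j] == 0 {
                pos = Some((i, j));
                break 'outer;
            }
        }
    }
    let (i, j) = match pos {
        None => {
            if no_parallel(b) {
                *found += 1;
            }
            return;
        }
        Some(p) => p,
    };
    if j + 1 < N && b[i][j + 1] == 0 {
        b[i][j] = id;
        b[i][j + 1] = id;
        search(b, id + 1, found);
        b[i][j] = 0;
        b[i][j + 1] = 0;
    }
    if i + 1 < N && b[i + 1][j] == 0 {
        b[i][j] = id;
        b[i + 1][j] = id;
        search(b, id + 1, found);
        b[i][j] = 0;
        b[i + 1][j] = 0;
    }
}

fn main() {
    let mut b = [[0i64; N]; N];
    let mut found = 0u64;
    search(&mut b, 1, &mut found);
    assert_eq!(found, 0); // no parallel-free tiling of the 4-by-4 board exists
}
```

For the 8-by-8 board the corner argument above gives the general proof; the exhaustive search merely confirms the small case.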
We have now seen two example puzzles, which we phrased by means of asking whether we can come up with a program that satisfies certain requirements. In the first example (Section 3) we have seen that the program’s requirements can (1) be stated formally, and (2) a final state was imaginable that satisfies the end goal, but (3) there was no correct program that reaches the final state. In the second example (Section 4) we have seen that the requirements themselves can (1) be stated formally, but already that (2) no final state was imaginable that satisfies the end goal. If there is no final state that satisfies the requirements, it is impossible to write a correct program. This must be a valid conclusion, since each program only moves from state to state, and there does not even exist a state that satisfies the requirements.
All this serves to show that program correctness is a difficult subject. It shows that sometimes it is ‘easy to ask’ but ‘difficult to deliver’. Extensive analysis of a problem is required to obtain (1) a formal description of the problem, and (2) a proof that the requirements are consistent. Even before one starts writing a program, one already has to face an undecidable problem: namely, to check that the requirements are consistent! And we have seen a concrete example where this is not the case, even when the problem looks simple. If we then have requirements that are satisfiable, we face the second difficult problem: does there exist a correct program? We have seen that, no, this is not obvious either. To show that there does not exist a correct program, we need to formulate an invariant that any program preserves but which the final state violates. On the other hand, whenever there exists a nontrivial program (i.e. one involving a loop) we also face a difficult problem: proving it correct requires us to come up with an invariant as well.
This finally gives us two slogans:
To show a program is correct, requires one to find an invariant.
To show there is no correct program, also requires one to find an invariant.
and
Correctness is impossible to attain if the requirements are inconsistent.
Access to an oracle that provides the valid formulas in arithmetic (this is an undecidable problem), and the expressivity of loop invariants.
Technically, the ‘quantifiers’ in the precondition are not first-order quantifiers but instead abbreviations where \(i\) and \(j\) are metavariables that range over finitely many constant values: thus the formula is a big conjunction with sixteen clauses where each clause specifies the value of precisely one variable.
In 1937, shortly after the mathematician Lothar Collatz obtained his doctorate, he wrote down a problem in his notebook that later became known as Collatz’ problem or the \((3x+1)\)-problem. The problem is remarkable since it is easy to state, but for more than eighty years no solution has been found.
Before the problem can be stated, we first need two ingredients: we define a function, and we recall what repeated function application is. The function \(f\) is defined on the positive natural numbers (\(1\), \(2\), \(3\), et cetera) with the following specification: \[\begin{gathered} f(x) = x \div 2 \text{ if }x\text{ is even},\\ f(x) = 3 \times x + 1 \text{ if }x\text{ is odd}.\end{gathered}\] Now take an arbitrary positive natural number \(n\). We can repeatedly apply the function \(f\) starting with \(n\), i.e. \(f(n)\), \(f(f(n))\), \(f(f(f(n)))\), et cetera. We write \(f^i(n)\) to mean that the function \(f\) is applied \(i\) times starting with \(n\), so that we have \(f^1(n)=f(n)\) and \(f^{i+1}(n)=f(f^i(n))\). The superscript notation, an operation on a function, should not be confused with exponentiation, an operation on a number. With this in mind, we can state the problem: for every positive natural number \(n\), is there an \(i\) such that \(f^i(n) = 1\)?
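The definition of \(f\) and its iteration can be sketched directly in Python (a sketch; the function names `f` and `f_iter` are mine, not part of the problem statement):

```python
def f(x: int) -> int:
    """One step: halve x when x is even, send x to 3*x + 1 when x is odd."""
    return x // 2 if x % 2 == 0 else 3 * x + 1

def f_iter(i: int, n: int) -> int:
    """Compute f^i(n): apply f to n repeatedly, i times."""
    for _ in range(i):
        n = f(n)
    return n
```

For instance, `f_iter(3, 1)` evaluates to `1`, in line with the example worked out next.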
We can first try out a number of examples, to gain some intuition about the problem. If we take \(n = 1\), can we find an \(i\)? Clearly, \(f(1)\) is \(4\), since \(1\) is odd and so the second clause of the definition of \(f\) applies, and \(3\times 1 + 1\) is \(4\). Then we evaluate \(f(f(1))\), which is \(f(4)\): since \(4\) is even the first clause applies so we have \(f(4) = 2\). Finally, we evaluate \(f(f(f(1)))\), which is \(f(2)\): since \(2\) is even we have \(f(2)=1\). So \(f^3(1)=1\), hence we can take \(i\) to be three. For \(n = 1\) the problem can be solved.
The process we tried is called iteration. We find \(i\) by starting with the smallest value and trying out successively larger values for \(i\) until we reach our desired destination. This process works efficiently, since the computations we did in the past to find out whether \(i\) was a solution can be reused to find out whether \(i+1\) is a solution. In the case of Collatz’ problem, we see that the outcome of the previous try, i.e. \(f^i(n)\), is what we feed back as input in the next try, i.e. \(f(f^i(n))\) computes \(f^{i+1}(n)\).
Such iterative processes can be written down neatly. Say we start with \(n = 5\). How often do we need to apply the function \(f\) until we reach the destination \(1\)? We write the following sequence, where each number is separated by an arrow: \[5 \longrightarrow 16 \longrightarrow 8 \longrightarrow 4 \longrightarrow 2 \longrightarrow 1.\] The computation sequence starts with \(5\) and we get \(16\) after applying \(f\) once (\(3\times 5 + 1 = 16\)), we get \(8\) after applying \(f\) again (so twice from our starting point), and so on, until we reach our destination after applying \(f\) five times from where we started. In other words, we have \(f^5(5) = 1\). So for \(n = 5\), we know there exists an \(i\), namely \(5\), such that \(f^i(n) = 1\). But meanwhile we also discovered for \(n = 16\), \(n = 8\), \(n = 4\) and \(n = 2\) a solution, since these were the intermediary numbers that turned up in the computation starting in \(5\).
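This iterative search can be written as a small Python loop (a sketch; the name `collatz_steps` is my own, and the loop only terminates when the destination \(1\) is actually reached):

```python
def collatz_steps(n: int) -> int:
    """Return the least i >= 0 with f^i(n) = 1, printing the sequence.

    Warning: if the Collatz conjecture fails for n, this loop never ends.
    """
    i, seq = 0, [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
        i += 1
    print(" -> ".join(map(str, seq)))
    return i
```

Calling `collatz_steps(5)` prints `5 -> 16 -> 8 -> 4 -> 2 -> 1` and returns `5`.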
As another example, take \(n = 3\). We then have \(f(3) = 3\times 3 + 1 = 10\), and \(f(10) = 5\), and from that point onward we already know what happens. Starting with the number \(5\), we see that we can extend the computation sequence towards the right (each time computing the function \(f\)), but we can also extend the computation sequence towards the left: \[\ldots \longrightarrow 3 \longrightarrow 10 \longrightarrow 5 \longrightarrow \ldots\] What would be the number before \(3\)? And what would the computation sequence starting from \(7\) look like? The reader may try answering these questions, to gain some intuition about Collatz’ problem.
For some numbers \(n\), the solution is not immediately obvious. The reader may try out \(n = 27\) (but, beware, the computation takes more than 100 steps). In fact, for large enough \(n\), the intermediary numbers that the computation goes through can be used to generate pseudorandom numbers that pass standard tests for randomness [12]. This fact may give us the impression that the numbers involved in the computation do not give much insight into solving the problem.
Now, the conjecture states that there is a solution for every \(n\). This is what is known as the ‘Collatz conjecture’. Although the problem is quite old, renewed interest in it has emerged more recently, as witnessed by the many published articles, including scientific articles (e.g. [2–4]) and articles in the popular press (e.g. The Simple Math Problem We Still Can’t Solve in Quanta Magazine). Also on social media such as MathOverflow, unanswered questions are raised, such as “How to know when the Collatz conjecture has been proved?”
In 2013, the late John H. Conway wrote about the problem in The American Mathematical Monthly, in an article entitled On unsettleable arithmetical problems [8]. In that article, Conway also speaks of well-known results such as Turing’s unsolvable halting problem and Gödel’s incompleteness theorem. The halting problem is the question whether an algorithm halts (which, as we shall see, has Collatz’ problem as an instance), and Turing showed that it is in general unsolvable. The incompleteness theorem states that there are true statements in a formal system called Peano arithmetic that cannot be proven from Peano’s axioms. The two results are closely related: there is a proof of the incompleteness theorem by reduction to the halting problem [11].
Conway argues that it is very unlikely that the Collatz problem is settleable, a technical term he introduces for true assertions, phrased in a set-theoretical language, that can also be proven. A problem is unsettleable if it is true but cannot be proven. There surely are unsettleable (or, unsettling?) assertions: the axioms of set theory are insufficient for proving all true assertions, by Gödel’s incompleteness theorem. But I find Conway’s argument, that the Collatz conjecture is unlikely to be settleable, too difficult to grasp; probably I lack the background knowledge and intuition behind his argument. However, Conway is not sure, and still leaves open the possibility for the conjecture to be proven:
I don’t want readers to take these words on trust but rather to encourage those who don’t find them convincing to try even harder to prove the Collatz Conjecture! [8]
A quick search on Google Scholar shows articles that claim to have proven the Collatz conjecture. These articles are not published, but are available on preprint servers or institutional repositories on the Web. How can we evaluate such articles? Are they really proofs? Is it worth our time to review them?
By accident, I stumbled upon the preprint paper Collatz conjecture becomes theorem by Mirkowska and Salwicki [10]. Grażyna Mirkowska is a professor emeritus of Warsaw University, and she is an expert on mathematical logic, program semantics, and formal verification. Andrzej Salwicki is also professor emeritus, founder of the journal Fundamenta Informaticae, and he is an expert on the mathematical foundations of computer science. Both worked together on the 1978 book on Algorithmic Logic [9], among other works. Algorithmic logic is closely related to my field of expertise, Hoare’s logic and dynamic logic.
Maybe I can understand what is written in their article? In the final remarks, the authors write:
We know that our presentation is clumsy (we are of age, English is not our native language). [10]
It is not a good idea to reject this paper because of a clumsy presentation: the authors have a track record in the field, and their work must be taken seriously. Maybe I can figure out whether their paper makes sense to me? The adventure thus begins... And in this article, I do not present my own work, but I present what I could distill from what Mirkowska and Salwicki wrote.
First, we revisit the conjecture. We can understand the conjecture in a different way, by studying the program in Figure 1. Looking at the program as given, we have the following primitive operations and tests:
the test \(x\neq 1\) for deciding whether the number \(x\) is not equal to \(1\),
the test \(\mathit{even}(x)\) for deciding whether \(x\) is even,
the primitive operation \(x := x\div 2\) for dividing the number by two—but this operation is only executed in the context where we know that the old value of \(x\) is even and not equal to \(1\),
the primitive operation \(x := 3 \times x + 1\) that multiplies the old value of \(x\) by the constant three and adds the constant one—this operation is only executed in the context where we know that \(x\) is odd and not equal to \(1\).
If the program terminates on every input \(x \geq 1\), then the Collatz conjecture is true. From the terminating execution of the program we can then extract the computation sequence, simply by looking at the values that \(x\) takes over time. If, however, the program runs forever for some input \(x\geq 1\), then we have a counterexample to the Collatz conjecture.
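Figure 1 itself is not reproduced here, but from the tests and operations listed above, the program \(S\) presumably has the following shape (a sketch; rendering it in Python and returning the final value of the variable are my choices):

```python
def S(x: int) -> int:
    """Sketch of the program S; returns the final value of x if the loop exits."""
    while x != 1:          # test: x is not equal to 1
        if x % 2 == 0:     # test: even(x)
            x = x // 2     # primitive operation: x := x div 2
        else:
            x = 3 * x + 1  # primitive operation: x := 3 * x + 1
    return x
```

Under this reading, \(S\) terminates on every input \(x \geq 1\) exactly when the Collatz conjecture holds.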
Let us abbreviate the program in Figure 1 by \(S\). We reformulate the question, whether the program \(S\) terminates or not, as follows:
If \(\{x\geq 1\land \phi\}\mathrel{S}\{\mathbf{false}\}\) is provable for some \(\phi\) in Hoare’s logic for partial correctness such that the precondition is satisfiable, then we know that the Collatz conjecture is false.
If \(\{x\geq 1\}\mathrel{S}\{\mathbf{true}\}\) is provable in Hoare’s logic for total correctness, then we know that the Collatz conjecture is true.
This formulation raises a number of questions, as is often the case in Hoare’s logic. What is the language we use in assertions? What is the program theory? And what is the background theory?
What is the logical language we use in assertions? We restrict ourselves to a first-order language with addition only. This language consists of first-order formulas with respect to a signature with:
the constant individual symbol \(0\),
the constant individual symbol \(1\),
the binary function symbol \(+\).
We can also introduce abbreviations: for every natural number \(n\) we have the numeral \(\underline{n}\). We have \(\underline{0} = 0\), and for any \(n \geq 0\) we have \(\underline{(n + 1)} = \underline{n} + 1\). In other words, we have the numerals: \[\underline{0} = 0,\ \underline{1} = 1,\ \underline{2} = 1 + 1,\ \underline{3} = (1 + 1) + 1,\ \underline{4} = ((1 + 1) + 1) + 1,\ \ldots\] Note that these numerals are terms constructed from constants and function symbols, hence do not depend on the value of variables: the numerals are all ground. Also, we implicitly used the fact that \(0 + 1 = 1\) since this follows from the background theory we introduce later.
We introduce the abbreviation \(x < y\) to stand for \((\exists z)(z\neq 0\land x + z = y)\) where \(z\) is fresh (so not equal to either \(x\) or \(y\)). Further, \(y > x\) abbreviates \(x < y\), and \(x \geq y\) abbreviates \({y < x} \lor {y = x}\).
In a similar way as the numerals, we can also introduce abbreviations for multiplication by a numeral: \(\underline{0}\times x = 0\), \(\underline{(n + 1)}\times x = x + (\underline{n}\times x)\) for \(n\geq 0\). In other words, we have the abbreviations \[\underline{0}\times x = 0,\ \underline{1}\times x = x,\ \underline{2}\times x = x + x,\ \underline{3}\times x = x + x + x,\ \ldots\] where we implicitly assume that \(+\) is associative, and \(x + 0 = x\) (again, we shall introduce the background theory later, from which these properties follow).
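Both abbreviation schemes are mechanical, and we can spell them out as term-building functions (a sketch; the helper names `numeral` and `times` are mine):

```python
def numeral(n: int) -> str:
    """The ground term for the numeral n over the symbols 0, 1 and +."""
    if n == 0:
        return "0"
    if n == 1:
        return "1"
    return f"({numeral(n - 1)} + 1)"

def times(n: int, x: str) -> str:
    """The term abbreviating n * x as an n-fold sum (associativity assumed)."""
    if n == 0:
        return "0"
    if n == 1:
        return x
    return f"({x} + {times(n - 1, x)})"
```

For example, `numeral(3)` yields `((1 + 1) + 1)` and `times(3, "x")` yields `(x + (x + x))`.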
What is the program theory? This question amounts to showing how we axiomatize the primitive operations and tests. We introduce a predicate symbol for the test \(\mathit{even}(x)\), where \(x\) is a variable. We can define this predicate symbol in our language as follows: \[\mathit{even}(x) \equiv_\mathrm{def} (\exists y)(x = y + y).\] It is necessary that tests are decidable: otherwise, when executing the program, one cannot make the case distinction in the \(\mathbf{if}\)-statement.
Given that we have a formal understanding of the test, how do we axiomatize the two updates? We introduce the following axiom schemes. \[\{(\exists y)(x = y + y \land \phi[x := y])\}\mathrel{x := x \div 2}\{\phi\}\text{ where }y\text{ is fresh}\] The precondition of the division-by-two update states that the original value of \(x\) must be even before executing the operation. The witness of evenness, \(y\), is substituted for \(x\) in the postcondition, and we require the variable \(y\) to be fresh, that is, not already occurring in \(\phi\) and different from the variable \(x\). \[\{\phi[x := (\underline{3}\times x) + 1]\}\mathrel{x := 3\times x + 1}\{\phi\}\] The precondition of the times-three-plus-one update is the weakest precondition given the postcondition \(\phi\). Notice how we are able to express the new value of \(x\) in terms of the old value of \(x\) using only the constant symbol \(1\) and function symbol \(+\) from our signature (recall that our multiplication by a numeral is an abbreviation, viz. \(x + x + x\)).
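We can sanity-check the defined predicate and the division-by-two axiom scheme by brute force over a small range of the standard interpretation, for one sample postcondition (finite evidence, not a proof; all names here are mine):

```python
def even_defined(x: int) -> bool:
    """even(x) by its definition: there exists y with x = y + y."""
    return any(x == y + y for y in range(x + 1))

def phi(x: int) -> bool:
    """An arbitrary sample postcondition on x."""
    return x > 10

def pre_div2(x: int) -> bool:
    """Precondition of x := x div 2: exists y (x = y + y and phi[x := y])."""
    return any(x == y + y and phi(y) for y in range(x + 1))

for x in range(0, 100, 2):  # even starting values only
    assert even_defined(x)
    # Evaluating the axiom's precondition before the update agrees with
    # executing the update on an even x and then checking phi.
    assert pre_div2(x) == phi(x // 2)
```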
Now that we have axiomatized the test and primitive operations, we turn to the last question. What is the background theory? Until now, one may freely interpret what the language means. By means of a background theory we restrict the possible interpretations. From the background theory, we can derive consequences which must hold in every program state.
As background theory we take Presburger arithmetic. Presburger arithmetic is a restriction of Peano arithmetic, as it speaks only of the addition operation on numbers. By Gödel’s incompleteness theorem, the formal theory of Peano arithmetic can be shown to be incomplete, that is, there are valid sentences that are not consequences of the axioms. What is remarkable about Presburger arithmetic, however, is that its theory is complete: all valid sentences are also consequences of the axioms.
Furthermore, the axiomatization of Presburger arithmetic is recursive, that is, there exists a procedure to decide what are the axioms of Presburger arithmetic. From this, we also obtain that the formal theory of Presburger arithmetic is decidable: for any sentence, either we can use the standard proof system of firstorder logic and prove that the sentence is valid, or we can effectively find a model that satisfies all axioms but in which the sentence is false.
There are different presentations of the same theory. We shall give one that is short on paper, presented using the unary function \(s\). The unary function \(s\) is defined by \(s(x) = x + 1\), and we have that \(1 = s(0)\). The other axioms of Presburger arithmetic are the following:
\((\forall x)(s(x) \neq 0)\),
\((\forall x)(\forall y)(s(x) = s(y) \to x = y)\),
\((\forall x)(x + 0 = x)\),
\((\forall x)(\forall y)(s(y) + x = s(y + x))\),
\(\phi(0) \land (\forall x)(\phi \to \phi(s(x))) \to (\forall x)\phi\).
The last is an axiom scheme for any formula \(\phi\), and where \(\phi(t)\) is the result of replacing the free variable occurrences of \(x\) by the term \(t\). From these axioms, the usual properties of addition follow. For example:
\((\forall x)(\forall y)(x + y = y + x)\),
\((\forall x)(\forall y)(\forall z)(x + (y + z) = (x + y) + z)\),
\((\forall x)(x \neq 0 \to (\exists y)(x = y + 1))\),
\((\forall x)(\forall y)(\forall z)(x + z = y + z \to x = y)\).
et cetera.
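These derived properties can at least be checked over a finite range of the standard interpretation (a finite sanity check, of course, not a proof):

```python
N = range(20)
for x in N:
    for y in N:
        assert x + y == y + x                  # commutativity
        for z in N:
            assert x + (y + z) == (x + y) + z  # associativity
            if x + z == y + z:                 # cancellation
                assert x == y
    if x != 0:                                 # predecessor existence
        assert any(x == y + 1 for y in N)
```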
Now, we turn to the semantics of the program \(S\), whose halting problem captures the Collatz conjecture. A program is nothing but a piece of text: a finite string of symbols. One may think of the intended meaning of a program, but nothing prevents two people from looking at the same string of symbols and interpreting it differently. To illustrate this, we shall look at two interpretations of \(S\). The fact that this simple program can be interpreted in different ways somewhat delighted me when I first read the preprint paper by Mirkowska and Salwicki [10].
The standard interpretation is given by taking the following data structure:
Take the natural numbers \(\mathbb{N} = \{0,1,2,3,\ldots\}\) as domain.
Interpret the symbols in the usual way: \(+\) is addition of natural numbers, and \(<\) is less than.
This standard interpretation satisfies all the axioms of Presburger arithmetic.
Alternatively, one could give a complex interpretation, as follows:
Take the following subset of the complex numbers \(\mathbb{C}\) as domain: \[\{k + w\iota \mid k\in\mathbb{Z} \text{ and } w\in\mathbb{Q} \text{ and } w\geq 0, \text{ and if } w=0 \text{ then } k\geq 0\}\] where \(k\) is an integer, \(w\) is a nonnegative rational, and \(\iota\) is \(\sqrt{-1}\). (We use the Greek \(\iota\) instead of the Latin \(i\), to avoid confusion with the natural number \(i\) we used earlier in function iteration \(f^i\).) The condition implies that we have no negative real numbers in our domain.
The constant individual symbols \(0\) and \(1\) are interpreted as the complex numbers \(0 + 0\iota\) and \(1 + 0\iota\), respectively.
The binary function symbol \(+\) is interpreted as usual: \[(k + w\iota) + (k' + w'\iota) = (k + k') + (w + w')\iota.\] Note that in this equation alone, the symbol \(+\) has four different meanings: the function symbol \(+\) in our language, the \(+\) on complex numbers, the \(+\) on integers, and the \(+\) on nonnegative rationals!
This complex interpretation satisfies all the axioms of Presburger arithmetic.
In the complex interpretation, we have the so-called reachable elements and the unreachable elements. The reachable elements are the complex numbers \(k + w\iota\) in our domain with a zero imaginary part (and so, by the condition, we know that \(k\geq 0\)). The unreachable elements are the complex numbers in our domain with a nonzero imaginary part. Any operation performed on only reachable elements gives us back a reachable element. However, an operation performed with at least one unreachable element results in an unreachable element: since imaginary parts are nonnegative, a positive imaginary part can never be cancelled.
In the complex interpretation, the order of elements is defined by \(x < y\), which abbreviates \((\exists z)(z\neq 0\land x + z = y)\). We can understand the order relation on complex numbers \((k + w\iota) < (k' + w'\iota)\) as if it were a lexicographic order on pairs: \(\langle w,k\rangle < \langle w',k'\rangle\). This means that all reachable elements (where \(w=0\)) are below all unreachable elements (where \(w > 0\)). Further, in the complex interpretation we have that an element is even iff its real part is even.
With our complex interpretation, we can actually give an infinite run of the program! The following is a demonstration of a computation sequence: \[\begin{gathered} (8 + \frac{1}{2}\iota) \longrightarrow (4 + \frac{1}{4}\iota) \longrightarrow (2 + \frac{1}{8}\iota) \longrightarrow (1 + \frac{1}{16}\iota) \longrightarrow \qquad\\ (4 + \frac{3}{16}\iota) \longrightarrow (2 + \frac{3}{32}\iota) \longrightarrow (1 + \frac{3}{64}\iota) \longrightarrow (4 + \frac{9}{64}\iota) \longrightarrow \ldots\end{gathered}\] The computation sequence can be mapped back to an execution of the program (this takes a bit of work, but it is not difficult to see that the number generated each step is the value of the variable \(x\) right at the start of each iteration of the \(\mathbf{while}\)-loop). Notice that, since we start with an unreachable value (the imaginary part is nonzero), the test of the loop never fails, and so the body of the loop is always taken. Either of the two operations (division-by-two or multiply-by-three-add-one) applied to an unreachable value results in an unreachable value again, which can never be equal to the reachable value \(1\).
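The infinite run can be reproduced with exact rational arithmetic, representing an element \(k + w\iota\) as the pair of its real and imaginary parts (a sketch; the pair representation and the name `step` are my own):

```python
from fractions import Fraction

def step(k: int, w: Fraction):
    """One loop iteration on the element k + w*iota (even iff k is even)."""
    if k % 2 == 0:
        return k // 2, w / 2       # (k + w*iota) divided by two
    return 3 * k + 1, 3 * w        # 3 * (k + w*iota) + 1

# Starting from the unreachable element 8 + (1/2)*iota, the imaginary
# part stays nonzero, so the value never equals 1 and the loop test
# x != 1 succeeds forever.
k, w = 8, Fraction(1, 2)
for _ in range(7):
    print(f"{k} + {w} iota")
    k, w = step(k, w)
```

The printed values match the computation sequence displayed above.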
However, this infinite run cannot be used to argue that the Collatz conjecture is false, since the conjecture speaks of the positive natural numbers and not of the complex numbers we have introduced. So, coming back to our reformulated question: in Hoare’s partial correctness logic, can we prove \[\{x\geq 1\land \phi\}\mathrel{S}\{\mathbf{false}\}\] for some \(\phi\), and show that the precondition is satisfiable? No, we cannot prove this based on our example above. The reason is that the complex interpretation is elementarily equivalent to the standard interpretation. This means that the same first-order sentences are true in both the standard interpretation and the complex interpretation. Since our assertion language is a first-order language, we cannot express in \(\phi\) that the starting value of \(x\) has to be an unreachable element. Namely, the ability to express that \(x\) is an unreachable element (which would be false in the standard interpretation, but true in the complex interpretation under existential quantification) contradicts that the standard and complex interpretations have the same first-order theories.
Note, however, that the above correctness formula may still be provable—showing the Collatz conjecture is false. If one gives a proof of \[\{x\geq 1\land \phi\}\mathrel{S}\{\mathbf{false}\}\] in Hoare’s logic for partial correctness, then the proof must be sound for the standard interpretation too. If one then also shows that \(\phi\) is satisfiable in the standard interpretation, then the Collatz conjecture is settled (to be false).
On the other hand, if we look at proving \[\{x\geq 1\}\mathrel{S}\{\mathbf{true}\}\] in Hoare’s logic for total correctness, are we sure that a proof means that the Collatz conjecture is true? In Hoare’s total correctness logic, we employ a proof rule for reasoning about the termination of the \(\mathbf{while}\)-loop by giving two ingredients: the invariant (an assertion that holds before and after the loop and also before and after the loop body) and the variant (a term denoting a value that must decrease each iteration, and it must be shown that this term is larger than or equal to zero assuming the invariant).
There may be a problem with nonstandard interpretations. For example, in the complex interpretation, what does the variant denote? It may no longer express a quantity that ensures that the loop body is executed finitely many times. In fact, we already know that every unreachable element is larger than \(0\), and every unreachable element has infinitely many predecessors: so requiring that during the loop body the variant decreases no longer yields an argument that the computation must be finite. Yet, this is not a problem, for we can always ignore nonstandard interpretations. The proof rule is sound: interpreting total correctness with respect to the standard interpretation works fine. In that interpretation, the variant expresses that, before executing the loop, we can predict the maximum number of iterations that will be taken based on the values of program variables in the initial state.
Mirkowska and Salwicki suggest that the Collatz conjecture, formulated as questions of provability in Hoare’s logic as we did above, is not precise. They then start working on the level of the semantics in the complex interpretation, and restrict attention to those states in which the program variables have reachable values. However, here I disagree. That there exist nonstandard interpretations is no excuse not to give a proof in Hoare’s logic: although such a proof can also be interpreted in a nonstandard interpretation, there is always the standard interpretation one can look at.
What I thus expected to see was a clear description from which I can extract a proof in Hoare’s total correctness logic. In particular, to perform such an extraction, I need to know the answers to the following questions:
What is the loop invariant of \(S\)?
Is the loop invariant expressible as an assertion?
What is the loop variant of \(S\)?
Is the loop variant expressible?
Unfortunately, the paper does not clearly answer these questions. When I read the paper, I was not able to verify that there actually is a proof. The paper presents many interesting ideas, but I feel that I am doing original research when trying to see how these ideas relate to answering the questions above. I should not have the feeling I am doing original research when my one and only task is to verify a proof!
Unfortunately, I am unable to extract a proof from the preprint paper by Mirkowska and Salwicki [10]. Thus, the Collatz conjecture still remains unsettled, at least in my mind, and thus remains unsettling. The lesson, if there is need for one, is that nonstandard interpretations of Hoare’s logic exist, and may shed light on a problem from a different angle.
At some point during reading, I got the following intuitions. I share them here—just for fun—but these closing remarks may not be valuable to anyone and may only cause confusion.
If we revisit the computation sequence above, that yielded an infinite run, we see the following pattern (we call these the even step and the odd step): \[\begin{aligned} (k + \frac{3^o}{2^e}\iota) &\longrightarrow (\frac{k}{2} + \frac{3^o}{2^{e+1}}\iota) \text{ if }k\text{ is even,}\\ (k + \frac{3^o}{2^e}\iota) &\longrightarrow ((3\times k + 1) + \frac{3^{o+1}}{2^e}\iota) \text{ if }k\text{ is odd.}\end{aligned}\] For simplicity, assume \(k\) is a positive natural number, and \(o,e\) are natural numbers. Forgetting about the imaginary part, we simply see that the function \(f\) we defined in the introduction is applied to \(k\). However, in the imaginary part, we see that the rational number is ‘counting’ how many times which part of the function was applied. If \(k\) was even, we increase the denominator from \(2^e\) to \(2^{e+1}\). If \(k\) was odd, we increase the numerator from \(3^o\) to \(3^{o+1}\).
Now, for any computation sequence on natural numbers in the domain of the standard interpretation, there exist infinitely many corresponding computation sequences on the complex numbers in the domain of the complex interpretation. We are free, so to say, to choose the imaginary part of the complex number. So, let us look at the complex interpretation. From now onward, my thoughts are a bit cloudy and imprecise, so we proceed in unclear territory.
Since in computations of the complex interpretation ‘we are free’ to choose the imaginary part, could we treat the imaginary part as a ghost count of the number of times we see an even number and the number of times we see an odd number? If the even numbers tend to outnumber the odd numbers then we are going towards the exit, but whenever we are in doubt, we may freely reset the counters. Even when we reach the destination, we may still continue (here \(\longrightarrow^*\) means multiple steps, even or odd): \[(1 + \iota) \longrightarrow^* (1 + \frac{3}{4}\iota) \longrightarrow^* (1 + \frac{9}{16}\iota) \longrightarrow^* (1 + \frac{27}{64}\iota) \longrightarrow \ldots\] Here the denominator grows faster than the numerator: the imaginary part tends towards zero in the limit if we never reset the ghost counter back to \(\iota\). Does this intuition help? I do not know. Maybe one can show that there is a computation sequence in the complex interpretation in which the imaginary part tends to zero if and only if there is a corresponding computation sequence in the standard interpretation? If so, can we then also show that a computation sequence tends to zero if and only if it contains a number with \(1\) as real part?
The intuition here is that we want to jump ahead a potential computation sequence starting in some number, as if we could travel in time. The target of our jump is a number with the same real part—that would be evidence of a loop. If the loop also exists in the standard interpretation, then all the intermediary numbers must be natural numbers. Suppose we are at an arbitrary section of the computation sequence: \[(f^0(k) + w_0\iota) \longrightarrow (f^1(k) + w_1\iota) \longrightarrow \ldots \longrightarrow (f^i(k) + w_i\iota)\] where \(f^0(k) = k\). The sequence \(w_0,w_1,\ldots\) are the values of the ghost counters. We now want to reset the ghost counters back to \(\iota\), but keep the real part. Note that the imaginary part of \(\iota\) is the fraction \({3^0}\div{2^0}=1\), meaning both even and odd counters are zero. After resetting the ghost counters, we have the section that starts with \((k + \iota)\): \[(f^0(k) + \frac{3^{o_0}}{2^{e_0}}\iota) \longrightarrow (f^1(k) + \frac{3^{o_1}}{2^{e_1}}\iota) \longrightarrow \ldots \longrightarrow (f^i(k) + \frac{3^{o_i}}{2^{e_i}}\iota)\] where the sequences \(o_0,o_1,o_2,\ldots\) and \(e_0,e_1,e_2,\ldots\) are running ghost counts with \(o_0=e_0=0\). For any index in the sequence \(0\leq j\leq i\), we have that \(o_j + e_j = j\).
Is it possible to have \(f^0(k) = f^i(k)\) while the numerator grows faster than the denominator, i.e. \(3^{o_i} > 2^{e_i}\)? If so, can the sequence then also have \(1\) occurring in it? Note that \(1\) is never reached as a result of \(3\times x + 1\) (only an \(x\) with zero real part would give \(1\), but that never occurs, since \(0 + w\iota\) is even and thus stays even), so from that point onward the denominator would dominate the numerator.
And what about the nonrepeating computation sequences, where there is no loop but in which the imaginary part does tend to zero?
So many questions are still floating around in my mind, like butterflies...
Some say that an education is more than professional training. By such professional training, I mean equipping students with useful skills such as communicating with humans or with computers, finding interesting or profitable problems to work on that are relevant to society at large, and finding novel, practical, efficient, and elegant solutions to such problems, et cetera. One could speak of skill competency, measured at levels that indicate how well an individual is able to perform a particular skill. Some skills are considered more useful than others in a professional setting, and are thus correspondingly considered worth more. Following the tradition of New Public Management, one can then do a back-of-the-envelope costs-and-benefits calculation (in Dutch this is called ‘studierendement’): surely, becoming proficient in computer science is worth the effort, to both the student at the individual level, and to society at the collective level. But all this skill development should not cost too much: there should be a positive Return On Investment!
Although professional training works, and could be very valuable, it also misses essential points of an education. But what, then, is an education more than training students and developing skills? Below, I shall outline the objectives of an education that go beyond the objectives of professional training. Generally speaking, during academic study, a student:
All these points contribute to an education. Simply put, an education helps students to become self-sufficient. I shall discuss the above points one by one, and reflect on how each general statement applies, in particular, to computer science education.
Small is the number of them that see with their own eyes and feel with their own hearts.
— Albert Einstein
A typical university student (age: 18–25) progresses through the human developmental stage of emerging adulthood and enters early adulthood. In these stages of human development, students improve in functioning (e.g. mastering technical skill), become disciplined actors and critical thinkers (e.g. improving use of language, developing awareness of context-sensitivity, and engaging in self-reflection), and develop a clear sense of self and identity (e.g. by demonstrating intellectual curiosity and creativity). An excellent education aims to improve not only a student's capability in technical skill, communication, comprehension, and reasoning ability—but also in attaining and maintaining a position of (financial) stability, physical and mental health and safety, the perseverance to overcome challenges, and the resilience to handle failure gracefully.
In computer science there are lots of different technical skills and languages through which a student can develop: there are many programming languages, logical languages, and mathematical languages to study and practice with. Each language comes with its own set of challenges: some are easier to learn and others are more difficult. The required level of discipline varies: some languages are rigorous and exact, while others are more lenient and fuzzy. In a well-rounded curriculum, there should be a lot of room for exploration and creativity: I believe that the best form of learning is rediscovery, in which (practical and theoretical) results, and their importance, are arrived at by oneself.
Ideally, the objective of education is not just to learn new languages, but also to reflect on the learning process itself and learn how to learn. Each language is a tool to achieve this objective. The end goal is attained if a student is able to confront challenges and tackle difficulties encountered during learning, which often necessitates independent study and a disciplined approach to thinking and working. A means to this end is progression through gradually more difficult subjects over time. To keep progressing further, it seems that the following factors are crucial for student engagement: confidence, interaction, and autonomy (CIA).
Confidence. A lack of confidence hampers student engagement with the subject. Confidence can be increased by allowing students to revisit the necessary background knowledge, allowing students to check their own work, and giving hints towards a solution to exercises in case students are blocked for too long. Confidence can be decreased by obscurity of reference material, a lack of (direct or indirect) feedback, or information overload.
Interaction. Ideally, students are intrinsically motivated to independently study the subject material. However, most times students need help from, and interaction with, their social surroundings. One effective way is to work together, in the sense of mentorship, to guide students through the material by means of a dialogue. Sometimes students can help one another (which ideally is useful for both), but this requires 'matching up' students at different levels. It is a clear signal of insufficient interaction if interaction only starts after a final examination.
Autonomy. Ideally, each student can autonomously choose their next direction to explore in the subject (depending on their starting point). Sometimes it is a joy to discover that one has entered a dead end: a valuable lesson! But finding oneself too often stuck in dead ends is risky. Hints or nudges towards the right path are helpful to avoid going too far astray—however, these hints or nudges should never be enforced, and only offered as advice.
If an opinion contrary to your own makes you angry, that is a sign that you are subconsciously aware of having no good reason for thinking as you do.
— Bertrand Russell
Truth exists regardless of one's expectation. Something is not true because one wants it to be true or profits from it being true. One should seek the truth from a position of freedom: disinterestedness in the sense of being free from bias or (self-)interest. It may be difficult to engage all students in cultivating curiosity for why something is true. This could be the result of students being subjected to a prior non-free learning experience, where students were instructed to 'just remember' what was said and take it for granted, or to 'just do' what an assignment asks for—rather than stimulating students to raise questions and take the initiative to study the subject more deeply. Surely, it is easier to let students stop thinking than to keep them thinking!
To cultivate a disinterested love of truth, one question to keep in mind is: what is true? Or even deeper, what is truth? It is not wise to claim that something is true in the absolute sense, or that only one person knows the truth: truth is something each student has to find out for themselves. The most reliable form of truth, from an epistemological point of view, is experienced individually: to be convinced of truth requires honest, independent, and disinterested verification by means of individual mental construction (theory), or direct experience by one's own senses (practice). As a lover of truth, viz. a philosopher, a student seeks answers to questions of truth through diverse means. This requires students to carefully and critically assess their own perception, memory, and (prior) beliefs and skills. Every (including one's own) testimony, evidence, judgment, theory, or ideal should be regarded critically, or even skeptically when in (self-)doubt.
However, criticism and skepticism are bottomless, and students also need to be able to overcome the Münchhausen trilemma: even the most rigorous methods for attaining truth have limitations, and there are no absolute or complete methods for obtaining truth or certainty. Generally speaking, one should be interested in comparing the authority (in the sense of convincing force) of different arguments, and should be able to follow and participate in academic debate that has culminated or is culminating in consensus.
Some questions fundamental to computer science are: what are the foundations of computer science? What are the limits of computing? What are the limits of logical, mathematical, and scientific reasoning? What languages can (not) be used to express ideas clearly? When comparing two entities, what differences are significant (or not)? What basic facts about computer science are relevant, and how does this depend on the context? What research agendas are out there and are they conflicting or complementary? What distinguishes fashions and opinions from necessities and facts? How can one turn vague ideas into concrete conceptualizations? What academic debates were relevant in the past, are still relevant today, and will become relevant in the future? Et cetera.
The originator of a new concept [...] finds, as a rule, that it is much more difficult to find out why other people do not understand him, than it was to discover the new truths.
— Hermann von Helmholtz
It is important for students to develop a scientific attitude that is critical of technological developments in the field. A scientific attitude is exploratory in nature and embraces exciting speculation, necessitates experimentation, but requires one to be scrupulous in publicly making claims, and only to present findings that others can independently verify. Ideally, a scientific discovery lays bare not only the presence of a mechanism, but also its underlying structure that explains why predictions are reliable and how they are correct. Technological progress should be considered within a wider ethical context, also keeping in mind the actual and potential detrimental effects on human society and nature.
Especially in the field of computer science, a critical outlook on technology is important, since not all developments in our field are necessarily beneficial to all of humankind or nature. For example, most computer technology only benefits a tiny portion of humanity: is this fair, or not? Are practitioners who apply the results of computer science misguided or misinformed, even when they form a large majority influencing consensus and best practices, or not? Consensus is not a replacement for truth, but standing up against consensus might be risky for one's career or public standing (e.g. nobody likes the party-pooper).
Especially in computer security, sometimes one needs courage to raise issues and be very confident of one's case, also considering the potential (legal) repercussions instigated by large companies and governments. But also in theoretical computer science, it is too easy to forget 'the bigger picture' and lose oneself in a deluge of technicalities: how can one be certain that the subject is studied as simply as possible, without being too simplistic? The cases above necessitate a scientific attitude, to ensure that one only takes positions that can withstand reasonable doubt.
Critical thinking is an important skill, especially in an academic discipline such as computer science. Many computer scientists fall into the trap of making things too complicated (unfortunately, myself included). How can one prevent such tendencies? Working on complicated and obscure subjects may scratch one's own itch of intellectual curiosity, but the trap is that it makes the field more difficult for outsiders to enter. It requires a good education to appreciate the fine balance between necessary complexity and accidental complexity of programs, theories, and systems. The key is that simplicity and elegance, while hard to attain, pay off by making it easier to explain to others what one did.
Don't for heaven's sake, be afraid of talking nonsense! But you must pay attention to your nonsense.
— Ludwig Wittgenstein
Finally, empowering students allows them to defend their own, outspoken position in public. A reasonable position is supported by rational and well-founded argumentation, presented in an orderly (no chaos), timely (not too long), and, ideally, beautiful manner. To appreciate reason as a means of establishing truth, one has to be receptive to criticism, and open-minded towards the possibility of finding faults and flaws in one's own position. A good defender is capable of seeking out opponents, even if those opponents are established authorities, and engaging in a mutually respectful, decent, and civilized debate. And every defeat is a learning opportunity: losing is fun!
There are many opportunities in computer science for taking in and trying to defend a position, for example in designing programs (and surviving programming errors), theories (and surviving inconsistency), and systems (and surviving design flaws). There are qualitative and quantitative approaches by which any idea can be evaluated, such as (among many others) program correctness and efficiency, system resilience, error tolerance, usability and extensibility, and theoretical elegance and simplicity: an educated person is able to appreciate these fine details.
On Wednesday, 8th of March, 2023, I gave a lecture about two sorting algorithms for the course Program Correctness (see also the lecture series on YouTube, available only in Dutch). Then, due to limited time, I only discussed a sketch of the correctness proof. But I promised the students that I would write down the correctness argument in more detail. So, in this article, I will revisit the two sorting algorithms and give the correctness proof in full detail.
We had a look at two sorting algorithms: gnome sort and bozosort. The purpose of a sorting algorithm is to operate on an array and rearrange its elements in order. The two algorithms presented are not the most efficient sorting algorithms, but that is not of our concern: instead, we will look at them from the perspective of their correctness.
The main questions answered in this article are:
When proving an algorithm correct, it is important to first have a rough, informal, idea how the correctness argument should go. Only then it makes sense to formalize the argument, by writing down a proof outline. During the latter activity, one can systematically check the argument to ensure there is no fault in one’s own reasoning. Thus, both aspects are important: the bigger picture and the devil’s in the details.
The purpose of a sorting algorithm is to rearrange elements in an array, so that the final result is an array where all elements are in order. For simplicity, we assume we are dealing with an array of integers. The ordering of integers is their natural order, i.e. \(\ldots\leq -2\leq -1\leq 0\leq 1\leq 2\leq \ldots\).
Given array \(a\) of type \(\mathbf{integer}\to\mathbf{integer}\). We define the following predicate: \[\mathit{Sorted}(a) \equiv_\mathrm{def} \forall i, j : i \leq j \to a[i] \leq a[j].\] The above predicate expresses that the whole array \(a\) is sorted. We also define: \[\mathit{Sorted}(a,f,t) \equiv_\mathrm{def} \forall i, j : f \leq i \leq j \leq t \to a[i] \leq a[j].\] This predicate expresses that array \(a\) is sorted on the range \([f,t]\), i.e. from index \(f\) until and including index \(t\).
Alternatively, we can define the following predicates: \[\mathit{Sorted\,}^\prime(a) \equiv_\mathrm{def} \forall i : a[i] \leq a[i + 1]\] and \[\mathit{Sorted\,}^\prime(a,f,t) \equiv_\mathrm{def} \forall i : f \leq i < t \to a[i] \leq a[i + 1].\] It can now be verified (e.g. using a proof system for predicate logic such as natural deduction) that \(\mathit{Sorted}(a) \equiv \mathit{Sorted\,}^\prime(a)\) and \(\mathit{Sorted}(a,f,t)\equiv\mathit{Sorted\,}^\prime(a,f,t)\).
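The equivalence of the two formulations can also be spot-checked on finite arrays. Below is a small sketch (the function names are mine) of both definitions, over an inclusive range \([f,t]\):

```python
def sorted_pairs(a, f, t):
    # Sorted(a, f, t): every pair of indices f <= i <= j <= t is ordered
    return all(a[i] <= a[j] for i in range(f, t + 1) for j in range(i, t + 1))

def sorted_adjacent(a, f, t):
    # Sorted'(a, f, t): every adjacent pair of indices in [f, t] is ordered
    return all(a[i] <= a[i + 1] for i in range(f, t))

# the two predicates agree on any array and full range
for xs in ([1, 2, 2, 3], [3, 1, 2], [5], [2, 1]):
    assert sorted_pairs(xs, 0, len(xs) - 1) == sorted_adjacent(xs, 0, len(xs) - 1)
```

Of course, a finite check is no substitute for the proof in predicate logic; it only illustrates the definitions.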
We may use these predicates to describe the desired outcome of a sorting algorithm: namely, that array \(a\) is sorted (on a particular range). However, this property alone is not sufficient. We also require a relation between the input array and the output array, to specify that the algorithm did not insert new, duplicate old, or throw out any elements. Note that the input array and the output array are stored in the same place in memory, so it matters not where we look but when we look. By input array we mean the value of the array \(a\) before the algorithm runs, and by output array we mean the value of the (same) array \(a\) but after the algorithm finished running.
To avoid algorithms inserting new, duplicating old, or throwing out elements, we require there exists a one-to-one correspondence between the input and output array.
Given array \(b\) of type \(\mathbf{integer}\to\mathbf{integer}\). We use \(b\) as the name for the input array, whereas \(a\) is the name for the output array. We now define the following predicate: \[\mathit{Permut}(a,b) \equiv_\mathrm{def} \exists \pi : \mathit{Inj}(\pi) \land \mathit{Surj}(\pi) \land \forall i : b[\pi(i)] = a[i].\] This predicate expresses that the whole array \(a\) is a permutation of array \(b\). The intuition of \(\mathit{Permut}(a,b)\) is that \(\pi\) represents a bijection between integers: a one-to-one correspondence between the indices of the output array \(a\) and input array \(b\). Here we make use of the following definitions: \[\mathit{Inj}(\pi) \equiv_\mathrm{def} \forall x, y : \pi(x) = \pi(y) \to x = y\] and \[\mathit{Surj}(\pi) \equiv_\mathrm{def} \forall x : \exists y : \pi(y) = x.\]
(As an aside, i.e. not relevant for the rest of this article, note that it depends on the language in which we work whether quantification over \(\pi\) is first-order or not. If we work in the language of Peano arithmetic, this quantifier is higher-order. But if we work in the language of set theory, this quantifier is first-order where \(\pi\) ranges over sets representing functions \(\mathbf{integer}\to\mathbf{integer}\).)
We actually need a stronger predicate than \(\mathit{Permut}\), namely one that expresses that array \(a\) is a permutation of array \(b\) on a particular range, and leaves all other elements in place. Compare this with how we have two predicates for being sorted: \(\mathit{Sorted}(a)\) and \(\mathit{Sorted}(a,f,t)\). So we now define the following predicate: \[\begin{aligned} \mathit{Permut}(a,b,f,t) &\equiv_\mathrm{def} \exists \pi : \mathit{Inj}(\pi) \land \mathit{Surj}(\pi) \land (\forall i : b[\pi(i)] = a[i])\ \land\\ &\qquad\quad (\forall i : f \leq i \leq t \lor \pi(i) = i).\end{aligned}\] The new condition requires of the bijection \(\pi\) that every index \(i\) that falls outside of the range \([f,t]\) is mapped identically. We could say that the latter predicate expresses a restricted permutation.
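For finite arrays, the restricted-permutation predicate can be tested without exhibiting the bijection \(\pi\) explicitly: it suffices to compare multisets on \([f,t]\) and require equality outside that range. This reformulation (and the function name) is mine; for finite arrays it is equivalent to the definition above.

```python
def permut_restricted(a, b, f, t):
    """Permut(a, b, f, t) for finite lists: a is a rearrangement of b
    within the inclusive range [f, t], and untouched outside it."""
    if sorted(a[f:t + 1]) != sorted(b[f:t + 1]):
        return False  # some element was inserted, duplicated, or dropped
    return all(a[i] == b[i] for i in range(len(a)) if not (f <= i <= t))
```

For example, `permut_restricted([1, 3, 2, 4], [1, 2, 3, 4], 1, 2)` holds, but swapping an element outside the range, or duplicating one inside it, falsifies the predicate.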
Gnome sort is a simple sorting algorithm. The story behind the algorithm is as follows. Suppose there is a garden with flower pots arranged next to each other on a table. Each flower pot contains a beautiful flower of a certain height. A gnome comes along, and being a pedantic gnome, wants to arrange the flowers in such way that the flowers in the pots are ordered from the smallest flower to the largest flower on the table.
How does the gnome achieve this? The gnome stands next to the table, in front of a single flower pot. The procedure is easy:
The gnome starts at the leftmost flower pot.
If there is no preceding flower pot, or if the flower in the preceding pot is smaller than the flower in the pot in front of the gnome, then the gnome takes one step to the right.
If there is a preceding flower pot and the flower in the preceding pot is larger than the flower in the pot in front of the gnome, the gnome swaps the two flowers and takes one step to the left.
If the gnome has not reached the end of the table, it goes back to step 2.
Now, we use an array \(a\) to represent an array of flower pots, each cell of the array is a flower pot, and the value stored in each cell represents the height of a flower. Swapping the values of two cells of the array would represent that the gnome, being a true gardener, would take out the flowers of the two pots and place them back in the other pot. Try to imagine how the gnome runs!
Note that gnome sort is slightly different from insertion sort. In the insertion sort algorithm, we need to keep track of two locations: the location of the element which is being inserted in the proper place, and the location of where to insert that element. Insertion sort is typically implemented using a nested loop: after the given element is inserted in the prefix (inner loop), we can continue with the next element after the prefix (outer loop). However, in gnome sort, there is only a single position that is tracked, namely the position of the gnome. The gnome has to walk back after it has placed the element in the proper position, and thus performs more comparisons than insertion sort.
We can write down an algorithm that encodes the procedure of the gnome: see Figure 1. We are given variables \(f\) and \(t\) of type \(\mathbf{integer}\). These represent the range of indices in the array \(a\) representing the flower pots, where \(f\) is the index of the first flower pot and \(t\) is the index of the last flower pot. The variable \(i\) of type \(\mathbf{integer}\) represents the position of the gnome, and the variable \(z\) of type \(\mathbf{integer}\) is a temporary variable used for swapping the flowers.
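A minimal executable rendering of the gnome's procedure might look as follows. This is a sketch of the algorithm of Figure 1: Python's tuple swap stands in for the temporary variable \(z\).

```python
def gnome_sort(a, f, t):
    """Sort a[f..t] (inclusive) in place, following the gnome's procedure."""
    i = f  # position of the gnome
    while i <= t:
        if i == f or a[i - 1] <= a[i]:
            i += 1                           # step to the right
        else:
            a[i - 1], a[i] = a[i], a[i - 1]  # swap the two flowers (via z)
            i -= 1                           # step to the left
```

For instance, `gnome_sort(a, 0, len(a) - 1)` sorts a whole array, and a range with \(t \leq f\) leaves the array unmodified.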
We can make the following observations of the algorithm in Figure 1:
If \(t \leq f\) then the algorithm terminates without modifying the array. Otherwise, the bounds of the position \(i\) are: \(f\leq i\leq t+1\).
The array from \(f\) to \(i - 1\) is always sorted. This property is a loop invariant: it holds at the beginning of the loop, at the beginning of the loop body, at the end of the loop body, and after the loop.
The sorting algorithm does not insert new elements, duplicate old elements, or throw out elements: thus, the output array (that is, the value of \(a\) after running) is a permutation of the input array (the value of \(a\) before running) restricted to the given range \([f,t]\). At the beginning and end of the loop body, the current value of the array is also a restricted permutation of the input array, and this property is a loop invariant. However, this is temporarily broken when we swap around values in the array!
We now formalize the correctness proof, by means of a proof outline (see Figure 2). First, we introduce a freeze variable, the array \(b\) of type \(\mathbf{integer}\to\mathbf{integer}\). We may think of \(b\) as a snapshot of the array \(a\) at the time before running the algorithm. Since \(b\) is never modified by the program, it maintains its value over time, and thus allows us to compare the actual value of \(a\) with its original value.
The precondition as formulated expresses that the freeze variable \(b\) contains a snapshot of array \(a\) at this instant. Also, we restrict ourselves in this proof to the case where \(f\leq t+1\) holds (since, otherwise, the algorithm has no effect).
This assertion follows from (2.1) since the following properties hold: \[(\forall i : a[i] = b[i]) \to \mathit{Permut}(a,b)\text{ and }\mathit{Permut}(a,b) \to \mathit{Permut}(a,b,f,t).\] We take as a witness (for the existentially quantified \(\pi\) in the definition of \(\mathit{Permut}\)) the identity function, which is a bijection. Also, the following property holds: \[t \leq f \to \mathit{Sorted}(a,f,t),\] since the range \([f,t]\) must be empty if \(t < f\), and \(\mathit{Sorted}(a,f,f)\) also holds.
As we already observed from the algorithm, we can now formalize the loop invariant. The first part expresses that the array from \(f\) until (but excluding) \(i\) must be sorted. The second part expresses that the actual value of \(a\) is a permutation of the input array (given the name \(b\)), restricted to the range \([f,t]\). The third part expresses the bounds of the position \(i\). Note that by applying the substitution rule we obtain assertion (2.2), so we have verified that this loop invariant is initially valid. In the remainder of the proof outline we check whether the loop invariant is preserved by the loop body and allows us to conclude our postcondition.
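The loop invariant can also be checked dynamically, by instrumenting the loop with assertions at every loop head. This is only a sketch: the permutation part is checked via multiset equality against the snapshot \(b\), which is my reformulation of the restricted-permutation predicate for finite arrays.

```python
def gnome_sort_checked(a, f, t):
    """Gnome sort with the loop invariant asserted at every loop head:
    Sorted(a, f, i-1), the restricted permutation, and f <= i <= t + 1."""
    b = list(a)  # freeze variable: snapshot of the input array
    i = f
    while i <= t:
        assert all(a[k] <= a[k + 1] for k in range(f, i - 1))  # Sorted(a, f, i-1)
        assert sorted(a[f:t + 1]) == sorted(b[f:t + 1])        # restricted permutation
        assert f <= i <= t + 1                                 # bounds on i
        if i == f or a[i - 1] <= a[i]:
            i += 1
        else:
            a[i - 1], a[i] = a[i], a[i - 1]
            i -= 1
```

Running this on any input should never trigger an assertion, which is (weak, dynamic) evidence for the invariant proved in the outline.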
This assertion always holds at the start of the loop body, where we know the loop test is true. Thus we can strengthen the invariant: the loop test gives \(i\leq t\), which is stronger than the bound \(i\leq t+1\).
We have obtained this assertion in the following way: we need to establish the loop invariant of (2.3) at the end of the loop body. Hence, it has to be a postcondition of the then-branch of the conditional statement. We apply the substitution axiom that replaces \(i\) by \(i + 1\). Now, why does this assertion follow from the preceding assertion? We discriminate two cases:
Case \(i = f\). We can establish \(\mathit{Sorted}(a,f,f)\) from the general property mentioned at (2.2).
Case \(a[i - 1] \leq a[i]\). The following property holds: \[\mathit{Sorted}(a,f,i - 1) \land a[i - 1] \leq a[i] \to \mathit{Sorted}(a,f,i)\] since we extend the range but also have the required property about \(a[i - 1]\) and \(a[i]\) (and by transitivity of \(\leq\), also for all elements before \(a[i - 1]\), cf. why the predicates \(\mathit{Sorted}\) and \(\mathit{Sorted}'\) are equivalent).
That \(\mathit{Permut}(a,b,f,t)\) follows from \(\mathit{Permut}(a,b,f,t)\) (shown with dots) is trivial, and it is easy to see that \(i\leq t \to i + 1 \leq t + 1\) and \(f\leq i \to f\leq i+1\).
The assertion above this one is obtained from (2.4) and the negation of the conditional test. This assertion follows from the one above by the following argument. We have the property: \[j < i \land \mathit{Sorted}(a,f,i) \to \mathit{Sorted}(a,f,j)\] since the range becomes smaller. That \(\mathit{Permut}(a,b,f,t)\) implies itself is again trivial, and \(i\leq t \to i - 1 \leq t+1\) and \(f<i\to f\leq i - 1\) are easy to see.
But how did we obtain the assertion at (2.6) in the first place? Intuitively, it is the result of swapping \(a[i]\) and \(a[i - 1]\) in the assertion (2.8). Swapping does not affect \(f\leq i - 1\leq t+1\). We have the following property: \[t < i \to \mathit{Sorted}(a,f,t)[a[i] := v] \equiv \mathit{Sorted}(a,f,t)\] since \(a[x][a[i] := v] = a[x]\) for all \(x < i\): there is no aliasing here!
The proof outline has a missing link: why is it that \(a\) and \(b\) are still restricted permutations when we swap \(a[i]\) and \(a[i - 1]\)? The essence of the argument is that from the witness \(\pi\) of \(\mathit{Permut}(a,b,f,t)\), we can construct a new witness, say \(\pi'\), as follows: \({\pi'(i) = \pi(i - 1)}\) and \({\pi'(i - 1) = \pi(i)}\) and for all other arguments \(\pi'\) and \(\pi\) agree. Further, \(i\) and \(i - 1\) fall within the restricted range \([f,t]\) of the permutation.
To phrase it in a different way, we have the following property: \[\mathit{Permut}(a,b,f,t)[a[i - 1] := z][a[i] := a[i - 1]][z := a[i]] \equiv \mathit{Permut}(a,b,f,t)\] if \(f\leq i - 1\) and \(i\leq t\).
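The three-assignment swap and the preserved property can be checked on a small concrete example. The indices and values below are mine, and the preserved property is checked via multiset equality on \([f,t]\) plus equality outside it (my finite-array reformulation of the restricted permutation):

```python
# Swapping a[i-1] and a[i] with both indices inside [f, t] = [1, 3]:
b = [7, 3, 5, 9, 8]      # snapshot of the array before the swap
a = list(b)
f, t, i = 1, 3, 2
z = a[i]                 # z := a[i]
a[i] = a[i - 1]          # a[i] := a[i-1]
a[i - 1] = z             # a[i-1] := z
assert sorted(a[f:t + 1]) == sorted(b[f:t + 1])  # still a rearrangement on [f,t]
assert all(a[k] == b[k] for k in (0, 4))         # untouched outside [f,t]
```

Working this out symbolically, with the substitution axiom for array variables, is the exercise mentioned in the conclusion.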
Easily obtained from the loop invariant by replacing \(i\) by \(i - 1\).
We check that the loop invariant and the negation of the loop test actually leads to the desired postcondition: we have that \(i = t + 1\) must hold.
This concludes the correctness argument of the gnome sort algorithm.
Bozosort is also a simple sorting algorithm. Suppose you stand in front of a table with cards on its surface, and each card has a number printed on it. Now pick any two cards on the table and swap them. Repeat this, until the numbers on the cards are all sorted from small to large. In this algorithm, we make use of a source of randomness: namely, to pick two cards to swap.
Again, we use an array \(a\) of type \(\mathbf{integer}\to\mathbf{integer}\) to represent the cards on the table. Each cell of the array corresponds to a spot on the table, and the value stored in the array is the number printed on the card.
We can write down an algorithm that encodes this procedure, see Figure 3. Here, we are given variables \(f\) and \(t\) of type \(\mathbf{integer}\) representing the bounds of the array. The algorithm consists of an outer loop and an inner loop. The loop body of the outer loop has two components (separated by vertical space). By component I simply mean a subprogram. The first component chooses two random numbers and swaps the values in the array. The second component walks through the array to test whether it is actually sorted. We use the variables \(i\) and \(j\) of type \(\mathbf{integer}\), and the variable \(c\) of type \(\mathbf{boolean}\).
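The two-component structure just described can be rendered executable as follows. This is a sketch of Figure 3, assuming \(f\leq t\); Python's `random.randint` plays the role of the random assignment, and the tuple swap the role of \(z\).

```python
import random

def bozosort(a, f, t):
    """Bozosort on a[f..t] (inclusive). Assumes f <= t;
    terminates with probability 1."""
    c = False
    while not c:
        # component 1: swap two randomly chosen elements
        i = random.randint(f, t)
        j = random.randint(f, t)
        a[i], a[j] = a[j], a[i]  # no effect when i == j
        # component 2: scan the range, with an early exit when out of order
        c = True
        i = f
        while c and i < t:
            if a[i] <= a[i + 1]:
                i += 1
            else:
                c = False
```

Note that, matching the proof outline, a swap happens before the first sortedness check: even a sorted input is first perturbed, and then re-sorted by later random swaps.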
To understand the meaning of the random assignment, we can make use of the following axiom: \[\{p\}\mathrel{x := \mathbf{random}(l,h)}\{p \land l\leq x\leq h\}\] where \(p\) is an arbitrary formula where \(x\) does not occur free, and \(x\) does not occur in the arbitrary expressions \(l\) or \(h\). Intuitively, this statement selects (nondeterministically) an integer between \(l\) and \(h\) and updates the value of variable \(x\) with the selected value.
What happens when \(l > h\) is the case and we perform the random assignment? According to the above axiom, we obtain \[\{l > h\}\mathrel{x := \mathbf{random}(l,h)}\{l > h \land l \leq x \leq h\}.\] The postcondition is contradictory, so equivalent to \(\mathbf{false}\). Thus, operationally, we could think that running the random assignment from such a situation is equivalent to running a program that never finishes. This works because we look at Hoare triples in their partial correctness sense.
We can make the following observations of the algorithm in Figure 3:
The actual value of array \(a\) is a permutation of the input array \(a\) at any control point, except in the middle of the first component where we perform the swapping of two elements. Thus, we could use this fact as a loop invariant of both the inner and the outer loop.
After the second component finishes its execution, the variable \(c\) represents whether the array is actually sorted or not. Hence, it is a loop invariant of the outer loop that if \(c\) is true, then the array is sorted. This means that the loop only exits when the array \(a\) is actually sorted!
The inner loop that checks whether the array is sorted looks a bit like gnome sort: the position variable \(i\) is moved to the right whenever we have tested the array elements are in order. But, instead of walking to the left, the inner loop has an early exit in case it encounters two elements that are not properly ordered. By setting \(c\) to \(\mathbf{false}\), the outer loop must run again.
We now formalize the correctness proof, by means of a proof outline (see Figure 4). Again, we introduce a freeze variable, the array \(b\) of type \(\mathbf{integer}\to\mathbf{integer}\), with the same purpose as before.
This assertion follows from the precondition, since \((\mathbf{false}\to p)\) is vacuous for any formula \(p\), and the precondition implies \(\mathit{Permut}(a,b,f,t)\) for the same reason as given in (2.2).
Here we have formalized our intuition of the loop invariant for the outer loop. Note that the assignment axiom allows us to obtain (4.1).
When entering the loop body we know that \(c\) must be \(\mathbf{false}\), so we can adapt the loop invariant: only information about the array \(a\) being a restricted permutation remains present here.
Obtained by applying our axiom for random assignments.
Also obtained by applying our axiom for random assignments. Note that from this assertion it already follows that \(f\leq t\).
Again we have the swapping of two elements. The argument needed here is a slight generalization of the argument of (2.7) above, where the essence is this property (given \(f\leq i\leq t\) and \(f\leq j\leq t\)): \[\mathit{Permut}(a,b,f,t)[a[j] := z][a[i] := a[j]][z := a[i]] \equiv \mathit{Permut}(a,b,f,t).\] Note that if the two random variables have the same value, the swap has no effect.
Compared to (4.3), we now also know that \(f\leq t\). Clearly, the three preceding assignments cannot affect \(f\leq t\), and we already knew it holds in (4.5). In the other case (that \(f > t\) holds), we would not even reach this point in the program.
We here formulate the loop invariant for the inner loop. The essence is that we know that everything to the left of (and including) \(i\) must be sorted. This is initially valid, since \(\mathit{Sorted}(a,f,f)\) is true as in (2.2). We still need to establish that this is indeed a loop invariant.
When the loop body is entered, we know both \(c\) and \(i < t\). The latter is used to make our assertion stronger than the loop invariant.
In the case the elements are properly ordered, we can again apply the property as mentioned in the second case of (2.5).
Easy to see that \(i<t\) implies \(i\leq t\). It is quite surprising that the assertion at this point does not say anything about variable \(c\).
We now establish that the outer loop invariant follows from the inner loop invariant, under the condition that the inner loop has terminated (so its test must be false). There are two cases:
Case \(\lnot c\): we have an early exit of the inner loop. Since \(c\) is \(\mathbf{false}\), the left conjunct is vacuous. The right conjunct trivially follows from the inner loop invariant.
Case \(\lnot (i < t)\): we know the inner loop has fully executed, so \(i = t\) from the upper bound in the loop invariant. So \(\mathit{Sorted}(a,f,t)\) must hold, regardless of the value of \(c\).
The postcondition is obtained by a double negation elimination on the test of the outer loop: \(\lnot\lnot c\) implies \(c\), and so from \((c\to \mathit{Sorted}(a,f,t))\) we obtain that the array is actually sorted. Again, \(\mathit{Permut}(a,b,f,t)\) follows trivially from the outer loop invariant.
This concludes the correctness argument of the bozosort algorithm.
We have seen two sorting algorithms and discussed their correctness proofs. Although almost every detail is present here, a good exercise for practicing array variable substitutions remains: write down the proof outline of the swap of the array elements above, and work out in full detail how it affects the restricted permutation predicate \(\mathit{Permut}\).
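As a warm-up for this exercise, the swap itself can be tested mechanically. The following Python sketch is my own illustration (not the requested proof outline): it models \(\mathit{Permut}(a,b,f,t)\) as multiset equality of the segments \(a[f..t]\) and \(b[f..t]\) — an assumption about its intended meaning — and checks that a swap through a temporary variable \(z\) preserves the predicate:

```python
from collections import Counter

def permut(a, b, f, t):
    # Permut(a, b, f, t): a[f..t] is a rearrangement of b[f..t],
    # modelled here as multiset equality of the two segments.
    return Counter(a[f:t + 1]) == Counter(b[f:t + 1])

b = [5, 3, 8, 1, 9]   # the original array
a = list(b)           # the array being sorted
f, t = 0, 4

# Swap a[i] and a[j] through a temporary variable z, as in the program.
i, j = 1, 3
z = a[j]
a[j] = a[i]
a[i] = z

assert permut(a, b, f, t)  # the swap preserves the predicate
```

Swapping only permutes the segment, so the multiset of its values — and hence the predicate — is unchanged.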
In this article, we only look at program correctness in the sense of partial correctness. An interesting question remains: what can we say about the termination of these algorithms? Under what conditions do they terminate? In the coming weeks of the Program Correctness course, we will look at total correctness, where we prove not only the correctness of a program with respect to a specification of its input and output behavior, but also that the program terminates!
I thank Dominique Lawson and Roos Wensveen (both student assistants of the Program Correctness course) for suggesting improvements, and for discovering an error in a previous version of this article. The error was that the postcondition needs to be \(\mathit{Permut}(a,b,f,t)\) and not \(\mathit{Permut}(a,b)\) (can you see why?). All remaining errors are my own.
ChatGPT is a research project developed by OpenAI, which gives the following charter on its website: "OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity. We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome." I have used the February 13 (2023) version of ChatGPT.
After creating an account and logging in, the user interface for ChatGPT gives the following disclaimer: "While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice." So, we had better beware of the answers...
I give it the following first input: Hi. I would like to ask you a number of exam questions about the course Concepts of Programming Languages.
ChatGPT: Of course, I'll do my best to help you with your exam questions about Concepts of Programming Languages. What would you like to ask?
So, here we go. All quotations are generated by ChatGPT.
What distinguishes a high-level programming language from a low-level programming language? Choose one or more correct answers from below.
ChatGPT: The correct answer is 3. High-level programming languages allow (some) concepts to be expressed more directly, whereas low-level programming languages require (convoluted, error-prone) encodings to express the same concepts.
Explanation: High-level programming languages are designed to provide a more abstract view of programming, allowing the programmer to express concepts and algorithms more easily and directly. They typically use more human-readable syntax and provide built-in abstractions for common programming tasks, such as memory management, string manipulation, and I/O operations. In contrast, low-level programming languages require the programmer to work more directly with the underlying hardware, requiring more detailed knowledge of the system architecture and often relying on manual memory management and other low-level operations. While low-level programming languages may offer greater control over the hardware, they can be more error-prone and harder to use effectively than high-level languages.
Give a short motivation for each choice of the question above, why you made your selection (that is, why it is correct if you selected it, or why it is incorrect if you did not select it). Keep your motivation short: write at most two sentences per choice.
ChatGPT: Sure, here are the motivations for each choice:
1. Incorrect: The number of programmers using a language is not a factor in distinguishing between high-level and low-level programming languages.
2. Incorrect: While some high-level programming languages may be compiled into low-level machine code, this is not always the case, and low-level languages can also be compiled.
3. Correct: High-level programming languages provide built-in abstractions and a more abstract view of programming, making it easier to express concepts and algorithms more directly. In contrast, low-level languages require more direct manipulation of the hardware, making it necessary to use convoluted encodings to express the same concepts.
4. Incorrect: The efficiency of computations is not directly related to whether a language is high-level or low-level. Both types of languages can be used to write efficient code, depending on the specific task and the skill of the programmer.
What statements about pure and impure expressions are true? Choose one or more correct answers from below.
ChatGPT: The correct statement is 1. There exists an expression that is considered impure if you look at its parts, but which has no effect on the program state after it is fully evaluated.
Explanation: Pure expressions are those that have no side effects and always return the same result given the same input. Impure expressions, on the other hand, have side effects or may return different results for the same input. However, it is possible for an expression to be impure by definition but have no effect on the program state when evaluated. For example, an impure expression might generate a warning or log a message, but if the program does not use or check for these outputs, the expression is effectively pure.
Statement 2 is incorrect because the time or space required for evaluation is not a factor in determining whether an expression is pure or impure.
Statement 3 is incorrect because impure expressions by definition have side effects, so they do affect the program state when evaluated.
Statement 4 is incorrect because assignment statements with impure expressions on the right-hand side cannot always be eliminated without changing the behavior of the program. For example, if the impure expression generates a warning or logs a message, eliminating the assignment would result in the loss of this information.
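To make the first statement concrete, here is a minimal Python sketch of my own (the names are made up for illustration): an expression that is impure judging by its parts — it writes to shared state — yet leaves the program state unchanged once it is fully evaluated:

```python
state = {"x": 1}  # a hypothetical piece of shared program state

def pure(v):
    # Pure: no side effects, same result for the same input.
    return v + 1

def impure_but_harmless():
    # Impure by its parts: it assigns to the shared state...
    state["x"] = state["x"]  # ...but writes back the same value,
    return state["x"] + 1    # so the state is unchanged afterwards.

before = dict(state)
assert pure(1) == impure_but_harmless() == 2
assert state == before  # no observable effect on the program state
```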
Give a short motivation for each choice of the previous question above, why you made your selection (that is, why it is true if you selected it, or why it is not true if you did not select it). Keep your motivation short: write at most two sentences per choice.
(ChatGPT already did that.)
What statements about exceptions are true? Choose one or more correct answers from below.
ChatGPT: The correct statements are 3 and 4.
Explanation:
1. Incorrect: Numerical errors such as division by zero can be handled by throwing an exception, rather than causing a program crash.
2. Incorrect: It is possible to rethrow an exception that was caught before using the "throw" statement.
3. Correct: When an exception is thrown, the call stack is unwound until an appropriate exception handler is found. This means that the function that threw the exception and any functions that it called will be exited, but this does not necessarily mean that the whole call stack will be unwound.
4. Correct: If an exception is not caught by any handler, it will cause the program to terminate with an error message. This can lead to a program crash if the error is not handled gracefully.
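The behaviour described in points 1–3 can be observed directly; a small Python sketch of my own:

```python
def inner():
    return 1 / 0  # a numerical error raises an exception, no crash (cf. 1)

def middle():
    try:
        inner()
    except ZeroDivisionError:
        raise  # a caught exception can be rethrown (cf. 2)

def outer():
    try:
        middle()
    except ZeroDivisionError:
        return "handled"  # unwinding stops at this handler, not at the top (cf. 3)

assert outer() == "handled"
```

The frames of inner and middle are unwound, but the program keeps running inside outer: the whole call stack is not unwound.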
Give a short motivation for each choice of the previous question above, why you made your selection (that is, why it is true if you selected it, or why it is not true if you did not select it). Keep your motivation short: write at most two sentences per choice.
(ChatGPT already did that.)
What statements about object-oriented programming are true? Choose one or more correct answers from below.
ChatGPT: The correct statements are 1, 2, and 4.
Explanation:
1. Correct: Dynamic dispatch is a key feature of object-oriented programming, where the method body that is actually invoked depends on the runtime type of the object that the method is invoked on.
2. Correct: Class inheritance can lead to violations of Liskov's substitution principle if the subclass behavior is not compatible with the superclass behavior. For example, if a subclass violates the preconditions or postconditions of a superclass method, code that works on the superclass object may not work correctly on the subclass object.
3. Incorrect: While every object belongs to a class, not every class has an object as an instance. A class can be abstract or simply serve as a blueprint for creating objects.
4. Correct: Objects in object-oriented programming are typically allocated on a heap, and in garbage-collected languages, unreachable objects may be removed by the garbage collector while the program is running. This allows the programmer to focus on the program logic and not worry about memory management.
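Dynamic dispatch, the first point above, can be illustrated with a short Python sketch of my own (the class names are made up):

```python
class Shape:
    def area(self):
        return 0

class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

def describe(shape):
    # Dynamic dispatch: which `area` body runs depends on the
    # runtime type of `shape`, not on any static type annotation.
    return shape.area()

assert describe(Shape()) == 0
assert describe(Square(3)) == 9
```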
Give a short motivation for each choice of the previous question above, why you made your selection (that is, why it is true if you selected it, or why it is not true if you did not select it). Keep your motivation short: write at most two sentences per choice.
(ChatGPT already did that.)
Given the following context-free grammar in Backus-Naur Form (BNF):
<expr> ::= (<expr>) | <expr> + <expr> | <expr> * <expr> | <num> | <var>
<num> ::= <digit> | <digit><num>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<var> ::= <letter> | <letter><var>
<letter> ::= a | b | c | ...
<stmt> ::= {<stmt>} | <var> := <expr> | if <expr> = 0 then <stmt> else <stmt> | <stmt>; <stmt>
The starting symbol is <stmt>.
Which of the following strings are in the language defined by the grammar above? Choose one or more correct answers from below.
ChatGPT: The correct strings are 1, 2, and 4.
Explanation:
1. Correct: The string abc := a+b; can be generated by the rule <var> := <expr>. The variable "abc" is a sequence of letters, which is generated by the rule <letter><var>, and the expression "a+b" is generated by the rule <expr> + <expr> with <expr> being the variable "a" and the variable "b", which are generated by the rule <var>.
2. Correct: The string (((c))) can be generated by the rule <expr> with <expr> being (<expr>) and <expr> being (<expr>) and <var> being "c".
3. Incorrect: The string a := b + a; b := a - b cannot be generated by the given grammar because it is not a valid <stmt>. The grammar allows only one assignment statement per <stmt>.
4. Correct: The string if (a + b) = 0 then a := b else b := a can be generated by the rule if <expr> = 0 then <stmt> else <stmt>, with <expr> being (a+b) generated by the rule <expr> + <expr>, and <stmt> being the assignment statements "a := b" and "b := a" generated by the rule <var> := <expr>.
Motivate, for each string in the previous question above, why you made your selection (i.e. why the string is in the language if you selected it, or why it is not if you did not select it).
(ChatGPT already did that.)
Is the grammar given above ambiguous? Give a reasoned answer (i.e. not only "yes" or "no", but also why and how you found out).
ChatGPT: Yes, the grammar given above is ambiguous.
Reasoning:
Ambiguity in a context-free grammar means that there exists at least one string in the language that can be generated by the grammar in more than one way. In the given grammar, the expression <expr> + <expr> can be associated with either left or right recursion, which leads to different interpretations of the same input. For example, the input string "1 + 2 * 3" can be parsed as either "(1 + 2) * 3" or "1 + (2 * 3)", which have different values. This indicates that the grammar is ambiguous.
One way to demonstrate the ambiguity is by constructing a parse tree for an ambiguous string. For instance, consider the string "1 + 2 * 3". It can be parsed in two different ways, resulting in two different parse trees:
      +          *
     / \        / \
    1   *      +   3
       / \    / \
      2   3  1   2
The left tree corresponds to the interpretation "(1 + 2) * 3", while the right tree corresponds to the interpretation "1 + (2 * 3)". Since the grammar allows both of these parse trees, it is ambiguous.
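That the choice of parse tree matters can be checked by evaluating the two explicit parenthesizations (a small Python sketch of my own):

```python
# The two readings of "1 + 2 * 3" under the ambiguous grammar,
# written with explicit parentheses:
mul_at_root = (1 + 2) * 3  # the "+" nested below the "*"
add_at_root = 1 + (2 * 3)  # the "*" nested below the "+"

assert mul_at_root == 9
assert add_at_root == 7   # different values for the same string
```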
Given the following context-free grammar of the (untyped) lambda calculus:
<expr> ::= <var> | λ<var><expr> | <expr><expr>
We use only single letter variables. Use the standard way of resolving ambiguity: application associates to the left, abstraction precedes application (that is, abstraction binds more strongly than application, or, in other words, abstraction has a higher precedence than application). Expressions can use parentheses for disambiguation (not shown in the grammar).
Draw the abstract syntax tree of the following lambda expressions (in black). Circle the free variable occurrences (in red). Show the binding of variables (in green).
ChatGPT:
1. xyz
[ChatGPT's ASCII drawing of the syntax tree, garbled beyond recovery in this transcript.]
Variable occurrences: x, y, z. All three are free.
Binding of variables: none.
2. λxxx
[ASCII drawing of the syntax tree, garbled.]
Variable occurrences: none.
Binding of variables: x is bound by the abstraction λxxx.
3. λyx
[ASCII drawing of the syntax tree, garbled.]
Variable occurrences: y appears once as a free variable.
Binding of variables: x is bound by the first abstraction λx(yx), and y is bound by the second abstraction λyx.
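Free and bound occurrences can also be computed instead of drawn. A minimal Python sketch of my own (the tuple encoding of terms is an assumption, chosen for illustration):

```python
# Terms: a variable is a str, ("lam", x, body) an abstraction,
# ("app", f, a) an application (an encoding I chose for illustration).
def free_vars(term):
    if isinstance(term, str):
        return {term}
    if term[0] == "lam":
        _, x, body = term
        return free_vars(body) - {x}  # the abstraction binds x
    _, f, a = term
    return free_vars(f) | free_vars(a)

# xyz parses as (xy)z: all of x, y, z occur free.
assert free_vars(("app", ("app", "x", "y"), "z")) == {"x", "y", "z"}
# λx(yx): y occurs free, x is bound by the abstraction.
assert free_vars(("lam", "x", ("app", "y", "x"))) == {"y"}
```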
Explain the meaning (what kind of statement) and execution (what happens when statement is executed) of each statement of the following program with dynamic memory management:
int *x = new(5);
int *z = null;
{
int *y = new(6);
z := y;
delete z;
*x := *y;
}
ChatGPT: This is a program written in C++ with dynamic memory management. Here's what each statement does:
1. int *x = new(5);: This statement dynamically allocates memory for an integer array of size 5 and assigns the address of the first element to the variable x.
2. int *z = null;: This statement initializes a pointer variable z to null.
3. {: This is the beginning of a block statement.
4. int *y = new(6);: This statement dynamically allocates memory for an integer array of size 6 and assigns the address of the first element to the variable y.
5. z := y;: This statement assigns the value of y to z, so z now points to the same memory location as y.
6. delete z;: This statement deallocates the memory that z points to, which is the memory that y also points to. This means that the memory for the integer array of size 6 that was allocated on the heap is now freed.
7. *x := *y;: This statement copies the value of the first element of the integer array that y points to, which is also the value of the first element of the integer array that x points to. This means that the value of x is now equal to the value of y.
The execution of this program involves allocating and deallocating memory on the heap and copying values between memory locations. When the program runs, it first allocates memory for the integer array of size 5 and assigns its address to x. Then it initializes z to null. Inside the block statement, it allocates memory for the integer array of size 6 and assigns its address to y. It then assigns the value of y to z, so z points to the same memory location as y. It then deallocates the memory that z points to, which also frees the memory that y points to. Finally, it copies the value of the first element of the integer array that y points to, which is also the value of the first element of the integer array that x points to. After the block statement, the memory for the integer array of size 5 is still allocated and assigned to x, while the memory for the integer array of size 6 that was allocated inside the block statement has been deallocated.
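One way to check such an explanation is to run the statements on a small modelled heap. In the Python sketch below (my own; I assume new(n) allocates a single fresh cell holding n and delete frees that cell), note what happens at *x := *y after delete z:

```python
heap = {}        # address -> value; models the dynamically allocated cells
next_addr = 100  # a hypothetical address counter

def new(value):
    global next_addr
    addr = next_addr
    next_addr += 1
    heap[addr] = value   # new(v) is assumed to allocate one cell holding v
    return addr

def delete(addr):
    del heap[addr]       # freed cells disappear from the modelled heap

x = new(5)               # int *x = new(5);
z = None                 # int *z = null;
y = new(6)               # int *y = new(6);
z = y                    # z := y;  (z and y now alias the same cell)
delete(z)                # delete z;  (the cell y points to is freed too)
try:
    heap[x] = heap[y]    # *x := *y;  reads through a dangling pointer
except KeyError:
    print("dangling dereference: the cell *y was freed by delete z")
```

In this model the final read through y fails, because the cell it points to was already freed via its alias z.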
Consider the following two programs (on the left and right) that are executed concurrently (x and y are shared integer variables):
x := 5;            y := 3;
y := 2;            x := 2;
if (x < 5) {       x := x - y;
  y := 4           y := y + 6;
}                  print x
x := x + y;
ChatGPT: Here are five possible executions:
Execution 1:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 1: x := x + y;
Thread 2: x := x - y; // x = 2 - 3 = -1
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: print x; // x = 9
Execution 2:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 1: print x; // x = 9
Execution 3:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: print x; // x = 7
Execution 4:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: print x; // x = 9
Execution 5:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: print x; // x = 7
In general, the value printed depends on the order in which the threads execute their statements and whether the if statement is executed or skipped. In each execution, we simulate the interleaving
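The full set of reachable outcomes can also be enumerated exhaustively. A Python sketch of my own (assuming each statement executes as one atomic step; the shared variables start at 0, though the printed value never depends on these initial values):

```python
from itertools import combinations

# Each thread is a list of atomic steps that read and update a shared state;
# the if-test is evaluated at the moment Thread 1 reaches it.
left = [
    lambda s: s.update(x=5),
    lambda s: s.update(y=2),
    lambda s: s.update(cond=s["x"] < 5),             # if (x < 5)
    lambda s: s.update(y=4) if s["cond"] else None,  # { y := 4 }
    lambda s: s.update(x=s["x"] + s["y"]),           # x := x + y
]
right = [
    lambda s: s.update(y=3),
    lambda s: s.update(x=2),
    lambda s: s.update(x=s["x"] - s["y"]),           # x := x - y
    lambda s: s.update(y=s["y"] + 6),                # y := y + 6
    lambda s: s.update(out=s["x"]),                  # print x
]

printed = set()
for left_slots in combinations(range(10), 5):  # all 252 interleavings
    s = {"x": 0, "y": 0, "cond": False, "out": None}
    i = j = 0
    for k in range(10):
        if k in left_slots:
            left[i](s); i += 1
        else:
            right[j](s); j += 1
    printed.add(s["out"])

print(sorted(printed))
assert len(printed) >= 5  # at least five distinct printed values exist
```

Among the collected values are, for example, -1 (the right program runs alone), 5, 2, 7, and 3, so five executions with five different outputs do exist.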
(Note that I have removed some questions which are hard to give as input to ChatGPT textually.)
Decide for yourself what you think of the answers. I am impressed: the answers generated by ChatGPT seem closely related to the questions, and make sense grammatically. However, closer inspection reveals that the chatbot is not able to give reasonably correct answers to all questions.