How can you explain important concepts from program correctness in a simple and intuitive manner? In this blog post, we shall have a look at some puzzles and analyze them from the perspective of program correctness. This way we can nicely explain and demonstrate the usefulness of two important concepts, namely invariants and inconsistent specifications.
The puzzles we study here come from the book Algorithmic Puzzles [1] by Anany and Maria Levitin, published by Oxford University Press in 2011. This book presents 150 puzzles that are good candidates for applying analytical and logical thinking skills (the puzzles can also be used as challenging interview questions). We make a small selection of the puzzles and analyze them from the perspective of program correctness. In program correctness, we consider a program to be correct with respect to a given program specification. A program specification is a specific formulation of a requirement, for example, a specification of what the output of a program must be given some input. More specifically (no pun intended), we can rephrase the puzzles in such a way that a puzzle can be seen as a program specification, and proving that there exists a program that is correct with respect to that specification would then solve the puzzle in question. Or, alternatively, we show that there is no solution to the puzzle, by arguing that there cannot be a correct program in the first place.
First we briefly revisit the preliminaries (Section 2). This article does assume the reader is already somewhat familiar with the basics of programming and program correctness, but we nevertheless quickly revisit the basic concepts. For a thorough introduction to program correctness, one could take a look at one of the following books (in order of appearance):
Then we shed light on the concept of an invariant by discussing the 5th puzzle of the book, ‘Row and Column Exchanges’ (Section 3). We also look at why declarative specifications are useful by discussing the 12th puzzle of the book, ‘Questionable Tiling’ (Section 4). Finally, we discuss more generally the importance of invariants and of formulating consistent specifications (Section 5).
We shall restrict our attention to a simple imperative programming language: \[\begin{gathered} S \Coloneqq x \coloneqq a \mid S_1;S_2 \mid \mathbf{if}\ b\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \mathbf{fi} \mid \mathbf{while}\ b\ \mathbf{do}\ S\ \mathbf{od}\end{gathered}\] where we use not only \(x\) as a variable but also \(y,z,\ldots\) (possibly with subscripts), where the terms \(a\) of the language are the usual arithmetical expressions: \[\begin{gathered} a \Coloneqq 0 \mid 1 \mid x \mid -a \mid (a_1 + a_2) \mid (a_1 \times a_2)\end{gathered}\] and where the terms \(b\) of the language are the Boolean expressions: \[\begin{gathered} b \Coloneqq (a_1 = a_2) \mid (a_1 < a_2) \mid (b_1 \land b_2) \mid (b_1 \lor b_2) \mid \lnot b\end{gathered}\] We also have the usual abbreviations, such as \((a_1 \leq a_2)\), which abbreviates the more complex expression \((a_1 < a_2) \lor (a_1 = a_2)\). The numerals \(2,3,4,\ldots\) are also abbreviations of the complex expressions \((1+1), (1+2), (1+3),\ldots\)
We also have first-order formulas, captured by the following syntax: \[\begin{gathered} \phi,\psi \Coloneqq b \mid (\phi \to \psi) \mid (\forall x)\phi\end{gathered}\] Other logical connectives, such as \((\phi \land \psi)\) and \(\lnot\phi\), can be seen as abbreviations. First-order logic involves first-order universal quantification \((\forall x)\phi\), and we have its dual, first-order existential quantification \((\exists x)\phi\), as an abbreviation of \(\lnot(\forall x)\lnot\phi\). Quantification ranges only over individuals, so in our case over integers.
Now let us consider semantics. Let \(\sigma\) be a state (an assignment of integer values to variables). We have the usual semantics for arithmetical expressions \(a\) and Boolean expressions \(b\): \([\![a ]\!]_\sigma\) denotes an integer value and \([\![b]\!]_\sigma\) denotes a Boolean value. Note that an expression depends only on finitely many variables, and we only deal with pure expressions in our simple language. Each statement \(S\) of our programming language denotes a transition relation on states: \[[\![S]\!]\subseteq \Sigma\times\Sigma\] where \(\Sigma\) is the set of states (with typical element \(\sigma\)), and \(\Sigma\times\Sigma\) is the set of pairs of states. A statement denotes a binary relation between initial and final states. Each formula \(\phi\) denotes a set of states: \[[\![\phi]\!]\subseteq \Sigma\] in the sense that in each state \(\sigma\in[\![\phi]\!]\) the formula \(\phi\) is true, also written \(\sigma\models\phi\).
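To make this semantics concrete, here is a minimal interpreter sketch in Python for the while-language, where a statement maps an initial state to a final state (the language is deterministic, so the transition relation is a partial function). The tuple encoding and all names are ours, not part of the article:

```python
# Sketch: a tiny interpreter for the while-language. Statements are
# nested tuples; states are dicts mapping variable names to integers.

def run(stmt, state):
    op = stmt[0]
    if op == "assign":             # ("assign", x, expr) with expr: state -> int
        _, x, expr = stmt
        new = dict(state)          # states are treated as immutable
        new[x] = expr(state)
        return new
    if op == "seq":                # ("seq", S1, S2)
        return run(stmt[2], run(stmt[1], state))
    if op == "if":                 # ("if", b, S1, S2) with b: state -> bool
        _, b, s1, s2 = stmt
        return run(s1 if b(state) else s2, state)
    if op == "while":              # ("while", b, S)
        _, b, s = stmt
        while b(state):
            state = run(s, state)
        return state
    raise ValueError(op)

# the summation program used as an example at the end of this section:
# x := 1; while x <= z do y := y + x; x := x + 1 od
prog = ("seq",
        ("assign", "x", lambda s: 1),
        ("while", lambda s: s["x"] <= s["z"],
         ("seq",
          ("assign", "y", lambda s: s["y"] + s["x"]),
          ("assign", "x", lambda s: s["x"] + 1))))

final = run(prog, {"y": 0, "z": 4})
print(final["y"])   # 10
```

Running the program from the state \(\{y \mapsto 0, z \mapsto 4\}\) yields a final state with \(y = 10\), and indeed \(2\times 10 = 4\times 5\).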
In program correctness we combine two languages: a programming language and a specification language. The programming language is already given above. As specification language we take the above first-order language. Note that the variables of formulas in the assertion language are the same variables we use in the programming language. We can now form the Hoare triples: \[\{\phi\}\ S\ \{\psi\}\] where \(\phi\) is called the precondition and \(\psi\) is called the postcondition. A Hoare triple is correct when the statement \(S\) satisfies the input/output specification given by the precondition \(\phi\) and the postcondition \(\psi\), and a Hoare triple is incorrect otherwise. Note that the (global) variables of \(S\) and the (free) variables of the formulas \(\phi\) and \(\psi\) refer to the same program state. Formally, we define \[\models \{\phi\}\ S\ \{\psi\} \text{ if and only if } [\![S]\!]([\![\phi]\!]) \subseteq [\![\psi]\!]\] where \(R(X)\) is the image of the set \(X\) under the binary relation \(R\), that is, \(R(X) = \{y \mid xRy\text{ for some }x\in X\}\). Unpacking this formal definition gives us \[\models \{\phi\}\ S\ \{\psi\} \text{ if and only if } \sigma\in[\![\phi]\!]\text{ and } (\sigma,\tau)\in[\![S]\!]\text{ implies } \tau\in[\![\psi]\!].\]
Incorrectness means that \(S\) has a bug. Suppose we start in some initial state \(\sigma\) which satisfies the precondition \(\phi\), and we execute \(S\) from that state, and that execution results in some final state \(\tau\). If the final state \(\tau\) does not satisfy \(\psi\), then we have found a bug! Formally, \[\not\models \{\phi\}\ S\ \{\psi\} \text{ if and only if } \sigma\models\phi \text{ and }(\sigma,\tau)\in[\![S]\!]\text{ and }\tau\not\models\psi\text{ for some }\sigma,\tau.\]
Hoare logic is a formal system in which Hoare triples can be derived, in which case one writes \(\vdash \{\phi\}\ S\ \{\psi\}\). Hoare logic is sound and (relatively) complete, meaning that we have \[\vdash \{\phi\}\ S\ \{\psi\}\text{ if and only if }\models \{\phi\}\ S\ \{\psi\}\] under some reasonable assumptions.^{1} See one of the books mentioned in the introduction for a presentation of Hoare logic, or Wikipedia.
A quick example is the following Hoare triple. Is it correct or not? \[\{y = 0\}\ x \coloneqq 1; \mathbf{while}\ x \leq z\ \mathbf{do}\ y \coloneqq y + x; x \coloneqq x + 1\ \mathbf{od}\ \{2\times y = z\times(z+1)\}\] To verify the loop, we need to come up with a so-called loop invariant: a condition that holds at four control points: (1) before entering the loop, (2) before the loop body begins, (3) after the loop body ends, and (4) after the loop is exited. Finding loop invariants is difficult, and often it takes multiple tries until one finds a suitable invariant. In the above example, one can take: \[1 \leq x \leq z + 1 \land 2\times y = (x - 1)\times x\] where \((x - 1)\) abbreviates \((x + (-1))\) and the chain of inequalities is a conjunction of the two comparisons.
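The invariant can also be tested mechanically. The following Python sketch (ours, not from the article) runs the program and asserts the proposed invariant at all four control points, together with the postcondition; we check it for non-negative \(z\), for which the bounds in the invariant hold initially:

```python
# Check the loop invariant  1 <= x <= z + 1  and  2*y == (x - 1)*x
# at the four control points, plus the postcondition 2*y == z*(z + 1).

def run_and_check(z):
    assert z >= 0
    y = 0                  # precondition: y = 0
    x = 1
    inv = lambda: 1 <= x <= z + 1 and 2 * y == (x - 1) * x
    assert inv()           # (1) before entering the loop
    while x <= z:
        assert inv()       # (2) before the loop body begins
        y = y + x
        x = x + 1
        assert inv()       # (3) after the loop body ends
    assert inv()           # (4) after the loop is exited
    assert 2 * y == z * (z + 1)   # postcondition
    return y

for z in range(0, 100):
    run_and_check(z)       # no assertion fires
```

Testing like this does not replace a proof, of course: it only samples finitely many initial states, whereas the Hoare triple quantifies over all of them.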
In this section we discuss a puzzle in which invariants play a prominent role. The 5th puzzle of the book Algorithmic Puzzles is ‘Row and Column Exchanges’:
Can you transform the left table into the right table of Figure 1 by exchanging its rows and columns?
(It is recommended that the reader first tries out solving this puzzle herself!)
To get a sense of what the puzzle asks for, let us perform the operations of swapping rows and columns in a table. An example of a sequence of successive applications of these operations is shown in Figure 2.
This figure shows:
The first table shows the initial table of Figure 1, our starting point in this puzzle.
After the first step, we have exchanged the first and last row. So we swapped the values \(1,2,3,4\) and \(13,14,15,16\).
After the second step, we have also exchanged the first and last column. So we swapped the values \(13,5,9,1\) and \(16,8,12,4\).
Notice that we have now obtained a table in which the last row coincides with the values of the final table we wish to obtain (the last row is ‘correct’ with respect to the desired final table). To get closer to the final table, we can continue the series of operations as in Figure 3, where we perform two additional steps: we swap the first and third row, and we swap the second and third row.
In the resulting table, we have colored the cells that have values in the right place when comparing it to the final table in Figure 1. This particular example shows that we are not there yet. Click here for a Rust implementation of this example.
Just giving this single example, in which we have not solved the puzzle (since the resulting table is not ‘correct’), is not a solution to the puzzle! But one may wonder whether a solution exists at all. If there is a solution, then we have not yet found it. But if there is no solution to be found, then just showing this single failed attempt is not sufficient proof.
Imagine that these tables are representations of state, where the state is an assignment of integers to variables (each cell in the table is modeled by its own variable, sixteen in total). There are two primitive operations that work on this state:
to swap two columns \(C(j,j')\), and
to swap two rows \(R(i,i')\).
The puzzle can be rephrased as asking whether we can come up with a program that is composed of these two primitive operations. Instead of our simple programming language given above, where the only primitive operation is the assignment \(x := a\), we consider the programming language with only these two primitive operations. In this way we realize encapsulation, in the sense that the program may not directly modify the state by means of an assignment, only indirectly through the exposed operations.
This may remind the reader of object-oriented programming. Each table could be seen as an instance of a class of objects, which has an encapsulated internal state. The class of objects exposes a number of operations, viz. it has a well-defined interface. We now ask ourselves: does there exist a client, one that can only work with the interface and cannot directly modify the internal state, that solves our puzzle?
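As a concrete illustration of this object-oriented view, here is a minimal Python sketch (names are ours; the article links a Rust implementation) of a table object whose internal state is reachable only through the two exposed operations, replaying the first two steps of Figure 2:

```python
# Sketch: the table as an object whose internal state can only be
# changed through the two exposed operations R(i, i') and C(j, j').

class Table:
    def __init__(self):
        # initial table of Figure 1: values 1..16 in row-major order
        self.cells = [[(i - 1) * 4 + j for j in range(1, 5)]
                      for i in range(1, 5)]

    def swap_rows(self, i, i_prime):      # R(i, i'), 1-based indices
        r = self.cells
        r[i - 1], r[i_prime - 1] = r[i_prime - 1], r[i - 1]

    def swap_cols(self, j, j_prime):      # C(j, j'), 1-based indices
        for row in self.cells:
            row[j - 1], row[j_prime - 1] = row[j_prime - 1], row[j - 1]

t = Table()
t.swap_rows(1, 4)   # first step of Figure 2: swap first and last row
t.swap_cols(1, 4)   # second step: swap first and last column
print(t.cells)
```

After these two steps the first column holds \(16,8,12,4\) and the last row holds the values \(\{1,2,3,4\}\), matching the description above.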
What does it mean to solve the puzzle? We can formulate the Hoare triple \[\{\!\!\!\bigwedge_{i\in[1,4]}\bigwedge_{j\in[1,4]}\!\!\!x_{i,j} = (i - 1) \times 4 + j\}\ S\ \{x_{1,1} = 12 \land \ldots \land x_{4,4} = 1\}\] where \(x_{1,1}\) through \(x_{4,4}\) are the sixteen variables corresponding to the cells of the table.^{2} Note that in the postcondition we simply require the variables to have the proper values, as indicated in Figure 1. If we can find a program \(S\) that is composed of only these primitive operations, and prove it correct, we have solved the puzzle!
To understand the meaning of the primitive operations, we give a set of Hoare triples that we take as axioms (technically, we give an axiom scheme). This approach is also known as the ‘axiomatic approach’, where we abstract from the exact semantics of the primitive operations. Here we go (assuming metavariables \(j\in[1,4]\) and \(j'\in[1,4]\)): \[\{\!\!\!\bigwedge_{i\in[1,4]}\!\!\!x_{i,j} = y_i \land \!\!\!\bigwedge_{i\in[1,4]}\!\!\!x_{i,j'} = z_i\}\ C(j,j')\ \{\!\!\!\bigwedge_{i\in[1,4]}\!\!\!x_{i,j} = z_i \land \!\!\!\bigwedge_{i\in[1,4]}\!\!\!x_{i,j'} = y_i\}\] The ‘freeze variables’ \(y_1,\ldots,y_4\) capture the old values at column \(j\), and \(z_1,\ldots,z_4\) capture the old values at column \(j'\). In the postcondition, we use the (unchanged) freeze variables to refer to the old values from the beginning of the swap operation. This argument crucially relies on the fact that the operation \(C(j,j')\) only changes the variables in the set \(\{x_{i,j},x_{i,j'}\mid i\in[1,4]\}\). By Hoare’s invariance rule, we know that any property about the other variables thus remains invariant. A similar axiom scheme can be given for swapping rows.
We could think of an object invariant: a property that holds of the internal state of the object, that must be preserved by every operation that is performed by any client. Note that object invariants may be temporarily broken in the implementation of an operation, as long as the object invariant is restored before the implementation terminates.
The beauty of invariants is that they are a powerful tool for answering these kinds of puzzle questions. When we are able to find an invariant that is true for the initial table but false for the final table, then we know: the final table cannot be obtained by means of these operations alone, since every operation preserves the object invariant!
An example of an object invariant in this case would be the property: the table has the values \(\{1,\ldots,16\}\). In other words, every value in the table is in \(\{1,\ldots,16\}\) and every value of \(\{1,\ldots,16\}\) is somewhere in the table. Let’s formalize it (Equation 1): \[\{x_{i,j} \mid 1\leq i,j \leq 4\} = \{1,\ldots,16\}.\] The set comprehension on the left collects all values in the table in a set. The set expression on the right is the finite set consisting of the integers \(1\) up to and including \(16\). The property now expresses that these two sets are identical, i.e. have precisely the same members. This property holds for the initial state of the object, and it also is preserved by every operation: swapping two rows, or swapping two columns, does not introduce any new values and thus does not invalidate this property. Hence, this property is an object invariant.
The final table of Figure 1 also satisfies the object invariant of Equation 1. So this invariant, while nice to know, is not useful in answering the puzzle question. We can only prove that there is no solution to the puzzle when we find an invariant that holds of the initial state and is preserved by the operations, but does not hold in the final state.
Just as finding loop invariants (to show the correctness of a program) is a difficult problem, finding object invariants (to show there can be no correct program) is also a difficult problem. Finding invariants may require several tries. Let us try another invariant. Consider that we not only have a set of values, but in fact a set of sets of values: \[\bigl\{\{x_{i,j} \mid 1\leq j\leq 4\}\mid 1\leq i \leq 4\bigr\} = \bigl\{\{1,2,3,4\},\ldots,\{13,14,15,16\}\bigr\}\] The outer set consists of the sets corresponding to the values one finds at each row, and each inner set consists of the values present in one row. If we swap two rows, the invariant is preserved, because the outer set does not care about the order in which its elements (sets of integers) are listed. If we swap two columns, the invariant is also preserved, because the set of values at each row remains the same after swapping two columns.
Now, looking at Figure 1 we see that the initial table satisfies this property. However, if we look at the final table we see that it does not satisfy this property. The final table has as set of sets of integers: \[\bigl\{\{12,10,11,9\},\{16,14,5,13\},\{8,6,7,15\},\{4,3,2,1\}\bigr\}.\] The first and last rows are correct (as sets), so we can focus on comparing the remaining sets \[\bigl\{\{16,14,5,13\},\{8,6,7,15\}\bigr\} \text{ and } \bigl\{\{16,14,15,13\},\{8,6,7,5\}\bigr\}\] which cannot be equal, because each set contains a value that the other lacks. Hence the final table does not satisfy the invariant, which finally proves that there is no solution! (We shall further discuss this problem in Section 5.)
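This invariant argument can be checked mechanically. In the Python sketch below (ours), the initial table is the one from Figure 1, and the final table's rows are reconstructed from the sets discussed above (the order of values within a row does not matter for this invariant):

```python
# The object invariant "set of row sets" is preserved by both swap
# operations, so if it differs between two tables, no sequence of row
# and column swaps can transform one into the other.

def row_sets(table):
    return frozenset(frozenset(row) for row in table)

initial = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
final = [[12, 10, 11, 9], [16, 14, 5, 13], [8, 6, 7, 15], [4, 3, 2, 1]]

# the weaker invariant (same set of values 1..16) holds for both tables...
assert {v for row in final for v in row} == set(range(1, 17))
# ...but the row-set invariant distinguishes them: no solution exists
print(row_sets(initial) == row_sets(final))   # False
```

Using `frozenset` makes the inner sets hashable, so they can themselves be members of the outer set.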
We have a look at the 12th puzzle of the Algorithmic Puzzles book, ‘Questionable Tiling’ (with a slightly different phrasing):
Is it possible to tile an 8-by-8 board with dominoes (2-by-1 tiles) such that no two dominoes lie next to each other in parallel?
(Again, the reader should first try to solve this puzzle herself!)
Before even beginning to solve the problem, we should first try to get an exact understanding of the puzzle by understanding each part of the question:
Suppose we formalize the 8-by-8 board, again by means of a table. Each cell of the table is again understood to be represented by the variables \(x_{i,j}\) where \(i\) is the row counted from the top and \(j\) is the column counted from the left. But what do the values of these variables mean? We could devise the following encoding:
If a variable \(x_{i,j}\) has value \(0\) it means that the cell is empty.
If a variable \(x_{i,j}\) has some positive value, then that positive value identifies a domino piece.
For example, see Figure 4 where we have a table that encodes an 8-by-8 board with only four dominoes. Note that in this and the following pictures, only the numbers in the cells are significant and not the colors; the colors are only a visual aid. Further, what is shown in Figure 4 is not a tiling yet: it is a partial tiling, on its way to becoming a complete tiling.
One fruitful approach would be looking for patterns. A pattern is, figuratively speaking, a small ‘frame’ or ‘scope’ that one can locally observe in the picture. These patterns are ‘timeless’ and observed in the outcome, and thus do not depend on the intermediary states one has passed through to obtain that outcome. Finding patterns is a useful skill for a declarative programmer.
One can observe already the following properties:
(Number of dominoes in tiling)
In an 8-by-8 table, a complete tiling has exactly \(\frac{8\times 8}{2}=32\) numbers identifying domino pieces.
(Size of single domino)
Every number identifying a domino piece occurs at most twice.
(Dominoes line up)
Given a cell of the table that contains a number identifying a domino, consider its immediate neighborhood (the cells to the top, right, bottom, and left, but not the diagonal cells). We observe that the following must hold: a cell above, to the left, below, or to the right of the given cell exists and has the same domino-identifying number, while the other neighboring cells must have a different value. See also Figure 5 for a picture, but note that these patterns only work for interior cells. For cells on the border, the pattern need not check outside the bounds.
Now, consider completing the tiling in Figure 4. What domino do we place to the left of the domino identified by number three (the blue one)? It will form a 2-by-2 square. We also form a 2-by-2 square if we place another domino directly below and in parallel with the domino identified by number four (the purple one). Both configurations are undesired according to the puzzle.
We end up with the following property:
(No parallel dominoes)
No 2-by-2 square contains exactly two dominoes. See Figure 6 for the two forbidden patterns.
We can now formalize the properties, and obtain a program specification.
(Number of dominoes in tiling) \[|D| = \frac{8\times 8}{2}\] where \(D\) is the set of domino-identifying numbers that occur somewhere in the table, that is, \(D = \{x_{i,j} \mid 1 \leq i,j\leq 8\}\cap\{n\mid n > 0\}\).
(Size of single domino) \[\bigl|\{(i,j) \mid k = x_{i,j}\text{ and }1\leq i,j\leq 8\}\bigr| = 2\text{ for each }k\in D.\]
(Dominoes line up)
For every \(1\leq i,j\leq 8\) there is some \(\ell\in K(i,j)\) such that \[x_{i,j} = x_\ell \land \bigwedge_{k\in K(i,j)\setminus\{\ell\}} x_{i,j}\neq x_{k}\] where \(K(i,j)\) is the set of neighboring coordinates within bounds: \[\{(i+1,j),(i,j+1),(i-1,j),(i,j-1)\} \cap \{(i',j')\mid 1\leq i',j'\leq 8\}.\] Technically, we have that \(x_{(i,j)}\) is defined to be equal to \(x_{i,j}\), so we can use coordinates to refer to a particular subscripted variable.
(No parallel dominoes)
For every \(1 \leq i,j < 8\) we have \[\bigl|\{x_{(i,j)},x_{(i+1,j)},x_{(i,j+1)},x_{(i+1,j+1)}\}\cap\{n\mid n > 0\}\bigr| \neq 2.\]
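The four properties can also be checked mechanically. Here is a Python sketch (ours), assuming the encoding introduced above, \(0\) for an empty cell and a positive identifier per domino, and assuming a complete tiling when checking the third property:

```python
# Sketch: check properties P1-P4 on an 8-by-8 board encoded as in the
# text (0 = empty cell, a positive number identifies a domino piece).

def check(board, n=8):
    ids = {v for row in board for v in row if v > 0}
    # P1: a complete tiling has exactly n*n/2 domino identifiers
    p1 = len(ids) == n * n // 2
    # P2: each identifier occurs exactly twice
    p2 = all(sum(v == k for row in board for v in row) == 2 for k in ids)
    # P3: each cell shares its identifier with exactly one orthogonal
    # neighbor (assumes a complete tiling, i.e. no zeros on the board)
    def neighbors(i, j):
        cand = [(i + 1, j), (i, j + 1), (i - 1, j), (i, j - 1)]
        return [(a, b) for a, b in cand if 0 <= a < n and 0 <= b < n]
    p3 = all(
        sum(board[a][b] == board[i][j] for a, b in neighbors(i, j)) == 1
        for i in range(n) for j in range(n)
    )
    # P4: no 2-by-2 square contains exactly two dominoes
    p4 = all(
        len({board[a][b] for a in (i, i + 1) for b in (j, j + 1)
             if board[a][b] > 0}) != 2
        for i in range(n - 1) for j in range(n - 1)
    )
    return p1, p2, p3, p4

# an all-horizontal brick tiling: P1-P3 hold, but P4 fails everywhere
board = [[i * 4 + j // 2 + 1 for j in range(8)] for i in range(8)]
print(check(board))   # (True, True, True, False)
```

The usage example builds an all-horizontal tiling: \(P1\), \(P2\) and \(P3\) hold, but every aligned 2-by-2 square contains exactly two dominoes, so \(P4\) fails.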
These properties can be abbreviated to \(P1,P2,P3,P4\), respectively. Now the puzzle amounts to finding a program \(S\) that changes the variables \(x_{1,1},\ldots,x_{8,8}\) such that we can prove \[\{\mathbf{true}\}\ S\ \{P1\land P2\land P3\land P4\}.\] Consider a program that assigns the cells’ values according to Figure 7. We can now verify whether the program indeed satisfies the specification, by checking whether all properties hold.
\(P1\) holds because there are exactly 32 dominoes in the final state assigned to the variables. \(P2\) holds since every number identifying a domino piece occurs exactly twice. Also \(P3\) holds, and this can easily be seen by the different colors. However, checking \(P4\) shows that the property is violated (see the center).
If we slightly generalize the problem, then we see that there is a solution. Suppose the board is infinite, i.e. we have an \(\infty\)-by-\(\infty\) board, which we start tiling from the top-left corner. Then the following brick-laying pattern can be continued indefinitely (see Figure 8):
Start horizontally with the dominoes \(A_1,A_2,\ldots\) and lay the next on the right of the previous one until the entire first row is covered with dominoes.
Continue vertically with the dominoes \(B_1,B_2,\ldots\) and lay the next below the previous one until the entire first column is also covered with dominoes.
We are now in the same situation as before: we want to fill an \(\infty\)-by-\(\infty\) board, so we repeat the strategy of first laying \(C_1,C_2,\ldots\) horizontally and then laying \(D_1,D_2,\ldots\) vertically.
Such an infinite board would then satisfy these properties:
The number of dominoes on the infinite board is also infinite.
If we make sure that each domino is represented by a different number, then each such number occurs only twice. For example, we could take the numbering scheme where for each domino that lies on the coordinates \((i,j)\) and \((i',j')\) we take as identifier \(\min(2^i3^j,2^{i'}3^{j'})\).
The dominoes are placed correctly, as can be observed from the coloring.
There are no parallel dominoes, since each 2-by-2 square has exactly three dominoes.
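The identifier scheme suggested in the second point can be sanity-checked: by unique prime factorization, \(2^i3^j\) determines the pair \((i,j)\), and since distinct dominoes occupy disjoint cells, their minima are distinct as well. A small Python check (ours):

```python
# 2**i * 3**j is injective in (i, j): a collision would contradict
# unique prime factorization. We verify this on a finite range.

codes = {}
for i in range(1, 50):
    for j in range(1, 50):
        c = 2**i * 3**j
        assert c not in codes, "collision found"
        codes[c] = (i, j)

print(len(codes))   # 2401 = 49 * 49 distinct identifiers
```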
Note that we avoided the occurrence of four dominoes within a 2-by-2 square, as shown in Figure 9.
More generally, we never have any of the patterns in Figure 10 occurring. These patterns are called the top-right corner and the bottom-left corner. Note that these patterns do occur in Figure 9, so already from the absence of these two corners we also know that there can be no four different dominoes within a 2-by-2 square.
Now suppose we cut off the board of Figure 8 so as to obtain an 8-by-8 board. We then see problems occurring at the boundaries, with dominoes sticking out. Here are two instances:
On the first column, we see that the domino \(B_4\) falls out of bounds. Hence the only way to lay down that domino is by turning it 90 degrees.
On the second row, we see that the domino \(C_4\) also falls out of bounds. Also here we would need to lay down that domino turned by 90 degrees.
What we thus see is that whenever the board is finite, it must have one of the corners of Figure 10. We shall now argue that it is impossible to satisfy \(P4\), the property that no dominoes are parallel, whenever we have the (necessary) top-left corner and also either the bottom-left or the top-right corner on the board. We make a number of simplifying assumptions, but these do not hurt our demonstration (that is to say, these assumptions are without loss of generality):
we assume we work on an arbitrary \(n\)-by-\(n\) board where \(n\) is even,
we assume we start with the same type of top-left corner and top-right corner, where the horizontal domino is on top,
we assume that both corners occur at the same height.
Now consider the situation of Figure 11. If we are to satisfy all properties \(P1\) through \(P4\), it is impossible to place a domino vertically next to \(B_1\), nor is it possible to place a domino vertically next to \(C_n\). If we were to place a domino horizontally at the lower end of \(B_1\) (thus forming a bottom-left corner), then we would need to place another domino on top of it that violates \(P4\). Hence the only dominoes that are possible are depicted in Figure 12. We end up with the other type of corner (where the vertical domino is on the side), and we can again analyze where to place the next domino: in the corner next to \(B_1\) and below \(C_1\), and next to \(C_n\) and below \(C_{n-1}\). After analyzing the possibilities and ruling out those that violate \(P4\), we end up with the situation depicted in Figure 13.
After continuing this way, we see that we construct two ‘lines’, one originating from each corner. It is necessarily the case that these two lines will intersect!
In Figure 14 we see the two lines coming diagonally out of the top-left corner and the top-right corner intersect. The way this plays out is as follows: we start with the corner consisting of dominoes \(\{1,2\}\) (the top-left corner) and dominoes \(\{A,B\}\) (the top-right corner). Then we necessarily place domino \(3\), but it occupies the same place that a domino placed in the other corner would take. We now have two corners, but they share a domino: the dominoes \(\{2,3\}\) (the top-left corner) and the dominoes \(\{3,B\}\) (the top-right corner). We then place \(4\) and \(D\) in the only way possible inside these corners, but we see that this gives us a parallel pair of dominoes in a 2-by-2 square.
Summarizing, the argument goes as follows. If there are two corners on the board whose induced ‘diagonal lines’ intersect, this must give rise to a pair of parallel dominoes. Hence we cannot have both a top-left corner and a top-right corner on the board. However, every \(n\)-by-\(n\) board tiling necessarily has both a top-left corner and a top-right corner. Hence there is no tiling of the 8-by-8 board without parallel dominoes.
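For the 8-by-8 case, this conclusion can also be confirmed by an exhaustive search. The Python sketch below (ours; it is not the article's linked Rust implementation) backtracks over all domino placements and prunes as soon as a fully covered 2-by-2 square consists of exactly two dominoes:

```python
# Sketch: exhaustive backtracking search confirming that no domino
# tiling of the n-by-n board avoids parallel dominoes (small even n).

def count_tilings(n):
    board = [[0] * n for _ in range(n)]

    def square_violated(i, j):
        # every fully covered 2-by-2 square containing (i, j) must not
        # consist of exactly two dominoes (property P4)
        for a in range(max(0, i - 1), min(n - 1, i + 1)):
            for b in range(max(0, j - 1), min(n - 1, j + 1)):
                cells = [board[a][b], board[a][b + 1],
                         board[a + 1][b], board[a + 1][b + 1]]
                if all(cells) and len(set(cells)) == 2:
                    return True
        return False

    def solve(pos, next_id):
        if pos == n * n:
            return 1                       # board completely tiled
        i, j = divmod(pos, n)
        if board[i][j]:
            return solve(pos + 1, next_id)
        total = 0
        for di, dj in ((0, 1), (1, 0)):    # horizontal, then vertical
            a, b = i + di, j + dj
            if a < n and b < n and board[a][b] == 0:
                board[i][j] = board[a][b] = next_id
                if not (square_violated(i, j) or square_violated(a, b)):
                    total += solve(pos + 1, next_id + 1)
                board[i][j] = board[a][b] = 0
        return total

    return solve(0, 1)

print(count_tilings(4), count_tilings(8))   # 0 0
```

The pruning is sound because a 2-by-2 square whose four cells are covered by exactly two dominoes can never be repaired by later placements.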
We have now seen two example puzzles, which we phrased as asking whether we can come up with a program that satisfies certain requirements. In the first example (Section 3) we saw that (1) the program’s requirements can be stated formally, and (2) a final state satisfying the end goal was imaginable, but (3) there was no correct program that reaches that final state. In the second example (Section 4) we saw that (1) the requirements themselves can be stated formally, but already that (2) no final state satisfying the end goal was even imaginable. If there is no final state that satisfies the requirements, it is impossible to write a correct program. This is a valid conclusion, since each program only moves from state to state, and there does not even exist a state that satisfies the requirements.
All this goes to show that program correctness is a difficult subject. It shows that it is sometimes ‘easy to ask’ but ‘difficult to deliver’. Extensive analysis of a problem is required to obtain (1) a formal description of the problem, and (2) a proof that the requirements are consistent. Even before one starts writing a program, one already has to face an undecidable problem: checking that the requirements are consistent! And we have seen a concrete example where this is not the case, even though the problem looks simple. If we then have requirements that are satisfiable, we face the second difficult problem: does there exist a correct program? We have seen that, no, this is not obvious either. To show that there does not exist a correct program, we need to formulate an invariant that every program preserves but which the final state violates. On the other hand, whenever there exists a non-trivial program (i.e. one involving a loop), we also face a difficult problem: proving it correct requires us to come up with an invariant as well.
This finally gives us two slogans:
To show a program is correct, requires one to find an invariant.
To show there is no correct program, also requires one to find an invariant.
and
Correctness is impossible to attain if the requirements are inconsistent.
Access to an oracle that provides the valid formulas in arithmetic (this is an undecidable problem), and the expressivity of loop invariants.↩︎
Technically, the ‘quantifiers’ in the precondition are not firstorder quantifiers but instead abbreviations where \(i\) and \(j\) are metavariables that range over finitely many constant values: thus the formula is a big conjunction with sixteen clauses where each clause specifies the value of precisely one variable.↩︎
In 1937, shortly after the mathematician Lothar Collatz obtained his doctorate, he wrote down a problem in his notebook that later became known as Collatz’ problem or the \((3x+1)\)-problem. The problem is remarkable since it is easy to state, yet for more than eighty years no solution has been found.
Before the problem can be stated, we first need two ingredients: we define a function, and we recall what repeated function application is. The function \(f\) is defined on the positive natural numbers (\(1\), \(2\), \(3\), et cetera) with the following specification: \[\begin{gathered} f(x) = x \div 2 \text{ if }x\text{ is even},\\ f(x) = 3 \times x + 1 \text{ if }x\text{ is odd}.\end{gathered}\] Now take an arbitrary positive natural number \(n\). We can repeatedly apply the function \(f\) starting with \(n\), i.e. \(f(n)\), \(f(f(n))\), \(f(f(f(n)))\), et cetera. We write \(f^i(n)\) to mean that the function \(f\) is applied \(i\) times starting with \(n\), so that we have \(f^1(n)=f(n)\) and \(f^{i+1}(n)=f(f^i(n))\). The superscript notation, an operation on a function, should not be confused with exponentiation, an operation on a number. With this in mind, we can state the problem: for every positive natural number \(n\), is there an \(i\) such that \(f^i(n) = 1\)?
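The definition translates directly into a short Python sketch (function names are ours):

```python
# The Collatz function f and its repeated application f^i.

def f(x):
    return x // 2 if x % 2 == 0 else 3 * x + 1

def iterate(n, i):
    """Compute f^i(n) by applying f repeatedly, i times."""
    for _ in range(i):
        n = f(n)
    return n

def steps_to_one(n):
    """The smallest i with f^i(n) = 1; loops forever if no such i exists."""
    i = 0
    while n != 1:
        n = f(n)
        i += 1
    return i

print(iterate(1, 3))     # 1
print(steps_to_one(5))   # 5
```

Note that `steps_to_one` terminates on a given \(n\) exactly when the answer for that \(n\) is positive; the conjecture is that it terminates for every positive \(n\).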
We can first try out a number of examples, to gain some intuition about the problem. If we take \(n = 1\), can we find an \(i\)? Clearly, \(f(1)\) is \(4\), since \(1\) is odd and so the second clause of the definition of \(f\) applies, and \(3\times 1 + 1\) is \(4\). Then we evaluate \(f(f(1))\), which is \(f(4)\): since \(4\) is even, the first clause applies, so we have \(f(4) = 2\). Finally, we evaluate \(f(f(f(1)))\), which is \(f(2)\): since \(2\) is even, we have \(f(2)=1\). So \(f^3(1)=1\), hence we can take \(i\) to be three. For \(n = 1\) the problem can be solved.
The process we tried is called iteration. We find \(i\) by starting with the smallest value and try out successively larger values for \(i\) until we reach our desired destination. This process works efficiently, since the computations we did in the past to find out whether \(i\) was a solution can be reused to find out whether \(i+1\) is a solution. In the case of Collatz’ problem, we see that the outcome of the previous try, i.e. \(f^i(n)\), is what we feed back as input in the next try, i.e. \(f(f^i(n))\) computes \(f^{i+1}(n)\).
Such iterative processes can be written down neatly. Say we start with \(n = 5\). How often do we need to apply the function \(f\) until we reach the destination \(1\)? We write the following sequence, where each number is separated by an arrow: \[5 \longrightarrow 16 \longrightarrow 8 \longrightarrow 4 \longrightarrow 2 \longrightarrow 1.\] The computation sequence starts with \(5\) and we get \(16\) after applying \(f\) once (\(3\times 5 + 1 = 16\)), we get \(8\) after applying \(f\) again (so twice from our starting point), and so on, until we reach our destination after applying \(f\) five times from where we started. In other words, we have \(f^5(5) = 1\). So for \(n = 5\), we know there exists an \(i\), namely \(5\), such that \(f^i(n) = 1\). But meanwhile we have also discovered solutions for \(n = 16\), \(n = 8\), \(n = 4\) and \(n = 2\), since these were the intermediary numbers that turned up in the computation starting at \(5\).
As another example, take \(n = 3\). We then have \(f(3) = 3\times 3 + 1 = 10\), and \(f(10) = 5\), and from that point onward we already know what happens. Starting with the number \(5\), we see that we can extend the computation sequence towards the right (each time computing the function \(f\)), but we can also extend the computation sequence towards the left: \[\ldots \longrightarrow 3 \longrightarrow 10 \longrightarrow 5 \longrightarrow \ldots\] What would be the number before \(3\)? And what would the computation sequence starting from \(7\) look like? The reader may try answering these questions, to gain some intuition about Collatz’ problem.
For some numbers \(n\), the solution is not immediately obvious. The reader may try out \(n = 27\) (but, beware, the computation takes more than 100 steps). In fact, for large enough \(n\), the intermediary numbers that the computation goes through can be used to generate pseudorandom numbers that pass standard tests for randomness [12]. This fact may give us the impression that the numbers involved in the computation do not give much insight into solving the problem.
Now, the conjecture states that there is a solution for every \(n\). This is what is known as the ‘Collatz conjecture’. Although the problem is quite old, in recent years renewed interest in it has emerged—as witnessed by the many published articles, including scientific articles (e.g. [2–4]) and articles in the popular press (e.g. The Simple Math Problem We Still Can’t Solve in Quanta Magazine). Also on question-and-answer sites such as MathOverflow, unanswered questions are raised, such as “How to know when the Collatz conjecture has been proved?”
In 2013, the late John H. Conway wrote about the problem in The American Mathematical Monthly, in an article entitled On unsettleable arithmetical problems [8]. In that article, Conway also speaks of well-known results such as Turing’s unsolvable halting problem and Gödel’s incompleteness theorem. The halting problem is the question of whether an algorithm halts (which, as we shall see, has Collatz’ problem as an instance), and Turing showed that it is in general unsolvable. The incompleteness theorem states that there are true statements in a formal system called Peano arithmetic that cannot be proven from Peano’s axioms. The two results are closely related: there is a proof of the incompleteness theorem by reduction to the halting problem [11].
Conway argues that it is very unlikely that the Collatz problem is settleable—a technical term he introduces for true assertions, phrased in a set-theoretical language, that can also be proven. A problem is unsettleable if it is true but cannot be proven. There surely are unsettleable (or, unsettling?) assertions: the axioms of set theory are insufficient for proving all true assertions, by Gödel’s incompleteness theorem. But I find Conway’s argument that the Collatz conjecture is unlikely to be settleable too difficult to grasp—probably because I lack the background knowledge and intuition behind his argument. However, Conway is not sure, and still leaves open the possibility for the conjecture to be proven:
I don’t want readers to take these words on trust but rather to encourage those who don’t find them convincing to try even harder to prove the Collatz Conjecture! [8]
A quick search on Google Scholar shows articles that claim to have proven the Collatz conjecture. These articles are not published, but are available on preprint servers or institutional repositories on the Web. How can we evaluate such articles? Are they really proofs? Is it worth our time to review them?
By accident, I stumbled upon the preprint paper Collatz conjecture becomes theorem by Mirkowska and Salwicki [10]. Grażyna Mirkowska is a professor emeritus of Warsaw University, and she is an expert on mathematical logic, program semantics, and formal verification. Andrzej Salwicki is also professor emeritus, founder of the journal Fundamenta Informaticae, and he is an expert on the mathematical foundations of computer science. Both worked together on the 1978 book on Algorithmic Logic [9], among other works. Algorithmic logic is closely related to my field of expertise, Hoare’s logic and dynamic logic.
Maybe I can understand what is written in their article? In the final remarks, the authors write:
We know that our presentation is clumsy (we are of age, English is not our native language). [10]
It is not a good idea to reject this paper because of a clumsy presentation: the authors have a track record in the field, and their work must be taken seriously. Maybe I can figure out whether their paper makes sense to me? The adventure thus begins... In this article, I do not present my own work; rather, I present what I could distill from what Mirkowska and Salwicki wrote.
First, we revisit the conjecture. We can understand the conjecture in a different way, by studying the program in Figure 1. Looking at the program as given, we have the following primitive operations and tests:
the test \(x\neq 1\) for deciding whether the number \(x\) is not equal to \(1\),
the test \(\mathit{even}(x)\) for deciding whether \(x\) is even,
the primitive operation \(x := x\div 2\) for dividing the number by two—but this operation is only executed in the context where we know that the old value of \(x\) is even and not equal to \(1\),
the primitive operation \(x := 3 \times x + 1\) that multiplies the old value of \(x\) by the constant three and adds the constant one—this operation is only executed in the context where we know that \(x\) is odd and not equal to \(1\).
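Based on these operations and tests, the program of Figure 1 can be transliterated into Python (a sketch of my own; I assume the usual while/if structure suggested by the operations above):

```python
def S(x):
    """A transliteration of the program of Figure 1.
    Termination for every input x >= 1 is exactly the Collatz conjecture."""
    while x != 1:          # test: x is not equal to 1
        if x % 2 == 0:     # test: even(x)
            x = x // 2     # primitive operation: x := x / 2
        else:
            x = 3 * x + 1  # primitive operation: x := 3*x + 1
    return x
```

Running `S(5)` terminates after five iterations of the loop, mirroring the computation sequence shown earlier.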
If the program terminates on every input \(x \geq 1\), then the Collatz conjecture is true. From the terminating execution of the program we can then extract the computation sequence, simply by looking at the values that \(x\) takes over time. If, however, the program runs infinitely for some input \(x\geq 1\), then we have a counterexample to the Collatz conjecture.
Let us abbreviate the program in Figure 1 by \(S\). We reformulate the question, whether the program \(S\) terminates or not, as follows:
If \(\{x\geq 1\land \phi\}\mathrel{S}\{\mathbf{false}\}\) is provable for some \(\phi\) in Hoare’s logic for partial correctness such that the precondition is satisfiable, then we know that the Collatz conjecture is false.
If \(\{x\geq 1\}\mathrel{S}\{\mathbf{true}\}\) is provable in Hoare’s logic for total correctness, then we know that the Collatz conjecture is true.
This formulation raises a number of questions, as is often the case in Hoare’s logic. What is the language we use in assertions? What is the program theory? And what is the background theory?
What is the logical language we use in assertions? We restrict ourselves to a first-order language consisting of addition only. This language consists of first-order formulas with respect to a signature with:
the constant individual symbol \(0\),
the constant individual symbol \(1\),
the binary function symbol \(+\).
We can also introduce abbreviations: for every natural number \(n\) we have the numeral \(\underline{n}\). We have \(\underline{0} = 0\), and for any \(n \geq 0\) we have \(\underline{(n + 1)} = \underline{n} + 1\). In other words, we have the numerals: \[\underline{0} = 0,\ \underline{1} = 1,\ \underline{2} = 1 + 1,\ \underline{3} = (1 + 1) + 1,\ \underline{4} = ((1 + 1) + 1) + 1,\ \ldots\] Note that these numerals are terms constructed from constants and function symbols, hence do not depend on the value of variables: the numerals are all ground. Also, we implicitly used the fact that \(0 + 1 = 1\) since this follows from the background theory we introduce later.
We introduce the abbreviation \(x < y\) to stand for \((\exists z)(z\neq 0\land x + z = y)\) where \(z\) is fresh (so not equal to either \(x\) or \(y\)). \(y > x\) abbreviates \(x < y\), and \(x \geq y\) abbreviates \({y < x} \lor {y = x}\).
In a similar way as the numerals, we can also introduce abbreviations for multiplication by a numeral: \(\underline{0}\times x = 0\), \(\underline{(n + 1)}\times x = x + (\underline{n}\times x)\) for \(n\geq 0\). In other words, we have the abbreviations \[\underline{0}\times x = 0,\ \underline{1}\times x = x,\ \underline{2}\times x = x + x,\ \underline{3}\times x = x + x + x,\ \ldots\] where we implicitly assume that \(+\) is associative, and \(x + 0 = x\) (again, we shall introduce the background theory later, from which these properties follow).
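These abbreviations can be made concrete by a small program that unfolds a numeral, or a multiplication by a numeral, into a term over the signature (a sketch in Python; the string representation and function names are my own):

```python
def numeral(n):
    """The ground term abbreviated by the numeral n, built from 0, 1 and +."""
    if n == 0:
        return "0"
    if n == 1:
        return "1"
    if n == 2:
        return "1 + 1"
    return "(" + numeral(n - 1) + ") + 1"

def times(n, x):
    """Multiplication of the variable x by the numeral n, unfolded into
    repeated addition (written without parentheses, assuming + is associative)."""
    if n == 0:
        return "0"
    return " + ".join([x] * n)
```

For example, `numeral(4)` yields `"((1 + 1) + 1) + 1"` and `times(3, "x")` yields `"x + x + x"`, exactly the expansions displayed above.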
What is the program theory? This question amounts to showing how we axiomatize the primitive operations and tests. We introduce a predicate symbol for the test \(\mathit{even}(x)\), where \(x\) is a variable. We can define this predicate symbol in our language as follows: \[\mathit{even}(x) \equiv_\mathrm{def} (\exists y)(x = y + y).\] It is necessary that tests are decidable: otherwise, when executing the program, one could not make the case distinction in the \(\mathbf{if}\)-statement.
Given that we have a formal understanding of the test, how do we axiomatize the two updates? We introduce the following axiom schemes. \[\{(\exists y)(x = y + y \land \phi[x := y])\}\mathrel{x := x \div 2}\{\phi\}\text{ where }y\text{ is fresh}\] The precondition of the division-by-two update states that the original value of \(x\) must be even before executing the operation. The witness of evenness, \(y\), is substituted for \(x\) in the postcondition—and we require the variable \(y\) to be fresh, that is, not already occurring in \(\phi\) and different from the variable \(x\). \[\{\phi[x := (\underline{3}\times x) + 1]\}\mathrel{x := 3\times x + 1}\{\phi\}\] The precondition of the times-three-plus-one update is the weakest precondition given the postcondition \(\phi\). Notice how we are able to express the new value of \(x\) in terms of the old value of \(x\) only by using the constant symbol \(1\) and function symbol \(+\) from our signature (recall that our multiplication by a numeral is an abbreviation, viz. \(x + x + x\)).
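As a small worked instance of these schemes (my own example, not taken from the paper), take the postcondition \(\phi \equiv x \geq 1\). Instantiating both schemes gives: \[\{(\underline{3}\times x) + 1 \geq 1\}\mathrel{x := 3\times x + 1}\{x \geq 1\}\qquad\{(\exists y)(x = y + y \land y \geq 1)\}\mathrel{x := x \div 2}\{x \geq 1\}\] The first precondition holds for every value of \(x\), so \(\{\mathbf{true}\}\mathrel{x := 3\times x + 1}\{x\geq 1\}\) follows by the rule of consequence; the second precondition says precisely that \(x\) is even and at least \(1 + 1\).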
Now that we have axiomatized the test and primitive operations, we turn to the last question. What is the background theory? Until now, one may freely interpret what the language means. By means of a background theory we restrict the possible interpretations. From the background theory, we can derive consequences which must hold in every program state.
As background theory we take Presburger arithmetic. Presburger arithmetic is a restriction of Peano arithmetic, as it speaks only of the addition operation on numbers. By Gödel’s incompleteness theorem, the formal theory of Peano arithmetic can be shown to be incomplete, that is, there are valid sentences that are not consequences of the axioms. What is remarkable about Presburger arithmetic, however, is that its theory is complete: all valid sentences are also consequences of the axioms.
Furthermore, the axiomatization of Presburger arithmetic is recursive, that is, there exists a procedure to decide whether a given sentence is an axiom of Presburger arithmetic. From this, we also obtain that the formal theory of Presburger arithmetic is decidable: for any sentence, either we can use the standard proof system of first-order logic and prove that the sentence is valid, or we can effectively find a model that satisfies all axioms but in which the sentence is false.
There are different presentations of the same theory. We shall give one that is short on paper, but it is presented using the unary function \(s\). The unary function \(s\) is defined by \(s(x) = x + 1\), and we have that \(1 = s(0)\). The other axioms of Presburger arithmetic are the following:
\((\forall x)(s(x) \neq 0)\),
\((\forall x)(\forall y)(s(x) = s(y) \to x = y)\),
\((\forall x)(x + 0 = x)\),
\((\forall x)(\forall y)(s(y) + x = s(y + x))\),
\(\phi(0) \land (\forall x)(\phi \to \phi(s(x))) \to (\forall x)\phi\).
The last is an axiom scheme for any formula \(\phi\), and where \(\phi(t)\) is the result of replacing the free variable occurrences of \(x\) by the term \(t\). From these axioms, the usual properties of addition follow. For example:
\((\forall x)(\forall y)(x + y = y + x)\),
\((\forall x)(\forall y)(\forall z)(x + (y + z) = (x + y) + z)\),
\((\forall x)(x \neq 0 \to (\exists y)(x = y + 1))\),
\((\forall x)(\forall y)(\forall z)(x + z = y + z \to x = y)\).
et cetera.
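For instance, the fact \(0 + 1 = 1\), which we used implicitly for the numerals, follows in a few steps (a sketch of my own, using commutativity, which is derivable as listed above): \[0 + 1 = 1 + 0 = s(0) + 0 = s(0 + 0) = s(0) = 1,\] appealing, in order, to commutativity, the definition \(1 = s(0)\), the axiom \(s(y) + x = s(y + x)\), the axiom \(x + 0 = x\), and the definition \(1 = s(0)\) again.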
Now, we turn to the semantics of the program \(S\), whose halting problem captures the Collatz conjecture. A program is nothing but a piece of text: a finite string of symbols. One may think of the intended meaning of a program, but nothing prevents two people from looking at the same string of symbols and interpreting it differently. To illustrate this point, we shall look at two interpretations of \(S\). The fact that this simple program can be interpreted in different ways somewhat delighted me when I first read the preprint paper by Mirkowska and Salwicki [10].
The standard interpretation is given by taking the following data structure:
Take the natural numbers \(\mathbb{N} = \{0,1,2,3,\ldots\}\) as domain.
Interpret the symbols in the usual way: \(0\) and \(1\) are the numbers zero and one, and \(+\) is addition of natural numbers (so that the abbreviation \(<\) becomes the usual less-than relation).
This standard interpretation satisfies all the axioms of Presburger arithmetic.
Alternatively, one could give a complex interpretation, as follows:
Take the following subset of the complex numbers \(\mathbb{C}\) as domain: \[\{k + w\iota \mid k\in\mathbb{Z} \text{ and } w\in\mathbb{Q}^{\geq 0} \text{ and if } w=0 \text{ then } k\geq 0\}\] where \(k\) is an integer, \(w\) is a nonnegative rational, and \(\iota\) is \(\sqrt{-1}\). (We use the Greek \(\iota\) instead of the Latin \(i\), to avoid confusion with the natural number \(i\) we used earlier in function iteration \(f^i\).) The condition implies that we have no negative real numbers in our domain.
The constant individual symbols \(0\) and \(1\) are interpreted as the complex numbers \(0 + 0\iota\) and \(1 + 0\iota\), respectively.
The binary function symbol \(+\) is interpreted as usual: \[(k + w\iota) + (k' + w'\iota) = (k + k') + (w + w')\iota.\] Note in this equation alone, the symbol \(+\) has four different meanings: the function symbol \(+\) in our language, we have the \(+\) on complex numbers, we have the \(+\) on integers, and we have the \(+\) on nonnegative rationals!
This complex interpretation satisfies all the axioms of Presburger arithmetic.
In the complex interpretation, we have the so-called reachable elements and the unreachable elements. The reachable elements are the complex numbers \(k + w\iota\) in our domain with a zero imaginary part (and so, by the condition, we know that \(k\geq 0\)). The unreachable elements are the complex numbers in our domain with a nonzero imaginary part. Any operation performed on only reachable elements gives us back a reachable element. However, an operation performed with at least one unreachable element results in an unreachable element. This follows from the fact that the imaginary parts are nonnegative: a nonzero imaginary part can never be cancelled out by addition.
In the complex interpretation, the order of elements is defined by \(x < y\), which abbreviates \((\exists z)(z\neq 0\land x + z = y)\). We can understand the order relation on complex numbers \((k + w\iota) < (k' + w'\iota)\) as if it were a lexicographic order on pairs: \(\langle w,k\rangle < \langle w',k'\rangle\). This means that all reachable elements (where \(w=0\)) are below all unreachable elements (where \(w > 0\)). Further, in the complex interpretation we have that an element is even iff the real part is even.
With our complex interpretation, we can actually give an infinite run of the program! The following is a demonstration of a computation sequence: \[\begin{gathered} (8 + \frac{1}{2}\iota) \longrightarrow (4 + \frac{1}{4}\iota) \longrightarrow (2 + \frac{1}{8}\iota) \longrightarrow (1 + \frac{1}{16}\iota) \longrightarrow \qquad\\ (4 + \frac{3}{16}\iota) \longrightarrow (2 + \frac{3}{32}\iota) \longrightarrow (1 + \frac{3}{64}\iota) \longrightarrow (4 + \frac{9}{64}\iota) \longrightarrow \ldots\end{gathered}\] The computation sequence can be mapped back to an execution of the program (this takes a bit of work, but it is not difficult to see that the number generated each step is the value of the variable \(x\) right at the start of each iteration of the \(\mathbf{while}\)-loop). Notice that, since we start with an unreachable value (the imaginary part is nonzero), the test of the loop never fails, and so the body of the loop is always taken. Either of the two operations (division-by-two or multiply-by-three-add-one) on an unreachable value results in an unreachable value again, which can never be equal to the reachable value of \(1\).
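The infinite run above can be checked concretely by simulating the complex interpretation, representing an element \(k + w\iota\) by its integer real part \(k\) and its nonnegative rational imaginary part \(w\) (a sketch in Python, not taken from the paper):

```python
from fractions import Fraction

def step(k, w):
    """One iteration of the loop body in the complex interpretation.
    An element k + w*iota is even iff its real part k is even."""
    if k % 2 == 0:
        return k // 2, w / 2      # x := x / 2
    return 3 * k + 1, 3 * w       # x := 3*x + 1

# Start from the unreachable element 8 + (1/2)*iota.
k, w = 8, Fraction(1, 2)
run = [(k, w)]
for _ in range(7):
    k, w = step(k, w)
    run.append((k, w))
# run reproduces the displayed sequence:
# (8, 1/2), (4, 1/4), (2, 1/8), (1, 1/16), (4, 3/16), (2, 3/32), (1, 3/64), (4, 9/64)
```

Since the imaginary part \(w\) never becomes zero, the simulated loop test \(x \neq 1\) never fails.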
However, this infinite run cannot be used to argue that the Collatz conjecture is false, since the conjecture speaks of the positive natural numbers and not of the complex numbers we have introduced. So, coming back to our reformulated question: in Hoare’s partial correctness logic, can we prove \[\{x\geq 1\land \phi\}\mathrel{S}\{\mathbf{false}\}\] for some \(\phi\), and show that the precondition is satisfiable? No, we cannot prove this based on our example above. The reason is that the complex interpretation is elementarily equivalent to the standard interpretation. This means that the same first-order sentences are true in the standard interpretation and in the complex interpretation. Since our assertion language is a first-order language, we cannot express in \(\phi\) that the starting value of \(x\) has to be an unreachable element. Namely, the ability to express that \(x\) is an unreachable element (which would be false in the standard interpretation, but true in the complex interpretation under existential quantification) contradicts that the standard and complex interpretations have the same first-order theories.
Note, however, that the above correctness formula may still be provable—showing the Collatz conjecture is false. If one gives a proof of \[\{x\geq 1\land \phi\}\mathrel{S}\{\mathbf{false}\}\] in Hoare’s logic for partial correctness, then the proof must be sound for the standard interpretation too. If one then also shows that \(\phi\) is satisfiable in the standard interpretation, then the Collatz conjecture is settled (to be false).
On the other hand, if we look at proving \[\{x\geq 1\}\mathrel{S}\{\mathbf{true}\}\] in Hoare’s logic for total correctness, are we sure that a proof means that the Collatz conjecture is true? In Hoare’s total correctness logic, we employ a proof rule for reasoning about the termination of the \(\mathbf{while}\)-loop by giving two ingredients: the invariant (an assertion that holds before and after the loop and also before and after the loop body) and the variant (a term denoting a value that must decrease each iteration, and it must be shown that this term is larger than or equal to zero assuming the invariant).
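To illustrate the two ingredients on a loop whose termination is unproblematic (a standard textbook-style example of my own, not related to the Collatz program), consider:

```python
def sum_to(n):
    """Compute 0 + 1 + ... + n, for n >= 0, with an annotated while-loop.
    Invariant: total = 0 + 1 + ... + i  and  0 <= i <= n.
    Variant:   n - i, which is >= 0 under the invariant and strictly
               decreases in every iteration of the loop body."""
    i, total = 0, 0
    while i != n:
        i += 1
        total += i
    return total
```

With precondition \(n \geq 0\) and postcondition \(\mathit{total} = n(n+1)/2\), the variant bounds the number of iterations by its initial value \(n\), which is exactly the kind of bound that is missing for the Collatz program.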
There may be a problem with nonstandard interpretations. For example, in the complex interpretation, what does the variant denote? It may no longer express a quantity that ensures that the loop body is executed finitely many times. In fact, we already know that every unreachable element is larger than \(0\), and every unreachable element has infinitely many predecessors: so requiring that during the loop body the variant decreases no longer yields an argument that the computation must be finite. Yet, this is not a problem, for we can always ignore nonstandard interpretations. The proof rule is sound: interpreting total correctness with respect to the standard interpretation works fine. In that interpretation, the variant expresses that, before executing the loop, we can predict the maximum number of iterations that will be taken based on the values of program variables in the initial state.
Mirkowska and Salwicki suggest that the Collatz conjecture, formulated as questions of provability in Hoare’s logic as we did above, is not precise. They then start working on the level of the semantics in the complex interpretation, and restrict attention to those states in which the program variables have reachable values. However, here I disagree. That there exist nonstandard interpretations is not an excuse not to give a proof in Hoare’s logic: although such a proof can also be interpreted in a nonstandard interpretation, there is always the standard interpretation one can look at.
What I thus expected to see was a clear description from which I can extract a proof in Hoare’s total correctness logic. In particular, to perform such an extraction, I need to know the answers to the following questions:
What is the loop invariant of \(S\)?
Is the loop invariant expressible as an assertion?
What is the loop variant of \(S\)?
Is the loop variant expressible?
Unfortunately, the paper does not clearly answer these questions. When I read the paper, I was not able to verify that there actually is a proof. The paper presents many interesting ideas, but I feel that I am doing original research when trying to see how those ideas relate to the questions above. I should not have the feeling I am doing original research when my one and only task is to verify a proof!
Unfortunately, I am unable to extract a proof from the preprint paper by Mirkowska and Salwicki [10]. Thus, the Collatz conjecture still remains unsettled—at least in my mind—and thus remains unsettling. The lesson, if there is a need for one, is that nonstandard interpretations of Hoare’s logic exist, and may shed light on a problem from a different angle.
At some point during reading, I got the following intuitions. I share them here—just for fun—but these closing remarks may not be valuable to anyone and may only cause confusion.
If we revisit the computation sequence above, that yielded an infinite run, we see the following pattern (we call these the even step and the odd step): \[\begin{aligned} (k + \frac{3^o}{2^e}\iota) &\longrightarrow (\frac{k}{2} + \frac{3^o}{2^{e+1}}\iota) \text{ if }k\text{ is even,}\\ (k + \frac{3^o}{2^e}\iota) &\longrightarrow ((3\times k + 1) + \frac{3^{o+1}}{2^e}\iota) \text{ if }k\text{ is odd.}\end{aligned}\] For simplicity, assume \(k\) is a positive natural number, and \(o,e\) are natural numbers. Forgetting about the imaginary part, we see simply that the function \(f\) we defined in the introduction is applied on \(k\). However, in the imaginary part, we see that the rational number is ‘counting’ how many times which part of the function was applied. If \(k\) was even, we increase the denominator from \(2^e\) to \(2^{e+1}\). If \(k\) was odd, we increase the numerator from \(3^o\) to \(3^{o+1}\).
Now, for any computation sequence on natural numbers of the domain of the standard interpretation, there exist infinitely many corresponding computation sequences on the complex numbers of the domain of the complex interpretation. We are free, so to say, to choose the imaginary part of the complex number. So, let us look at the complex interpretation. From now onward, my thoughts are a bit cloudy and imprecise—so we proceed in unclear territory.
Since in computations of the complex interpretation, ‘we are free’ to choose the imaginary part, could we treat the imaginary part as a ghost count of the number of times we see an even number and the number of times we see an odd number? If the even numbers tend to outnumber the odd numbers then we are going towards the exit—but at any moment we are doubtful, we may freely reset the counters. Even when we reach the destination, we may still continue (here \(\longrightarrow^*\) means multiple steps, even or odd): \[(1 + \iota) \longrightarrow^* (1 + \frac{3}{4}\iota) \longrightarrow^* (1 + \frac{9}{16}\iota) \longrightarrow^* (1 + \frac{27}{64}\iota) \longrightarrow \ldots\] Here the denominator grows faster than the numerator: the imaginary part tends towards zero in the limit if we never reset the ghost counter back to \(\iota\). Does this intuition help? I do not know. Maybe one can show that there is a computation sequence in the complex interpretation in which the imaginary part tends to zero if and only if there is a corresponding computation sequence in the standard interpretation? If so, can we then also show that a computation sequence tends to zero if and only if it contains a number with \(1\) as real part?
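The decaying sequence above can be checked with the same kind of simulation (again a sketch of my own, mirroring the even and odd steps): starting from \(1 + \iota\) and recording the ghost fraction each time the real part returns to \(1\).

```python
from fractions import Fraction

def step(k, w):
    """One even or odd step: k is the real part, w the ghost fraction."""
    if k % 2 == 0:
        return k // 2, w / 2      # even step: denominator gains a factor 2
    return 3 * k + 1, 3 * w       # odd step: numerator gains a factor 3

# Start at 1 + iota; never reset the ghost counter.
k, w = 1, Fraction(1)
ghosts = []
for _ in range(30):
    k, w = step(k, w)
    if k == 1:
        ghosts.append(w)
```

The recorded fractions are \(3/4\), \(9/16\), \(27/64\), \(\ldots\): each round trip \(1 \to 4 \to 2 \to 1\) multiplies the ghost fraction by \(3/4\), so the imaginary part indeed tends to zero when we never reset the counter.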
The intuition here is that we want to jump ahead a potential computation sequence starting in some number, as if we could travel in time. The target of our jump is a number with the same real part—that would be evidence of a loop. If the loop also exists in the standard interpretation, then all the intermediary numbers must be natural numbers. Suppose we are at an arbitrary section of the computation sequence: \[(f^0(k) + w_0\iota) \longrightarrow (f^1(k) + w_1\iota) \longrightarrow \ldots \longrightarrow (f^i(k) + w_i\iota)\] where \(f^0(k) = k\). The sequence \(w_0,w_1,\ldots\) are the values of the ghost counters. We now want to reset the ghost counters back to \(\iota\), but keep the real part. Note that the imaginary part of \(\iota\) is the fraction \({3^0}\div{2^0}=1\), meaning both even and odd counters are zero. After resetting the ghost counters, we have the section that starts with \((k + \iota)\): \[(f^0(k) + \frac{3^{o_0}}{2^{e_0}}\iota) \longrightarrow (f^1(k) + \frac{3^{o_1}}{2^{e_1}}\iota) \longrightarrow \ldots \longrightarrow (f^i(k) + \frac{3^{o_i}}{2^{e_i}}\iota)\] where the sequences \(o_0,o_1,o_2,\ldots\) and \(e_0,e_1,e_2,\ldots\) are running ghost counts with \(o_0=e_0=0\). For any index in the sequence \(0\leq j\leq i\), we have that \(o_j + e_j = j\).
Is it possible to have \(f^0(k) = f^i(k)\) but where the numerator grows faster than the denominator, i.e. \(3^{o_i} > 2^{e_i}\)? If so, can it then also have \(1\) occurring in the sequence? Note that \(1\) is never reached as a result of \(3\times x + 1\) (only \(x\) with zero real part would give \(1\), but that never occurs since \(0 + w\iota\) is even and thus stays even), so from that point onward the denominator would dominate the numerator.
And what about the nonrepeating computation sequences, where there is no loop but in which the imaginary part does tend to zero?
So many questions are still floating around in my mind, like butterflies...
Some say that an education is more than professional training. By such professional training, I mean equipping students with useful skills such as communicating with humans or with computers, finding interesting or profitable problems to work on that are relevant to society at large, and finding novel, practical, efficient, and elegant solutions to such problems, et cetera. One could speak of skill competency, being measured at levels that indicate how well an individual is able to perform in a particular skill. There are skills which are considered more useful than others in a professional setting, and thus correspondingly considered worth more. Following the tradition of New Public Management, one can then do a back-of-the-envelope costs-and-benefits calculation (in Dutch this is called 'studierendement', roughly 'study return'): surely, becoming proficient in computer science is worth the effort, to both the student at the individual level, and to society at the collective level. But all this skill development should not cost too much: there should be a positive Return On Investment!
Although professional training works, and could be very valuable, it also misses essential points of an education. But what, then, is an education more than training students and developing skills? Below, I shall outline the objectives of an education that go beyond the objectives of professional training. Generally speaking, during academic study, a student:
All these points contribute to an education. Simply put, an education helps students to become self-sufficient. I shall discuss the above points one by one, and reflect on how each general statement applies, in particular, to computer science education.
Small is the number of them that see with their own eyes and feel with their own hearts.
— Albert Einstein
A typical university student (age: 18–25) progresses through the human developmental stage of emerging adulthood and enters early adulthood. In these stages of human development, students improve in functioning (e.g. mastering technical skill), become disciplined actors and critical thinkers (e.g. improving use of language, developing awareness of context-sensitivity, and engaging in self-reflection), and develop a clear sense of self and identity (e.g. by demonstrating intellectual curiosity and creativity). An excellent education aims to improve not only a student's capability in technical skill, communication, comprehension, and reasoning ability—but also in attaining and maintaining a position of (financial) stability, physical and mental health and safety, the perseverance to overcome challenges, and the resilience to handle failure gracefully.
In computer science there are lots of different technical skills and languages by which a student can develop, e.g. there are many programming languages, logical languages, and mathematical languages to study and practice with. Each language comes with its own set of challenges: some are easier to learn and others are more difficult. The required level of discipline varies: some languages are rigorous and exact, while others are more lenient and fuzzy. In a wellrounded curriculum, there should be a lot of room for exploration and creativity: I believe that the best form of learning is rediscovery, in which (practical and theoretical) results, and their importance, are arrived at by oneself.
Ideally, the objective of education is not to learn just new languages, but also to reflect on the learning process itself and learn how to learn. Each language is a tool to achieve this objective. The end goal is attained if a student is able to confront challenges and tackle difficulties encountered during learning, which often necessitates independent study and a disciplined approach to thinking and working. A means to this end is progression through gradually more difficult subjects over time. To keep progressing further, it seems that the following factors are crucial for student engagement: confidence, interaction, and autonomy (CIA).
Confidence. A lack of confidence hampers student engagement with the subject. Confidence can be increased by allowing students to revisit the necessary background knowledge, allowing students to check their own work, and giving hints towards a solution to exercises in case students are blocked for too long. Confidence can be decreased by obscurity of reference material, a lack of (direct or indirect) feedback, or information overload.
Interaction. Ideally, students are intrinsically motivated to independently study the subject material. However, sometimes students need help from, and interaction with, their social surroundings. One effective way is to work together, in the sense of mentorship, to guide students through the material by means of a dialogue. Sometimes students can help one another (which ideally is useful for both), but this requires 'matching up' students at different levels. It is a clear signal of insufficient interaction if interaction only starts after a final examination.
Autonomy. Ideally, each student can autonomously choose their next direction to explore in the subject (depending on their starting point). Sometimes it is a joy to discover that one has entered a dead end: a valuable lesson! But finding oneself too often stuck in dead ends is risky. Hints or nudges towards the right path are helpful to avoid going too far astray—however, these hints or nudges should never be enforced, and only offered as advice.
If an opinion contrary to your own makes you angry, that is a sign that you are subconsciously aware of having no good reason for thinking as you do.
— Bertrand Russell
Truth exists regardless of one's expectation. Something is not true because one wants it to be true or profits from it being true. One should seek the truth from a position of freedom: disinterestedness in the sense of free from bias or (self)interest. It may be difficult to engage all students in cultivating curiosity for why something is true. This could be the result of students being subjected to a prior nonfree learning experience, where students were instructed to 'just remember' what was said and take it for granted, or to 'just do' what an assignment asks for—rather than stimulating students to raise questions and take the initiative to study the subject more deeply. Surely, it is easier to let students stop thinking, than not!
To cultivate a disinterested love of truth, one question to keep in mind is: what is true? Or even deeper, what is truth? It is not wise to claim that something is true in the absolute sense, or that only one person knows the truth: truth is something each student has to find out for themselves. The most reliable form of truth, from an epistemological point of view, is experienced individually: to be convinced of truth requires honest, independent, and disinterested verification by means of individual mental construction (theory), or direct experience by one's own senses (practice). As a lover of truth, viz. a philosopher, a student seeks answers to questions of truth through diverse means. This requires students to carefully and critically assess their own perception, memory, and (prior) beliefs and skills. Every testimony, evidence, judgment, theory, or ideal (including one's own) should be regarded critically, or even skeptically when in (self-)doubt.
However, criticism and skepticism are bottomless, and students also need to be able to overcome the Münchhausen trilemma: even the most rigorous methods for attaining truth have limitations, and there are no absolute or complete methods for obtaining truth or certainty. Generally speaking, one should be interested in comparing the authority (in the sense of convincing force) of different arguments, and should be able to follow and participate in academic debate that has culminated or is culminating in consensus.
Some questions fundamental to computer science are: what are the foundations of computer science? What are the limits of computing? What are the limits of logical, mathematical, and scientific reasoning? What languages can (not) be used to express ideas clearly? When comparing two entities, what differences are significant (or not)? What basic facts about computer science are relevant, and how does this depend on the context? What research agendas are out there and are they conflicting or complementary? What distinguishes fashions and opinions from necessities and facts? How can one turn vague ideas into concrete conceptualizations? What academic debates were relevant in the past, are still relevant today, and will become relevant in the future? Et cetera.
The originator of a new concept [...] finds, as a rule, that it is much more difficult to find out why other people do not understand him, than it was to discover the new truths.
— Hermann von Helmholtz
It is important for students to develop a scientific attitude that is critical of technological developments in the field. A scientific attitude is exploratory in nature and embraces exciting speculation, necessitates experimentation, but requires one to be scrupulous in publicly making claims, and only to present findings that others can independently verify. Ideally, a scientific discovery lays bare not only the presence of a mechanism, but also its underlying structure that explains why predictions are reliable and how they are correct. Technological progress should be considered within a wider ethical context, also keeping in mind the actual and potential detrimental effects on human society and nature.
Especially in the field of computer science, a critical outlook on technology is important, since not all developments in our field are necessarily beneficial to all of humankind or nature. For example, most computer technology only benefits a tiny portion of humanity: is this fair, or not? Are practitioners that apply the results of computer science misguided or misinformed, even when they form a large majority influencing consensus and best practices, or not? Consensus is not a replacement for truth, but standing up against consensus might be risky for one's career or public standing (e.g. nobody likes the party pooper).
Especially in computer security, sometimes one needs courage to raise issues and be very confident of one's case, also considering the potential (legal) repercussions instigated by large companies and governments. But also in theoretical computer science, it is too easy to forget 'the bigger picture' and lose oneself in a deluge of technicalities: how can one be certain that the subject is studied as simply as possible, without being too simplistic? The cases above necessitate a scientific attitude, to ensure that one only takes positions that can withstand reasonable doubt.
Critical thinking is an important skill, especially in an academic discipline such as computer science. Many computer scientists fall into the trap of making things too complicated (unfortunately, myself included). How can one prevent such tendencies? Working on complicated and obscure subjects may scratch one's own itch of intellectual curiosity, but the trap is that it makes the field more difficult for outsiders to enter. It requires a good education to appreciate the fine balance between necessary complexity and accidental complexity of programs, theories, and systems. The key is that simplicity and elegance, while hard to attain, pay off by making it easier to explain to others what one did.
Don't for heaven's sake, be afraid of talking nonsense! But you must pay attention to your nonsense.
— Ludwig Wittgenstein
Finally, empowering students allows them to defend their own, outspoken position in public. A reasonable position is supported by rational and well-founded argumentation, presented in an orderly (no chaos), timely (not too long), and, ideally, beautiful manner. To appreciate reason as a means of establishing truth, one has to be receptive to criticism, to be open-minded towards the possibility of finding faults and flaws in one's own position. A good defender is capable of seeking out opponents, even if those opponents are established authorities, and of engaging in a mutually respectful, decent, and civilized debate. But every defeat is a learning opportunity: losing is fun!
There are many opportunities in computer science for taking in and trying to defend a position, for example in designing programs (and surviving programming errors), theories (and surviving inconsistency), and systems (and surviving design flaws). There are qualitative and quantitative approaches by which any idea can be evaluated, such as (among many others) program correctness and efficiency, system resilience, error tolerance, usability and extensibility, and theoretical elegance and simplicity: an educated person is able to appreciate these fine details.
On Wednesday, 8th of March, 2023, I gave a lecture about two sorting algorithms for the course Program Correctness (see also the lecture series on YouTube, available only in Dutch). Due to limited time, I then only discussed a sketch of the correctness proof. But I promised the students that I would write down the correctness argument in more detail. So, in this article, I will revisit the two sorting algorithms and give the correctness proofs in full detail.
We had a look at two sorting algorithms: gnome sort and bozosort. The purpose of a sorting algorithm is to operate on an array and rearrange its elements in order. The two algorithms presented are not the most efficient sorting algorithms, but that is not of our concern: instead, we will look at them from the perspective of their correctness.
The main questions answered in this article are:
When proving an algorithm correct, it is important to first have a rough, informal idea of how the correctness argument should go. Only then does it make sense to formalize the argument, by writing down a proof outline. During the latter activity, one can systematically check the argument to ensure there is no fault in one's own reasoning. Thus, both aspects are important: the bigger picture, and the details in which the devil resides.
The purpose of a sorting algorithm is to rearrange elements in an array, so that the final result is an array where all elements are in order. For simplicity, we assume we are dealing with an array of integers. The ordering of integers is their natural order, i.e. \(\ldots\leq -2\leq -1\leq 0\leq 1\leq 2\leq \ldots\).
Given array \(a\) of type \(\mathbf{integer}\to\mathbf{integer}\). We define the following predicate: \[\mathit{Sorted}(a) \equiv_\mathrm{def} \forall i, j : i \leq j \to a[i] \leq a[j].\] The above predicate expresses that the whole array \(a\) is sorted. We also define: \[\mathit{Sorted}(a,f,t) \equiv_\mathrm{def} \forall i, j : f \leq i \leq j \leq t \to a[i] \leq a[j].\] This predicate expresses that array \(a\) is sorted on the range \([f,t]\), i.e. from index \(f\) until and including index \(t\).
Alternatively, we can define the following predicates: \[\mathit{Sorted\,}^\prime(a) \equiv_\mathrm{def} \forall i : a[i] \leq a[i + 1]\] and \[\mathit{Sorted\,}^\prime(a,f,t) \equiv_\mathrm{def} \forall i : f \leq i < t \to a[i] \leq a[i + 1].\] It can now be verified (e.g. using a proof system for predicate logic such as natural deduction) that \(\mathit{Sorted}(a) \equiv \mathit{Sorted\,}^\prime(a)\) and \(\mathit{Sorted}(a,f,t)\equiv\mathit{Sorted\,}^\prime(a,f,t)\).
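For finite arrays, both predicate families can be checked directly, which also makes their equivalence tangible. The following Python sketch is my own illustration (the function names are not from the original figures): `sorted_pairs` follows \(\mathit{Sorted}(a,f,t)\) and `sorted_adjacent` follows \(\mathit{Sorted\,}'(a,f,t)\).

```python
def sorted_pairs(a, f, t):
    """Sorted(a, f, t): every pair of indices i <= j in [f, t] is ordered."""
    return all(a[i] <= a[j]
               for i in range(f, t + 1)
               for j in range(i, t + 1))

def sorted_adjacent(a, f, t):
    """Sorted'(a, f, t): every adjacent pair in [f, t] is ordered."""
    return all(a[i] <= a[i + 1] for i in range(f, t))

# The two formulations agree on concrete inputs:
a = [1, 3, 3, 7, 2]
assert sorted_pairs(a, 0, 3) and sorted_adjacent(a, 0, 3)
assert not sorted_pairs(a, 0, 4) and not sorted_adjacent(a, 0, 4)
```

Note that an empty or singleton range is trivially sorted in both formulations, which matches the observation \(\mathit{Sorted}(a,f,f)\) used later in the proofs.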
We may use these predicates to describe the desired outcome of a sorting algorithm: namely, that array \(a\) is sorted (on a particular range). However, this property alone is not sufficient. We also require a relation between the input array and the output array, to specify that the algorithm did not insert new, duplicate old, or throw out any elements. Note that the input array and the output array are stored in the same place in memory, so it matters not where we look but when we look. By input array we mean the value of the array \(a\) before the algorithm runs, and by output array we mean the value of the (same) array \(a\) but after the algorithm finished running.
To avoid algorithms inserting new, duplicating old, or throwing out elements, we require that there exists a one-to-one correspondence between the input and output arrays.
Given array \(b\) of type \(\mathbf{integer}\to\mathbf{integer}\). We use \(b\) as the name for the input array, whereas \(a\) is the name for the output array. We now define the following predicate: \[\mathit{Permut}(a,b) \equiv_\mathrm{def} \exists \pi : \mathit{Inj}(\pi) \land \mathit{Surj}(\pi) \land \forall i : b[\pi(i)] = a[i].\] This predicate expresses that the whole array \(a\) is a permutation of array \(b\). The intuition of \(\mathit{Permut}(a,b)\) is that \(\pi\) represents a bijection between integers: a one-to-one correspondence between the indices of the output array \(a\) and input array \(b\). Here we make use of the following definitions: \[\mathit{Inj}(\pi) \equiv_\mathrm{def} \forall x, y : \pi(x) = \pi(y) \to x = y\] and \[\mathit{Surj}(\pi) \equiv_\mathrm{def} \forall x : \exists y : \pi(y) = x.\]
(As an aside, i.e. not relevant for the rest of this article, note that it depends on the language in which we work whether quantification over \(\pi\) is first-order or not. If we work in the language of Peano arithmetic, this quantifier is higher-order. But if we work in the language of set theory, this quantifier is first-order where \(\pi\) ranges over sets representing functions \(\mathbf{integer}\to\mathbf{integer}\).)
We actually need a stronger predicate than \(\mathit{Permut}\), namely one that expresses that array \(a\) is a permutation of array \(b\) on a particular range, while leaving all other elements in place. Compare this with how we have two predicates for being sorted: \(\mathit{Sorted}(a)\) and \(\mathit{Sorted}(a,f,t)\). So we now define the following predicate: \[\begin{aligned} \mathit{Permut}(a,b,f,t) &\equiv_\mathrm{def} \exists \pi : \mathit{Inj}(\pi) \land \mathit{Surj}(\pi) \land (\forall i : b[\pi(i)] = a[i])\ \land\\ &\qquad\quad (\forall i : f \leq i \leq t \lor \pi(i) = i).\end{aligned}\] The new condition requires of the bijection \(\pi\) that every index \(i\) that falls outside of the range \([f,t]\) is mapped identically. We could say that the latter predicate expresses a restricted permutation.
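For finite arrays, the existence of such a restricted bijection boils down to two concrete checks: the elements on \([f,t]\) form the same multiset in both arrays, and all elements outside \([f,t]\) coincide. A small Python sketch of this finite-array reading (the function name `permut` is mine, not from the book or figures):

```python
from collections import Counter

def permut(a, b, f, t):
    """Permut(a, b, f, t) for finite arrays (lists): a rearranges b
    on the range [f, t] and agrees with b everywhere else."""
    same_outside = a[:f] == b[:f] and a[t + 1:] == b[t + 1:]
    same_multiset = Counter(a[f:t + 1]) == Counter(b[f:t + 1])
    return same_outside and same_multiset

b = [9, 1, 2, 3, 9]
assert permut([9, 3, 2, 1, 9], b, 1, 3)      # rearranged inside [1, 3]
assert not permut([3, 1, 2, 9, 9], b, 1, 3)  # element outside the range changed
```

Comparing multisets with `Counter` handles duplicate elements correctly, which is exactly what the injectivity and surjectivity of \(\pi\) guarantee in the logical formulation.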
Gnome sort is a simple sorting algorithm. The story behind the algorithm is as follows. Suppose there is a garden with flower pots arranged next to each other on a table. Each flower pot contains a beautiful flower of a certain height. A gnome comes along, and being a pedantic gnome, wants to arrange the flowers in such way that the flowers in the pots are ordered from the smallest flower to the largest flower on the table.
How does the gnome achieve this? The gnome stands next to the table, in front of a single flower pot. The procedure is easy:
The gnome starts at the leftmost flower pot.
If there is no preceding flower pot, or if the flower in the preceding pot is smaller than the flower in the pot in front of the gnome, then the gnome takes one step to the right.
If there is a preceding flower pot and the flower in the preceding pot is larger than the flower in the pot in front of the gnome, the gnome swaps the two flowers and takes one step to the left.
If the gnome has not reached the end of the table, it goes back to step 2.
Now, we use an array \(a\) to represent an array of flower pots, each cell of the array is a flower pot, and the value stored in each cell represents the height of a flower. Swapping the values of two cells of the array would represent that the gnome, being a true gardener, would take out the flowers of the two pots and place them back in the other pot. Try to imagine how the gnome runs!
Note that gnome sort is slightly different from insertion sort. In the insertion sort algorithm, we need to keep track of two locations: the location of the element which is being inserted in the proper place, and the location of where to insert that element. Insertion sort is typically implemented using a nested loop: after the given element is inserted in the prefix (inner loop), we can continue with the next element after the prefix (outer loop). However, in gnome sort, there is only a single position that is tracked, namely the position of the gnome. The gnome has to walk back after it has placed the element in the proper position, and thus performs more comparisons than insertion sort.
We can write down an algorithm that encodes the procedure of the gnome: see Figure 1. We are given variables \(f\) and \(t\) of type \(\mathbf{integer}\). These represent the range of indices in the array \(a\) representing the flower pots, where \(f\) is the index of the first flower pot and \(t\) is the index of the last flower pot. The variable \(i\) of type \(\mathbf{integer}\) represents the position of the gnome, and the variable \(z\) of type \(\mathbf{integer}\) is a temporary variable used for swapping the flowers.
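Since Figure 1 itself is not reproduced in this text, here is my own Python rendering of the gnome's procedure, following the four steps above and using the variable names \(f\), \(t\), \(i\), and \(z\) from the text (the function name is mine):

```python
def gnome_sort(a, f, t):
    """Sort a[f..t] in place, following the gnome's procedure."""
    i = f  # the position of the gnome
    while i <= t:
        if i == f or a[i - 1] <= a[i]:
            # No preceding pot, or flowers in order: step to the right.
            i = i + 1
        else:
            # Preceding flower is larger: swap and step to the left.
            z = a[i]
            a[i] = a[i - 1]
            a[i - 1] = z
            i = i - 1

a = [4, 2, 7, 1, 3]
gnome_sort(a, 0, 4)
assert a == [1, 2, 3, 4, 7]
```

If \(t \leq f\), the loop body runs at most once without swapping, so the array is left unmodified, matching the first observation below.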
We can make the following observations of the algorithm in Figure 1:
If \(t \leq f\) then the algorithm terminates without modifying the array. Otherwise, the bounds of the position \(i\) are: \(f\leq i\leq t+1\).
The array from \(f\) to \(i - 1\) is always sorted. This property is a loop invariant: it holds at the beginning of the loop, at the beginning of the loop body, at the end of the loop body, and after the loop.
The sorting algorithm does not insert new elements, duplicate old elements, or throw out elements: thus, the output array (that is, the value of \(a\) after running) is a permutation of the input array (the value of \(a\) before running) restricted to the given range \([f,t]\). At the beginning and end of the loop body, the current value of the array is also a restricted permutation of the input array, and this property is a loop invariant. However, this is temporarily broken when we swap around values in the array!
We now formalize the correctness proof, by means of a proof outline (see Figure 2). First, we introduce a freeze variable, the array \(b\) of type \(\mathbf{integer}\to\mathbf{integer}\). We may think of \(b\) as a snapshot of the array \(a\) at the time before running the algorithm. Since \(b\) is never modified by the program, it maintains its value over time, and thus allows us to compare the actual value of \(a\) with its original value.
The precondition as formulated expresses that the freeze variable \(b\) contains a snapshot of array \(a\) at this instant. Also, we restrict ourselves in this proof to the case where \(f\leq t+1\) holds (since, otherwise, the algorithm has no effect).
This assertion follows from (2.1) since the following properties hold: \[(\forall i : a[i] = b[i]) \to \mathit{Permut}(a,b)\text{ and }\mathit{Permut}(a,b) \to \mathit{Permut}(a,b,f,t).\] We take as a witness (for the existentially quantified \(\pi\) in the definition of \(\mathit{Permut}\)) the identity function, which is a bijection. Also, the following property holds: \[t \leq f \to \mathit{Sorted}(a,f,t),\] since the range \([f,t]\) must be empty if \(t < f\), and \(\mathit{Sorted}(a,f,f)\) also holds.
As we already observed from the algorithm, we can now formalize the loop invariant. The first part expresses that the array from \(f\) until (but excluding) \(i\) must be sorted. The second part expresses that the actual value of \(a\) is a permutation of the input array (given the name \(b\)), restricted to the range \([f,t]\). The third part expresses the bounds of the position \(i\). Note that by applying the substitution rule we obtain assertion (2.2), so we have verified that this loop invariant is initially valid. In the remainder of the proof outline we check whether the loop invariant is preserved by the loop body, and whether it allows us to conclude our postcondition.
This assertion always holds at the start of the loop body, where we know the loop test is true. Thus we can strengthen the invariant: \(i\leq t\) is stronger than \(i\leq t+1\).
We have obtained this assertion in the following way: we need to establish the loop invariant of (2.3) at the end of the loop body. Hence, it has to be a postcondition of the then-branch of the conditional statement. We apply the substitution axiom that replaces \(i\) by \(i + 1\). Now, why does this assertion follow from the preceding assertion? We discriminate two cases:
Case \(i = f\). We can establish \(\mathit{Sorted}(a,f,f)\) from the general property mentioned at (2.2).
Case \(a[i - 1] \leq a[i]\). The following property holds: \[\mathit{Sorted}(a,f,i - 1) \land a[i - 1] \leq a[i] \to \mathit{Sorted}(a,f,i)\] since we extend the range by one and also have the required property about the adjacent elements \(a[i-1]\) and \(a[i]\) (and by transitivity of \(\leq\), also for all elements before \(a[i-1]\), cf. why the predicates \(\mathit{Sorted}\) and \(\mathit{Sorted}'\) are equivalent).
That \(\mathit{Permut}(a,b,f,t)\) follows from \(\mathit{Permut}(a,b,f,t)\) (shown with dots) is trivial, and it is easy to see that \(i\leq t \to i + 1 \leq t + 1\) and \(f\leq i \to f\leq i+1\).
The assertion above this one is obtained from (2.4) and the negation of the conditional test. This assertion follows from the one above by the following argument. We have the property: \[j < i \land \mathit{Sorted}(a,f,i) \to \mathit{Sorted}(a,f,j)\] since the range becomes smaller. \(\mathit{Permut}(a,b,f,t)\) implies itself is again trivial, and \(i\leq t \to i-1 \leq t+1\) and \(f<i\to f\leq i-1\) are easy to see.
But how did we obtain the assertion at (2.6) in the first place? Intuitively, it is the result of swapping \(a[i]\) and \(a[i-1]\) in the assertion (2.8). Swapping does not affect \(f\leq i-1\leq t+1\). We have the following property: \[t < i \to \mathit{Sorted}(a,f,t)[a[i] := v] \equiv \mathit{Sorted}(a,f,t)\] since \(a[x][a[i] := v] = a[x]\) for all \(x < i\): there is no aliasing here!
The proof outline has a missing link: why is it that \(a\) and \(b\) are still restricted permutations when we swap \(a[i]\) and \(a[i-1]\)? The essence of the argument is that from the witness \(\pi\) of \(\mathit{Permut}(a,b,f,t)\), we can construct a new witness, say \(\pi'\), as follows: \({\pi'(i) = \pi(i - 1)}\) and \({\pi'(i-1) = \pi(i)}\) and for all other arguments \(\pi'\) and \(\pi\) agree. Further, \(i\) and \(i-1\) fall within the restricted range \([f,t]\) of the permutation.
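The construction of the new witness can also be written down concretely. The following Python sketch (the function name `swap_witness` is mine) builds \(\pi'\) from a given \(\pi\) and index \(i\) exactly as described: it exchanges the values at \(i\) and \(i-1\), and agrees with \(\pi\) everywhere else.

```python
def swap_witness(pi, i):
    """Given a witness pi for Permut(a, b, f, t), return the witness pi'
    that accounts for swapping a[i] and a[i-1]."""
    def pi_prime(k):
        if k == i:
            return pi(i - 1)
        if k == i - 1:
            return pi(i)
        return pi(k)  # all other arguments: pi' and pi agree
    return pi_prime

# Starting from the identity witness, swapping at position 3
# exchanges the images of 3 and 2, and fixes everything else:
pi2 = swap_witness(lambda k: k, 3)
assert pi2(3) == 2 and pi2(2) == 3 and pi2(7) == 7
```

Since \(\pi'\) merely exchanges two values of a bijection, it is again a bijection, and the identity behaviour of \(\pi\) outside \([f,t]\) is preserved whenever \(i\) and \(i-1\) lie inside the range.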
To phrase it in a different way, we have the following property: \[\mathit{Permut}(a,b,f,t)[a[i - 1] := z][a[i] := a[i - 1]][z := a[i]] \equiv \mathit{Permut}(a,b,f,t)\] if \(f\leq i-1\leq t+1\).
Easily obtained from the loop invariant by replacing \(i\) by \(i-1\).
We check that the loop invariant and the negation of the loop test actually leads to the desired postcondition: we have that \(i = t + 1\) must hold.
This concludes the correctness argument of the gnome sort algorithm.
Bozosort is also a simple sorting algorithm. Suppose you stand in front of a table with cards on its surface, and each card has a number printed on it. Now pick any two cards on the table and swap them. Repeat this, until the numbers on the cards are all sorted from small to large. In this algorithm, we make use of a source of randomness: namely, to pick two cards to swap.
Again, we use an array \(a\) of type \(\mathbf{integer}\to\mathbf{integer}\) to represent the cards on the table. Each cell of the array corresponds to a spot on the table, and the value stored in the array is the number printed on the card.
We can write down an algorithm that encodes this procedure, see Figure 3. Here, we are given variables \(f\) and \(t\) of type \(\mathbf{integer}\) representing the bounds of the array. The algorithm consists of an outer loop and an inner loop. The loop body of the outer loop has two components (separated by vertical space). By component I simply mean a subprogram. The first component chooses two random numbers and swaps the values in the array. The second component walks through the array to test whether it is actually sorted. We use the variables \(i\) and \(j\) of type \(\mathbf{integer}\), and the variable \(c\) of type \(\mathbf{boolean}\).
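Figure 3 is likewise not reproduced in this text; the following Python sketch is my own rendering of the described procedure, with both components of the outer loop body marked, and with `random.randint` (inclusive on both bounds) playing the role of the random assignment. Note that `random.randint` raises an exception when \(f > t\), rather than matching the idealized non-terminating semantics discussed below.

```python
import random

def bozosort(a, f, t):
    """Repeatedly swap two random elements of a[f..t] until sorted."""
    c = False
    while not c:
        # Component 1: choose two random positions and swap their values.
        i = random.randint(f, t)
        j = random.randint(f, t)
        z = a[j]
        a[j] = a[i]
        a[i] = z

        # Component 2: walk through the array to test whether it is sorted.
        c = True
        i = f
        while c and i < t:
            if a[i] > a[i + 1]:
                c = False  # early exit: not sorted, so the outer loop runs again
            i = i + 1

a = [3, 1, 2]
bozosort(a, 0, 2)
assert a == [1, 2, 3]
```

The sketch only terminates with probability 1, which foreshadows the remark at the end of this article: partial correctness guarantees the postcondition only for runs that do finish.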
To understand the meaning of the random assignment, we can make use of the following axiom: \[\{p\}\mathrel{x := \mathbf{random}(l,h)}\{p \land l\leq x\leq h\}\] where \(p\) is an arbitrary formula where \(x\) does not occur free, and \(x\) does not occur in the arbitrary expressions \(l\) or \(h\). Intuitively, this statement selects (nondeterministically) an integer between \(l\) and \(h\) and updates the value of variable \(x\) with the selected value.
What happens when \(l > h\) is the case and we perform the random assignment? According to the above axiom, we obtain \[\{l > h\}\mathrel{x := \mathbf{random}(l,h)}\{l > h \land l \leq x \leq h\}.\] The postcondition is contradictory, so equivalent to \(\mathbf{false}\). Thus, operationally, we could think that running the random assignment from such a situation is equivalent to running a program that never finishes. This works because we look at Hoare triples in their partial correctness sense.
We can make the following observations of the algorithm in Figure 3:
The actual value of array \(a\) is a permutation of the input array \(a\) at any control point, except in the middle of the first component where we perform the swapping of two elements. Thus, we could use this fact as a loop invariant of both the inner and the outer loop.
After the second component finishes its execution, the variable \(c\) represents whether the array is actually sorted or not. Hence, it is a loop invariant of the outer loop that if \(c\) is true, then the array is sorted. This means that the loop only exits when the array \(a\) is actually sorted!
The inner loop that checks whether the array is sorted looks a bit like gnome sort: the position variable \(i\) is moved to the right whenever we have tested the array elements are in order. But, instead of walking to the left, the inner loop has an early exit in case it encounters two elements that are not properly ordered. By setting \(c\) to \(\mathbf{false}\), the outer loop must run again.
We now formalize the correctness proof, by means of a proof outline (see Figure 4). Again, we introduce a freeze variable, the array \(b\) of type \(\mathbf{integer}\to\mathbf{integer}\), with the same purpose as before.
This assertion follows from the precondition, since \((\mathbf{false}\to p)\) is vacuous for any formula \(p\), and the precondition implies \(\mathit{Permut}(a,b,f,t)\) for the same reason as given in (2.2).
Here we have formalized our intuition of the loop invariant for the outer loop. Note that the assignment axiom allows us to obtain (4.1).
When entering the loop body we know that \(c\) must be \(\mathbf{false}\), so we can adapt the loop invariant: only information about the array \(a\) being a restricted permutation remains present here.
Obtained by applying our axiom for random assignments.
Also obtained by applying our axiom for random assignments. Note that from this assertion it already follows that \(f\leq t\).
Again we have the swapping of two elements. The argument needed here is a slight generalization of the argument of (2.7) above, where the essence is this property (given \(f\leq i\leq t\) and \(f\leq j\leq t\)): \[\mathit{Permut}(a,b,f,t)[a[j] := z][a[i] := a[j]][z := a[i]] \equiv \mathit{Permut}(a,b,f,t).\] Note that if the two random variables have the same value, the swap has no effect.
Compared to (4.3), we now also know that \(f\leq t\). Clearly, the three preceding assignments cannot affect \(f\leq t\) and we already knew it holds in (4.5). In the other case (that \(f > t\) holds), we would not even reach this point in the program.
We here formulate the loop invariant for the inner loop. The essence is that everything to the left of (and including) \(i\) must be sorted. This is initially valid, since \(\mathit{Sorted}(a,f,f)\) is true as in (2.2). We still need to establish that this is indeed a loop invariant.
When the loop body is entered, we know both \(c\) and \(i < t\). The latter is used to make our assertion stronger than the loop invariant.
In the case the elements are properly ordered, we can again apply the property as mentioned in the second case of (2.5).
Easy to see that \(i<t\) implies \(i\leq t\). It is quite surprising that the assertion at this point does not say anything about variable \(c\).
We now establish that the outer loop invariant follows from the inner loop invariant, under the condition that the inner loop has terminated (so its test must be false). There are two cases:
Case \(\lnot c\): we have an early exit of the inner loop. Since \(c\) is \(\mathbf{false}\), the left conjunct is vacuous. The right conjunct trivially follows from the inner loop invariant.
Case \(\lnot (i < t)\): we know the inner loop has fully executed, so \(i = t\) from the upper bound in the loop invariant. So \(\mathit{Sorted}(a,f,t)\) must hold, regardless of the value of \(c\).
The postcondition is obtained by a double negation elimination on the test of the outer loop: \(\lnot\lnot c\) implies \(c\), and so from \((c\to \mathit{Sorted}(a,f,t))\) we obtain that the array is actually sorted. Again, \(\mathit{Permut}(a,b,f,t)\) follows trivially from the outer loop invariant.
This concludes the correctness argument of the bozosort algorithm.
We have seen two sorting algorithms, and discussed their correctness proofs. Although almost every detail is present here, there still remains a good exercise for practicing with array variable substitutions: to write down the proof outline of swapping the array elements above, and to work out in full detail how it affects the restricted permutation predicate \(\mathit{Permut}\).
In this article, we only look at program correctness in the sense of partial correctness. An interesting question remains: what can we say about the termination of these algorithms? Under what conditions do these algorithms terminate? In the following weeks of the Program Correctness course, we will look at total correctness, where we shall prove not only the correctness of a program with respect to a specification of its input and output behavior, but also whether the program terminates!
I thank Dominique Lawson and Roos Wensveen (both student assistants of the Program Correctness course) for suggesting improvements, and discovering an error in a previous version of this article. The error was that the postcondition needs to be \(\mathit{Permut}(a,b,f,t)\) and not \(\mathit{Permut}(a,b)\) (can you see why?). All remaining errors remain my own.
ChatGPT is a research project developed by OpenAI, which gives the following charter on their website: "OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity. We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome." I have used the February 13 (2023) version of ChatGPT.
After creating an account and logging in, the user interface for ChatGPT gives the following disclaimer: "While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice." So, we had better beware of the answers...
I give it the following first input: Hi. I would like to ask you a number of exam questions about the course Concepts of Programming Languages.
ChatGPT: Of course, I'll do my best to help you with your exam questions about Concepts of Programming Languages. What would you like to ask?
So, here we go. All quotations are generated by ChatGPT.
What distinguishes a high-level programming language from a low-level programming language? Choose one or more correct answers from below.
ChatGPT: The correct answer is 3. High-level programming languages allow (some) concepts to be expressed more directly, whereas low-level programming languages require (convoluted, error-prone) encodings to express the same concepts.
Explanation: High-level programming languages are designed to provide a more abstract view of programming, allowing the programmer to express concepts and algorithms more easily and directly. They typically use more human-readable syntax and provide built-in abstractions for common programming tasks, such as memory management, string manipulation, and I/O operations. In contrast, low-level programming languages require the programmer to work more directly with the underlying hardware, requiring more detailed knowledge of the system architecture and often relying on manual memory management and other low-level operations. While low-level programming languages may offer greater control over the hardware, they can be more error-prone and harder to use effectively than high-level languages.
Give a short motivation for each choice of the question above, why you made your selection (that is, why it is correct if you selected it, or why it is incorrect if you did not select it). Keep your motivation short: write at most two sentences per choice.
ChatGPT: Sure, here are the motivations for each choice:
1. Incorrect: The number of programmers using a language is not a factor in distinguishing between high-level and low-level programming languages.
2. Incorrect: While some high-level programming languages may be compiled into low-level machine code, this is not always the case, and low-level languages can also be compiled.
3. Correct: High-level programming languages provide built-in abstractions and a more abstract view of programming, making it easier to express concepts and algorithms more directly. In contrast, low-level languages require more direct manipulation of the hardware, making it necessary to use convoluted encodings to express the same concepts.
4. Incorrect: The efficiency of computations is not directly related to whether a language is high-level or low-level. Both types of languages can be used to write efficient code, depending on the specific task and the skill of the programmer.
What statements about pure and impure expressions are true? Choose one or more correct answers from below.
ChatGPT: The correct statement is 1. There exists an expression that is considered impure if you look at its parts, but which has no effect on the program state after it is fully evaluated.
Explanation: Pure expressions are those that have no side effects and always return the same result given the same input. Impure expressions, on the other hand, have side effects or may return different results for the same input. However, it is possible for an expression to be impure by definition but have no effect on the program state when evaluated. For example, an impure expression might generate a warning or log a message, but if the program does not use or check for these outputs, the expression is effectively pure.
Statement 2 is incorrect because the time or space required for evaluation is not a factor in determining whether an expression is pure or impure.
Statement 3 is incorrect because impure expressions by definition have side effects, so they do affect the program state when evaluated.
Statement 4 is incorrect because assignment statements with impure expressions on the right-hand side cannot always be eliminated without changing the behavior of the program. For example, if the impure expression generates a warning or logs a message, eliminating the assignment would result in the loss of this information.
Give a short motivation for each choice of the previous question above, why you made your selection (that is, why it is true if you selected it, or why it is not true if you did not select it). Keep your motivation short: write at most two sentences per choice.
(ChatGPT already did that.)
What statements about exceptions are true? Choose one or more correct answer from below.
ChatGPT: The correct statements are 3 and 4.
Explanation:
1. Incorrect: Numerical errors such as division by zero can be handled by throwing an exception, rather than causing a program crash.
2. Incorrect: It is possible to rethrow an exception that was caught before using the "throw" statement.
3. Correct: When an exception is thrown, the call stack is unwound until an appropriate exception handler is found. This means that the function that threw the exception and any functions that it called will be exited, but this does not necessarily mean that the whole call stack will be unwound.
4. Correct: If an exception is not caught by any handler, it will cause the program to terminate with an error message. This can lead to a program crash if the error is not handled gracefully.
Give a short motivation for each choice of the previous question above, why you made your selection (that is, why it is true if you selected it, or why it is not true if you did not select it). Keep your motivation short: write at most two sentences per choice.
(ChatGPT already did that.)
What statements about object-oriented programming are true? Choose one or more correct answer from below.
ChatGPT: The correct statements are 1, 2, and 4.
Explanation:
1. Correct: Dynamic dispatch is a key feature of object-oriented programming, where the method body that is actually invoked depends on the runtime type of the object that the method is invoked on.
2. Correct: Class inheritance can lead to violations of Liskov's substitution principle if the subclass behavior is not compatible with the superclass behavior. For example, if a subclass violates the preconditions or postconditions of a superclass method, code that works on the superclass object may not work correctly on the subclass object.
3. Incorrect: While every object belongs to a class, not every class has an object as an instance. A class can be abstract or simply serve as a blueprint for creating objects.
4. Correct: Objects in object-oriented programming are typically allocated on a heap, and in garbage-collected languages, unreachable objects may be removed by the garbage collector while the program is running. This allows the programmer to focus on the program logic and not worry about memory management.
Give a short motivation for each choice of the previous question above, why you made your selection (that is, why it is true if you selected it, or why it is not true if you did not select it). Keep your motivation short: write at most two sentences per choice.
(ChatGPT already did that.)
Given the following context-free grammar in Backus-Naur Form (BNF):
<expr> ::= (<expr>) | <expr> + <expr> | <expr> * <expr> | <num> | <var>
<num> ::= <digit> | <digit><num>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<var> ::= <letter> | <letter><var>
<letter> ::= a | b | c | ...
<stmt> ::= {<stmt>} | <var> := <expr> | if <expr> = 0 then <stmt> else <stmt> | <stmt>; <stmt>
The starting symbol is <stmt>.
Which of the following strings are in the language defined by the grammar above? Choose one or more correct answer from below.
ChatGPT: The correct strings are 1, 2, and 4.
Explanation:
1. Correct: The string abc := a+b; can be generated by the rule <var> := <expr>. The variable "abc" is a sequence of letters, which is generated by the rule <letter><var>, and the expression "a+b" is generated by the rule <expr> + <expr> with <expr> being the variable "a" and the variable "b", which are generated by the rule <var>.
2. Correct: The string (((c))) can be generated by the rule <expr> with <expr> being (<expr>) and <expr> being (<expr>) and <var> being "c".
3. Incorrect: The string a := b + a; b := a - b cannot be generated by the given grammar because it is not a valid <stmt>. The grammar allows only one assignment statement per <stmt>.
4. Correct: The string if (a + b) = 0 then a := b else b := a can be generated by the rule if <expr> = 0 then <stmt> else <stmt>, with <expr> being (a+b) generated by the rule <expr> + <expr>, and <stmt> being the assignment statements "a := b" and "b := a" generated by the rule <var> := <expr>.
Motivate, for each string in the previous question above, why you made your selection (i.e. why the string is in the language if you selected it, or why it is not if you did not select it).
(ChatGPT already did that.)
Is the grammar given above ambiguous? Give a reasoned answer (i.e. not only "yes" or "no", but also why and how you found out).
ChatGPT: Yes, the grammar given above is ambiguous.
Reasoning:
Ambiguity in a context-free grammar means that there exists at least one string in the language that can be generated by the grammar in more than one way. In the given grammar, the expression <expr> + <expr> can be associated with either left or right recursion, which leads to different interpretations of the same input. For example, the input string "1 + 2 * 3" can be parsed as either "(1 + 2) * 3" or "1 + (2 * 3)", which have different values. This indicates that the grammar is ambiguous.
One way to demonstrate the ambiguity is by constructing a parse tree for an ambiguous string. For instance, consider the string "1 + 2 * 3". It can be parsed in two different ways, resulting in two different parse trees:
      +             *
     / \           / \
    1   *         +   3
       / \       / \
      2   3     1   2
The left tree corresponds to the interpretation "(1 + 2) * 3", while the right tree corresponds to the interpretation "1 + (2 * 3)". Since the grammar allows both of these parse trees, it is ambiguous.
Given the following context-free grammar of the (untyped) lambda calculus:
<expr> ::= <var> | λ<var><expr> | <expr><expr>
We use only single letter variables. Use the standard way of resolving ambiguity: application associates to the left, abstraction precedes application (that is, abstraction binds more strongly than application, or, in other words, abstraction has a higher precedence than application). Expressions can use parentheses for disambiguation (not shown in the grammar).
Draw the abstract syntax tree of the following lambda expressions (in black). Circle the free variable occurrences (in red). Show the binding of variables (in green).
ChatGPT:
1. xyz
[ChatGPT's ASCII tree drawing; garbled in this transcript]
Variable occurrences: x, y, z. All three are free.
Binding of variables: none.
2. λxxx
[ChatGPT's ASCII tree drawing; garbled in this transcript]
Variable occurrences: none.
Binding of variables: x is bound by the abstraction λxxx.
3. λyx
[ChatGPT's ASCII tree drawing; garbled in this transcript]
Variable occurrences: y appears once as a free variable.
Binding of variables: x is bound by the first abstraction λx(yx), and y is bound by the second abstraction λyx.
Explain the meaning (what kind of statement) and execution (what happens when statement is executed) of each statement of the following program with dynamic memory management:
int *x = new(5);
int *z = null;
{
  int *y = new(6);
  z := y;
  delete z;
  *x := *y;
}
ChatGPT: This is a program written in C++ with dynamic memory management. Here's what each statement does:
1. int *x = new(5); : This statement dynamically allocates memory for an integer array of size 5 and assigns the address of the first element to the variable x.
2. int *z = null; : This statement initializes a pointer variable z to null.
3. { : This is the beginning of a block statement.
4. int *y = new(6); : This statement dynamically allocates memory for an integer array of size 6 and assigns the address of the first element to the variable y.
5. z := y; : This statement assigns the value of y to z, so z now points to the same memory location as y.
6. delete z; : This statement deallocates the memory that z points to, which is the memory that y also points to. This means that the memory for the integer array of size 6 that was allocated on the heap is now freed.
7. *x := *y; : This statement copies the value of the first element of the integer array that y points to, which is also the value of the first element of the integer array that x points to. This means that the value of x is now equal to the value of y.
The execution of this program involves allocating and deallocating memory on the heap and copying values between memory locations. When the program runs, it first allocates memory for the integer array of size 5 and assigns its address to x. Then it initializes z to null. Inside the block statement, it allocates memory for the integer array of size 6 and assigns its address to y. It then assigns the value of y to z, so z points to the same memory location as y. It then deallocates the memory that z points to, which also frees the memory that y points to. Finally, it copies the value of the first element of the integer array that y points to, which is also the value of the first element of the integer array that x points to. After the block statement, the memory for the integer array of size 5 is still allocated and assigned to x, while the memory for the integer array of size 6 that was allocated inside the block statement has been deallocated.
Consider the following two programs (on the left and right) that are executed concurrently (x and y are shared integer variables):
x := 5;                 y := 3;
y := 2;                 x := 2;
if (x < 5) {            x := x - y;
  y := 4                y := y + 6;
}                       print x
x := x + y;
Give five different executions of the program, i.e. five executions that each prints a different value. The result (of the five executions) alone is not sufficient. Please also describe, for each execution, what happened/how you obtained the result.
ChatGPT: Here are five possible executions:
Execution 1:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 1: x := x + y;
Thread 2: x := x - y; // x = 2 - 3 = -1
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: print x; // x = 9
Execution 2:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 1: print x; // x = 9
Execution 3:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: print x; // x = -7
Execution 4:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: print x; // x = 9
Execution 5:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: print x; // x = -7
In general, the value printed depends on the order in which the threads execute their statements and whether the if statement is executed or skipped. In each execution, we simulate the interleaving
(Note that I have removed some questions which are hard to give as input to ChatGPT textually.)
Decide for yourself what you think of the answers. I am impressed: the answers generated by ChatGPT seem closely related to the questions and are grammatically sound. However, on closer inspection it becomes clear that the chatbot is not able to give reasonably correct answers to all questions.