dr. heap

A sound and complete proof system for separation logic (part 1)

Hans-Dieter A. Hiep — Fri, 28 Jun 2024 22:00:00 GMT

1 Introduction

In this article we have another look at the proof system for separation logic that is introduced in the first author’s PhD thesis [5] (publicly defended on Thursday, May 23rd, 2024).

By separation logic we mean the logic behind the assertion language used in Reynolds’ logic, the program logic for reasoning about the correctness of pointer programs that was introduced in 2002 by J.C. Reynolds [9]. In that article, Reynolds both introduces his program logic and axiomatizes separation logic by several axioms, but writes:

Finally, we give axiom schemata for the predicate $\mapsto$.
(Regrettably, these are far from complete.)

In 2021, completeness of quantifier-free separation logic was established [3], and three years later completeness of the full language of separation logic [5].

The purpose of this article is to show the novel proof system of [5] in a straightforward way. The new proof system can be used to prove all valid formulas, including formulas that until now were impossible to prove using existing automatic and interactive tools for separation logic. In Section 2 we quickly revisit the formulas of separation logic, in Section 3 we introduce the proof system, and in Section 4 we have a look at a number of example proofs. We then continue the discussion that motivates the design of the proof system: in Section 5 we discuss referential transparency and the binding structure of separation logic, and in Section 6 we discuss issues such as univalence, well-foundedness, and finiteness.

This article is part one of a series of articles about the new proof system for separation logic. In this article, we focus on the syntax of the proof system. In Section 7, the conclusion, we discuss the topics of the next parts of this series, namely semantics and the soundness and completeness of the proof system.

2 Preliminaries

The syntax of formulas of separation logic is defined as follows: $$\phi, \psi ::= \bot \mid C(x_1, \ldots, x_n) \mid (x \hookrightarrow y) \mid (\phi \to \psi) \mid (\forall x)\phi \mid (\phi * \psi) \mid (\phi \mathrel{-\!\!*} \psi)$$ where we assume there is a countably infinite set of variables with typical examples $x, y, z$ (possibly with subscripts), and we have a signature which has a countably infinite set of non-logical symbols, each assigned a fixed arity, of which $C$ is a typical example with arity $n$. We have the usual logical symbols: $\bot$ stands for falsity and $\to$ stands for logical implication. From these two symbols we can derive all other propositional connectives, such as negation $\neg\phi$, verum $\top$, logical conjunction $\phi \land \psi$, and logical disjunction $\phi \lor \psi$. We have universal quantification $(\forall x)\phi$ where the variable $x$ is bound in the usual way, and we can define existential quantification $(\exists x)\phi$ as the dual $\neg(\forall x)\neg\phi$. By $FV(\phi)$ we mean the set of free variables in $\phi$. Quantification is first-order in the sense that quantification ranges over individuals. Finally, we also have equality $=$ as a non-logical symbol, but with a fixed meaning. (Our treatment of parentheses and resolution of ambiguity is standard: we may leave parentheses out as long as the result is not too ambiguous.)

What is different in separation logic compared to classical first-order logic are the following so-called separation symbols (distinguished from the logical and non-logical symbols). The primitive formula $(x \hookrightarrow y)$ is called points to (as in ‘$x$ points to $y$’) or a reference (as in ‘$x$ is a reference to $y$’). As complex formulas, two separating connectives are given: $(\phi * \psi)$ is a separating conjunction, and $(\phi \mathrel{-\!\!*} \psi)$ is a separating implication. The latter connective is also called the magic wand by some authors. Having ‘points to’ as primitive allows us to define ‘strict points to’ as follows, where we take $(x \mapsto y)$ to abbreviate $$(x \hookrightarrow y) \land (\forall z, w)((z \hookrightarrow w) \to z = x).$$ The intention is that $(x \hookrightarrow y)$ expresses that location $x$ has value $y$, whereas $(x \mapsto y)$ expresses furthermore that $x$ is the only location allocated. We also have the abbreviations $(x \hookrightarrow -)$ and $(x \mapsto -)$, where we immediately existentially quantify away the value. These express, respectively, that $x$ is allocated (possibly among other locations) and, moreover, that only $x$ is allocated. By $\mathbf{emp}$ we mean nothing is allocated, so it abbreviates $(\forall x)\neg(x \hookrightarrow -)$, i.e. every location does not point to some value, or equivalently $\neg(\exists x)(x \hookrightarrow -)$.
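To make the intended reading of these abbreviations concrete, here is a minimal Python sketch. It assumes that a heap is modeled as a finite set of (location, value) pairs, which is a modeling choice made only for this illustration (the semantics proper is the topic of the next part of this series), and all function names are hypothetical.

```python
# Assumption: a heap is a finite set of (location, value) pairs.

def domain(heap):
    """The set of allocated locations of the heap."""
    return {loc for (loc, _) in heap}

def points_to(heap, x, y):
    """Weak 'points to': location x has value y, possibly among other cells."""
    return (x, y) in heap

def strict_points_to(heap, x, y):
    """Strict 'points to': x has value y and x is the only allocated location."""
    return points_to(heap, x, y) and domain(heap) == {x}

def allocated(heap, x):
    """x is allocated; the value is existentially quantified away."""
    return x in domain(heap)

def only_allocated(heap, x):
    """x is allocated and no other location is."""
    return domain(heap) == {x}

def emp(heap):
    """Nothing is allocated."""
    return len(heap) == 0

heap = {(1, 10), (2, 20)}
print(points_to(heap, 1, 10))              # weak points-to ignores other cells
print(strict_points_to(heap, 1, 10))       # location 2 is allocated as well
print(strict_points_to({(1, 10)}, 1, 10))  # a singleton heap
print(emp(set()))
```

The only difference between the weak and the strict variant in this sketch is the extra condition on the domain, which mirrors the informal description above.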

3 Proof system

In this section we introduce a novel proof system for separation logic. In this article we look at the proof system from a purely syntactical point of view. In the next article of this series, we give the standard semantics of separation logic.

The first device we introduce is a special construct, in the following sense: This construct allows us to change the meaning of ‘points to’ in , by assigning it the meaning denoted by . Intuitively speaking, to evaluate whether holds, we first consider the heap denoted by (with free variables and ) and then we evaluate whether holds in the heap described by . We must be careful not to adopt too naïve an interpretation of : we cannot just simplify by replacing the occurrences of in by , because separating connectives are referentially opaque (this is explained in more detail in Section 5). The purpose of our proof system is to reason about this construct in a formal way.
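The two-step reading of this construct (first build the heap that the description denotes, then evaluate the formula in that heap) can be sketched in Python as follows. This is only an illustration under stated assumptions: formulas are modeled as Python predicates, the universe is a small finite range, heaps are finite sets of pairs, and every name below is hypothetical.

```python
from itertools import product

UNIVERSE = range(4)  # a small finite universe, chosen only for illustration

def described_heap(description, outer_heap):
    """Collect all pairs (x, y) for which the description holds.

    The description may itself consult the outer heap."""
    return {(x, y)
            for x, y in product(UNIVERSE, repeat=2)
            if description(x, y, outer_heap)}

def holds_in_described_heap(formula, description, outer_heap):
    """Evaluate the formula in the described heap, not in the outer heap."""
    return formula(described_heap(description, outer_heap))

def emp(heap):
    """Nothing is allocated."""
    return len(heap) == 0

def falsity(x, y, outer):
    """A description that never holds: it denotes the empty graph."""
    return False

def same_as_outer(x, y, outer):
    """A description that simply copies the outer heap."""
    return (x, y) in outer

outer = {(0, 1), (1, 2)}
print(holds_in_described_heap(emp, falsity, outer))
print(holds_in_described_heap(emp, same_as_outer, outer))
```

Note that the sketch evaluates the formula semantically in the new heap. It does not textually substitute the description for ‘points to’, which is exactly the naïve simplification the paragraph above warns against.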

Working with takes much space, so instead we use the shorthand notation . Thus, the objects of our proof system involve not just the formulas of separation logic, but an extended language (called extended separation logic) in which we add this special construct:

Next, we introduce a proof system with as objects the formulas of extended separation logic. This proof system allows us to deduce formulas: a deduction is also called a proof, and we shall give a number of example proofs. Recall that we have a signature that has a countably infinite supply of non-logical symbols. For any formula of extended separation logic, its parameters are the predicate symbols of the signature that occur somewhere in the formula. In particular, we shall make use of so-called ‘bookkeeping devices’, which are binary predicate symbols (possibly with quotes or subscripts) from the signature. Sometimes we have the side-condition that our bookkeeping devices are ‘fresh’, in the sense that they do not appear as parameters of formulas in the context.
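The notions of parameter and freshness can be illustrated with a small Python sketch. The encoding of formulas as nested tuples, the tag `"pred"`, and the base name `R` for bookkeeping symbols are all assumptions made for this example only.

```python
from itertools import count

def parameters(formula):
    """Predicate symbols occurring in a formula given as nested tuples.

    Hypothetical encoding: atoms are ("pred", symbol, arg1, ..., argn),
    compound formulas are (connective, part1, ..., partk)."""
    if not isinstance(formula, tuple):
        return set()          # variables and other leaves carry no parameters
    if formula and formula[0] == "pred":
        return {formula[1]}
    result = set()
    for part in formula[1:]:
        result |= parameters(part)
    return result

def fresh_symbol(context, base="R"):
    """Pick a symbol R, R1, R2, ... that is not a parameter of the context."""
    used = set()
    for formula in context:
        used |= parameters(formula)
    for i in count():
        candidate = base if i == 0 else f"{base}{i}"
        if candidate not in used:
            return candidate

context = [("implies", ("pred", "R", "x", "y"), ("pred", "R1", "y", "x"))]
print(parameters(context[0]))
print(fresh_symbol(context))
```

Because the signature has a countably infinite supply of symbols, such a search always terminates for a finite context.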

We present the proof system as a Hilbert-style axiom system, but nothing prevents us from also giving the proof system in the style of natural deduction. We have the usual proof rules and axioms of classical logic (but instantiated with formulas of extended separation logic), together with the following axioms:

(Lookup)
(Replace)
()
()
if ()
(Assoc)
(Extent)
(E)
(I)
(E)
(I)

We have the side-condition in the rule (E) that the symbols and are fresh, i.e. are not parameters of . Similarly, we have the side-condition in the rule (I) that the symbol is fresh, i.e. is not a parameter of . We used without subscripts instead of to reduce notational clutter. To avoid confusion, we may use instead of and instead of .

is the result of simultaneous substitution of by and by , respectively. The substitution operator is defined compositionally as usual, and has the following specification for the new connective: where are all distinct. If is either the same variable as or , then the substitution is not pushed down on the right side. A similar definition can be given for simultaneous substitution of distinct variables.

We let abbreviate the formula and let abbreviate the formula and let abbreviate the formula These abbreviations universally quantify : we let these quantifiers, on purpose, capture the free variables and of . When and are just the binary predicate symbols and , we mean the formulas and . One can also use set builder notation to make the intention clearer. Note that in the latter abbreviation, , we require the stronger notion of disjointness of the domains of the relations, not the weaker notion of disjointness of the two sets of pairs representing the pairs that are related by each relation.
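The difference between the two notions of disjointness is easy to see on a concrete example. The following Python sketch models binary relations as finite sets of pairs; the helper `is_split`, which combines union and domain-disjointness, is an assumption about how such abbreviations are typically used and not a transcription of the definitions above.

```python
def domain(rel):
    """Locations that the relation relates to at least one value."""
    return {x for (x, _) in rel}

def pairs_disjoint(r1, r2):
    """Weaker notion: the two sets of pairs have no pair in common."""
    return r1.isdisjoint(r2)

def domains_disjoint(r1, r2):
    """Stronger notion: no location is in the domain of both relations."""
    return domain(r1).isdisjoint(domain(r2))

def is_split(r, r1, r2):
    """r is the union of r1 and r2, and r1 and r2 have disjoint domains."""
    return r == r1 | r2 and domains_disjoint(r1, r2)

r1 = {(1, 2)}
r2 = {(1, 3)}
print(pairs_disjoint(r1, r2))    # no pair is shared
print(domains_disjoint(r1, r2))  # but location 1 occurs in both domains
```

Here `r1` and `r2` share no pair, yet both relate location 1 to some value, so they are disjoint only in the weaker sense.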

Further, a useful result in practical reasoning is that we can replace equivalent subformulas in any formula. Moreover, the deduction theorem also holds for our proof system, hence we can apply the axioms under any context. We furthermore shall use the above proof system in a natural deduction style.

4 Example proofs

Let us now have a look at a number of example proofs. We shall write to mean that is demonstrable in the proof system given above without any premises, and to mean that is demonstrable using the premises in .

The first example is given in Figure 1. The statement we want to prove has the following intuitive meaning: in the heap described by we have that is satisfied. The argument is the following: the heap described by is the empty graph (no location is mapped to any value), so evaluating in that heap indeed yields a true formula. In the proof that follows, we do not explicitly write down how to do classical reasoning, and instead we focus on the application of the new axioms.

Figure 1. Proof of in the empty heap.

The second example is given in Figure 2. We prove that for any (extended) separation logic formulas and , their separating conjunction is commutative. The proof proceeds in two parts. In step 9, we have shown how to swap the two separated formulas relative to the heap . But this heap description has the same extension as the ‘outer’ heap, hence we obtain the non-relative result in step 10! As such, we can obtain the result simply by putting the given formula under this -connective. We add formulas to the context by means of opening a box, so at step 6 we have established:

Figure 2. Proof of commutativity of .

See Figure 3 and Figure 4 for the third and fourth examples. Figure 3 is a generalization of the result in Figure 1. Note that in step 5 of Figure 4 we use the result proven in Figure 3. What should be obvious now is that the proofs are not very difficult: we use our set-theoretic intuition for dealing with heaps. Both Figure 2 and Figure 4 show that (E) simply adds fresh parameters and the corresponding assumptions to the context. This shows that separating connectives behave almost like quantifiers, if we compare them with the way first-order quantification works (as in Figure 3).

Figure 3. Proof that holds and only holds in empty heaps.

Figure 4. Proof that is a unit of separating conjunction.

The reader can now try and write down the proofs for the following formulas:

At last, we have the following non-trivial properties:

The last property is very important. It shows that separation logic can be used to express cardinality properties of the universe. The last property shows the separation logic equivalent of the classical expression of the property ‘there are at least two elements’. When we scale this property to ‘there are at least $n$ elements’, one will see that the separation logic formula grows linearly but the classical logic equivalent grows much faster: quadratically! This is the essence of the scalability argument motivating the use of separation logic.
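The growth rates can be checked with a short Python sketch. The classical formula is the standard one with pairwise inequalities between the quantified variables. The separation logic side is only an assumed shape (one ‘allocated’ atom per variable, joined by separating conjunction, written here with the ASCII stand-in `~>`), so the sketch illustrates the counting argument rather than the exact formula.

```python
from itertools import combinations

def classical_at_least(n):
    """Variables x1..xn together with all pairwise inequalities."""
    xs = [f"x{i}" for i in range(1, n + 1)]
    inequalities = [f"{a} != {b}" for a, b in combinations(xs, 2)]
    return xs, inequalities

def separating_at_least(n):
    """Assumed shape: one atom per variable, joined by separating conjunction."""
    xs = [f"x{i}" for i in range(1, n + 1)]
    atoms = [f"({x} ~> -)" for x in xs]
    return xs, atoms

for n in (2, 4, 8, 16):
    _, inequalities = classical_at_least(n)
    _, atoms = separating_at_least(n)
    print(n, len(inequalities), len(atoms))
```

The classical formula needs one inequality for each unordered pair of variables, that is $n(n-1)/2$ of them, whereas the assumed separating formula needs only $n$ atoms.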

Our proof system is able to prove this equivalence. However, existing proof systems for separation logic still lack the ability to prove this elementary fact. We have investigated whether the equivalence of these formulas can be proven in an interactive tool for reasoning about separation logic: the Iris project [6]. In current versions of that system, it is not possible to show the equivalence of these assertions, at least not without adding additional axioms.

The last example is a demonstration of the following equivalence: This equivalence is expressed in quantifier-free separation logic, for which a complete axiomatization was already known [3]. We can also give a proof, see Figure 5. Surprisingly, this already exceeds the capability of all the automated separation logic provers in the benchmark competition SL-COMP. In fact, only the CVC4-SL tool [8] supports the fragment of separation logic that includes the separating implication. However, in our own experiments with that tool, we found that it produces an incorrect counter-example, which we reported as a bug to one of the maintainers of the project. In fact, the latest version, CVC5-SL, reports the same input as ‘unknown’, indicating that the tool is incomplete.

Figure 5. Proof of an equivalence between a semi-pure and separating formula.

So far, we have seen several valid formulas of separation logic, which in the novel proof system for separation logic we are actually able to prove. This alone already shows our proof system goes beyond the ability of existing tools for reasoning about separation logic! The novelty of this proof system lies in adding a binding construct, which in shorthand is written using the -connective, that relativizes the heap with respect to which a formula is interpreted.

5 Referential transparency

In this section we discuss the binding structure of separation logic, and how the concept of referential transparency applies. Referential transparency is a general concept in formal languages and as such applies to both logical and programming languages. Although Whitehead and Russell already speak of it, Quine is often credited with introducing the term in his book Word and Object [7], originally published in 1960. In the case of separation logic, we shall see that the separating connectives fail referential transparency!

Separating connectives capture references, the ‘points to’, that occur in subformulas. In the binding structure of first-order logic, one could resolve unintentional capturing by means of a so-called ‘capture avoiding’ substitution operator that renames quantified variables before actually performing a substitution. However, in separation logic one cannot define such a capture avoiding substitution operator, since there is only a single heap in scope that cannot be renamed.

First, we shall make some general remarks about the binding structure of separation logic formulas. A formula is pure if no separation symbol occurs in it. In that case the meaning of a formula does not depend on the heap, viz. the interpretation of $\hookrightarrow$. Otherwise, a formula is semi-pure if only the separation symbol $\hookrightarrow$ occurs in it. A formula in which one of the separating connectives occurs is called a separating formula. We have the usual notions of free variable occurrence and bound variable occurrence, as our notion of variable binding is the same as in first-order logic. But, in separation logic, we also have another binding structure, namely that of references: the meaning of ‘points to’ is different under the separating connectives.

To see why separation logic fails referential transparency, consider the reference to ‘the value of location ’ in the proposition ‘the value of location has property ’. To avoid that ‘the value of location ’ is ill-defined, when speaking of the value one implicitly intends there exists a unique value. Moreover, linguistically speaking, a reference is free if we can replace it by any other expression that is equal to it, without affecting the truth value of the proposition after replacement compared to the proposition before replacement. Often this is called the principle of substitutivity¹. For example, given that ‘the value of location ’ equals ‘the value of location ’, when we replace a reference of the former by the latter in the expression ‘the value of location has property ’ to obtain ‘the value of location has property ’, we obtain an equivalent proposition: so we have that the reference ‘the value of location ’ occurs free. A context is said to be referentially transparent whenever it preserves the free references: every free reference remains a free reference under the given context. Otherwise, the context is referentially opaque.

In classical logic all propositional connectives are referentially transparent. The only referentially opaque connectives are the quantifiers under specific circumstances. This is easy to see for a given formula with a free variable . Suppose , then by substitutivity we know that is equivalent to . However, some quantifiers fail referential transparency, since for example in the formula we can no longer naïvely replace with when we know . If the quantified variable is not the same as one of the free variables (either in the subformula or in the expression being substituted), we do maintain referential transparency. To ensure referential transparency there is the convention of keeping bound and free variables separate, analogous to the so-called Barendregt variable convention [4, Sect. 5.2].

In separation logic, however, many contexts involving separating connectives are referentially opaque. For example, in the context of a separating conjunction it is not always the case that we can freely replace references by equivalent expressions. An example is where the value of location is equal to the value of location , and where we also separate the locations and . Formally, we have the equality on the left, and the separation on the right: Although we know that locations and have the same value, we cannot literally replace for in the left component of the separating conjunction, without also doing the reverse replacement (replacing for ) in the other component of the separating connective. Thus we no longer have that the reference ‘the value of location ’ is free when it is nested under a separating conjunction: separating conjunction is referentially opaque!
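This failure of substitutivity can be observed in a small Python model. The sketch assumes heaps are finite sets of (location, value) pairs, that a separating conjunction splits the heap into two parts with disjoint domains, and that the separated formula says ‘x is allocated’ on one side and ‘y is allocated’ on the other; these choices and all names are illustrative assumptions.

```python
from itertools import combinations

def domain(heap):
    """The set of allocated locations of the heap."""
    return {loc for (loc, _) in heap}

def splits(heap):
    """All ways to split a heap into two parts with disjoint domains."""
    locs = sorted(domain(heap))
    for k in range(len(locs) + 1):
        for left_locs in combinations(locs, k):
            left = {(l, v) for (l, v) in heap if l in left_locs}
            yield left, heap - left

def sep_conj(phi, psi):
    """Separating conjunction: some split satisfies phi and psi separately."""
    return lambda heap: any(phi(h1) and psi(h2) for h1, h2 in splits(heap))

def allocated(x):
    """The formula 'x is allocated', as a predicate on heaps."""
    return lambda heap: x in domain(heap)

def same_value(heap, x, y):
    """The locations x and y have a common value in the heap."""
    return any((x, v) in heap and (y, v) in heap for (_, v) in heap)

heap = {("x", 7), ("y", 7)}
original = sep_conj(allocated("x"), allocated("y"))
replaced = sep_conj(allocated("y"), allocated("y"))       # left side only
both_replaced = sep_conj(allocated("y"), allocated("x"))  # both sides swapped

print(same_value(heap, "x", "y"))
print(original(heap), replaced(heap), both_replaced(heap))
```

Although both locations hold the same value, replacing the reference on one side only changes the truth value, because no split can allocate `y` in both parts; swapping on both sides restores it.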

To understand the binding structure of separation logic, we introduce the notions of direct and indirect binding. A reference (a ‘points to’ construct) or a separating connective is directly bound to the separating connective under which it is nested, without any other separating connective in between. Thus, here by nesting we only have to look at separating connectives in the immediate context, not at the logical connectives. A reference or a separating connective is said to be free whenever it is not directly bound. A reference or a separating connective is indirectly bound to all the separating connectives under which it is nested, but not immediately nested. In a sense, indirect binding is the transitive but irreflexive closure of direct binding.

Another example is the following formula involving the magic wand: which expresses the following concept: for every value that the heap does not refer to, if we were to extend the heap with a cycle between the locations and , then in the resulting heap the location is the only location which has value . So what does the binding structure of this formula look like? Syntactically, there are four references (‘points to’ constructs) in this formula and two separating connectives. Each of these entities is either bound or free. The left-most reference is free, and the other three references are bound. These three references are nested under the magic wand, so directly or indirectly bound to that magic wand. The magic wand itself is free. The right-most reference is directly bound to the magic wand. The other two references are directly bound to the separating conjunction. See Figure 6 for a graphical depiction of the parse tree and the binding structure of references and separating connectives to separating connectives.

Figure 6. A parse tree showing the binding structure of separation logic. Direct bindings are shown with dotted lines pointing to a separating connective. Free references and free separating connectives are shown in red.
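Direct binding is simply ‘nearest enclosing separating connective’, which can be computed by one traversal of the parse tree. The Python sketch below uses a hypothetical tree with the same shape as described above (four references, one separating conjunction nested in the left operand of a magic wand); node labels and names such as `r1` are placeholders, not the actual formula.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                       # "ref", "star", "wand", "forall", ...
    children: list = field(default_factory=list)
    name: str = ""                   # hypothetical tag to tell nodes apart

SEPARATING = {"star", "wand"}

def direct_bindings(node, binder=None, out=None):
    """Map each reference and separating connective to the separating
    connective it is directly bound to (None means it is free)."""
    if out is None:
        out = {}
    if node.label == "ref" or node.label in SEPARATING:
        out[node.name] = binder.name if binder is not None else None
    # Only separating connectives change the binder; logical ones do not.
    inner = node if node.label in SEPARATING else binder
    for child in node.children:
        direct_bindings(child, inner, out)
    return out

tree = Node("forall", [
    Node("implies", [
        Node("not", [Node("ref", name="r1")]),
        Node("wand", name="wand", children=[
            Node("star", name="star", children=[
                Node("ref", name="r2"),
                Node("ref", name="r3"),
            ]),
            Node("ref", name="r4"),
        ]),
    ]),
])

print(direct_bindings(tree))
```

In this tree the left-most reference and the magic wand are free, the two references under the separating conjunction are directly bound to it, and the right-most reference is directly bound to the magic wand.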

There is a difference with the variable binding structure of first-order logic: if a variable is bound to a quantifier, then it no longer necessarily has a relationship with the free variables of the same name. Quantifiers thus introduce a so-called scope for each variable. This is different for separation logic: although a reference can be directly bound to a separating connective, there still can be a necessary relationship with references that occur outside the connective to which it is bound. For example, in Figure 6 we have that both the free reference and the magic wand speak about the same heap (the ‘outer’ heap), but also the right-most reference under the magic wand speaks about (part of) that outer heap: namely, for every we also have due to the equivalence on the right-hand side of the magic wand.

The moral is that separation logic has ‘leaky scopes’. But it is also possible to define constructions in separation logic that have proper scopes. For example, the formula has the intuitive meaning that holds for all heaps (its formal definition is given in the next section). It thus acts as a universal quantifier for heaps. And we can also define as the dual , that acts as an existential quantifier for heaps. Just like quantifiers in first-order logic, we have that and introduce a proper scope of the ‘points to’ construct inside , which is different from the ‘points to’ construct outside.

The formula is a so-called heap independent formula. A heap independent formula is a formula for which its truth value does not depend on the ‘current’ heap in which it is evaluated. For example, the pure fragment of separation logic, comprising no separation symbols, is heap independent. But also is heap independent, even when it contains ‘points to’ constructs and separating connectives in . All references and connectives under are bound and the scope is closed: no ‘leaky scope’ for the black box.

6 Univalence, well-foundedness and finiteness

We now introduce the modality as the abbreviation We also have the dual defined as . We have that both modalities have the same binding strength as classical negation. The intuitive reading of the modal operators is that holds in a given ‘current’ heap whenever holds for all heaps (including the current heap), and holds in a given current heap whenever holds in some heap (which may be different from the current heap). As such, these modal operators change the heap with respect to which a formula is evaluated.

In fact, we have the following valid formulas involving these modalities:

These formulas can all be proven in our novel proof system (their proofs are great exercises for the reader). We also have that the rule of necessitation is admissible, in the sense that implies , but whether this rule is also effectively derivable is not known to us.

We now consider an example of using the black box modality . In our treatment of separation logic, we do not necessarily impose so-called ‘functionality’ or ‘univalence’ of the heap. This means that it is possible that is true in some situation. We thus treat as a relation symbol. One intuitive way to interpret the ‘points to’ relation would be from object-oriented programming, where the object has some reference to the object by one of its fields, but we abstract away through which field object references object . It is not difficult to obtain univalence by restricting ourselves to those situations where there is at most one value, by means of the property: That all heaps are univalent can be simply expressed by:

We also have the following modality , introduced as the abbreviation Also the dual is defined as . The intuitive reading of these modal operators is different, in the sense that holds in a given heap whenever holds for all subheaps of the given heap. Similarly, holds in a given heap whenever holds for some subheap of the given heap.

An example of the modality is the following. We say that a value is reachable if there is a location which refers to it, so . Conversely, a location is allocated whenever it refers to a value, so . Consider that allocated locations can also be used as values, so we can have an allocated and reachable location. This way, we can form chains of so-called traversals: which abbreviates the conjunction of for . Whenever is not allocated, the traversal has reached a dead-end. However, whenever in a traversal the first and last location are the same, we have a cycle: it is then possible to keep on traversing the heap indefinitely.

We say that a heap is well-founded whenever for every non-empty subheap there is some allocated but unreachable location. This is expressed formally as: The claim is now that there are no cycles in a well-founded heap. To see why, suppose towards a contradiction we have a well-founded heap (in which the above formula is true) in which there exists a cycle Then take the subheap which consists precisely of the locations , that is, we ignore all the locations not visited as part of the cycle. This subheap is non-empty. But we cannot take any as witness, since every location is reachable! This is a contradiction.
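The argument can be replayed on small heaps with a brute-force check. The Python sketch below assumes heaps are finite sets of (location, value) pairs and that a subheap is the restriction of the heap to a subset of its allocated locations; both are assumptions made for illustration, and the enumeration is exponential, so it is only meant for tiny examples.

```python
from itertools import combinations

def domain(heap):
    """Allocated locations."""
    return {loc for (loc, _) in heap}

def values(heap):
    """Reachable values: those some location refers to."""
    return {val for (_, val) in heap}

def subheaps(heap):
    """Restrictions of the heap to each subset of its allocated locations."""
    locs = sorted(domain(heap))
    for k in range(len(locs) + 1):
        for keep in combinations(locs, k):
            yield {(l, v) for (l, v) in heap if l in keep}

def well_founded(heap):
    """Every non-empty subheap has an allocated but unreachable location."""
    return all(domain(sub) - values(sub) for sub in subheaps(heap) if sub)

chain = {(1, 2), (2, 3)}
cycle = {(1, 2), (2, 1)}
chain_into_cycle = {(0, 1), (1, 2), (2, 1)}

print(well_founded(chain))
print(well_founded(cycle))
print(well_founded(chain_into_cycle))
```

For `chain_into_cycle` the failing witness is exactly the subheap from the argument above: restricting to the locations on the cycle leaves no allocated location that is unreachable.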

When speaking of modal operators, it is useful to speak of the ‘current’ heap (with respect to which any formula in separation logic is evaluated), the ‘outer’ heap (which is the heap with respect to which an enclosing formula is evaluated) and the ‘inner’ heap (which is the current heap while evaluating a subformula). This terminology is also useful when speaking about the separating connective , where we would speak of the ‘outer’ heap with respect to which the entire formula is evaluated, and two ‘inner’ heaps corresponding to the evaluation of and .

The point of the discussion above is that we can now understand more clearly what happens with the -connective. Suppose now that is pure, so it does not have any (free) references. Then we have that and the formula are equivalent. (We discuss this and related formulas in more detail below.) Clearly, this is a heap independent formula, due to the black box! However, when is not pure, the formula is not heap independent. In the -connective, the crux is that the ‘points to’ symbol in is relevant and its meaning depends on the ‘outer’ heap, whereas the ‘points to’ symbol in is intentionally captured by the -connective where its denotation is described by . The -connective thus changes what is the ‘current’ heap when evaluating . This is similar to what the modal operator does, in which also we have an ‘inner’ and ‘outer’ heap, but where the former is a subheap of the latter heap. In the -connective the ‘inner’ heap is described by , which may depend on the ‘outer’ heap when it is not a heap independent formula.

Existence of the empty heap, where nothing is allocated, is expressed by: But what about the opposite, the existence of a heap in which every location is allocated? Could the formula be true? Or what about the existence of a heap in which every value is reachable? Could the formula be true? No, in the standard interpretation of separation logic, based on the integers, these formulas are false because heaps are finitely-based partial functions!

Suppose we work with the standard integers , and we have in our signature the usual arithmetical symbols. If we want to ensure we only deal with finite, univalent heaps, then we should take the following formulas as axioms: The first axiom expresses univalence. The second axiom expresses boundedness, that is, for every heap there is a bound on the domain, that is, there is a maximum and minimum location. Every finitely-based partial function satisfies these properties (a finitely-based partial function can be seen as a finite list of location-value associations, and the maximum and minimum can be computed). Conversely, every heap that satisfies both axioms can be represented by a finitely-based partial function: there are only finitely many locations between the minimum and maximum location (due to boundedness) that can be assigned at most one value (due to univalence).
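The correspondence between univalent, bounded heaps and finitely-based partial functions can be sketched in Python. The sketch assumes a heap is given as a finite set of (location, value) pairs over the integers, and the function names are hypothetical.

```python
def is_univalent(heap):
    """At most one value per location."""
    locs = [loc for (loc, _) in heap]
    return len(locs) == len(set(locs))

def bounds(heap):
    """Minimum and maximum allocated location of a non-empty finite heap."""
    locs = [loc for (loc, _) in heap]
    return min(locs), max(locs)

def as_partial_function(heap):
    """Represent a univalent heap as a dict from locations to values."""
    if not is_univalent(heap):
        raise ValueError("heap assigns several values to one location")
    return dict(heap)

heap = {(3, 7), (-2, 5)}
print(is_univalent(heap))
print(bounds(heap))
print(as_partial_function(heap))
```

A relation such as `{(1, 2), (1, 3)}` is rejected by `as_partial_function`, since location 1 would need two values.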

Note that in the standard interpretation of separation logic on the integers, we never treat the heap as a total map, where every location must have a value. It thus always remains a possibility for a location to be unallocated, i.e. the location is unallocated in a situation whenever holds—which expresses that there is no value to which points. In non-standard interpretations of separation logic, we do have the possibility of an infinite heap.

7 Conclusion

The proof system we introduce makes use of a new -connective which allows us to interpret the points-to relation in terms of a logical description. It bears some relation to hybrid logic [1], which features so-called nominals and satisfaction operators. Temporally, the nominals describe when is ‘now’, and the satisfaction operator allows us to evaluate a formula with respect to a given nominal, thereby changing when is ‘now’. As such, hybrid logic allows us to express more than modal logic: an example is “At 6 o’clock, the church bells ring six times.” This sentence is more time-specific than the usual modal operators for expressing ‘always’ or ‘sometimes’. Comparing with the -connective, we see that it is an even more general notion than what a satisfaction operator provides, since we introduce it as a connective between formulas. This means that formulas can now also take the place of the nominals in hybrid logic, and this allows us to describe a situation, that is, the ‘current’ heap, by means of a formula.

An important result is that our new proof system allows us to show many more equivalences than existing proof systems for separation logic. Thus we go beyond the capability of many existing tools for (automatic or interactive) reasoning about separation logic! It is quite surprising that none of the existing tools can verify some of our particular equivalences. We think this is due to the abstract description of the separating connectives in terms of cancellative partially commutative monoids (cf. separation algebras [2]). Combining this abstract description with a set-theoretical interpretation of the points-to relation is problematic. This seems to suggest we should start developing new kinds of tools for automatic or interactive reasoning about separation logic, or adjust the existing tools to be able to work around current limitations.

The presented proof system is sound and complete. This will be elaborated upon in following blog posts. In part two we study standard and non-standard interpretations of separation logic, and give the main argument of relative completeness of the novel proof system. Relative completeness is a completeness argument relative to an oracle. This approach is necessary since absolute completeness for standard separation logic is not possible due to failure of compactness. Other topics that we will discuss in this series of articles concern the impact on Reynolds’ program logic [9], expressivity of separation logic and separation logic as an intermediate logic between first-order logic and second-order logic, and intuitionistic separation logic.

Acknowledgments

Cover image credit: Ulysses and the Sirens, John William Waterhouse (Google Art Project)

References

[1]

Torben Braüner. Hybrid logic and its proof-theory. Springer. https://doi.org/10.1007/978-94-007-0002-4

[2]

Cristiano Calcagno, Peter W. O’Hearn, and Hongseok Yang. Local action and abstract separation logic. In 22nd annual IEEE symposium on logic in computer science (LICS 2007), 2007. IEEE, pages 366–378. https://doi.org/10.1109/LICS.2007.30

[3]

Stéphane Demri, Étienne Lozes, and Alessio Mansutti. A complete axiomatisation for quantifier-free separation logic. Log. Methods Comput. Sci. volume 17, issue 3 (2021). https://doi.org/10.46298/lmcs-17(3:17)2021

[4]

David Herman and Mitchell Wand. A theory of hygienic macros. In Programming languages and systems: 17th european symposium on programming, 2008. Springer, pages 48–62. https://doi.org/10.1007/978-3-540-78739-6_4

[5]

Hans-Dieter A. Hiep. New foundations for separation logic. PhD thesis. Leiden University. Retrieved from https://hdl.handle.net/1887/3754463

[6]

Ralf Jung, Robbert Krebbers, Jacques-Henri Jourdan, Aleš Bizjak, Lars Birkedal, and Derek Dreyer. Iris from the ground up: A modular foundation for higher-order concurrent separation logic. Journal of Functional Programming volume 28, (2018). https://doi.org/10.1017/S0956796818000151

[7]

Willard Van Orman Quine. Word and object. MIT press. https://doi.org/10.7551/mitpress/9636.001.0001

[8]

Andrew Reynolds, Radu Iosif, Cristina Serban, and Tim King. A decision procedure for separation logic in SMT. In International symposium on automated technology for verification and analysis, 2016. Springer, pages 244–261. https://doi.org/10.1007/978-3-319-46520-3_16

[9]

J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In Proceedings 17th annual IEEE symposium on logic in computer science, 2002. page 55–74. https://doi.org/10.1109/LICS.2002.1029817

Footnotes

What is called a ‘free reference’ here comes from Quine’s ‘purely referential position’. But we already use the word ‘pure’ in a different sense, namely that any formula that involves the ‘points to’ construct itself is not pure. Hence we use the term ‘free reference’ instead.

Citation

BibTeX citation:

@article{hiep2024,
  author = {Hiep, Hans-Dieter A. and de Boer, Frank S.},
  title = {A Sound and Complete Proof System for Separation Logic (Part
    1)},
  journal = {dr. heap},
  volume = {2},
  number = {2},
  date = {2024-06-29},
  url = {https://www.drheap.nl/articles/2024/sound-and-complete-proof-system-for-separation-logic-part-1/},
  doi = {10.59350/2gkd1-c0k49},
  issn = {3050-5224},
  langid = {en}
}

For attribution, please cite this work as:

Hans-Dieter A. Hiep and Frank S. de Boer. A sound and complete proof system for separation logic (part 1). dr. heap volume 2, issue 2 (June 2024). https://doi.org/10.59350/2gkd1-c0k49

On invariance and inconsistency

Hans-Dieter A. Hiep — Thu, 15 Feb 2024 23:00:00 GMT

1 Introduction

How can you explain important concepts from program correctness in a simple and intuitive manner? In this blog post, we shall have a look at some puzzles and analyze them from the perspective of program correctness. This way we can nicely explain and demonstrate the usefulness of two important concepts, namely invariants and inconsistent specifications.

The puzzles we study here come from the book Algorithmic Puzzles [1] by Anany and Maria Levitin, published by Oxford University Press in 2011. This book presents 150 puzzles that are good candidates for applying analytical and logical thinking skills (the puzzles can also be used as challenging interview questions). We make a small selection of the puzzles, and we will see them answered from the perspective of program correctness. In program correctness, we consider a program to be correct with respect to a given program specification. A program specification is a specific formulation of a requirement. An example is a specification of what the output of a program must be, given some input. More specifically (no pun intended), we can rephrase the puzzles in such a way that a puzzle can be seen as a program specification, and proving that there exists a program that is correct with respect to that specification would then solve the puzzle in question. Or, alternatively, we show that there is no solution to the puzzle, by arguing there cannot be a correct program in the first place.

First we briefly revisit the preliminaries (Section 2). This article assumes the reader is already somewhat familiar with the basics of programming and program correctness, but we nevertheless quickly revisit the basic concepts. For a thorough introduction to program correctness, one could take a look at one of the following books (in order of appearance):

A Discipline of Programming by Edsger Dijkstra (1976),
Mathematical Theory of Program Correctness by Jaco de Bakker (1980),
The Science of Programming by David Gries (1981),
Program Verification by Nissim Francez (1992), or
Verification of Sequential and Concurrent Programs by Krzysztof Apt, Frank de Boer & Ernst-Rüdiger Olderog (2009).

Then we shed light on the concept of an invariant by discussing the 5th puzzle of the book, ‘Row and Column Exchanges’ (Section 3). We also look at why declarative specifications are useful by discussing the 12th puzzle of the book, ‘Questionable Tiling’ (Section 4). But we also discuss more generally the importance of invariants and formulating consistent specifications (Section 5).

2 Preliminaries

We shall restrict our attention to a simple imperative programming language: where we use not only as variable but also (possibly with subscripts), where the terms of the language are the usual arithmetical expressions: and where the terms of the language are the Boolean expressions: We also have the usual abbreviations, such as , that abbreviate more complex expressions, such as , respectively. The numerals are also abbreviations of complex expressions

We also have first-order formulas, captured by the following syntax: Other logical connectives, such as and , can be seen as abbreviations. First-order logic involves first-order universal quantification , and we have the dual of first-order existential quantification as abbreviation of . Quantification only ranges over individuals, so in our case integers.

Now let us consider semantics. Let be a state (an assignment of variables to integer values). We have the usual semantics for arithmetical expressions and Boolean expressions : denotes an integer value and denotes a Boolean value. Note that an expression depends only on finitely many variables, and we only deal with pure expressions in our simple language. Each statement of our programming language denotes a transition relation of states: where is the set of states (with typical element ), and is the set of pairs of states. A statement denotes a binary relation between initial and final states. Each formula denotes a set of states: in the sense that in each state the formula is true, also written .

In program correctness we combine two languages: a programming language and a specification language. The programming language is already given above. As specification language we take the above first-order language. Note that the variables of formulas in the assertion language are the same variables we use in the programming language. We can now form the Hoare triples: where is called the precondition and is called the postcondition. A Hoare triple is correct when the statement satisfies the input/output specification given by the precondition and the postcondition , and a Hoare triple is incorrect otherwise. Note that the (global) variables of and the (free) variables of the formulas and are bound to each other. Formally, we define where is the left-restriction of the binary relation by the set , that is, . Unpacking this formal definition gives us

Incorrectness means that has a bug. Suppose we start in some initial state which satisfies the precondition , and we execute from that state, and that execution results in some final state . If the final state does not satisfy , then we have found a bug! Formally,
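The definitions of correctness and incorrectness above can be tried out on a small scale. The following Python sketch is only an illustration under stated assumptions: a state is a dictionary, a statement is a function from an initial state to a list of final states (its transition relation), and the precondition is only checked over a small finite range of initial values, so a `True` answer is a bounded check and not a proof. The names `assign`, `seq`, `find_bug` and `is_correct` are hypothetical and do not come from the article.

```python
from itertools import product

VARS = ("x", "y")        # hypothetical program variables
DOMAIN = range(-3, 4)    # assumption: only these initial values are explored


def initial_states():
    """Enumerate every state (a dict from variables to integers) over DOMAIN."""
    return [dict(zip(VARS, vals)) for vals in product(DOMAIN, repeat=len(VARS))]


def assign(var, expr):
    """The statement `var := expr`, as a function from a state to its final states."""
    def run(state):
        final = dict(state)
        final[var] = expr(state)
        return [final]
    return run


def seq(first, second):
    """Sequential composition: relational composition of two statements."""
    def run(state):
        return [t for mid in first(state) for t in second(mid)]
    return run


def find_bug(pre, stmt, post):
    """Return a pair (initial, final) that witnesses incorrectness, or None."""
    for s in initial_states():
        if pre(s):
            for t in stmt(s):
                if not post(t):
                    return (s, t)
    return None


def is_correct(pre, stmt, post):
    """A triple is (boundedly) correct when no bug can be found."""
    return find_bug(pre, stmt, post) is None


increment = assign("x", lambda s: s["x"] + 1)
nonneg = lambda s: s["x"] >= 0

print(is_correct(nonneg, increment, lambda s: s["x"] > 0))  # True
print(find_bug(nonneg, increment, lambda s: s["x"] > 1))    # a state with x = 0 leads to x = 1
```

The function `find_bug` mirrors the informal description of a bug: an initial state that satisfies the precondition, together with a final state of the execution that violates the postcondition.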

Hoare logic is a formal system in which Hoare triples can be derived, in which case one writes . Hoare logic is sound and (relatively) complete, meaning that we have under some reasonable assumptions.¹ See one of the books mentioned in the introduction for a presentation of Hoare logic, or Wikipedia.

A quick example is the following Hoare triple. Is it correct or not? To verify the loop, we need to come up with a so-called loop invariant: a condition that holds at four control points, namely (1) before entering the loop, (2) before the loop body begins, (3) after the loop body ends, and (4) after the loop is exited. Finding loop invariants is difficult, and often it takes multiple tries until one finds a suitable invariant. In the above example, one can take: where abbreviates and the chain of inequalities is conjunctive.

3 Invariants

In this section we discuss a puzzle in which invariants play a prominent role. The 5th puzzle of the book Algorithmic Puzzles is ‘Row and Column Exchanges’:

Can you transform the left table into the right table of Figure 1 by exchanging its rows and columns?

Figure 1: Initial and final table.

(It is recommended that the reader first tries out solving this puzzle herself!)

To get a sense of what the puzzle asks for, let us perform the operations of swapping rows and columns in a table. An example of a sequence of successive applications of these operations is shown in Figure 2.

Figure 2: The start of a sequence of exchanges.

This figure shows:

The first table shows the initial table of Figure 1, our starting point in this puzzle.
After the first step, we have exchanged the first and last row. So we swapped the values and .
After the second step, we have also exchanged the first and last column. So we swapped the values and .

Notice that we have now obtained a table in which the last row coincides with the values of the final table we wish to obtain (the last row is ‘correct’ with respect to the desired final table). To get closer to the final table, we can continue the series of operations as in Figure 3, where we perform two additional steps: we swap the first and third row, and we swap the second and third row.

Figure 3: Continuing the series of exchanges.

In the resulting table, we have colored the cells that have values in the right place when comparing it to the final table in Figure 1. This particular example shows that we are not there yet. A Rust implementation of this example is linked from the original article.

Just giving this single example, where we have not solved the puzzle (since the final table is not ‘correct’), is not a solution to the puzzle! But one may wonder whether there exists a solution at all. If there exists a solution, then we have not yet found it. But if there is no solution to be found, then just showing this single failed attempt is not sufficient proof.

Imagine that these tables are representations of state, where the state is an assignment of integers to variables (each cell in the table is modeled by its own variable, sixteen in total). There are two primitive operations that work on this state:

to swap two columns , and
to swap two rows .

The puzzle can be rephrased by asking whether we can come up with a program that is composed out of these two primitive operations. Instead of our simple programming language given above, where the only primitive operation is the assignment , we instead consider the programming language with only these two primitive operations. In this way we realize encapsulation, in the sense that the program may not directly modify the state by means of an assignment, only indirectly through the exposed operations.

This may remind the reader of object-oriented programming. Each table could be seen as an instance of a class of objects, which has an encapsulated internal state. The class of objects exposes a number of operations, viz. it has a well-defined interface. We ask ourselves now: does there exist a client, which can only work with the interface and not directly modify the internal state, that solves our puzzle?

What does it mean to solve the puzzle? We can formulate the Hoare triple where until are the sixteen variables corresponding to the cells of the table.² Note that in the postcondition we simply require the variables to have the proper values, as indicated in Figure 1. If we can find a program that is composed of only these primitive operations, and prove it correct, we have solved the puzzle!

To understand the meaning of the primitive operations, we give a set of Hoare triples that we take as axioms (technically, we give an axiom scheme). This approach is also known as the ‘axiomatic approach’, where we abstract from the exact semantics of the primitive operations. Here we go (assuming meta-variables and ): The ‘freeze variables’ capture the old values at column , and capture old values at column . In the postcondition, we use the (unchanged) freeze variables to refer to the old values at the beginning of the swapping operation. This argument crucially relies on the fact that the operation only changes the variables in the set . By Hoare’s invariance rule, we know that any property about the other variables thus remains invariant. A similar axiom scheme can be given for swapping the rows.

We could think of an object invariant: a property that holds of the internal state of the object, that must be preserved by every operation that is performed by any client. Note that object invariants may be temporarily broken in the implementation of an operation, as long as the object invariant is restored before the implementation terminates.

The beauty of invariants is that they are a powerful tool for answering these kinds of puzzle questions. When we are able to find some invariant that is true for the initial table but false for the final table, then we know: the final table cannot be obtained by means of these operations only, since all the operations preserve the object invariant!

An example of an object invariant in this case would be the property: the table has the values . In other words, every value in the table is in and every value of is somewhere in the table. Let’s formalize it (Equation 1): The set comprehension on the left collects all values in the table in a set. The set expression on the right is the finite set consisting of the integers up to and including . The property now expresses that these two sets are identical, i.e. have precisely the same members. This property holds for the initial state of the object, and it also is preserved by every operation: swapping two rows, or swapping two columns, does not introduce any new values and thus does not invalidate this property. Hence, this property is an object invariant.

The final table of Figure 1 also satisfies the object invariant of Equation 1. So this invariant, while nice to know, is not useful in answering the puzzle question. We can only prove that there is no solution to the puzzle when we find an invariant, that holds of the initial state and is preserved by the operations, but does not hold in the final state.

Just as finding loop invariants (to show the correctness of a program) is a difficult problem, finding object invariants (to show there can be no correct program) is also a difficult problem. Finding invariants may require several tries. Let us try another invariant. Consider that we not only have a set of values, but in fact we have a set of sets of values: The outer set consists of the sets corresponding to the values one finds at each row. And the inner sets consist of the values present at each row. If we swap two rows, the invariant is preserved because the outer set does not care about the order of its values (sets of integers). If we swap two columns, then the invariant is preserved, because the set of values at each row remains the same when we have swapped two columns.

Now, looking at Figure 1 we see that the initial table satisfies this property. However, if we look at the final table we see that it does not satisfy this property. The final table has as set of sets of integers: Sure, the first and last row are correct, so we could focus on comparing the sets which cannot be equal because both sets contain values that are not contained in the other set. Hence the final table does not satisfy the invariant, which finally proves that there is no solution! (We shall further discuss this problem in Section 5.)
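The invariant argument can also be checked mechanically. The Python sketch below makes two loud assumptions, because Figure 1 is not reproduced in the text: the initial table is taken to hold the values 1 to 16 in row-major order, and `TARGET` is a hypothetical final table chosen to be consistent with the description above (its first and last rows are correct up to order, while the two middle rows have exchanged one value). The function names are illustrative only. Since the two operations generate only finitely many tables, the sketch also enumerates all of them.

```python
from itertools import combinations

# ASSUMPTION: Figure 1 is not available in the text; both tables are illustrative.
INITIAL = ((1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16))
TARGET = ((12, 10, 11, 9), (16, 14, 5, 13), (8, 6, 7, 15), (4, 2, 3, 1))


def swap_rows(table, a, b):
    """Primitive operation: exchange rows a and b."""
    rows = list(table)
    rows[a], rows[b] = rows[b], rows[a]
    return tuple(rows)


def swap_cols(table, a, b):
    """Primitive operation: exchange columns a and b."""
    result = []
    for row in table:
        cells = list(row)
        cells[a], cells[b] = cells[b], cells[a]
        result.append(tuple(cells))
    return tuple(result)


def value_set(table):
    """First candidate invariant: the set of all values in the table."""
    return frozenset(v for row in table for v in row)


def row_sets(table):
    """Second candidate invariant: the set of the sets of values of each row."""
    return frozenset(frozenset(row) for row in table)


def reachable(start):
    """All tables obtainable from `start` by row and column exchanges."""
    seen = {start}
    frontier = [start]
    while frontier:
        table = frontier.pop()
        for a, b in combinations(range(4), 2):
            for nxt in (swap_rows(table, a, b), swap_cols(table, a, b)):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen


tables = reachable(INITIAL)
print(len(tables))                                              # 576
print(all(row_sets(t) == row_sets(INITIAL) for t in tables))    # True
print(value_set(TARGET) == value_set(INITIAL))                  # True
print(row_sets(TARGET) == row_sets(INITIAL))                    # False
```

Under these assumptions, the first invariant cannot distinguish the two tables, whereas the second one can. The exhaustive enumeration agrees with the invariant argument, but only the invariant argument explains why the target is out of reach.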

4 Logical specifications

We have a look at the 12th puzzle of the Algorithmic Puzzles book, ‘Questionable Tiling’ (with a slightly different phrasing):

Is it possible to tile an 8-by-8 board with dominoes (2-by-1 tiles) such that no two dominoes lie next to each other in parallel?

(Again, the reader should first try to solve this puzzle herself!)

Before even beginning to solve the problem, we should first try to get an exact understanding of the puzzle by understanding each part of the question:

What is a ‘tiling of dominoes’ on an 8-by-8 board?
What does it mean when two dominoes ‘lie next to each other in parallel’?

Suppose we formalize the 8-by-8 board, again by means of a table. Each cell of the table is again understood to be represented by the variables where is the row counted from the top and is the column counted from the left. But what do the values of these variables mean? We could devise the following encoding:

If a variable has value 0 it means that the cell is empty.
If a variable has some positive value, then that positive value identifies a domino piece.

For example, see Figure 4 where we have a table that encodes an 8-by-8 board with only four dominoes. Note that in this and the following pictures, only the numbers in the cells are significant and not the colors. Colors are only a visual aid. Further, what is shown in Figure 4 is not a tiling yet: it is a partial tiling, a step towards a complete tiling.

Figure 4: An encoding of dominoes on an 8-by-8 board.

One fruitful approach would be looking for patterns. A pattern is, figuratively speaking, a small ‘frame’ or ‘scope’ that you locally could observe in the picture. These patterns are ‘timeless’ and observed of the outcome, and thus do not care about the intermediary state one has passed through to obtain the outcome. Finding patterns is a useful ability of a declarative programmer.

One can observe already the following properties:

Property 1.: (Number of dominoes in tiling)
In an 8-by-8 table, a complete tiling has exactly 32 numbers identifying domino pieces.
Property 2.: (Size of single domino)
Every number identifying a domino piece occurs at most twice.
Property 3.: (Dominoes line up)
Consider a cell of the table that contains a number identifying a domino, and consider its immediate neighborhood (the cells on the top, right, bottom, and left, but not the diagonal cells). We observe that the following must hold: one cell above, on the left, below, or on the right of the given cell exists and has the same domino identifying number. The other neighboring cells must have a different value. See also Figure 5 for a picture, but note that these patterns only work for interior cells. For cells on the border, the pattern need not check outside bounds.

Figure 5: Patterns to check that numbers line up. Here, is the number identifying a domino and we have , , .

Now, consider completing the tiling in Figure 4. What domino do we place on the left of the domino identified by number three (the blue one)? It will form a 2-by-2 square. We also form a 2-by-2 square if we would place another domino directly below and in parallel with the domino identified by number four (the purple one). These are undesired according to the puzzle.

We end up with the following property:

Property 4.: (No parallel dominoes)
In each 2-by-2 square there are not exactly two dominoes. See Figure 6 for the two forbidden patterns.

Figure 6: Two patterns that show how two dominoes form a 2-by-2 square.

We can now formalize the properties, and obtain a program specification.

Property 1.: (Number of dominoes in tiling) where is the set of domino identifying numbers that occur somewhere in the table, that is, .
Property 2.: (Size of single domino)
Property 3.: (Dominoes line up)
For every there is some such that where is the set of neighboring coordinates within bounds

Technically, we have that is defined to be equal to so we can use the coordinates to refer to a particular subscripted variable.

Property 4.: (No parallel dominoes)
For every we have

These properties can be abbreviated to , respectively. Now the puzzle amounts to finding a program that changes the variables such that we can prove Consider a program that assigns the cells’ values according to Figure 7. We can now verify whether the program indeed satisfies the specification, by checking whether all properties hold.

Figure 7: An encoding of dominoes on an 8-by-8 board (we use a duotrigesimal numbering system, the shifted ‘extended hex’ numbers).

holds because there are exactly 32 dominoes in the final state assigned to the variables. holds since every number identifying a domino piece occurs exactly twice. Also holds, and this can easily be seen by the different colors. However, checking shows that the property is violated (see the center).
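Checking the four properties on a concrete board is a mechanical task. The following Python sketch is one possible reading of the informal properties, not the article's formalization: a board is a list of rows of integers, zero stands for an empty cell (an assumption), and Property 4 is read as 'no completely filled 2-by-2 square contains exactly two distinct domino numbers'. The board of Figure 7 is not available in the text, so the example uses a hypothetical tiling with horizontal dominoes only.

```python
from collections import Counter


def domino_ids(board):
    """The set of domino identifying numbers that occur in the table."""
    return {v for row in board for v in row if v > 0}


def property1(board):
    """Number of dominoes: half the number of cells (32 on an 8-by-8 board)."""
    cells = len(board) * len(board[0])
    return len(domino_ids(board)) == cells // 2


def property2(board):
    """Size of a single domino: every identifying number occurs at most twice."""
    counts = Counter(v for row in board for v in row if v > 0)
    return all(c <= 2 for c in counts.values())


def neighbours(board, i, j):
    """Coordinates above, right, below and left of (i, j) that are within bounds."""
    for di, dj in ((-1, 0), (0, 1), (1, 0), (0, -1)):
        a, b = i + di, j + dj
        if 0 <= a < len(board) and 0 <= b < len(board[0]):
            yield a, b


def property3(board):
    """Dominoes line up: exactly one neighbour carries the same number."""
    for i, row in enumerate(board):
        for j, v in enumerate(row):
            if v > 0:
                same = sum(1 for a, b in neighbours(board, i, j) if board[a][b] == v)
                if same != 1:
                    return False
    return True


def property4(board):
    """No parallel dominoes: no filled 2-by-2 square holds exactly two dominoes."""
    for i in range(len(board) - 1):
        for j in range(len(board[0]) - 1):
            square = [board[i][j], board[i][j + 1], board[i + 1][j], board[i + 1][j + 1]]
            if all(v > 0 for v in square) and len(set(square)) == 2:
                return False
    return True


# Hypothetical example: an 8-by-8 board tiled with horizontal dominoes only.
horizontal = [[1 + 4 * i + j // 2 for j in range(8)] for i in range(8)]

print([check(horizontal) for check in (property1, property2, property3, property4)])
# [True, True, True, False]
```

As with the board of Figure 7, the first three properties hold for this example while the fourth is violated.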

If we slightly generalize the problem, then we see there is a solution. Suppose the board is infinite, i.e. we have an -by- board, which we start tiling from the top-left corner. Then the following brick laying pattern can be continued indefinitely (see Figure 8):

Start horizontally with the dominoes and lay the next on the right of the previous one until the entire first row is covered with dominoes.
Continue vertically with the dominoes and lay the next below the previous one until the entire first column is also covered with dominoes.
We are now in the same situation as before: we want to fill an -by- board, so we repeat the strategy of first laying horizontally and then laying vertically.

Figure 8: Laying dominoes on an -by- board.

Such an infinite board would then satisfy these properties:

The number of dominoes on the infinite board is also infinite.
If we make sure that each domino is represented by a different number, then each such number occurs only twice. For example, we could take the numbering scheme where for each domino that lies on the coordinates and we take as identifier .
The dominoes are placed correctly, as can be observed from the coloring.
There are no parallel dominoes, since each 2-by-2 square has exactly three dominoes.
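The claim about 2-by-2 squares can be tested on a finite window of the infinite board. The Python sketch below encodes the brick laying strategy described above under a few assumptions: rows and columns are counted from zero, and each domino is identified by the coordinates of its top-left cell (the hypothetical helper `anchor`), which is not the numbering scheme of the article. Cells on or above the diagonal belong to horizontal dominoes, and cells below the diagonal belong to vertical dominoes.

```python
def anchor(i, j):
    """Top-left cell of the domino covering cell (i, j) in the brick pattern."""
    if i <= j:
        # Row i is laid horizontally, starting at the diagonal cell (i, i).
        return (i, j - (j - i) % 2)
    # Column j is laid vertically, starting just below the diagonal, at (j + 1, j).
    return (i - (i - j - 1) % 2, j)


def dominoes_in_square(i, j):
    """The dominoes that touch the 2-by-2 square with top-left cell (i, j)."""
    return {anchor(a, b) for a in (i, i + 1) for b in (j, j + 1)}


def check_window(size):
    """Does every 2-by-2 square in a size-by-size window touch exactly three dominoes?"""
    return all(
        len(dominoes_in_square(i, j)) == 3
        for i in range(size)
        for j in range(size)
    )


print(check_window(30))  # True
```

Because every square touches three dominoes, none of them touches exactly two (the forbidden parallel pair) or four (the situation of Figure 9). A finite window is of course only evidence, not a proof, for the infinite board.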

Note that we avoided the occurrence of four dominoes within a 2-by-2 square, as shown in Figure 9.

Figure 9: Laying dominoes in such a way that there are four different dominoes in the middle 2-by-2 square.

More generally, we never have any of the patterns in Figure 10 occurring. These patterns are called the top-right corner and bottom-left corner. Note that these patterns do occur in Figure 9, so already from knowing the absence of these two corners we also know that there can be no four different dominoes within a 2-by-2 square.

Figure 10: Patterns that never occur in the infinite solution.

Now suppose we cut off the board of Figure 8 so as to obtain an 8-by-8 board. We then see problems occurring at the boundaries, with dominoes sticking out. Here are two instances:

On the first column, we see that the domino falls out of bounds. Hence the only way to lay down that domino is by turning it 90 degrees.
On the second row, we see that the domino also falls out of bounds. Also here we would need to lay down that domino turned by 90 degrees.

What we thus see is that whenever the board is finite, it must have one of the corners of Figure 10. We shall now argue that it is impossible to satisfy , the property that no dominoes are parallel, whenever we have the (necessary) top-left corner and also either the bottom-left or the top-right corner on the board. We make a number of simplifying assumptions, but these do not hurt our demonstration (that is to say, these assumptions are without loss of generality):

we assume we work on an arbitrary -by- board where is even,
we assume we start with the same type of top-left corner and top-right corner where the horizontal domino is on top,
we assume that both corners occur on the same height.

Figure 11: Starting situation of the impossibility result.

Now consider the situation of Figure 11. Consider that, if we were to satisfy all properties until , it is impossible to place a domino vertically next to nor is it possible to place a domino vertically next to . If we were to place a domino horizontally at the low end at (thus forming a bottom-left corner), then we need to place another domino on top that violates . Hence the only dominoes that are possible are depicted in Figure 12 (a). We end up with the other type of corner (where the vertical domino is on the side) and we can again analyze where to place the next domino in the corner next to and below , and next to and below . After analyzing the possibilities and ruling out those that violate we end up with the situation depicted in Figure 12 (b).

(a) Second situation

(b) Third situation

Figure 12: Stepping towards the impossibility result.

After continuing this way, we see that we construct two ‘lines’, one originating from each corner. It is necessarily the case that these two lines will intersect!

Figure 13: Last part of the impossibility result.

In Figure 13 we see the two lines coming diagonally out of the top-left corner and the top-right corner intersect. The way this plays out is as follows: we start with the corner consisting of dominoes (the top-left corner) and dominoes (the top-right corner). Then we necessarily place domino , but this takes the same place as we would take when we would place a domino in the other corner. We now have two corners, but they share a domino, namely the dominoes (the top-left corner) and the dominoes (the top-right corner). We then place and in the only way possible inside these corners, but we see that this gives us a parallel pair of dominoes in a 2-by-2 square.

Summarizing, the argument goes as follows. If there are two corners on the board that induce ‘diagonal lines’ that intersect, this must give rise to a pair of parallel dominoes. Hence we cannot have both a top-left corner and a top-right corner on the board. However, for every -by- board tiling it is necessary to have both a top-left corner and a top-right corner. Hence we cannot have a tiling of the -by- board that also has no parallel dominoes.

5 Conclusion

We have now seen two example puzzles, which we phrased by means of asking whether we can come up with a program that satisfies certain requirements. In the first example (Section 3) we have seen that the program’s requirements can (1) be stated formally, and (2) a final state was imaginable that satisfies the end goal, but (3) there was no correct program that reaches the final state. In the second example (Section 4) we have seen that the requirements themselves can (1) be stated formally, but already that (2) a final state was not imaginable that satisfies the end goal. If there is no final state that satisfies the requirements, it is impossible to write a correct program. This must be a valid conclusion, since each program only moves from state to state, and there does not even exist a state that satisfies the requirements.

All this serves to show that program correctness is a difficult subject. It shows that sometimes it is ‘easy to ask’ but ‘difficult to deliver’. Extensive analysis of a problem is required to obtain (1) a formal description of the problem, and (2) proof that the requirements are consistent. Even before one starts writing a program, one already has to face an undecidable problem: namely, to check that the requirements are consistent! And we have seen a concrete example that this is not always the case, even when the problem looks simple. If we then have requirements that are satisfiable, we face the second difficult problem: does there exist a correct program? We have seen that, no, this is not obvious either. To show that there does not exist a correct program, we need to formulate an invariant that any program preserves but which the final state violates. On the other hand, whenever there exists a non-trivial program (i.e. involving a loop) we also face a difficult problem: to prove it correct requires us to come up with an invariant as well.

This finally gives us two slogans:

To show a program is correct requires one to find an invariant.
To show there is no correct program also requires one to find an invariant.

and

Correctness is impossible to attain if the requirements are inconsistent.

Bonus questions.

Can you analyze the problem of swapping rows and columns also in the context of concurrent client programs?
What about tiling the board without parallel dominoes when the boundaries are glued to each other in weird (non-Euclidean) ways?

Acknowledgments

Cover image credit: Thomas Couillard

References

[1]

Anany Levitin and Maria Levitin. Algorithmic Puzzles. Oxford University Press. https://doi.org/10.1093/oso/9780199740444.001.0001

Footnotes

¹ Access to an oracle that provides the valid formulas in arithmetic (this is an undecidable problem), and the expressivity of loop invariants.
² Technically, the ‘quantifiers’ in the precondition are not first-order quantifiers but instead abbreviations where and are meta-variables that range over finitely many constant values: thus the formula is a big conjunction with sixteen clauses where each clause specifies the value of precisely one variable.

Citation

BibTeX citation:

@article{hiep2024,
  author = {Hiep, Hans-Dieter A.},
  title = {On Invariance and Inconsistency},
  journal = {dr. heap},
  volume = {2},
  number = {1},
  date = {2024-02-16},
  url = {https://www.drheap.nl/articles/2024/on-invariance-and-inconsistency/},
  doi = {10.59350/2r39a-f7740},
  issn = {3050-5224},
  langid = {en}
}

For attribution, please cite this work as:

Hans-Dieter A. Hiep. On invariance and inconsistency. dr. heap volume 2, issue 1 (February 2024). https://doi.org/10.59350/2r39a-f7740

Can the Collatz conjecture be proven, or not?

Hans-Dieter A. Hiep — Tue, 31 Oct 2023 23:00:00 GMT

1 Introduction

In 1937, shortly after the mathematician Lothar Collatz obtained his doctorate, he wrote down a problem in his notebook that later became known as Collatz’ problem or the $3n+1$-problem. The problem is remarkable since it is easy to state, but for more than eighty years no solution has been found.

Before the problem can be stated, we first need two ingredients: we define a function, and we recall what repeated function application is. The function $f$ is defined on the positive natural numbers ($1$, $2$, $3$, et cetera) with the following specification: $$f(n) = \begin{cases} 3n+1 & \text{if } n \text{ is odd,} \\ n/2 & \text{if } n \text{ is even.} \end{cases}$$ Now take an arbitrary positive natural number $n$. We can repeatedly apply the function $f$ starting with $n$, i.e. $f(n)$, $f(f(n))$, $f(f(f(n)))$, et cetera. We write $f^i(n)$ to mean that the function $f$ is applied $i$ times starting with $n$, so that we have $f^1(n) = f(n)$ and $f^{i+1}(n) = f(f^i(n))$. The superscript notation, an operation on a function, should not be confused with exponentiation, an operation on a number. With this in mind, we can state the problem: for every positive natural number $n$, is there an $i$ such that $f^i(n) = 1$?

We can first try out a number of examples, to gain some intuition about the problem. If we take $n = 1$, can we find an $i$? Clearly, $f(1)$ is $4$, since $1$ is odd and so the first clause of the definition of $f$ applies, and $3 \cdot 1 + 1$ is $4$. Then we evaluate $f(4)$, which is $2$: since $4$ is even the second clause applies so we have $4/2 = 2$. Finally, we evaluate $f(2)$, which is $1$: since $2$ is even we have $2/2 = 1$. So $f^3(1) = 1$, hence we can take $i$ to be three. For $n = 1$ the problem can be solved.

The process we tried is called iteration. We find $i$ by starting with the smallest value and trying successively larger values for $i$ until we reach our desired destination. This process works efficiently, since the computations we did in the past to find out whether $i$ was a solution can be reused to find out whether $i+1$ is a solution. In the case of Collatz’ problem, we see that the outcome of the previous try, i.e. $f^i(n)$, is what we feed back as input in the next try, i.e. $f^{i+1}(n)$ computes $f(f^i(n))$.

Such iterative processes can be written down neatly. Say we start with . How often do we need to apply the function until we reach the destination ? We write the following sequence, where each number is separated by an arrow: the computation sequence starts with and we get after applying once (), we get after applying again (so twice from our starting point), and so on, until we reach our destination after applying five times from where we started. In other words, we have . So for , we know there exists an , namely , such that . But meanwhile we also discovered for , , and a solution, since these were the intermediary numbers that turned up in the computation starting in .
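Such computation sequences are easy to generate by machine. The Python sketch below assumes the usual definition of the Collatz function (three times the argument plus one for odd arguments, half the argument for even arguments) and applies it at least once, stopping as soon as the value one is reached. The names `f` and `trajectory`, and the safety bound `max_steps`, are choices made for this illustration.

```python
def f(n):
    """The Collatz function on positive natural numbers."""
    if n % 2 == 1:
        return 3 * n + 1   # first clause: n is odd
    return n // 2          # second clause: n is even


def trajectory(n, max_steps=10_000):
    """Apply f repeatedly (at least once) until reaching 1; return the sequence."""
    sequence = [n]
    while True:
        sequence.append(f(sequence[-1]))
        if sequence[-1] == 1:
            return sequence
        if len(sequence) > max_steps:
            # Nobody knows whether this can happen for large enough bounds.
            raise RuntimeError("gave up: destination not reached")


print(" -> ".join(str(k) for k in trajectory(5)))  # 5 -> 16 -> 8 -> 4 -> 2 -> 1
print(len(trajectory(5)) - 1)                      # 5 applications of f
```

Each new element is computed from the previous one only, which is exactly the reuse of earlier work that makes iteration efficient. Note that the bound `max_steps` is needed precisely because termination of this loop for every starting value is the open problem.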

As another example, take . We then have , and , and from that point onward we already know what happens. Starting with the number , we see that we can extend the computation sequence towards the right (each time computing the function ), but we can also extend the computation sequence towards the left: What would be the number before ? And what would the computation sequence starting from look like? The reader may try answering these questions, to gain some intuition about Collatz’ problem.

For some numbers , the solution is not immediately obvious. The reader may try out (but, beware, the computation takes more than 100 steps). In fact, for large enough , the intermediary numbers that the computation goes through can be used to generate pseudo-random numbers that pass standard tests for randomness [12]. This fact may give us the impression that the numbers involved in the computation do not give much insight into solving the problem.

Now, the conjecture states that there is a solution for every . This is what is known as the ‘Collatz conjecture’. Although the problem is quite old, renewed interest in it has emerged recently, as witnessed by the many published articles, including scientific articles (e.g. [2–4]) and articles in the popular press (e.g. The Simple Math Problem We Still Can’t Solve in Quanta Magazine). Unanswered questions are also raised on social media such as MathOverflow, for example “How to know when the Collatz conjecture has been proved?”

In 2013, the late John H. Conway wrote about the problem in The American Mathematical Monthly, in an article entitled On unsettleable arithmetical problems [8]. In that article, Conway also speaks of well-known results such as Turing’s unsolvable halting problem or Gödel’s incompleteness theorem. The halting problem is the question whether an algorithm halts (which, as we shall see, has Collatz’ problem as an instance) and Turing showed that it is in general unsolvable. The incompleteness theorem states that there are true statements in a formal system called Peano arithmetic that cannot be proven from Peano’s axioms. The two results are closely related: there is a proof of the incompleteness theorem by reduction to the halting problem [11].

Conway argues that it is very unlikely that the Collatz problem is settleable, a technical term he introduces for true assertions, phrased in a set-theoretical language, that can also be proven. A problem is unsettleable if it is true but cannot be proven. There surely are unsettleable (or, unsettling?) assertions: the axioms of set theory are insufficient for proving all true assertions, by Gödel’s incompleteness theorem. But I find Conway’s argument, that the Collatz conjecture is unlikely to be settleable, too difficult to grasp, probably because I lack the background knowledge and intuition behind it. However, Conway is not sure, and still leaves open the possibility for the conjecture to be proven:

“I don’t want readers to take these words on trust but rather to encourage those who don’t find them convincing to try even harder to prove the Collatz Conjecture!” [8]

A quick search on Google Scholar shows articles that claim to have proven the Collatz conjecture. These articles are not published, but are available on pre-print servers or institutional repositories on the Web. How can we evaluate such articles? Are they really proofs? Is it worth our time to review them?

By accident, I stumbled upon the pre-print paper Collatz conjecture becomes theorem by Mirkowska and Salwicki [10]. Grażyna Mirkowska is a professor emeritus of Warsaw University, and she is an expert on mathematical logic, program semantics, and formal verification. Andrzej Salwicki is also professor emeritus, founder of the journal Fundamenta Informaticae, and he is an expert on the mathematical foundations of computer science. Both worked together on the 1978 book on Algorithmic Logic [9], among other works. Algorithmic logic is closely related to my field of expertise, Hoare’s logic and dynamic logic.

Maybe I can understand what is written in their article? In the final remarks, the authors write:

“We know that our presentation is clumsy (we are of age, English is not our native language).” [10]

It is not a good idea to reject this paper because of a clumsy presentation: the authors have a track record in the field, and their work must be taken seriously. Maybe I can figure out whether their paper makes sense to me? The adventure thus begins... And in this article, I do not present my own work, but I present what I could distill from what Mirkowska and Salwicki wrote.

Note

This article assumes the reader has knowledge of Hoare’s logic. See, for reference: A Discipline of Programming by Edsger Dijkstra (1976), Mathematical Theory of Program Correctness by Jaco de Bakker (1980), The Science of Programming by David Gries (1981), Program Verification by Nissim Francez (1992), Verification of Sequential and Concurrent Programs by Krzysztof Apt, Frank de Boer & Ernst-Rüdiger Olderog (2009). See the survey paper by Apt and Olderog [5] for more references.

2 Problem statement

First, we revisit the conjecture. We can understand the conjecture in a different way, by studying the program in Figure 1. Looking at the program as given, we have the following primitive operations and tests:

the test for deciding whether the number is not equal to ,
the test for deciding whether is even,
the primitive operation for dividing the number by two—but this operation is only executed in the context where we know that the old value of is even and not equal to ,
the primitive operation that multiplies the old value of by the constant three and adds the constant one—this operation is only executed in the context where we know that is odd and not equal to .

If the program terminates on every input , then the Collatz conjecture is true. From the terminating execution of the program we can then extract the computation sequence, simply by looking at the values that take over time. If, however, the program runs infinitely for some input , then we have a counterexample to the Collatz conjecture.

Figure 1: A program representation of the Collatz conjecture.
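
Since the figure itself is not reproduced here, the following Python sketch shows one way such a program might look. It is a reading of the description above, not the author's figure: the function name `collatz_program`, the variable name `x`, and the returned trace are assumptions made for illustration.

```python
def collatz_program(x: int) -> list[int]:
    """Run the loop described above and record the values x takes over time.

    Note: if the Collatz conjecture were false, this loop would not
    terminate for some positive input.
    """
    if x < 1:
        raise ValueError("input must be a positive integer")
    trace = [x]
    while x != 1:            # test: x is not equal to 1
        if x % 2 == 0:       # test: x is even
            x = x // 2       # update: divide by two (x is even and not 1)
        else:
            x = 3 * x + 1    # update: times three plus one (x is odd and not 1)
        trace.append(x)
    return trace
```

The returned trace is the computation sequence extracted from a terminating execution, in the sense described above.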

Let us abbreviate the program in Figure 1 by . We reformulate the question, whether the program terminates or not, as follows:

If is provable for some in Hoare’s logic for partial correctness such that the precondition is satisfiable, then we know that the Collatz conjecture is false.
If is provable in Hoare’s logic for total correctness, then we know that the Collatz conjecture is true.

This formulation raises a number of questions, as is often the case in Hoare’s logic. What is the language we use in assertions? What is the program theory? And what is the background theory?

What is the logical language we use in assertions? We restrict ourselves to a first-order language consisting of addition only. This language consists of first-order formulas with respect to a signature with:

the constant individual symbol 0,
the constant individual symbol 1,
the binary function symbol +.

We can also introduce abbreviations: for every natural number we have the numeral . We have , and for any we have . In other words, we have the numerals: Note that these numerals are terms constructed from constants and function symbols, hence do not depend on the value of variables: the numerals are all ground. Also, we implicitly used the fact that since this follows from the background theory we introduce later.

We introduce the abbreviation to stand for where is fresh (so not equal to either or ). abbreviates , and abbreviates .

In a similar way as the numerals, we can also introduce abbreviations for multiplication by a numeral: , for . In other words, we have the abbreviations where we implicitly assume that + is associative, and (again, we shall introduce the background theory later, from which these properties follow).
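
To make the two kinds of abbreviation concrete, here is a small Python sketch that expands them into ground terms over the constants 0 and 1 and the symbol +. The function names `numeral` and `times`, the string representation of terms, and the choice to start every expansion from the constant 0 are assumptions for illustration; the exact base cases used in the article may differ.

```python
def numeral(n: int) -> str:
    """Ground term denoting n, built only from the constants 0, 1 and +."""
    if n < 0:
        raise ValueError("numerals exist only for natural numbers")
    term = "0"
    for _ in range(n):
        term = f"({term} + 1)"
    return term


def times(n: int, term: str) -> str:
    """Expand 'n times term' into n repeated additions of term."""
    if n < 0:
        raise ValueError("n must be a natural number")
    result = "0"
    for _ in range(n):
        result = f"({result} + {term})"
    return result


print(numeral(3))      # (((0 + 1) + 1) + 1)
print(times(2, "x"))   # ((0 + x) + x)
```

Neither expansion introduces a multiplication symbol: multiplying by a fixed numeral is only shorthand for repeated addition, which is why it stays within a language of addition only.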

What is the program theory? This question amounts to showing how we axiomatize the primitive operations and tests. We introduce a predicate symbol for the test , where is a variable. We can define this predicate symbol in our language as follows: It is necessary that tests are decidable. Otherwise, when executing the program, one cannot make the case distinction in the if-statement.

Given that we have a formal understanding of the test, how do we axiomatize the two updates? We introduce the following axiom schemes. The precondition of the division-by-two update states that the original value of must be even before executing the operation. The witness of evenness, , is substituted for in the postcondition—and we require the variable to be fresh, that is, not already occurring in and different from the variable . The precondition of the times-three-plus-one update is the weakest precondition given the postcondition . Notice how we are able to express the new value of in terms of the old value of only by using the constant symbol and function symbol from our signature (recall that our multiplication by a numeral is an abbreviation, viz. ).
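
The two preconditions can be illustrated semantically over the natural numbers. The sketch below is not the axiom schemes themselves: it models an assertion as a Python predicate on the value of a single variable, and the names `Assertion`, `is_even`, `pre_halve` and `pre_triple_plus_one` are hypothetical. The bounded search for the witness is an assumption that works in the standard interpretation, where a witness can never exceed the value itself.

```python
from typing import Callable

Assertion = Callable[[int], bool]   # an assertion about the value of x


def is_even(x: int) -> bool:
    """even(x): there is a y with x = y + y (bounded search, so decidable)."""
    return any(x == y + y for y in range(x + 1))


def pre_halve(post: Assertion) -> Assertion:
    """Precondition for halving: some witness y has x = y + y and post holds of y."""
    return lambda x: any(x == y + y and post(y) for y in range(x + 1))


def pre_triple_plus_one(post: Assertion) -> Assertion:
    """Weakest precondition for x := 3x + 1: post with x + x + x + 1 substituted for x."""
    return lambda x: post(x + x + x + 1)


is_one: Assertion = lambda x: x == 1
print(pre_halve(is_one)(2), pre_halve(is_one)(3))                       # True False
print(pre_triple_plus_one(is_one)(0), pre_triple_plus_one(is_one)(1))   # True False
```

Note that the new value in `pre_triple_plus_one` is written with addition only, mirroring the remark that multiplication by a numeral is an abbreviation.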

Now that we have axiomatized the test and primitive operations, we turn to the last question. What is the background theory? Until now, one may freely interpret what the language means. By means of a background theory we restrict the possible interpretations. From the background theory, we can derive consequences which must hold in every program state.

As background theory we take Presburger arithmetic. Presburger arithmetic is a restriction of Peano arithmetic, as it speaks only of the addition operation on numbers. By Gödel’s incompleteness theorem, the formal theory of Peano arithmetic can be shown to be incomplete, that is, there are valid sentences that are not consequences of the axioms. What is remarkable about Presburger arithmetic, however, is that its theory is complete: all valid sentences are also consequences of the axioms.

Furthermore, the axiomatization of Presburger arithmetic is recursive, that is, there exists a procedure to decide what are the axioms of Presburger arithmetic. From this, we also obtain that the formal theory of Presburger arithmetic is decidable: for any sentence, either we can use the standard proof system of first-order logic and prove that the sentence is valid, or we can effectively find a model that satisfies all axioms but in which the sentence is false.

There are different presentations of the same theory. We shall give one that is short on paper, but it is presented by using the unary function . The unary function is defined by , and we have that . The other axioms of Presburger arithmetic are the following:

The last is an axiom scheme for any formula , and where is the result of replacing the free variable occurrences of by the term . From these axioms, the usual properties of addition follow. For example:

,
,
,
.
et cetera.
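
For orientation, a commonly used textbook axiomatization of Presburger arithmetic over the symbols 0, 1 and + is given below. It is not necessarily identical to the presentation used in this article (which goes through a unary function), and the metavariable $\varphi$ for the formula in the induction scheme is a name chosen here.

```latex
\begin{align*}
&\forall x.\; \neg(0 = x + 1) \\
&\forall x\,\forall y.\; x + 1 = y + 1 \rightarrow x = y \\
&\forall x.\; x + 0 = x \\
&\forall x\,\forall y.\; x + (y + 1) = (x + y) + 1 \\
&\bigl(\varphi(0) \land \forall x.\,(\varphi(x) \rightarrow \varphi(x + 1))\bigr) \rightarrow \forall x.\,\varphi(x)
\end{align*}
```

The last line is a scheme with one instance per first-order formula $\varphi$; properties such as commutativity and associativity of addition are derived by induction from the first four axioms.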

3 Different interpretations

Now, we turn to the semantics of the program , of which the halting problem captures the Collatz conjecture. A program is nothing but a piece of text: a finite string of symbols. One may think of the intended meaning of a program, but nothing prevents two people from looking at the same string of symbols and interpreting it differently. To illustrate this concept, we shall look at two interpretations of . The fact that this simple program can be interpreted in different ways somewhat delighted me when I first read the pre-print paper by Mirkowska and Salwicki [10].

The standard interpretation is given by taking the following data structure:

Take the natural numbers as domain.
Interpret the symbols in the usual way: is addition of natural numbers, and is less than.

This standard interpretation satisfies all the axioms of Presburger arithmetic.

Alternatively, one could give a complex interpretation, as follows:

Take the following subset of the complex numbers as domain: where is an integer, is a non-negative rational, and is . (We use the Greek instead of the Latin , to avoid confusion with the natural number we used earlier in function iteration .) The condition implies that we have no negative real numbers in our domain.
The constant individual symbols and are interpreted as the complex numbers and , respectively.
The binary function symbol is interpreted as usual: Note in this equation alone, the symbol has four different meanings: the function symbol in our language, we have the on complex numbers, we have the on integers, and we have the on non-negative rationals!

This complex interpretation satisfies all the axioms of Presburger arithmetic.

In the complex interpretation, we have the so-called reachable elements and the unreachable elements. The reachable elements are the complex numbers in our domain with a zero imaginary part (and so, by the condition, we know that ). The unreachable elements are the complex numbers in our domain with a non-zero imaginary part. Any operation performed on only reachable elements gives us back a reachable element. However, an operation performed with at least one unreachable element results in an unreachable element. This follows from the fact that the imaginary part is non-negative.

In the complex interpretation, the order of elements is defined by , which abbreviates . We can understand the order relation on complex numbers as it were a lexical order on pairs: . This means that all reachable elements (where ) are below all unreachable elements (where ). Further, in the complex interpretation we have that an element is even iff the real part is even.

With our complex interpretation, we can actually give an infinite run of the program! The following is a demonstration of a computation sequence: The computation sequence can be mapped back to an execution of the program (this takes a bit of work, but it is not difficult to see that the number generated each step is the value of the variable right at the start of each iteration of the while-loop). Notice that, since we start with an unreachable value (the imaginary part is non-zero), the test of the loop never fails, and so the body of the loop is always taken. Either of the two operations (division-by-two or multiply-by-three-add-one) on an unreachable value results in an unreachable value again, which can never be equal to the reachable value of .
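
The complex interpretation and such a run can be simulated directly. The Python sketch below represents an element as a pair of an integer real part and a non-negative rational imaginary part; the names `Elem`, `make`, `add`, `less`, `step` and `run`, and the starting element with real part 8 and imaginary part 1, are hypothetical choices for illustration rather than the example from the text.

```python
from fractions import Fraction
from typing import NamedTuple


class Elem(NamedTuple):
    """An element k + iota*w: k an integer, w a non-negative rational."""
    k: int
    w: Fraction


def make(k: int, w) -> Elem:
    w = Fraction(w)
    if w < 0 or (w == 0 and k < 0):   # no negative reals in the domain
        raise ValueError("not in the domain")
    return Elem(k, w)


ZERO, ONE = make(0, 0), make(1, 0)


def add(a: Elem, b: Elem) -> Elem:
    return make(a.k + b.k, a.w + b.w)


def less(a: Elem, b: Elem) -> bool:
    """Lexical order: compare imaginary parts first, then real parts."""
    return (a.w, a.k) < (b.w, b.k)


def is_even(a: Elem) -> bool:
    return a.k % 2 == 0               # even iff the real part is even


def step(a: Elem) -> Elem:
    if is_even(a):
        return make(a.k // 2, a.w / 2)   # the element y with y + y = a
    return add(add(add(a, a), a), ONE)   # a + a + a + 1


def run(start: Elem, steps: int) -> list[Elem]:
    """Execute the loop for at most `steps` iterations, stopping if 1 is reached."""
    trace = [start]
    for _ in range(steps):
        if trace[-1] == ONE:
            break
        trace.append(step(trace[-1]))
    return trace
```

Running, say, `run(make(8, 1), 50)` yields 51 elements: the real parts cycle through 4, 2, 1 while the imaginary part stays strictly positive, so the loop test never fails within the simulated steps.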

However, this infinite run cannot be used to argue that the Collatz conjecture is false, since the conjecture speaks of the positive natural numbers and not the complex numbers we have introduced. So, coming back to our reformulated question: in Hoare’s partial correctness logic, can we prove for some , and show that the precondition is satisfiable? No, we cannot prove this based on our example above. The reason is that the complex interpretation is elementarily equivalent to the standard interpretation. This means that the same first-order sentences are true in both the standard interpretation and the complex interpretation. Since our assertion language uses a first-order language, we cannot express in that the starting value of has to be an unreachable element. Namely, the ability to express that is an unreachable element (which would be false in the standard interpretation, but true in the complex interpretation under existential quantification) contradicts that the standard and complex interpretations have the same first-order theories.

Note, however, that the above correctness formula may still be provable—showing the Collatz conjecture is false. If one gives a proof of in Hoare’s logic for partial correctness, then the proof must be sound for the standard interpretation too. If one then also shows that is satisfiable in the standard interpretation, then the Collatz conjecture is settled (to be false).

On the other hand, if we look at proving in Hoare’s logic for total correctness, are we sure that a proof means that the Collatz conjecture is true? In Hoare’s total correctness logic, we employ a proof rule for reasoning about the termination of the while-loop by giving two ingredients: the invariant (an assertion that holds before and after the loop and also before and after the loop body) and the variant (a term denoting a value that must decrease each iteration, and it must be shown that this term is larger than or equal to zero assuming the invariant).
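
For reference, one common textbook formulation of such a rule is shown below. The symbols are chosen here and are not taken from the article: $I$ is the invariant, $t$ the variant, $b$ the loop test, $S$ the loop body, and $z$ a fresh variable that does not occur in $I$, $b$, $t$ or $S$.

```latex
\[
\frac{\{I \land b\}\; S\; \{I\}
      \qquad \{I \land b \land t = z\}\; S\; \{t < z\}
      \qquad I \rightarrow t \geq 0}
     {\{I\}\; \textbf{while } b \textbf{ do } S\; \{I \land \neg b\}}
\]
```

The second premise says the variant strictly decreases with every iteration, and the third that it is bounded from below whenever the invariant holds. In the standard interpretation these two facts together rule out infinitely many iterations.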

There may be a problem with non-standard interpretations. For example, in the complex interpretation, what does the variant denote? It may no longer express a quantity that ensures that the loop body is executed finitely many times. In fact, we already know that every unreachable element is larger than , and every unreachable element has infinitely many predecessors: so requiring that during the loop body the variant decreases no longer yields an argument that the computation must be finite. Yet, this is not a problem, for we can always ignore non-standard interpretations. The proof rule is sound: interpreting total correctness with respect to the standard interpretation works fine. In that interpretation, the variant expresses that, before executing the loop, we can predict the maximum number of iterations that will be taken based on the values of program variables in the initial state.

Mirkowska and Salwicki suggest that the Collatz conjecture, formulated as questions of provability in Hoare logic as we did above, is not precise. They then start working on the level of the semantics in the complex interpretation, and restrict attention to those states in which the program variables have reachable values. However, here I disagree. That there exist non-standard interpretations is no excuse for not giving a proof in Hoare’s logic: although such a proof can also be interpreted in a non-standard interpretation, there is always the standard interpretation one can look at.

What I thus expected to see was a clear description from which I can extract a proof in Hoare’s total correctness logic. In particular, to perform such an extraction, I need to know the answers to the following questions:

What is the loop invariant of ?
Is the loop invariant expressible as an assertion?
What is the loop variant of ?
Is the loop variant expressible?

Unfortunately, the paper does not clearly give answers to these questions. When I read the paper, I was not able to verify that there actually is a proof. The paper does present many interesting ideas, but I feel that I am doing original research when I try to relate those ideas to the questions above. I should not have the feeling I am doing original research when my task and only task is to verify a proof!

4 Conclusion and closing remarks

Unfortunately, I am unable to extract a proof from the pre-print paper by Mirkowska and Salwicki [10]. Thus, the Collatz conjecture still remains unsettled (at least in my mind) and thus remains unsettling. The lesson, if there is a need for one, is that non-standard interpretations of Hoare’s logic exist, and may shed light on a problem from a different angle.

At some point during reading, I got the following intuitions. I share them here—just for fun—but these closing remarks may not be valuable to anyone and may only cause confusion.

If we revisit the computation sequence above, that yielded an infinite run, we see the following pattern (we call these the even step and the odd step): For simplicity, assume is a positive natural number, and are natural numbers. Forgetting about the imaginary part, we see simply that the function we defined in the introduction is applied on . However, in the imaginary part, we see that the rational number is ‘counting’ how many times which part of the function was applied. If was even, we increase the denominator from to . If was odd, we increase the numerator from to .
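
The counting pattern can be made concrete with a small Python sketch. It tracks only the real part and the two counters, and assumes the imaginary part starts at 1 (both counters zero), so that after some steps it equals 3 to the power of the odd count divided by 2 to the power of the even count. The function name `ghost_fraction` and the starting value 5 are hypothetical.

```python
from fractions import Fraction


def ghost_fraction(n: int, steps: int) -> tuple[int, int, int, Fraction]:
    """Apply the Collatz step `steps` times and count even and odd steps.

    Returns (final value, odd count, even count, 3**odd / 2**even), where the
    fraction plays the role of the imaginary part when starting from 1.
    """
    odd = even = 0
    imaginary = Fraction(1)          # both counters zero: 3**0 / 2**0
    for _ in range(steps):
        if n % 2 == 0:
            n, even, imaginary = n // 2, even + 1, imaginary / 2
        else:
            n, odd, imaginary = 3 * n + 1, odd + 1, imaginary * 3
    return n, odd, even, imaginary


print(ghost_fraction(5, 5))   # (1, 1, 4, Fraction(3, 16))
```

Starting from 5, one odd step and four even steps lead to 1, so the numerator has grown to 3 and the denominator to 16.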

Now, for any computation sequence on natural numbers of the domain of the standard interpretation, there exist infinitely many corresponding computation sequences on the complex numbers of the domain of the complex interpretation. We are free, so to say, to choose the imaginary part of the complex number. So, let us look at the complex interpretation. From now onward, my thoughts are a bit cloudy and imprecise, so we proceed in unclear territory.

Since in computations of the complex interpretation, ‘we are free’ to choose the imaginary part, could we treat the imaginary part as a ghost count of the number of times we see an even number and the number of times we see an odd number? If the even numbers tend to outnumber the odd numbers then we are going towards the exit—but at any moment we are doubtful, we may freely reset the counters. Even when we reach the destination, we may still continue (here means multiple steps, even or odd): Here the denominator grows faster than the numerator: the imaginary part tends towards zero in the limit if we never reset the ghost counter back to . Does this intuition help? I do not know. Maybe one can show that there is a computation sequence in the complex interpretation in which the imaginary part tends to zero if and only if there is a corresponding computation sequence in the standard interpretation? If so, can we then also show that a computation sequence tends to zero if and only if it contains a number with as real part?

The intuition here is that we want to jump ahead a potential computation sequence starting in some number, as if we could travel in time. The target of our jump is a number with the same real part—that would be evidence of a loop. If the loop also exists in the standard interpretation, then all the intermediary numbers must be natural numbers. Suppose we are at an arbitrary section of the computation sequence: where . The sequence are the values of the ghost counters. We now want to reset the ghost counters back to , but keep the real part. Note that the imaginary part of is the fraction , meaning both even and odd counters are zero. After resetting the ghost counters, we have the section that starts with : where the sequences and are running ghost counts with . For any index in the sequence , we have that .

Is it possible to have but where the numerator grows faster than the denominator, i.e. ? If so, can it then also have occurring in the sequence? Note that is never reached as a result of (only with zero real part would give , but that never occurs since is even and thus stays even), so from that point onward the denominator would dominate the numerator.

And what about the non-repeating computation sequences? Where there is no loop, but in which the imaginary part does tend to zero?

So many questions are still floating around in my mind, like butterflies...

Acknowledgments

Cover image credit: Ian Parker

References

[1]

Idriss J. Aberkane. A collatz proof from the book. (2021). Retrieved from https://hal.science/hal-03364693

[2]

Paul J. Andaloro. The 3x+1 problem and directed graphs. Fibonacci Quarterly volume 40, issue 1 (2002), page 43–54. https://doi.org/10.48550/arXiv.math/0608208

[3]

Ştefan Andrei, Manfred Kudlek, and Radu Ştefan Niculescu. Some results on the Collatz problem. Acta Informatica volume 37, (2000), page 145–160. https://doi.org/10.1007/s002360000039

[4]

Ştefan Andrei and Cristian Masalagiu. About the Collatz conjecture. Acta Informatica volume 35, (1998), page 167–179. https://doi.org/10.1007/s002360050117

[5]

Krzysztof R. Apt and Ernst-Rüdiger Olderog. Fifty years of Hoare’s logic. Formal Aspects of Computing volume 31, (2019), page 751–807. https://doi.org/10.1007/s00165-019-00501-3

[6]

Dora M. Ballesteros, Jimmy Peña, and Diego Renza. A novel image encryption scheme based on Collatz conjecture. Entropy volume 20, issue 12 (2018), page 901. https://doi.org/10.3390/e20120901

[7]

Oliver K. Clay. The long search for Collatz counterexamples. Journal of Humanistic Mathematics volume 13, issue 2 (2023), page 199–227. https://doi.org/10.5642/jhummath.YQHO7207

[8]

John H. Conway. On unsettleable arithmetical problems. The American Mathematical Monthly volume 120, issue 3 (2013), page 192–198. https://doi.org/10.4169/amer.math.monthly.120.03.192

[9]

Grażyna Mirkowska and Andrzej Salwicki. Algorithmic logic. Springer.

[10]

Grażyna Mirkowska and Andrzej Salwicki. Collatz conjecture becomes theorem. (2023). https://doi.org/10.48550/arXiv.2310.13035

[11]

George Tourlakis. Gödel’s first incompleteness theorem via the halting problem. In Computability. Springer, page 265–280. https://doi.org/10.1007/978-3-030-83202-5_8

[12]

David Xu and Dan E. Tamir. Pseudo-random number generators based on the Collatz conjecture. International Journal of Information Technology volume 11, issue 3 (2019), page 453–459. https://doi.org/10.1007/s41870-019-00307-9

Citation

BibTeX citation:

@article{hiep2023,
  author = {Hiep, Hans-Dieter A.},
  title = {Can the {Collatz} Conjecture Be Proven, or Not?},
  journal = {dr. heap},
  volume = {1},
  number = {4},
  date = {2023-11-01},
  url = {https://www.drheap.nl/articles/2023/can-the-collatz-conjecture-be-proven-or-not/},
  doi = {10.59350/7cfjs-b4x59},
  issn = {3050-5224},
  langid = {en}
}

For attribution, please cite this work as:

Hans-Dieter A. Hiep. Can the Collatz conjecture be proven, or not? dr. heap volume 1, issue 4 (November 2023). https://doi.org/10.59350/7cfjs-b4x59

What is an education?

Hans-Dieter A. Hiep — Fri, 05 May 2023 22:00:00 GMT

In university, one sees students attending lectures and labs, working together on assignments, and doing exams. But—standing still for a moment—what is the purpose of all this business? The answer involves education, in the sense that students are being educated. But what is an education, precisely? In this blog post, I will summarize my views on the matter.

Figure 1: Music Video: Pink Floyd - Another Brick in the Wall (a protest song against corporal punishment)

Some say that an education is more than professional training. By such professional training, I mean equipping students with useful skills such as communicating with humans or with computers, finding interesting or profitable problems to work on that are relevant to society at large, and finding novel, practical, efficient, and elegant solutions to such problems, et cetera. One could speak of skill competency, being measured at levels that indicate how well an individual is able to perform in a particular skill. There are skills which are considered more useful than others in a professional setting, thus correspondingly considered worth more. Following the tradition of New Public Management, one can then do a back-of-the-envelope costs-and-benefits calculation (in Dutch this is called ‘studierendement’): surely, becoming proficient in computer science is worth the effort, to both the student at the individual level, and to society at the collective level. But all this skill development should not cost too much: there should be a positive Return On Investment!

Although professional training works, and could be very valuable, it also misses essential points of an education. But what, then, is an education more than training students and developing skills? Below, I shall outline the objectives of an education that go beyond the objectives of professional training. Generally speaking, during academic study, a student:

progresses through human development into an independent and disciplined thinker,
cultivates a disinterested love of truth,
develops a scientific attitude to be able to understand, and be critical of, technological developments in the field, and
becomes empowered to defend one’s own outspoken position in public.

All these points contribute to an education. Simply put, an education helps students to become self-sufficient. I shall discuss the above points one-by-one, and reflect on each general statement how, in particular, it applies to computer science education.

1 Independent and disciplined thinker

Small is the number of them that
see with their own eyes and feel with their own hearts.

— Albert Einstein

A typical university student (age: 18-25) progresses through the human developmental stage of emerging adulthood and enters early adulthood. In these stages of human development, students improve in functioning (e.g. mastering technical skill), become disciplined actors and critical thinkers (e.g. improving use of language, developing awareness of context-sensitivity, and engaging in self-reflection), and develop a clear sense of self and identity (e.g. by demonstrating intellectual curiosity and creativity). An excellent education aims to improve not only a student’s capability in technical skill, communication, comprehension, and reasoning ability—but also in attaining and maintaining a position of (financial) stability, physical and mental health and safety, the perseverance to overcome challenges, and the resilience to handle failure gracefully.

In computer science there are lots of different technical skills and languages by which a student can develop, e.g. there are many programming languages, logical languages, and mathematical languages to study and practice with. Each language comes with its own set of challenges: some are easier to learn and others are more difficult. The required level of discipline varies: some languages are rigorous and exact, while others are more lenient and fuzzy. In a well-rounded curriculum, there should be a lot of room for exploration and creativity: I believe that the best form of learning is rediscovery, in which (practical and theoretical) results, and their importance, are arrived at by oneself.

Ideally, the objective of education is not just to learn new languages, but also to reflect on the learning process itself and learn how to learn. Each language is a tool to achieve this objective. The end goal is attained if a student is able to confront challenges and tackle difficulties encountered during learning, which often necessitates independent study and a disciplined approach to thinking and working. One means to this end is progression through gradually more difficult subjects over time. To keep progressing further, it seems that the following factors are crucial for student engagement: confidence, interaction, and autonomy (CIA).

Confidence. A lack of confidence hampers student engagement with the subject. Confidence can be increased by allowing students to revisit the necessary background knowledge, allowing students to check their own work, and giving hints towards a solution to exercises in case students are blocked for too long. Confidence can be decreased by obscurity of reference material, a lack of (direct or indirect) feedback, or information overload.

Interaction. Ideally, students are intrinsically motivated to independently study the subject material. However, most times students need help from, and interaction with, their social surroundings. One effective way is to work together, in the sense of mentorship, to guide students through the material by means of a dialogue. Sometimes students can help one another (which ideally is useful for both), but this requires ‘matching up’ students at different levels. It is a clear signal of insufficient interaction if interaction only starts after a final examination.

Autonomy. Ideally, each student can autonomously choose their next direction to explore in the subject (depending on their starting point). Sometimes it is a joy to discover that one has entered a dead-end: a valuable lesson! But finding oneself too often stuck in dead-ends is risky. Hints or nudges towards the right path are helpful to avoid going too far astray—however, these hints or nudges should never be enforced, and only offered as advice.

2 Disinterested love of truth

If an opinion contrary to your own makes you angry, that is a sign that you are subconsciously aware of having no good reason for thinking as you do.

— Bertrand Russell

Truth exists regardless of one’s expectation. Something is not true because one wants it to be true or profits from it being true. One should seek the truth from a position of freedom: disinterestedness in the sense of free from bias or (self-)interest. It may be difficult to engage all students in cultivating curiosity for why something is true. This could be the result of students being subjected to a prior non-free learning experience, where students were instructed to ‘just remember’ what was said and take it for granted, or to ‘just do’ what an assignment asks for—rather than stimulating students to raise questions and take the initiative to study the subject more deeply. Surely, it is easier to let students stop thinking, than not!

To cultivate a disinterested love of truth, one question to keep in mind is: what is true? Or even deeper, what is truth? It is not wise to claim that something is true in the absolute sense, or that only one person knows the truth: truth is something each student has to find out for themselves. The most reliable form of truth, looking from an epistemological point of view, is experienced individually: to be convinced of truth requires honest, independent, and disinterested verification by means of individual mental construction (theory), or direct experience by one’s own senses (practice). As a lover of truth, viz. a philosopher, a student seeks answers to questions of truth through diverse means. This requires students to carefully and critically assess their own perception, memory, and (prior) beliefs and skills. Every (including one’s own) testimony, evidence, judgment, theory, or ideal should be regarded critically, or even skeptically when in (self-)doubt.

However, criticism and skepticism are bottomless, and students also need to be able to overcome the Münchhausen trilemma: even the most rigorous methods for attaining truth have limitations, and there are no absolute or complete methods for obtaining truth or certainty. Generally speaking, one should be interested in comparing the authority (in the sense of convincing force) of different arguments, and should be able to follow and participate in academic debate that has culminated or is culminating in consensus.

Some questions fundamental to computer science are: what are the foundations of computer science? What are the limits of computing? What are the limits of logical, mathematical, and scientific reasoning? What languages can (not) be used to express ideas clearly? When comparing two entities, what differences are significant (or not)? What basic facts about computer science are relevant, and how does this depend on the context? What research agendas are out there and are they conflicting or complementary? What distinguishes fashions and opinions from necessities and facts? How can one turn vague ideas into concrete conceptualizations? What academic debates were relevant in the past, are still relevant today, and will become relevant in the future? Et cetera.

3 Scientific attitude

The originator of a new concept […] finds, as a rule, that it is much more difficult to find out why other people do not understand him, than it was to discover the new truths.

— Hermann von Helmholtz

It is important for students to develop a scientific attitude that is critical of technological developments in the field. A scientific attitude is exploratory in nature and embraces exciting speculation, necessitates experimentation, but requires one to be scrupulous in publicly making claims, and only to present findings that others can independently verify. Ideally, a scientific discovery lays bare not only the presence of a mechanism, but also its underlying structure that explains why predictions are reliable and how they are correct. Technological progress should be considered within a wider ethical context, also keeping in mind the actual and potential detrimental effects on human society and nature.

Especially in the field of computer science, a critical outlook on technology is important, since not all developments in our field are necessarily beneficial to all of humankind or nature. For example, most computer technology only benefits a tiny portion of humanity: is this fair, or not? Are practitioners who apply the results of computer science misguided or misinformed, even when they form a large majority influencing consensus and best practices, or not? Consensus is not a replacement for truth, but standing up against consensus might be risky for one’s career or public standing (e.g. nobody likes the party pooper).

Figure 2: Video Clip of Kindergarten Cop (1990) with Arnold Schwarzenegger

Especially in computer security, sometimes one needs courage to raise issues and be very confident of one’s case, also considering the potential (legal) repercussions instigated by large companies and governments. But also in theoretical computer science, it is too easy to forget ‘the bigger picture’ and lose oneself in a deluge of technicalities: how can one be certain that the subject is studied as simply as possible, without being too simplistic? The cases above necessitate a scientific attitude, to ensure that one only takes positions that can withstand reasonable doubt.

Critical thinking is an important skill, especially in an academic discipline such as computer science. Many computer scientists fall into the trap of making things too complicated (unfortunately, myself included). How can one prevent such tendencies? Working on complicated and obscure subjects may scratch one’s own itch of intellectual curiosity, but the trap is that it makes the field more difficult for outsiders to enter. It requires a good education to appreciate the fine balance between necessary complexity and accidental complexity of programs, theories, and systems. The key is that simplicity and elegance, while hard to attain, pay off by making it easier to explain to others what one did.

4 Empowerment

Don’t, for heaven’s sake, be afraid of talking nonsense!
But you must pay attention to your nonsense.

— Ludwig Wittgenstein

Finally, empowering students allows them to defend their own, outspoken position in public. A reasonable position is supported by rational and well-founded argumentation, presented in an orderly (no chaos), timely (not too long), and, ideally, beautiful manner. To appreciate reason as a means of establishing truth, one has to be receptive to criticism, to be open-minded towards the possibility of finding faults and flaws in one’s own position. A good defender is capable of seeking out opponents, even if those opponents are established authorities, and engaging in a mutually respectful, decent, and civilized debate. But every defeat is a learning opportunity: losing an argument is fun!

There are many opportunities in computer science for taking up and trying to defend a position, for example in designing programs (and surviving programming errors), theories (and surviving inconsistency), and systems (and surviving design flaws). There are qualitative and quantitative approaches by which any idea can be evaluated, such as (among many others) program correctness and efficiency, system resilience, error tolerance, usability and extensibility, and theoretical elegance and simplicity: an educated person is able to appreciate these fine details.

5 Acknowledgments

Cover image credit: Elisa Calvet B.

Citation

BibTeX citation:

@article{hiep2023,
  author = {Hiep, Hans-Dieter A.},
  title = {What Is an Education?},
  journal = {dr. heap},
  volume = {1},
  number = {3},
  date = {2023-05-06},
  url = {https://www.drheap.nl/articles/2023/what-is-an-education/},
  doi = {10.59350/ej3j0-jb103},
  issn = {3050-5224},
  langid = {en}
}

For attribution, please cite this work as:

Hans-Dieter A. Hiep. What is an education? dr. heap volume 1, issue 3 (May 2023). https://doi.org/10.59350/ej3j0-jb103

Correctness of two sorting algorithms

Hans-Dieter A. Hiep — Tue, 14 Mar 2023 23:00:00 GMT

1 Introduction

On Wednesday, 8th of March, 2023, I gave a lecture about two sorting algorithms for the course Program Correctness (see also the lecture series on YouTube, available only in Dutch). Then, due to limited time, I only discussed a sketch of the correctness proof. But I promised the students that I would write down the correctness argument in more detail. So, in this article, I will revisit the two sorting algorithms and give the correctness proof in full detail.

We had a look at two sorting algorithms: gnome sort and bozosort. The purpose of a sorting algorithm is to operate on an array and rearrange its elements in order. The two algorithms presented are not the most efficient sorting algorithms, but that is not our concern: instead, we will look at them from the perspective of their correctness.

The main questions answered in this article are:

What is the (intuitive) argument of correctness of these algorithms?
How to write down a proof outline for these algorithms?

When proving an algorithm correct, it is important to first have a rough, informal idea of how the correctness argument should go. Only then does it make sense to formalize the argument, by writing down a proof outline. During the latter activity, one can systematically check the argument to ensure there is no fault in one’s own reasoning. Thus, both aspects are important: the bigger picture and the devil in the details.

Note

This blog post assumes the reader has basic knowledge of Hoare logic. In case you want an introduction, you can consult one of the following books (in order of appearance): A Discipline of Programming by Edsger Dijkstra (1976), Mathematical Theory of Program Correctness by Jaco de Bakker (1980), The Science of Programming by David Gries (1981), Program Verification by Nissim Francez (1992), Verification of Sequential and Concurrent Programs by Krzysztof Apt, Frank de Boer & Ernst-Rüdiger Olderog (2009). See the survey paper by Apt and Olderog [1] for more references.

2 Sorting

The purpose of a sorting algorithm is to rearrange elements in an array, so that the final result is an array where all elements are in order. For simplicity, we assume we are dealing with an array of integers. The ordering of integers is their natural order, i.e. .

Given array of type . We define the following predicate: The above predicate expresses that the whole array is sorted. We also define: This predicate expresses that array is sorted on the range , i.e. from index until and including index .

Alternatively, we can define the following predicates: and It can now be verified (e.g. using a proof system for predicate logic such as natural deduction) that and .
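The two styles of sortedness predicate can be illustrated with a small Python sketch. It assumes that one formulation compares every pair of indices in the range and the other compares only neighbouring cells; the function names and the parameter names `a`, `f`, and `t` are hypothetical and chosen for this illustration only. Arrays are modelled as finite Python lists, and the range includes both bounds.

```python
from itertools import product


def sorted_pairwise(a, f, t):
    """Every pair of indices i <= j within [f, t] holds values in order."""
    return all(a[i] <= a[j]
               for i in range(f, t + 1)
               for j in range(i, t + 1))


def sorted_adjacent(a, f, t):
    """Every two neighbouring cells within [f, t] hold values in order."""
    return all(a[i] <= a[i + 1] for i in range(f, t))


def agree_on_small_arrays(length=4, values=3):
    """Exhaustively compare both predicates on all small arrays and ranges."""
    for cells in product(range(values), repeat=length):
        a = list(cells)
        for f in range(length):
            for t in range(length):
                if sorted_pairwise(a, f, t) != sorted_adjacent(a, f, t):
                    return False
    return True
```

The exhaustive comparison is only a sanity check on small inputs, not a proof; the general equivalence rests on transitivity of the ordering. An empty range (`f > t`) makes both predicates hold vacuously.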

We may use these predicates to describe the desired outcome of a sorting algorithm: namely, that array is sorted (on a particular range). However, this property alone is not sufficient. We also require a relation between the input array and the output array, to specify that the algorithm did not insert new, duplicate old, or throw out any elements. Note that the input array and the output array are stored in the same place in memory, so it matters not where we look but when we look. By input array we mean the value of the array before the algorithm runs, and by output array we mean the value of the (same) array but after the algorithm finished running.

To avoid algorithms inserting new, duplicating old, or throwing out elements, we require that there exists a one-to-one correspondence between the input and output arrays.

Given array of type . We use as the name for the input array, whereas is the name for the output array. We now define the following predicate: This predicate expresses that the whole array is a permutation of array . The intuition of is that represents a bijection between integers: a one-to-one correspondence between the indices of the output array and input array . Here we make use of the following definitions: and

(As an aside, i.e. not relevant for the rest of this article, note that it depends on the language in which we work whether quantification over is first-order or not. If we work in the language of Peano arithmetic, this quantifier is higher-order. But if we work in the language of set theory, this quantifier is first-order where ranges over sets representing functions .)

We actually need a stronger predicate than , namely one that expresses that array is a permutation of array for a particular range, and leaves all other elements in place. Compare this with how we have two predicates for being sorted: and . So we now define the following predicate: The new condition requires of the bijection that every index that falls outside of the range is mapped identically. We could say that the latter predicate expresses a restricted permutation.
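For finite arrays, the restricted permutation predicate can be checked by brute force, which may help to make the role of the bijection concrete. The sketch below is an illustration under assumptions: arrays are Python lists of equal length, and the names `is_restricted_permutation`, `a`, `b`, `f`, and `t` are hypothetical. A bijection that is the identity outside the range must map the range onto itself, so it suffices to try all rearrangements of the indices inside the range.

```python
from itertools import permutations


def is_restricted_permutation(a, b, f, t):
    """Search for a bijection pi on indices with a[i] == b[pi(i)] for all i,
    where pi maps every index outside [f, t] to itself."""
    n = len(a)
    if len(b) != n:
        return False
    inside = [i for i in range(n) if f <= i <= t]
    outside = [i for i in range(n) if not (f <= i <= t)]
    # Outside the range pi is the identity, so the cells must be equal.
    if any(a[i] != b[i] for i in outside):
        return False
    # Inside the range, try every bijection of the range onto itself.
    for image in permutations(inside):
        if all(a[i] == b[p] for i, p in zip(inside, image)):
            return True
    return False
```

The search is factorial in the size of the range, so this is only suitable for experimenting with small examples. With an empty range, the predicate reduces to equality of the two arrays.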

3 Gnome sort

Gnome sort is a simple sorting algorithm. The story behind the algorithm is as follows. Suppose there is a garden with flower pots arranged next to each other on a table. Each flower pot contains a beautiful flower of a certain height. A gnome comes along, and being a pedantic gnome, wants to arrange the flowers in such a way that the flowers in the pots are ordered from the smallest flower to the largest flower on the table.

How does the gnome achieve this? The gnome stands next to the table, in front of a single flower pot. The procedure is easy:

1. The gnome starts at the leftmost flower pot.
2. If there is no preceding flower pot, or if the flower in the preceding pot is smaller than the flower in the pot in front of the gnome, then the gnome takes one step to the right.
3. If there is a preceding flower pot and the flower in the preceding pot is larger than the flower in the pot in front of the gnome, the gnome swaps the two flowers and takes one step to the left.
4. If the gnome has not reached the end of the table, it goes back to step 2.

Now, we use an array to represent the row of flower pots: each cell of the array is a flower pot, and the value stored in each cell represents the height of a flower. Swapping the values of two cells of the array represents the gnome, being a true gardener, taking the flowers out of the two pots and placing each in the other pot. Try to imagine how the gnome runs!

Note that gnome sort is slightly different from insertion sort. In the insertion sort algorithm, we need to keep track of two locations: the location of the element which is being inserted in the proper place, and the location of where to insert that element. Insertion sort is typically implemented using a nested loop: after the given element is inserted in the prefix (inner loop), we can continue with the next element after the prefix (outer loop). However, in gnome sort, there is only a single position that is tracked, namely the position of the gnome. The gnome has to walk back after it has placed the element in the proper position, and thus performs more comparisons than insertion sort.

Figure 1: The gnome sort algorithm.

We can write down an algorithm that encodes the procedure of the gnome: see Figure 1. We are given variables and of type . These represent the range of indices in the array representing the flower pots, where is the index of the first flower pot and is the index of the last flower pot. The variable of type represents the position of the gnome, and the variable of type is a temporary variable used for swapping the flowers.
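As a rough companion to the description above, here is a minimal Python sketch of the gnome's procedure. It is not a transcription of Figure 1: the names `a`, `f`, `t`, `i`, and `x` are assumptions made for this illustration, and the comparison uses "less than or equal" so that two flowers of equal height count as being in order.

```python
def gnome_sort(a, f, t):
    """Sort the cells a[f], ..., a[t] of the list a in place."""
    i = f                      # position of the gnome
    while i <= t:              # the gnome has not passed the last pot
        if i == f or a[i - 1] <= a[i]:
            i += 1             # in order (or no preceding pot): step right
        else:
            x = a[i]           # temporary variable used for swapping
            a[i] = a[i - 1]
            a[i - 1] = x
            i -= 1             # step left after swapping
    return a
```

For example, `gnome_sort([9, 3, 2, 1, 0], 1, 3)` rearranges only the cells at indices 1 to 3 and leaves the first and last cell in place. If `f > t`, the loop body is never entered and the array is not modified.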

We can make the following observations of the algorithm in Figure 1:

If then the algorithm terminates without modifying the array. Otherwise, the bounds of the position are: .
The array from to is always sorted. This property is a loop invariant: it holds at the beginning of the loop, at the beginning of the loop body, at the end of the loop body, and after the loop.
The sorting algorithm does not insert new elements, duplicate old elements, or throw out elements: thus, the output array (that is, the value of after running) is a permutation of the input array (the value of before running) restricted to the given range . At the beginning and end of the loop body, the current value of the array is also a restricted permutation of the input array, and this property is a loop invariant. However, this is temporarily broken when we swap around values in the array!
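These observations can be tested at run time before formalizing them. The following sketch instruments a gnome sort with assertions at the head of the loop; it is an illustration under assumptions, not the proof outline itself. The names `a0` (the snapshot), `f`, `t`, and `i` are hypothetical, the sketch restricts itself to the case `f <= t`, and a multiset comparison stands in for the existence of a bijection, which is equivalent for finite lists.

```python
from collections import Counter


def check_invariant(a, a0, f, t, i):
    """Assert the three observed properties at a loop head."""
    # Bounds of the gnome's position.
    assert f <= i <= t + 1
    # The cells from f up to (but excluding) i are sorted.
    assert all(a[k] <= a[k + 1] for k in range(f, i - 1))
    # The current array is a permutation of the snapshot on [f, t] ...
    assert Counter(a[f:t + 1]) == Counter(a0[f:t + 1])
    # ... and all cells outside the range are left in place.
    assert a[:f] == a0[:f] and a[t + 1:] == a0[t + 1:]


def gnome_sort_checked(a, f, t):
    """Gnome sort on a[f..t] that checks the invariant on every iteration."""
    if f > t:
        return a               # empty range: nothing to do
    a0 = list(a)               # snapshot of the input array
    i = f
    check_invariant(a, a0, f, t, i)
    while i <= t:
        if i == f or a[i - 1] <= a[i]:
            i += 1
        else:
            a[i], a[i - 1] = a[i - 1], a[i]
            i -= 1
        check_invariant(a, a0, f, t, i)
    return a
```

Passing assertions on a handful of inputs do not prove the invariant, but a failing assertion quickly exposes a wrong candidate invariant before any effort is spent on the formal argument.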

We now formalize the correctness proof, by means of a proof outline (see Figure 2). First, we introduce a freeze variable, the array of type . We may think of as a snapshot of the array at the time before running the algorithm. Since is never modified by the program, it maintains its value over time, and thus allows us to compare the actual value of with its original value.

Figure 2: A proof outline of the gnome sort algorithm.

The precondition as formulated expresses that the freeze variable contains a snapshot of array at this instant. Also, we restrict ourselves in this proof to the case where holds (since, otherwise, the algorithm has no effect).
This assertion follows from (2.1) since the following properties hold: We take as a witness (for the existentially quantified in the definition of ) the identity function, which is a bijection. Also, the following property holds: since the range must be empty if , and also holds.
As we already observed from the algorithm, we can now formalize the loop invariant. The first part expresses that the array from until (but excluding) must be sorted. The second part expresses that the actual value of is a permutation of the input array (given the name ), restricted to the range . The third part expresses the bounds of the position . Note that by applying the substitution rule we obtain assertion (2.2), so we have verified that this loop invariant is initially valid. In the remainder of the proof outline we check whether the loop invariant is preserved by the loop body, and allows us to conclude our post condition.
This assertion always holds at the start of the loop body, where we know the loop test is true. Thus we can make the invariant stronger: we know that subsumes .
We have obtained this assertion in the following way: we need to establish the loop invariant of (2.3) at the end of the loop body. Hence, it has to be a postcondition of the then-branch of the conditional statement. We apply the substitution axiom that replaces by . Now, why does this assertion follow from the preceding assertion? We discriminate two cases:
- Case . We can establish from the general property mentioned at (2.2).
- Case . The following property holds: since we extend the range but also have the required property about and (and by transitivity of , also for all elements before , cf. why the predicates and are equivalent).
That follows from (shown with dots) is trivial, and it is easy to see that and .
The assertion above this one is obtained from (2.4) and the negation of the conditional test. This assertion follows from the one above by the following argument. We have the property: since the range becomes smaller. implies itself is again trivial, and and are easy to see.
But how did we obtain the assertion at (2.6) in the first place? Intuitively, it is the result of swapping and in the assertion (2.8). Swapping does not affect . We have the following property: since for all : there is no aliasing here!

The proof outline has a missing link: why is it that and are still restricted permutations when we swap and ? The essence of the argument is that from the witness of , we can construct a new witness, say , as follows: and and for all other arguments and agree. Further, and fall within the restricted range of the permutation.

To phrase it in a different way, we have the following property: if .
Easily obtained from the loop invariant by replacing by .
We check that the loop invariant and the negation of the loop test actually leads to the desired postcondition: we have that must hold.

This concludes the correctness argument of the gnome sort algorithm.
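The construction of a new witness after a swap, as used in the missing link above, can be sketched in Python. This is an illustration under assumptions: the bijection is represented as a list `pi` with `a[k] == a0[pi[k]]` for every index `k`, and the names `pi`, `a0`, `swap_with_witness`, and `is_witness` are hypothetical.

```python
def is_witness(a, a0, pi, f, t):
    """Check that pi witnesses that a is a permutation of a0 restricted to [f, t]."""
    n = len(a)
    return (sorted(pi) == list(range(n))              # pi is a bijection on indices
            and all(a[k] == a0[pi[k]] for k in range(n))
            and all(pi[k] == k for k in range(n) if not (f <= k <= t)))


def swap_with_witness(a, pi, i, j):
    """Swap a[i] and a[j] in place and return the adjusted witness."""
    a[i], a[j] = a[j], a[i]
    new_pi = list(pi)                                 # agrees with pi elsewhere
    new_pi[i], new_pi[j] = pi[j], pi[i]               # exchange the two images
    return new_pi
```

Starting from the identity witness for `a0 = [10, 20, 30, 40]`, swapping indices 1 and 2 and then indices 2 and 3 keeps `is_witness` true for the range from 1 to 3, provided both swapped indices fall within that range.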

4 Bozosort

Bozosort is also a simple sorting algorithm. Suppose you stand in front of a table with cards on its surface, and each card has a number printed on it. Now pick any two cards on the table and swap them. Repeat this, until the numbers on the cards are all sorted from small to large. In this algorithm, we make use of a source of randomness: namely, to pick two cards to swap.

Again, we use an array of type to represent the cards on the table. Each cell of the array corresponds to a spot on the table, and the value stored in the array is the number printed on the card.

Figure 3: The bozosort algorithm.

We can write down an algorithm that encodes this procedure, see Figure 3. Here, we are given variables and of type representing the bounds of the array. The algorithm consists of an outer loop and an inner loop. The loop body of the outer loop has two components (separated by vertical space). By component I simply mean a subprogram. The first component chooses two random numbers and swaps the values in the array. The second component walks through the array to test whether it is actually sorted. We use the variables and of type , and the variable of type .
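A minimal Python sketch of the two components is given here. It is not a transcription of Figure 3: the names `a`, `f`, `t`, `i`, `j`, `k`, `x`, and `ok` are assumptions for this illustration, `random.randint` stands in for the random assignment, and the sketch assumes `f <= t` (with an empty range `randint` raises an error rather than diverging).

```python
import random


def bozosort(a, f, t, rng=random):
    """Sort a[f..t] in place by swapping random pairs until sorted (needs f <= t)."""
    ok = False
    while not ok:
        # First component: choose two random positions and swap their values.
        i = rng.randint(f, t)
        j = rng.randint(f, t)
        x = a[i]
        a[i] = a[j]
        a[j] = x
        # Second component: walk through the array to test whether it is sorted.
        ok = True
        k = f
        while ok and k < t:
            if a[k] <= a[k + 1]:
                k += 1         # these two neighbours are in order: move right
            else:
                ok = False     # early exit: the outer loop must run again
    return a
```

Passing a seeded generator, for example `bozosort([3, 1, 2], 0, 2, random.Random(0))`, makes a run reproducible. The sketch terminates only with probability one, which matches the partial correctness view taken in this article: whenever it does return, the range is sorted.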

To understand the meaning of the random assignment, we can make use of the following axiom: where is an arbitrary formula where does not occur free, and does not occur in the arbitrary expressions or . Intuitively, this statement selects (non-deterministically) an integer between and and updates the value of variable with the selected value.

What happens when is the case and we perform the random assignment? According to the above axiom, we obtain The postcondition is contradictory, so equivalent to . Thus, operationally, we could think that running the random assignment from such a situation is equivalent to running a program that never finishes. This works because we look at Hoare triples in their partial correctness sense.

We can make the following observations of the algorithm in Figure 3:

The actual value of array is a permutation of the input array at any control point, except in the middle of the first component where we perform the swapping of two elements. Thus, we could use this fact as a loop invariant of both the inner and the outer loop.
After the second component finishes its execution, the variable represents whether the array is actually sorted or not. Hence, it is a loop invariant of the outer loop that if is true, then the array is sorted. This means that the loop only exits when the array is actually sorted!
The inner loop that checks whether the array is sorted looks a bit like gnome sort: the position variable is moved to the right whenever we have tested the array elements are in order. But, instead of walking to the left, the inner loop has an early exit in case it encounters two elements that are not properly ordered. By setting to , the outer loop must run again.

Figure 4: A proof outline for the bozosort algorithm.

We now formalize the correctness proof, by means of a proof outline (see Figure 4). Again, we introduce a freeze variable, the array of type , with the same purpose as before.

This assertion follows from the precondition, since is vacuous for any formula , and the precondition implies for the same reason as given in (2.2).
Here we have formalized our intuition of the loop invariant for the outer loop. Note that the assignment axiom allows us to obtain (4.1).
When entering the loop body we know that must be , so we can adapt the loop invariant: only information about the array being a restricted permutation remains present here.
Obtained by applying our axiom for random assignments.
Also obtained by applying our axiom for random assignments. Note that from this assertion it already follows that .
Again we have the swapping of two elements. The argument needed here is a slight generalization of the argument of (2.7) above, where the essence is this property (given and ): Note that if the two random variables have the same value, the swap has no effect.
Comparing to (4.3), we now also know that . Clearly, the three preceding assignments cannot affect and we already knew it holds in (4.5). In the other case (that holds), we would not even reach this point in the program.
Here we formulate the loop invariant for the inner loop. The essence is that we know that everything to the left of (and including) must be sorted. This is initially valid, since is true as in (2.2). We still need to establish that this is indeed a loop invariant.
When the loop body is entered, we know both and . The latter is used to make our assertion stronger than the loop invariant.
In the case the elements are properly ordered, we can again apply the property as mentioned in the second case of (2.5).
Easy to see that implies . It is quite surprising that the assertion at this point does not say anything about variable .
We now establish that the outer loop invariant follows from the inner loop invariant, under the condition that the inner loop has terminated (so its test must be false). There are two cases:
- Case : we have an early exit of the inner loop. Since is , the left conjunct is vacuous. The right conjunct trivially follows from the inner loop invariant.
- Case : we know the inner loop has fully executed, so from the upper bound in the loop invariant. So must hold, regardless of the value of .
The postcondition is obtained by a double negation elimination on the test of the outer loop: implies , and so from we obtain that the array is actually sorted. Again, follows trivially from the outer loop invariant.

This concludes the correctness argument of the bozosort algorithm.

5 Conclusion

We have seen two sorting algorithms, and discussed their correctness proofs. Although almost every detail is present here, there still remains a good exercise for practicing with array variable substitutions: to write down the proof outline of swapping the array elements above, and working out in all detail how it affects the restricted permutation predicate .

In this article, we only look at program correctness in the sense of partial correctness. An interesting question remains: what can we say about the termination of these algorithms? Under what conditions do these algorithms terminate? In the following weeks of the Program Correctness course, we will look at total correctness, where we shall prove not only the correctness of a program with respect to a specification of its input and output behavior, but also whether the program terminates!

Acknowledgments

I thank Dominique Lawson and Roos Wensveen (both student assistants of the Program Correctness course) for suggesting improvements, and discovering an error in a previous version of this article. The error was that the postcondition needs to be and not (can you see why?). All remaining errors remain my own.

References

[1]

Krzysztof R. Apt and Ernst-Rüdiger Olderog. Fifty years of Hoare’s logic. Formal Aspects of Computing volume 31 (2019), pages 751–807. https://doi.org/10.1007/s00165-019-00501-3

Citation

BibTeX citation:

@article{hiep2023,
  author = {Hiep, Hans-Dieter A.},
  title = {Correctness of Two Sorting Algorithms},
  journal = {dr. heap},
  volume = {1},
  number = {2},
  date = {2023-03-15},
  url = {https://www.drheap.nl/articles/2023/correctness-of-two-sorting-algorithms/},
  doi = {10.59350/yzrje-rgv51},
  issn = {3050-5224},
  langid = {en}
}

For attribution, please cite this work as:

Hans-Dieter A. Hiep. Correctness of two sorting algorithms. dr. heap volume 1, issue 2 (March 2023). https://doi.org/10.59350/yzrje-rgv51

Can ChatGPT do an exam?

Hans-Dieter A. Hiep — Wed, 22 Feb 2023 23:00:00 GMT

In this article, I’ll publish the answers given by OpenAI’s ChatGPT for the final exam of the course Concepts of Programming Languages, a course I gave in the fall semester of 2022. I leave it to the reader to decide whether the given answers are correct or not ;-)

ChatGPT is a research project developed by OpenAI, which gives the following charter on their website: “OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity. We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome.” I have used the February 13 (2023) version of ChatGPT.

After creating an account and logging in, the user interface for ChatGPT gives the following disclaimer: “While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice.” So, we’d better beware of the answers…

I give it the following first input: Hi. I would like to ask you a number of exam questions about the course Concepts of Programming Languages.

ChatGPT: Of course, I’ll do my best to help you with your exam questions about Concepts of Programming Languages. What would you like to ask?

So, here we go. All quotations are generated by ChatGPT.

1 Question 1.

What distinguishes a high-level programming language from a low-level programming language? Choose one or more correct answers from below.

1. High-level programming languages are used by more programmers than low-level programming languages.
2. High-level programming languages are always compiled into low-level programming languages, whereas low-level programming languages are directly executed by hardware processors.
3. High-level programming languages allow (some) concepts to be expressed more directly, whereas low-level programming languages require (convoluted, error-prone) encodings to express the same concepts.
4. A low-level programming language allows the programmer to describe more efficient computations, whereas a high-level programming language describes less efficient computations.

ChatGPT: The correct answer is 3. High-level programming languages allow (some) concepts to be expressed more directly, whereas low-level programming languages require (convoluted, error-prone) encodings to express the same concepts.

Explanation: High-level programming languages are designed to provide a more abstract view of programming, allowing the programmer to express concepts and algorithms more easily and directly. They typically use more human-readable syntax and provide built-in abstractions for common programming tasks, such as memory management, string manipulation, and I/O operations. In contrast, low-level programming languages require the programmer to work more directly with the underlying hardware, requiring more detailed knowledge of the system architecture and often relying on manual memory management and other low-level operations. While low-level programming languages may offer greater control over the hardware, they can be more error-prone and harder to use effectively than high-level languages.

2 Question 2.

Give a short motivation for each choice of the question above, why you made your selection (that is, why it is correct if you selected it, or why it is incorrect if you did not select it). Keep your motivation short: write at most two sentences per choice.

ChatGPT: Sure, here are the motivations for each choice:

Incorrect: The number of programmers using a language is not a factor in distinguishing between high-level and low-level programming languages.

Incorrect: While some high-level programming languages may be compiled into low-level machine code, this is not always the case, and low-level languages can also be compiled.

Correct: High-level programming languages provide built-in abstractions and a more abstract view of programming, making it easier to express concepts and algorithms more directly. In contrast, low-level languages require more direct manipulation of the hardware, making it necessary to use convoluted encodings to express the same concepts.

Incorrect: The efficiency of computations is not directly related to whether a language is high-level or low-level. Both types of languages can be used to write efficient code, depending on the specific task and the skill of the programmer.

3 Question 3.

What statements about pure and impure expressions are true? Choose one or more correct answers from below.

1. There exists an expression that is considered impure if you look at its parts, but which has no effect on the program state after it is fully evaluated.
2. An expression is only impure if its evaluation requires a large amount of time or space.
3. Impure expressions must have no side-effect on the program state when they are evaluated.
4. Assignment statements with impure expressions on the right-hand side can be eliminated, resulting in an equivalent program without any impure expressions.

ChatGPT: The correct statement is 1. There exists an expression that is considered impure if you look at its parts, but which has no effect on the program state after it is fully evaluated.

Explanation: Pure expressions are those that have no side effects and always return the same result given the same input. Impure expressions, on the other hand, have side effects or may return different results for the same input. However, it is possible for an expression to be impure by definition but have no effect on the program state when evaluated. For example, an impure expression might generate a warning or log a message, but if the program does not use or check for these outputs, the expression is effectively pure.

Statement 2 is incorrect because the time or space required for evaluation is not a factor in determining whether an expression is pure or impure.

Statement 3 is incorrect because impure expressions by definition have side effects, so they do affect the program state when evaluated.

Statement 4 is incorrect because assignment statements with impure expressions on the right-hand side cannot always be eliminated without changing the behavior of the program. For example, if the impure expression generates a warning or logs a message, eliminating the assignment would result in the loss of this information.

4 Question 4.

Give a short motivation for each choice of the previous question above, why you made your selection (that is, why it is true if you selected it, or why it is not true if you did not select it). Keep your motivation short: write at most two sentences per choice.

(ChatGPT already did that.)

5 Question 5.

Which statements about exceptions are true? Choose one or more correct answers from below.

Numerical errors such as division by zero must immediately cause a program crash.
It is never possible to rethrow an exception that was caught before.
Throwing an exception always requires unwinding the whole call stack to find an exception handler.
An unhandled exception that was thrown could lead to a program crash.

ChatGPT: The correct statements are 3 and 4.

Explanation:

Incorrect: Numerical errors such as division by zero can be handled by throwing an exception, rather than causing a program crash.

Incorrect: It is possible to rethrow an exception that was caught before using the “throw” statement.

Correct: When an exception is thrown, the call stack is unwound until an appropriate exception handler is found. This means that the function that threw the exception and any functions that it called will be exited, but this does not necessarily mean that the whole call stack will be unwound.

Correct: If an exception is not caught by any handler, it will cause the program to terminate with an error message. This can lead to a program crash if the error is not handled gracefully.
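The behaviour discussed in this question can be tried out directly. The following is a minimal Python sketch, not part of the exam; the function names `divide`, `middle` and `outer` and the `log` list are hypothetical. It shows a division by zero being raised as an exception, caught, rethrown, and finally handled by a handler further up the call stack.

```python
log = []

def divide(a, b):
    # In Python, dividing by zero raises ZeroDivisionError instead of crashing.
    return a / b

def middle(a, b):
    try:
        return divide(a, b)
    except ZeroDivisionError:
        log.append("middle caught, rethrowing")
        raise  # rethrow the exception that was just caught

def outer(a, b):
    try:
        return middle(a, b)
    except ZeroDivisionError:
        # Unwinding stops here, at the first matching handler.
        log.append("outer handled")
        return None

result = outer(1, 0)
log.append("program continues")
```

In this sketch the caller of `outer` keeps running after the exception has been handled, so only part of the call stack is unwound.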

6 Question 6.

(ChatGPT already did that.)

7 Question 7.

Which statements about object-oriented programming are true? Choose one or more correct answers from below.

Dynamic dispatch is a calling convention, where the method body that is actually invoked depends on the run-time type of the callee.
Class inheritance may sometimes be incompatible with Liskov’s substitution principle (where code that works on an object of type A also works for any object that is a subtype of A).
Every class has an object as an instance.
Objects live on a heap, and in garbage collected languages reachable objects may be removed in the background while the program is running.

ChatGPT: The correct statements are 1, 2, and 4.

Explanation:

Correct: Dynamic dispatch is a key feature of object-oriented programming, where the method body that is actually invoked depends on the run-time type of the object that the method is invoked on.

Correct: Class inheritance can lead to violations of Liskov’s substitution principle if the subclass behavior is not compatible with the superclass behavior. For example, if a subclass violates the preconditions or postconditions of a superclass method, code that works on the superclass object may not work correctly on the subclass object.

Incorrect: While every object belongs to a class, not every class has an object as an instance. A class can be abstract or simply serve as a blueprint for creating objects.

Correct: Objects in object-oriented programming are typically allocated on a heap, and in garbage-collected languages, unreachable objects may be removed by the garbage collector while the program is running. This allows the programmer to focus on the program logic and not worry about memory management.
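Dynamic dispatch, as described in the first statement, can be illustrated with a small Python sketch. The classes `Animal`, `Dog` and `Cat` and the function `chorus` are hypothetical names chosen for illustration only.

```python
class Animal:
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):
        return "Woof"

class Cat(Animal):
    def speak(self):
        return "Meow"

def chorus(animals):
    # The call site is the same for every element; the method body that
    # runs is selected by the run-time class of each object.
    return [animal.speak() for animal in animals]

sounds = chorus([Animal(), Dog(), Cat()])
```

The function `chorus` is written once against `Animal`, yet each element answers with the method body of its own class.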

8 Question 8.

(ChatGPT already did that.)

9 Question 9.

Given the following context-free grammar in Backus-Naur Form (BNF):

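<expr> ::= (<expr>) | <expr> + <expr> | <expr> * <expr> | <num> | <var>
<num> ::= <digit> | <digit><num>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<var> ::= <letter> | <letter><var>
<letter> ::= a | b | c | ...
<stmt> ::= {<stmt>} | <var> := <expr> | if <expr> = 0 then <stmt> else <stmt> | <stmt>;<stmt>

The starting symbol is <stmt>.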

Which of the following strings are in the language defined by the grammar above? Choose one or more correct answers from below.

abc := a+b;
c := (((c)))
a := b + a; b := a - b
if (a + b) = 0 then a := b else b := a

ChatGPT: The correct strings are 1, 2, and 4.

Explanation:

Correct: The string abc := a+b; can be generated by the rule <var> := <expr>. The variable “abc” is a sequence of letters, which is generated by the rule <var>, and the expression “a+b” is generated by the rule <expr> + <expr> with <expr> being the variable “a” and the variable “b”, which are generated by the rule <var>.

Correct: The string (((c))) can be generated by the rule <expr> with <expr> being (<expr>) and <expr> being (<expr>) and <expr> being “c”.

Incorrect: The string a := b + a; b := a - b cannot be generated by the given grammar because it is not a valid <stmt>. The grammar allows only one assignment statement per <stmt>.

Correct: The string if (a + b) = 0 then a := b else b := a can be generated by the rule if <expr> = 0 then <stmt> else <stmt>, with <expr> being (a+b) generated by the rule <expr> + <expr>, and <stmt> being the assignment statements “a := b” and “b := a” generated by the rule <var> := <expr>.

10 Question 10.

Motivate, for each string in the previous question above, why you made your selection (i.e. why the string is in the language if you selected it, or why it is not if you did not select it).

(ChatGPT already did that.)

11 Question 11.

Is the grammar given above ambiguous? Give a reasoned answer (i.e. not only “yes” or “no”, but also why and how you found out).

ChatGPT: Yes, the grammar given above is ambiguous.

Reasoning:

Ambiguity in a context-free grammar means that there exists at least one string in the language that can be generated by the grammar in more than one way. In the given grammar, the expression + can be associated with either left or right recursion, which leads to different interpretations of the same input. For example, the input string "1 + 2 * 3" can be parsed as either "(1 + 2) * 3" or "1 + (2 * 3)", which have different values. This indicates that the grammar is ambiguous.

One way to demonstrate the ambiguity is by constructing a parse tree for an ambiguous string. For instance, consider the string "1 + 2 * 3". It can be parsed in two different ways, resulting in two different parse trees:
   +                    *
  / \                  / \
 1   *                +   3
    / \              / \
   2   3            1   2
The left tree corresponds to the interpretation "(1 + 2) * 3", while the right tree corresponds to the interpretation "1 + (2 * 3)". Since the grammar allows both of these parse trees, it is ambiguous.
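Ambiguity can also be checked mechanically by counting parse trees. The Python sketch below is an illustration, not part of the exam. It assumes that an expression is a parenthesized expression, a sum or product of two expressions, a number, or a variable, that numbers and variables are single tokens, and that whitespace is ignored. The function names are hypothetical.

```python
import re
from functools import lru_cache

def tokenize(text):
    # Numbers and variables become single tokens; whitespace is skipped.
    # Any other non-space character becomes a token that matches no rule.
    return tuple(re.findall(r"\d+|[a-z]+|[()+*]|\S", text))

def count_parse_trees(text):
    tokens = tokenize(text)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] as an expression."""
        if j - i == 1:
            # A single token must be a number or a variable.
            return 1 if re.fullmatch(r"\d+|[a-z]+", tokens[i]) else 0
        total = 0
        # Alternative: a parenthesized expression.
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # Alternatives: a sum or a product, split at every possible operator.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

Under these assumptions a string with more than one parse tree witnesses that the grammar is ambiguous. For example, `count_parse_trees("1 + 2 * 3")` counts one tree with `+` at the root and one with `*` at the root.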

12 Question 12.

Given the following context-free grammar of the (untyped) lambda calculus:

<expr> ::= <var> | λ<var><expr> | <expr><expr>

We use only single letter variables. Use the standard way of resolving ambiguity: application associates to the left, abstraction precedes application (that is, abstraction binds more strongly than application, or, in other words, abstraction has a higher precedence than application). Expressions can use parentheses for disambiguation (not shown in the grammar).

Draw the abstract syntax tree of the following lambda expressions (in black). Circle the free variable occurrences (in red). Show the binding of variables (in green).

xyz
λxxx
(λyxλx(yx))y

ChatGPT: 1.
     xyz
      |
    -----
   |     |
   x     y
        / \
       x   z
Variable occurrences: x, y, z. All three are free.

Binding of variables: none.
2.
      λxxx
        |
      -----
     |     |
     λ     x
          |
          x
Variable occurrences: none.

Binding of variables: x is bound by the abstraction λxxx.
3.
          λyx
            |
         --------
        |        |
        x     --------
             |        |
             y     -----
                   |     |
                   x     y
                   |
                   y
Variable occurrences: y appears once as a free variable.

Binding of variables: x is bound by the first abstraction λx(yx), and y is bound by the second abstraction λyx.

13 Question 13.

Explain the meaning (what kind of statement it is) and the execution (what happens when the statement is executed) of each statement of the following program with dynamic memory management:

int *x = new(5);
int *z = null;
{
  int *y = new(6);
  z := y;
  delete z;
  *x := *y;
}

ChatGPT: This is a program written in C++ with dynamic memory management. Here’s what each statement does:

int *x = new(5);: This statement dynamically allocates memory for an integer array of size 5 and assigns the address of the first element to the variable x.

int *z = null;: This statement initializes a pointer variable z to null.

{: This is the beginning of a block statement.

int *y = new(6);: This statement dynamically allocates memory for an integer array of size 6 and assigns the address of the first element to the variable y.

z := y;: This statement assigns the value of y to z, so z now points to the same memory location as y.

delete z;: This statement deallocates the memory that z points to, which is the memory that y also points to. This means that the memory for the integer array of size 6 that was allocated on the heap is now freed.

*x := *y;: This statement copies the value of the first element of the integer array that y points to, which is also the value of the first element of the integer array that x points to. This means that the value of x is now equal to the value of y.

The execution of this program involves allocating and deallocating memory on the heap and copying values between memory locations. When the program runs, it first allocates memory for the integer array of size 5 and assigns its address to x. Then it initializes z to null. Inside the block statement, it allocates memory for the integer array of size 6 and assigns its address to y. It then assigns the value of y to z, so z points to the same memory location as y. It then deallocates the memory that z points to, which also frees the memory that y points to. Finally, it copies the value of the first element of the integer array that y points to, which is also the value of the first element of the integer array that x points to. After the block statement, the memory for the integer array of size 5 is still allocated and assigned to x, while the memory for the integer array of size 6 that was allocated inside the block statement has been deallocated.
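The pointer manipulation in this program can be traced with a small simulation. The Python sketch below is an illustration under assumptions, not the exam's reference semantics: it assumes that `new(v)` allocates one fresh heap cell initialized to `v`, and the class `Heap` with its methods is a hypothetical model.

```python
class Heap:
    """A toy heap: a mapping from addresses to stored values."""

    def __init__(self):
        self.cells = {}
        self.next_address = 1

    def new(self, value):
        # Assumption: new(v) allocates one fresh cell initialized to v.
        address = self.next_address
        self.next_address += 1
        self.cells[address] = value
        return address

    def delete(self, address):
        self._check(address)
        del self.cells[address]

    def read(self, address):
        self._check(address)
        return self.cells[address]

    def write(self, address, value):
        self._check(address)
        self.cells[address] = value

    def _check(self, address):
        if address not in self.cells:
            raise RuntimeError(f"access to unallocated address {address}")


def run_program():
    heap = Heap()
    x = heap.new(5)      # int *x = new(5);
    z = None             # int *z = null;
    y = heap.new(6)      # int *y = new(6);
    z = y                # z := y;    z and y are now aliases
    heap.delete(z)       # delete z;  the cell that y points to is gone too
    try:
        heap.write(x, heap.read(y))   # *x := *y;
        return "ok"
    except RuntimeError as error:
        return str(error)
```

In this model the last statement reads through `y` after the cell it points to was deleted through its alias `z`, so the simulation reports an error. A real implementation need not detect this; in C or C++ such a read is undefined behaviour.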

14 Question 14.

Consider the following two programs (on the left and right) that are executed concurrently (x and y are shared integer variables):

x := 5;                       y := 3;
y := 2;                       x := 2;
if (x < 5) {                  x := x - y;
  y := 4                      y := y + 6;
}                             print x
x := x + y;

Give five different executions of the program, i.e. five executions that each prints a different value. The result (of the five executions) alone is not sufficient. Please also describe, for each execution, what happened/how you obtained the result.

ChatGPT: Here are five possible executions:

Execution 1:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 1: x := x + y;
Thread 2: x := x - y; // x = 2 - 3 = -1
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: print x; // x = 9

Execution 2:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 1: print x; // x = 9

Execution 3:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: print x; // x = -7

Execution 4:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: x := x - y; // x = 2 - 9 = -7
Thread 1: print x; // x = 9

Execution 5:
Thread 1: x := 5;
Thread 2: y := 3;
Thread 2: x := 2;
Thread 1: y := 4;
Thread 1: if (x < 5) { // Condition is false, skip if statement
Thread 2: y := y + 6; // y = 3 + 6 = 9
Thread 1: x := x + y; // x = 5 + 4 = 9
Thread 2: print x; // x = -7

In general, the value printed depends on the order in which the threads execute their statements and whether the if statement is executed or skipped. In each execution, we simulate the interleaving
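The possible printed values can also be enumerated by brute force. The Python sketch below is an illustration, not part of the exam. It assumes that every assignment, the evaluation of the condition, and the print each execute as one atomic step, and that x and y start at 0 (every read is preceded by a write in the same program, so the initial values do not affect the result). The function names are hypothetical.

```python
def left(pc, s):
    """Execute one atomic step of the left program; return the next pc."""
    if pc == 0:
        s["x"] = 5
        return 1
    if pc == 1:
        s["y"] = 2
        return 2
    if pc == 2:
        # Evaluate the condition: enter the body or skip it.
        return 3 if s["x"] < 5 else 4
    if pc == 3:
        s["y"] = 4
        return 4
    s["x"] = s["x"] + s["y"]
    return 5

def right(pc, s):
    """Execute one atomic step of the right program; return the next pc."""
    if pc == 0:
        s["y"] = 3
    elif pc == 1:
        s["x"] = 2
    elif pc == 2:
        s["x"] = s["x"] - s["y"]
    elif pc == 3:
        s["y"] = s["y"] + 6
    else:
        s["printed"] = s["x"]
    return pc + 1

def printed_values():
    """Explore every interleaving and collect the values printed."""
    results = set()

    def explore(lpc, rpc, s):
        if lpc == 5 and rpc == 5:
            results.add(s["printed"])
            return
        if lpc < 5:
            t = dict(s)  # copy the state so branches stay independent
            explore(left(lpc, t), rpc, t)
        if rpc < 5:
            t = dict(s)
            explore(lpc, right(rpc, t), t)

    explore(0, 0, {"x": 0, "y": 0, "printed": None})
    return results
```

Calling `sorted(printed_values())` lists every value that some interleaving prints under these assumptions. If statements such as `x := x + y` are not atomic, further outcomes may be possible.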

15 Conclusion

(Note that I have removed some questions which are hard to give as input to ChatGPT textually.)

Decide for yourself what you think of the answers. I am impressed: the answers generated by ChatGPT are closely related to the questions and make sense grammatically. However, a closer look reveals that the chatbot does not give reasonably correct answers to all questions.

Acknowledgments

Cover image credit: Levart Photographer

Citation

BibTeX citation:

@article{hiep2023,
  author = {Hiep, Hans-Dieter A.},
  title = {Can {ChatGPT} Do an Exam?},
  journal = {dr. heap},
  volume = {1},
  number = {1},
  date = {2023-02-23},
  url = {https://www.drheap.nl/articles/2023/can-chatgpt-do-an-exam/},
  doi = {10.59350/tyv9n-sw306},
  issn = {3050-5224},
  langid = {en}
}

For attribution, please cite this work as:

Hans-Dieter A. Hiep. Can ChatGPT do an exam? dr. heap volume 1, issue 1 (February 2023). https://doi.org/10.59350/tyv9n-sw306