A Proof of the Schröder-Bernstein Theorem in ACL2
Recently, Dr. Freek Wiedijk’s Formalizing 100 Theorems list made the rounds within the ACL2 community. One of the theorems on the list, which at that point had not been proved within ACL2, was the Schröder-Bernstein Theorem. I had remembered trying to prove this theorem in Coq back in the day while fleshing out a theory of cardinality, before I realized it had already been proven elsewhere.
The Schröder-Bernstein theorem states that, for sets $A$ and $B$, if there exists an injection from $A$ to $B$, and an injection from $B$ to $A$, then there must exist a bijection between the two sets. This is equivalent to saying that the ordering of cardinal numbers is antisymmetric. The statement seems rather simple, but I had no real intuition as to how one might construct such a bijection until reading existing proofs.
I found the proof described here to be direct and easy to follow. Before constructing the bijection, we introduce a theory of “chains”. Let’s say that $f : A \rightarrow B$ and $g : B \rightarrow A$ are our two injections. A chain is a subset of $A \cup B$ whose elements are reachable to one another via repeated applications of $f$ and $g$, or their inverses. So the element $a \in A$ belongs to the chain: $\lbrace\ \dots\ ,\ f^{-1}(g^{-1}(a)),\ g^{-1}(a),\ a,\ f(a),\ g(f(a)),\ \dots\ \rbrace$. Similarly, $b \in B$ belongs to the chain: $\lbrace\ \dots\ ,\ g^{-1}(f^{-1}(b)),\ f^{-1}(b),\ b,\ g(b),\ f(g(b)),\ \dots\ \rbrace$. We may think of chains as a (potentially cyclical) sequence. Of course, $f^{-1}$ and $g^{-1}$ are not necessarily defined on all input, so sequences proceed rightward infinitely (possibly entering a cycle), but may have a “beginning” when traced leftward where the next inverse to be applied is undefined.
Note that the set of chains partitions $A \cup B$. That is, this notion of chains induces an equivalence class within $A \cup B$.
Among chains which have an initial element (an element for which $f^{-1}$ or $g^{-1}$ is undefined), we shall call chains which begin in $A$ “$A$-stoppers”, and the others “$B$-stoppers”. We then claim that for any given $a \in A$, $g^{-1} :~A \rightarrow B$ is a bijection on $a$’s chain if the chain is an $B$-stopper. Otherwise (regardless of whether the chain is an $A$-stopper or a non-stopper), $f$ is a bijection on the chain. It follows then that the function $h$, defined below, is a bijection from $A$ to $B$.
$$ h(a) = \begin{cases} g^{-1}(a) &\text{if } stopper_B(a)\\ f(a) &\text{otherwise} \end{cases} $$
To see the details, I’d encourage you to follow the earlier link to a full proof sketch, as the focus here is more on the ACL2 formalization.
ACL2
As Dr. Matt Kaufmann will tell you, ACL2 was not primarily intended for this sort of proof. It is much more a system for efficient and pragmatic modeling and reasoning over software and hardware systems than it is, say, a compelling system for mathematical foundations. This is likely why we’re not higher in Freek’s list (at the time of writing, ACL2 and its predecessor nqthm have proved 45/100 theorem, while Isabelle leads with 89/100); most who are concerned primarily with pure mathematics will be drawn to other provers.
Whatever. Let’s prove it anyway. It should be fun!
Setup
Before we consider the proof, we must first determine how to even state the theorem in ACL2. Instead of using sets, we will use predicates, which are of course equivalent. Really, we just use functions. Recall that any non-nil
value is considered “true” in ACL2, so we need not require that a function return a boolean to be considered a predicate. To remind us that a function is being thought of as a predicate, we’ll use function names p
and q
(these will replace our sets $A$ and $B$).
Recall also that the ACL2 logic is untyped, and a function is defined for every value. So we cannot really define a function logically restricted to a particular domain. Instead, we define the function as we wish for the domain in question, and give it an arbitrary value outside of it. To say then that a function f
has domain p
and codomain q
, it suffices to show (implies (p x) (q (f x)))
for all x
.
The real obstacle to stating our theorem is that ACL2 does not have higher-order functions (although there has been some work toward pseudo-second-order functions [1, 2]). Our solution will be to avoid universal quantification by instead introducing a generic definition. That is, instead of saying “for any function $f$”, we will say “let us define some function $f$, and then throw away its definition”. In ACL2, we can do this with an encapsulate
event.
For example, say we wish to introduce an arbitrary function f
with arbitrary domain and codomain p
and q
. We may do so with the following encapsulate
.
(encapsulate
(((p *) => *)
((q *) => *)
((f *) => *))
(local (define p (x) x))
(local (define q (x) x))
(local (define f (x) x))
(defrule q-of-f-when-p
(implies (p x)
(q (f x)))))
Here we define each function to be the identity, but this is unimportant; they are declared to be local
to the encapsulate
, and therefore their definitions are limited to its scope. The theorem capturing the domain and codomain of f
, however, is non-local, and therefore will survive outside the encapsulate
.
To set up the Schröder-Bernstein theorem, we use an encapsulate
similar to the example above, introducing not one but two functions, and also asserting their injectivity.
(encapsulate
(((f *) => *)
((g *) => *)
((p *) => *)
((q *) => *))
(local (define p (x) (declare (ignore x)) :enabled t t))
(local (define q (x) (declare (ignore x)) :enabled t t))
(local (define f (x) :enabled t x))
(local (define g (x) :enabled t x))
(defrule q-of-f-when-p
(implies (p x)
(q (f x))))
(defrule injectivity-of-f
(implies (and (p x)
(p y)
(equal (f x) (f y)))
(equal x y))
:rule-classes nil)
(defrule p-of-g-when-q
(implies (q x)
(p (g x))))
(defrule injectivity-of-g
(implies (and (q x)
(q y)
(equal (g x) (g y)))
(equal x y))
:rule-classes nil))
Inverses
With the premises out of the way, we can start our actual proof.
First, we’ll want to define inverses for f
and g
. For this, we’ll use ACL2’s defchoose
event. defchoose
permits you to define non-computational functions via a choice mechanism. Essentially, one provides defchoose
with a predicate, and it defines a new function which is logically constrained to return a witness to the predicate, if one exists. This is how universally and existentially quantified statements are defined in ACL2.
An inverse function may be seen as providing a witness to the invertivilty of some element in the codomain. Therefore, we can use defchoose
to define our inverse.
(define is-f-inverse (inv x)
(and (p inv)
(q x)
(equal (f inv) x)))
(defchoose f-inverse (inv) (x)
(is-f-inverse inv x))
(define has-f-inverse (x)
(is-f-inverse (f-inverse x) x))
The definition of g-inverse
and its associates mirrors the above.
defchoose
gives us a nice way to deal with inverses, but there is no getting around that many theorems will involve explicitly providing witnesses as proof hints. This isn’t ideal; the prevailing ACL2 style is to do as much as possible with rewrite rules, and only provide a theorem with a couple enable
/disable
hints.
Chains
After fleshing out a straightforward rewrite theory from our premises and inverse functions, we can start to define our notion of chains. First we note that our predicates p
and q
might not be disjoint. A chain segment therefore can not be just a list of values, since we wouldn’t be able to discern which predicate the value was intended to satisfy. Our chains are going to be made up of values annotated with their “polarity”, a boolean indicating membership in either p
or q
.
We avoid actually “constructing” chains, as they are, in general, infinite. All that is necessary is to define our equivalence relation with asserts two elements are in the same chain. This is easy enough. First we define a chain<=
operator, which says that some element is equal to or to the left of some other element on a chain. That is, there exists some finite number (perhaps 0) of alternating applications of f
and g
such that we start at the left element and end at the right element. Then chain=
following obviously from that.
;; `(chain-steps x n)` proceeds `n` steps to the right of chain element `x`.
(define-sk chain<=
((x chain-elem-p)
(y chain-elem-p))
(exists n
(equal (chain-steps x (nfix n)) y)))
(define chain=
((x chain-elem-p)
(y chain-elem-p))
(if (mbt (and (chain-elem-p x)
(chain-elem-p y)))
(or (chain<= x y)
(chain<= y x))
(equal x y)))
;; ...
(defequiv chain=)
Recall that mbt
(= “must be true”) is proved to always be true in guard-verified execution. Logically, it is just the identity, but in execution it is replaced with the constant t
. Its purpose here is to ensure that chain=
is an equivalence relation even on guard-violating inputs. ACL2 does not support conditional equivalence relations, and even if it did, it is good practice to use definitions which are well-behaved outside of the guard domain so as to avoid unnecessary hypotheses in one’s rules.
We’ll next need to capture our notion of “initial” elements of a chain.
(define initial-p ((x chain-elem-p))
(if (polarity x)
(not (has-g-inverse (chain-elem x)))
(not (has-f-inverse (chain-elem x)))))
(define initial-wrt (initial (x chain-elem-p))
(and (chain-elem-p initial)
(initial-p initial)
(chain<= initial x)))
(defchoose get-initial (initial) (x)
(initial-wrt initial x))
(define has-initial ((x chain-elem-p))
(initial-wrt (get-initial x) x))
We define an initial chain element to be one whose value does not have an f
or g
inverse (depending on the polarity). We could have just as well defined initial-p
in terms of chain<=
. I.e. (initial-p x)
might have been defined as (forall (y) (implies (chain<= y x) (equal x y)))
. This definition involves a quantifier though, which we try to minimize in ACL2. Instead we prove this property as a consequence1.
Again we see an appearance of defchoose
, this time in the get-initial
function. We have no choice2 here, since we can’t define the function computationally. We could walk backwards along the chain looking for an initial element or cycle, but we would have no way to tell if the chain proceeds backwards infinitely.
Finally we can define in-q-stopper
, mirroring $stopper_B$ from the informal sketch.
(define in-q-stopper ((x chain-elem-p))
(and (has-initial x)
(not (polarity (get-initial x)))))
The first conjunct states that the chain is a stopper, and the second that the polarity of the initial element matches q
.
The Witness
Finally we have enough to define our bijective witness, sb-witness
, mirroring $h$ from the initial sketch.
;; `(make-chain-elem t x)` just makes `x` into a chain element, giving it
;; polariy `t` to indicate its membership in `p`.
(define sb-witness ((x p))
(if (in-q-stopper (make-chain-elem t x))
(g-inverse x)
(f x)))
At this point, I will omit a whole lot of rewrite rules and intermediate theorems, as I really have been doing this whole time, to show you the conclusion, summarized below.
(define-sk exists-sb-inverse (x)
(exists inv
(and (p inv)
(equal (sb inv) x))))
(defrule q-of-sb-when-p
(implies (p x)
(q (sb x))))
(defrule injectivity-of-sb
(implies (and (p x)
(p y)
(equal (sb x) (sb y)))
(equal x y)))
(defrule surjectivity-of-sb
(implies (q x)
(exists-sb-inverse x)))
I have spared the reader from all of intermediate proofs because, truthfully, none of them are all that interesting in isolation. If you want to see the full proof, you’ll have to check it out in the ACL2 community books!
Footnotes
Actually, the rule we prove is stronger, and a better rewrite rule: https://github.com/acl2/acl2/blob/master/books/projects/schroder-bernstein/chains.lisp#L429-L437 ↩︎
Pun intended. ↩︎