Let \(\mathcal{I}_2\) be the set of tuples \((a_0, a_1, a_2, \ldots)\) of nonnegative integers, such that all but finitely many \(a_r\) are nonzero. We impose the lexicographic total ordering on \(\mathcal{I}_2:\) given \(a = (a_0, a_1, a_2, \ldots)\) and \(b =(b_0, b_1, b_2, \ldots)\) in \(\mathcal{I}_2,\) take the largest \(r\) such that \(a_r \neq b_r,\) and define \(a > b\) if and only if \(a_r > b_r.\)
We have described a bijection between trees of depth at most \(2\) and tuples in \(\mathcal{I}_2.\) The theorem (for depth \(2\)) now follows from the following two lemmas, which are left as an exercise to the reader.
Lemma. Any legal move changes a tuple \(a \in \mathcal{I}_2\) to another tuple \(b\) such that \(b < a\) for our total ordering.
Lemma. The set \(\mathcal{I}_2\) is well-ordered: it contains no infinite descending chain.
We will construct recursively a well-ordered set \(\mathcal{I}_d\) which classifies the isomorphism types of trees of depth at most \(d.\) Specifically, we will construct \(\mathcal{I}_{d+1}\) as a set of tuples of nonnegative integers, where instead of indexing by \(\mathbb{N}\) we index by \(\mathcal{I}_d.\) Precisely, \(\mathcal{I}_{d+1}\) is the set of functions \(a: \mathcal{I}_d \rightarrow \mathbb{N}\) (we notate such a function \(i \mapsto a_i\)) such that \(a_i = 0\) for all but finitely many \(i \in \mathcal{I}_d.\)
Again, we impose on \(\mathcal{I}_{d+1}\) the lexicographic total ordering: given any \(a, b \in \mathcal{I}_{d+1},\) choose \(i \in \mathcal{I}_d\) maximal such that \(a(i) \neq b(i);\) and take \(a > b\) if and only if \(a(i) > b(i).\) (Only finitely many \(i\) satisfy \(a(i) \neq b(i),\) so there is no difficulty in choosing the maximal one.)
Just like before, there is a bijection between elements of \(\mathcal{I}_{d+1}\) and isomorphism types of trees of depth at most \(d+1:\) given a tree \(T,\) produce the tuple which records for each \(i \in \mathcal{I}_d\) the number of copies of \(T_i\) that are immediate subtrees of \(T.\)
The result now follows from the following two lemmas, the first of which we again leave as an exercise.
Lemma. Any legal move changes a tuple \(a \in \mathcal{I}_{d+1}\) to another tuple \(b\) such that \(b < a\) for our total ordering.
Lemma. The set \(\mathcal{I}_{d+1}\) is well-ordered: it contains no infinite descending chain.
Proof. Suppose \(a^{(1)} \geq a^{(2)} \geq a^{(3)} \geq \cdots\) is an infinite descending chain in \(\mathcal{I}_{d+1}.\) We want to show that eventually the chain stabilizes: there exists \(N\) such that \(a^{(n)} = a^{(N)}\) for \(n \geq N.\)
\((\)A remark about notation: \(a^{(N)} \in \mathcal{I}_{d+1}\) is a tuple in our infinite descending chain of tuples. For \(i \in \mathcal{I}_d,\) the \(i\)-th entry of this tuple is \(a^{(N)}_i.)\)
Let \(j \in \mathcal{I}_d\) be an index; we say that the sequence \((a^{(n)})\) stabilizes to the right of index \(j\) if there exists a positive integer \(N\) such that
$$a^{(n)}_i = a^{(N)}_i$$
whenever \(i \geq j\) and \(n > N.\)
Since \(\mathcal{I}_d\) is well-ordered, we can choose \(j\) minimal such that our sequence stabilizes to the right of \(j;\) also choose \(N\) as above, so that \((a^{(n)})_i = (a^{(N)})_i\) whenever \(i \geq j\) and \(n > N.\)
The tuple \(a^{(N)}\) has only finitely many nonzero entries. If \(a^{(N)}_k = 0\) for all \(k < j\), then the sequence stabilizes at \(a^{(N)}\) and we are done. Otherwise, choose \(k\) maximal such that \(k < j\) and \(a^{(N)}_k > 0.\) (There are only finitely many \(k\) with \(a^{(N)}_k > 0,\) so again there is no harm in choosing the maximum.)
I claim that in fact the sequence stabilizes to the right of index \(k\), contradicting our choice of \(j.\) Consider an index \(i \geq k.\) If \(i \geq j\) then we know \((a^{(n)})_i = (a^{(N)})_i\) for \(n > N.\) If \(k < i < j\) then \(a^{(N)}_i = 0;\) and by the lexicographic ordering on \(\mathcal{I}_{d+1}\) we know that \(a^{(n)}_i = 0\) for \(n > N\) as well. So we only need to worry about \(i = k.\) Again by lexicographic ordering, the sequence
$$a^{(N)}_k, a^{(N+1)}_k, a^{(N+2)}_k, \ldots$$
is a nonincreasing sequence of natural numbers. Any such sequence in \(\mathbb{N}\) eventually stabilizes. So our sequence \((a^{(n)})\) stabilizes to the right of \(k\) as well, contradicting our choice of \(j.\) \(\blacksquare\)
In closing I'll just mention that the sets \(\mathcal{I}_d\) are examples of ordinals; interpreting them as such gives rise to the proofs linked to at the very beginning of this post.