Compactness. Part 1: Degrees of freedom

by Willie Wong

This series of posts has its genesis in a couple questions posted on It is taking me a bit longer to write this than expected: I am not entirely sure what level of an audience I should address the post to. Another complication is that it is hard to get ideas organized when I am only writing about this during downtime where I hit a snag in research. The main idea that I want to get to is “a failure of sequential compactness of a domain represents the presence of some sort of infinity.” In this first post I will motivate the definition of compactness in mathematics.

We start with the pigeonhole principle: “If one tries to stuff pigeons into pigeonholes, and there are more pigeons than there are pigeonholes, then at least one hole is shared by multiple pigeons.” This can be easily generalized the the case where there are infinitely many pigeons, which is the version we will use

Pigeonhole principle. If one tries to stuff infinitely many pigeons into a finite number of pigeonholes, then at least one hole is occupied by infinitely many pigeons.

Now imagine being the pigeon sorter: I hand you a pigeon, you put it into a pigeonhole. However you sort the pigeons, you are making a (infinite) sequence of assignments of pigeons to pigeonholes. Mathematically, you represent a sequence of points (choices) in a finite set (the list of all holes). In this example, we will say that a point in the set of choices is a cluster point relative to a sequence of choices if that particular point is visited infinitely many times. A mathematical way of formulating the pigeonhole principle then is:

Let S be a finite set. Let (a_n) be a sequence of points in S. Then there exist some point p \in S such that p is a cluster point of (a_n).

The key here is the word finite: if you have an infinitely long row of pigeonholes (say running to the right of you), and I keep handing you pigeons, as long as you keep stepping to the right after stuffing a pigeon into a hole, you can keep a “one pigeon per hole (or fewer)” policy.

If we now take S = [0,1] to be the set of all real numbers between 0 and 1, however, the pigeonhole principle fails. That set is uncountably infinite, as Cantor would say, and if we consider each number in S to be a unique pigeonhole, I can stuff the first pigeon into the hole labeled 1, the second labeled 1/2, the nth pigeon into the hole labeled 1/n, and so on, and no pigeon will share a hole with another pigeon. On the other hand, if you look at the sequence of points given by (1/n), you see that they gradually approach the point 0. So it is somewhat dissatisfying to directly extend the notion of a cluster point for the case where S is finite, to the case where S is infinite.

What our intuition tells us is that we need some notion that two points in S is close; or equivalently, we need some “fuzzy” notion of pigeonholes. For the set S = [0,1], we have a natural way of measuring closeness: the distance between two points x,y in S is defined as the absolute value of their difference |x-y|. And using the distance function to define fuzziness, we arrive at the following version of the fuzzy pigeonhole principle:

A “fuzzy” pigeonhole principle for [0,1]. Let S = [0,1], the set of numbers greater-than-or-equal to 0 and less-than-or-equal-to 1; and let (a_n) be a sequence of points in S. Then for any positive \delta, which represents the degree of fuzziness, there exists some point s_0 in S such that there are infinitely many points in (a_n) whose distance to s_0 is less than \delta.

The fuzzy pigeonhole principle is a direct consequence of the original pigeonhole principle: for any fixed \delta > 0, we can divide S into a bunch of smaller intervals [0,\delta), [\delta, 2\delta), [2\delta, 3\delta), \ldots . There are at most N such intervals, where (N-1) is the smallest integer bigger than 1/\delta. So we’ve divided S into a finite number of smaller intervals, where any two points taken from the same sub-interval have distance less than \delta. Now use the finite pigeonhole principle on the sequence, but replacing the set S by the set S' of the sub-intervals. So there must be a particular sub-interval [c,d) such that infinitely many points of (a_n) land inside it. Then any point inside the sub-interval will qualify as the point s_0 we seek.

In view of the fuzzy pigeonhole principle, we can say that a point s \in S is a cluster point of (a_n) if it can serve as the point s_0 for any positive \delta. Observe that not every s_0 is a cluster point. If we fix the fuzziness scale to be \delta = 1/2, then the number s_0 = 1/3 is one of the points that satisfies the fuzzy pigeonhole principle for the sequence (1/n). In fact, all but the first term a_1 = 1 lies within distance 1/2 of s_0. But if we shrink the fuzziness scale to be \delta = 1/4, then 1/3 no longer serves as a viable choice for s_0. In fact, in this example, the only cluster point is s = 0.

But the cluster point of a sequence need not be unique. Let the sequence (a_n) be given by

a_n = 1 / n if n is odd, and a_n = 1 - 1/n if n is even.

then it is a small exercise to see that both 0 and 1 are cluster points of this sequence. (In the finite case, the analogue is the fact that with infinite pigeons to put into a finite number of holes, there can be more than one hole with infinitely many pigeons.)

Compactness for metric spaces / sequential compactness
The pigeonhole principle, by itself, discriminates between finite and infinite sets. We’ve just shown that for a particular infinite set, if we use the notion of a distance to make things fuzzier, we can recover a version of the pigeonhole principle. Can we do this for arbitrary infinite sets? The answer is no. Let S be the set of all natural numbers. Again we have a infinite set. This infinite set is “smaller” than [0,1], in the Cantor sense that it is countably infinite. But on this set if we use the standard distance function — absolute value of the difference between two numbers — we see that the distance between any two points is at least 1. So if we pick 1 > \delta > 0, the fuzziness factor does not make the set look any different. In other words, if the infinite set cannot be divided into a finite collection of small (in the distance sense) sets, the fuzzy pigeonhole principle cannot hold.

On the other hand, if we use a funny distance on the set of natural numbers, we can recover the pigeonhole principle! If we define the distance between two natural numbers as the inverse distance: dist(a,b) = | 1/a - 1/b|, then we have the fuzzy pigeonhole principle. This is because for any fixed \delta > 0, we can let N be the smallest natural number such that N\delta > 1. Then for any two numbers a,b \geq N, we have that dist(a,b) = |1/a - 1/b| \leq 1 / min(a,b) \leq 1 / N \leq \delta. So we can divide the natural numbers into N small sets: a subset for each of the numbers going from 1 to (N-1), and one last subset for all the numbers bigger-than-or-equal-to N. Then running the argument as in the [0,1] case, we can appeal to the finite pigeonhole principle and conclude that infinitely many choices must reside in one of the subsets, and any representative point in that subset is the s_0 we want.

So we see that the fuzzy pigeonhole principle depends not just on the set we choose, but also on the distance function we define!

Definition. Metric space
A metric space is a set S endowed with a distance function dist, which takes as input two elements of S and outputs a non-negative real number. The distance function is required to satisfy the following properties

  • dist(x,x) = 0: there’s no distance between a point and itself.
  • If dist(x,y) = 0, then x = y: two different points must have a positive distance between them.
  • dist(x,y) = dist(y,x): the order does not matter in the input for the distance function; another way of looking at it is saying that the distance from x to y is the same as the distance from y to x.
  • The triangle inequality dist(x,y) + dist(y,z) \geq dist(x,z): if two points are both close to a given third point, then the two points cannot be too far apart.

We can then classify metric spaces into those for which the fuzzy pigeonhole principle holds, and those for which it doesn’t.

Definition. Sequentially pre-compact
A metric space (S,dist) is said to be (sequentially) pre-compact if for any positive number \delta_0, and for any given sequence of points (a_n) in S, we can find a point s_0 in S such that infinitely many points in the sequence has a distance dist(a_n,s_0) < \delta_0.

(The word “sequentially” is used to distinguish the definition from the topological definition. As it turns out, however, the word “sequentially” can be safely omitted here, as the two definitions agree for metric spaces. It is only for more unreasonable generalizations that a distinction need to be made.) (For those of you who already had a bit of analysis, the condition given, in view of the triangle inequality, is equivalent to saying that any sequence has a Cauchy subsequence.) A slightly stronger definition is

Definition. Sequential compactness
A metric space (S,dist) is said to be (sequentially) compact if for any given sequence of points (a_n), we can find a cluster point s\in S. In other words, for any sequence there exists a point s such that for any \delta > 0 infinitely many points of (a_n) has distance dist(a_n,s) < \delta.

(And here we see that syntactical order in the statement is important in mathematics.) The difference between pre-compact and compact is best illustrated by example: consider the sets S_N and S, the former being all natural numbers less than N, the latter being all natural numbers. On the two sets we use the same inverse distance function as described before dist(a,b) = | 1/a - 1/b|. For any N, S_N, being a finite set, is sequentially compact: any “hole” described by the finite pigeonhole principle is a cluster point. S, however, is only pre-compact (which we’ve already shown): consider the sequence of numbers a_n = n in S. For any fixed element s \in S, if we take \delta  2s, dist(s,a_m) > 1/(2s) > \delta, so there can be no more than 2s elements whose distance to s is closer than 1/(3s). This means that s cannot be a cluster point. So for this particular sequence, there does not exists a cluster point in S, and so S is not sequentially compact.

While it is important to make the distinction between compact and pre-compact spaces, we will not dwell too much upon it: the discussion to follow will only really need the fuzzy pigeonhole principle (i.e. pre-compactness).

Uniform spaces and compactness
(I will not discuss compactness from the point of view of point-set topology. It is a useful notion, to be sure, but since my motivation is more analysis-heavy, I’ll skip it for now. Furthermore, a discussion of topological compactness cannot be complete without relating the various notions of compactness to each other, which necessitates many definitions and complications I don’t want to get into; this is especially since all the notions of compactness agree on spaces we really care about in analysis. Instead, I’ll present the same definition, just motivated from the uniform space point of view, which is a more natural setting for analysis.)

Strictly speaking, the distance function is not necessary for getting a concept of compactness. That is, we don’t actually need to know the numerical value of the distance between points, we just need some notion of points being relatively close to each other. This is captured by the notion of an uniform space.

Definition. Uniform space
A uniform space is a set S endowed with a uniform structure \Phi.

Well, that’s useful! A uniform structure is a replacement for the distance function.

Definition. Uniform structure (a.k.a. uniformity)
A uniform structure \Phi on a set S is a collection of subsets of S\times S, the set of all ordered pairs with values in S. \Phi is required to satisfy the axioms of uniformity:

  • Every element of \Phi contains the diagonal \{(x,x) : x\in S\} as a subset: this says that any point is close to itself, analogous to the condition that dist(x,x) = 0 for a distance function.
  • The only points contained in all of the elements of \Phi is the diagonal. In other words, \cap_{U\in \Phi} U = the diagonal. This is the condition of separability, analogous to the statement that any two distinct points must have positive distance between them. [Note, quite often this separability axiom is not assumed as necessary. Indeed, much of the theory can be done without it. But for our case it is rather necessary.]
  • \Phi is symmetric: if U is an element of \Phi, then the set V = \{ (x,y): (y,x)\in U\} is also an element. This says that closeness is symmetric, and replaces the condition that dist(x,y) = dist(y,x).
  • The existence of a “half-as-large” measure: if U is an element of \Phi, then there exists another element V such that if (x,z)\in V and (z,y)\in V, then (x,y)\in U. This is a replacement of the triangle inequality.

\Phi is also required to be a filter:

  • Let U,V be subsets of S\times S. If U,V are both in \Phi, then so is their intersection U\cap V.
  • Let U,V be subsets of S\times S. If U contains V as a subset, and V is an element of \Phi, then U is also an element.

A subset of S\times S that is an element of \Phi is called an entourage (French for neighborhood). Given an entourage U and a point x \in S, we will write U[x] as the subset of S given by \{ y: (x,y) \in U\}. An entourage therefore is a rule that shows, for each point in S, which points are close to it. The main difference between measuring using an entourage and measuring using a distance function is the following: the distance function is totally ordered. Given a point x, and two values d_1, d_2, we can write down the sets S_1, S_2 of points whose distance to x is at most d_1 and d_2 respectively. Then we know that either S_1\subset S_2 or the other way around. With entourages, two measures of closeness can agree on some points without having one necessarily subsumed by the other.

One can think of a uniform structure as something obtained by querying several different opinions as to what constitutes closeness on the set S, and then completing the collection via the uniformity and filter axioms. In other words, we are formalizing the concept of fuzziness that we used in the introduction. We can then define sequential (pre)compactness just as before

Definition. Sequential pre-compact for uniform spaces
A uniform space (S,\Phi) is said to be sequentially pre-compact if for any given sequence (a_n) of points in S, and for any fixed entourage U, there exist some point s such that infinitely many points in the sequence lands inside U[s].
Definition. Sequential compactness for uniform spaces
The uniform space is said to be sequentially compact if for any given sequence (a_n), there exists some point s such that for any entourage U, U[s] contains infinitely many points of (a_n).

To make more explicit the connection between the finite pigeonhole principle and compactness, we make the following definition

Definition. Pre-compactness of uniform spaces
Let (S,\Phi) be a uniform space. It is said to be pre-compact if for any entourage U in \Phi, we can find a finite set of points \{x_1,\ldots,x_{\alpha(U)}\} in S such that \cup U[x_i] \supset S. In other words, every uniform cover has a finite refinement.

This definition is stricter than sequential pre-compactness.

Proposition. A pre-compact uniform space is sequentially pre-compact.

Proof. Let (a_n) be some sequence, and let U be the fixed entourage. By pre-compactness, there exists finitely many x_i such that every point in S belongs to some U[x_i]. Now we can apply the finite pigeonhole principle, and conclude that one of the x_i is the s we want. Q.E.D.

Just as an example, consider the half-open interval S = (0,1] again. We’ll put on it two different uniform structures. The first is the uniform structure associated to the standard metric: U is an entourage if there exists some \epsilon > 0 such that \{ (x,y) : dist(x,y) \leq \epsilon\} \subset U. By the same construction we did for [0,1], we see that for any such entourage, then, we can cover S by the neighborhoods of roughly 1/\epsilon points spaced equally apart. So under this uniform structure S = (0,1] is pre-compact. On the other hand, let the uniform structure be defined by the following: U is an entourage if there exists some \epsilon > 0 such that \{ (x,y) : dist(x,y) \leq \epsilon x\} is contained in U. The entourages becomes funnel shaped: they are allowed to get very, very thin as one approaches 0. A nice exercise is to show that the entourage defined by U = \{ (x,y) : dist(x,y) \leq  x/3\} does not generate a cover of (0,1] that has a finite refinement. (Hint: if there were a finite cover associated to the points x_i, then there is a smallest x_i. Show that then there are necessarily points around 0 which are not in the cover.) Therefore with this uniform structure, S is not pre-compact. (In fact, this uniform structure is the same as that of the half line (-\infty, 0] with the standard metric, which is also not pre-compact.)