… Data aequatione quotcunque fluentes quantitates involvente fluxiones invenire et vice versa …

## Category: Maths

### Heat ball

There are very few things I find unsatisfactory in L.C. Evans’ wonderful textbook on Partial Differential Equations; one of them is the illustration (on p.53 of the second edition) of the “heat ball”.

The heat ball is a region with respect to which an analogue of the mean value property of solutions to Laplace’s equation can be expressed, now for solutions of the heat equation. In the case of Laplace’s equation, the regions are round balls. In the case of the heat equation, the regions are somewhat more complicated. They are defined by the expression

$\displaystyle E(x,t;r) := \left\{ (y,s)\in \mathbb{R}^{n+1}~|~s \leq t, \Phi(x-y, t-s) \geq \frac{1}{r^n} \right\}$

where $\Phi$ is the fundamental solution of the heat equation

$\displaystyle \Phi(x,t) := \frac{1}{(4\pi t)^{n/2}} e^{- \frac{|x|^2}{4t}}.$

In the expressions above, the constant $n$ is the number of spatial dimensions; $r$ is the analogue of the radius of the ball, and in $E(x,t;r)$, the point $(x,t)$ is the center. Below is a better visualization of the heat balls: the curves shown are the boundaries $\partial E(0,5;r)$ in dimension $n = 1$, for radii between 0.75 and 4 in steps of 0.25 (in particular all the red curves have integer radii). In higher dimensions the shape is generally the same, though they appear more “squashed” in the $t$ direction.

1-dimensional heat balls centered at (0,5) for various radii. (Made using Desmos)
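The boundary curves can also be reproduced without a plotting tool. The following is a minimal sketch, not the way the figure above was made: solving $\Phi(x-y,\tau) = r^{-n}$ with $\tau = t - s$ gives $|x-y|^2 = 2n\tau\log\left(r^2/(4\pi\tau)\right)$ for $0 < \tau \leq r^2/(4\pi)$. The function names are made up for this illustration.

```python
import math


def fundamental_solution(x_norm, t, n):
    """Phi(x, t) in n spatial dimensions, as a function of |x| and t > 0."""
    return (4.0 * math.pi * t) ** (-n / 2.0) * math.exp(-x_norm ** 2 / (4.0 * t))


def heat_ball_radius(tau, r, n=1):
    """Spatial radius of the slice of E(x, t; r) at time s = t - tau.

    Returns None when the slice is empty, i.e. outside 0 < tau <= r^2 / (4 pi).
    """
    tau_max = r ** 2 / (4.0 * math.pi)
    if not 0.0 < tau <= tau_max:
        return None
    # max(...) guards against a tiny negative logarithm from rounding at tau_max.
    log_term = max(0.0, math.log(r ** 2 / (4.0 * math.pi * tau)))
    return math.sqrt(2.0 * n * tau * log_term)


def boundary_points(t, r, n=1, samples=200):
    """Sample (half_width, s) pairs along the boundary of E(0, t; r)."""
    tau_max = r ** 2 / (4.0 * math.pi)
    points = []
    for i in range(1, samples + 1):
        tau = tau_max * i / samples
        points.append((heat_ball_radius(tau, r, n), t - tau))
    return points


if __name__ == "__main__":
    for half_width, s in boundary_points(t=5.0, r=2.0, samples=5):
        print(f"s = {s:.4f}, |y| = {half_width:.4f}")
```

In particular the heat ball of radius $r$ reaches back in time only to $s = t - r^2/(4\pi)$, which is why the curves in the picture are so flat compared with their width.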

### In defense of integration by parts

A prominent academic, who happens not to be a mathematician, visited my home institution recently and gave a public address about the role of the university in the modern world. Most of what he said concerning our teaching mission was the usual platitudes about not being stuck in the past and making sure that our curricular content and learning objectives are aligned with what we would expect a 21st century college graduate to need.

It however bugged me to no end that the recurring example this particular individual returned to, as something old-fashioned that “ought not be taught”, was integration by parts; and he justified this by mentioning that computer algebra systems (or even just Google) can do the integrals faster and better than we humans can.

I don’t generally mind others cracking jokes at mathematicians’ expense. But this particular self-serving strawman uttered by so well-regarded an individual is, to those of us actually in the field teaching calculus to freshmen and sophomores, very damaging and disingenuous.

I happened to have just spent the entirety of last year rethinking how we can best teach calculus to modern engineering majors. Believe me, students nowadays know perfectly well when we are just asking them to do busywork; they also know perfectly well that computer algebra systems are generally better at finding closed-form integral expressions than we are. Part of the challenge of the redesign that I am involved in is precisely to convince the students that calculus is worth learning in spite of computers. The difficulty is not a dearth of reasons; on the contrary, there are many good reasons why a solid grounding in calculus is important to a modern engineering student. To give a few examples:

1. Taylor series are in fact important because of computers, since they provide a method of compactly encoding an entire function.
2. Newton’s method for root finding (and its application to, say, numerical optimization) is built on a solid understanding of differential calculus.
3. The entirety of the finite element method of numerical simulation, which underlies a lot of civil and mechanical engineering applications, is based on a variational formulation of differential equations that, guess what, only makes sense when one understands integration by parts.
4. The notion of Fourier transform which is behind a lot of signal/image processing requires understanding how trigonometric functions behave under integration.

No, the difficulty for me and my collaborator is narrowing down a list of examples that we can not only reasonably explain to undergraduate students, but also have them have some hands-on experience working with.

When my collaborator and I were first plunged into this adventure of designing engineering-specific calculus material, one of the very first things that we did was to seek out input from our engineering colleagues. My original impulse was to cut some curricular content in order to give the students a chance to develop deeper understanding of fewer topics. To that end I selected some number of topics which I thought were old-fashioned, out-dated, and no longer used in this day and age. How wrong I was! Even something like “integration by partial fractions”, which most practicing mathematicians will defer to a computer to do, has its advocates (those who have to teach control theory insist that a lot of fundamental examples in their field can be reduced to evaluating integrals of rational functions, and a good grasp of how such integrals behave is key to developing a general sense of how control theory works).

In short, contrary to what some individuals would have you believe, math education is not obsolete because we all have calculators. In fact, I would argue the opposite: math education is especially pertinent now that we all have calculators. Long gone is the age when a superficial understanding of mathematics in terms of its rote computations was a valuable skill. A successful scientist or engineer needs to be able to effectively leverage the large toolbox that is available to her, and this requires a much deeper understanding of mathematics, one that goes beyond just the how to also the what and the why.

There is indeed much that can be done to better math education for the modern student. But one thing that shouldn’t be done is getting rid of integration by parts.

### Riemann-, Generalized-Riemann-, and Darboux-Stieltjes integrals

(The following is somewhat rough and may have typos.)

Let us begin by setting the notations and recalling what happens without the Stieltjes part.

Defn (Partition)
Let $I$ be a closed interval. A partition $P$ is a collection of closed subintervals $\{I_\alpha\}$ of $I$ such that

1. $P$ is finite;
2. $P$ covers $I$, i.e. $\cup P = I$;
3. $P$ is pairwise almost disjoint, i.e. for $I_\alpha, I_\beta$ distinct elements of $P$, their intersection contains at most one point.

We write $\mathscr{P}$ for the set of all partitions of $I$.

Defn (Refinement)
Fix $I$ a closed interval, and $P, Q$ two partitions. We say that $P$ refines $Q$ or that $P \preceq Q$ if for every $I_\alpha\in P$ there exists $J_\beta \in Q$ such that $I_\alpha \subseteq J_\beta$.

Defn (Selection)
Given $I$ a closed interval and $P$ a partition, a selection $\sigma: P \to I$ is a mapping that satisfies $\sigma(I_\alpha) \in I_\alpha$.

Defn (Size)
Given $I$ a closed interval and $P$ a partition, the size of $P$ is defined as $|P| = \sup_{I_\alpha \in P} |I_\alpha|$, where $|I_\alpha|$ is the length of the closed interval $I_\alpha$.

Remark In the above we have defined two different preorders on the set $\mathscr{P}$ of all partitions. One is induced by the size: we say that $P \leq Q$ if $|P| \leq |Q|$. The other is given by the refinement $P\preceq Q$. Note that neither is a partial order. (But the preorder given by refinement can be made into a partial order if we disallow zero-length degenerate closed intervals.) Note also that if $P\preceq Q$ we must have $P \leq Q$.

Now we can define the notions of integrability.

Defn (Integrability)
Let $I$ be a closed, bounded interval and $f:I \to \mathbb{R}$ be a bounded function. We say that $f$ is integrable with integral $s$ in the sense of

• Riemann if for every $\epsilon > 0$ there exists $P_0\in \mathscr{P}$ such that for every $P \leq P_0$ and every selection $\sigma:P \to I$ we have
$\displaystyle \left| \sum_{I' \in P} f(\sigma(I')) |I'| - s \right| < \epsilon$

• Generalised-Riemann if for every $\epsilon > 0$ there exists $P_0 \in \mathscr{P}$ such that for every $P \preceq P_0$ and every selection $\sigma: P\to I$ we have
$\displaystyle \left| \sum_{I' \in P} f(\sigma(I')) |I'| - s \right| < \epsilon$

• Darboux if
$\displaystyle \inf_{P\in\mathscr{P}} \sum_{I' \in P} (\sup_{I'} f )|I'| = \sup_{P\in\mathscr{P}} \sum_{I' \in P} (\inf_{I'} f )|I'| = s$

From the definition it is clear that “Riemann integrable” implies “Generalised-Riemann integrable”. Furthermore, we have clearly that for a fixed $P$
$\displaystyle \sum_{I' \in P} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} f(\sigma(I')) |I'| \leq \sum_{I' \in P} (\sup_{I'} f) |I'|$
and that if $P \preceq Q$ we have
$\displaystyle \sum_{I' \in Q} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} (\sup_{I'} f) |I'| \leq \sum_{I' \in Q} (\sup_{I'} f) |I'|$
so “Darboux integrable” also implies “Generalised-Riemann integrable”. A little bit more work shows that “Generalised-Riemann integrable” also implies “Darboux integrable” (if the suprema and infima are attained on the intervals $I'$, this would follow immediately; in general, using the boundedness of $f$ we can find $\sigma$ such that the Riemann sum approximates the upper or lower Darboux sums arbitrarily well).

The interesting part is the following
Theorem
Darboux integrable functions are Riemann integrable. Thus all three notions are equivalent.

Proof. Let $P, Q$ be partitions. Let $|P| \leq \inf_{I'\in Q, |I'| \neq 0} |I'|$, and let $m$ be the number of non-degenerate subintervals in $Q$. We have the following estimate
$\displaystyle \sum_{I'\in Q} (\inf_{I'} f) |I'| - (m-1) |P| (\sup_I 2|f|) \leq \sum_{J'\in P} f(\sigma(J')) |J'| \leq \sum_{I'\in Q} (\sup_{I'} f) |I'| + (m-1) |P| (\sup_I 2|f|)$
The estimate follows by noting that “most” of the $J'\in P$ will be proper subsets of some $I'\in Q$, and there can be at most $m-1$ of the $J'$ that straddle two different non-degenerate sub-intervals of $Q$. To prove the theorem it suffices to choose first a $Q$ such that the upper and lower Darboux sums approximate the integral well. Then we can conclude that for all $P$ with $|P|$ sufficiently small the Riemann sum is controlled, up to a small error, by the $Q$-Darboux sums. Q.E.D.

Now that we have recalled the case of the usual integrability, let us consider the case of the Stieltjes integrals: instead of integrating against $\mathrm{d}x$, we integrate against $\mathrm{d}\rho$, where $\rho$ is roughly speaking a “cumulative distribution function”: we assume that $\rho:I \to \mathbb{R}$ is a bounded monotonically increasing function.

The definitions of the integrals are largely the same, except that at every step we replace the width of the interval $|I'|$ by the diameter of $\rho(I')$, i.e. $\sup_{I'} \rho - \inf_{I'} \rho$. The arguments above immediately also imply that

• “Riemann-Stieltjes integrable” implies “Generalised-Riemann-Stieltjes integrable”
• “Darboux-Stieltjes integrable” implies “Generalised-Riemann-Stieltjes integrable”
• “Generalised-Riemann-Stieltjes integrable” implies “Darboux-Stieltjes integrable”

However, Darboux-Stieltjes integrable functions need not be Riemann-Stieltjes integrable. The possibility of failure can be seen in the proof of the theorem above, where we used the fact that $|P|$ is allowed to be made arbitrarily small. The same estimate, in the case of the Stieltjes version of the integrals, has $|P|$ replaced by $\sup_{J'\in P} (\sup_{J'} \rho - \inf_{J'} \rho)$, which for arbitrary partitions need not shrink to zero. To have a concrete illustration, we give the following:

Example
Let $I = [0,1]$. Let $\rho(x) = 0$ if $x < \frac12$ and $1$ otherwise. Let $f(x) = 0$ if $x \leq \frac12$ and $1$ otherwise. Let $Q_0$ be the partition $\{ [0,\frac12], [\frac12,1]\}$. We have that
$\displaystyle \sum_{I'\in Q_0} (\sup_{I'} f) (\sup_{I'} \rho - \inf_{I'} \rho) = 0 \cdot (1 - 0) + 1\cdot (1 - 1) = 0$
while
$\displaystyle \sum_{I'\in Q_0} (\inf_{I'} f) (\sup_{I'} \rho - \inf_{I'} \rho) = 0 \cdot (1-0) + 0 \cdot(1-1) = 0$
so we have that in particular the pair $(f,\rho)$ is Darboux-Stieltjes integrable with integral 0. However, let $k$ be any odd integer, and consider the partition $P_k$ of $[0,1]$ into $k$ equal portions. Depending on the choice of the selection $\sigma$, we see that the sum can take the values
$\displaystyle \sum_{I'\in P_k} f(\sigma(I')) (\sup_{I'} \rho - \inf_{I'}\rho) = f(\sigma([\frac12 - \frac1{2k},\frac12 + \frac1{2k}])) (1 - 0) \in \{0,1\}$
which shows that the Riemann-Stieltjes condition can never be satisfied.
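The two possible values of the sum can be checked by direct computation. Below is a small sketch using exact rational arithmetic; the helper names (`uniform_partition`, `stieltjes_sum`, `left`, `right`) are invented for this illustration, and the code uses the fact that for monotone increasing $\rho$ we have $\sup_{[a,b]}\rho - \inf_{[a,b]}\rho = \rho(b) - \rho(a)$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def rho(x):
    """The integrator: 0 for x < 1/2 and 1 otherwise."""
    return 0 if x < HALF else 1


def f(x):
    """The integrand: 0 for x <= 1/2 and 1 otherwise."""
    return 0 if x <= HALF else 1


def uniform_partition(k):
    """Partition of [0, 1] into k equal closed subintervals (a, b)."""
    return [(Fraction(i, k), Fraction(i + 1, k)) for i in range(k)]


def stieltjes_sum(partition, selection):
    """Riemann-Stieltjes sum; rho is monotone, so its oscillation on [a, b]
    is rho(b) - rho(a)."""
    return sum(f(selection(a, b)) * (rho(b) - rho(a)) for a, b in partition)


def left(a, b):
    """Selection picking the left endpoint of each subinterval."""
    return a


def right(a, b):
    """Selection picking the right endpoint of each subinterval."""
    return b


if __name__ == "__main__":
    for k in (3, 101, 10001):
        p = uniform_partition(k)
        print(k, stieltjes_sum(p, left), stieltjes_sum(p, right))
```

For every odd $k$ the left-endpoint selection gives 0 and the right-endpoint selection gives 1, no matter how small $|P_k| = 1/k$ is.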

The example above where both $f$ and $\rho$ are discontinuous at the same point is essentially sharp. An easy modification of the previous theorem shows that
Prop
If at least one of $f,\rho$ is continuous, then Darboux-Stieltjes integrability is equivalent to Riemann-Stieltjes integrability.

Remark The nonexistence of the Riemann-Stieltjes integral when $f$ and $\rho$ have shared discontinuity points is similar in spirit to the idea in distribution theory where whether the product of two distributions is well-defined (as a distribution) depends on their wave-front sets.

### An optimization problem: variation

Examining the theorem proven in the previous post, we are led naturally to ask whether there are higher order generalizations.

Question: Let $f \in C^{k}([-1,1])$ with $f^{(k)} > 0$. What can we say about the minimizer of $C = \int_{-1}^1 |f(x) - p(x)|~\mathrm{d}x$ where $p$ ranges over degree $k-1$ polynomials?

It is pretty easy to see that we expect $p$ to intersect $f$ at the maximum number of points, which is $k$. We label those points $x_1, \ldots, x_k$ and call $x_0 = -1$ and $x_{k+1}= 1$. Then the cost function can be written as
$\displaystyle C = \sum_{j = 0}^k (-1)^j \int_{x_j}^{x_{j+1}} f(x) - p(x; x_1, \ldots, x_k) ~\mathrm{d}x$
Since we know the values of $p$ at the points $x_1, \ldots, x_k$ we can write down the interpolation polynomial explicitly using Lagrange’s interpolation formula:
$\displaystyle p = \sum_{j = 1}^k \left( \prod_{1 \leq m \leq k, m\neq j} \frac{x - x_m}{x_j - x_m} \right) f(x_j) = \sum L_j(x; x_1, \ldots, x_k) f(x_j)$

The partial derivatives are now
$\displaystyle \partial_n C = \sum_{j = 0}^k (-1)^{j+1} \int_{x_j}^{x_{j+1}} \partial_n p(x; x_1, \ldots, x_k) ~\mathrm{d}x$
It remains to compute $\partial_n p$ for $1 \leq n \leq k$. We observe that when $n \neq j$
$\displaystyle \partial_n L_j = - \frac{1}{x - x_n} L_j + \frac{1}{x_j - x_n} L_j$
and also
$\displaystyle \partial_n L_n = - \left( \sum_{1\leq m \leq k, m\neq n} \frac{1}{x_n - x_m} \right) L_n$
So
$\displaystyle \partial_n p = \sum_{j \neq n} \frac{x-x_j}{(x_j - x_n)(x - x_n)} L_j f(x_j) + L_n f'(x_n) - \left( \sum_{1\leq m \leq k, m\neq n} \frac{1}{x_n - x_m} \right) L_n f(x_n)$
Now, we observe that
$\displaystyle \frac{x - x_j}{x - x_n} L_j = - \left( \prod_{m \neq n,j} \frac{x_n - x_m}{x_j - x_m} \right) L_n$
so after some computation we arrive at
$\displaystyle \partial_n p = L_n(x) \cdot \left[ f'(x_n) - \sum_{j \neq n} \frac{1}{x_j - x_n} \left(\left( \prod_{m \neq j,n}\frac{x_n - x_m}{x_j - x_m}\right)f(x_j) - f(x_n) \right)\right]$
which we can further simplify to
$\displaystyle \partial_n p = L_n(x) \cdot \left( f'(x_n) - p'(x_n)\right)$
Now, since $f$ and $p$ cross transversely at $x_n$, the difference of their derivatives is non-zero. (This harks back to our assumption that $f^{(k)} > 0$.) So we are down, as in the case where $k = 2$, to equations entirely independent of $f$.
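The identity $\partial_n p = L_n(x)\left(f'(x_n) - p'(x_n)\right)$ is easy to sanity-check numerically. The sketch below compares a central finite difference of $p$ in the node $x_n$ against the right-hand side; the test function $f = \exp$, the nodes, and the step size are arbitrary choices made for this illustration only.

```python
import math


def lagrange_basis(j, x, nodes):
    """L_j(x; x_1, ..., x_k) for the given list of nodes (0-based index j)."""
    out = 1.0
    for m, xm in enumerate(nodes):
        if m != j:
            out *= (x - xm) / (nodes[j] - xm)
    return out


def interpolant(x, nodes, f):
    """p(x; x_1, ..., x_k) = sum_j L_j(x) f(x_j)."""
    return sum(lagrange_basis(j, x, nodes) * f(nodes[j]) for j in range(len(nodes)))


def d_node(n, x, nodes, f, h=1e-6):
    """Central finite difference of p(x; nodes) with respect to the node x_n."""
    up = list(nodes)
    down = list(nodes)
    up[n] += h
    down[n] -= h
    return (interpolant(x, up, f) - interpolant(x, down, f)) / (2.0 * h)


def claimed(n, x, nodes, f, df, h=1e-6):
    """L_n(x) * (f'(x_n) - p'(x_n)), with p' taken by central difference in x."""
    xn = nodes[n]
    dp = (interpolant(xn + h, nodes, f) - interpolant(xn - h, nodes, f)) / (2.0 * h)
    return lagrange_basis(n, x, nodes) * (df(xn) - dp)


if __name__ == "__main__":
    nodes = [-0.6, 0.1, 0.7]
    for n in range(len(nodes)):
        lhs = d_node(n, 0.3, nodes, math.exp)
        rhs = claimed(n, 0.3, nodes, math.exp, math.exp)
        print(n, lhs, rhs)
```

The two columns should agree up to finite-difference error, which is of course no substitute for the algebra above.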

More precisely, we see that the stationarity condition becomes the choice of $x_1, \ldots, x_k$ such that the integrals
$\displaystyle \sum_{j = 0}^k (-1)^{j} \int_{x_j}^{x_{j+1}} L_n(x) ~\mathrm{d}x = 0$
for each $n$. Since $L_n$ form a basis for the polynomials of degree at most $k-1$, we have that the function
$\chi(x) = (-1)^j \qquad x \in (x_j, x_{j+1})$
is $L^2$ orthogonal to every polynomial of degree at most $k-1$. So in particular the $x_j$ are solutions to the following system of equations
$x_0 = -1, \qquad x_{k+1} = 1$
$\sum_{j = 0}^k (-1)^j \left[ x_{j+1}^d - x_{j}^d \right] = 0 \qquad \forall d \in \{1, \ldots, k\}$

From symmetry considerations we have that $x_j = - x_{k+1 - j}$. This also kills about half of the equations. For the low $k$ we have

1. $\{ 0\}$
2. $\{ -1/2, 1/2\}$
3. $\{-\sqrt{2}/2, 0, \sqrt{2}/2\}$
4. $\{ (\pm 1 \pm \sqrt{5})/4 \}$
5. $\{ 0, \pm\frac12, \pm \frac{\sqrt{3}}2 \}$
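Candidate node sets can be tested against the system of equations directly. Here is a minimal sketch that evaluates the residuals $\sum_{j=0}^k (-1)^j\left[x_{j+1}^d - x_j^d\right]$ for $d = 1, \ldots, k$; the function name is hypothetical, and only a few of the listed cases are exercised.

```python
import math


def stationarity_residuals(nodes):
    """Residuals of sum_j (-1)^j (x_{j+1}^d - x_j^d) for d = 1..k,
    with x_0 = -1 and x_{k+1} = 1 added to the sorted interior nodes."""
    k = len(nodes)
    x = [-1.0] + sorted(nodes) + [1.0]
    return [
        sum((-1) ** j * (x[j + 1] ** d - x[j] ** d) for j in range(k + 1))
        for d in range(1, k + 1)
    ]


if __name__ == "__main__":
    s5 = math.sqrt(5.0)
    s3 = math.sqrt(3.0)
    candidates = {
        2: [-0.5, 0.5],
        4: [(1 + s5) / 4, (1 - s5) / 4, (-1 + s5) / 4, (-1 - s5) / 4],
        5: [0.0, 0.5, -0.5, s3 / 2, -s3 / 2],
    }
    for k, nodes in candidates.items():
        print(k, max(abs(res) for res in stationarity_residuals(nodes)))
```

Each printed maximum residual should be at the level of floating point rounding.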

### An optimization problem: theme

Let’s start simple:

Question 1: What is the linear function $\ell(x)$ that minimizes the integral $\int_{-1}^1 |x^2 + x - \ell(x)| ~\mathrm{d}x$? In other words, what is the best linear approximation of $x^2 + x$ in the $L^1([-1,1])$ sense?

This is something that can in principle be solved by a high schooler with some calculus training. Here’s how one solution may go:

Solution: All linear functions take the form $\ell(x) = ax + b$. The integrand is equal to $x^2 + x - \ell(x)$ when $x^2 + x \geq \ell(x)$, and $\ell(x) - x - x^2$ otherwise. So we need to find the points of intersection. This requires solving $x^2 + x - ax - b = 0$, which we can do by the quadratic formula. In the case where $x^2 + x - a x - b$ is signed on the interval, we see that by changing $b$ we can make the integrand strictly smaller, and hence we cannot attain a minimizer. So we know that at the minimizer there must exist at least one root.

Consider the case where there is only one root in the interval (counted with multiplicity), call the root $x_0$. We have that the integral to minimize is equal to
$\displaystyle \left| \int_{-1}^{x_0} x^2 + x - ax - b~\mathrm{d}x \right| + \left| \int_{x_0}^1 x^2 + x - a x - b ~\mathrm{d}x \right|$
each part of which can be computed explicitly to obtain
$\displaystyle \left| \frac13 x_0^3 + \frac13 + \frac{1-a}{2} x_0^2 - \frac{1-a}{2} - b x_0 - b \right| + \left| \frac13 - \frac13 x_0^3 + \frac{1-a}{2} - \frac{1-a}{2} x_0^2 - b + b x_0\right|$
Since we know that the two terms come from integrands with different signs, we can combine to get
$\displaystyle \left| \frac23 x_0^3 + (1-a) x_0^2 - (1-a) - 2b x_0 \right|$
as the quantity to minimize. Now, we cannot just take the partial derivatives of the above expression with respect to $a,b$ and set them to zero and see what we get: the root $x_0$ depends also on the parameters. So what we would do then is to plug in $x_0$ using the expression derived from the quadratic formula, $x_0 = \frac{1}{2} \left( a - 1 \pm \sqrt{ (1-a)^2 + 4b}\right)$, and then take the partial derivatives. Before that, though, we can simplify a little bit: since $x_0^3 + (1-a) x_0^2 - b x_0 = 0$ from the constraint, the quantity to minimize is now
$\displaystyle \left| - \frac13 x_0^3 - (1-a) - b x_0 \right|$
A long computation taking the $\partial_b$ now shows that necessarily $x_0 = 0$, which implies that $b = 0$. But for the range of $a$ where there is only one root in the interval, the quantity does not achieve a minimum. (The formal minimizer happens at $a = 1$ but we see for this case the integrand of the original cost function is signed.)

So we are down to the case where there are two roots in the interval. Now we call the roots $x_+$ and $x_-$, and split the integral into
$\displaystyle \left| \int_{-1}^{x_-} x^2 + x - ax - b~\mathrm{d}x - \int_{x_-}^{x_+} x^2 + x - ax - b~\mathrm{d}x + \int_{x_+}^1 x^2 + x - ax - b~\mathrm{d}x \right|$
and proceed as before. (The remainder of the proof is omitted; the reader is encouraged to carry out this computation by hand to see how tedious it is.)
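For readers who would rather not do the computation by hand, the cost can at least be explored numerically. The sketch below is an illustration only: it evaluates the $L^1$ cost by a midpoint rule and tries the candidate $\ell(x) = x + \frac14$, which is what one gets by assuming that the line crosses the parabola at $x = \pm\frac12$ (the $k = 2$ nodes listed elsewhere on this page). The function name and the grid size are arbitrary choices.

```python
def l1_cost(a, b, cells=4000):
    """Midpoint-rule approximation of the integral of |x^2 + x - (a x + b)|
    over [-1, 1]."""
    h = 2.0 / cells
    total = 0.0
    for i in range(cells):
        x = -1.0 + (i + 0.5) * h
        total += abs(x * x + x - a * x - b) * h
    return total


if __name__ == "__main__":
    base = l1_cost(1.0, 0.25)
    print("candidate (a, b) = (1, 1/4):", base)
    # Compare against a few nearby lines.
    for da, db in ((0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)):
        print((1.0 + da, 0.25 + db), l1_cost(1.0 + da, 0.25 + db))
```

For the candidate the integrand is $|x^2 - \frac14|$, whose integral over $[-1,1]$ is $\frac12$; the nearby lines tried here all give a larger cost.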

### Products and expectation values

Let us start with an instructive example (modified from one I learned from Steven Landsburg). Let us play a game:

I show you three identical looking boxes. In the first box there are 3 red marbles and 1 blue one. In the second box there are 2 red marbles and 1 blue one. In the last box there is 1 red marble and 4 blue ones. You choose one at random. What is …

• The expected number of red marbles you will find?
• The expected number of blue marbles you will find?
• The expected number of marbles, regardless of colour, you will find?
• The expected percentage of red marbles you will find?
• The expected percentage of blue marbles you will find?
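The five quantities are plain averages over the three equally likely boxes, so they can be tabulated exactly. The following sketch is just that arithmetic; the variable and function names are made up for the illustration.

```python
from fractions import Fraction

# (red, blue) counts for the three boxes described above.
boxes = [(3, 1), (2, 1), (1, 4)]


def expectation(stat):
    """Average of stat(red, blue) over the boxes, each chosen with probability 1/3."""
    return sum(Fraction(stat(r, b)) for r, b in boxes) / len(boxes)


expected_red = expectation(lambda r, b: r)
expected_blue = expectation(lambda r, b: b)
expected_total = expectation(lambda r, b: r + b)
expected_frac_red = expectation(lambda r, b: Fraction(r, r + b))
expected_frac_blue = expectation(lambda r, b: Fraction(b, r + b))

if __name__ == "__main__":
    print("red:", expected_red, "blue:", expected_blue, "total:", expected_total)
    print("fraction red:", expected_frac_red, "fraction blue:", expected_frac_blue)
    print("ratio of expectations:", expected_red / expected_total)
```

Note that the expected fraction of red marbles, $97/180$, is not the ratio of the expected number of red marbles to the expected number of marbles, which is $1/2$.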

### Decay of Waves IV: Numerical Interlude

I offer two videos. In both videos the same colour scheme is used: we have four waves in red, green, blue, and magenta. The four represent the amplitudes of spherically symmetric free waves on four different types of spatial geometries: 1 dimensional flat space, 2 dimensional flat space, 3 dimensional flat space, and a 3 dimensional asymptotically flat manifold with “trapping” (has closed geodesics). Can you tell which is which? (Answer below the fold.)

### “The asymptotically hyperboloidal is not asymptotically null.”

By way of Roland Donninger, I learned today of the statement above which is apparently well-known in the numerical relativity community.

It may seem intuitively surprising: after all, the archetype of an asymptotically hyperboloidal surface is the hyperboloid as embedded in Minkowski space. Let $(t,r, \omega)\in \mathbb{R}\times\mathbb{R}_+ \times \mathbb{S}^{d-1}$ be the spherical coordinate system for the Minkowski space $\mathbb{R}^{1,d}$; the hyperboloid embeds in it as the surface $t^2 - r^2 = 1$. If we draw a picture we see clearly that the surface is asymptotic to the null cone $t = |r|$.

The key, however, lies in the definition. For better or for worse, the definition under which the titular statement makes sense is the following:

Definition
Let $(M,g)$ be an asymptotically simple space-time (or one for which one can define a Penrose compactification), and let $(\bar{M},\Omega^2 g)$ be the compactified space-time. We say that a hypersurface $\Sigma \subset M$ is asymptotically null if $\bar{\Sigma}$ intersects $\partial\bar{M}$ transversely and the tangent space of $\bar{\Sigma}$ is null along $\partial\bar{M}$.

Now suppose near $\partial\bar{M}$ we can foliate via a double-null foliation $(u,v)$, with $\partial\bar{M} = \{ u = 0\}$. Let $x$ be a coordinate on $\partial\bar{M}$ so that $(u,v,x)$ form a coordinate system for a neighborhood of $\partial\bar{M}$. Assume that our surface $\Sigma$ can be written as a graph

$v = \phi(u,x)$

where $\phi$ is a $C^3$ function. Then the asymptotically null condition is just that $\partial_u \phi |_{u = 0} = 0$. Taking a Taylor expansion we have that this means

$v \approx \phi_{\infty}(x) + \phi^{(2)}_{\infty}(x) u^2$.

For the usual conformal compactification of Minkowski space, we have $u = \frac{\pi}{2} - \cot^{-1}\left( \frac{1}{r+t}\right)$. Hence we require an asymptotically null surface to converge to the null surface at rate $O(1/(r+t)^2)$ (if $\phi$ is sufficiently differentiable; if we relax the differentiability at infinity we see that the above condition allows us to relax all the way to $O(1/(r+t)^{1+})$, but $O(1/(r+t))$ is not admissible).

On the other hand, the hyperboloid is given by $(r+t)(r-t) = -1 \implies r-t = v = O(1/(r+t))$ and so is not asymptotically null. And indeed, we can also check by direct computation that in the usual conformal compactification of Minkowski space, the limit of the hyperboloid at null infinity is space-like.
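The rates can be seen in a quick numerical experiment. This is only a sketch under the conventions of this post: it takes $u = \frac{\pi}{2} - \cot^{-1}\left(\frac{1}{r+t}\right)$, which for $r + t > 0$ equals $\arctan\left(\frac{1}{r+t}\right)$, and uses $v = r - t$ along the hyperboloid $t = \sqrt{1 + r^2}$. The function names are invented for the illustration.

```python
import math


def compactified_u(r, t):
    """u = pi/2 - arccot(1/(r+t)), written as atan(1/(r+t)) for r + t > 0."""
    return math.atan2(1.0, r + t)


def hyperboloid_v(r):
    """v = r - t on the hyperboloid t = sqrt(1 + r^2), computed as -1/(r+t)
    to avoid cancellation."""
    t = math.sqrt(1.0 + r * r)
    return -1.0 / (r + t)


def ratios(r):
    """Return (v / u, v / u^2) at the point of the hyperboloid with radius r."""
    t = math.sqrt(1.0 + r * r)
    u = compactified_u(r, t)
    v = hyperboloid_v(r)
    return v / u, v / (u * u)


if __name__ == "__main__":
    for r in (1e1, 1e2, 1e3, 1e4):
        linear, quadratic = ratios(r)
        print(f"r = {r:.0e}: v/u = {linear:.8f}, v/u^2 = {quadratic:.3e}")
```

The ratio $v/u$ tends to $-1$ while $v/u^2$ blows up: along the hyperboloid $v$ is linear in $u$ to leading order, so the condition $\partial_u\phi|_{u=0} = 0$ fails.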

### Continuity of the infimum

Just realised (two weeks ago, but only got around to finishing this blog posting now) that an argument used to prove a proposition in a project I am working on is wrong. After reducing the problem to its core I found that it is something quite elementary. So today’s post will be of a different flavour from the ones of the recent past.

Question Let $X,Y$ be topological spaces. Let $f:X\times Y\to\mathbb{R}$ be a bounded, continuous function. Is the function $g(x) = \inf_{y\in Y}f(x,y)$ continuous?

Intuitively, one may be tempted to say “yes”. Indeed, there are plenty of examples where the answer is in the positive. The simplest one is when we can replace the infimum with the minimum:

Example Let the space $Y$ be a finite set with the discrete topology. Then $g(x) = \min_{y\in Y} f(x,y)$ is continuous.
Proof left as exercise.

But in fact, the answer to the question is “No”. Here’s a counterexample:

Example Let $X = Y = \mathbb{R}$ with the standard topology. Define

$\displaystyle f(x,y) = \begin{cases} 1 & x > 0 \\ 0 & x < -e^{y} \\ 1 + x e^{-y} & x\in [-e^{y},0] \end{cases}$

which is clearly continuous. But the infimum function $g(x)$ is roughly the Heaviside function: $g(x) = 1$ if $x \geq 0$, and $g(x) = 0$ if $x < 0$.
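The jump is easy to see numerically. The sketch below approximates the infimum by a minimum over finitely many values of $y$; the grid of integers from $-50$ to $50$ is an arbitrary stand-in for all of $\mathbb{R}$, and the function names are made up.

```python
import math


def f(x, y):
    """The counterexample function f(x, y) defined above."""
    if x > 0:
        return 1.0
    if x < -math.exp(y):
        return 0.0
    return 1.0 + x * math.exp(-y)


def approx_inf(x, y_values):
    """Minimum of f(x, .) over a finite sample of y values."""
    return min(f(x, y) for y in y_values)


if __name__ == "__main__":
    ys = range(-50, 51)
    for x in (-1.0, -1e-3, -1e-9, 0.0, 1e-9, 1.0):
        print(x, approx_inf(x, ys))
```

For negative $x$, however small, some very negative $y$ puts $x$ below $-e^{y}$ and the value drops to 0; at $x = 0$ every $y$ gives the value 1.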

So what is it about the first example that makes the argument work? What is the difference between the minimum and the infimum? A naive guess may be that in the finite case, we are taking a minimum, and therefore the infimum is attained. This guess is not unreasonable: there are a lot of arguments in analysis where when the infimum can be assumed to be attained, the problem becomes a lot easier (since we are then allowed to deal with a minimizer instead of a minimizing sequence). But sadly that is not (entirely) the case here: in the counterexample, for every $x_0$, we can certainly find a $y_0$ such that $f(x_0,y_0) = g(x_0)$. So attaining the infimum point-wise is not enough.

What we need, here, is compactness. In fact, we have the following

Theorem Let $X,Y$ be topological spaces with $Y$ compact. Then for any continuous $f:X\times Y\to\mathbb{R}$, the function $g(x) := \inf_{y\in Y} f(x,y)$ is well-defined and continuous.

Proof usually proceeds in three parts. That $g(x) > -\infty$ follows from the fact that for any fixed $x\in X$, $f(x,\cdot):Y\to\mathbb{R}$ is a continuous function defined on a compact space, and hence is bounded (in fact the infimum is attained). Then using that the sets $(-\infty,a)$ and $(b,\infty)$ form a subbase for the topology of $\mathbb{R}$, it suffices to check that $g^{-1}((-\infty,a))$ and $g^{-1}((b,\infty))$ are open.

Let $\pi_X$ be the canonical projection $\pi_X:X\times Y\to X$, which we recall is continuous and open. It is easy to see that $g^{-1}((-\infty,a)) = \pi_X \circ f^{-1}((-\infty,a))$. So continuity of $f$ implies that this set is open. (Note that this part does not depend on compactness of $Y$. In fact, a minor modification of this proof shows that for any family of upper semicontinuous functions $\{f_c\}_C$, the pointwise infimum $\inf_{c\in C} f_c$ is also upper semicontinuous, a fact that is very useful in convex analysis. And indeed, the counterexample function given above is upper semicontinuous.)

It is in this last part, showing that $g^{-1}((b,\infty))$ is open, that compactness is crucially used. Observe that $g(x) > b \implies f(x,y) > b~ \forall y$. In other words $g(x) > b \implies \forall y, (x,y) \in f^{-1}((b,\infty))$, an open set. This in particular implies that $\forall x\in g^{-1}((b,\infty))$ and $\forall y\in Y$ there exists a “box” neighborhood $U_{(x,y)}\times V_{(x,y)}$ of $(x,y)$ contained in $f^{-1}((b,\infty))$. Now using compactness of $Y$, a finite subcollection of these boxes, indexed by points $\{(x,y_i)\}_{i=1}^k$, covers $\{x\}\times Y$. And in particular we have

$\displaystyle \{x\}\times Y \subset \left(\cap_{i = 1}^k U_{(x,y_i)}\right)\times Y \subset f^{-1}((b,\infty))$

and hence $g^{-1}((b,\infty)) = \cup_{x\in g^{-1}((b,\infty))} \cap_{i = 1}^{k(x)} U_{(x,y_i)}$ is open. Q.E.D.
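To see the theorem at work on the earlier counterexample function $f(x,y)$ (equal to $1$ for $x > 0$, to $0$ for $x < -e^y$, and to $1 + xe^{-y}$ in between), one can restrict $y$ to a compact interval $[-L, L]$. The sketch below is an illustration under that assumption; the closed form $\min(1, \max(0, 1 + x e^{L}))$ is my own computation for this restricted problem (the infimum is attained at $y = -L$), and the names are hypothetical.

```python
import math


def f(x, y):
    """The counterexample function: 1 for x > 0, 0 for x < -e^y,
    and 1 + x e^{-y} in between."""
    if x > 0:
        return 1.0
    if x < -math.exp(y):
        return 0.0
    return 1.0 + x * math.exp(-y)


def inf_over_compact(x, L, samples=2001):
    """Minimum of f(x, .) over an evenly spaced grid on [-L, L],
    endpoints included."""
    ys = [-L + 2.0 * L * i / (samples - 1) for i in range(samples)]
    return min(f(x, y) for y in ys)


def closed_form(x, L):
    """Clipped linear ramp: 0 below -e^{-L}, 1 above 0, linear in between."""
    return min(1.0, max(0.0, 1.0 + x * math.exp(L)))


if __name__ == "__main__":
    L = 5.0
    for x in (-1.0, -0.01, -0.003, -0.001, 0.0, 0.5):
        print(x, inf_over_compact(x, L), closed_form(x, L))
```

The restricted infimum is a continuous ramp of slope $e^{L}$; as $L \to \infty$ the ramp steepens into the jump of the non-compact case.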

One question we may ask is how sharp is the requirement that $Y$ is compact. As with most things in topology, counterexamples abound.

Example Let $Y$ be any uncountably infinite set equipped with the co-countable topology. That is, the collection of open subsets is precisely the empty set and all subsets whose complement is countable. The two interesting properties of this topology are (a) $Y$ is not compact and (b) $Y$ is hyperconnected. (a) is easy to see: let $C$ be some countably infinite subset of $Y$. For each $c\in C$ let $U_c = \{c\}\cup (Y\setminus C)$. This forms an open cover with no finite sub-cover. Hyperconnected spaces are, roughly speaking, spaces in which all open nonempty sets are “large”, in the sense that they mutually overlap a lot. In particular, a continuous map from a hyperconnected space to a Hausdorff space must be constant. In our case we can see this directly: suppose $h:Y\to \mathbb{R}$ is a continuous map. Fix $y_1,y_2\in Y$. Let $N_{1,2}\subset \mathbb{R}$ be open neighborhoods of $h(y_{1,2})$. Since $h$ is continuous, $h^{-1}(N_1)\cap h^{-1}(N_2)$ is open and non-empty (by the co-countable assumption). Therefore $N_1\cap N_2\neq \emptyset$ for any pair of neighborhoods. Since $\mathbb{R}$ is Hausdorff, this forces $h$ to be the constant map. This implies that for any topological space $X$, a continuous function $f:X\times Y\to\mathbb{R}$ is constant along $Y$, and hence for any $y_0\in Y$, we have that $\inf_{y\in Y} f(x,y) =: g(x) = f(x,y_0)$ is continuous.

One can try to introduce various regularity/separation assumptions on the spaces $X,Y$ to see at what level compactness becomes a crucial requirement. As an analyst, however, I really only care about topological manifolds. In which case the second counterexample up top can be readily used. We can slightly weaken the assumptions and still prove the following partial converse in essentially the same way.

Theorem Let $X$ be Tychonoff, connected, and first countable, such that $X$ contains a non-trivial open subset whose closure is not the entire space; and let $Y$ be paracompact, Lindelöf. Then if $Y$ is noncompact, there exists a continuous function $f:X\times Y\to\mathbb{R}$ such that $\inf_{y\in Y}f:X\to \mathbb{R}$ is not continuous.

Remark Connected (nontrivial) topological manifolds automatically satisfy the conditions on $X$ and $Y$ except for non-compactness. The conditions given are not necessary for the theorem to hold; but they more or less capture the topological properties used in the construction of the second counterexample above.

Remark If $X$ is such that every non-empty open set’s closure is the entire space, we must have that it is hyperconnected (let $C\subset X$ be a closed set. Suppose $D\subsetneq X$ is another closed set such that $C\cup D = X$. Then $D^c\subset C$, but $D^c$ is open and non-empty, so its closure is $X$ and hence $C = X$. Hence $X$ cannot be written as the union of two proper closed subsets). And if it is Tychonoff, then $X$ is either the empty-set or the one-point set.

Lemma For a paracompact Lindelöf space that is noncompact, there exists a countably infinite open cover $\{U_k\}$ and a sequence of points $y_k \in U_k$ such that $\{y_k\}\cap U_j = \emptyset$ if $j\neq k$.

Proof: By noncompactness, there exists an open cover with no finite subcover. By Lindelöf, this open cover can be assumed to be countable, which we enumerate by $\{V_k\}$ and assume WLOG that $\forall k, V_k \setminus \cup_{j =1}^{k-1} V_j \neq \emptyset$. Define $\{U_k\}$ and $\{y_k\}$ inductively by: $U_k = V_k \setminus \cup_{j = 1}^{k-1} \{ y_j\}$ and choose $y_k \in U_k \setminus \cup_{j=1}^{k-1}U_j$.

Proof of theorem: We first construct a sequence of continuous functions on $X$. Let $G\subset X$ be a non-empty open set such that its closure-complement $H = (\bar{G})^c$ is a non-empty open set ($G$ exists by assumption). By connectedness $\bar{G}\cap \bar{H} \neq \emptyset$, so we can pick $x_0$ in the intersection. Let $\{x_j\}\subset H$ be a sequence of points converging to $x_0$, which exists by first countability. Using Tychonoff, we can get a sequence of continuous functions $f_j$ on $X$ such that $f_j|_{\bar{G}} = 0$ and $f_j(x_j) = -1$.

On $Y$, choose an open cover $\{U_k\}$ and points $\{y_k\}$ per the previous Lemma. By paracompactness we have a partition of unity $\{\psi_k\}$ subordinate to $U_k$, and by the conclusion of the Lemma we have that $\psi_k(y_k) = 1$. Now we define the function

$\displaystyle f(x,y) = \sum_{k} f_k(x)\psi_k(y)$

which is continuous, and such that $f|_{\bar{G}\times Y} = 0$. But by construction $\inf_{y\in Y}f(x_k,y) \leq f(x_k,y_k) = f_k(x_k) = -1$, which combined with the fact that $x_k \to x_0 \in \bar{G}$ shows the desired result. Q.E.D.

### Gauge invariance, geometrically

A somewhat convoluted chain of events led me to think about the geometric description of partial differential equations. And a question I asked myself this morning was

Question
What is the meaning of gauge invariance in the jet-bundle treatment of partial differential equations?

The answer, actually, is quite simple.

Review of geometric formulation PDE
We consider here abstract PDEs formulated geometrically. All objects considered will be smooth. For more about the formal framework presented here, a good reference is H. Goldschmidt, “Integrability criteria for systems of nonlinear partial differential equations”, JDG (1967) 1:269–307.

A quick review: the background manifold $X$ is assumed (here we take a slightly more restrictive point of view) to be a connected smooth manifold. The configuration space $\mathcal{C}$ is defined to be a fibred manifold $p:\mathcal{C}\to X$. By $J^r\mathcal{C}$ we refer to the fibred manifold of $r$-jets of $\mathcal{C}$, with projection $p^r = p \circ \pi^r_0$, where for $r > s$ we use $\pi^r_s: J^r\mathcal{C}\to J^s\mathcal{C}$ for the canonical projection (and $J^0\mathcal{C} = \mathcal{C}$).

A field is a (smooth) section $\phi \in \Gamma \mathcal{C}$. A simple example that captures most of the usual cases: if we are studying mappings between manifolds $\phi: X\to N$, then we take $\mathcal{C} = N\times X$ the trivial fibre bundle. The $s$-jet operator naturally sends $j^s: \Gamma\mathcal{C} \ni \phi \mapsto j^s\phi \in \Gamma J^s\mathcal{C}$.

A partial differential equation of order $r$ is defined to be a fibred submanifold $J^r\mathcal{C} \supset R^r \to X$. A field is said to solve the PDE if $j^r\phi \subset R^r$.

In the usual case of systems of PDEs on Euclidean space, $X$ is taken to be $\mathbb{R}^d$ and $\mathcal{C} = \mathbb{R}^n\times X$ the trivial vector bundle. A system of $m$ PDEs of order $r$ is usually taken to be $F(x,\phi, \partial\phi, \partial^2\phi, \ldots, \partial^r\phi) = 0$ where

$\displaystyle F: X\times \mathbb{R}^n \times \mathbb{R}^{dn} \times \mathbb{R}^{\frac{1}{2}d(d+1)n} \times \cdots \times \mathbb{R}^{{d+r-1 \choose r} n} \to \mathbb{R}^m$

is some function. We note that the domain of $F$ can be identified in this case with $J^r\mathcal{C}$. We can then extend $F$ to $\tilde{F}: J^r\mathcal{C} \ni c \mapsto (F(c),p^r(c)) \in \mathbb{R}^m\times X$, a fibre bundle morphism.

If we assume that $\tilde{F}$ has constant rank, then $\tilde{F}^{-1}(\{0\}\times X)$, the preimage of the zero section, is a fibred submanifold of $J^r\mathcal{C}$, and this is our differential equation.
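
The exponents in the domain of $F$ count the independent partial derivatives of each order. A minimal sketch of that count follows; the function names are made up for illustration, and it assumes the Euclidean setting above with $d \geq 1$. The fibre dimension of $J^r\mathcal{C}$ over a point of $X$ is obtained by summing ${d+k-1 \choose k} n$ over $0 \leq k \leq r$, and the sum collapses to ${d+r \choose r} n$.

```python
from math import comb


def jet_fibre_dim(d: int, n: int, r: int) -> int:
    """Fibre dimension of J^r C over a point of X, for C = R^n x R^d (d >= 1).

    Order-k partial derivatives of a single component are symmetric in
    their k indices, so there are comb(d + k - 1, k) of them; the k = 0
    term counts the value of the field itself.
    """
    return n * sum(comb(d + k - 1, k) for k in range(r + 1))


def jet_fibre_dim_closed_form(d: int, n: int, r: int) -> int:
    """Same count via the identity sum_{k=0}^r comb(d+k-1, k) = comb(d+r, r)."""
    return n * comb(d + r, r)


if __name__ == "__main__":
    # Scalar field (n = 1) on the plane (d = 2), second order (r = 2):
    # coordinates u; u_1, u_2; u_11, u_12, u_22.
    print(jet_fibre_dim(2, 1, 2))
```

For a scalar second-order equation on $\mathbb{R}^2$, such as Laplace’s equation, the fibre coordinates are $u, u_1, u_2, u_{11}, u_{12}, u_{22}$, and a single equation $F = 0$ of constant rank cuts out a hypersurface in each fibre.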

Gauge invariance
In this framework, the gauge invariance of a partial differential equation relative to certain symmetry groups can be captured by requiring $R^r$ be an invariant submanifold.

More precisely, we take

Definition
A symmetry/gauge group $\mathcal{G}$ is a subgroup of $\mathrm{Diff}(\mathcal{C})$, with the property that for any $g\in\mathcal{G}$, there exists a $g'\in \mathrm{Diff}(X)$ with $p\circ g = g' \circ p$.

It is important that we are looking at the diffeomorphism group for $\mathcal{C}$, not $J^r\mathcal{C}$. In general diffeomorphisms of $J^r\mathcal{C}$ will not preserve holonomy for sections of the form $j^r\phi$, a condition that is essential for solving PDEs. The condition that the symmetry operation “commutes with projections” is to ensure that $g:\Gamma\mathcal{C}\to\Gamma\mathcal{C}$, which in particular guarantees that $g$ extends to a diffeomorphism of $J^r\mathcal{C}$ with itself that commutes with projections.

From this point of view, a (system of) partial differential equation(s) $R^r$ is said to be $\mathcal{G}$-invariant if for every $g\in\mathcal{G}$, we have $g(R^r) \subset R^r$.

We give two examples showing that this description agrees with the classical notions.

Gauge theory. In classical gauged theories, the configuration space $\mathcal{C}$ is a fibre bundle with structure group $G$ which acts on the fibres. A section of $G\times X \to X$ induces a diffeomorphism of $\mathcal{C}$ by fibre-wise action. In fact, the gauge symmetry is a fibre bundle morphism (fixes the base points).

General relativity. In general relativity, the configuration space is the space of Lorentzian metrics. So the background manifold is the space-time $X$. And the configuration space is the open submanifold of $S^2T^*X$ given by non-degenerate symmetric bilinear forms with signature (-+++). A diffeomorphism $\Psi:X\to X$ induces $T^*\Psi = (\Psi^{-1})^*: T^*X \to T^*X$ and hence a configuration space diffeomorphism that commutes with projection. It is in this sense that Einstein’s equations are diffeomorphism invariant.

Notice, of course, that this formulation does not contain the “physical” distinction between global and local gauge transformations. For example, for a linear PDE (so $\mathcal{C}$ is a vector bundle and $R^r$ is closed under linear operations), the trivial “global scaling” of a solution is considered in this framework a gauge symmetry, though it is generally ignored in physics.
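
To make the scaling example concrete, here is a small worked case; the choice of Laplace’s equation and the coordinate names are assumptions made for illustration. Take $X = \mathbb{R}^2$, $\mathcal{C} = \mathbb{R}\times X$, and write $(x, u, u_i, u_{ij})$ for the induced coordinates on $J^2\mathcal{C}$. Laplace’s equation is the submanifold

$\displaystyle R^2 = \left\{ (x,u,u_i,u_{ij}) \in J^2\mathcal{C} ~|~ u_{11} + u_{22} = 0 \right\}.$

For $\lambda \neq 0$ the diffeomorphism $g_\lambda(x,u) = (x,\lambda u)$ of $\mathcal{C}$ covers the identity on $X$, and since it sends a section $\phi$ to $\lambda\phi$, its extension to $J^2\mathcal{C}$ is

$\displaystyle g_\lambda: (x,u,u_i,u_{ij}) \mapsto (x, \lambda u, \lambda u_i, \lambda u_{ij}).$

If $u_{11} + u_{22} = 0$ then $\lambda u_{11} + \lambda u_{22} = 0$, so $g_\lambda(R^2) \subset R^2$, and the group $\{g_\lambda\}_{\lambda\neq 0}$ is a symmetry group of the equation in the sense of the Definition.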