… Data aequatione quotcunque fluentes quantitates involvente fluxiones invenire et vice versa …

## Category: Require introductory level university maths

### Riemann-, Generalized-Riemann-, and Darboux-Stieltjes integrals

(The following is somewhat rough and may have typos.)

Let us begin by setting the notations and recalling what happens without the Stieltjes part.

Defn (Partition)
Let $I$ be a closed interval. A partition $P$ is a finite collection of closed subintervals $\{I_\alpha\}$ such that

1. $P$ is finite;
2. $P$ covers $I$, i.e. $\cup P = I$;
3. $P$ is pairwise almost disjoint, i.e. for $I_\alpha, I_\beta$ distinct elements of $P$, their intersection contains at most one point.

We write $\mathscr{P}$ for the set of all partitions of $I$.

Defn (Refinement)
Fix $I$ a closed interval, and $P, Q$ two partitions. We say that $P$ refines $Q$ or that $P \preceq Q$ if for every $I_\alpha\in P$ there exists $J_\beta \in Q$ such that $I_\alpha \subseteq J_\beta$.

Defn (Selection)
Given $I$ a closed interval and $P$ a partition, a selection $\sigma: P \to I$ is a mapping that satisfies $\sigma(I_\alpha) \in I_\alpha$.

Defn (Size)
Given $I$ a closed interval and $P$ a partition, the size of $P$ is defined as $|P| = \sup_{I_\alpha \in P} |I_\alpha|$, where $|I_\alpha|$ is the length of the closed interval $I_\alpha$.

Remark In the above we have defined two different preorders on the set $\mathscr{P}$ of all partitions. One is induced by the size: we say that $P \leq Q$ if $|P| \leq |Q|$. The other is given by the refinement $P\preceq Q$. Note that neither is a partial order. (The preorder given by refinement can, however, be made into a partial order if we disallow zero-length degenerate closed intervals.) Note also that if $P\preceq Q$ we must have $P \leq Q$.

Now we can define the notions of integrability.

Defn (Integrability)
Let $I$ be a closed, bounded interval and $f:I \to \mathbb{R}$ be a bounded function. We say that $f$ is integrable with integral $s$ in the sense of

• Riemann if for every $\epsilon > 0$ there exists $P_0\in \mathscr{P}$ such that for every $P \leq P_0$ and every selection $\sigma:P \to I$ we have
$\displaystyle \left| \sum_{I' \in P} f(\sigma(I')) |I'| - s \right| < \epsilon$

• Generalised-Riemann if for every $\epsilon > 0$ there exists $P_0 \in \mathscr{P}$ such that for every $P \preceq P_0$ and every selection $\sigma: P\to I$ we have
$\displaystyle \left| \sum_{I' \in P} f(\sigma(I')) |I'| - s \right| < \epsilon$

• Darboux if
$\displaystyle \inf_{P\in\mathscr{P}} \sum_{I' \in P} (\sup_{I'} f )|I'| = \sup_{P\in\mathscr{P}} \sum_{I' \in P} (\inf_{I'} f )|I'| = s$

From the definition it is clear that “Riemann integrable” implies “Generalised-Riemann integrable”. Furthermore, we have clearly that for a fixed $P$
$\displaystyle \sum_{I' \in P} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} f(\sigma(I')) |I'| \leq \sum_{I' \in P} (\sup_{I'} f) |I'|$
and that if $P \preceq Q$ we have
$\displaystyle \sum_{I' \in Q} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} (\sup_{I'} f) |I'| \leq \sum_{I' \in Q} (\sup_{I'} f) |I'|$
so “Darboux integrable” also implies “Generalised-Riemann integrable”. A little bit more work shows that “Generalised-Riemann integrable” also implies “Darboux integrable” (if the suprema and infima are attained on the intervals $I'$, this would follow immediately; in general, using the boundedness of the intervals we can find $\sigma$ such that the Riemann sum approximates the upper or lower Darboux sums arbitrarily well).

The interesting part is the following
Theorem
Darboux integrable functions are Riemann integrable. Thus all three notions are equivalent.

Proof. Let $P, Q$ be partitions. Let $|P| \leq \inf_{I'\in Q, |I'| \neq 0} |I'|$, and let $m$ be the number of non-degenerate subintervals in $Q$. We have the following estimate
$\displaystyle \sum_{I'\in Q} (\inf_{I'} f) |I'| - (m-1) |P| (\sup_I 2|f|) \leq \sum_{J'\in P} f(\sigma(J')) |J'| \leq \sum_{I'\in Q} (\sup_{I'} f) |I'| + (m-1) |P| (\sup_I 2|f|)$
The estimate follows by noting that “most” of the $J'\in P$ will be subsets of some $I'\in Q$, and there can be at most $m-1$ of the $J'$ that straddle two different non-degenerate sub-intervals of $Q$. To prove the theorem it suffices to choose first a $Q$ such that the upper and lower Darboux sums approximate the integral well. Then we can conclude that for all $P$ with $|P|$ sufficiently small the Riemann sum is almost controlled by the $Q$-Darboux sums. Q.E.D.

Now that we have recalled the case of the usual integrability, let us consider the case of the Stieltjes integrals: instead of integrating against $\mathrm{d}x$, we integrate against $\mathrm{d}\rho$, where $\rho$ is roughly speaking a “cumulative distribution function”: we assume that $\rho:I \to \mathbb{R}$ is a bounded monotonically increasing function.

The definitions of the integrals are largely the same, except that at every step we replace the width of the interval $|I'|$ by the diameter of $\rho(I')$, i.e. $\sup_{I'} \rho - \inf_{I'} \rho$. The arguments above immediately also imply that

• “Riemann-Stieltjes integrable” implies “Generalised-Riemann-Stieltjes integrable”
• “Darboux-Stieltjes integrable” implies “Generalised-Riemann-Stieltjes integrable”
• “Generalised-Riemann-Stieltjes integrable” implies “Darboux-Stieltjes integrable”

However, Darboux-Stieltjes integrable functions need not be Riemann-Stieltjes integrable. The possibility of failure can be seen in the proof of the theorem above, where we used the fact that $|P|$ is allowed to be made arbitrarily small. The same estimate, in the case of the Stieltjes version of the integrals, has $|P|$ replaced by $\sup_{J'\in P} (\sup_{J'} \rho - \inf_{J'} \rho)$, which for arbitrary partitions need not shrink to zero. To have a concrete illustration, we give the following:

Example
Let $I = [0,1]$. Let $\rho(x) = 0$ if $x < \frac12$ and $1$ otherwise. Let $f(x) = 0$ if $x \leq \frac12$ and $1$ otherwise. Let $Q_0$ be the partition $\{ [0,\frac12], [\frac12,1]\}$. We have that
$\displaystyle \sum_{I'\in Q_0} (\sup_{I'} f) (\sup_{I'} \rho - \inf_{I'} \rho) = 0 \cdot (1 - 0) + 1\cdot (1 - 1) = 0$
while
$\displaystyle \sum_{I'\in Q_0} (\inf_{I'} f) (\sup_{I'} \rho - \inf_{I'} \rho) = 0 \cdot (1-0) + 0 \cdot(1-1) = 0$
so in particular the pair $(f,\rho)$ is Darboux-Stieltjes integrable with integral 0. However, let $k$ be any odd integer and consider the partition $P_k$ of $[0,1]$ into $k$ equal portions. Depending on the choice of the selection $\sigma$, we see that the sum can take the values
$\displaystyle \sum_{I'\in P_k} f(\sigma(I')) (\sup_{I'} \rho - \inf_{I'}\rho) = f(\sigma([\frac12 - \frac1{2k},\frac12 + \frac1{2k}])) (1 - 0) \in \{0,1\}$
which shows that the Riemann-Stieltjes condition can never be satisfied.
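The example can be checked with a few lines of exact arithmetic. The following is only a sketch: the helper names `stieltjes_sum`, `left_tag` and `right_tag` are made up for illustration, and only the two extreme selections (left and right endpoints) are tried.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def rho(x):
    # Integrator from the example: jumps from 0 to 1 at x = 1/2.
    return 0 if x < HALF else 1

def f(x):
    # Integrand from the example: still 0 at x = 1/2, equal to 1 afterwards.
    return 0 if x <= HALF else 1

def stieltjes_sum(k, select):
    """Riemann-Stieltjes sum over the partition of [0, 1] into k equal pieces.

    `select(a, b)` (a hypothetical helper) returns the tag chosen in [a, b].
    Since rho is increasing, the diameter of rho([a, b]) is rho(b) - rho(a).
    """
    total = 0
    for i in range(k):
        a, b = Fraction(i, k), Fraction(i + 1, k)
        total += f(select(a, b)) * (rho(b) - rho(a))
    return total

def left_tag(a, b):
    return a

def right_tag(a, b):
    return b

for k in (3, 5, 101):
    # For odd k only the middle interval sees the jump of rho.
    print(k, stieltjes_sum(k, left_tag), stieltjes_sum(k, right_tag))
```

For every odd $k$ the left-endpoint selection gives 0 and the right-endpoint selection gives 1, no matter how fine the partition is.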

The example above where both $f$ and $\rho$ are discontinuous at the same point is essentially sharp. An easy modification of the previous theorem shows that
Prop
If at least one of $f,\rho$ is continuous, then Darboux-Stieltjes integrability is equivalent to Riemann-Stieltjes integrability.

Remark The nonexistence of the Riemann-Stieltjes integral when $f$ and $\rho$ have shared discontinuity points is similar in spirit to the idea in distribution theory where whether the product of two distributions is well-defined (as a distribution) depends on their wave-front sets.

### Bouncing a quantum particle back and forth

If you have not seen my previous two posts, you should read them first.

In the two previous posts, I shot particles (okay, simulated the shooting on a computer) at a single potential barrier and looked at what happens. What happens when we have more than one barrier? In the classical case the picture is easy to understand: a particle with insufficient energy to escape will be trapped in the local potential well for ever, while a particle with sufficiently high energy will gain freedom and never come back. But what happens in the quantum case?

If the intuition we developed from scattering a quantum particle against a potential barrier, where we see that depending on the frequency (energy) of the particle, some portion gets transmitted and some portion gets reflected, is indeed correct, what we may expect to see is that the quantum particle bounces between the two barriers, each time losing some amplitude due to tunneling.

But we also saw that the higher frequency components of the quantum particle have higher transmission amplitudes. So we may expect the high frequency components to decay more rapidly than the low frequency ones, so that the frequency of the “left over” parts will continue to decay in time. This, however, would be wrong, because we would be overlooking one simple fact: by the uncertainty principle again, very low frequency waves cannot be confined to a small physical region. So when we are faced with two potential barriers, the distance between them gives a characteristic frequency. Below this frequency (energy) it is actually not possible to fit a (half) wave between the barriers, and so the low frequency waves must have significant physical extent beyond the barriers, which means that large portions of these low frequency waves will just radiate away freely. Much above the characteristic frequency, however, the waves have large transmission coefficients and will not be confined.

So the net result is that we should expect for each double barrier a characteristic frequency at which the wave can remain “mostly” stuck between the two barriers, losing a little bit of amplitude at each bounce. This will look like a slowly, but exponentially, decaying standing wave. And I have some videos to show for that!

In the video we start with the same random initial data and evolve it under the linear wave equation with different potentials: the equations look like

$\displaystyle - \partial^2_{tt} u + \partial^2_{xx} u - V u = 0$

where $V$ is a non-negative potential taken in the form

$\displaystyle V(x) = a_1 \exp( - x^2 / b_1) - a_2 \exp( -x^2 / b_2)$

which is a difference of two Gaussians. For the five waves shown the values of $a_1, b_1$ are the same throughout. The coefficients $a_2$ (taken to be $\leq a_1$) and $b_2$ (taken to be $< b_1$) increase from top to bottom, resulting in deeper and more widely separated double barriers. Qualitatively we see, as we expected,

• The shallower and narrower the dip the faster the solution decays.
• The shallower and narrower the dip the higher the “characteristic frequency”.

As an aside: the video shown above is generated using Python, in particular NumPy and Matplotlib; the code took significantly longer to run (20+ hours) than to write (not counting the HPDE solver I wrote before for a different project, coding and debugging this simulation took about 3 hours or less). On the other hand, this only uses one core of my quad-core machine, and leaves the computer responsive in the meantime for other things. Compare that to Auto-QCM: the last time I ran it to grade a stack of 400+ multiple choice exams it locked up all four cores of my desktop computer for almost an entire day.

As a further aside, this post is related somewhat to my MathOverflow question to which I have not received a satisfactory answer.

### … and scattering a quantum particle

In the previous post we shot a classical particle at a potential barrier. In this post we shoot a quantum particle.

Whereas the behaviour of the classical particle is governed by Newton’s laws (where the external force providing the acceleration is given as minus the gradient of the potential), we allow our quantum particle to be governed by the Klein-Gordon equations.

• Mathematically, the Klein-Gordon equation is a partial differential equation, whereas Newton’s laws form ordinary differential equations. A typical physical interpretation is that the state space of quantum particles is infinite dimensional, whereas the phase space of classical physics has finite dimensions.
• Note that physically the Klein-Gordon equation was designed to model a relativistic particle, while in the previous post we used the non-relativistic Newton’s laws. In some ways it would’ve been better to model the quantum particle using Schroedinger’s equation. I plead here however that (a) qualitatively there is not a big difference in terms of the simulated outcomes and (b) it is more convenient for me to use the Klein-Gordon model as I already have a finite-difference solver for hyperbolic PDEs coded in Python on my computer.

To model a particle, we set the initial data to be a moving wave packet, such that at the initial time the solution is strongly localized and satisfies $\partial_t u + \partial_x u = 0$. Absent the mass and potential energy terms in the Klein-Gordon equation (so under the evolution of the free wave equation), this wave packet will stay coherent and just translate to the right as time goes along. The addition of the mass term causes some small dispersion, but the mass is chosen small so that this is not a large effect. The main change to the evolution is the potential barrier, which you can see illustrated in the simulation.
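The claim about the free evolution can be illustrated with a small leapfrog scheme. This is only a sketch in plain Python and not the solver used for the videos: the packet shape, grid sizes and the function names `packet`, `evolve_free_wave` and `centre` are all assumptions made for the illustration, and the mass and potential terms are dropped entirely.

```python
import math

def packet(x, x0=-5.0, width=1.0, kappa=4.0):
    # Gaussian envelope times an oscillation; kappa plays the role of the frequency shift.
    return math.exp(-((x - x0) ** 2) / width) * math.cos(kappa * (x - x0))

def evolve_free_wave(n=800, length=40.0, cfl=0.5, t_final=5.0):
    """Leapfrog scheme for u_tt = u_xx on a periodic grid of n points."""
    dx = length / n
    dt = cfl * dx
    xs = [-length / 2 + j * dx for j in range(n)]
    u0 = [packet(x) for x in xs]

    def second_diff(u):
        return [(u[(j + 1) % n] - 2 * u[j] + u[j - 1]) / dx ** 2 for j in range(n)]

    def first_diff(u):
        return [(u[(j + 1) % n] - u[j - 1]) / (2 * dx) for j in range(n)]

    # First step by Taylor expansion, imposing u_t = -u_x (right-moving data)
    # and u_tt = u_xx.
    ux, uxx = first_diff(u0), second_diff(u0)
    u1 = [u0[j] - dt * ux[j] + 0.5 * dt ** 2 * uxx[j] for j in range(n)]

    prev, cur = u0, u1
    for _ in range(round(t_final / dt) - 1):
        lap = second_diff(cur)
        nxt = [2 * cur[j] - prev[j] + dt ** 2 * lap[j] for j in range(n)]
        prev, cur = cur, nxt
    return xs, u0, cur

def centre(xs, u):
    # Centre of mass of u^2, a rough proxy for "where the particle is".
    weights = [v * v for v in u]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)

xs, u_start, u_end = evolve_free_wave()
print("centre at t = 0:", centre(xs, u_start))
print("centre at t = 5:", centre(xs, u_end))
```

With unit wave speed the centre of the packet should move to the right by roughly the elapsed time, up to discretization error.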

The video shows 8 runs of the simulation with different initial data. Whereas in the classical picture the initial kinetic energy is captured by the initial speed at which the particle is moving, in the quantum/wave picture the kinetic energy is related to the central frequency of your wave packet. So each of the 8 runs has an increasing frequency offset that represents increasing kinetic energy. The simulation has two plots: the top shows the square of the solution itself, which gives a good indication of where physically the wave packet is located. The bottom shows a normalized kinetic energy density (I have to include a normalization since the kinetic energy of the first and last particles differ roughly 10 fold).

One notices that in the first two runs, the kinetic energy is sufficiently small that the particle mostly bounces back to the left after hitting the potential.

For the third and fourth runs (frequency shift $\sqrt{2}$ and $\sqrt{3}$ respectively) we see that while a significant portion of the particle bounces back, a noticeable portion “tunnels through” the barrier: this is caused by a combination of the quantum tunneling phenomenon and the wave packet form of the initial data.

The phenomenon of quantum tunneling manifests in that all non-zero energy waves will penetrate a finite potential barrier a little bit. But the amount of penetration decays to zero as the energy of the wave goes to zero: this is known as the semiclassical regime. In the semiclassical limit it is known that quantum mechanics converges toward classical mechanics, and so in the low-energy limit we expect our particle to behave like a classical particle and bounce off. So, naturally, as we increase the energy (frequency) of our wave packet we expect more of the tunneling to happen.

Further, observe that by shaping our data into a wave packet it necessarily contains some high frequency components (due to Heisenberg uncertainty principle); high frequency, and hence high energy components do not “see” the potential barrier. Even in the classical picture high energy particles would fly over the potential barrier. So for wave packets there will always be some (perhaps not noticeable due to the resolution of our computation) leakage of energy through the potential barrier. The quantum effect on these high energy waves is that they back-scatter. Whereas the classical high energy particles just fly directly over the barrier, a high energy quantum particle will leave some parts of itself behind the barrier always. We see this in the sixth and seventh runs of the simulation, where the particle mostly passes through the barrier, but a noticeable amount bounces off in the opposite direction.

In between, during the fifth run, where the frequency shift is 2, we see that the barrier basically splits the particle in two and sends one half flying to the right and the other half flying to the left. Classically this is the turning point between particles that go over the bump and particles that bounce back, and would be the case (hard to show numerically!) where a classical particle comes in from afar with just enough energy that it comes to a halt at the top of the potential barrier!

And further increasing the energy after the seventh run, we see in the final run a situation where only a negligible amount of the particle scatters backward with almost all of it passing through the barrier unchanged. One interesting thing to note however is that just like the case of the classical particle, the wave packet appears to “slow down” a tiny bit as it goes over the potential barrier.

### An optimization problem: variation

Examining the theorem proven in the previous post, we are led naturally to ask whether there are higher order generalizations.

Question: Let $f \in C^{k}([-1,1])$ with $f^{(k)} > 0$. What can we say about the minimizer of $C = \int_{-1}^1 |f(x) - p(x)|~\mathrm{d}x$ where $p$ ranges over degree $k-1$ polynomials?

It is pretty easy to see that we expect $p$ to intersect $f$ at the maximum number of points, which is $k$. We label those points $x_1, \ldots, x_k$ and call $x_0 = -1$ and $x_{k+1}= 1$. Then the cost function can be written as
$\displaystyle C = \sum_{j = 0}^k (-1)^j \int_{x_j}^{x_{j+1}} f(x) - p(x; x_1, \ldots, x_k) ~\mathrm{d}x$
Since we know the values of $p$ at the points $x_1, \ldots, x_k$ we can write down the interpolation polynomial explicitly using Sylvester’s formula:
$\displaystyle p = \sum_{j = 1}^k \left( \prod_{1 \leq m \leq k, m\neq j} \frac{x - x_m}{x_j - x_m} \right) f(x_j) = \sum L_j(x; x_1, \ldots, x_k) f(x_j)$

The partial derivatives are now
$\displaystyle \partial_n C = \sum_{j = 0}^k (-1)^{j+1} \int_{x_j}^{x_{j+1}} \partial_n p(x; x_1, \ldots, x_k) ~\mathrm{d}x$
It remains to compute $\partial_n p$ for $1 \leq n \leq k$. We observe that when $n \neq j$
$\displaystyle \partial_n L_j = - \frac{1}{x - x_n} L_j + \frac{1}{x_j - x_n} L_j$
and also
$\displaystyle \partial_n L_n = - \left( \sum_{1\leq m \leq k, m\neq n} \frac{1}{x_n - x_m} \right) L_n$
So
$\displaystyle \partial_n p = \sum_{j \neq n} \frac{x-x_j}{(x_j - x_n)(x - x_n)} L_j f(x_j) + L_n f'(x_n) - \left( \sum_{1\leq m \leq k, m\neq n} \frac{1}{x_n - x_m} \right) L_n f(x_n)$
Now, we observe that
$\displaystyle \frac{x - x_j}{x - x_n} L_j = - \left( \prod_{m \neq n,j} \frac{x_n - x_m}{x_j - x_m} \right) L_n$
so after some computation we arrive at
$\displaystyle \partial_n p = L_n(x) \cdot \left[ f'(x_n) - \sum_{j \neq n} \frac{1}{x_j - x_n} \left(\left( \prod_{m \neq j,n}\frac{x_n - x_m}{x_j - x_m}\right)f(x_j) - f(x_n) \right)\right]$
which we can further simplify to
$\displaystyle \partial_n p = L_n(x) \cdot \left( f'(x_n) - p'(x_n)\right)$
Now, since $f$ and $p$ cross transversely at $x_n$, the difference of their derivatives is non-zero. (This harks back to our assumption that $f^{(k)} > 0$.) So we are down, as in the case where $k = 2$, to equations entirely independent of $f$.

More precisely, we see that the stationarity condition becomes the choice of $x_1, \ldots, x_k$ such that the integrals
$\displaystyle \sum_{j = 0}^k (-1)^{j} \int_{x_j}^{x_{j+1}} L_n(x) ~\mathrm{d}x = 0$
for each $n$. Since $L_n$ form a basis for the polynomials of degree at most $k-1$, we have that the function
$\chi(x) = (-1)^j \qquad x \in (x_j, x_{j+1})$
is $L^2$ orthogonal to every polynomial of degree at most $k-1$. So in particular the $x_j$ are solutions to the following system of equations
$x_0 = -1, \qquad x_{k+1} = 1$
$\sum_{j = 0}^k (-1)^j \left[ x_{j+1}^d - x_{j}^d \right] = 0 \qquad \forall d \in \{1, \ldots, k\}$

From symmetry considerations we have that $x_j = - x_{k+1 - j}$. This also kills about half of the equations. For the low $k$ we have

1. $\{ 0\}$
2. $\{ -1/2, 1/2\}$
3. $\{-\frac{\sqrt{2}}2, 0, \frac{\sqrt{2}}2\}$
4. $\{ (\pm 1 \pm \sqrt{5})/4 \}$
5. $\{ 0, \pm\frac12, \pm \frac{\sqrt{3}}2 \}$
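The system of equations is easy to test numerically. The sketch below (the function name `residuals` and the dictionary `candidates` are illustrative choices) evaluates the left-hand sides $\sum_{j=0}^k (-1)^j [x_{j+1}^d - x_j^d]$ for $d = 1, \ldots, k$ on candidate node sets for $k = 1, \ldots, 5$; for $k = 3$ the nodes used are $0, \pm\frac{\sqrt2}2$.

```python
import math

def residuals(nodes):
    """Left-hand sides of the stationarity system for d = 1, ..., k."""
    xs = [-1.0] + sorted(nodes) + [1.0]
    k = len(nodes)
    return [
        sum((-1) ** j * (xs[j + 1] ** d - xs[j] ** d) for j in range(k + 1))
        for d in range(1, k + 1)
    ]

s5 = math.sqrt(5.0)
candidates = {
    1: [0.0],
    2: [-0.5, 0.5],
    3: [-math.sqrt(2.0) / 2, 0.0, math.sqrt(2.0) / 2],
    4: [(-1 - s5) / 4, (1 - s5) / 4, (-1 + s5) / 4, (1 + s5) / 4],
    5: [-math.sqrt(3.0) / 2, -0.5, 0.0, 0.5, math.sqrt(3.0) / 2],
}

for k, nodes in candidates.items():
    # Largest residual over d = 1..k; it should vanish up to rounding.
    print(k, max(abs(r) for r in residuals(nodes)))
```

All five sets make every residual vanish to machine precision, which is a useful sanity check on the hand computation.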

### Bessaga’s converse to the contraction mapping theorem

In preparing some lecture notes for the implicit function theorem, I took a look at Schechter’s delightfully comprehensive Handbook of Analysis and its Foundations (which you can also find on his website), and I learned something new about the Banach fixed point theorem. To quote Schechter:

… although Banach’s theorem is quite easy to prove, a longer proof cannot yield stronger results.

I will write a little bit here about a “converse” to the Banach theorem due to Bessaga, which uses a little bit of help from the Axiom of Choice.

### Compactifying (p,q)-Minkowski space

In a previous post I described a method of thinking about conformal compactifications, and I mentioned in passing that, in principle, the method should also apply to arbitrary signature pseudo-Euclidean space $\mathbb{R}^{p,q}$. A few days ago while visiting Oxford I had a conversation with Sergiu Klainerman where this came up, and we realised that we don’t actually know what the conformal compactifications are! So let me write down here the computations in case I need to think about it again in the future.

### Products and expectation values

Let us start with an instructive example (modified from one I learned from Steven Landsburg). Let us play a game:

I show you three identical looking boxes. In the first box there are 3 red marbles and 1 blue one. In the second box there are 2 red marbles and 1 blue one. In the last box there is 1 red marble and 4 blue ones. You choose one at random. What is …

• The expected number of red marbles you will find?
• The expected number of blue marbles you will find?
• The expected number of marbles, regardless of colour, you will find?
• The expected percentage of red marbles you will find?
• The expected percentage of blue marbles you will find?
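For readers who want to check their answers, here is a short computation with exact fractions. It assumes that “at random” means each box is picked with probability $1/3$ and that you then count the whole contents of the chosen box; the names `boxes` and `expectation` are purely illustrative.

```python
from fractions import Fraction

# (red, blue) marble counts of the three boxes in the game above.
boxes = [(3, 1), (2, 1), (1, 4)]

def expectation(quantity):
    """Average of quantity(red, blue) over a uniformly chosen box."""
    return sum(Fraction(quantity(r, b)) for r, b in boxes) / len(boxes)

exp_red = expectation(lambda r, b: r)
exp_blue = expectation(lambda r, b: b)
exp_total = expectation(lambda r, b: r + b)
exp_frac_red = expectation(lambda r, b: Fraction(r, r + b))
exp_frac_blue = expectation(lambda r, b: Fraction(b, r + b))

print("E[red]   =", exp_red)
print("E[blue]  =", exp_blue)
print("E[total] =", exp_total)
print("E[red fraction]  =", exp_frac_red)
print("E[blue fraction] =", exp_frac_blue)
print("E[red] / E[total] =", exp_red / exp_total)
```

The point to notice is that the expected fraction of red marbles is not the ratio of the expected number of red marbles to the expected total.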

### Mariş’s Theorem

During a literature search (to answer my question concerning symmetries of “ground states” in variational problems), I came across a very nice theorem due to Mihai Mariş. The theorem itself is, more than anything else, a statement about the geometry of Euclidean spaces. I will give a rather elementary write-up of (a special case of) the theorem here. (The proof presented here can equally well be applied to get the full strength of the theorem as presented in Mariş’s paper; I give just the special case for clarity of the discussion.)

### Continuity of the infimum

Just realised (two weeks ago, but only got around to finishing this blog posting now) that an argument used to prove a proposition in a project I am working on is wrong. After reducing the problem to its core I found that it is something quite elementary. So today’s post will be of a different flavour from the ones of the recent past.

Question Let $X,Y$ be topological spaces. Let $f:X\times Y\to\mathbb{R}$ be a bounded, continuous function. Is the function $g(x) = \inf_{y\in Y}f(x,y)$ continuous?

Intuitively, one may be tempted to say “yes”. Indeed, there are plenty of examples where the answer is in the positive. The simplest one is when we can replace the infimum with the minimum:

Example Let the space $Y$ be a finite set with the discrete topology. Then $g(x) = \min_{y\in Y} f(x,y)$ is continuous.
Proof left as exercise.

But in fact, the answer to the question is “No”. Here’s a counterexample:

Example Let $X = Y = \mathbb{R}$ with the standard topology. Define

$\displaystyle f(x,y) = \begin{cases} 1 & x > 0 \\ 0 & x < -e^{y} \\ 1 + x e^{-y} & x\in [-e^{y},0] \end{cases}$

which is clearly continuous. But the infimum function $g(x)$ is roughly the Heaviside function: $g(x) = 1$ if $x \geq 0$, and $g(x) = 0$ if $x < 0$.
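A quick numerical look at the counterexample, as a sketch only: the infimum over $y\in\mathbb{R}$ is replaced by a minimum over a finite grid of $y$ values (the function name `g_approx` and the grid bounds are arbitrary choices), so for $x < 0$ it sees the true value only when $|x| > e^{y_{\min}}$.

```python
import math

def f(x, y):
    # The piecewise function from the counterexample.
    if x > 0:
        return 1.0
    if x < -math.exp(y):
        return 0.0
    return 1.0 + x * math.exp(-y)

def g_approx(x, y_min=-50.0, y_max=50.0, n=2001):
    """Minimum of f(x, .) over a uniform grid, standing in for the infimum."""
    ys = [y_min + (y_max - y_min) * i / (n - 1) for i in range(n)]
    return min(f(x, y) for y in ys)

for x in (-1.0, -1e-3, -1e-6, 0.0, 1e-6, 1.0):
    print(x, g_approx(x))
```

The printed values jump from 0 to 1 as $x$ crosses zero from the left, even though each slice $x \mapsto f(x,y)$ is continuous.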

So what is it about the first example that makes the argument work? What is the difference between the minimum and the infimum? A naive guess may be that in the finite case, we are taking a minimum, and therefore the infimum is attained. This guess is not unreasonable: there are a lot of arguments in analysis where when the infimum can be assumed to be attained, the problem becomes a lot easier (when we are then allowed to deal with a minimizer instead of a minimizing sequence). But sadly that is not (entirely) the case here: for every $x_0$, we can certainly find a $y_0$ such that $f(x_0,y_0) = g(x_0)$. So attaining the infimum point-wise is not enough.

What we need, here, is compactness. In fact, we have the following

Theorem Let $X,Y$ be topological spaces with $Y$ compact. Then for any continuous $f:X\times Y\to\mathbb{R}$, the function $g(x) := \inf_{y\in Y} f(x,y)$ is well-defined and continuous.

Proof usually proceeds in three parts. That $g(x) > -\infty$ follows from the fact that for any fixed $x\in X$, $f(x,\cdot):Y\to\mathbb{R}$ is a continuous function defined on a compact space, and hence is bounded (in fact the infimum is attained). Then using that the sets $(-\infty,a)$ and $(b,\infty)$ form a subbase for the topology of $\mathbb{R}$, it suffices to check that $g^{-1}((-\infty,a))$ and $g^{-1}((b,\infty))$ are open.

Let $\pi_X$ be the canonical projection $\pi_X:X\times Y\to X$, which we recall is continuous and open. It is easy to see that $g^{-1}((-\infty,a)) = \pi_X \circ f^{-1}((-\infty,a))$. So continuity of $f$ implies that this set is open. (Note that this part does not depend on compactness of $Y$. In fact, a minor modification of this proof shows that for any family of upper semicontinuous functions $\{f_c\}_C$, the pointwise infimum $\inf_{c\in C} f_c$ is also upper semicontinuous, a fact that is very useful in convex analysis. And indeed, the counterexample function given above is upper semicontinuous.)

It is in this last part, showing that $g^{-1}((b,\infty))$ is open, that compactness is crucially used. Observe that $g(x) > b \implies f(x,y) > b~ \forall y$. In other words $g(x) > b \implies \forall y, (x,y) \in f^{-1}((b,\infty))$, an open set. This in particular implies that $\forall x\in g^{-1}((b,\infty)), \forall y\in Y$ there exists a “box” neighborhood $U_{(x,y)}\times V_{(x,y)}$ of $(x,y)$ contained in $f^{-1}((b,\infty))$. Now using compactness of $Y$, finitely many of these boxes, based at points $(x,y_1), \ldots, (x,y_k)$, cover $\{x\}\times Y$. And in particular we have

$\displaystyle \{x\}\times Y \subset \left(\cap_{i = 1}^k U_{(x,y_i)}\right)\times Y \subset f^{-1}((b,\infty))$

and hence $g^{-1}((b,\infty)) = \cup_{x\in g^{-1}((b,\infty))} \cap_{i = 1}^{k(x)} U_{(x,y_i)}$ is open. Q.E.D.

One question we may ask is how sharp is the requirement that $Y$ is compact. As with most things in topology, counterexamples abound.

Example Let $Y$ be any uncountably infinite set equipped with the co-countable topology. That is, the collection of open subsets are precisely the empty set and all subsets whose complement is countable. The two interesting properties of this topology are (a) $Y$ is not compact and (b) $Y$ is hyperconnected. (a) is easy to see: let $C$ be some countably infinite subset of $Y$. For each $c\in C$ let $U_c = \{c\}\cup (Y\setminus C)$. This forms an open cover with no finite sub-cover. Hyperconnected spaces are, roughly speaking, spaces in which all open nonempty sets are “large”, in the sense that they mutually overlap a lot. In particular, a continuous map from a hyperconnected space to a Hausdorff space must be constant. In our case we can see this directly: suppose $h:Y\to \mathbb{R}$ is a continuous map. Fix $y_1,y_2\in Y$. Let $N_{1,2}\subset \mathbb{R}$ be open neighborhoods of $h(y_{1,2})$. Since $h$ is continuous, $h^{-1}(N_1)\cap h^{-1}(N_2)$ is open and non-empty (by the co-countable assumption). Therefore $N_1\cap N_2\neq \emptyset$ for any pair of neighborhoods. Since $\mathbb{R}$ is Hausdorff, this forces $h$ to be the constant map. This implies that for any topological space $X$, a continuous function $f:X\times Y\to\mathbb{R}$ is constant along $Y$, and hence for any $y_0\in Y$, we have that $\inf_{y\in Y} f(x,y) =: g(x) = f(x,y_0)$ is continuous.

One can try to introduce various regularity/separation assumptions on the spaces $X,Y$ to see at what level compactness becomes a crucial requirement. As an analyst, however, I really only care about topological manifolds, in which case the second counterexample up top can be readily used. We can slightly weaken the assumptions and still prove the following partial converse in essentially the same way.

Theorem Let $X$ be Tychonoff, connected, and first countable, such that $X$ contains a non-trivial open subset whose closure is not the entire space; and let $Y$ be paracompact, Lindelof. Then if $Y$ is noncompact, there exists a continuous function $f:X\times Y\to\mathbb{R}$ such that $\inf_{y\in Y}f:X\to \mathbb{R}$ is not continuous.

Remark Connected (nontrivial) topological manifolds automatically satisfy the conditions on $X$ and $Y$ except for non-compactness. The conditions given are not necessary for the theorem to hold; but they more or less capture the topological properties used in the construction of the second counterexample above.

Remark If $X$ is such that every non-empty open set’s closure is the entire space, we must have that it is hyperconnected (let $C\subset X$ be a closed set. Suppose $D\subsetneq X$ is another closed set such that $C\cup D = X$. Then $D^c \subset C$; but $D^c$ is open and non-empty, so its closure is all of $X$ and hence $C = X$. Hence $X$ cannot be written as the union of two proper closed subsets). And if it is Tychonoff, then $X$ is either the empty-set or the one-point set.

Lemma For a paracompact Lindelof space that is noncompact, there exists a countably infinite open cover $\{U_k\}$ and a sequence of points $y_k \in U_k$ such that $\{y_k\}\cap U_j = \emptyset$ if $j\neq k$.

Proof: By noncompactness, there exists an open cover with no finite subcover. By Lindelof, this open cover can be assumed to be countable, which we enumerate by $\{V_k\}$ and assume WLOG that $\forall k, V_k \setminus \cup_{j =1}^{k-1} V_j \neq \emptyset$. Define $\{U_k\}$ and $\{y_k\}$ inductively by: $U_k = V_k \setminus \cup_{j = 1}^{k-1} \{ y_j\}$ and choose $y_k \in U_k \setminus \cup_{j=1}^{k-1}U_j$.

Proof of theorem: We first construct a sequence of continuous functions on $X$. Let $G\subset X$ be a non-empty open set such that its closure-complement $H = (\bar{G})^c$ is a non-empty open set ($G$ exists by assumption). By connectedness $\bar{G}\cap \bar{H} \neq \emptyset$, so we can pick $x_0$ in the intersection. Let $\{x_j\}\subset H$ be a sequence of points converging to $x_0$, which exists by first countability. Using Tychonoff, we can get a sequence of continuous functions $f_j$ on $X$ such that $f_j|_{\bar{G}} = 0$ and $f_j(x_j) = -1$.

On $Y$, choose an open cover $\{U_k\}$ and points $\{y_k\}$ per the previous Lemma. By paracompactness we have a partition of unity $\{\psi_k\}$ subordinate to $U_k$, and by the conclusion of the Lemma we have that $\psi_k(y_k) = 1$. Now we define the function

$\displaystyle f(x,y) = \sum_{k} f_k(x)\psi_k(y)$

which is continuous, and such that $f|_{\bar{G}\times Y} = 0$. But by construction $\inf_{y\in Y}f(x,y) \leq f(x_k,y_k) = f_k(x_k) = -1$, which combined with the fact that $x_k \to x_0 \in \bar{G}$ shows the desired result. q.e.d.

### Inverted time translations

Plot of the vector field K_0 and its stream function

In the study of the global properties of wave-type equations, a well-developed method is the vector field method due to Sergiu Klainerman and Demetrios Christodoulou. Maybe another day I will write a more detailed treatise on what the vector field method is and how to apply it; I won’t do it now. The method is crucial in many proofs of nonlinear stability for wave-type problems, with perhaps the most striking application being the global nonlinear stability of Minkowski space. The main idea behind the vector field method is to construct a tensor that measures the local energy content of the solution to our equations, and exploit the properties of this tensor via vector fields. Examples of this tensor include the Einstein-Hilbert stress for electromagnetism, as well as the Bel-Robinson tensor for spin-2 (graviton) fields. To exploit the fine properties of this tensor field, one applies the divergence theorem to the tensor field contracted against suitable vector fields. For vector fields associated to the symmetries of the problem, this procedure will produce conservation laws, which will give control of the physical solution at a later time based on control at the present.

As it turns out, the useful symmetries of the equation, in the geometrical case, are closely related to the conformal symmetries of Minkowski space. These include the true symmetries (translations, rotations, and Lorentzian boosts), as well as the conformal scaling and, what we will discuss here, the inverted time translation, which lies at the heart of decay estimates for spin-1 and spin-2 fields on Minkowski space.

The inverted time translation, often denoted $K_0$, is the vector field given in radial coordinates $K_0 = (t^2 + r^2)\partial_t + 2tr \partial_r$. In the picture to the upper left, the vector field is plotted along with its stream function. This vector field is a conformal symmetry of Minkowski space. The name of the vector field indicates the fact that it is associated to a conformal inversion (which is also used in the conformal compactification of Minkowski space). On Minkowski space, the inversion map $x^\mu \mapsto \frac{x^\mu}{\langle x,x\rangle}$ is a conformal isometry. The vector field $K_0$ can be checked to be the vector field $\partial_t$ conjugated by the inversion map. As such, it has a very nice property compared with the other symmetry vector fields. The time translation $\partial_t$ and the inverted time translation $K_0$ are essentially (up to Lorentz boosts) the only globally causal conformal vector fields of Minkowski space. As such, with a dominant-energy type condition, they are the ones associated to which we have nonnegative energy controls.
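The conjugation claim can be sanity-checked numerically in the $(t,r)$ plane. The sketch below assumes the sign convention $\langle x,x\rangle = -t^2 + r^2$ (with the opposite signature one gets $-K_0$ instead), and approximates the push-forward of $\partial_t$ under the inversion by a central finite difference; the function names are illustrative.

```python
def inversion(t, r):
    # x -> x / <x, x>, with <x, x> = -t^2 + r^2 (assumed signature).
    s = -t * t + r * r
    return t / s, r / s

def pushforward_dt(t, r, h=1e-6):
    """Finite-difference image of the vector d/dt at (t, r) under the inversion."""
    t_plus, r_plus = inversion(t + h, r)
    t_minus, r_minus = inversion(t - h, r)
    return (t_plus - t_minus) / (2 * h), (r_plus - r_minus) / (2 * h)

def K0(t, r):
    # Components of K_0 = (t^2 + r^2) d/dt + 2 t r d/dr.
    return t * t + r * r, 2 * t * r

t, r = 0.3, 1.7            # any point off the light cone t^2 = r^2
image = inversion(t, r)
print("push-forward of d/dt:", pushforward_dt(t, r))
print("K_0 at image point  :", K0(*image))
```

The two printed vectors should agree up to finite-difference error, which is the statement that $K_0$ is $\partial_t$ transported by the inversion.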