Bubbles Bad; Ripples Good

… Data aequatione quotcunque fluentes quantitates involvente fluxiones invenire et vice versa … (“Given an equation involving any number of fluent quantities, to find the fluxions, and vice versa.”)

Continuity of the infimum

Just realised (two weeks ago, but I have only now gotten around to finishing this blog post) that an argument used to prove a proposition in a project I am working on is wrong. After reducing the problem to its core I found that it is something quite elementary. So today’s post will be of a different flavour from those of the recent past.

Question Let X,Y be topological spaces. Let f:X\times Y\to\mathbb{R} be a bounded, continuous function. Is the function g(x) = \inf_{y\in Y}f(x,y) continuous?

Intuitively, one may be tempted to say “yes”. Indeed, there are plenty of examples where the answer is in the affirmative. The simplest is when we can replace the infimum with a minimum:

Example Let the space Y be a finite set with the discrete topology. Then g(x) = \min_{y\in Y} f(x,y) is continuous.
Proof left as exercise.

But in fact, the answer to the question is “No”. Here’s a counterexample:

Example Let X = Y = \mathbb{R} with the standard topology. Define

\displaystyle f(x,y) = \begin{cases} 1 & x > 0 \\ 0 & x < -e^{y} \\ 1 + x e^{-y} & x\in [-e^{y},0]  \end{cases}

which is clearly continuous. But the infimum function is precisely the Heaviside step function: g(x) = 1 if x \geq 0, and g(x) = 0 if x < 0, which is discontinuous at x = 0.
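Since the formula for f is explicit, we can sanity-check this numerically. Below is a minimal pure-Python sketch (the sampling grid for y is my own choice, and a minimum over finitely many samples only approximates the infimum):

```python
import math

def f(x, y):
    # The counterexample: continuous on all of R x R.
    if x > 0:
        return 1.0
    if x < -math.exp(y):
        return 0.0
    return 1.0 + x * math.exp(-y)

def g(x, ys):
    # Approximate inf_y f(x, y) by minimizing over a sample of y values.
    return min(f(x, y) for y in ys)

ys = [-20.0 + 0.01 * k for k in range(4001)]  # samples of y in [-20, 20]
# The sampled infimum jumps from 0 to 1 at x = 0:
print([g(x, ys) for x in (-0.5, -0.001, 0.0, 0.5)])
```

Even for x very close to 0 from the left, taking y negative enough puts us in the f = 0 branch, so the sampled infimum stays at 0 while g(0) = 1.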

So what is it about the first example that makes the argument work? What is the difference between the minimum and the infimum? A naive guess may be that in the finite case we are taking a minimum, and therefore the infimum is attained. This guess is not unreasonable: there are many arguments in analysis that become much easier when the infimum can be assumed to be attained (so that we are allowed to deal with a minimizer instead of a minimizing sequence). But sadly that is not (entirely) the case here: in the counterexample, for every x_0 we can certainly find a y_0 such that f(x_0,y_0) = g(x_0). So attaining the infimum point-wise is not enough.

What we need, here, is compactness. In fact, we have the following

Theorem Let X,Y be topological spaces with Y compact. Then for any continuous f:X\times Y\to\mathbb{R}, the function g(x) := \inf_{y\in Y} f(x,y) is well-defined and continuous.

The proof usually proceeds in three parts. That g(x) > -\infty follows from the fact that for any fixed x\in X, f(x,\cdot):Y\to\mathbb{R} is a continuous function defined on a compact space, and hence is bounded (in fact the infimum is attained). Then using that the sets (-\infty,a) and (b,\infty) form a subbase for the topology of \mathbb{R}, it suffices to check that g^{-1}((-\infty,a)) and g^{-1}((b,\infty)) are open.

Let \pi_X be the canonical projection \pi_X:X\times Y\to X, which we recall is continuous and open. It is easy to see that g^{-1}((-\infty,a)) = \pi_X(f^{-1}((-\infty,a))). So continuity of f implies that this set is open. (Note that this part does not depend on compactness of Y. In fact, a minor modification of this argument shows that for any family of upper semicontinuous functions \{f_c\}_{c\in C}, the pointwise infimum \inf_{c\in C} f_c is also upper semicontinuous, a fact that is very useful in convex analysis. And indeed, the counterexample function given above is upper semicontinuous.)

It is in the last part, showing that g^{-1}((b,\infty)) is open, that compactness is crucially used. Observe that g(x) > b \implies f(x,y) > b for all y\in Y. In other words, if g(x) > b then for every y, the point (x,y) lies in the open set f^{-1}((b,\infty)). This in particular implies that for every x\in g^{-1}((b,\infty)) and every y\in Y there exists a “box” neighborhood U_{(x,y)}\times V_{(x,y)} contained in f^{-1}((b,\infty)). Now using compactness of Y, finitely many of these boxes, say those indexed by \{(x,y_i)\}_{i=1}^k, cover \{x\}\times Y. In particular we have

\displaystyle \{x\}\times Y \subset \left(\cap_{i = 1}^k U_{(x,y_i)}\right)\times Y \subset f^{-1}((b,\infty))

and hence g^{-1}((b,\infty)) = \cup_{x\in g^{-1}((b,\infty))} \cap_{i = 1}^{k(x)} U_{(x,y_i)} is open, being a union of finite intersections of open sets. Q.E.D.
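To contrast with the counterexample, here is a numerical sketch (my own illustration, not part of the proof) restricting the same f to the compact slice Y = [0,1]; the sampled infimum now varies continuously near x = 0, as the theorem predicts:

```python
import math

def f(x, y):
    # Same counterexample function as before.
    if x > 0:
        return 1.0
    if x < -math.exp(y):
        return 0.0
    return 1.0 + x * math.exp(-y)

ys = [k / 1000.0 for k in range(1001)]  # compact Y = [0, 1]

def g(x):
    # Sampled infimum over the compact slice.
    return min(f(x, y) for y in ys)

# For -1 <= x <= 0 the minimum is attained at y = 0 with value 1 + x,
# so g varies continuously across x = 0: no Heaviside jump.
print(g(-0.2), g(-0.01), g(0.0), g(0.5))
```

On [0,1] the branch f = 0 is never reached for small |x|, so the minimizing y stays at 0 and g(x) = 1 + x there.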

One question we may ask is how sharp the requirement that Y be compact is. As with most things in topology, counterexamples abound.

Example Let Y be any uncountably infinite set equipped with the co-countable topology. That is, the open subsets are precisely the empty set and all subsets whose complement is countable. The two interesting properties of this topology are (a) Y is not compact and (b) Y is hyperconnected. (a) is easy to see: let C be some countably infinite subset of Y. For each c\in C let U_c = \{c\}\cup (Y\setminus C). This forms an open cover with no finite sub-cover. Hyperconnected spaces are, roughly speaking, spaces in which all non-empty open sets are “large”, in the sense that they mutually overlap a lot. In particular, a continuous map from a hyperconnected space to a Hausdorff space must be constant. In our case we can see this directly: suppose h:Y\to \mathbb{R} is a continuous map. Fix y_1,y_2\in Y, and let N_1, N_2\subset \mathbb{R} be open neighborhoods of h(y_1), h(y_2) respectively. Since h is continuous, h^{-1}(N_1)\cap h^{-1}(N_2) is open and non-empty (any two non-empty open sets in the co-countable topology intersect, since Y is uncountable). Therefore N_1\cap N_2\neq \emptyset for any such pair of neighborhoods. Since \mathbb{R} is Hausdorff, this forces h(y_1) = h(y_2), so h is constant. This implies that for any topological space X, a continuous function f:X\times Y\to\mathbb{R} is constant along Y, and hence for any y_0\in Y, we have \inf_{y\in Y} f(x,y) =: g(x) = f(x,y_0), which is continuous.

One can try to introduce various regularity/separation assumptions on the spaces X,Y to see at what level compactness becomes a crucial requirement. As an analyst, however, I really only care about topological manifolds, in which case the counterexample given at the top can be readily adapted. We can slightly weaken the assumptions and still prove the following partial converse in essentially the same way.

Theorem Let X be Tychonoff, connected, and first countable, such that X contains a non-trivial open subset whose closure is not the entire space; and let Y be paracompact and Lindelof. If Y is noncompact, then there exists a continuous function f:X\times Y\to\mathbb{R} such that \inf_{y\in Y}f(\cdot,y):X\to \mathbb{R} is not continuous.

Remark Connected (nontrivial) topological manifolds automatically satisfy the conditions on X and Y except for non-compactness. The conditions given are not necessary for the theorem to hold; but they more or less capture the topological properties used in the construction of the second counterexample above.

Remark If X is such that every open set’s closure is the entire space, then X is hyperconnected (let C, D\subset X be closed sets with C\cup D = X and D \neq X. Then D^c is a non-empty open set contained in C; since every open set is dense and C is closed, C \supset \overline{D^c} = X. Hence X cannot be written as the union of two proper closed subsets). And if such an X is also Tychonoff, then X is either the empty set or the one-point set.

Lemma For a paracompact, Lindelof space that is noncompact, there exists a countably infinite open cover \{U_k\} and a sequence of points y_k \in U_k such that y_k \notin U_j whenever j\neq k.

Proof: By noncompactness, there exists an open cover with no finite sub-cover; in particular it is infinite. By the Lindelof property, this open cover can be assumed to be countable, which we enumerate by \{V_k\}, and we assume WLOG (discarding redundant sets) that \forall k, V_k \setminus \cup_{j =1}^{k-1} V_j \neq \emptyset. Define \{U_k\} and \{y_k\} inductively by: U_k = V_k \setminus \cup_{j = 1}^{k-1} \{ y_j\} and choose y_k \in U_k \setminus \cup_{j=1}^{k-1}U_j.
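Here is a concrete instance of the Lemma (my own example): Y = \mathbb{R} with the countable cover V_k = (-k,k), and y_k = k - 1/2, chosen so that y_k \in V_k \setminus V_{k-1}. A mechanical check of the conclusion:

```python
# Concrete instance of the Lemma with Y = R, V_k = (-k, k), k = 1, 2, ...
def in_V(k, y):
    return -k < y < k

def y_pt(k):
    # y_k chosen in V_k \ V_{k-1}.
    return k - 0.5

def in_U(k, y):
    # U_k = V_k minus the previously chosen points y_1, ..., y_{k-1}.
    return in_V(k, y) and all(y != y_pt(j) for j in range(1, k))

# Each y_k lies in U_k but in no other U_j, as the Lemma demands.
ok = all(
    in_U(k, y_pt(k)) and not any(in_U(j, y_pt(k)) for j in range(1, 9) if j != k)
    for k in range(1, 9)
)
print(ok)
```

For j < k the point y_k = k - 1/2 lies outside V_j altogether, while for j > k it is one of the points explicitly removed from U_j.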

Proof of theorem: We first construct a sequence of continuous functions on X. Let G\subset X be a non-empty open set such that its closure-complement H = (\bar{G})^c is a non-empty open set (G exists by assumption). By connectedness \bar{G}\cap \bar{H} \neq \emptyset, so we can pick x_0 in the intersection. Let \{x_j\}\subset H be a sequence of points converging to x_0, which exists by first countability. Using the Tychonoff property, we can get a sequence of continuous functions f_j on X such that f_j|_{\bar{G}} = 0 and f_j(x_j) = -1.

On Y, choose an open cover \{U_k\} and points \{y_k\} per the previous Lemma. By paracompactness we have a partition of unity \{\psi_k\} subordinate to \{U_k\}, and by the conclusion of the Lemma we have \psi_k(y_k) = 1 (since y_k lies in no U_j with j\neq k, all the other \psi_j vanish there). Now we define the function

\displaystyle f(x,y) = \sum_{k} f_k(x)\psi_k(y)

which is continuous, and satisfies f|_{\bar{G}\times Y} = 0, so that g := \inf_{y\in Y}f(\cdot,y) vanishes on \bar{G}. But by construction g(x_k) \leq f(x_k,y_k) = f_k(x_k) = -1, which combined with the fact that x_k \to x_0 \in \bar{G} shows that g is discontinuous at x_0. q.e.d.
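Unwinding the proof in a concrete case (all choices below are mine, for illustration): X = \mathbb{R} with G = (-\infty,0) and x_j = 1/j; Y = [0,\infty), noncompact, with the tent partition of unity \psi_k centered at y_k = k; and f_j(x) = -\max(0,\min(1,jx)), which vanishes on \bar{G} and equals -1 at x_j. The resulting infimum vanishes for x \leq 0 but equals -1 at each x_j:

```python
def f_j(j, x):
    # Continuous, vanishes for x <= 0 (the closure of G = (-inf, 0)),
    # with f_j(1/j) = -1; f_0 is identically zero.
    return 0.0 if j == 0 else -max(0.0, min(1.0, j * x))

def psi(k, y):
    # Tent functions: a partition of unity on Y = [0, inf) with psi_k(k) = 1.
    return max(0.0, 1.0 - abs(y - k))

def f(x, y):
    # f(x, y) = sum_k f_k(x) psi_k(y); only two tents touch any given y.
    k = int(y)
    return f_j(k, x) * psi(k, y) + f_j(k + 1, x) * psi(k + 1, y)

def g(x, n=2000):
    # Sampled infimum over y in [0, 40].
    return min(f(x, 40.0 * t / n) for t in range(n + 1))

# g vanishes on (-inf, 0] but g(1/j) = -1 for each j, so g jumps at x = 0.
print(g(-1.0), g(0.0), g(1.0), g(0.125))
```

Since each f_j takes values in [-1,0] and the tents sum to 1, f is bounded below by -1, so the sampled minimum -1 is exact at the points x_j.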

Gauge invariance, geometrically

A somewhat convoluted chain of events led me to think about the geometric description of partial differential equations. And a question I asked myself this morning was

Question
What is the meaning of gauge invariance in the jet-bundle treatment of partial differential equations?

The answer, actually, is quite simple.

Review of the geometric formulation of PDEs
We consider here abstract PDEs formulated geometrically. All objects considered will be smooth. For more about the formal framework presented here, a good reference is H. Goldschmidt, “Integrability criteria for systems of nonlinear partial differential equations”, JDG (1967) 1:269–307.

A quick review: the background manifold X is assumed (here we take a slightly more restrictive point of view) to be a connected smooth manifold. The configuration space \mathcal{C} is defined to be a fibred manifold p:\mathcal{C}\to X. By J^r\mathcal{C} we refer to the fibred manifold of r-jets of \mathcal{C}, with projection p^r = p \circ \pi^r_0, where for r > s we use \pi^r_s: J^r\mathcal{C}\to J^s\mathcal{C} for the canonical projection.

A field is a (smooth) section \phi \in \Gamma \mathcal{C}. A simple example that captures most of the usual cases: if we are studying mappings between manifolds \phi: X\to N, then we take \mathcal{C} = N\times X, the trivial fibre bundle. The s-jet operator naturally sends j^s: \Gamma\mathcal{C} \ni \phi \mapsto j^s\phi \in \Gamma J^s\mathcal{C}.

A partial differential equation of order r is defined to be a fibred submanifold J^r\mathcal{C} \supset R^r \to X. A field is said to solve the PDE if the image of j^r\phi lies in R^r.

In the usual case of systems of PDEs on Euclidean space, X is taken to be \mathbb{R}^d and \mathcal{C} = \mathbb{R}^n\times X the trivial vector bundle. A system of m PDEs of order r is usually taken to be F(x,\phi, \partial\phi, \partial^2\phi, \ldots, \partial^r\phi) = 0 where

\displaystyle F: X\times \mathbb{R}^n \times \mathbb{R}^{dn} \times \mathbb{R}^{\frac{1}{2}d(d+1)n} \times \cdots \times \mathbb{R}^{{d+r-1 \choose r} n} \to \mathbb{R}^m

is some function. We note that the domain of F can be identified in this case with J^r\mathcal{C}. We can then extend F to \tilde{F}: J^r\mathcal{C} \ni c \mapsto (F(c),p^r(c)) \in \mathbb{R}^m\times X, a fibre bundle morphism.

If we assume that \tilde{F} has constant rank, then \tilde{F}^{-1}(0) is a fibred submanifold of J^r\mathcal{C}, and this is our differential equation.

Gauge invariance
In this framework, the gauge invariance of a partial differential equation relative to a certain symmetry group is captured by requiring that R^r be an invariant submanifold.

More precisely, we take

Definition
A symmetry/gauge group \mathcal{G} is a subgroup of \mathrm{Diff}(\mathcal{C}), with the property that for any g\in\mathcal{G}, there exists a g'\in \mathrm{Diff}(X) with p\circ g = g' \circ p.

It is important that we are looking at the diffeomorphism group of \mathcal{C}, not of J^r\mathcal{C}. In general, diffeomorphisms of J^r\mathcal{C} will not preserve holonomic sections, i.e. those of the form j^r\phi, a condition that is essential for solving PDEs. The condition that the symmetry operation “commutes with projections” ensures that g acts on sections, g:\Gamma\mathcal{C}\to\Gamma\mathcal{C}, which in particular guarantees that g extends to a diffeomorphism of J^r\mathcal{C} with itself that commutes with the projections.

From this point of view, a (system of) partial differential equation(s) R^r is said to be \mathcal{G}-invariant if for every g\in\mathcal{G}, we have g(R^r) \subset R^r.

We give two examples showing that this description agrees with the classical notions.

Gauge theory. In classical gauge theories, the configuration space \mathcal{C} is a fibre bundle with structure group G acting on the fibres. A section of G\times X \to X induces a diffeomorphism of \mathcal{C} by fibre-wise action. In fact, such a gauge symmetry is a fibre bundle morphism (it fixes the base points).

General relativity. In general relativity, the configuration space is the space of Lorentzian metrics. So the background manifold is the space-time X. And the configuration space is the open submanifold of S^2T^*X given by non-degenerate symmetric bilinear forms with signature (-+++). A diffeomorphism \Psi:X\to X induces T^*\Psi = (\Psi^{-1})^*: T^*X \to T^*X and hence a configuration space diffeomorphism that commutes with projection. It is in this sense that Einstein’s equations are diffeomorphism invariant.

Notice, of course, that this formulation does not contain the “physical” distinction between global and local gauge transformations. For example, for a linear PDE (so \mathcal{C} is a vector bundle and R^r is closed under linear operations), the trivial “global scaling” of a solution is considered in this framework a gauge symmetry, though it is generally ignored in physics.

Extensions of (co)vector fields to tangent bundles

I am reading Sasaki’s original paper on the construction of the Sasaki metric (a canonical Riemannian metric on the tangent bundle of a Riemannian manifold), and the following took me far too long to understand. So I’ll write it down in case I forget in the future.

In section two of the paper, Sasaki considers “extended transformations and extended tensors”. Basically, he wanted a way to “lift” tensor fields from a manifold to tensor fields of the same rank on its tangent bundle. And he did so in the language of coordinate changes, whose geometric content is a bit hard to parse. I’ll discuss his construction in a bit, but first I’ll talk about something different.

The trivial lifts
Let M, N be smooth manifolds, and let f:M\to N be a submersion. Then we can trivially lift covariant objects on N to corresponding objects on M by the pull-back operation. To define the pull-back, we start with a covariant tensor field \vartheta \in \Gamma T^0_kN, and set f^*\vartheta \in \Gamma T^0_kM by the formula:

\displaystyle f^*\vartheta(X_1,\ldots,X_k) = \vartheta(df\circ X_1, \ldots, df\circ X_k)

where X_1, \ldots, X_k \in T_pM, and we use that df(p): T_pM \to T_{f(p)}N. Observe that for a function g: N \to \mathbb{R}, the pull-back is simply f^*g = g\circ f : M\to\mathbb{R}.

On the other hand, for contravariant tensor fields, the pull-back is not uniquely defined: using that f is a submersion, we have that TM / \ker(df) = TN, so while, given a vector field v on N, we can always find a vector field w on M such that df(w) = v, the vector field w is only unique up to the addition of a vector field lying in the kernel of df. If, however, M is Riemannian, then we can take the orthogonal decomposition of TM into the kernel and its complement, thereby getting a well-defined lift of the vector field (in other words, by exploiting the identification between the tangent and cotangent spaces).

Remarkably, the extension defined by Sasaki is not this one.

(Let me just add a remark here: given two manifolds, once one obtains a well-defined way of lifting vectors, covectors, and functions from one to the other, such that the lifts are compatible (\vartheta^*(v^*) = [\vartheta(v)]^*), one can extend the mapping to arbitrary tensor fields.)

The extensions defined by Sasaki
As seen above, if we just rely on the canonical submersion \pi:TM\to M, we cannot generally extend vector fields. Sasaki’s construction, however, strongly exploits the fact that TM is the tangent bundle of M.

We start by looking at the vector field extension defined by equation (2.6) of the linked paper. We first observe that a vector field v on a manifold M is a section of the tangent bundle. That is, v is a map M\to TM such that the composition with the canonical projection \pi\circ v:M\to M is the identity map. This implies, using the chain rule, that the map d(\pi\circ v)= d\pi \circ dv: TM\to TM is also the identity map. Now, d\pi: T(TM) \to TM is the projection induced by the projection map \pi, which is different from the canonical projection \pi_2: T(TM) \to TM from the tangent bundle of a manifold to the manifold itself. However, a proposition of Kobayashi (see “Theory of Connections” (1957), Proposition 1.4) shows that there exists an automorphism \alpha:T(TM) \to T(TM) such that d\pi \circ \alpha = \pi_2 and \pi_2\circ\alpha = d\pi. So v as a differential mapping induces a map \alpha\circ dv: TM \to T(TM) from the tangent bundle to the double tangent bundle which, when composed with the canonical projection \pi_2, is the identity. In other words, \alpha\circ dv is a vector field on TM.

Next we look at the definition (2.7) for one-forms. Given a one-form \vartheta on M, it naturally induces a scalar function on TM: for p\in M, v\in T_pM, we regard \vartheta as a function TM\to \mathbb{R} taking the value \vartheta(p)\cdot v at (p,v). Hence its differential d\vartheta is a one-form over TM.

Now, what about scalar functions? Let \vartheta be a one-form and v be a vector field on M, we consider the pairing of their extensions to TM. It is not too hard to check that the corresponding scalar field to \vartheta(v), when evaluated at (p,w)\in TM, is in fact d(\vartheta(v))|_{p,w}, the derivative of the scalar function \vartheta(v) in the direction of w at point p. In general, the compatible lift of scalar fields g:M\to \mathbb{R} to TM is the function \tilde{g}(p,v) = dg(p)[v].
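As a sanity check of this compatibility, one can compute in local coordinates. Take M = \mathbb{R} with coordinates (x,w) on TM; unpacking the definitions, the lift \alpha\circ dv of v = v(x)\partial_x is (I believe) the complete lift v(x)\partial_x + v'(x)w\,\partial_w, and the lift of \vartheta = \theta(x)dx is d(\theta(x)w). Their pairing should then be the lifted scalar d(\theta v)(x)[w]. The finite-difference check below is my own illustration, not from the paper:

```python
import math

def d(h, x, eps=1e-6):
    # Central finite difference, standing in for d/dx.
    return (h(x + eps) - h(x - eps)) / (2.0 * eps)

theta = math.sin              # the one-form theta(x) dx on M = R
v = lambda x: x * x + 1.0     # the vector field v(x) d/dx on M

x, w = 0.7, 1.3               # a point of TM

# Lifted one-form: d(theta(x) w) = theta'(x) w dx + theta(x) dw.
# Complete lift of v: components (v(x), v'(x) w).
pairing = d(theta, x) * w * v(x) + theta(x) * d(v, x) * w

# Lifted scalar: the function (x, w) -> d(theta * v)(x)[w].
lifted_scalar = d(lambda t: theta(t) * v(t), x) * w

print(abs(pairing - lifted_scalar))
```

The agreement is just the product rule: \theta'vw + \theta v'w = (\theta v)'w.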

Using this we can extend the construction to arbitrary tensor fields, and a simple computation yields that this construction is in fact identical, for rank-2 tensors, to the expressions given in (2.8), (2.9), and (2.10) in the paper.

The second extension
The above extension is not the only map sending vectors on M to vectors on TM. In the statement of Lemma 3 there is also another construction. Given a vector field v, it induces a one-parameter family of diffeomorphisms of TM via the maps \psi_t(p,w) = (p, w+tv). Its generator \frac{d}{dt}\psi_t|_{t=0} is a vector field over TM.

The construction in the statement of Lemma 4 is the trivial one mentioned at the start of this post.

Decay of waves IIIb: tails for homogeneous linear equation on curved background

Now we will actually show that the specific decay properties of the linear wave equation on Minkowski space (in particular the strong Huygens’ principle) are very strongly tied to the global geometry of that space-time. In particular, we’ll build, by hand, an example of a space-time where the geometry itself induces back-scattering, so that even linear, homogeneous waves exhibit a tail.

For convenience, the space-time we construct will be spherically symmetric, and we will only consider spherically symmetric solutions of the wave equation on it. We will also focus on the 1+3 dimensional case.

Decay of waves IIIa: nonlinear tails in Minkowski space redux

Before we move on to the geometric case, I want to flesh out the nonlinear case mentioned at the end of the last post a bit more. Recall that it was shown that for generic nonlinear (actually semilinear; for quasilinear and worse equations we cannot use Duhamel’s principle) wave equations, if we prescribe compactly supported initial data, we expect the first iterate to exhibit a tail. One may ask whether this is, in fact, an artifact of the successive approximation scheme; whether somehow a conspiracy always transpires, and all the higher order iterates cancel out the tail coming from the first iterate. This is rather unlikely, owing to the fact that the convergence to \phi_\infty is dominated by a geometric series. But to make doubly sure, here we give a nonlinear system of wave equations for which the successive approximation scheme converges after finitely many steps (in fact, after the first iterate), so that we can also explicitly compute the rate of decay of the nonlinear tail. While the decay rate is not claimed here to be generic (though it is), the existence of one such example with a fixed decay rate shows that, for a statement quantifying over all nonlinear wave equations, it would be impossible to demonstrate a better decay rate than the one exhibited.

Decay of waves IIb: Minkowski space, with right-hand side

In the first half of this second part of the series, we considered solutions to the linear, homogeneous wave equation on flat Minkowski space, and showed that for compactly supported initial data, we have strong Huygens’ principle. We further made references to the fact that this behaviour is expected to be unstable. In this post, we will further illustrate this instability by looking at Equation 1 first with a fixed source F = F(t,x), and then with a nonlinearity F = F(t,x, \phi, \partial\phi).

Duhamel’s Principle

To study how one can incorporate inhomogeneous terms into a linear equation, and to get a qualitative grasp of how the source term contributes to the solution, we need to discuss the abstract method known as Duhamel’s Principle. We start by illustrating this for a very simple ordinary differential equation.

Consider the ODE satisfied by a scalar function \alpha:

Equation 13
\displaystyle \frac{d}{ds}\alpha(s) = k(s)\alpha(s) + \beta(s)

When \beta\equiv 0, we can easily solve the equation with an integrating factor:

\displaystyle \alpha(s) = \alpha(0) e^{\int_0^s k(t) dt}

Using this as a sort of ansatz, we can solve the inhomogeneous equation as follows. For convenience we denote by K(s) = \int_0^s k(t) dt the anti-derivative of k. Then multiplying Equation 13 through by e^{-K(s)}, we have that

Equation 14
\displaystyle \frac{d}{ds} \left( e^{-K(s)}\alpha(s)\right) = e^{-K(s)}\beta(s)

which we solve by integrating

Equation 15
\displaystyle \alpha(s) = e^{K(s)}\alpha(0) + e^{K(s)} \int_0^s e^{-K(t)}\beta(t) dt

If we write K(s;t) = \int_t^s k(u) du, then we can rewrite Equation 15 as given by an integral operator

Equation 15′
\displaystyle \alpha(s) = e^{K(s)}\alpha(0) + \int_0^s e^{K(s;t)}\beta(t) dt
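To see Equation 15′ in action, here is a small numerical cross-check (the choices k(s) = \cos s, \beta \equiv 1, \alpha(0) = 1, and the discretization parameters are all mine): the Duhamel formula, with the integrals evaluated by quadrature, should agree with a direct Runge-Kutta integration of Equation 13.

```python
import math

def k(s): return math.cos(s)
def beta(s): return 1.0

def K(s, t=0.0, n=500):
    # Trapezoid rule for K(s; t) = int_t^s k(u) du.
    h = (s - t) / n
    return h * (0.5 * k(t) + sum(k(t + i * h) for i in range(1, n)) + 0.5 * k(s))

def alpha_duhamel(s, a0=1.0, n=500):
    # Equation 15': alpha(s) = e^{K(s)} a0 + int_0^s e^{K(s;t)} beta(t) dt.
    h = s / n
    w = lambda t: math.exp(K(s, t)) * beta(t)
    integral = h * (0.5 * w(0.0) + sum(w(i * h) for i in range(1, n)) + 0.5 * w(s))
    return math.exp(K(s)) * a0 + integral

def alpha_rk4(s_end, a0=1.0, n=500):
    # Direct RK4 integration of Equation 13: alpha' = k(s) alpha + beta(s).
    rhs = lambda s, a: k(s) * a + beta(s)
    h, s, a = s_end / n, 0.0, a0
    for _ in range(n):
        k1 = rhs(s, a)
        k2 = rhs(s + 0.5 * h, a + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h, a + 0.5 * h * k2)
        k4 = rhs(s + h, a + h * k3)
        a += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        s += h
    return a

print(abs(alpha_duhamel(2.0) - alpha_rk4(2.0)))
```

The two values should agree up to the quadrature error of the trapezoid rule.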


Decay of waves IIa: Minkowski background, homogeneous case

Now let us get into the mathematics. The wave equations that we will consider take the form

Equation 1
-\partial_t^2 \phi + \triangle \phi = F

where \phi:\mathbb{R}^{1+n}\to\mathbb{R} is a real valued function defined on (1+n)-dimensional Minkowski space that describes our solution, and F represents a “source” term. When F vanishes identically, we say that we are looking at the linear, homogeneous wave equation. When F is itself a function of \phi and its first derivatives, we say that the equation is a semilinear wave equation.

We first start with the homogeneous, linear case.

Homogeneous wave equation in one spatial dimension

One interesting aspect of the wave equation is that it only possesses the second, multidimensional, dispersive mechanism described in my previous post. In physical parlance, the “phase velocity” and the “group velocity” of the wave equation are the same. And therefore a solution of the wave equation, quite unlike a solution of the Schroedinger equation, will not exhibit decay when there is only one spatial dimension (mathematically this is one significant difference between relativistic and quantum mechanics). In this section we make a computation demonstrating this, a fact that will also be useful later on when we look at higher (in particular, three) dimensions.

Use x\in\mathbb{R} for the variable representing spatial position. The wave equation can be written as

-\partial_t^2 \phi + \partial_x^2\phi = 0

Now we perform a change of variables: let u = \frac{1}{2}(t-x) and v = \frac{1}{2}(t+x) be the canonical null variables. The change of variable formula replaces

Equation 2
\displaystyle \partial_t \to \frac{\partial u}{\partial t} \partial_u + \frac{\partial v}{\partial t} \partial_v = \frac{1}{2}\partial_u + \frac{1}{2}\partial_v
\displaystyle \partial_x \to \frac{\partial u}{\partial x} \partial_u + \frac{\partial v}{\partial x} \partial_v = -\frac{1}{2}\partial_u + \frac{1}{2}\partial_v

and we get that in the (u,v) coordinate system,

Equation 3
-\partial_u \partial_v \phi = 0
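Equation 3 says that in null coordinates any solution is of the form \phi = F(u) + G(v). Here is a quick finite-difference check (F and G are arbitrary smooth functions of my own choosing) that such a \phi is annihilated by the wave operator:

```python
import math

# Any phi(t, x) = F(u) + G(v), with u = (t - x)/2 and v = (t + x)/2,
# should satisfy -d_t^2 phi + d_x^2 phi = 0.
F = lambda u: math.sin(3.0 * u)
G = lambda v: math.exp(-v * v)

def phi(t, x):
    return F(0.5 * (t - x)) + G(0.5 * (t + x))

def box(t, x, h=1e-3):
    # Central second differences approximating -d_t^2 + d_x^2.
    dtt = (phi(t + h, x) - 2.0 * phi(t, x) + phi(t - h, x)) / h**2
    dxx = (phi(t, x + h) - 2.0 * phi(t, x) + phi(t, x - h)) / h**2
    return -dtt + dxx

residual = max(abs(box(t, x)) for t in (0.0, 0.5, 1.0) for x in (-1.0, 0.3, 2.0))
print(residual)
```

The residual is at the level of the finite-difference truncation error, reflecting the exact cancellation \partial_t^2\phi = \partial_x^2\phi = \frac14(F'' + G'').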


Decay of waves I: Introduction

In the next week or so, I will compose a series of posts on the heuristics for the decay of solutions of the wave equation on curved (and flat) backgrounds. (I have my fingers crossed that this does not end up aborted like my series of posts on compactness.) In this first post I will give some physical intuition for why waves decay. In the next post I will write about the case of linear and nonlinear waves on flat space-time, which will be used to motivate the construction, in post number three, of an example space-time which gives an upper bound on the best decay that can be generally expected for linear waves on non-flat backgrounds. This last argument, due to Mihalis Dafermos, shows why the heuristic known as Price’s Law is as good as one can reasonably hope for in the linear case. (In the nonlinear case, things immediately get much much worse, as we will see already in the next post.)

This first post will not be too heavily mathematical; indeed, the only real foray into mathematics will be in the appendix. The next ones, however, require some basic familiarity with partial differential equations and pseudo-Riemannian geometry.

Why do Volvox spin?

For today’s High-Energy-Physics/General-Relativity Colloquium, we had a speaker whose research is rather far from the usual topics. Raymond Goldstein of DAMTP gave a talk on the physics of multicellular organisms, with particular focus (since the field is so broad and so new for most of the audience members) on the example of Volvox, a kind of green algae composed of spherical colonies of about 50,000 cells.

One of the very interesting things about them is that, if you look under a microscope (or even a magnifying glass! Each colony is about half a millimeter across, so you can even see them with the naked eye), they spin. (Yes, the Goldstein lab has its own YouTube channel.)

(The video also shows how their motion can be constrained by hydrodynamical bound states formed due to their individual spinning motion.)

Now, we have a pretty good idea of the very basic locomotive mechanism of these organisms. Each colony is formed with an exterior ball of “swimming” cells and some interior balls of “reproducing” cells. The swimming cells each have two flagella pointed outwards into the surrounding fluid. Their beating gives rise to the motion of the whole colony. But the strange thing is that they do not swim straight: the colonies tend to travel in one direction while spinning with their axes aligned with the direction of travel. Why? Isn’t it inefficient to expend extra energy to spin all the time? This was a central question around which the presentation today was built.

Two main results were described in the talk today. First is a result about how the two flagella of each cell interact. It was observed (some time ago), by direct observation under a microscope, that the two flagella can exhibit three types of interaction. First is complete synchronisation: the two flagella beat in unison, like how a swimmer’s arms move when pulling the breaststroke. This is observed 85% of the time. Then there is “slippage”, where for some reason one flagellum slips out of phase from the other briefly, and recovers after a while. This happens about 10% of the time. And lastly there is a complete lack of synchronisation, where the two flagella beat at different frequencies, for about 5% of the time. The original report on this surmised that these differences represent three different “types” of cells: since each observation was short in time, they didn’t observe much in the way of transitions from one type to another. What was discovered more recently is that, in fact, the three behaviours all belong to the one single type of cell making up Volvox, and the transitions are stochastic!

Now, why this may be surprising is the following: each flagellum is a mechanical beater with some innate characteristic frequency at which it beats. So in an ideal, linear situation, the two independent flagella should not interact, and so there cannot be reinforcement of any type. Now, one may guess that since the two flagella are beating in water, the hydrodynamics may serve as a medium for the interaction. However, a pure hydrodynamic interaction should lead to something like sympathy, a phenomenon first observed by Huygens. Basically, Huygens put two pendulum clocks on the same wall, and set their pendulums at an arbitrary phase relative to each other. After some time, however, he discovered that the clocks invariably settle down to a state where their pendulums are completely out of phase. This “tuning” is attributed to vibrations being passed along the supporting beams of the wall.

(One can do a similar experiment at home with a board, two metronomes, and two soda cans. A dramatic example is shown below.)

But the problem with the synchronisation theory is that it can only explain the 85% occurrence of completely in-phase swimming, not the other 15%. The solution to this problem requires real consideration of chaos at the molecular level. As it turns out, one of the forces we have so far neglected is the force driving the flagella, which depends on the biochemical processes inside the cells. By considering the biochemical noise, which contributes a stochastic forcing on the entire system, one can recover the other 15% of out-of-phase behaviour. (The noise is not thermal, as thermal noise would have much lower amplitude than required to cause the phenomenon.)

The second beautiful result described in the talk was on the spinning of the colonies, and its relation to phototaxis, the attraction of the green algae to light. How are the two related? It is a quite magnificent feat of evolution. Now, in this colony of 50,000 cells, there is no central nervous system, so how do the cells coordinate their motion to swim toward the light? You cannot rely on chemical signals, or even hydrodynamical synchronisation, since the physical distance between the cells is typically larger than the cells themselves. The effects of such signalling would be too weak and too slow. It is more reasonable to expect the behaviour to be “crowd sourced” (for viewers of the Ghost in the Shell anime series, a cellular-level “stand alone complex”): each cell is programmed to behave in a certain way, and when taken as a whole, their joint behaviour gives rise to the desired response of the colony as a whole.

Well, at the level of the cell, what can they do? Each cell is equipped with a photosensing organelle. And like the classic Gary Larson cartoon, each cell is really only capable of stimulus-response. Experimentally it was confirmed that each individual cell reacts to light. When a cell that is initially “facing away” from the light source (the light-sensor is direction sensitive) turns to “see” the light, the stimulus shocks the cell into slowing down its flagella’s beating. After a very short while the cell gets used to the light, and the beating resumes in earnest. The reverse change from light to darkness, however, does not cause changes in the beating of the flagella.

And this explains the spinning of the Volvox! Imagine the colony swimming along, minding its own business, when suddenly light hits one side of it. The cells on the lit side slow down their flagella beating, and gradually recover as they rotate out of view of the light source. The net effect of the spinning is that “new” cells keep being brought into view of the light, receive the shock, slow their flagella, and recover as they “retreat into the night”, only to be shocked again “as the sun rises the next day”. The flagella therefore beat more fervently on the dark side of the colony than on the bright side, and so, as anyone who has tried swimming one-armed would know, the colony slowly turns toward the light source.

The best part about this process is that it is self-correcting. As the axis of rotation gets more and more aligned with the light source, more and more of the cells experience an “Alaskan summer” with the “sun” perpetually overhead. These cells are never brought back into darkness, no longer receive the periodic shock that slows their flagella, and so swim equally hard through the entire “day”, thereby no longer contributing to the turning. When the spin axis is perfectly aligned with the light source, the entire “northern hemisphere” is perpetually illuminated while the “southern” one is not, so until the light source changes direction, the colony ceases to turn and moves straight toward the light.

For this all to work, the spin rate of the colony must be exactly the same as the rate at which the cells recover from the shock of seeing the light. And this is experimentally confirmed. (An interesting question brought up at the end is whether we can use this as a laboratory test for evolution: if we add some syrup or something to the water to make it more viscous, the spin rate will necessarily slow down, and the original strain of Volvox will no longer be as effective at swimming toward the light. It would be interesting to see whether, after a few hundred generations, a mutant strain evolves with a slower recovery time from illumination.)

Shock singularities in Burgers’ equation

It is generally well known that partial differential equations that model fluid motion can exhibit “shock waves”. In fact, the subject I will write about today is generally presented as the canonical example for such behaviour in a first course in partial differential equations (while also introducing the method of characteristics). The focus here, however, will not be so much on the formation of shocks, but on the profile of the shock boundary. This discussion tends to be omitted from introductory texts.

Solving Burgers’ equation
First we recall the inviscid Burgers’ equation, a fundamental partial differential equation in the study of fluids. The equation is written

Equation 1. Inviscid Burgers’ equation
\displaystyle \frac{\partial}{\partial t} u  + u \frac{\partial}{\partial x} u = 0

where u = u(t,x) is the “local fluid velocity” at time t and at spatial coordinate x. The solution of the equation is closely related to its derivation: notice that we can re-write the equation as

v \cdot \nabla u = (\partial_t + u \partial_x) u = 0

where v = \partial_t + u\partial_x is the vector field along which u is transported.

The question we consider is the initial value problem for the PDE: given some initial velocity configuration u_0(x), we want to find a solution u(t,x) to Burgers’ equation such that u(0,x) = u_0(x).

The traditional way of obtaining a solution is via the method of characteristics. We first observe that (1) the alternate form of the equation above means that if X(t) is a curve tangent to the vector field v = \partial_t + u\partial_x, then u(t,X(t)) is constant in the parameter t. (2) Plugging this back in implies that along such a curve X(t), the vector field v = \partial_t + u\partial_x = \partial_t + u_0 \partial_x is itself constant. (3) A curve whose tangent vector is constant is a straight line. So we have that a solution of Burgers' equation must verify

u(t, x + u_0(x) \cdot t) = u_0(x)

And we call the family of curves given by X_x(t) = x + u_0(x) \cdot t the characteristic curves of the solution.
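As a quick numerical sanity check (the function names and the choice u_0(x) = x are my own, purely for illustration), one can recover u(t, y) from the implicit relation u(t, x + u_0(x) t) = u_0(x) by bisecting for the foot point x of the characteristic through (t, y); this works as long as no shock has formed:

```python
# A minimal sketch: recover u(t, y) from the implicit characteristic relation
# u(t, x + u0(x) t) = u0(x) by bisecting for the foot point x of the
# characteristic through (t, y).  Valid only before any shock forms.
def u0(x):
    return x  # u_0(x) = x: slope is positive everywhere, so no shock ever forms

def solve_burgers(t, y, lo=-100.0, hi=100.0):
    f = lambda x: x + u0(x) * t - y  # the foot point x solves f(x) = 0
    for _ in range(200):             # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return u0(0.5 * (lo + hi))

# For u0(x) = x the characteristics give the exact solution u(t, y) = y / (1 + t).
print(solve_burgers(2.0, 3.0))  # ≈ 3 / (1 + 2) = 1.0
```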

To extract more qualitative information about Burgers’ equation, let us take another spatial derivative of the equation, and call the function w = \partial_x u. Then we have

\partial_t w + w^2 + u \partial_x w = 0 \implies v \cdot \nabla w + w^2 = 0

So letting X(t) be a characteristic curve, and writing W(t) = w(t, X(t)), we have that along the characteristic curve

\displaystyle \frac{d}{dt}W = - W^2 \implies W(t) = \frac{1}{t+W(0)^{-1}}

So in particular, we see that if W(0) < 0, W(t) must blow up in time t \leq |W(0)|^{-1}.
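One can check this blow-up numerically. In the sketch below (the step size and the initial slope W(0) = -1/2 are my own choices), we integrate dW/dt = -W^2 with a classical Runge-Kutta step; the closed form W(t) = 1/(t - 2) blows up at t = 2, and stopping just before, at t = 1.9, the integrated value should sit near -10:

```python
# Sketch: integrate the Riccati equation dW/dt = -W^2 with RK4 from
# W(0) = -1/2.  The closed form W(t) = 1/(t + W(0)^{-1}) = 1/(t - 2)
# blows up at t = 2; we stop at t = 1.9, where W should be close to -10.
def rk4_step(f, w, dt):
    k1 = f(w)
    k2 = f(w + 0.5 * dt * k1)
    k3 = f(w + 0.5 * dt * k2)
    k4 = f(w + dt * k3)
    return w + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda w: -w * w
W, dt, steps = -0.5, 1e-4, 19000   # 19000 steps of 1e-4 reach t = 1.9
for _ in range(steps):
    W = rk4_step(f, W, dt)
print(W)  # ≈ -10, matching 1/(1.9 - 2)
```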

[Figure: plot of a divergent flow]

So what does this mean? We've seen that along characteristic lines, the value of u stays constant. But we've also seen that along those lines, the value of its spatial derivative can blow up if the initial slope is negative. Perhaps the best thing to do is to illustrate it with two pictures. In the pictures the thick, red curve is the initial velocity distribution u_0(x), shown with the black line representing the x-axis: when the curve is above the axis, the local fluid velocity is initially positive, and the fluid is moving to the right. The blue curves are the characteristic lines.

In the first image to the right, the initial velocity distribution is increasing to the right, so w(0,x) is always positive. In this situation the flow is divergent, the flow lines getting further and further apart, corresponding to the solution where w(t,x) gets smaller and smaller along a flow line.

For the second image here on our left, the situation is different. The initial velocity distribution starts out increasing, then hits a maximum, dips down to a minimum, and finally increases again. In the regions where the velocity distribution is increasing, we see the same “spreading out” behaviour as before, with the flow lines getting further and further apart (especially in the upper left region). But the characteristic curves originating in the region where the velocity distribution is decreasing get bunched together as time goes on, eventually intersecting! This intersection is what is known as a shock.

From the picture, it becomes clear what the blow-up of W(t) means. Suppose the initial velocity distribution is such that for two points x_1 < x_2 we have u_0(x_1) > u_0(x_2). Since the flow line originating from x_1 is moving faster, it will eventually catch up to the flow line originating from x_2. When the two flow lines intersect, we have a problem: if we follow the flow line from x_1, the function u must take the value u_0(x_1) at the intersection point; but if we follow the flow line from x_2, it must take the value u_0(x_2) there. So we cannot consistently assign a value to u at points where flow-lines intersect in a way that satisfies Burgers' equation.

Another way of thinking about this difficulty is in terms of particle dynamics. Imagine the line being a highway, and points on it being cars. The dynamics of the traffic flow described by Burgers’ equation is one in which each driver starts at one speed (which can be in reverse), and maintains that speed completely without regard for the cars in front of or behind it. If we start out with a distribution where the leading cars always drive faster than the trailing ones, then the cars will spread further apart as time goes on. But if we start out with a distribution where a car in front is driving slower than a car behind, the second car will eventually catch up and crash into the one in front. And this is the formation of the shock wave.

(Now technically, in this view, once two cars crash their flow-lines should end, and so cars that are in front of the collision and moving forward should not be affected by it at all. But imagine that instead of real cars we are driving bumper cars: after a collision, the car in front maintains speed at the velocity of the car that hit it, while the car behind drives at the velocity of the car it hit [so they swap speeds, as in an elastic collision]. Then we have something like the picture plotted above.)
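The car picture is easy to simulate. In the minimal sketch below (the function name and the sample data are my own), each particle keeps its initial speed forever, and the first shock is the earliest collision between a particle and the slower particle directly ahead of it:

```python
# Sketch of the "cars on a highway" picture: each particle moves at its
# initial speed; the first shock is the earliest collision between a
# particle and the slower one directly ahead of it.
def first_shock(xs, us):
    """xs: sorted initial positions; us: the corresponding constant speeds.
    Returns the earliest pairwise collision time, or None if the flow
    only spreads out (leading cars always at least as fast)."""
    times = [(x2 - x1) / (u1 - u2)
             for (x1, u1), (x2, u2) in zip(zip(xs, us), zip(xs[1:], us[1:]))
             if u1 > u2]  # the pair collides only if the trailing car is faster
    return min(times) if times else None

# Decreasing speeds u0(x) = -x: nearest-neighbour spacing 1 and speed gap 1
# give a first collision at t = 1, matching the continuum blow-up time
# t = |W(0)|^{-1} = 1 for the profile u0(x) = -x.
print(first_shock([0.0, 1.0, 2.0], [0.0, -1.0, -2.0]))  # 1.0
```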

Shock boundary
Having established that shocks can form, we move on to the main discussion of this post: the geometry of the set of shock singularities. We will consider the purely local effects of the shocks; by which we mean that we will ignore the chain reactions as described in the parenthetical remark above. Therefore we will assume that at the formation of the shock, the flow-lines terminate and the particles they represent disappear. In other words, we will consider only shocks coming from nearest neighbor collisions. In this scenario, the time of existence of a characteristic line is precisely governed by the equation on W we derived before: that is given u_0(x), the characteristic line emanating from x = x_0 will run into the shock precisely at the time t = - \frac{1}{\partial_x u_0(x_0)}. (It will continue indefinitely in the future if the derivative is positive.)

The most well-known image of a shock formation is the image on the right, where we see the classic fan/wedge type shock. (Because this diagram is easy to sketch by hand, it is probably how most people are first introduced to diagrams of this type, whether on a homework set or in class.) What we see here is an illustration of the fact that

If for x_1 < x < x_2, we have \partial^2_{xx} u_0(x) = 0, and \partial_x u_0(x) < 0, then the shock boundary is degenerate: it consists of a single focal point.

To see this analytically: observe that because the blow-up time depends on the first derivative of the initial velocity distribution, in this set-up the blow-up time t_0 = - (\partial_x u_0)^{-1} is the same for every point in the interval. The spatial coordinate of the blow-up is then x + u_0(x) t_0. But since u_0(x) is linear in x, we have

\displaystyle x + u_0(x) t_0 = x_1 + (x-x_1) + u_0(x_1)t_0 + \partial_xu_0 \cdot (x - x_1) t_0 = x_1 + u_0(x_1) t_0

is constant. And therefore the shock boundary is degenerate.
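This degeneracy is easy to verify directly. In the sketch below (the linear profile u_0(x) = 1 - x is my own choice, with slope -1 and hence t_0 = 1), every characteristic x + u_0(x) t lands on the same focal point at t = t_0:

```python
# Direct check: with u0(x) = 1 - x the slope is -1 everywhere, so every
# flow line blows up at t0 = 1, and the landing point is
# x + u0(x) * t0 = x + (1 - x) = 1 for every foot point x: a single focal point.
u0 = lambda x: 1.0 - x
t0 = 1.0
landing = [x + u0(x) * t0 for x in (-2.0, -0.5, 0.0, 0.7, 3.0)]
print(landing)  # every entry is 1.0
```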


Next we consider the case where \partial^2_{xx} u_0 vanishes at some point x_0, but \partial^3_{xxx}u_0(x_0) \neq 0. The two pictures to the right of this paragraph illustrate the typical shock boundary behaviour. On the far right we have the slightly unphysical situation: notice that a particle coming in from the left, before it hits its own shock boundary, first crosses the shock boundary formed by the particles coming in from the right. This is the situation where the third derivative is positive, and the cusp point which corresponds to the shock boundary for x_0 opens to the future. The nearer picture is the situation where the third derivative is negative, with the cusp point opening downwards. Notice that since we are in a neighborhood of a point where the second derivative vanishes, the initial velocity distributions both look almost straight, and it is hard to tell the sign of the third derivative from the image alone. The picture on the far right is based on an arctan type initial distribution, whereas the nearer picture is based on an x^3 type initial distribution.

Let us again analyse the situation more closely. Near the point x_0, we shall assume that \partial^3_{xxx}u_0 \sim \partial^3_{xxx}u_0(x_0) = C for some constant C. And we will assume, using Galilean transformations, that u_0(x_0) = 0 = x_0. Then letting t_0 = - (\partial_x u_0(x_0))^{-1}, we have

\displaystyle u_0(x) = \frac{C}{6} x^3 - \frac{1}{t_0} x

Thus as a function of x, the blow-up times of flow lines are given by

\displaystyle t(x) = \frac{t_0}{1 - \frac{C}{2}t_0 x^2}

Solving for their blow-up profile y = x + u_0(x) t(x) then gives (after quite a bit of algebraic manipulation)

\displaystyle \frac{ (\frac{t}{t_0} - 1)^3}{t} = \frac{9C}{8} y^2

which is easily seen to be a cusp: \frac{dy}{dt} = 0 at y=0, t = t_0. And the direction in which the cusp opens clearly depends on the sign of the third derivative C.
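The cusp relation can also be checked numerically. In the sketch below (the parameter values C = 6 and t_0 = 1, i.e. u_0(x) = x^3 - x, are my own choices), we compute the blow-up time t(x) and location y of each flow line and compare the two sides of the relation:

```python
# Numerical check of the cusp relation for u0(x) = (C/6) x^3 - x/t0:
# for each foot point x, compute the blow-up time t(x) and location y,
# and compare both sides of (t/t0 - 1)^3 / t = (9C/8) y^2.
C, t0 = 6.0, 1.0
u0 = lambda x: C / 6.0 * x ** 3 - x / t0
for x in (0.05, 0.1, 0.2):
    t = t0 / (1.0 - C / 2.0 * t0 * x ** 2)  # blow-up time of the flow line from x
    y = x + u0(x) * t                        # where that flow line hits the shock
    print((t / t0 - 1.0) ** 3 / t, 9.0 * C / 8.0 * y ** 2)  # the two sides agree
```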

The last bit of computation we will do is for the case D = \partial^2_{xx}u_0(x) \neq 0. In this case we can take

\displaystyle u_0(x) = - \frac{1}{t_0}x + \frac{D}{2} x^2

as an approximation. Then the blowup times will be

\displaystyle t(x) = \frac{t_0}{1 - D t_0 x}

which leads to the blowup profile y being [Thanks to Huy for the correction.]

\displaystyle y = -\frac{1}{2Dt} \left( 1 - \frac{t}{t_0}\right)^2

and a direct computation will then lead to the conclusion that in this generic scenario, the shock boundary will be everywhere tangent to the flow-line that ends there.
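This tangency, too, can be verified numerically. In the sketch below (the concrete choices D = 1 and t_0 = 1 are mine), the slope dy/dt of the shock boundary, computed by a central difference at the blow-up time of the flow line from x, agrees with that flow line's slope u_0(x):

```python
# Tangency check: the shock boundary is y(t) = -(1 - t/t0)^2 / (2 D t);
# we compare its slope dy/dt at the blow-up time t(x) of the flow line
# from x against u0(x), the slope of that flow line in the (t, y) plane.
D, t0 = 1.0, 1.0
u0 = lambda x: -x / t0 + D / 2.0 * x ** 2
y = lambda t: -((1.0 - t / t0) ** 2) / (2.0 * D * t)
for x in (0.3, 0.5, 0.8):
    t = t0 / (1.0 - D * t0 * x)              # blow-up time of the flow line from x
    h = 1e-6
    slope = (y(t + h) - y(t - h)) / (2 * h)  # numerical dy/dt on the boundary
    print(slope, u0(x))                      # tangency: the two values agree
```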