Bubbles Bad; Ripples Good

… Data aequatione quotcunque fluentes quantitates involvente fluxiones invenire et vice versa …

Category: Maths

Heat ball

There are very few things I find unsatisfactory in L.C. Evans’ wonderful textbook on Partial Differential Equations; one of them is the illustration (on p.53 of the second edition) of the “heat ball”.

The heat ball is a region with respect to which an analogue of the mean value property of solutions to Laplace’s equation can be expressed, now for solutions of the heat equation. In the case of the Laplace’s equation, the regions are round balls. In the case of the heat equation, the regions are somewhat more complicated. They are defined by the expression

\displaystyle E(x,t;r) := \left\{ (y,s)\in \mathbb{R}^{n+1}~|~s \leq t, \Phi(x-y, t-s) \geq \frac{1}{r^n} \right\}

where \Phi is the fundamental solution of the heat equation

\displaystyle \Phi(x,t) := \frac{1}{(4\pi t)^{n/2}} e^{- \frac{|x|^2}{4t}}.

In the expressions above, the constant n is the number of spatial dimensions; r is the analogue of the radius of the ball, and in E(x,t;r), the point (x,r) is the center. Below is a better visualization of the heat balls: the curves shown are the boundaries \partial E(0,5;r) in dimension n = 1, for radii between 0.75 and 4 in steps of 0.25 (in particular all the red curves have integer radii). In higher dimensions the shape is generally the same, though they appear more “squashed” in the t direction.

1-dimensional heat balls centered at (0,5) for various radii. (Made using Desmos)


A characterization of geodesics

Joseph O’Rourke’s question at MathOverflow touched on an interesting characterization of geodesics in pseudo-Riemannian geometry, which was apparently originally due to Einstein, Infeld, and Hoffmann in their analysis of the geodesic hypothesis in general relativity. (One of my two undergraduate junior theses is on this topic, but I certainly did not appreciate this result as much when I was younger.) Sternberg’s book has a very good presentation on the theorem, but I want to try to give a slightly different interpretation in this post.

Geodesics and variation
One of the classical formulation of the criterion for a curve to be geodesic is that it is a stationary point of the length functional. Let (M,g) be a Riemannian manifold, and let $latex: \gamma:[0,1]\to M$ be a C^1 mapping. Define the length functional to be
\displaystyle L: \gamma \mapsto \int_0^1 \sqrt{g_{ab}(\gamma(s)) \dot{\gamma}^a(s) \dot{\gamma}^b(s)} ~\mathrm{d}s.
A geodesic then is a curve \gamma that is a critical point of L under perturbations that fix the endpoints $\latex \gamma(0)$ and \gamma(1).

One minor annoyance about the length functional L is that it is invariant under reparametrization of \gamma, and so it does not admit unique solutions. One way to work around this is to instead consider the energy functional (which also has the advantage of also being easily generalizable to pseudo-Riemannian manifolds)
\displaystyle E: \gamma \mapsto \int_0^1 g_{ab}(\gamma(s)) \dot{\gamma}^a(s) \dot{\gamma}^b(s) ~\mathrm{d}s.
It turns out that critical points of the energy functional are always critical points of the length functional. Furthermore, the energy functional has some added convexity: a curve is a critical point of the energy functional if it is a geodesic and that it has constant speed (in the sense that g_{ab} \dot{\gamma}^a \dot{\gamma}^b is independent of the parameter s).

The standard way to analyze the variation of E is by first fixing a coordinate system \{ x^1, x^2, \ldots, x^n\}. Writing the infinitesimal perturbation as \delta \gamma, we can compute the first variation of E:
\displaystyle \delta E[\gamma] \approx \int_0^1 \partial_c g_{ab}(\gamma) \cdot \delta\gamma^c \cdot  \dot{\gamma}^a \dot{\gamma}^b + 2 g_{ab}(\gamma) \cdot \dot{\gamma}^a \cdot \dot{\delta\gamma}^b ~\mathrm{d}s.
Integrating the second term by parts we recover the familiar geodesic equation in local coordinates.

There is a second way to analyze the variation. Using the diffeomorphism invariance, we can imagine instead of varying \gamma while fixing the manifold, we can deform the manifold M while fixing the curve \gamma. From the point of view of the energy functional the two should be indistinguishable. Consider the variation \delta\gamma, which can be regarded as a vector field along \gamma which vanishes at the two end points. Let V be a vector field on M that extends \delta \gamma. Then the infinitesimal variation of moving the curve in the direction \delta \gamma should be reproducible by flowing the manifold by V and pulling back the metric. To be more precise, let \phi_\tau be the one parameter family of diffeomorphisms generated by the vector field V, the first variation can be analogously represented as
\displaystyle \frac{1}{\tau} \lim_{\tau\to 0} \int_0^1 \left[(\phi_\tau^* g)_{ab}(\gamma) - g_{ab}(\gamma)\right] \dot{\gamma}^a \dot{\gamma}^b ~\mathrm{d}s
By the definition of the Lie derivative we get the following characterizing condition for a geodesic:

A curve \gamma is an affinely parametrized geodesic if and only if for every vector field V vanishing near \gamma(0) and \gamma(1), the integral
\displaystyle \int_0^1  (L_V g)_{ab} \dot{\gamma}^a \dot{\gamma}^b ~\mathrm{d}s = 0

Noticing that (L_Vg)_{ab} = \nabla_a V_b + \nabla_b V_a, where \nabla is the Levi-Civita connection, we have that the above integral condition is equivalent to requiring
\displaystyle \int_0^1 \langle \nabla_{\dot{\gamma}} V, \dot{\gamma}\rangle_g ~\mathrm{d}s = 0.
Using the boundary conditions and integrating by parts we see this also gives us, without passing through the local coordinate formulation, the geodesic equation
\displaystyle \nabla_{\dot{\gamma}} \dot{\gamma} = 0.

The Einstein-Infeld-Hoffmann theorem
The EIH theorem reads:

Theorem (EIH)
A curve \gamma is geodesic if and only if there exists a non-vanishing contravariant symmetric two tensor \Xi along \gamma such that for every vector field V vanishing near \gamma(0) and \gamma(1), the integral
\displaystyle \int_{\gamma} (L_V g)_{ab} \Xi^{ab} ~\mathrm{d}\sigma \equiv 0
(where \mathrm{d}\sigma is the induced length measure on \gamma).

The EIH theorem follows immediately from the discussion in the previous section and the following lemma.

A contravariant symmetric two tensor \Xi that satisfies the assumptions in the previous theorem must be proportional to \dot{\gamma}\otimes\dot{\gamma}.

Proof: Choose an orthonormal frame along \gamma for (M,g) such that e_n is tangent to \gamma. Write \hat{\Xi} = \Xi - \langle \Xi, e_n \otimes e_n\rangle e_n \otimes e_n. Suppose \hat{\Xi} \neq 0. Then there exists a vector field W such that W|_{\gamma}= 0 and the symmetric part of \nabla W is equal to \hat{\Xi}. (We can construct W by choosing a local coordinate system in a tubular neighborhood of \gamma such that \partial_i|_{\gamma} = e_i. Then W can be prescribed by its first order Taylor expansion in the normal direction to \gamma.) Let \psi be a non-negative cut-off function and setting V = \psi W we note that \nabla V |_{\gamma} = \psi \nabla W since W vanishes along \gamma. Therefore we have that the desired integral condition cannot hold. q.e.d.


  • Shlomo Sternberg, Curvature in Mathematics and Physics
  • Einstein, Infeld, Hoffmann, “Gravitational Equations and the Problem of Motion”

In defense of integration by parts

A prominent academic, who happens not to be a mathematician, visited my home institution recently and gave a public address about the role of the university in the modern world. Most of what he said concerning our teaching mission are the usual platitudes about not being stuck in the past and making sure that our curricular content and learning objectives are aligned with what we would expect a 21st century college graduates to need.

It however bugged me to no end that the recurring example this particular individual returns to for something old-fashioned and “ought not be taught” is integration by parts; and he justifies this by mentioning that computer algebra systems (or even just google) can do the integrals faster and better than we humans can.

I don’t generally mind others cracking jokes at mathematicians’ expense. But this particular self-serving strawman uttered by so well-regarded an individual is, to those of us actually in the field teaching calculus to freshmen and sophomores, very damaging and disingenuous.

I happened to have just spent the entirety of last year rethinking how we can best teach calculus to the modern engineering majors. Believe me, students nowadays know perfectly well when we are just asking them to do busywork; they also know perfectly well that computer algebra systems are generally better at finding closed-form integral expressions than we can. Part of the challenge of the redesign that I am involved in is precisely to convince the students that calculus is worth learning in spite of computers. The difficulty is not in dearth of reason; on the contrary, there are many good reasons why a solid grounding of calculus is important to a modern engineering students. To give a few examples:

  1. Taylor series are in fact important because of computers, since they provide a method of compactly encoding an entire function.
  2. Newton’s method for root finding (and its application to, say, numerical optimization) is build on a solid understanding of differential calculus.
  3. The entirety of the finite element method of numerical simulation, which underlies a lot of civil and mechanical engineering applications, are based on a variational formulation of differential equations that, guess what, only make sense when one understand integration by parts.
  4. The notion of Fourier transform which is behind a lot of signal/image processing requires understanding how trigonometric functions behave under integration.

No, the difficulty for me and my collaborator is narrowing down a list of examples that we can not only reasonably explain to undergraduate students, but also have them have some hands-on experience working with.

When my collaborator and I were first plunged into this adventure of designing engineering-specific calculus material, one of the very first things that we did was to seek out inputs from our engineering colleagues. My original impulse was to cut some curricular content in order to give the students a chance to develop deeper understanding of fewer topics. To that end I selected some number of topics which I thought are old-fashioned, out-dated, and no longer used in this day and age. How wrong I was! Even something like “integration by partial fractions” which most practicing mathematicians will defer to a computer to do has its advocates (those who have to teach control theory insists that a lot of fundamental examples in their field can be reduced to evaluating integrals of rational functions, and a good grasp of how such integrals behave is key to developing a general sense of how control theory works).

In short, unlike some individuals will have you believe, math education is not obsolete because we all have calculators. In fact, I would argue the opposite: math education is especially pertinent now that we all have calculators. Long gone was the age where a superficial understanding of mathematics in terms of its rote computations is a valuable skill. A successful scientist or engineer needs to be able to effectively leverage the large toolbox that is available to her, and this requires a much deeper understanding of mathematics, one that goes beyond just the how but also the what and the why.

There are indeed much that can be done to better math education for the modern student. But one thing that shouldn’t be done is getting rid of integration by parts.

A better estimate of Kempner’s series

The Kempner series recently regained some notoriety due to a Saturday Morning Breakfast Cereal comic (the last panel). The observation first appeared in a 1914 American Mathematical Monthly article, in which it was shown that the series consisting of the usual harmonic series

\displaystyle \sum_{n = 1}^{\infty} \frac{1}{n}

but with all the terms, whose decimal expansion includes the digit ‘9’, removed, in fact converges to some number below 80. The original proof is given in the Wikipedia article linked above, so I will not repeat it. But to make it easier to see the idea: let us first think about the case where the number is expressed in base 2. In base 2, all the positive integers has the leading binary bit being 1 (since it cannot be zero). Therefore there are no binary positive numbers without the bit ‘1’ in its expansion. So the corresponding series converges trivially to zero. How about the case of the bit ‘0’? The only binary numbers without any ‘0’ bits are

1, 3 = (11)_2, 7 = (111)_2, 15 = (1111)_2, \ldots, 2^n - 1.

So the corresponding series actually becomes

\displaystyle \sum_{n = 1}^\infty \frac{1}{2^n - 1} \leq \sum_{n = 1}^\infty \frac{1}{2^{n-1}} = 2

So somewhere from the heavily divergent harmonic series, we pick up a rapidly converging geometric series. So what’s at work here? Among all the n-bit binary numbers, exactly 1 has all bits not being 0. So the density of these kinds of numbers decays rather quickly: in base 2, there are 2^{n-1} numbers that are exactly n-bit long. So if a number x has a binary representation that is exactly n bits long (which means that 2^{n} \leq x < 2^{n+1}), the chances that it is one of the special type of numbers is \frac{1}{2^{n-1}} \approx \frac{2}{x}. This probability we can treat then as a density: replacing the discrete sum \sum \frac{1}{n} by the integral \int \frac{1}{x}\mathrm{d}x (calculus students may recognize this as the germ of the “integral test”) and replacing the \mathrm{d}x by the density \frac{2}{x} \mathrm{d}x, we get the estimate

\displaystyle \text{Binary Kempner series} \approx \int_1^\infty \frac{2}{x^2} = 2.

Doing the same thing with the original Kempner series gives that the chances a n-digit number does not contain the digit nine to be

\displaystyle \left(\frac89\right)\left(\frac9{10}\right)^{n-1} \approx \left( \frac{9}{10}\right)^{n}

The length of the decimal expansion of a natural number x is basically 1 + \log x. So the density we are interested in becomes

\displaystyle \left( \frac{9}{10}\right)^{1+\log x} ~\mathrm{d}x

From this we can do an integral estimate

\displaystyle \text{Kempner series} \approx 0.9 \times \int_1^\infty \left( \frac{9}{100}\right)^{\log x} ~\mathrm{d}x

The integral can be computed using that

\displaystyle a^{\log b} = b^{\log a}

to get

\displaystyle 0.9 \times \int_1^{\infty} \left( \frac{9}{100}\right)^{\log x} ~\mathrm{d}X = 0.9\times \int_1^\infty x^{\log 9 - 2} ~\mathrm{d}x = \frac{0.9}{1 - \log 9} \approx 19.66

Notice that this estimate is much closer to the currently known value of roughly 22.92 than to the original upper bound of 80 computed by Kempner.

Kempner’s estimate is a heavy overestimate because he performed a summation replacing every n-digit long number that does not contain the digit 9 by 10^{n-1}; this number can be many times (up to 9) times smaller than the original number. Our estimate is low because among the n-digit long numbers, the numbers that do not contain the digit 9 are not evenly distributed: they tend to crowd in the front rather than in the back (in fact, we do not allow them to crowd in the back because none of the numbers that start with the digit 9 is admissible). So if in the original question we had asked for numbers that do not contain the digit 1, then our computation will give an overestimate instead since these numbers tend to crowd to the back.

Simulating closed cosmic strings (or yet another adventure in Julia)

One of my current research interests is in the geometry of constant-mean-curvature time-like submanifolds of Minkowski space. A special case of this is the study of cosmic strings under the Nambu-Goto action. Mathematically the classical (as opposed to quantum) behavior of such strings are quite well understood, by combining the works of theoretical physicists since the 80s (especially those of Jens Hoppe and collaborators) together with recent mathematical developments (such as the work of Nguyen and Tian and that of Jerrard, Novaga, and Orlandi). To get a better handle on what is going on with these results, and especially to provide some visual aids, I’ve tried to produce some simulations of the dynamics of closed cosmic strings. (This was also an opportunity for me to practice code-writing in Julia and learn a bit about the best practices for performance and optimization in that language.)

The code

After some false starts, here are some reasonably stable code.

function MC_corr3!(position::Array{Float64,2}, prev_vel::Array{Float64,2}, next_vel::Array{Float64,2}, result::Array{Float64,2}, dt::Float64)
  # We will assume that data is stored in the format point(coord,number), so as a 3x1000 array or something. 
  num_points = size(position,2)
  num_dims = size(position,1)

  curr_vel = zeros(num_dims)
  curr_vs = zeros(num_dims)
  curr_ps = zeros(num_dims)
  curr_pss = zeros(num_dims)

  pred_vel = zeros(num_dims)
  agreement = true

  for col = 1:num_points  #Outer loop is column
    if col == 1
      prev_col = num_points
      next_col = 2
    elseif col == num_points
      prev_col = num_points - 1
      next_col = 1
      prev_col = col -1
      next_col = col + 1

    for row = 1:num_dims
      curr_vel[row] = (next_vel[row,col] + prev_vel[row,col])/2
      curr_vs[row] = (next_vel[row,next_col] + prev_vel[row,next_col] - next_vel[row,prev_col] - prev_vel[row,prev_col])/4
      curr_ps[row] = (position[row,next_col] - position[row,prev_col])/2
      curr_pss[row] = position[row,next_col] + position[row,prev_col] - 2*position[row,col]

    beta = (1 + dot(curr_vel,curr_vel))^(1/2)
    sigma = dot(curr_ps,curr_ps)
    psvs = dot(curr_ps,curr_vs)
    bvvs = dot(curr_vs,curr_vel) / (beta^2)
    pssps = dot(curr_pss,curr_ps)

    for row in 1:num_dims
      result[row,col] = curr_pss[row] / (sigma * beta) - curr_ps[row] * pssps / (sigma^2 * beta) - curr_vel[row] * psvs / (sigma * beta) - curr_ps[row] * bvvs / (sigma * beta)
      pred_vel[row] = prev_vel[row,col] + dt * result[row,col]

    agreement = agreement && isapprox(next_vel[:,col], pred_vel, rtol=sqrt(eps(Float64)))

  return agreement

function find_next_vel!(position::Array{Float64,2}, prev_vel::Array{Float64,2}, next_vel::Array{Float64,2}, dt::Float64; max_tries::Int64=50)
  tries = 1
  result = zeros(next_vel)
  agreement = MC_corr3!(position,prev_vel,next_vel,result,dt)
  for j in 1:size(next_vel,2), i in 1:size(next_vel,1)
    next_vel[i,j] = prev_vel[i,j] + result[i,j]*dt
  while !agreement && tries < max_tries
    agreement = MC_corr3!(position,prev_vel,next_vel,result,dt)
    for j in 1:size(next_vel,2), i in 1:size(next_vel,1)
      next_vel[i,j] = prev_vel[i,j] + result[i,j]*dt
    tries +=1
  return tries, agreement

This first file does the heavy lifting of solving the evolution equation. The scheme is a semi-implicit finite difference scheme. The function MC_Corr3 takes as input the current position, the previous velocity, and the next velocity, and computes the correct current acceleration. The function find_next_vel iterates MC_Corr3 until the computed acceleration agrees (up to numerical errors) with the input previous and next velocities.

Or, in notations:
MC_Corr3: ( x[t], v[t-1], v[t+1] ) --> Delta-v[t]
and find_next_vel iterates MC_Corr3 until
Delta-v[t] == (v[t+1] - v[t-1]) / 2

The code in this file is also where the performance matters the most, and I spent quite some time experimenting with different algorithms to find one with most reasonable speed.

function make_ellipse(a::Float64,b::Float64, n::Int64, extra_dims::Int64=1)  # a,b are relative lengths of x and y axes
  s = linspace(0,2π * (n-1)/n, n)
  if extra_dims == 0
    return vcat(transpose(a*cos(s)), transpose(b*sin(s)))
  elseif extra_dims > 0
    return vcat(transpose(a*cos(s)), transpose(b*sin(s)), zeros(extra_dims,n))
    error("extra_dims must be non-negative")

function perturb_data!(data::Array{Float64,2}, coeff::Vector{Float64}, num_modes::Int64) 
  # num_modes is the number of modes
  # coeff are the relative sizes of the perturbations

  numpts = size(data,2)

  for j in 2:num_modes
    rcoeff = rand(length(coeff),2)

    for pt in 1:numpts
      theta = 2j * π * pt / numpts
      for d in 1:length(coeff)
        data[d,pt] += ( (rcoeff[d,1] - 0.5) *  cos(theta) + (rcoeff[d,2] - 0.5) * sin(theta)) * coeff[d] / j^2


This file just sets up the initial data. Note that in principle the number of ambient spatial dimensions is arbitrary.

using Plots

pyplot(size=(1920,1080), reuse=true)

function plot_data2D(filename_prefix::ASCIIString, filename_offset::Int64, titlestring::ASCIIString, data::Array{Float64,2}, additional_data...)
  x_max = 1.5
  y_max = 1.5
  plot(transpose(data)[:,1], transpose(data)[:,2] , xlims=(-x_max,x_max), ylims=(-y_max,y_max), title=titlestring)
  if length(additional_data) > 0
    for i in 1:length(additional_data)
      plot!(transpose(additional_data[i][1,:]), transpose(additional_data[i][2,:]))

function plot_data3D(filename_prefix::ASCIIString, filename_offset::Int64, titlestring::ASCIIString, data::Array{Float64,2}, additional_data...)
  x_max = 1.5
  y_max = 1.5
  z_max = 0.9
  tdata = transpose(data)
  plot(tdata[:,1], tdata[:,2],tdata[:,3], xlims=(-x_max,x_max), ylims=(-y_max,y_max),zlims=(-z_max,z_max), title=titlestring)

  if length(additional_data) > 0
    for i in 1:length(additional_data)
      tdata = transpose(additional_data[i])
      plot!(tdata[:,1], tdata[:,2], tdata[:,3]) 


This file provides some wrapper commands for generating the plots.


num_pts = 3000
default_timestep = 0.01 / num_pts
max_time = 3
plot_every_ts = 1500

my_data = make_ellipse(1.0,1.0,num_pts,0)
perturb_data!(my_data, [1.0,1.0], 15)
this_vel = zeros(my_data)
next_vel = zeros(my_data)

for t = 0:floor(Int64,max_time / default_timestep)
  num_tries, agreement = find_next_vel!(my_data, this_vel,next_vel,default_timestep)

  if !agreement
    warn("Time $(t*default_timestep): failed to converge when finding next_vel.")
    warn("Dumping information:")
    max_beta = 1.0
    max_col = 1
    for col in 1:size(my_data,2)
      beta = (1 + dot(next_vel[:,col], next_vel[:,col]))^(1/2)
      if beta > max_beta
        max_beta = beta
	max_col = col
    warn("   Beta attains maximum at position $max_col")
    warn("   Beta = $max_beta")
    warn("   Position = ", my_data[:,max_col])
    prevcol = max_col - 1
    nextcol = max_col + 1
    if max_col == 1
      prevcol = size(my_data,2)
    elseif max_col == size(my_data,2)
      nextcol = 1
    warn("   Deltas")
    warn("    Left:  ", my_data[:,max_col] - my_data[:,prevcol])
    warn("    Right:  ", my_data[:,nextcol] - my_data[:,max_col])
    warn("   Previous velocity: ", this_vel[:,max_col])
    warn("   Putative next velocity: ", next_vel[:,max_col])

  for col in 1:size(my_data,2)
    beta = (1 + dot(next_vel[:,col], next_vel[:,col]))^(1/2)
    for row in 1:size(my_data,1)
      my_data[row,col] += next_vel[row,col] * default_timestep / beta
      this_vel[row,col] = next_vel[row,col]

    if beta > 1e7
      warn("time: ", t * default_timestep)
      warn("Almost null... beta = ", beta)
      warn("current position = ", my_data[:,col])
      warn("current Deltas")
      prevcol = col - 1
      nextcol = col + 1
      if col == 1
        prevcol = size(my_data,2)
      elseif col == size(my_data,2)
        nextcol = 1
      warn(" Left: ", my_data[:,col] - my_data[:,prevcol])
      warn(" Right: ", my_data[:,nextcol] - my_data[:,col])

  if t % plot_every_ts ==0
    plot_data2D("3Dtest", div(t,plot_every_ts), @sprintf("elapsed: %0.4f",t*default_timestep), my_data, make_ellipse(cos(t*default_timestep), cos(t*default_timestep),100,0))
    info("Frame $(t/plot_every_ts):  used $num_tries tries.")

And finally the main file. Mostly it just ties the other files together to produce the plots using the simulation code; there are some diagnostics included for me to keep an eye on the output.

The results

First thing to do is to run a sanity check against explicit solutions. In rotational symmetry, the solution to the cosmic string equations can be found analytically. As you can see below the simulation closely replicates the explicit solution in this case.

The video ends when the simulation stopped. The simulation stopped because a singularity has formed; in this video the singularity can be seen as the collapse of the string to a single point.

Next we can play around with a more complicated initial configuration.

In this video the blue curve is the closed cosmic string, which starts out as a random perturbation of the circle with zero initial speed. The string contracts with acceleration determined by the Nambu-Goto action. The simulation ends when a singularity has formed. It is perhaps a bit hard to see directly where the singularity happened. The diagnostic messages, however, help in this regard. From it we know that the onset of singularity can be seen in the final frame:


The highlighted region is getting quite pointy. In fact, that is accompanied with the “corner” picking up infinite acceleration (in other words, experiencing an infinite force). The mathematical singularity corresponds to something unreasonable happening in the physics.

To make it easier to see the “speed” at which the curve is moving, the following videos show the string along with its “trail”. This first one again shows how a singularity can happen as the curve gets gradually more bent, eventually forming a corner.

This next one does a good job emphasizing the “wave” nature of the motion.

The closed cosmic strings behave like a elastic band. The string, overall, wants to contract to a point. Small undulations along the string however are propagated like traveling waves. Both of these tendencies can be seen quite clearly in the above video. That the numerical solver can solve “past” the singular point is a happy accident; while theoretically the solutions can in fact be analytically continued past the singular points, the renormalization process involved in this continuation is numerically unstable and we shouldn’t be able to see it on the computer most of the time.

The next video also emphasizes the wave nature of the motion. In addition to the traveling waves, pay attention to the bottom left of the video. Initially the string is almost straight there. This total lack of curvature is a stationary configuration for the string, and so initially there is absolutely no acceleration of that segment of the string. The curvature from the left and right of that segment slowly intrudes on the quiescent piece until the whole thing starts moving.

The last video for this post is a simulation when the ambient space is 3 dimensional. The motion of the string, as you can see, becomes somewhat more complicated. When the ambient space is 2 dimensional a point either accelerates or decelerates based on the local (signed) curvature of the string. But when the ambient space is 3 dimensional, the curvature is now a vector and this additional degree of freedom introduces complications into the behavior. For example, when the ambient space is 2 dimensional it is known that all closed cosmic strings become singular in finite time. But in 3 dimensions there are many closed cosmic strings that vibrate in place without every becoming singular. The video below is one that does however become singular. In addition to a fading trail to help visualize the speed of the curve, this plot also includes the shadows: projections of the curve onto the three coordinate planes.

Riemann-, Generalized-Riemann-, and Darboux-Stieltjes integrals

(The following is somewhat rough and may have typos.)

Let us begin by setting the notations and recalling what happens without the Stieltjes part.

Defn (Partition)
Let I be a closed interval. A partition P is a finite collection of closed subintervals \{I_\alpha\} such that

  1. P is finite;
  2. P covers I, i.e. \cup P = I;
  3. P is pairwise almost disjoint, i.e. for I_\alpha, I_\beta distinct elements of P, their intersection contains at most one point.

We write \mathscr{P} for the set of all partitions of I.

Defn (Refinement)
Fix I a closed interval, and P, Q two partitions. We say that P refines Q or that P \preceq Q if for every I_\alpha\in P there exists J_\beta \in Q such that I_\alpha \subseteq J_\beta.

Defn (Selection)
Given I a closed interval and P a partition, a selection \sigma: P \to I is a mapping that satisfies \sigma(I_\alpha) \in I_\alpha.

Defn (Size)
Given I a closed interval and P a partition, the size of P is defined as |P| = \sup_{I_\alpha \in P} |I_\alpha|, where |I_\alpha| is the length of the closed interval I_\alpha.

Remark In the above we have defined two different preorders on the set \mathscr{P} of all partitions. One is induced by the size: we say that P \leq Q if |P| \leq |Q|. The other is given by the refinement P\preceq Q. Note that neither are partial orders. (But that the preorder given by refinement can be made into a partial order if we disallow zero-length degenerate closed intervals.) Note also that if P\preceq Q we must have P \leq Q.

Now we can define the notions of integrability.

Defn (Integrability)
Let I be a closed, bounded interval and f:I \to \mathbb{R} be a bounded function. We say that f is integrable with integral s in the sense of

  • Riemann if for every \epsilon > 0 there exists P_0\in \mathcal{P} such that for every P \leq P_0 and every selection \sigma:P \to I we have
    \displaystyle \left| \sum_{I' \in P} f(\sigma(I')) |I'| - s \right| < \epsilon

  • Generalised-Riemann if for every \epsilon > 0 there exists P_0 \in \mathcal{P} such that for every P \preceq P_0 and every selection \sigma: P\to I we have
    \displaystyle \left| \sum_{I' \in P} f(\sigma(I')) |I'| - s \right| < \epsilon

  • Darboux if
    \displaystyle \inf_{P\in\mathscr{P}} \sum_{I' \in P} (\sup_{I'} f )|I'| = \sup_{P\in\mathscr{P}} \sum_{I' \in P} (\inf_{I'} f )|I'| = s

From the definition it is clear that “Riemann integrable” implies “Generalised-Riemann integrable”. Furthermore, we have clearly that for a fixed P
\displaystyle \sum_{I' \in P} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} f(\sigma(I')) |I'| \leq \sum_{I' \in P} (\sup_{I'} f) |I'|
and that if P \preceq Q we have
\displaystyle \sum_{I' \in Q} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} (\inf_{I'} f) |I'| \leq \sum_{I' \in P} (\sup_{I'} f) |I'| \leq \sum_{I' \in Q} (\inf_{I'} f) |I'|
so “Darboux integrable” also implies “Generalised-Riemann integrable”. A little bit more work shows that “Generalised-Riemann integrable” also implies “Darboux integrable” (if the suprema and infima are obtained on the intervals I', this would follow immediately; using the boundedness of the intervals we can find \sigma such that the Riemann sum approximates the upper or lower Darboux sums arbitrarily well.

The interesting part is the following
Darboux integrable functions are Riemann integrable. Thus all three notions are equivalent.

Proof. Let P, Q be partitions. Let |P| \leq \inf_{I'\in Q, |I'| \neq 0} |I'|, and let m be the number of non-degenerate subintervals in Q. We have the following estimate
\displaystyle   \sum_{I'\in Q} (\inf_{I'} f) |I'| - (m-1) |P| (\sup_I 2|f|) \leq \sum_{J'\in P} f(\sigma(J')) |J'| \leq \sum_{I'\in Q} (\sup_{I'} f) |I'| + (m-1) |P| (\sup_I 2|f|)
The estimate follows by noting that “most” of the J'\in P will be proper subsets of I'\in Q, and there can be at most m-1 of the J' that straddles between two different non-degenerate sub-intervals of Q. To prove the theorem it suffices to choose first a Q such that the upper and lower Darboux sums well-approximates the integral. Then we can conclude for all P with |P| sufficiently small the Riemann sum is almost controlled by the Q-Darboux sums. Q.E.D.

Now that we have recalled the case of the usual integrability. Let us consider the case of the Stieltjes integrals: instead of integrating against \mathrm{d}x, we integrate against \mathrm{d}\rho, where \rho is roughly speaking a “cumulative distribution function”: we assume that \rho:I \to \mathbb{R} is a bounded monotonically increasing function.

The definition of the integrals are largely the same, except that at every step we replace the width of the interval |I'| by the diameter of \rho(I'), i.e. \sup_{I'} \rho - \inf_{I'} \rho. The arguments above immediately also imply that

  • “Riemann-Stieltjes integrable” implies “Generalised-Riemann-Stieltjes integrable”
  • “Darboux-Stieltjes integrable” implies “Generalised-Riemann-Stieltjes integrable”
  • “Generalised-Riemann-Stieltjes integrable” implies “Darboux-Stientjes integrable”

However, Darboux-Stieltjes integrable functions need not be Riemann-Stieltjes integrable. The possibility of failure can be seen in the proof of the theorem above, where we used the fact that |P| is allow to be made arbitrarily small. The same estimate, in the case of the Stieltjes version of the integrals, has |P| replaced by \sup_{J'\in P} (\sup_{J'} \rho - \inf_{J'} \rho), which for arbitrary partitions need to shrink to zero. To have a concrete illustration, we give the following:

Let I = [0,1]. Let \rho(x) = 0 if x < \frac12 and 1 otherwise. Let f(x) = 0 if x \leq \frac12 and 1 otherwise. Let Q_0 be the partition \{ [0,\frac12], [\frac12,1]\}. We have that
\displaystyle \sum_{I'\in Q_0} (\sup_{I'} f) (\sup_{I'} \rho - \inf_{I'} \rho) = 0 \cdot (1 - 0) + 1\cdot (1 - 1) = 0
\displaystyle \sum_{I'\in Q_0} (\inf_{I'} f) (\sup_{I'} \rho - \inf_{I'} \rho) = 0 \cdot (1-0) + 0 \cdot(1-1) = 0
so we have that in particular the pair (f,\rho) is Darboux-Stieltjes integrable with integral 0. However, let k be any odd integer, consider the partition P_k of [0,1] into k equal portions. Depending on the choice of the selection \sigma, we see that the sum can take the values
\displaystyle \sum_{I'\in P_k} f(\sigma(I')) (\sup_{I'} \rho - \inf_{I'}\rho) = f(\sigma([\frac12 - \frac1{2k},\frac12 + \frac1{2k}])) (1 - 0) \in \{0,1\}
which shows that the Riemann-Stieltjes condition can never be satisfied.

The example above where both f and \rho are discontinuous at the same point is essentially sharp. A easy modification of the previous theorem shows that
If at least one of f,\rho is continuous, then Darboux-Stieltjes integrability is equivalent to Riemann-Stieltjes integrability.

Remark The nonexistence of Riemann-Stieltjes integral when f and g has shared discontinuity points is similar in spirit to the idea in distribution theory where whether the product of two distributions is well-defined (as a distribution) depends on their wave-front sets.

Bouncing a quantum particle back and forth

If you have not seen my previous two posts, you should read them first.

In the two previous posts, I shot particles (okay, simulated the shooting on a computer) at a single potential barrier and looked at what happens. What happens when we have more than one barrier? In the classical case the picture is easy to understand: a particle with insufficient energy to escape will be trapped in the local potential well for ever, while a particle with sufficiently high energy will gain freedom and never come back. But what happens in the quantum case?

If the intuition we developed from scattering a quantum particle against a potential barrier, where we see that depending on the frequency (energy) of the particle, some portion gets transmitted and some portion gets reflected, is indeed correct, what we may expect to see is that the quantum particle bounces between the two barriers, each time losing some amplitude due to tunneling.

But we also saw that the higher frequency components of the quantum particle have higher transmission amplitudes. So we may expect that the high frequency components to decay more rapidly than the low frequency ones, so the frequency of the “left over” parts will continue to decay in time. This however, would be wrong, because we would be overlooking one simple fact: by the uncertainty principle again, very low frequency waves cannot be confined to a small physical region. So when we are faced with two potential barriers, the distance between them gives a characteristic frequency. Below this frequency (energy) it is actually not possible to fit a (half) wave between the barriers, and so the low frequency waves must have significant physical extent beyond the barriers, which means that large portions of these low frequency waves will just radiate away freely. Much above the characteristic frequency, however, the waves have large transmission coefficients and will not be confined.

So the net result is that we should expect for each double barrier a characteristic frequency at which the wave can remain “mostly” stuck between the two barriers, losing a little bit of amplitude at each bounce. This will look like a slowly, but exponentially, decaying standing wave. And I have some videos to show for that!

In the video we start with the same random initial data and evolve it under the linear wave equation with different potentials: the equations look like

\displaystyle - \partial^2_{tt} u + \partial^2_{xx} u - V u = 0

where V is a non-negative potential taken in the form

\displaystyle V(x) = a_1 \exp( - x^2 / b_1) - a_2 \exp( -x^2 / b_2)

which is a difference of two Gaussians. For the five waves shown the values of a_1, b_1 are the same throughout. The coefficients a_2 (taken to be \leq a_1) and b_2 (taken to be < b_1) increases from top to bottom, resulting in more and more-widely separated double barriers. Qualitatively we see, as we expected,

  • The shallower and narrower the dip the faster the solution decays.
  • The shallower and narrower the dip the higher the “characteristic frequency”.

As an aside: the video shown above is generated using Python, in particular NumPy and MatPlotLib; the code took significantly longer to run (20+hours) than to write (not counting the HPDE solver I wrote before for a different project, coding and debugging this simulation took about 3 hours or less). On the other hand, this only uses one core of my quad-core machine, and leaves the computer responsive in the mean time for other things. Compare that to Auto-QCM: the last time I ran it to grade a stack of 400+ multiple choice exams it locked up all four cores of my desktop computer for almost an entire day.

As a further aside, this post is related somewhat to my MathOverflow question to which I have not received a satisfactory answer.

… and scattering a quantum particle

In the previous post we shot a classical particle at a potential barrier. In this post we shoot a quantum particle.

Whereas the behaviour of the classical particle is governed by Newton’s laws (where the external force providing the acceleration is given as minus the gradient of the potential), we allow our quantum particle to be governed by the Klein-Gordon equations.

  • Mathematically, the Klein-Gordon equation is a partial differential equation, whereas Newton’s laws form ordinary differential equations. A typical physical interpretation is that the state space of quantum particles are infinite dimensional, whereas the phase space of physics has finite dimensions.
  • Note that physically the Klein-Gordon equation was designed to model a relativistic particle, while in the previous post we used the non-relativistic Newton’s laws. In some ways it would’ve been better to model the quantum particle using Schroedinger’s equation. I plead here however that (a) qualitatively there is not a big difference in terms of the simulated outcomes and (b) it is more convenient for me to use the Klein-Gordon model as I already have a finite-difference solver for hyperbolic PDEs coded in Python on my computer.

To model a particle, we set the initial data to be a moving wave packet, such that at the initial time the solution is strongly localized and satisfies \partial_t u + \partial_x u = 0. Absent the mass and potential energy terms in the Klein-Gordon equation (so under the evolution of the free wave equation), this wave packet will stay coherent and just translate to the right as time goes along. The addition of the mass term causes some small dispersion, but the mass is chosen small so that this is not a large effect. The main change to the evolution is the potential barrier, which you can see illustrated in the simulation.

The video shows 8 runs of the simulation with different initial data. Whereas in the classical picture the initial kinetic energy is captured by the initial speed at which the particle is moving, in the quantum/wave picture the kinetic energy is related to the central frequency of your wave packet. So each of the 8 runs have increasing frequency offset that represents increasing kinetic energy. The simulation has two plots, the top shows the square of the solution itself, which gives a good indication of where physically the wave packet is located. The bottom shows a normalized kinetic energy density (I have to include a normalization since the kinetic energy of the first and last particles differ roughly 10 fold).

One notices that in the first two runs, the kinetic energy is sufficiently small that the particle mostly bounces back to the left after hitting the potential.

For the third and fourth runs (frequency shift \sqrt{2} and \sqrt{3} respectively) we see that while a significant portion of the particle bounces back, a noticeable portion “tunnels through” the barrier: this caused by a combination of the quantum tunneling phenomenon and the wave packet form of the initial data.

The phenomenon of quantum tunneling manifests in that all non-zero energy waves will penetrate a finite potential barrier a little bit. But the amount of penetration decays to zero as the energy of the wave goes to zero: this is known as the semiclassical regime. In the semiclassical limit it is known that quantum mechanics converge toward classical mechanics, and so in the low-energy limit we expect our particle to behave like a classical particle and bounce off. So we see that naturally increasing the energy (frequency) of our wave packet we expect more of the tunneling to happen.

Further, observe that by shaping our data into a wave packet it necessarily contains some high frequency components (due to Heisenberg uncertainty principle); high frequency, and hence high energy components do not “see” the potential barrier. Even in the classical picture high energy particles would fly over the potential barrier. So for wave packets there will always be some (perhaps not noticeable due to the resolution of our computation) leakage of energy through the potential barrier. The quantum effect on these high energy waves is that they back-scatter. Whereas the classical high energy particles just fly directly over the barrier, a high energy quantum particle will leave some parts of itself behind the barrier always. We see this in the sixth and seventh runs of the simulation, where the particle mostly passes through the barrier, but a noticeable amount bounces off in the opposite direction.

In between during the fifth run, where the frequency shift is 2, we see that the barrier basically split the particle in two and send one half flying to the right and the other half flying to the left. Classically this is the turning point between particles that go over the bump and particles that bounces back, and would be the case (hard to show numerically!) where a classical particle comes in from afar with just enough energy that it comes to a half at the top of the potential barrier!

And further increasing the energy after the seventh run, we see in the final run a situation where only a negligible amount of the particle scatters backward with almost all of it passing through the barrier unchanged. One interesting thing to note however is that just like the case of the classical particle, the wave packet appears to “slow down” a tiny bit as it goes over the potential barrier.

Shooting a classical particle…

Here’s a small animation of what happens when you try to shoot a classical particle when there’s a potential barrier. For small initial kinetic energies, the particle bounces back. For large initial kinetic energies, the particle goes over the hump, first decelerating and then accelerating in the process.

(It may be best to watch this full screen with HD if the network supports it.)

(The NumPy code is pretty simple to write for this; and it runs relatively fast. The one for my next post is a bit more complicated and takes rather much longer to run. Stay tuned!)

An optimization problem: variation

Examining the theorem proven in the previous post, we are led naturally to ask whether there are higher order generalizations.

Question: Let f \in C^{k}([-1,1]) with f^{(k)} > 0. What can we say about the minimizer of C = \int_{-1}^1 |f(x) - p(x)|~\mathrm{d}x where p ranges over degree k-1 polynomials?

It is pretty easy to see that we expect p to intersect f at the maximum number of points, which is k. We label those points x_1, \ldots, x_k and call x_0 = -1 and x_{k+1}= 1. Then the cost function can be written as
\displaystyle C = \sum_{j = 0}^k (-1)^j \int_{x_j}^{x_{j+1}} f(x) - p(x; x_1, \ldots, x_k) ~\mathrm{d}x
Since we know that values of p at the points x_1, \ldots, x_k we can write down the interpolation polynomial explicitly using Sylvester’s formula:
\displaystyle p = \sum_{j = 1}^k \left( \prod_{1 \leq m \leq k, m\neq j} \frac{x - x_m}{x_j - x_m} \right) f(x_j) = \sum L_j(x; x_1, \ldots, x_k) f(x_j)

The partial derivatives are now
\displaystyle \partial_n C = \sum_{j = 0}^k (-1)^{j+1} \int_{x_j}^{x_{j+1}} \partial_n p(x; x_1, \ldots, x_k) ~\mathrm{d}x
It remains to compute \partial_n p for 1 \leq n \leq k. We observe that when n \neq j
\displaystyle \partial_n L_j = - \frac{1}{x - x_n} L_j + \frac{1}{x_j - x_n} L_j
and also
\displaystyle \partial_n L_n = - \left( \sum_{1\leq m \leq k, m\neq n} \frac{1}{x_n - x_m} \right) L_n
\displaystyle \partial_n p =  \sum_{j \neq n} \frac{x-x_j}{(x_j - x_n)(x - x_n)} L_j f(x_j) + L_n f'(x_n) - \left( \sum_{1\leq m \leq k, m\neq n} \frac{1}{x_n - x_m} \right) L_n f(x_n)
Now, we observe that
\displaystyle \frac{x - x_j}{x - x_n} L_j =   - \left( \prod_{m \neq n,j} \frac{x_n - x_m}{x_j - x_m} \right)  L_n
so after some computation we arrive at
\displaystyle \partial_n p = L_n(x) \cdot \left[ f'(x_n) - \sum_{j \neq n} \frac{1}{x_j - x_n} \left(\left( \prod_{m \neq j,n}\frac{x_n - x_m}{x_j - x_m}\right)f(x_j) - f(x_n) \right)\right]
which we can further simplify to
\displaystyle \partial_n p = L_n(x) \cdot \left( f'(x_n) - p'(x_n)\right)
Now, since f and p cross transversely at x_n, the difference of their derivatives is non-zero. (This harks back to our assumption that f^{(k)} > 0.) So we are down, as in the case where k = 2, to equations entirely independent of f.

More precisely, we see that the stationarity condition becomes the choice of x_1, \ldots, x_k such that the integrals
\displaystyle \sum_{j = 0}^k (-1)^{j} \int_{x_j}^{x_{j+1}} L_n(x)  ~\mathrm{d}x = 0
for each n. Since L_n form a basis for the polynomials of degree at most k-1, we have that the function
\chi(x) = (-1)^j \qquad x \in (x_j, x_{j+1})
is L^2 orthogonal to every polynomial of degree at most k-1. So in particular the x_j are solutions to the following system of equations
x_0 = -1, \qquad x_{k+1} = 1
\sum_{j = 0}^k (-1)^j \left[ x_{j+1}^d - x_{j}^d \right] = 0 \qquad \forall d \in \{1, \ldots, k\}

From symmetry considerations we have that x_j = - x_{k+1 - j}. This also kills about half of the equations. For the low k we have

  1. \{ 0\}
  2. \{ -1/2, 1/2\}
  3. \{-1/2, 0, 1/2\}
  4. \{ (\pm 1 \pm \sqrt{5})/4 \}
  5. \{ 0, \pm\frac12, \pm \frac{\sqrt{3}}2 \}