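… Data aequatione quotcunque fluentes quantitates involvente fluxiones invenire et vice versa … (Given an equation involving any number of fluent quantities, to find the fluxions, and vice versa.)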

Category: Require high school maths

A better estimate of Kempner’s series

The Kempner series recently regained some notoriety due to a Saturday Morning Breakfast Cereal comic (the last panel). The observation first appeared in a 1914 American Mathematical Monthly article, in which it was shown that the series consisting of the usual harmonic series

$\displaystyle \sum_{n = 1}^{\infty} \frac{1}{n}$

but with all the terms whose decimal expansion includes the digit ‘9’ removed, in fact converges to some number below 80. The original proof is given in the Wikipedia article linked above, so I will not repeat it. But to make the idea easier to see, let us first think about the case where the numbers are expressed in base 2. In base 2, every positive integer has leading bit 1 (since it cannot be zero). Therefore there are no positive binary numbers without the bit ‘1’ in their expansion, so the corresponding series is empty and trivially sums to zero. How about the case of the bit ‘0’? The only binary numbers without any ‘0’ bits are

$1, 3 = (11)_2, 7 = (111)_2, 15 = (1111)_2, \ldots, 2^n - 1$.

So the corresponding series actually becomes

$\displaystyle \sum_{n = 1}^\infty \frac{1}{2^n - 1} \leq \sum_{n = 1}^\infty \frac{1}{2^{n-1}} = 2$

So from within the heavily divergent harmonic series we pick out a rapidly converging, essentially geometric, series. So what’s at work here? Among all the n-bit binary numbers, exactly one has no ‘0’ bits. So the density of these special numbers decays rather quickly: in base 2, there are $2^{n-1}$ numbers that are exactly n bits long. So if a number $x$ has a binary representation that is exactly n bits long (which means that $2^{n-1} \leq x < 2^{n}$), the chance that it is one of the special numbers is $\frac{1}{2^{n-1}} \approx \frac{2}{x}$ (the special number $2^n - 1$ sits at the top of this range, where $x \approx 2^n$). We can then treat this probability as a density: replacing the discrete sum $\sum \frac{1}{n}$ by the integral $\int \frac{1}{x}\mathrm{d}x$ (calculus students may recognize this as the germ of the “integral test”) and replacing $\mathrm{d}x$ by the density $\frac{2}{x} \mathrm{d}x$, we get the estimate

$\displaystyle \text{Binary Kempner series} \approx \int_1^\infty \frac{2}{x^2}~\mathrm{d}x = 2$.
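A quick numerical sanity check, sketched in Python (my own illustration, not part of the original argument): list the binary numbers below $2^{16}$ with no ‘0’ bit, confirm they are exactly $2^n - 1$, and sum the series. The sum comes out near 1.6067 (a value known as the Erdős–Borwein constant), below both the geometric bound and the heuristic value 2. The heuristic overshooting here is consistent with the special numbers $2^n - 1$ sitting at the top of their range of n-bit numbers.

```python
# Binary analogue of the Kempner series: keep only the integers whose
# binary expansion contains no '0' bit.
zero_free = [k for k in range(1, 2 ** 16) if "0" not in bin(k)[2:]]
assert zero_free == [2 ** n - 1 for n in range(1, 17)]  # 1, 3, 7, 15, ...

# Terms beyond n = 60 are below double precision relative to the sum.
binary_kempner = sum(1 / (2 ** n - 1) for n in range(1, 61))
geometric_bound = sum(1 / 2 ** (n - 1) for n in range(1, 61))

print(f"sum of 1/(2^n - 1): {binary_kempner:.10f}")
print(f"geometric bound:    {geometric_bound:.10f}")
```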

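Doing the same thing with the original Kempner series: the leading digit of an n-digit number is one of 1 through 9 and each other digit is one of 0 through 9, so the probability that an n-digit number does not contain the digit 9 is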

$\displaystyle \left(\frac89\right)\left(\frac9{10}\right)^{n-1} \approx \left( \frac{9}{10}\right)^{n}$

The length of the decimal expansion of a natural number $x$ is basically $1 + \log x$ (here and below, $\log$ denotes the base-10 logarithm). So the density we are interested in becomes

$\displaystyle \left( \frac{9}{10}\right)^{1+\log x} ~\mathrm{d}x$

Multiplying by $\frac{1}{x} = \left(\frac{1}{10}\right)^{\log x}$ and integrating, we get the estimate

$\displaystyle \text{Kempner series} \approx 0.9 \times \int_1^\infty \left( \frac{9}{100}\right)^{\log x} ~\mathrm{d}x$

The integral can be computed using that

$\displaystyle a^{\log b} = b^{\log a}$

to get

$\displaystyle 0.9 \times \int_1^{\infty} \left( \frac{9}{100}\right)^{\log x} ~\mathrm{d}x = 0.9\times \int_1^\infty x^{\log 9 - 2} ~\mathrm{d}x = \frac{0.9}{1 - \log 9} \approx 19.67$
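The closed form can be double-checked numerically. The sketch below (my own check, in Python) evaluates $\frac{0.9}{1-\log 9}$ directly and, independently, integrates after the substitution $x = 10^t$, which turns the integrand into $0.9 \ln 10 \,(9/10)^t$; it also spot-checks the identity $a^{\log b} = b^{\log a}$ at one arbitrary point.

```python
import math

# Closed form from the computation above: 0.9 / (1 - log10 9).
closed_form = 0.9 / (1 - math.log10(9))

# Independent check: substitute x = 10**t, so that
#   0.9 * (9/100)**log10(x) dx = 0.9 * ln(10) * (9/10)**t dt,
# and integrate over t in [0, 400] with the midpoint rule
# ((9/10)**400 is below 1e-18, so the cut-off is harmless).
h = 0.01
numeric = sum(0.9 * math.log(10) * 0.9 ** ((i + 0.5) * h) * h for i in range(40_000))

# The identity a**log10(b) == b**log10(a), at one sample point.
a, b = 9 / 100, 37.0
assert math.isclose(a ** math.log10(b), b ** math.log10(a))

print(f"closed form ~ {closed_form:.4f}, quadrature ~ {numeric:.4f}")
```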

Notice that this estimate is much closer to the currently known value of roughly 22.92 than to the original upper bound of 80 computed by Kempner.

Kempner’s estimate is a heavy overestimate because, in his summation, he replaced every n-digit number that does not contain the digit 9 by $10^{n-1}$, which can be up to (almost) 9 times smaller than the original number. Our estimate is low because, among the n-digit numbers, those that do not contain the digit 9 are not evenly distributed: they tend to crowd toward the front (the small end) rather than the back, and smaller numbers contribute larger terms $\frac{1}{x}$ (in fact, they cannot crowd at the back at all, since none of the numbers that start with the digit 9 is admissible). So if in the original question we had asked for numbers that do not contain the digit 1, our computation would give an overestimate instead, since these numbers tend to crowd toward the back.
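To see the crowding effect in numbers, here is a brute-force sketch in Python (my own illustration; splitting the comparison decade by decade is my choice). For each n it compares the exact sum of $\frac{1}{k}$ over n-digit $k$ avoiding a given digit with the heuristic's share of the integral over $[10^{n-1}, 10^n)$, which works out to $\frac{0.09 \cdot 0.9^{n-1}}{1 - \log 9}$.

```python
import math

def decade_sum(n, banned):
    """Exact sum of 1/k over the n-digit k whose decimal digits avoid `banned`."""
    return sum(1 / k for k in range(10 ** (n - 1), 10 ** n) if banned not in str(k))

def decade_estimate(n):
    """Density heuristic restricted to [10**(n-1), 10**n):
    0.9 * integral of x**(log10(9) - 2) dx = 0.09 * 0.9**(n-1) / (1 - log10(9))."""
    return 0.09 * 0.9 ** (n - 1) / (1 - math.log10(9))

print(" n   avoid 9   estimate   avoid 1")
for n in range(1, 6):
    print(f"{n:2d}  {decade_sum(n, '9'):8.4f}  {decade_estimate(n):8.4f}  {decade_sum(n, '1'):8.4f}")
```

In each of these decades the "avoid 9" column should come out above the heuristic and the "avoid 1" column below it, in line with the crowding argument.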

… and scattering a quantum particle

In the previous post we shot a classical particle at a potential barrier. In this post we shoot a quantum particle.

Whereas the behaviour of the classical particle is governed by Newton’s laws (where the external force providing the acceleration is given as minus the gradient of the potential), we allow our quantum particle to be governed by the Klein-Gordon equation.

• Mathematically, the Klein-Gordon equation is a partial differential equation, whereas Newton’s laws form a system of ordinary differential equations. The corresponding physical interpretation is that the state space of a quantum particle is infinite dimensional, whereas the phase space of a classical particle is finite dimensional.
• Note that physically the Klein-Gordon equation was designed to model a relativistic particle, while in the previous post we used the non-relativistic Newton’s laws. In some ways it would’ve been better to model the quantum particle using Schroedinger’s equation. I plead here however that (a) qualitatively there is not a big difference in terms of the simulated outcomes and (b) it is more convenient for me to use the Klein-Gordon model as I already have a finite-difference solver for hyperbolic PDEs coded in Python on my computer.

To model a particle, we set the initial data to be a moving wave packet, such that at the initial time the solution is strongly localized and satisfies $\partial_t u + \partial_x u = 0$. Absent the mass and potential energy terms in the Klein-Gordon equation (so under the evolution of the free wave equation), this wave packet will stay coherent and just translate to the right as time goes along. The addition of the mass term causes some small dispersion, but the mass is chosen small so that this is not a large effect. The main change to the evolution is the potential barrier, which you can see illustrated in the simulation.
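The solver used for the video is not shown, so here is a minimal stand-in sketch in Python/NumPy under my own assumptions: the equation is taken to be $u_{tt} - u_{xx} + (m^2 + V(x))u = 0$ with a hypothetical Gaussian barrier $V(x) = V_0 e^{-x^2}$, $V_0 = 4$ (my guess, chosen so that the over-the-barrier threshold $\xi^2 > V_0$ falls at frequency 2, in line with the fifth run being the borderline case), mass $m = 0.1$, a leapfrog finite-difference scheme, and Dirichlet ends placed far from the packet. The initial data follow the recipe above: a localized packet with $\partial_t u = -\partial_x u$.

```python
import numpy as np

def simulate(k, m=0.1, V0=4.0, sigma=3.0, x0=-15.0, L=40.0, dx=0.05, T=30.0):
    """Leapfrog scheme for u_tt - u_xx + (m**2 + V(x)) u = 0 on [-L, L]."""
    x = np.arange(-L, L + dx / 2, dx)
    dt = 0.5 * dx                                 # well inside the CFL limit dt < dx
    coef = m ** 2 + V0 * np.exp(-x ** 2)          # mass term plus Gaussian barrier

    # Localized packet with carrier wavenumber k; u_t = -u_x makes it right-moving.
    u0 = np.exp(-((x - x0) / sigma) ** 2) * np.cos(k * (x - x0))
    v0 = -np.gradient(u0, dx)

    def accel(u):
        lap = np.zeros_like(u)
        lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2
        return lap - coef * u

    # Second-order Taylor start, then the standard leapfrog update.
    u_prev, u = u0, u0 + dt * v0 + 0.5 * dt ** 2 * accel(u0)
    for _ in range(int(round(T / dt)) - 1):
        u_next = 2 * u - u_prev + dt ** 2 * accel(u)
        u_next[0] = u_next[-1] = 0.0              # Dirichlet ends (far from the packet)
        u_prev, u = u, u_next

    ut = (u - u_prev) / dt
    energy = 0.5 * (ut ** 2 + np.gradient(u, dx) ** 2 + coef * u ** 2)
    return x, u, energy

def transmitted_fraction(k, **kwargs):
    """Share of the energy found beyond the barrier (x > 5) at the final time."""
    x, _, energy = simulate(k, **kwargs)
    return energy[x > 5].sum() / energy.sum()

for k in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(f"k = {k}: transmitted fraction ~ {transmitted_fraction(k):.3f}")
```

With these hypothetical parameters the low-frequency packet should be almost entirely reflected and the high-frequency one almost entirely transmitted; the intermediate values are where tunnelling and partial reflection show up.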

The video shows 8 runs of the simulation with different initial data. Whereas in the classical picture the initial kinetic energy is captured by the initial speed of the particle, in the quantum/wave picture the kinetic energy is related to the central frequency of the wave packet. So the 8 runs have increasing frequency offsets, representing increasing kinetic energy. The simulation has two plots: the top one shows the square of the solution itself, which gives a good indication of where the wave packet is physically located; the bottom one shows a normalized kinetic energy density (a normalization is needed since the kinetic energies of the first and last particles differ roughly tenfold).

One notices that in the first two runs, the kinetic energy is sufficiently small that the particle mostly bounces back to the left after hitting the potential.

For the third and fourth runs (frequency shift $\sqrt{2}$ and $\sqrt{3}$ respectively) we see that while a significant portion of the particle bounces back, a noticeable portion “tunnels through” the barrier: this is caused by a combination of the quantum tunneling phenomenon and the wave packet form of the initial data.

The phenomenon of quantum tunneling manifests in that all waves with non-zero energy penetrate a finite potential barrier a little bit. But the amount of penetration decays to zero as the energy of the wave goes to zero: this is known as the semiclassical regime. In the semiclassical limit it is known that quantum mechanics converges toward classical mechanics, and so in the low-energy limit we expect our particle to behave like a classical particle and bounce off. So, naturally, as we increase the energy (frequency) of our wave packet, we expect more tunneling to happen.

Further, observe that by shaping our data into a wave packet we necessarily include some high frequency components (due to the Heisenberg uncertainty principle); high frequency, and hence high energy, components do not “see” the potential barrier. Even in the classical picture, high energy particles would fly over the potential barrier. So for wave packets there will always be some leakage of energy through the potential barrier (perhaps not noticeable at the resolution of our computation). The quantum effect on these high energy waves is that they back-scatter: whereas a classical high energy particle just flies directly over the barrier, a high energy quantum particle will always leave some part of itself behind. We see this in the sixth and seventh runs of the simulation, where the particle mostly passes through the barrier, but a noticeable amount bounces off in the opposite direction.
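The uncertainty-principle point can be quantified with a short Fourier computation. This Python/NumPy sketch rests on my own assumptions: the same hypothetical barrier height $V_0 = 4$ and packet width $\sigma = 3$, and the massless energy weighting $|\xi|^2 |\hat u(\xi)|^2$ of right-moving data. It measures what fraction of a packet's energy sits at wavenumbers with $\xi^2 > V_0$, i.e. the part that would classically clear the barrier.

```python
import numpy as np

def above_barrier_share(k, V0=4.0, sigma=3.0, L=200.0, N=2 ** 14):
    """Fraction of the packet's energy carried by wavenumbers with xi**2 > V0.

    Uses the massless weighting |xi|^2 |u_hat(xi)|^2, which is the energy
    spectrum of right-moving data u_t = -u_x for the free wave equation.
    """
    x = np.linspace(-L, L, N, endpoint=False)
    u = np.exp(-(x / sigma) ** 2) * np.cos(k * x)
    xi = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])
    weight = xi ** 2 * np.abs(np.fft.fft(u)) ** 2
    return weight[xi ** 2 > V0].sum() / weight.sum()

for k in (0.5, 1.0, np.sqrt(2), np.sqrt(3), 2.0, 3.0):
    print(f"k = {k:.3f}: share above the barrier ~ {above_barrier_share(k):.2e}")
```

Even the lowest-frequency packet has a small but non-zero share above the threshold, and the share grows steadily with the central frequency.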

In between, during the fifth run, where the frequency shift is 2, we see that the barrier basically splits the particle in two and sends one half flying to the right and the other half flying to the left. Classically this is the turning point between particles that go over the bump and particles that bounce back; it would be the case (hard to show numerically!) where a classical particle comes in from afar with just enough energy that it comes to a halt at the top of the potential barrier!

Increasing the energy further past the seventh run, in the final run we see a situation where only a negligible amount of the particle scatters backward, with almost all of it passing through the barrier unchanged. One interesting thing to note, however, is that just like the classical particle, the wave packet appears to “slow down” a tiny bit as it goes over the potential barrier.

Shooting a classical particle…

Here’s a small animation of what happens when you shoot a classical particle at a potential barrier. For small initial kinetic energies, the particle bounces back. For large initial kinetic energies, the particle goes over the hump, first decelerating and then accelerating in the process.


(The NumPy code is pretty simple to write for this, and it runs relatively fast. The code for my next post is a bit more complicated and takes much longer to run. Stay tuned!)
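The original code is not shown; the following is a minimal sketch of the classical computation in Python/NumPy, under my own assumptions: a unit-mass particle, a hypothetical Gaussian barrier $V(x) = V_0 e^{-(x/w)^2}$ with $V_0 = w = 1$, and velocity-Verlet integration of $\ddot x = -V'(x)$. The particle clears the hump exactly when $\frac12 v_0^2 > V_0$, i.e. $v_0 > \sqrt 2$ here.

```python
import numpy as np

def force(x, V0=1.0, w=1.0):
    """-V'(x) for the hypothetical Gaussian barrier V(x) = V0 * exp(-(x/w)**2)."""
    return 2 * x / w ** 2 * V0 * np.exp(-(x / w) ** 2)

def shoot(v0, x0=-10.0, dt=1e-3, T=30.0):
    """Velocity-Verlet integration of x'' = -V'(x) for a unit-mass particle."""
    n = int(round(T / dt))
    xs, vs = np.empty(n + 1), np.empty(n + 1)
    x, v = x0, v0
    a = force(x)
    xs[0], vs[0] = x, v
    for i in range(1, n + 1):
        x = x + v * dt + 0.5 * a * dt ** 2
        a_new = force(x)
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        xs[i], vs[i] = x, v
    return xs, vs

for v0 in (1.0, 2.0):
    xs, vs = shoot(v0)
    outcome = "passes over" if xs[-1] > 0 else "bounces back"
    print(f"v0 = {v0}: {outcome}, min speed {np.abs(vs).min():.3f}, final velocity {vs[-1]:.3f}")
```

For $v_0 = 2$ the speed dips to about $\sqrt 2$ at the top of the hump and recovers afterwards, which is the decelerate-then-accelerate behaviour described above.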

An optimization problem: theme

Let’s start simple:

Question 1: What is the linear function $\ell(x)$ that minimizes the integral $\int_{-1}^1 |x^2 + x - \ell(x)| ~\mathrm{d}x$? In other words, what is the best linear approximation of $x^2 + x$ in the $L^1([-1,1])$ sense?

This is something that can in principle be solved by a high schooler with some calculus training. Here’s how one solution may go:

Solution: All linear functions take the form $\ell(x) = ax + b$. The integrand is equal to $x^2 + x - \ell(x)$ when $x^2 + x \geq \ell(x)$, and to $\ell(x) - x - x^2$ otherwise. So we need to find the points of intersection. This requires solving $x^2 + x - ax - b = 0$, which we can do with the quadratic formula. In the case where $x^2 + x - a x - b$ has a fixed sign on $[-1,1]$, we see that by changing $b$ we can make the integral strictly smaller, and hence such $(a,b)$ cannot be a minimizer. So we know that at the minimizer there must be at least one root in the interval.

Consider first the case where there is only one root in the interval (counted with multiplicity); call this root $x_0$. Then the integral to minimize is equal to
$\displaystyle \left| \int_{-1}^{x_0} x^2 + x - ax - b~\mathrm{d}x \right| + \left| \int_{x_0}^1 x^2 + x - a x - b ~\mathrm{d}x \right|$
each part of which can be computed explicitly to obtain
$\displaystyle \left| \frac13 x_0^3 + \frac13 + \frac{1-a}{2} x_0^2 - \frac{1-a}{2} - b x_0 - b \right| + \left| \frac13 - \frac13 x_0^3 + \frac{1-a}{2} - \frac{1-a}{2} x_0^2 - b + b x_0\right|$
Since we know that the two terms come from integrands with different signs, we can combine them to get
$\displaystyle \left| \frac23 x_0^3 + (1-a) x_0^2 - (1-a) - 2b x_0 \right|$
as the quantity to minimize. Now, we cannot just take the partial derivatives of the above expression with respect to $a,b$, set them to zero and see what we get: the root $x_0$ also depends on the parameters. So what we would do instead is plug in $x_0$ using the expression derived from the quadratic formula, $x_0 = \frac{1}{2} \left( a - 1 \pm \sqrt{ (1-a)^2 + 4b}\right)$, and then take the partial derivatives. Before that, though, we can simplify a little bit: since $x_0^3 + (1-a) x_0^2 - b x_0 = x_0\left(x_0^2 + (1-a)x_0 - b\right) = 0$ from the constraint, subtracting it from the expression inside the absolute value shows that the quantity to minimize is now
$\displaystyle \left| - \frac13 x_0^3 - (1-a) - b x_0 \right|$
Setting the $\partial_b$ derivative to zero, a long computation now shows that necessarily $x_0 = 0$, which implies that $b = 0$. But for the range of $a$ where there is only one root in the interval, the quantity does not achieve a minimum. (The formal minimizer happens at $a = 1$, but in this case the integrand of the original cost function is signed.)
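The algebra in the one-root case is easy to get wrong, so here is a small numerical cross-check in Python/NumPy (my own check, not part of the original solution): for sample parameters with exactly one root in $[-1,1]$, the reduced expression $\left|-\frac13 x_0^3 - (1-a) - b x_0\right|$ should match a direct quadrature of the cost. The sample values of $(a,b)$ are arbitrary choices.

```python
import numpy as np

def l1_cost(a, b, n=200_000):
    """Midpoint-rule approximation of the integral of |x^2 + x - (a x + b)| over [-1, 1]."""
    x = -1 + (np.arange(n) + 0.5) * (2 / n)
    return np.abs(x ** 2 + (1 - a) * x - b).sum() * (2 / n)

def one_root_formula(a, b):
    """|-(1/3) x0^3 - (1 - a) - b x0| with x0 the unique root in [-1, 1]."""
    disc = np.sqrt((1 - a) ** 2 + 4 * b)
    roots = [(a - 1 + s * disc) / 2 for s in (1, -1)]
    (x0,) = [r for r in roots if -1 <= r <= 1]   # fails loudly unless exactly one root is inside
    return abs(-x0 ** 3 / 3 - (1 - a) - b * x0)

for a, b in [(2.0, 0.1), (0.0, 1.5)]:            # each has exactly one root in [-1, 1]
    print(a, b, l1_cost(a, b), one_root_formula(a, b))
```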

So we are down to the case where there are two roots in the interval. Now we call the roots $x_- < x_+$, and split the integral into
$\displaystyle \left| \int_{-1}^{x_-} x^2 + x - ax - b~\mathrm{d}x - \int_{x_-}^{x_+} x^2 + x - ax - b~\mathrm{d}x + \int_{x_+}^1 x^2 + x - ax - b~\mathrm{d}x \right|$
and proceed as before. (The remainder of the proof is omitted; the reader is encouraged to carry out this computation by hand to see how tedious it is.)
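For readers who want a target to check the hand computation against, here is a brute-force sketch in Python/NumPy (not the method of the post): approximate the cost by a midpoint rule and search a grid of $(a,b)$ values. The search box $[0,2]\times[-0.5,1]$ and the grid step 0.01 are my own choices. The search should land at $a \approx 1$, $b \approx \frac14$ with cost about $\frac12$, i.e. $\ell(x) = x + \frac14$; this agrees with the usual optimality condition for $L^1$ approximation, since $\operatorname{sign}(x^2 - \frac14)$ integrates to zero against both $1$ and $x$ on $[-1,1]$.

```python
import numpy as np

# Midpoint-rule nodes on [-1, 1].
N = 4000
x = -1 + (np.arange(N) + 0.5) * (2 / N)
f = x ** 2 + x

a_grid = np.linspace(0.0, 2.0, 201)     # step 0.01
b_grid = np.linspace(-0.5, 1.0, 151)    # step 0.01

best = (np.inf, None, None)
for a in a_grid:
    # Vectorize over b: each row is |f - (a x + b)| for one value of b.
    costs = np.abs(f[None, :] - a * x[None, :] - b_grid[:, None]).sum(axis=1) * (2 / N)
    j = int(np.argmin(costs))
    if costs[j] < best[0]:
        best = (costs[j], a, b_grid[j])

cost, a_best, b_best = best
print(f"a ~ {a_best:.2f}, b ~ {b_best:.2f}, L1 error ~ {cost:.4f}")
```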

Find the errors!

I was tasked with grading the following exam question:

Using methods discussed in class this term, find the mean value over $[-\pi,\pi]$ of the function $f(x) = \sin(2x) \cdot \exp [1 - \cos (2x)]$.

The conceptual parts of the question are (based on the syllabus of the course)

1. Connecting “mean value of a continuous function over an interval” with “integration”, an application of calculus to probability theory and statistics.
2. Evaluating an integral by substitutions/change of variables.
3. Familiarity with the trigonometric functions $\sin, \cos$ and their properties (periodicity, derivative relations, etc).

I was told to grade with an emphasis on the above, so I prepared a grading rubric in which these three key ideas carried most of the points. Here’s an otherwise reasonable answer that unfortunately does not use the methods discussed in class and so would receive (close to) zero credit (luckily, no student turned in an answer like this):

The function $f(x)$ satisfies $f(x) = - f(-x)$, i.e. it is odd. So the average $(f(x) + f(-x))/2 = 0$. Since for every $x\in [-\pi,\pi]$, we also have $-x \in [-\pi,\pi]$, the mean value of $f(x)$ over that interval must be zero.

Here are some responses that could get a fairly good number of points* (at least more than the above answer) under the grading rubric (I guess this means I wasn’t imaginative enough in coming up with possible student errors). (I took the liberty of combining some of the most awful bits from different answers; the vast majority of the students’ answers are not nearly that horrible**, though only one student remembered that when changing variables one also needs to change the limits of integration.) Since most students who made any reasonable attempt at the question successfully wrote down the integral

$\displaystyle \mu = \frac{1}{\pi - (-\pi)} \int_{-\pi}^\pi f(y)~\mathrm{d}y$

(which is not to say no unreasonable attempts were made: just ask the poor bloke who decided that the Mean Value Theorem must play a role in this question), I will start from there. All mistakes below are intentional on my part. What amazed me most is how many students were able to get to the correct mean value…
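For reference, a quick check of the intended answer, sketched in Python/NumPy. The substitution $u = 1 - \cos(2x)$, $\mathrm{d}u = 2\sin(2x)\,\mathrm{d}x$ is my choice (the course may have used a different one); it gives the antiderivative $F(x) = \frac12 e^{1-\cos 2x}$, and both new limits are $u = 0$. The last line shows the value one gets by forgetting to change the limits, i.e. integrating $\frac12 e^u$ from $-\pi$ to $\pi$.

```python
import numpy as np

def f(x):
    return np.sin(2 * x) * np.exp(1 - np.cos(2 * x))

# Substitution u = 1 - cos(2x), du = 2 sin(2x) dx, gives the antiderivative
# F(x) = exp(1 - cos(2x)) / 2.
def F(x):
    return 0.5 * np.exp(1 - np.cos(2 * x))

# 1) Antiderivative check at a sample point (central difference).
x_s, h = 0.7, 1e-5
assert abs((F(x_s + h) - F(x_s - h)) / (2 * h) - f(x_s)) < 1e-6

# 2) Mean value via the antiderivative: the new limits are u(-pi) = u(pi) = 0.
mean_exact = (F(np.pi) - F(-np.pi)) / (2 * np.pi)

# 3) Independent numerical check with the midpoint rule.
n = 100_000
xs = -np.pi + (np.arange(n) + 0.5) * (2 * np.pi / n)
mean_numeric = f(xs).mean()

# 4) The "forgot to change the limits" slip: integrating (1/2) e^u from -pi to pi.
wrong = 0.5 * (np.exp(np.pi) - np.exp(-np.pi)) / (2 * np.pi)

print(mean_exact, mean_numeric, wrong)
```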

Decay of waves I: Introduction

In the next week or so, I will compose a series of posts on the heuristics for the decay of solutions of the wave equation on curved (and flat) backgrounds. (I have my fingers crossed that this does not end up aborted like my series of posts on compactness.) In this first post I will give some physical intuition for why waves decay. In the next post I will write about the case of linear and nonlinear waves on flat space-time, which will be used to motivate the construction, in post number three, of an example space-time which gives an upper bound on the best decay that can be generally expected for linear waves on non-flat backgrounds. This last argument, due to Mihalis Dafermos, shows why the heuristic known as Price’s Law is as good as one can reasonably hope for in the linear case. (In the nonlinear case, things immediately get much, much worse, as we will see already in the next post.)

This first post will not be too heavily mathematical; indeed, the only real foray into mathematics will be in the appendix. The next ones, however, require some basic familiarity with partial differential equations and pseudo-Riemannian geometry.

Snell’s law and geometry

Today’s post is somewhat inspired by this question on math.stackexchange. To begin with, recall Snell’s Law from geometric optics. It gives a rule describing the propagation of light between two media with different indices of refraction: across an interface where the index of refraction is $n_1$ on one side and $n_2$ on the other, the angles of incidence and refraction $\theta_1$ and $\theta_2$, measured from the normal to the interface, satisfy the relation

$\displaystyle \frac{\sin \theta_1}{\sin \theta_2} = \frac{n_2}{n_1}$.

Here I pose two problems:

1. Given an observer situated above the interface, and a fish below the interface, find the “apparent position” of the fish according to the observer.
2. Given an observer situated above the interface, and the “apparent position” of a fish below the interface, find the actual position of the fish.

Without loss of generality we can assume that the interface is the $x$-axis, and the observer is situated at the point $(0,1)$ in the $x$-$y$ plane.
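A possible first step for both problems, sketched in Python under my own assumptions: the observer at $(0,1)$ sits in the medium with index $n_1$ (air, taken as 1.0) and the fish at $(p,-q)$, $q > 0$, in the medium with index $n_2$ (water, taken as 1.33; both values are illustrative). The function finds the point $(s,0)$ where the light ray from the fish to the observer crosses the interface, by bisection on Snell's law written as $n_1 \sin\theta_1 = n_2 \sin\theta_2$. The observer then sees the fish in the direction from $(0,1)$ toward $(s,0)$.

```python
import math

def interface_point(p, q, n1=1.0, n2=1.33, iterations=200):
    """Crossing point (s, 0) of the light ray joining the fish at (p, -q) to the
    observer at (0, 1), found by bisection on n1 sin(theta1) = n2 sin(theta2)."""
    if p < 0:                                   # mirror symmetry in the y-axis
        return -interface_point(-p, q, n1, n2, iterations)

    def mismatch(s):
        sin1 = s / math.hypot(s, 1.0)           # angle from the normal, observer side
        sin2 = (p - s) / math.hypot(p - s, q)   # angle from the normal, fish side
        return n1 * sin1 - n2 * sin2            # increasing in s on [0, p]

    lo, hi = 0.0, p
    for _ in range(iterations):                 # fixed count avoids float stalls
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s = interface_point(2.0, 1.0)
print(f"ray crosses the interface at s ~ {s:.6f}")
```

With equal indices the ray is a straight line and the crossing is at $s = p/(1+q)$; with $n_2 > n_1$ the ray bends toward the normal in the water, so the crossing moves toward the fish.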

Repeating decimals

Here’s something a bit random for Friday. (Presumably this actually is quite well-known; but having never taken a course in number theory…)

Question: Given a fraction m/n in lowest terms, expand it in “decimal notation” in a number base b. What are the conditions on m, n, and b that guarantee that the expansion eventually consists of a single repeating “digit”? (Note that we can clearly assume 0 < m < n.)

To make the question clearer, let’s look at some examples in base b = 10. Obviously, if n = 1, then the fraction is in fact an integer, and its expansion $m.\dot{0}$ has the repeated digit 0. If n = 2, the decimal expansion is either $0.\dot0$ or $0.5\dot{0}$. Similarly n = 4, 5, 8, 10 all give terminating decimals, which eventually repeat the digit 0. On the other hand, n = 3, 6, 9 lead to repeating digits other than 0, whereas n = 7 leads to the repetition of a string of 6 digits …142857142857…

Here’s the solution. Suppose m/n has an expansion of the prescribed form. Recall that a “decimal expansion”

$0.a_1a_2a_3a_4\ldots$ in base $b$

is in fact shorthand for

$\displaystyle 0 + \frac{a_1}{b} + \frac{a_2}{b^2} + \frac{a_3}{b^3} + \cdots$.

So the criterion specified in the question is equivalent to the condition that

There exists some integer K, a non-negative integer $c < b^K$, and a digit d such that
$\displaystyle \frac{m}{n} = \frac{c}{b^K} + \frac{d}{b^K} \sum_{j = 1}^{\infty} \frac{1}{b^j}$
which corresponds to the decimal expansion
$\displaystyle 0.\underbrace{XXXXXXXXXXXX}_{\mbox{the digits given by }c} ddddddddddd\ldots$

The infinite sum on the far right can be summed: $\sum_{j=1}^\infty b^{-j} = (b-1)^{-1}$. Multiplying the expression through by $b^K$ we have

$\displaystyle \frac{m b^K}{n} = c + \frac{d}{b-1}$

Now, by redefining $c \to c\cdot b^k + d \sum_{j=0}^{k-1} b^j$, we can replace $K\to K+k$, so we can take $K$ arbitrarily large. By doing so, after reducing the left hand side to lowest terms, we can “remove” from n any prime factors that also divide $b$. To be more precise: suppose $p$ is a prime such that $p^\ell | n$ and $p^{\ell+1}$ does not divide $n$ (so that p goes into n exactly $\ell$ times). Now suppose $p|b$ also; then the fraction $b^\ell / n$, when written in lowest terms, has a denominator that is not divisible by $p$. Repeating this for all common prime factors of n and b, we can get rid of all of them from the denominator. Let us denote by $n_0$ the number $n$ with all the prime divisors of $b$ removed.

Our equation then implies that there exists some integer $m'$ that is coprime with $n_0$ such that

$\displaystyle \frac{m'}{n_0} = \frac{d}{b-1}$

which means that $n_0$ must divide $(b-1)$: indeed $m'(b-1) = d\, n_0$, and $m'$ is coprime with $n_0$. That is

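Answer: Let $n_0$ be the number $n$ with all prime divisors of $b$ removed. Then $n_0$ must divide $(b-1)$. (Conversely, if $n_0$ divides $(b-1)$, the steps above can be reversed to produce such an expansion, so the condition is also sufficient.)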

For base b = 10, (b-1) = 9. This means that any n for which the decimal expansion is eventually repeating with period 1 must have the form

$n = 2^s\cdot 5^t\cdot 3^r$ where $s,t$ are non-negative integers, and $r$ is one of 0, 1, or 2.
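The answer is easy to test by brute force. This Python sketch (my own check) runs long division of $m/n$ in base $b$ and compares the result with the criterion "$n_0$ divides $b-1$" for all small bases and denominators. It relies on the fact that the digits end in one repeated digit exactly when the sequence of remainders ends in a cycle of length 1, since the remainder determines the entire tail of the expansion.

```python
from math import gcd

def has_single_repeating_digit(m, n, b):
    """Long division of m/n in base b (0 < m < n): True if the digits end in one
    repeated digit, i.e. the remainder sequence ends in a cycle of length 1."""
    seen = {}
    r, i = m % n, 0
    while r not in seen:
        seen[r] = i
        r = (r * b) % n
        i += 1
    return i - seen[r] == 1

def strip_common_primes(n, b):
    """n_0: n with every prime factor it shares with b divided out."""
    g = gcd(n, b)
    while g > 1:
        while n % g == 0:
            n //= g
        g = gcd(n, b)
    return n

def criterion(n, b):
    return (b - 1) % strip_common_primes(n, b) == 0

# Brute-force comparison for small bases and denominators.
for b in range(2, 13):
    for n in range(2, 60):
        for m in range(1, n):
            if gcd(m, n) == 1:
                assert has_single_repeating_digit(m, n, b) == criterion(n, b), (m, n, b)

print([n for n in range(2, 50) if criterion(n, 10)])
```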

What is a function anyway?

I tried to teach differential geometry to a biophysicist friend yesterday (at his request; he doesn’t really need to know it, but he wanted to know how certain formulae commonly used in their literature came about). Rather surprisingly I hit an early snag. Hence the title of this entry.

Part of the problem was, as usual, my own folly. Since he is more interested in vector fields and tensor fields, I thought I could take a shortcut and introduce the notions with more of a sheafy flavour. (At the end of the day, tangent and cotangent spaces are defined (rather circularly) as duals of each other, each with a partial, hand-wavy description.) I certainly didn’t expect to have to spend a large amount of time explaining the concept of a function.

Arrow’s Impossibility Theorem

Partially prompted by Terry’s buzz, I decided to take a look at Arrow’s Impossibility Theorem. The name I have heard before, since I participated in CollegeBowl as an undergraduate, and questions about Arrow’s theorem are perennial favourites. The theorem’s most famous interpretation is in voting theory:

Some definitions

1. Given a set of electors $E$ and a finite set of candidates $C$, a preference $\pi$ assigns to each elector $e \in E$ an ordering of the set $C$. In particular, we can write $\pi_e(c_1) > \pi_e(c_2)$ for the statement “the elector $e$ prefers candidate $c_1$ to candidate $c_2$“. The set of all possible preferences is denoted $\Pi$.
2. A voting system $v$ assigns to each preference $\pi\in\Pi$ an ordering of the set $C$.
3. Given a preference $\pi$ and two candidates $c_1,c_2$, a bloc biased toward $c_1$ is defined as the subset $b(\pi,c_1,c_2) := \{ e\in E | \pi_e(c_1) > \pi_e(c_2) \}$.
4. The voting system is said to be
1. unanimous if, whenever all electors prefer candidate $c_1$ to $c_2$, the voting system also ranks $c_1$ above $c_2$. In other words, “$\pi_e(c_1) > \pi_e(c_2)~\forall e\in E \implies v(\pi,c_1) > v(\pi,c_2)$“.
2. independent if the voting results comparing candidates $c_1$ and $c_2$ only depend on the individual preferences between them. In particular, whether $v(\pi,c_1) > v(\pi,c_2)$ only depends on $b(\pi,c_1,c_2)$. An independent system is said to be monotonic if, in addition, a strictly larger biased bloc will give the same voting result: if $v(\pi,c_1) > v(\pi,c_2)$ and $b(\pi,c_1,c_2) \subset b(\pi',c_1,c_2)$, then $v(\pi',c_1) > v(\pi',c_2)$ necessarily.
3. dictator-free if there isn’t one elector $e_0\in E$ whose vote always coincides with the end-result. In other words, we define a dictator to be an elector $e_0$ such that $v(\pi,c_1) > v(\pi,c_2) \iff \pi_{e_0}(c_1) > \pi_{e_0}(c_2)$ for any $\pi\in \Pi, c_1,c_2\in C$.
5. A voting system is said to be fair if it is unanimous, independent and monotonic, and has no dictators.
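The definitions can be made concrete with a small brute-force sketch in Python, for three electors and three candidates (sizes chosen by me for illustration). A preference is encoded as one ranking per elector, best first, and a voting system is any Python function from such a profile to a ranking. The checker tests unanimity, independence and monotonicity over all $6^3 = 216$ profiles; the dictatorship rule (copy elector 0's ranking) passes all three, while a hypothetical constant rule is independent but not unanimous.

```python
from itertools import permutations, product

CANDIDATES = ("c1", "c2", "c3")
ELECTORS = (0, 1, 2)

def prefers(order, x, y):
    """True if the ranking `order` (best first) puts x above y."""
    return order.index(x) < order.index(y)

def all_profiles():
    # A preference pi: one ranking of the candidates per elector.
    return product(permutations(CANDIDATES), repeat=len(ELECTORS))

def bloc(profile, x, y):
    """b(pi, x, y): the electors who prefer x to y."""
    return frozenset(e for e in ELECTORS if prefers(profile[e], x, y))

def check(rule):
    """Brute-force test of unanimity, independence and monotonicity of `rule`."""
    unanimous, independent, table = True, True, {}
    for profile in all_profiles():
        result = rule(profile)
        for x, y in permutations(CANDIDATES, 2):
            wins = prefers(result, x, y)
            B = bloc(profile, x, y)
            if B == frozenset(ELECTORS) and not wins:
                unanimous = False
            # Independence: the (x, y) outcome must depend only on the bloc.
            if table.setdefault((x, y, B), wins) != wins:
                independent = False
    # Monotonicity: a winning bloc stays winning when it is enlarged.
    monotonic = independent and all(
        table[(x, y, B2)]
        for (x, y, B1), wins in table.items() if wins
        for (x2, y2, B2) in table
        if (x2, y2) == (x, y) and B1 <= B2
    )
    return unanimous, independent, monotonic

def dictatorship(profile, e0=0):
    """The voting result simply copies elector e0's ranking."""
    return profile[e0]

print(check(dictatorship))                 # unanimous, independent, monotonic
print(check(lambda profile: CANDIDATES))   # a fixed ranking ignores the electors
```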

And the theorem states

Arrow’s Impossibility Theorem
In an election consisting of a finite set of electors $E$ with at least three candidates $C$, there can be no fair voting system.

As we shall see, finiteness of the set of electors and the lower-bound on the number of candidates are crucial. In the case where there are only two candidates, the simple-majority test is a fair voting system. (Finiteness is more subtle.) It is also easy to see that if we allow dictators, i.e. force the voting results to align with the preference of a particular predetermined individual, then unanimity, independence, and monotonicity are all trivially satisfied.

What’s wrong with the simple majority test when there are three or more candidates? The problem is that it does not, in general, produce a voting system in the sense defined above: it can create loops! Imagine we have three electors $e_1, e_2, e_3$ and three candidates $c_1,c_2,c_3$. The simple majority test says that $v(\pi,c_1) > v(\pi,c_2)$ if and only if two or more of the electors prefer $c_1$ to $c_2$. But this causes a problem in the following scenario:

$e_1: c_1 > c_2 > c_3$
$e_2: c_2 > c_3 > c_1$
$e_3: c_3 > c_1 > c_2$

then the voting result will have $v(c_1) > v(c_2)$, $v(c_2) > v(c_3)$, and $v(c_3) > v(c_1)$, a circular situation which implies that the “result” is not an ordering of the candidates! (An ordering of the set requires the comparisons to be transitive.) So the simple-majority test is, in fact, not a valid voting system.
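The loop can be reproduced by brute force. This Python sketch (my own illustration) computes the pairwise-majority relation for the profile above, checks it for transitivity, and then counts how many of all $6^3 = 216$ profiles of three electors over three candidates produce such a cycle.

```python
from itertools import permutations, product

CANDIDATES = ("c1", "c2", "c3")

def majority_relation(profile):
    """Pairs (x, y) such that a strict majority of electors rank x above y."""
    beats = set()
    for x, y in permutations(CANDIDATES, 2):
        votes = sum(order.index(x) < order.index(y) for order in profile)
        if 2 * votes > len(profile):
            beats.add((x, y))
    return beats

def is_transitive(beats):
    return all((x, z) in beats
               for (x, y) in beats for (y2, z) in beats if y == y2 and x != z)

profile = (("c1", "c2", "c3"),   # e1
           ("c2", "c3", "c1"),   # e2
           ("c3", "c1", "c2"))   # e3
print(sorted(majority_relation(profile)))        # [('c1', 'c2'), ('c2', 'c3'), ('c3', 'c1')]
print(is_transitive(majority_relation(profile)))  # False

cyclic = [p for p in product(permutations(CANDIDATES), repeat=3)
          if not is_transitive(majority_relation(p))]
print(len(cyclic), "of", 6 ** 3)                  # 12 of 216
```

The 12 cyclic profiles are exactly the ones where the three electors hold the three rotations of a single cyclic order, as in the example.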

From this first example, we see already that, in the situation of more than 2 candidates, designing a voting system is a non-trivial problem. Making it fair, as we shall see, will be impossible.