… Data aequatione quotcunque fluentes quantitates involvente fluxiones invenire et vice versa …

## Category: Require high school maths

### An optimization problem: theme

Let’s start simple:

Question 1: What is the linear function $\ell(x)$ that minimizes the integral $\int_{-1}^1 |x^2 + x - \ell(x)| ~\mathrm{d}x$? In other words, what is the best linear approximation of $x^2 + x$ in the $L^1([-1,1])$ sense?

This is something that can in principle be solved by a high schooler with some calculus training. Here’s how one solution may go:

Solution: All linear functions take the form $\ell(x) = ax + b$. The integrand is equal to $x^2 + x - \ell(x)$ when $x^2 + x \geq \ell(x)$, and $\ell(x) - x - x^2$ otherwise. So we need to find the points of intersection. This requires solve $x^2 + x - ax - b = 0$, which we can solve by the quadratic formula. In the case where $x^2 + x - a x - b$ is signed, we see that changing $b$ we can make the integrand strictly smaller, and hence we cannot attain a minimizer. So we know that at the minimizer there must exist at least one root.

Consider the case where there is only one root in the interval (counted with multiplicity), call the root $x_0$. We have that the integral to minimize is equal to
$\displaystyle \left| \int_{-1}^{x_0} x^2 + x - ax - b~\mathrm{d}x \right| + \left| \int_{x_0}^1 x^2 + x - a x - b ~\mathrm{d}x \right|$
each part of which can be computed explicitly to obtain
$\displaystyle \left| \frac13 x_0^3 + \frac13 + \frac{1-a}{2} x_0^2 - \frac{1-a}{2} - b x_0 - b \right| + \left| \frac13 - \frac13 x_0^3 + \frac{1-a}{2} - \frac{1-a}{2} x_0^2 - b + b x_0\right|$
Since we know that the two terms comes from integrands with different signs, we can combine to get
$\displaystyle \left| \frac23 x_0^3 + (1-a) x_0^2 - (1-a) - 2b x_0 \right|$
as the integrand. Now, we cannot just take the partial derivatives of the above expression with respect to $a,b$ and set that to zero and see what we get: the root $x_0$ depends also on the parameters. So what we would do then is to plug in $x_0$ using the expression derived from the quadratic formula, $x_0 = \frac{1}{2} \left( a - 1 \pm \sqrt{ (1-a)^2 + 4b}\right)$, and then take the partial derivatives. Before that, though, we can simplify a little bit: since $x_0^3 + (1-a) x_0^2 - b x_0 = 0$ from the constraint, the quantity to minimize is now
$\displaystyle \left| - \frac13 x_0^3 - (1-a) - b x_0 \right|$
A long computation taking the $\partial_b$ now shows that necessarily $x_0 = 0$, which implies that $b = 0$. But for the range of $a$ where there is only one root in the interval, the quantity does not achieve a minimum. (The formal minimizer happens at $a = 1$ but we see for this case the integrand of the original cost function is signed.

So we are down to the case where there are two roots in the interval. Now we call the roots $x_+$ and $x_-$, and split the integral into
$\displaystyle \left| \int_{-1}^{x_-} x^2 + x - ax - b~\mathrm{d}x - \int_{x_-}^{x_+} x^2 + x - ax - b~\mathrm{d}x + \int_{x_+}^1 x^2 + x - ax - b~\mathrm{d}x \right|$
and proceed as before. (The remainder of the proof is omitted; the reader is encouraged to carry out this computation out by hand to see how tedious it is.) Read the rest of this entry »

### Decay of waves I: Introduction

In the next week or so, I will compose a series of posts on the heuristics for the decay of the solutions of the wave equation on curved (and flat) backgrounds. (I have my fingers crossed that this does not end up aborted like my series of posts on compactness.) In this first post I will give some physical intuition of why waves decay. In the next post I will write about the case of linear and nonlinear waves on flat space-time, which will be used to motivate the construction, in post number three, of an example space-time which gives an upper bound on the best decay that can be generally expected for linear waves on non-flat backgrounds. This last argument, due to Mihalis Dafermos, shows that why the heuristics known as Price’s Law is as good as one can reasonably hope for in the linear case. (In the nonlinear case, things immediately get much much worse as we will see already in the next post.)

This first post will not be too heavily mathematical, indeed, the only realy foray into mathematics will be in the appendix; the next ones, however, requires some basic familiarity with partial differential equations and pseudo-Riemannian geometry. Read the rest of this entry »

### What is a function anyway?

I tried to teach differential geometry to a biophysicist friend yesterday (at his request; he doesn’t really need to know it, but he wanted to know how certain formulae commonly used in their literature came about). Rather surprisingly I hit an early snag. Hence the title of this entry.

Part of the problem was, as usual, my own folly. Since he is more interested in vector fields and tensor fields, I thought I can take a short cut and introduce notions more with a sheafy flavour. (At the end of the day, tangent and cotangent spaces are defined (rather circularly) as dual of each other, and each with a partial, hand-wavy description.) I certainly didn’t expect having to spend a large amount of time explaining the concept of the function.
Read the rest of this entry »

### Arrow’s Impossibility Theorem

Partially prompted by Terry’s buzz, I decided to take a look at Arrow’s Impossibility Theorem. The name I have heard before, since I participated in CollegeBowl as an undergraduate, and questions about Arrow’s theorem are perennial favourites. The theorem’s most famous interpretation is in voting theory:

Some definitions

1. Given a set of electors $E$ and a finite set of candidates $C$, a preference $\pi$ assigns to each elector $e \in E$ an ordering of the set $C$. In particular, we can write $\pi_e(c_1) > \pi_e(c_2)$ for the statement “the elector $e$ prefers candidate $c_1$ to candidate $c_2$“. The set of all possible preferences is denoted $\Pi$.
2. A voting system $v$ assigns to each preference $\pi\in\Pi$ an ordering of the set $C$.
3. Given a preference $\pi$ and two candidates $c_1,c_2$, a bloc biased toward $c_1$ is defined as the subset $b(\pi,c_1,c_2) := \{ e\in E | \pi_e(c_1) > \pi_e(c_2) \}$
4. The voting system is said to be
1. unanimous if, whenever all electors prefer candidate $c_1$ to $c_2$, the voting system will return as such. In other words, “$\pi_e(c_1) > \pi_e(c_2) \forall e\in E \implies v(\pi,c_1) > v(\pi,c_2)$“.
2. independent if the voting results comparing candidates $c_1$ and $c_2$ only depend on the individual preferences between them. In particular, whether $v(\pi,c_1) > v(\pi,c_2)$ only depends on $b(\pi,c_1,c_2)$. An independent system is said to be monotonic if, in addition, a strictly larger biased bloc will give the same voting result: if $v(\pi,c_1) > v(\pi,c_2)$ and $b(\pi,c_1,c_2) \subset b(\pi',c_1,c_2)$, then $v(\pi',c_1) > v(\pi',c_2)$ necessarily.
3. dictator-free if there isn’t one elector $e_0\in E$ whose vote always coincides with the end-result. In other words, we define a dictator to be an elector $e_0$ such that $v(\pi,c_1) > v(\pi,c_2) \iff \pi_{e_0}(c_1) > \pi_{e_0}(c_2)$ for any $\pi\in \Pi, c_1,c_2\in C$.
5. A voting system is said to be fair if it is unanimous, independent and monotonic, and has no dictators.

And the theorem states

Arrow’s Impossibility Theorem
In an election consisting of a finite set of electors $E$ with at least three candidates $C$, there can be no fair voting system.

As we shall see, finiteness of the set of electors and the lower-bound on the number of candidates are crucial. In the case where there are only two candidates, the simple-majority test is a fair voting system. (Finiteness is more subtle.) It is also easy to see that if we allow dictators, i.e. force the voting results to align with the preference of a particular predetermined individual, then unanimity, independence, and monotonicity are all trivially satisfied.

What’s wrong with the simple majority test in more than three candidates? The problem is that it is not, by definition, a proper voting system: it can create loops! Imagine we have three electors $e_1, e_2, e_3$ and three candidates $c_1,c_2,c_3$. The simple majority test says that $v(\pi,c_1) > v(\pi,c_2)$ if and only if two or more of the electors prefer $c_1$ to $c_2$. But this causes a problem in the following scenario:

$e_1: c_1 > c_2 > c_3$
$e_2: c_2 > c_3 > c_1$
$e_3: c_3 > c_1 > c_2$

then the voting result will have $v(c_1) > v(c_2)$, $v(c_2) > v(c_3)$, and $v(c_3) > v(c_1)$, a circular situation which implies that the “result” is not an ordering of the candidates! (An ordering of the set requires the comparisons to be transitive.) So the simple-majority test is, in fact, not a valid voting system.

From this first example, we see already that, in the situation of more than 2 candidates, designing a voting system is a non-trivial problem. Making them fair, as we shall see, will be impossible. Read the rest of this entry »

### Compactness. Part 1: Degrees of freedom

Preamble
This series of posts has its genesis in a couple questions posted on MathOverflow.net. It is taking me a bit longer to write this than expected: I am not entirely sure what level of an audience I should address the post to. Another complication is that it is hard to get ideas organized when I am only writing about this during downtime where I hit a snag in research. The main idea that I want to get to is “a failure of sequential compactness of a domain represents the presence of some sort of infinity.” In this first post I will motivate the definition of compactness in mathematics.

Introduction
We start with the pigeonhole principle: “If one tries to stuff pigeons into pigeonholes, and there are more pigeons than there are pigeonholes, then at least one hole is shared by multiple pigeons.” This can be easily generalized the the case where there are infinitely many pigeons, which is the version we will use

Pigeonhole principle. If one tries to stuff infinitely many pigeons into a finite number of pigeonholes, then at least one hole is occupied by infinitely many pigeons.

Now imagine being the pigeon sorter: I hand you a pigeon, you put it into a pigeonhole. However you sort the pigeons, you are making a (infinite) sequence of assignments of pigeons to pigeonholes. Mathematically, you represent a sequence of points (choices) in a finite set (the list of all holes). In this example, we will say that a point in the set of choices is a cluster point relative to a sequence of choices if that particular point is visited infinitely many times. A mathematical way of formulating the pigeonhole principle then is:

Let $S$ be a finite set. Let $(a_n)$ be a sequence of points in $S$. Then there exist some point $p \in S$ such that $p$ is a cluster point of $(a_n)$.

The key here is the word finite: if you have an infinitely long row of pigeonholes (say running to the right of you), and I keep handing you pigeons, as long as you keep stepping to the right after stuffing a pigeon into a hole, you can keep a “one pigeon per hole (or fewer)” policy. Read the rest of this entry »

### Trip to Oxford; a little lattice problem

Yesterday I gave the relativity seminar at Oxford (the last one to be organized by Piotr Chrusciel, who will be moving to Vienna soon). And I committed one of the cardinal sins of talks: I “put away” information too quickly.

I fully intend to blame it on the technology.

Usually I don’t have this problem with board talks. In the case I have a set number of fixed blackboards, I will go from board 1 to board 2 to board 3 and then back to board 1. Sometimes using board 4 to keep a running tab of important details. In the case I have the sliding blackboards (the kind that has two/three layers of boards and you can slide a completed board up and write on the one underneath), I usually do top layer of board 1 to bottom layer to top layer of board 2 then bottom layer. After filling up all boards I will erase what I don’t need and recycle those boards. Oxford has a rather different system then what I am used to. Firstly, they use whiteboards. While it is more comfortable to hold a marker than a piece of chalk, my handwriting is actually slower and more legible with chalk on blackboard. But most importantly, they have an interesting design of whiteboards. The writing surface is essentially a belt. Imagine three whiteboards looped together so that the bottom of board 1 is connected to top of board 2, bottom of board 2 to top of board 3, and bottom of board 3 to top of board 1. Now mount this belt on a frame that is 1 and a half boards tall. So you can scroll up and down. Now put three of these monsters side by side. Read the rest of this entry »

### Snowflakes

It took me two tries to get out of my flat this morning. I really ought to get into the habit of looking out the window in the morning; too often do I open the front door, ready myself to step out, only to turn back to fetch my umbrella. The annoying thing about snow is that I can’t hear it, unlike the pitter-patter of rain.

Somehow or another I ended up looking at Wilson Bentley’s micro-photographs of snow crystals. And a question forms in my mind, “Why are they all so symmetrical?” If all snowflakes were to look alike, then perhaps the dynamics leading to the formation of snow crystal is stable, and the global convergence unto a completely symmetrical pattern would not be surprising. But not all snowflakes look alike. In fact, colloquially we speak of them as each completely different from every other. This implies that the dynamics of snow crystal growth should be at least somewhat sensitive to atmospheric conditions in a local scale (and perhaps to the nucleus that seeds the crystal) so that the seemingly random to-and-fro dance as the snowflake falls from the sky can effect different shapes and branches.

Now, much experimental evidence has gone to show that the formation of ice crystals tends to by catalyzed by impurities. Pure water can be supercooled, in normal pressure conditions, to temperatures below 273 Kelvin. But in these situations a single mite of impurity dropped into the water can cause the entire beaker to freeze over suddenly. Similarly, ice crystals in the upper atmosphere tend to form around impurities: bacterium floating in the air, dust or ash, or perhaps particles introduced artificially. So one may surmise that the fact that all 6 branches of a snowflake grows in the same way because, somehow, the eventual shape of the snowflake is already encoded in the original formation of the central nucleus. Let me try to explain why this hypothesis is not very convincing. I’ll make one a priori assumption, that the growth of a crystal structure is purely local, and not due to some long-range interaction.

To draw an analogy, consider a large group of acrobats. They are trying to bring themselves into a formation around a leader. Disallowing long-range interaction can be thought of requiring that the leader cannot shout out orders to individual troupe members. But we can allow passing of information by short-range interactions, i.e. whispering instructions to the people already in formation. So the leader stands alone at the start. Then he grabs on a few people nearby to form the nucleus. Then he tells each of the people he grabbed a set of instructions on how to grab more people, where to put them, and what instructions to pass on to those people (included in these instructions are instructions to be passed on to the even more remote layer of acrobats and so on). Then if the instructions were passed correctly, a completely ordered pattern will form. But as anyone who has played the game of telephone can testify, in these scenarios some errors will always work its way into the instructions. In the physical case of snowflakes, these are thermodynamical fluctuations. So some irregularities should happen. Now, if the instructions the leaders were trying to pass down were very short and easy to remember, the errors tend not to build up, and the formation will, for the most part, be correct. But keeping the message short has the drawback that the total number of formations one can form is fewer. In the snowflake case, one can imagine somehow each small group of molecules in the snow crystal can encode some fixed amount of information. If the encoding is very redundant (so the total number of shapes is small), then the thermodynamical fluctuations will not be likely to break the symmetries between the arms. But considering the large number of possible shapes of snowflakes, such encoding of information should be taxed to the limit, and small fluctuation (errors in the game of telephone) should be able to lead one arm to look drastically different from the others. One possible way to get around this difficulty would be to use some sort of self similarity principle. But this will suggest the snowflakes are true fractals, which they are not. Read the rest of this entry »

### Straightedge and compass constructions

Classical Euclidean geometry is based on the familiar five postulates. The first two are equivalent to the assumption of the existence of a straightedge; the third gives the existence of a compass. The fourth essentially states that space is locally Euclidean, while the fifth, the infamous parallel postulate, assumes that space is globally Euclidean.

A quick digression: by now, most people are aware of the concept of non-Euclidean geometry as described by Bolyai and Lobachevsky, and independently by Gauss. These types of geometries make also the first four postulates. By simply abandoning the fifth postulate, we can arrive a geometries in which every pair of “lines” intersect (in other words, geometries in which “parallel lines” do not exist), or in which parallel lines are non-unique. (As a side note, one of the primary preoccupations of mathematicians is the existence-and-uniqueness of objects. With our obsession of classifying things, we ask often the question, “Does an object exist with properties X and Y?” and follow it up with, “Is there only one object that possesses properties X and Y?” One may have been subjected to this analysis as early as high school algebra when one is asked to classify whether a system of linear, algebraic equations have solutions, and whether the solution is a point or a line.) With even this relaxation, the space is still locally Euclidean (by the fourth postulate): a small portion of the surface will look “flat” when you blow it up (the same way how from our point of view, the Earth often looks flat). It is, however, possible to also relax the fourth postulate. The easiest example to imagine is by taking a piece of paper and rolling it up into a cone. At the vertex of the cone, going around one full turn is no longer going around 360 degrees. The “total angle” at the vertex will be smaller. So if we define a “right angle” as an angle that is exactly one quarter of the “total angle” at the point, the right angle at the vertex will be smaller! You can experiment with it yourself by drawing four rays from the vertex and then unfurling the piece of paper. Geometries with this kind of structures are often studied under the name of orbifolds.