Bubbles Bad; Ripples Good

… Data aequatione quotcunque fluentes quantitates involvente fluxiones invenire et vice versa …

Category: general mathematics

An optimization problem: variation

Examining the theorem proven in the previous post, we are led naturally to ask whether there are higher order generalizations.

Question: Let f \in C^{k}([-1,1]) with f^{(k)} > 0. What can we say about the minimizer of C = \int_{-1}^1 |f(x) - p(x)|~\mathrm{d}x where p ranges over degree k-1 polynomials?

It is pretty easy to see that we expect p to intersect f at the maximum number of points, which is k. We label those points x_1, \ldots, x_k and call x_0 = -1 and x_{k+1}= 1. Then the cost function can be written as
\displaystyle C = \sum_{j = 0}^k (-1)^j \int_{x_j}^{x_{j+1}} f(x) - p(x; x_1, \ldots, x_k) ~\mathrm{d}x
Since we know that values of p at the points x_1, \ldots, x_k we can write down the interpolation polynomial explicitly using Sylvester’s formula:
\displaystyle p = \sum_{j = 1}^k \left( \prod_{1 \leq m \leq k, m\neq j} \frac{x - x_m}{x_j - x_m} \right) f(x_j) = \sum L_j(x; x_1, \ldots, x_k) f(x_j)

The partial derivatives are now
\displaystyle \partial_n C = \sum_{j = 0}^k (-1)^{j+1} \int_{x_j}^{x_{j+1}} \partial_n p(x; x_1, \ldots, x_k) ~\mathrm{d}x
It remains to compute \partial_n p for 1 \leq n \leq k. We observe that when n \neq j
\displaystyle \partial_n L_j = - \frac{1}{x - x_n} L_j + \frac{1}{x_j - x_n} L_j
and also
\displaystyle \partial_n L_n = - \left( \sum_{1\leq m \leq k, m\neq n} \frac{1}{x_n - x_m} \right) L_n
So
\displaystyle \partial_n p =  \sum_{j \neq n} \frac{x-x_j}{(x_j - x_n)(x - x_n)} L_j f(x_j) + L_n f'(x_n) - \left( \sum_{1\leq m \leq k, m\neq n} \frac{1}{x_n - x_m} \right) L_n f(x_n)
Now, we observe that
\displaystyle \frac{x - x_j}{x - x_n} L_j =   - \left( \prod_{m \neq n,j} \frac{x_n - x_m}{x_j - x_m} \right)  L_n
so after some computation we arrive at
\displaystyle \partial_n p = L_n(x) \cdot \left[ f'(x_n) - \sum_{j \neq n} \frac{1}{x_j - x_n} \left(\left( \prod_{m \neq j,n}\frac{x_n - x_m}{x_j - x_m}\right)f(x_j) - f(x_n) \right)\right]
which we can further simplify to
\displaystyle \partial_n p = L_n(x) \cdot \left( f'(x_n) - p'(x_n)\right)
Now, since f and p cross transversely at x_n, the difference of their derivatives is non-zero. (This harks back to our assumption that f^{(k)} > 0.) So we are down, as in the case where k = 2, to equations entirely independent of f.

More precisely, we see that the stationarity condition becomes the choice of x_1, \ldots, x_k such that the integrals
\displaystyle \sum_{j = 0}^k (-1)^{j} \int_{x_j}^{x_{j+1}} L_n(x)  ~\mathrm{d}x = 0
for each n. Since L_n form a basis for the polynomials of degree at most k-1, we have that the function
\chi(x) = (-1)^j \qquad x \in (x_j, x_{j+1})
is L^2 orthogonal to every polynomial of degree at most k-1. So in particular the x_j are solutions to the following system of equations
x_0 = -1, \qquad x_{k+1} = 1
\sum_{j = 0}^k (-1)^j \left[ x_{j+1}^d - x_{j}^d \right] = 0 \qquad \forall d \in \{1, \ldots, k\}

From symmetry considerations we have that x_j = - x_{k+1 - j}. This also kills about half of the equations. For the low k we have

  1. \{ 0\}
  2. \{ -1/2, 1/2\}
  3. \{-1/2, 0, 1/2\}
  4. \{ (\pm 1 \pm \sqrt{5})/4 \}
  5. \{ 0, \pm\frac12, \pm \frac{\sqrt{3}}2 \}

An optimization problem: theme

Let’s start simple:

Question 1: What is the linear function \ell(x) that minimizes the integral \int_{-1}^1 |x^2  + x - \ell(x)| ~\mathrm{d}x? In other words, what is the best linear approximation of x^2 + x in the L^1([-1,1]) sense?

This is something that can in principle be solved by a high schooler with some calculus training. Here’s how one solution may go:

Solution: All linear functions take the form \ell(x) = ax + b. The integrand is equal to x^2 + x - \ell(x) when x^2 + x \geq \ell(x), and \ell(x) - x - x^2 otherwise. So we need to find the points of intersection. This requires solve x^2 + x - ax - b = 0, which we can solve by the quadratic formula. In the case where x^2 + x - a x - b is signed, we see that changing b we can make the integrand strictly smaller, and hence we cannot attain a minimizer. So we know that at the minimizer there must exist at least one root.

Consider the case where there is only one root in the interval (counted with multiplicity), call the root x_0. We have that the integral to minimize is equal to
\displaystyle \left| \int_{-1}^{x_0} x^2 + x - ax - b~\mathrm{d}x \right| + \left| \int_{x_0}^1 x^2 + x - a x - b ~\mathrm{d}x \right|
each part of which can be computed explicitly to obtain
\displaystyle \left| \frac13 x_0^3  + \frac13 + \frac{1-a}{2} x_0^2 - \frac{1-a}{2} - b x_0 - b \right| + \left| \frac13 - \frac13 x_0^3 + \frac{1-a}{2} - \frac{1-a}{2} x_0^2 - b + b x_0\right|
Since we know that the two terms comes from integrands with different signs, we can combine to get
\displaystyle \left| \frac23 x_0^3 + (1-a) x_0^2 - (1-a) - 2b x_0 \right|
as the integrand. Now, we cannot just take the partial derivatives of the above expression with respect to a,b and set that to zero and see what we get: the root x_0 depends also on the parameters. So what we would do then is to plug in x_0 using the expression derived from the quadratic formula, x_0 = \frac{1}{2} \left( a - 1 \pm \sqrt{ (1-a)^2 + 4b}\right), and then take the partial derivatives. Before that, though, we can simplify a little bit: since x_0^3 + (1-a) x_0^2 - b x_0 = 0 from the constraint, the quantity to minimize is now
\displaystyle \left| - \frac13 x_0^3 - (1-a) - b x_0 \right|
A long computation taking the \partial_b now shows that necessarily x_0 = 0, which implies that b = 0. But for the range of a where there is only one root in the interval, the quantity does not achieve a minimum. (The formal minimizer happens at a = 1 but we see for this case the integrand of the original cost function is signed.

So we are down to the case where there are two roots in the interval. Now we call the roots x_+ and x_-, and split the integral into
\displaystyle \left| \int_{-1}^{x_-} x^2 + x - ax - b~\mathrm{d}x  - \int_{x_-}^{x_+} x^2 + x - ax - b~\mathrm{d}x + \int_{x_+}^1 x^2 + x - ax - b~\mathrm{d}x \right|
and proceed as before. (The remainder of the proof is omitted; the reader is encouraged to carry out this computation out by hand to see how tedious it is.) Read the rest of this entry »