A better estimate of Kempner’s series

The Kempner series recently regained some notoriety due to a Saturday Morning Breakfast Cereal comic (the last panel). The observation first appeared in a 1914 American Mathematical Monthly article, in which it was shown that the series consisting of the usual harmonic series

$\displaystyle \sum_{n = 1}^{\infty} \frac{1}{n}$

but with all the terms, whose decimal expansion includes the digit ‘9’, removed, in fact converges to some number below 80. The original proof is given in the Wikipedia article linked above, so I will not repeat it. But to make it easier to see the idea: let us first think about the case where the number is expressed in base 2. In base 2, all the positive integers has the leading binary bit being 1 (since it cannot be zero). Therefore there are no binary positive numbers without the bit ‘1’ in its expansion. So the corresponding series converges trivially to zero. How about the case of the bit ‘0’? The only binary numbers without any ‘0’ bits are

$1, 3 = (11)_2, 7 = (111)_2, 15 = (1111)_2, \ldots, 2^n - 1$.

So the corresponding series actually becomes

$\displaystyle \sum_{n = 1}^\infty \frac{1}{2^n - 1} \leq \sum_{n = 1}^\infty \frac{1}{2^{n-1}} = 2$

So somewhere from the heavily divergent harmonic series, we pick up a rapidly converging geometric series. So what’s at work here? Among all the n-bit binary numbers, exactly 1 has all bits not being 0. So the density of these kinds of numbers decays rather quickly: in base 2, there are $2^{n-1}$ numbers that are exactly n-bit long. So if a number $x$ has a binary representation that is exactly n bits long (which means that $2^{n} \leq x < 2^{n+1}$), the chances that it is one of the special type of numbers is $\frac{1}{2^{n-1}} \approx \frac{2}{x}$. This probability we can treat then as a density: replacing the discrete sum $\sum \frac{1}{n}$ by the integral $\int \frac{1}{x}\mathrm{d}x$ (calculus students may recognize this as the germ of the “integral test”) and replacing the $\mathrm{d}x$ by the density $\frac{2}{x} \mathrm{d}x$, we get the estimate

$\displaystyle \text{Binary Kempner series} \approx \int_1^\infty \frac{2}{x^2} = 2$.

Doing the same thing with the original Kempner series gives that the chances a n-digit number does not contain the digit nine to be

$\displaystyle \left(\frac89\right)\left(\frac9{10}\right)^{n-1} \approx \left( \frac{9}{10}\right)^{n}$

The length of the decimal expansion of a natural number $x$ is basically $1 + \log x$. So the density we are interested in becomes

$\displaystyle \left( \frac{9}{10}\right)^{1+\log x} ~\mathrm{d}x$

From this we can do an integral estimate

$\displaystyle \text{Kempner series} \approx 0.9 \times \int_1^\infty \left( \frac{9}{100}\right)^{\log x} ~\mathrm{d}x$

The integral can be computed using that

$\displaystyle a^{\log b} = b^{\log a}$

to get

$\displaystyle 0.9 \times \int_1^{\infty} \left( \frac{9}{100}\right)^{\log x} ~\mathrm{d}X = 0.9\times \int_1^\infty x^{\log 9 - 2} ~\mathrm{d}x = \frac{0.9}{1 - \log 9} \approx 19.66$

Notice that this estimate is much closer to the currently known value of roughly 22.92 than to the original upper bound of 80 computed by Kempner.

Kempner’s estimate is a heavy overestimate because he performed a summation replacing every n-digit long number that does not contain the digit 9 by $10^{n-1}$; this number can be many times (up to 9) times smaller than the original number. Our estimate is low because among the n-digit long numbers, the numbers that do not contain the digit 9 are not evenly distributed: they tend to crowd in the front rather than in the back (in fact, we do not allow them to crowd in the back because none of the numbers that start with the digit 9 is admissible). So if in the original question we had asked for numbers that do not contain the digit 1, then our computation will give an overestimate instead since these numbers tend to crowd to the back.