Digital computing and catastrophic failures

by Willie Wong

I just read a wonderful article in Discover magazine. The article centers on Kwabena Boahen (and other members of the school of Carver Mead) and their work on electronic circuitry modeled more closely on the human brain. The main claim is that these types of neurocircuits have the potential to significantly lower the power consumption of computing. If the claim is correct, though, it implies some nontrivial relationship between the voltage applied to a transistor and the noise experienced.

The idea, if I understood the lay explanation correctly, is a trade-off between error rate and power. Let us consider the following completely simplified and idealized model. A signal is sent in at voltage V_0. The line introduces thermal noise with a Gaussian distribution. So the signal that comes out at the other end has distribution \phi_{1,V_0}(V), where the Gaussian family \phi_{\sigma,\mu} is defined as

Definition 1 (Noisy signal)
\displaystyle \phi_{\sigma,\mu}(x) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{1}{2}(\frac{x}{\sigma} - \sigma\mu)^2}

(Note: our definition is not the standard one. In particular, our Gaussian is centered at \sigma^2\mu rather than at \mu! This parametrization makes the later calculations simpler, as we shall see.)

In other words, for a given input signal V_0, the output is a random variable with distribution \phi_{1,V_0}. Let us suppose we have a sensitive instrument to measure the output in analog. Given one reading, we interpret a positive voltage as the initial signal being |V_0|, and a negative voltage as it being -|V_0|. So there will be times when a positive signal was sent and our reading, due to the thermal noise, indicates that a negative signal was received. In other words, error!

To reduce the error, there are two things we can do. One is to increase the input voltage. Assuming the thermal noise does not depend on the input voltage, the increase shifts the center of our Gaussian further away from the origin, which in turn reduces the error: an erroneous reading requires the thermal noise to overcome the original signal, so if the signal is bigger, the noise spike has to be bigger too, and the probability of a large enough noise spike decreases rapidly as the required size grows. The other choice is to send the same signal simultaneously through several pathways and average their readings. By averaging multiple noisy readings we get a more accurate measurement.

Mathematically, choice one means that we shift from V_0 to V'_0, and the corresponding readout follows the new distribution \phi_{1,V'_0}. What about choice two? From basic probability theory, we learn that the distribution of the sum of independent random variables X_1, \ldots, X_n is the convolution P_1*P_2*\cdots*P_n of their individual distributions. (The average is just the sum divided by n, so it has the same sign, which is all our decoding rule looks at.) So let us do a computation:

\displaystyle \begin{array}{rcl} \phi_{\sigma,\mu}*\phi_{\tau,\mu}(x) & = & \frac{1}{2\pi\sigma\tau}\int \exp[ -\frac{1}{2}( \frac{x-y}{\sigma} - \sigma\mu)^2 - \frac{1}{2}(\frac{y}{\tau} - \tau\mu)^2 ] dy \\ & = &  \frac{1}{2\pi\sigma\tau}\int \exp[ -\frac{1}{2}( \frac{x^2}{\sigma^2} + \frac{y^2}{\sigma^2} - \frac{2xy}{\sigma^2} + \sigma^2\mu^2 - 2x\mu + 2y\mu) - \frac{1}{2}(\frac{y^2}{\tau^2} - 2y\mu + \tau^2\mu^2)] dy \\ & = & \frac{1}{2\pi\sigma\tau}\int \exp[-\frac{1}{2}(y\sqrt{\sigma^{-2}+\tau^{-2}} - x\sigma^{-1}(1 + \sigma^2/\tau^2)^{-1/2})^2 - \frac{1}{2}( x / \sqrt{\sigma^2 + \tau^2} - \sqrt{\sigma^2 + \tau^2} \mu)^2] dy \\ & = & \phi_{\sqrt{\sigma^2 + \tau^2},\mu}(x) \end{array}

(Note: using the Fourier transform, the above equality can be proven in one simple line…) This implies that sending the same signal down N channels and adding up the readings gives a total readout on the other end with distribution \phi_{\sqrt{N},V_0}.
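For the skeptical, here is a quick numerical sanity check of the convolution identity. This is only a sketch: it assumes the Gaussian family is normalized as a probability density (with the usual factor of 1/2 in the exponent), and the helper names `phi` and `convolve_at`, as well as the sample values of \sigma, \tau, \mu and x, are my own choices for illustration.

```python
import math

def phi(x, sigma, mu):
    """Gaussian density of width sigma, centered at sigma**2 * mu (assumed normalized)."""
    z = x / sigma - sigma * mu
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi * sigma ** 2)

def convolve_at(x, sigma, tau, mu, half_width=40.0, step=0.005):
    """Riemann-sum approximation of (phi_{sigma,mu} * phi_{tau,mu})(x)."""
    n = int(2 * half_width / step)
    total = 0.0
    for i in range(n + 1):
        y = -half_width + i * step
        total += phi(x - y, sigma, mu) * phi(y, tau, mu)
    return total * step

# Arbitrary sample values; any moderate choice should behave the same way.
sigma, tau, mu = 1.0, 2.0, 0.7
x = 3.0

numeric = convolve_at(x, sigma, tau, mu)
closed_form = phi(x, math.hypot(sigma, tau), mu)  # width sqrt(sigma^2 + tau^2), same mu

print("numerical convolution:", numeric)
print("closed form          :", closed_form)
```

The two printed numbers should agree to many decimal places, which is what the computation above predicts.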

Now, let us evaluate the error probability. Assuming that the initial input has positive voltage, the error rate due to the Gaussian noise is found by integrating the probability density from minus infinity to 0. So if the readout has distribution \phi_{\sigma,\mu}, the error rate is

\int_{-\infty}^0 \phi_{\sigma,\mu}(x) dx = \int_{-\infty}^0 \frac{1}{\sigma}\phi_{1,0}(\frac{x - \sigma^2\mu}{\sigma})dx = \int_{-\infty}^{-\sigma^2\mu}\phi_{1,0}(x/\sigma) dx / \sigma = \int_{-\infty}^{-\sigma\mu}\phi_{1,0}(y)dy
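If we assume \phi_{1,0} is the standard normal density, the last integral is just the standard normal cumulative distribution function evaluated at -\sigma\mu, which can be written with the complementary error function. A minimal sketch (the function name `error_rate` is mine):

```python
import math

def error_rate(sigma, mu):
    """Probability that a readout distributed as phi_{sigma,mu} is negative.

    Assumes phi_{1,0} is the standard normal density, so the integral
    equals Phi(-sigma * mu) = erfc(sigma * mu / sqrt(2)) / 2.
    """
    return 0.5 * math.erfc(sigma * mu / math.sqrt(2))

# The error rate depends on sigma and mu only through their product.
for sigma, mu in [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]:
    print(f"sigma={sigma}, mu={mu}: error rate = {error_rate(sigma, mu):.6f}")
```

Note that the pairs (1, 2) and (2, 1) give the same value, since only the product \sigma\mu enters.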

What does this tell us? The error rate depends only on the product \sigma\mu, which for N channels at voltage V_0 is \sqrt{N}V_0. So under our idealized situation, doubling the input voltage gives the same improvement in the error rate as quadrupling the number of transmissions. Now, under the assumption that the power consumed by an electrical appliance is proportional to the square of the voltage (this is true for a resistor), both options cost four times the power, so they give the same power consumption for the same error rate. One might have expected this result based on the principle of “there’s no free lunch”.
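The same conclusion can be checked by brute force. The following Monte Carlo sketch works under the same idealized assumptions as the text: unit-variance Gaussian noise independent of the voltage, decoding by the sign of the summed readings, and power proportional to the number of channels times the voltage squared. The function names, the seed, and the trial count are arbitrary choices of mine.

```python
import random

def simulate(voltage, channels, trials, rng):
    """Send +voltage down `channels` noisy lines; decode by the sign of the summed readings."""
    errors = 0
    for _ in range(trials):
        total = sum(rng.gauss(voltage, 1.0) for _ in range(channels))
        if total < 0:
            errors += 1  # a positive signal was read as negative
    return errors / trials

def relative_power(voltage, channels):
    """Power in arbitrary units, assuming each channel dissipates voltage**2."""
    return channels * voltage ** 2

rng = random.Random(0)  # fixed seed so that runs are repeatable
trials = 100_000

# Baseline, doubled voltage, and quadrupled number of channels.
for voltage, channels in [(1.0, 1), (2.0, 1), (1.0, 4)]:
    rate = simulate(voltage, channels, trials, rng)
    power = relative_power(voltage, channels)
    print(f"V={voltage}, N={channels}: error rate ~ {rate:.4f}, relative power = {power}")
```

Up to sampling noise, the last two configurations should show the same error rate, and they have the same relative power.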

Obviously, this also indicates that our simple model is insufficient to describe the phenomenon mentioned in the article. Two ways I can imagine this happening are that (1) the power consumption grows faster than the square of the voltage due to non-ideal behavior of the physical system, and (2) the real thermal noise is not Gaussian and has a nontrivial dependence on the input voltage. Both, I think, are physically reasonable, yet neither, I think, is sufficient to explain the massive power saving claimed in the article.

But this article got me thinking about more than just the physics. Another point that was brought up is the digital representation of data. Consider a scratchy phonograph record. This is an analog representation of data. Even when the record is old and scratched, we can still play it back and mostly hear what is going on. If the record were really scratched, we may not be able to hear the parts that are quieter on the original recording, but the loud parts will still come through. The point I want to make is that in an analog recording, increasing levels of noise make the lower-amplitude portions of the original signal less audible, but the most significant, high-amplitude signals will not be obscured as much.

But now consider digital storage of data. The information is stored in sequences of bits, and each individual bit stands for a different power of two. For example, in a modern 64-bit computer, each unit of storage (this is not strictly true, but let’s pretend that it is) consists of 64 slots, each taking the value of either 1 or 0. But each individual slot among the 64 carries a different weight! The lowest (or the right-most when you write it out) slot represents a value of 2 to the 0 power, or the value 1. The highest (or the left-most) represents a value of 2 to the 63rd power, which is more than 9 billion billion (American notion of billion, or 9 zeros). The problem now is that all of the bits are on equal footing: an error is as likely to affect the 2 to the 63rd power bit as the 2 to the zeroth power bit. Now imagine the number stored represents the savings in your bank account: this means that a thermal error that changes the value of one single bit could as easily change your savings by 1 cent as by the US military budget.
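To make the point concrete, here is a small sketch that flips a single bit of a number, pretending it is stored as a plain unsigned 64-bit integer. The balance used is a made-up value, and `flip_bit` is just an illustrative helper.

```python
def flip_bit(value, position):
    """Return `value` with the bit at `position` (0 = least significant) inverted."""
    return value ^ (1 << position)

balance_in_cents = 123_456  # hypothetical account balance, for illustration only

# Same kind of error (one flipped bit), wildly different consequences.
for position in (0, 31, 63):
    corrupted = flip_bit(balance_in_cents, position)
    change = abs(corrupted - balance_in_cents)
    print(f"flipping bit {position:2d} changes the balance by {change} cents")
```

Flipping bit 0 changes the balance by a single cent, while flipping bit 63 changes it by 2 to the 63rd power cents.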

This is partially why it is so important in modern computing that error rates be kept as low as possible. A physical/analog process can tolerate noise because the signal degrades gracefully, meaning that the large-scale structure is maintained: an analog television broadcast with a lot of noise can still be watched even though there is a bit of snowcrash. A digital process without redundancy, on the other hand, can be in real trouble at high levels of noise, because the noise is allowed to disturb the large-scale structure (in digital TV broadcasts, the signal basically has three qualities: clear, mosaic-like blocky, or lost). An error is as likely to be catastrophic as it is to be insignificant.