Natural science colloqium
Kenyon College---September 28, 1989
Today I will speak about one aspect of a very deep and interesting question: What are the fundamental physical limitations of information processing? By "information processing" I mean the sort of operation a computer might perform: storing data and moving it around, doing arithmetic, comparing one number to another, following any sort of algorithm or program. I am not interested in exactly how far our current technology can take us---how many megabytes of RAM we can sqeeze on a chip, or how many gigaflops we can manage with our latest supercomputers. Instead, we will be thinking about the basic and unalterable limits to computation imposed by the laws of physics. Computers must, after all, be physical systems, and any actual computation must be a physical process.
This is a huge and complex business, and there are dozens of specific questions that we might discuss.  There are, for example, some quite profound issues in relativity and quantum mechanics that we could bring up. But in this lecture I want to talk about thermodynamics. That is, I want to talk about questions like: What are the thermodynamic resources (free energy, etc.) nececssary to perform a given computer operation? How much thermodynamic irreversibility is associated with computation?
Like so many of the most interesting questions in science---or in any other branch of knowledge, for that matter---this is a "border issue", a question lying on the frontier between two subjects. In this case, the frontier lies between the mathematics of the abstract theory of computation (known to the cognoscenti as recursive function theory) and the physics of energy and its transformations (thermodynamics). This particular frontier is a strange and unfamiliar territory. The theory of computation, because it is so closely related to mathematical logic, is arguably the most remote and abstract kind of mathematics. Thermodynamics, on the other hand, is so hard-headed and down-to-earth that it is practically engineering.
I will be hard-headed and down-to-earth first and review the thermodynamic facts of life. There are four basic laws of thermodynamics (numbered zero through three), but we only need worry about the middle two. The First Law can be put very simply:
Energy is conserved.
Though energy can be converted from one form to another, it can be neither created nor destroyed. In a closed, isolated system, the total amount of energy remains constant. If we obtain energy from or add energy to a system, the energy content of the system decreases or increases by exactly that amount.
Energy comes in a number of different forms, chief among them being work and heat . Work is energy used to move things around, lift weights, push electrical currents through wires, and so on. Heat is the very disorganized form of energy that resides in the rapid and chaotic motions of the molecules that make up material objects. It is quite easy to convert work into heat---that is exactly what friction does. It is also sometimes possible to convert heat into work. The steam engine is a good example of this: heat generated by burning coal or wood is converted into mechanical work.
The interesting thing is that there is a fundamental asymmetry between the processes "work heat" and "heat work". The first process is always possible, but the second one is not. The conversion of work into heat can be irreversible , so that it is not always possible to reconvert all of the energy back into work. This is the meaning of the famous Second Law of thermodynamics, which is the law of irreversibility. Lord Kelvin put the Second Law this way:
No process can occur whose only final result is to transform heat extracted from a source at uniform temperature into work.
Another version of the Second Law, first due to Clausius, is more familiar:
The entropy of an isolated system never decreases.
"Entropy" is a somewhat elusive thermodynamic quantity that is related to heat transfer. If a system at an absolute temperature T absorbs a small amount of heat Q, then its entropy S changes by an amount
If the entropy of one part of an isolated system decreases, there must be an increase at least as large in the entropy of some other part of the system. The entire universe is often imagined to be an isolated system, so you sometimes hear that "The entropy of the universe is increasing."
Energy in a system that is in a form which can be used for doing work is called free energy , and the Second Law states that the supply of free energy in an isolated system cannot increase. The available free energy in the system decreases over time, as friction and other unavoidable inefficiencies irreversibly convert work into heat.
Before I finish with thermodynamics I want to illustrate these ideas with a standard example. Suppose we want to convert some of the heat of a hot object (at temperature TH) into work. During the course of the process, the hot object experiences a heat transfer QH. Remember, since we are removing heat from the hot object, QH is negative, and the change in entropy is
which is also negative. If all of this heat were converted to work, then the total entropy would decrease in violation of the Second Law. The only hope is to use some of the energy to increase the entropy of some other part of the universe---say a cold object at temperature TH. If we add an amount of heat QC to this object, then its entropy increases by the positive amount
This gives us a relation between the heat extracted from the hot object and the heat added to the cold one. We can use it work out the ideal efficiency of the process---that is, the ratio of the actual work done to the amount of heat that is extracted in order to do it:
What does all this have to do with computers? We want to know whether a physical computation requires free energy. This is tantamount to asking whether the computation process is irreversible, i.e., whether it results in an increase in the entropy of the system (computer and environment). In short: Does information processing necessarily generate entropy?
This question has been given a definitive answer in the last few years by Rolf Landauer and Charles Bennett of IBM Research.  Landauer first showed that:
The only computer operations that must be thermodynamically irreversible are the logically irreversible ones.
This is a beautiful and remarkable fact that is best appreciated by means of an example. Suppose we have a very simple computer with a memory of only two bits. The function of the computer is to add the two bits together and store the result as a binary number. The computer functions in this way:
The problem with this computer, of course, is that it "writes over" the two initial numbers, erasing all record of how it arrived at the outcome of its computation. This problem can easily be fixed by using a different, somewhat larger computer which writes the sum of the two bits in a different part of the memory. Imagine a four-bit computer. The first two memory cells store the input bits, as before, and the second two cells are initially set to zero. The computer adds the two bits and then writes their sum as a binary number in the second pair of memory cells:
A little reflection on the example should convince you that saving both of the input bits is unnecessary. It is only essential to save one ofthem, since the other could be reconstructed by subtraction. The main lesson still holds: energy dissipation can be prevented if and only if a record is kept of the computation that is sufficient to reconstruct the initial "input" state of the computer from the final "output" state.
We can summarize:
The strategy for "reversible computing" is clear: Never throw anything away! This strategy, however, has its price. The record of the computation which is so carefully preserved is probably pretty useless (or else you would have wanted to compute it in the first place). It is the "garbage" produced as a by-product of a useful program. If we never erase it, our memory banks will soon be full of old, useless garbage data that we are keeping just to save ourselves the free energy involved in getting rid of it. If we want to re-use our computer, we will have to erase the garbage eventually.
What, then, is the minimum thermodynamic cost of erasure? For a computer operating at a temperature T, the minimum dissipation is
kB log 2 per bit
Energy: Q kB T log 2 per bit
where kB is Boltzmann's constant, which has a value of 1.38 x 10-23 J/K. Roughly speaking, each bit erased generates an entropy increase of about 10-23 J/K, which at room temperature is an energy dissipation of about 3 x 10-21 J, or about one-fiftieth of an eV.
I must emphasize how far these fundamental limits are from any known computer technology. The solid-state memory chips in your PC, for example, must be constantly "refreshed" with electrical energy to retain their information. Computational steps which are actually logically reversible are done in a dissipative way; even moving information from one memory location to another, which could in principle be accomplished without using up free energy, is done by making a copy in the new location and then erasing the original. Finally, real computers do not generally keep any of their "garbage", so that their programs abound with logically irreversible steps. The consequence is that all current computer technology uses free energy at a rate that is more than a factor of a trillion (1012) above the fundamental thermodynamic limits, and computers therefore generate lots of waste heat as they operate. Getting rid of this waste heat has always been a major engineering consideration in the design of new high-performance computers. Some of the latest supercomputers actually have their working parts immersed in a liquid fluorocarbon coolant to improve the efficiency of waste heat removal.
Biological information processing systems are somewhat better, but still very far from the thermodynamic limits. The transcription of genetic information in DNA, which is information processing that proceeds at the molecular level, still dissipates a free energy of about 20 kBT per bit  ---and that is mere copying, which could be done reversibly.
The reason for this inefficiency in human technology and in nature is that free energy is very abundant. It is much easier to figure out ways of getting more usable energy than it is to construct a less dissipative information processing system. As any engineer knows, it is not always optimal to optimize! Nevertheless, given the engineering difficulties of waste heat management, I think it likely that human-built computers will come much closer to the thermodynamic limits over the next several decades.
I want to return to a remark I made earlier. Energy dissipation is, in effect, the result of erasing the "garbage" along the way. Equivalently, we could save the garbage information (the record of the computation) until the end and erase it all at once. But the amount of garbage data, and hence the amount of free energy that must be used up in erasing it, depends upon how clever we have been in making that record of the computation. It costs free energy to throw bits away, so the trick is to throw away as few bits as possible.
Let me put it another way. The garbage that we do generate may be very repetitive and redundant. Consider the following excerpt from the garbage data from an imagninary program:
That is, the garbage data consist mostly of 0s with a few 1s distributed through it. The preponderance of 0s over 1s is a kind of crude pattern to the data. It is possible to use this rough pattern to encode the data into a more compact form. The following code, for instance, does a good job:
code word meaning
The basic idea here is to code in binary notation the number of 0s between the 1s in the original data. The 64 bits of garbage data above, written in this code, become:
We therefore imagine a special program in our computer called a "trash compactor". The job of the trash compactor is to examine the garbage data generated by the main program, to look for patterns, regularities, etc., and then to use these regularities, if they exist, to encode the garbage in a compressed form. Since the coding is logically reversible, the trash compactor program can perform its function without energy dissipation or the production of additional "coding garbage". After the garbage is compressed, it may be erased at a reduced thermodynamic cost. The more we can compress the garbage before erasing it, the less free energy will be required. The question of the thermodynamic cost of computation has been reduced to: How good can a trash compactor be?
Before I give a straight answer to this, I want to introduce a somewhat abstract model of a computer and its workings. This model, which was invented by A. M. Turing, is easy to visualize and is susceptible to rigorous mathematical analysis. Most of the theory of computation is based upon it. As it turns out, the model is quite general enough for our purposes.
A Turing machine  is a very simple idealized computer. It consists of a small central processing unit with a memory of only a few bits---what is technically known as a "finite-state automaton," since it only has a finite number of different internal memory states. The CPU can read or write binary digits on an infinite computer tape as it moves back and forth according to its own very simple, hard-wired rules. This single tape serves as the Turing machine's input, program, bulk memory, output, and perhaps "garbage" storage. (We can imagine modified Turing machines with more than one tape, but one is all that is really necessary.)
Figure 1: A Turing machine.
Remarkably, it turns out that this extremely simplified computing device is capable of doing any possible computation. More precisely, there is a class of Turing machines called universal Turing machines (UTMs for short) which are capable of simulating the action of any digital computer. If you have a program that runs on your computer, then I can come up with software for my universal Turing machine that lets it simulate your machine and thus run your program. Any computation that can be done on any digital computer can also be done---given enough time and tape---on a UTM. This means, among other things, that all UTMs are roughly equivalent. There is always a simulation program of finite, fixed length that allows UTM-143, say, to run any program designed for UTM-26, and another such simulation program for UTM-26 to run programs designed for UTM-143. Let us stipulate that we have chosen one particular UTM to make use of. Since any other computer we might want to use can if necessary be simulated by our UTM, there should be no problem.
Suppose s is a binary string, that is, a long (but finite) sequence of 1s and 0s. Some examples might be
A = 011011011011011011011011011011...
B = 01010011100001011000011010110010...
C = 0100011011000001010011100101110111...
(The ... indicates that each string is considerably longer than shown.) The length in bits of the string s is denoted |s|.
For a particular string s, we ask the following question: What programs on our UTM have the string s as their output? In other words, which programs will eventually write the string s on the tape and then stop? Let P(s) be the set of programs on the UTM which do this. There is always at least one program that is in P(s); it looks something like
where the string s = 0100101...110010. (Actually, of course, the program is itself a binary string written in the "machine language" of the UTM, not in the high-level language shown above. This is only a schematic of the actual program.) There are many other programs in P(s)---infinitely many, in fact. There are programs that do lots of complicated calculations whose result is s. There are programs which do lots of irrelevent stuff, then print out s in the manner above. There is no limit to how stupidly and inefficiently the programs in P(s) can do their job.
On the other hand, there is a limit to how cleverly and efficiently the programs can do their job. There must be a shortest program in P(s), which we will call s*. (If there are many shortest programs, so much the better; we can choose any of them to be s* .) s* is shortest program that will eventually print out s and then stop. It is not necessarily the fastest program that would do this. s* might contain lots of nested loops, perhaps, and take a very long time to run, while the brute-force "print s" program suggested above would run very quickly. But s* is the shortest set of instructions for the UTM that enables it to construct the sequence s. Thus, in some sense s* is the shortest description of the string s for the computer UTM.
Perhaps you already see where all this is heading. A trash compactor is a device for replacing a string s of garbage data with one of the shorter programs in P(s). The compressed garbage can always be "decoded" by running it as a program on the UTM. A perfect trash compactor would be one that would always choose the minimal program s* to represent s. This minimal program takes the greatest advantage of whatever structure or pattern there may be in s and describes it as succinctly as possible.
Let us consider the "compressability" of the three strings A, B, and C given above. A evidently consists of 011 repeated over and over again. Consequently, it can be described very briefly, and A* is quite short. The string B, which was generated by flipping a coin repeatedly, has no pattern at all and cannot be effectively compressed. The minimal program B* may in fact simply be the "print B" program, which is about the same length as B itself. Finally, we come to C. This string is interesting in that C* is very short, but the structure of the string is buried somewhat more deeply than the structure of A. I will return to C in a moment.
It is reasonable at this point to define the algorithmic complexity K(s) of a string s to be the length of the minimal program in P(s); that is,
K(s) = |s*|
This quantity, which was proposed independently by Chaitin, Kolmogorov, and Solomonoff , is in fact a mathematical measure of randomness. That is, strings that are very ordered and not very random can be generated by short programs, and so K(s) |s| . (An example would be A.) Strings that are truly random without any pattern, like B, cannot be generated by programs that are significantly shorter than the strings themselves: K(s) is approximately |s| .
Note that this idea of randomness has nothing to do with probability. I constructed the sequence C to illustrate this point. The sequence is made by simply writing down all the one-bit sequences in binary order (0,1) followed by the two-bit sequences in order (00,01,10,11), and so on. This is a very simple pattern that can be generated by a very short computer program, so K(C) is small. However, C passes many of the conventional statistical tests for "randomness". In a long section of C, the binary digits 0 and 1 each occur about 50% of the time. Furthermore, if the pairs of adjacent bits are counted, it is found that 00, 01, 10, and 11 each occur about 25% of the time. The various triples each occur about 12.5% of the time, and so on. Statistically, it is a good candidate for randomness; algorithmically, it is very ordered.
The function K(s) (which has also been called algorithmic randomness, algorithmic entropy, and algorithmic information) forms the basis for a branch of mathematics called algorithmic information theory. I would suggest that the basic idea embodied in K(s), that "complexity = length of shortest description", may have useful applications in many fields. In chemistry, for example, the complexity of a polymer might be defined as the algorithmic complexity of the sequence of its structural units. In any case, K(s) at least provides a rigorous way of defining complexity in mathematical terms, and that makes it interesting.
K(s) also has thermodynamic significance. Recall that the thermodynamic cost of a computation is precisely the cost of erasing the useless garbage data that is generated as a by-product. This cost can be reduced by first running the garbage through a trash compactor program, which reversibly compresses it. K(s) is precisely the minimum size to which a string s can be compressed. Therefore, the erasure of the string s will produce an entropy increase S kB log 2 K(s), and an energy dissipation S kB T log 2 K(s), with equality holding for the perfect compression s s*. The more random or complex the garbage, the more costly it is to dispose of it. Highly ordered garbage can be greatly compressed, and can be erased with less dissipation. Of course, this dissipation can be avoided entirely if the compressed garbage is tucked away in permanent storage instead of being erased. If a lot of empty memory is available, the computer might be able to operate for some time before too much garbage piles up. The computer may either use up empty memory for garbage storage or it may dispose of the garbage and use up free energy at a rate of kB T log 2 per bit erased. Empty memory can be used to save free energy, and free energy can be used to clear out sections of memory by erasing them. The two are interconvertible. This leads us to our strangest conclusion so far:
Empty memory is a form of free energy.
But perhaps this is not, after all, so strange. Empty memory corresponds to a very organized state of the computer, and energy is free energy to precisely the degree that it is organized.
If empty memory is a form of free energy, then does "used" memory correspond to unusable energy---i.e., does it contribute to the entropy of the computer? This very proposal was recently put forward by Wojciech Zurek, who suggested that the total entropy of a computer should be defined as
S = Sthermo + kB log 2 K(m) ,
where Sthermo is the ordinary thermodynamic entropy and m is a binary string representing the memory state of the computer.  That is, the entropy has a new, algorithmic part proportional to K(m), the smallest number of bits into which the computer's memory could be compressed. Zurek has supported his formula with a number of fairly persuasive physical arguments. For example, if his extended definition of entropy is adopted, then the thermodynamic cost of information erasure is incorporated into the Second Law---a decrease in algorithmic complexity must be paid for by an increase in ordinary entropy of at least equal magnitude.
Zurek has also used his formula to analyze the old problem of Maxwell's demon. This demon was invented by J. C. Maxwell as a joking illustration of the ideas of statistical mechanics, but it has taken a life of its own.  It is a mischievous little imp that, by carefully observing and altering the motions of individual molecules, is able to decrease the entropy of a system and apparently violate the Second Law. Zurek treated the demon as a computing device and showed that the demon, in the course of its activities, must use up a tremendous amount of free memory to store its information about molecular motions. The increase in the algorithmic complexity K(m) of the demon's memory is large enough to offset any decrease in the ordinary thermodynamic entropy the demon produces. (If the demon decides to erase its memory, of course, then it will generate entropy in its conventional form.)
The thermodynamics of computation leads us to add a new term to entropy. For an ordinary computer, this extra term is much smaller than the conventional entropy; but for a computer operating near the thermodynamic limits of computation, the extra term is crucial. It is also, as we will see next, very peculiar.
Something strange arises when we ask how to calculate K(s) for a string s. Suppose that we had a program called FINDK on our universal Turing machine that did exactly that: given any s as input, FINDK would calculate for a while and then print out K(s) in binary notation. FINDK would doubtless be very complex---let us say, for example, that FINDK is 109 bits long. Nothing prevents us from using FINDK as a subroutine of a slightly larger program called BIGGUY. The first part of BIGGUY is a short program that generates every binary sequence in order, beginning with shorter ones and then increasing in length: 0, 1, 00, 01, 10, 11, 000, 001, etc. (The program that generated C did just this.) When each sequence is generated, it is sent to the FINDK subroutine and its algorithmic complexity is determined. The program BIGGUY stops when it finds a string s of, say, K(s) = 1012. That is, BIGGUY looks like
START: GENERATE NEXT s
EVALUATE K(s) USING FINDK
IF K(s) < 10^12 THEN GOTO START
BIGGUY is not very much longer than FINDK itself, about 109 bits long, since none of the other parts of it are very complicated. BIGGUY runs for a very long time and then prints out a string s and halts. This string has the property that its algorithmic complexity K(s), as evaluated by FINDK, is much larger than the length of the program BIGGUY! This is a contradiction, since by definition no program shorter than K(s) can print out s.
To summarize: If a program FINDK to calculate K(s) exists, we could imbed it as a subroutine in a program BIGGUY, which would perform the impossible task of generating a string of greater complexity than the length of BIGGUY. Since this is a contradiction, no such program FINDK can exist. Or, to put it more elegantly,
Algorithmic complexity is an uncomputable function.
Mathematically, it is easy to see that the number of different computer programs, and hence the number of computable functions, is countably infinite. Since the number of integer functions is itself un countably infinite, it is clear that there must be many functions which cannot be computed by any computer program. Nevertheless, it is somewhat unusual to be face-to-face with one and even more unsettling to run across it in a physical context. Remember, according to Zurek K(s) contributes a part of the total entropy of a computer. The implication is that this entropy itself is an uncomputable function.
Consider a trash compactor program like the ones we have discussed. If the program were a perfect one, then it would replace every string s with the miminal program s* that generates s on the UTM. But if this could be done, then we could find the length of the program S* and compute K(s). It follows that
There are no perfect trash compactors.
A trash compactor program scans the string s looking for patterns and structure that can be used to compress the description of s on the UTM. No program can possibly be written that can identify such patterns perfectly. For any trash compactor, there will always be strings that it cannot compress perfectly.
In the context of the thermodynamic cost of computation, this means that there will always be situations in which the "garbage" is erased somewhat inefficiently, so that more free energy is dissipated as heat than is absolutely necessary. What is the source of this additional dissipation? The source is the uncomputability of the function K(s).
Some of the meaning of this uncomputability can be uncovered by imagining how we might try to design a perfect trash compactor program---that is, one that takes a string s and tries to find the minimal program s* that prints out s and then stops. Our strategy runs as follows: First, the trash compactor generates every possible UTM program shorter than and including the "print s" program. We will try to determine which of these programs are in P(s) and then find the shortest program in that set. We start all of the programs running for a very long time, say a hundred million program steps apiece. After this, the programs will fall into three categories:
We would like to say that s* is the shortest program in category 2. The problem resides in category 3. Some category 3 programs will get caught in an infinite loop or a runaway process and will never stop at all. Programs which never stop are certainly not in P(s). The other category 3 programs, however, eventually will stop, and some of them could print s before they do. Category 3 might just contain the minimal program s* in P(s) that we are seeking.
If we ran all the programs long enough, eventually all the "stopping programs" would have stopped already and only the "infinite-loop" programs would be left in category 3; we would then know that the shortest program in category 2 is the one we are looking for. But how would we know if we had gotten to this point? The only way that we could be absolutely certain that only the non-halting programs remained in category 3 would be if we could somehow recognize non-halting programs. We need a "halting-checker" subroutine in our trash compactor that would do this job; we could then throw out from the beginning all non-halting programs (since they cannot be in P(s) anyway) and run all of the halting programs until they finally stopped.
If we could construct a "halting-checker" program, we could use it to construct the program FINDK that calculates the algorithmic complexity. But we know that no such FINDK program exists; therefore, we know that no "halting-checker" program exists. We have proven a famous result in the theory of computation! As Turing first showed in 1937,  the "halting problem" is an unsolvable one. That is, there is no computer program which can tell with certainty whether or not other programs will eventually halt. Consequently, we cannot use a halting-checker subroutine in our trash compactor program. Any actual subroutine we wrote as a halting-checker would not work perfectly. It would either fail to identify some non-halting programs (in which case the trash compactor would itself get caught in an "infinite loop"), or it would fail to identify some programs that did halt (one of which might be s*).
We have shown that the uncomputability of K(s) implies the unsolvability of the halting problem. It turns out that the converse is also true. That is, if we could construct a program FINDK, we would be able to use it to construct a "halting-checker" program.
Moreover, the unsolvability of the halting problem is equivalent to an even more famous result in mathematical logic: Godel's Incompleteness Theorem.  Godel's theorem essentially states that any formal mathematical system of sufficient complexity (complex enough to contain the positive integers and their properties), must contain statements whose truth cannot be determined from the axioms of the system. Such "undecidable propositions" cannot be proved either true or false. For example, the well-known Continuum Hypothesis in transfinite numbers is known to be undecidable within the context of ordinary set theory.  It is consistent with the axioms of set theory either to assert the Continuum Hypothesis or to deny it. Godel's demonstration of the existence of undecidable propositions has had a profound impact upon mathematics and logic, but not, until now, upon physics.
The "Godelian" undecidability of formal systems is thus at the root of the uncomputability of K(s). From this uncomputability it follows that any data compression scheme ("trash compactor") is imperfect, and this gives rise to additional dissipation in the course of some computations. This effect has been called "logical friction" by Zurek, but it might better be termed "Godel friction". Godel friction is the physical dissipation of energy in a computation process due to the logical incompleteness of formal mathematical systems.
I have said so much about things that we cannot do that perhaps I should list a few of the things that we can do, both for clarity and for reassurance.
I want to emphasize number 5. Zurek and others have used algorithmic information theory to shed an interesting new light on thermodynamics and statistical mechanics. I do not believe that we have come to the end of the insights that can be obtained in this way.
So here is the story in a nutshell: The thermodynamic efficiency of a computer depends on its effectiveness in compressing useless and unwanted data before erasing it; but because of Godel's incompleteness theorem, no computer can compress data perfectly. An abstruse theorem of mathematical logic affects the amount of waste heat produced by a computer!
This is a very strange story, one of the strangest I know. There are quite a few questions that remain unanswered. For example, is there any way to make an estimate on the magnitude of the Godel friction of a given trash compactor program? Since Godel friction arises because subtly-ordered strings are mistaken for random ones, and since most strings of a given length are nearly random, Godel friction should be a very small effect except in very rare cases. How small? How rare? This whole frontier country between computation and physics is still largely unexplored.
Contact: Benjamin Schumacher, Dept. of Physics. Updated 26-July-2001