**Benjamin Schumacher**

Natural science colloquium

Kenyon College---September 28, 1989

Today I will speak about one aspect of a very deep and interesting question:
*What are the fundamental physical limitations of information processing?*
By "information processing" I mean the sort of operation a computer might
perform: storing data and moving it around, doing arithmetic, comparing one
number to another, following any sort of algorithm or program. I am not interested
in exactly how far our current technology can take us---how many megabytes
of RAM we can squeeze on a chip, or how many gigaflops we can manage with our
latest supercomputers. Instead, we will be thinking about the basic and unalterable
limits to computation imposed by the laws of physics. Computers must, after
all, be physical systems, and any actual computation must be a physical process.

This is a huge and complex business, and there are dozens of specific questions
that we might discuss. [1] There are, for example, some
quite profound issues in relativity and quantum mechanics that we could bring
up. But in this lecture I want to talk about thermodynamics. That is, I want
to talk about questions like: *What are the thermodynamic resources (free
energy, etc.) necessary to perform a given computer operation? How much
thermodynamic irreversibility is associated with computation?*

Like so many of the most interesting questions in science---or in any other branch of knowledge, for that matter---this is a "border issue", a question lying on the frontier between two subjects. In this case, the frontier lies between the mathematics of the abstract theory of computation (known to the cognoscenti as recursive function theory) and the physics of energy and its transformations (thermodynamics). This particular frontier is a strange and unfamiliar territory. The theory of computation, because it is so closely related to mathematical logic, is arguably the most remote and abstract kind of mathematics. Thermodynamics, on the other hand, is so hard-headed and down-to-earth that it is practically engineering.

I will be hard-headed and down-to-earth first and review the thermodynamic facts of life. There are four basic laws of thermodynamics (numbered zero through three), but we only need worry about the middle two. The First Law can be put very simply:

Energy is conserved.

Though energy can be converted from one form to another, it can be neither created nor destroyed. In a closed, isolated system, the total amount of energy remains constant. If we obtain energy from or add energy to a system, the energy content of the system decreases or increases by exactly that amount.

Energy comes in a number of different forms, chief among them being *work*
and *heat*. Work is energy used to move things around, lift weights,
push electrical currents through wires, and so on. Heat is the very disorganized
form of energy that resides in the rapid and chaotic motions of the molecules
that make up material objects. It is quite easy to convert work into heat---that
is exactly what friction does. It is also sometimes possible to convert heat
into work. The steam engine is a good example of this: heat generated by burning
coal or wood is converted into mechanical work.

The interesting thing is that there is a fundamental *asymmetry* between
the processes "work → heat" and "heat → work". The first process is always
possible, but the second one is not. The conversion of work into heat can
be *irreversible*, so that it is not always possible to reconvert all of
the energy back into work. This is
the meaning of the famous Second Law of thermodynamics, which is the law of
irreversibility. Lord Kelvin put the Second Law this way:

No process can occur whose only final result is to transform heat extracted from a source at uniform temperature into work.

Another version of the Second Law, first due to Clausius, is more familiar:

The entropy of an isolated system never decreases.

"Entropy" is a somewhat elusive thermodynamic quantity that is related to heat transfer. If a system at an absolute temperature T absorbs a small amount of heat Q, then its entropy S changes by an amount

ΔS = Q/T .

If the entropy of one part of an isolated system decreases, there must be an increase at least as large in the entropy of some other part of the system. The entire universe is often imagined to be an isolated system, so you sometimes hear that "The entropy of the universe is increasing."

Energy in a system that is in a form which can be used for doing work is
called *free energy*, and the Second Law states that the supply of
free energy in an isolated system cannot increase. The available free energy
in the system decreases over time, as friction and other unavoidable inefficiencies
irreversibly convert work into heat.

Before I finish with thermodynamics I want to illustrate these ideas with
a standard example. Suppose we want to convert some of the heat of a hot object
(at temperature T_{H}) into work. During the course of the process, the hot
object experiences a heat transfer Q_{H}. Remember, since we are *removing*
heat from the hot object, Q_{H} is negative, and the change in entropy is

ΔS_{H} = Q_{H}/T_{H} ,

which is also negative. If all of this heat were converted to work, then
the total entropy would decrease in violation of the Second Law. The only
hope is to use some of the energy to *increase* the entropy of some other
part of the universe---say a cold object at temperature T_{C}. If we add an
amount of heat Q_{C} to this object, then its entropy increases by the positive
amount

ΔS_{C} = Q_{C}/T_{C} .

The Second Law demands that ΔS_{H} + ΔS_{C} ≥ 0, which gives us a relation
between the heat extracted from the hot object and the heat added to the cold
one: Q_{C} ≥ (T_{C}/T_{H}) |Q_{H}|. We can use it to work out the ideal efficiency
of the process---that is, the ratio of the actual work done to the amount of
heat that is extracted in order to do it:

W/|Q_{H}| ≤ 1 - T_{C}/T_{H} .

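The ideal efficiency just described can be evaluated in a line of Python; this is a minimal sketch in which the function name and the reservoir temperatures are my own illustrative choices (roughly those of a steam engine):

```python
# Ideal (Carnot) efficiency of heat-to-work conversion between a hot
# reservoir at T_hot and a cold reservoir at T_cold (both in kelvins):
# at best a fraction 1 - T_cold/T_hot of the extracted heat becomes work.
def carnot_efficiency(T_hot, T_cold):
    return 1.0 - T_cold / T_hot

eta = carnot_efficiency(T_hot=500.0, T_cold=300.0)
assert abs(eta - 0.4) < 1e-9   # at best 40% of the heat becomes work
```

Note that the efficiency depends only on the two temperatures, not on any detail of the engine.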
What does all this have to do with computers? We want to know whether a
physical computation requires free energy. This is tantamount to asking whether
the computation process is irreversible, i.e., whether it results in an increase
in the entropy of the system (computer and environment). In short: *Does
information processing necessarily generate entropy?*

This question has been given a definitive answer in the last few years by Rolf Landauer and Charles Bennett of IBM Research. [3] Landauer first showed that:

The only computer operations that must be thermodynamically irreversible are the logically irreversible ones.

This is a beautiful and remarkable fact that is best appreciated by means of an example. Suppose we have a very simple computer with a memory of only two bits. The function of the computer is to add the two bits together and store the result as a binary number. The computer functions in this way:

before    after
00    →    00
01    →    01
10    →    01
11    →    10

The problem with this computer, of course, is that it "writes over" the two initial numbers, erasing all record of how it arrived at the outcome of its computation. This problem can easily be fixed by using a different, somewhat larger computer which writes the sum of the two bits in a different part of the memory. Imagine a four-bit computer. The first two memory cells store the input bits, as before, and the second two cells are initially set to zero. The computer adds the two bits and then writes their sum as a binary number in the second pair of memory cells:

before      after
00 00   →   00 00
01 00   →   01 01
10 00   →   10 01
11 00   →   11 10

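The contrast between the two machines is easy to mimic in Python. In this sketch (the function names are mine) the irreversible adder maps the inputs 01 and 10 to the same output, while the four-bit version keeps them distinct:

```python
# Irreversible two-bit adder: both input bits are overwritten by the sum,
# so distinct inputs can collide on the same output.
def add_irreversible(a, b):
    s = a + b
    return (s >> 1, s & 1)          # the two cells now hold only the sum

# Reversible four-bit version: the inputs are kept, and the sum is written
# into two extra cells that started out as 0.
def add_reversible(a, b):
    s = a + b
    return (a, b, s >> 1, s & 1)

# 01 and 10 collide under the irreversible adder...
assert add_irreversible(0, 1) == add_irreversible(1, 0) == (0, 1)
# ...but remain distinguishable in the reversible one.
assert add_reversible(0, 1) != add_reversible(1, 0)
```

Because no two inputs share an output, the four-bit map can always be run backwards to recover the initial state.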
A little reflection on the example should convince you that saving *both*
of the input bits is unnecessary. It is only essential to save *one*
of them, since the other could be reconstructed by subtraction. The main lesson
still holds: energy dissipation can be prevented if and only if a record is
kept of the computation that is sufficient to reconstruct the initial "input"
state of the computer from the final "output" state.

We can summarize:

- Thermodynamic irreversibility must occur in a computation only if the computation process is logically irreversible. In such a situation, entropy is produced and free energy is dissipated as heat.
- Logical irreversibility results from the *erasure* of a sufficient record of the computation. *Erasure is the only inherently irreversible computer operation.* Any other computation can be made reversible by keeping a careful record of the computer's memory states.
- It may be possible by clever design to shorten somewhat the necessary length of the computation record. It is only required that this record be sufficient for a reconstruction of the initial state of the computer.

The strategy for "reversible computing" is clear: *Never throw anything
away!* This strategy, however, has its price. The record of the computation
which is so carefully preserved is probably pretty useless (or else you would
have wanted to compute it in the first place). It is the "garbage" produced
as a by-product of a useful program. If we never erase it, our memory banks
will soon be full of old, useless garbage data that we are keeping just to
save ourselves the free energy involved in getting rid of it. If we want to
re-use our computer, we will have to erase the garbage eventually.

What, then, is the minimum thermodynamic cost of erasure? For a computer operating at a temperature T, the minimum dissipation is

Entropy: ΔS ≥ k_{B} log 2 *per bit*

Energy: ΔQ ≥ k_{B} T log 2 *per bit*

where k_{B} is Boltzmann's constant, which has a value of 1.38
x 10^{-23} J/K. Roughly speaking, each bit erased generates an entropy
increase of about 10^{-23} J/K, which at room temperature is an energy
dissipation of about 3 x 10^{-21} J, or about one-fiftieth of an
eV.
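These figures are easy to verify; a short Python check of the arithmetic, using modern values of the constants:

```python
import math

k_B = 1.380649e-23          # Boltzmann's constant, J/K
eV  = 1.602177e-19          # joules per electron-volt

dS = k_B * math.log(2)      # minimum entropy generated per bit erased
dQ = 300 * dS               # minimum heat dissipated per bit at ~room temperature

assert abs(dS - 9.57e-24) < 1e-26    # about 1e-23 J/K per bit
assert abs(dQ - 2.87e-21) < 1e-23    # about 3e-21 J per bit
assert abs(dQ / eV - 0.018) < 1e-3   # about one-fiftieth of an eV
```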

I must emphasize how far these fundamental limits are from any known computer
technology. The solid-state memory chips in your PC, for example, must be
constantly "refreshed" with electrical energy to retain their information.
Computational steps which are actually logically reversible are done in a
dissipative way; even moving information from one memory location to another,
which could in principle be accomplished without using up free energy, is
done by making a copy in the new location and then erasing the original. Finally,
real computers do not generally keep any of their "garbage", so that their
programs abound with logically irreversible steps. The consequence is that
all current computer technology uses free energy at a rate that is more than
a factor of a trillion (10^{12}) above the fundamental thermodynamic
limits, and computers therefore generate lots of waste heat as they operate.
Getting rid of this waste heat has always been a major engineering consideration
in the design of new high-performance computers. Some of the latest supercomputers
actually have their working parts immersed in a liquid fluorocarbon coolant
to improve the efficiency of waste heat removal.

Biological information processing systems are somewhat better, but still
very far from the thermodynamic limits. The transcription of genetic information
in DNA, which is information processing that proceeds at the molecular level,
still dissipates a free energy of about 20 k_{B}T per bit [4] ---and that is mere copying, which could be done reversibly.

The reason for this inefficiency in human technology and in nature is that free energy is very abundant. It is much easier to figure out ways of getting more usable energy than it is to construct a less dissipative information processing system. As any engineer knows, it is not always optimal to optimize! Nevertheless, given the engineering difficulties of waste heat management, I think it likely that human-built computers will come much closer to the thermodynamic limits over the next several decades.

I want to return to a remark I made earlier. Energy dissipation is, in
effect, the result of erasing the "garbage" along the way. Equivalently, we
could save the garbage information (the record of the computation) until the
end and erase it all at once. But the *amount* of garbage data, and
hence the amount of free energy that must be used up in erasing it, depends
upon how clever we have been in making that record of the computation. It
costs free energy to throw bits away, so the trick is to throw away as few
bits as possible.

Let me put it another way. The garbage that we do generate may be very repetitive and redundant. Consider the following excerpt from the garbage data from an imaginary program:

0010000100000100000010000100100101000001010001100000010000001001

That is, the garbage data consist mostly of 0s with a few 1s distributed
through it. The preponderance of 0s over 1s is a kind of crude pattern to
the data. It is possible to use this rough pattern to *encode* the data
into a more compact form. The following code, for instance, does a good
job:

code word    meaning
000          1
001          01
010          001
011          0001
100          00001
101          000001
110          0000001
111          0000000

The basic idea here is to code in binary notation the number of 0s between the 1s in the original data. The 64 bits of garbage data above, written in this code, become:

010 100 101 110 100 010 010 001 101 001 011 000 110 110 010

That is only 45 bits---a saving of 19 bits over the original 64.

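A minimal Python sketch of this run-length code (the function names are mine, and the sketch assumes the data ends with a 1, as the excerpt above does); since decoding recovers the original string exactly, the coding is logically reversible:

```python
# Each three-bit code word gives the number of 0s before the next 1
# (000 -> "1", 001 -> "01", ..., 110 -> "0000001"), and 111 stands for
# a run of seven 0s with no terminating 1.
def encode(bits):
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
            if run == 7:               # seven 0s and still no 1: emit 111
                out.append("111")
                run = 0
        else:                          # a 1 ends the current run of 0s
            out.append(format(run, "03b"))
            run = 0
    return "".join(out)

def decode(code):
    out = []
    for i in range(0, len(code), 3):
        n = int(code[i:i+3], 2)
        out.append("0" * 7 if n == 7 else "0" * n + "1")
    return "".join(out)

garbage = "0010000100000100000010000100100101000001010001100000010000001001"
compressed = encode(garbage)
assert decode(compressed) == garbage   # the coding is reversible
assert len(compressed) < len(garbage)  # and it compresses this data
```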
We therefore imagine a special program in our computer called a "trash
compactor". The job of the trash compactor is to examine the garbage data
generated by the main program, to look for patterns, regularities, etc., and
then to use these regularities, if they exist, to encode the garbage in a
compressed form. Since the coding is logically reversible, the trash compactor
program can perform its function without energy dissipation or the production
of additional "coding garbage". After the garbage is compressed, it may be
erased at a reduced thermodynamic cost. The more we can compress the garbage
before erasing it, the less free energy will be required. The question of
the thermodynamic cost of computation has been reduced to: *How good can
a trash compactor be?*

Before I give a straight answer to this, I want to introduce a somewhat abstract model of a computer and its workings. This model, which was invented by A. M. Turing, is easy to visualize and is susceptible to rigorous mathematical analysis. Most of the theory of computation is based upon it. As it turns out, the model is quite general enough for our purposes.

A *Turing machine* [5] is a very simple idealized
computer. It consists of a small central processing unit with a memory of
only a few bits---what is technically known as a "finite-state automaton,"
since it only has a finite number of different internal memory states. The
CPU can read or write binary digits on an infinite computer tape as it moves
back and forth according to its own very simple, hard-wired rules. This single
tape serves as the Turing machine's input, program, bulk memory, output, and
perhaps "garbage" storage. (We can imagine modified Turing machines with more
than one tape, but one is all that is really necessary.)

Figure 1: A Turing machine.

Remarkably, it turns out that this extremely simplified computing device
is capable of doing any possible computation. More precisely, there is a class
of Turing machines called *universal Turing machines *(UTMs for short)
which are capable of simulating the action of any digital computer. If you
have a program that runs on your computer, then I can come up with software
for my universal Turing machine that lets it *simulate* your machine
and thus run your program. Any computation that can be done on any digital
computer can also be done---given enough time and tape---on a UTM. This means,
among other things, that all UTMs are roughly equivalent. There is always
a simulation program of finite, fixed length that allows UTM-143, say, to
run any program designed for UTM-26, and another such simulation program
for UTM-26 to run programs designed for UTM-143. Let us stipulate that we
have chosen one particular UTM to make use of. Since any other computer we
might want to use can if necessary be simulated by our UTM, there should be
no problem.

Suppose s is a binary string, that is, a long (but finite) sequence of 1s and 0s. Some examples might be

A = 011011011011011011011011011011...

B = 01010011100001011000011010110010...

C = 0100011011000001010011100101110111...

(The ... indicates that each string is considerably longer than shown.) The length in bits of the string s is denoted |s|.

For a particular string s, we ask the following question: What programs
on our UTM have the string s as their output? In other words, which programs
will eventually write the string s on the tape and then stop? Let **P**(s)
be the set of programs on the UTM which do this. There is always at least
one program that is in **P**(s); it looks something like

WRITE 0100101...110010

STOP

where the string s = 0100101...110010. (Actually, of course, the program
is itself a binary string written in the "machine language" of the UTM, not
in the high-level language shown above. This is only a schematic of the actual
program.) There are many other programs in **P**(s)---infinitely many,
in fact. There are programs that do lots of complicated calculations whose
result is s. There are programs which do lots of irrelevant stuff, then print
out s in the manner above. There is no limit to how stupidly and inefficiently
the programs in **P**(s) can do their job.

On the other hand, there is a limit to how cleverly and efficiently the
programs can do their job. There must be a *shortest* program in **P**(s),
which we will call s*. (If there are several shortest programs of the same
length, so much the better; we can choose any of them to be s*.) s* is the
shortest program that will eventually
print out s and then stop. It is not necessarily the *fastest* program
that would do this. s* might contain lots of nested loops, perhaps, and take
a very long time to run, while the brute-force "print s" program suggested
above would run very quickly. But s* is the shortest set of instructions for
the UTM that enables it to construct the sequence s. Thus, in some sense s*
is the *shortest description* of the string s for the computer UTM.

Perhaps you already see where all this is heading. A trash compactor is
a device for replacing a string s of garbage data with one of the shorter
programs in **P**(s). The compressed garbage can always be "decoded" by
running it as a program on the UTM. A *perfect* trash compactor would
be one that would always choose the minimal program s* to represent s. This
minimal program takes the greatest advantage of whatever structure or pattern
there may be in s and describes it as succinctly as possible.

Let us consider the "compressability" of the three strings A, B, and C given above. A evidently consists of 011 repeated over and over again. Consequently, it can be described very briefly, and A* is quite short. The string B, which was generated by flipping a coin repeatedly, has no pattern at all and cannot be effectively compressed. The minimal program B* may in fact simply be the "print B" program, which is about the same length as B itself. Finally, we come to C. This string is interesting in that C* is very short, but the structure of the string is buried somewhat more deeply than the structure of A. I will return to C in a moment.

It is reasonable at this point to define the *algorithmic complexity*
K(s) of a string s to be the length of the minimal program in **P**(s);
that is,

K(s) = |s*|

This quantity, which was proposed independently by Chaitin, Kolmogorov, and Solomonoff [6], is in fact a mathematical measure of randomness. That is, strings that are very ordered and not very random can be generated by short programs, and so K(s) << |s|. (An example would be A.) Strings that are truly random without any pattern, like B, cannot be generated by programs that are significantly shorter than the strings themselves: K(s) is approximately |s|.

Note that this idea of randomness has nothing to do with probability. I
constructed the sequence C to illustrate this point. The sequence is made
by simply writing down all the one-bit sequences in binary order (0,1) followed
by the two-bit sequences in order (00,01,10,11), and so on. This is a very
simple pattern that can be generated by a very short computer program, so
K(C) is small. However, C passes many of the conventional statistical tests
for "randomness". In a long section of C, the binary digits 0 and 1 each
occur about 50% of the time. Furthermore, if the *pairs* of adjacent
bits are counted, it is found that 00, 01, 10, and 11 each occur about 25%
of the time. The various *triples* each occur about 12.5% of the time,
and so on. Statistically, it is a good candidate for randomness; algorithmically,
it is very ordered.
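A short Python sketch makes the point concrete: it generates C by the simple rule just described (the range of string lengths is my own choice) and checks the single-bit statistics:

```python
from itertools import product

# Build the string C: all 1-bit strings in binary order, then all 2-bit
# strings, and so on.  The program is very short, so K(C) is small.
C = "".join("".join(bits)
            for n in range(1, 13)
            for bits in product("01", repeat=n))

assert C.startswith("0100011011000001010011100101110111")

# Statistically, C looks random: 0 and 1 each occur about half the time.
assert abs(C.count("0") / len(C) - 0.5) < 0.01
```

An algorithmically ordered string can thus pass crude statistical tests for randomness.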

The function K(s) (which has also been called algorithmic randomness, algorithmic entropy, and algorithmic information) forms the basis for a branch of mathematics called algorithmic information theory. I would suggest that the basic idea embodied in K(s), that "complexity = length of shortest description", may have useful applications in many fields. In chemistry, for example, the complexity of a polymer might be defined as the algorithmic complexity of the sequence of its structural units. In any case, K(s) at least provides a rigorous way of defining complexity in mathematical terms, and that makes it interesting.

K(s) also has thermodynamic significance. Recall that the thermodynamic
cost of a computation is precisely the cost of erasing the useless garbage
data that is generated as a by-product. This cost can be reduced by first
running the garbage through a trash compactor program, which reversibly compresses
it. K(s) is precisely the minimum size to which a string s can be compressed.
Therefore, the erasure of the string s will produce an entropy increase
ΔS ≥ k_{B} log 2 x K(s) and an energy dissipation ΔQ ≥ k_{B} T log 2 x K(s),
with equality holding for the perfect compression s → s*. The more random
or complex the garbage, the more costly it is to dispose
of it. Highly ordered garbage can be greatly compressed, and can be erased
with less dissipation. Of course, this dissipation can be avoided entirely
if the compressed garbage is tucked away in permanent storage instead of being
erased. If a lot of empty memory is available, the computer might be able
to operate for some time before too much garbage piles up. The computer may
either use up empty memory for garbage storage or it may dispose of the garbage
and use up free energy at a rate of k_{B} T log 2 per bit erased.
Empty memory can be used to save free energy, and free energy can be used
to clear out sections of memory by erasing them. The two are interconvertible.
This leads us to our strangest conclusion so far:

Empty memory is a form of free energy.

But perhaps this is not, after all, so strange. Empty memory corresponds to a very organized state of the computer, and energy is free energy to precisely the degree that it is organized.

If empty memory is a form of free energy, then does "used" memory correspond to unusable energy---i.e., does it contribute to the entropy of the computer? This very proposal was recently put forward by Wojciech Zurek, who suggested that the total entropy of a computer should be defined as

S = S_{thermo} + k_{B} log 2 K(m) ,

where S_{thermo} is the ordinary thermodynamic entropy and m is
a binary string representing the memory state of the computer. [7] That is, the entropy has a new, algorithmic part proportional
to K(m), the smallest number of bits into which the computer's memory could
be compressed. Zurek has supported his formula with a number of fairly persuasive
physical arguments. For example, if his extended definition of entropy is
adopted, then the thermodynamic cost of information erasure is incorporated
into the Second Law---a decrease in algorithmic complexity must be paid for
by an increase in ordinary entropy of at least equal magnitude.

Zurek has also used his formula to analyze the old problem of Maxwell's demon. This demon was invented by J. C. Maxwell as a joking illustration of the ideas of statistical mechanics, but it has taken on a life of its own. [8] It is a mischievous little imp that, by carefully observing and altering the motions of individual molecules, is able to decrease the entropy of a system and apparently violate the Second Law. Zurek treated the demon as a computing device and showed that the demon, in the course of its activities, must use up a tremendous amount of free memory to store its information about molecular motions. The increase in the algorithmic complexity K(m) of the demon's memory is large enough to offset any decrease in the ordinary thermodynamic entropy the demon produces. (If the demon decides to erase its memory, of course, then it will generate entropy in its conventional form.)

The thermodynamics of computation leads us to add a new term to entropy. For an ordinary computer, this extra term is much smaller than the conventional entropy; but for a computer operating near the thermodynamic limits of computation, the extra term is crucial. It is also, as we will see next, very peculiar.

Something strange arises when we ask how to calculate K(s) for a string
s. Suppose that we had a program called FINDK on our universal Turing machine
that did exactly that: given any s as input, FINDK would calculate for a
while and then print out K(s) in binary notation. FINDK would doubtless be
very complex---let us say, for example, that FINDK is 10^{9} bits
long. Nothing prevents us from using FINDK as a subroutine of a slightly
larger program called BIGGUY. The first part of BIGGUY is a short program
that generates every binary sequence in order, beginning with shorter ones
and then increasing in length: 0, 1, 00, 01, 10, 11, 000, 001, etc. (The
program that generated C did just this.) When each sequence is generated,
it is sent to the FINDK subroutine and its algorithmic complexity is determined.
The program BIGGUY stops when it finds a string s with, say, K(s) ≥ 10^{12}.
That is, BIGGUY looks like

START: GENERATE NEXT s

EVALUATE K(s) USING FINDK

IF K(s) < 10^12 THEN GOTO START

PRINT s

STOP

BIGGUY is not very much longer than FINDK itself, about 10^{9}
bits long, since none of the other parts of it are very complicated. BIGGUY
runs for a very long time and then prints out a string s and halts. This string
has the property that its algorithmic complexity K(s), as evaluated by FINDK,
is much larger than the length of the program BIGGUY! This is a contradiction,
since by definition no program shorter than K(s) can print out s.

To summarize: If a program FINDK to calculate K(s) exists, we could imbed it as a subroutine in a program BIGGUY, which would perform the impossible task of generating a string of greater complexity than the length of BIGGUY. Since this is a contradiction, no such program FINDK can exist. Or, to put it more elegantly,

Algorithmic complexity is an uncomputable function.

Mathematically, it is easy to see that the number of different computer
programs, and hence the number of computable functions, is countably infinite.
Since the number of integer functions is itself *un* countably infinite,
it is clear that there must be many functions which cannot be computed by
any computer program. Nevertheless, it is somewhat unusual to be face-to-face
with one and even more unsettling to run across it in a *physical* context.
Remember, according to Zurek K(s) contributes a part of the total entropy
of a computer. The implication is that this entropy itself is an uncomputable
function.

Consider a trash compactor program like the ones we have discussed. If the program were a perfect one, then it would replace every string s with the minimal program s* that generates s on the UTM. But if this could be done, then we could find the length of the program s* and compute K(s). It follows that

There are no perfect trash compactors.

A trash compactor program scans the string s looking for patterns and structure that can be used to compress the description of s on the UTM. No program can possibly be written that can identify such patterns perfectly. For any trash compactor, there will always be strings that it cannot compress perfectly.

In the context of the thermodynamic cost of computation, this means that there will always be situations in which the "garbage" is erased somewhat inefficiently, so that more free energy is dissipated as heat than is absolutely necessary. What is the source of this additional dissipation? The source is the uncomputability of the function K(s).

Some of the meaning of this uncomputability can be uncovered by imagining
how we might try to design a perfect trash compactor program---that is, one
that takes a string s and tries to find the minimal program s* that prints
out s and then stops. Our strategy runs as follows: First, the trash compactor
generates every possible UTM program shorter than and including the "print
s" program. We will try to determine which of these programs are in **P**(s)
and then find the shortest program in that set. We start all of the programs
running for a very long time, say a hundred million program steps apiece.
After this, the programs will fall into three categories:

- Programs that have stopped but did not print s (not in **P**(s)).
- Programs that have stopped after printing s (definitely in **P**(s)).
- Programs that have not stopped yet.

We would like to say that s* is the shortest program in category 2. The
problem resides in category 3. Some category 3 programs will get caught in
an infinite loop or a runaway process and will never stop at all. Programs
which never stop are certainly not in **P**(s). The other category 3
programs, however, eventually will stop, and some of them could print s before
they do. Category 3 might just contain the minimal program s* in **P**(s)
that we are seeking.

If we ran all the programs long enough, eventually all the "stopping programs"
would have stopped already and only the "infinite-loop" programs would be
left in category 3; we would then know that the shortest program in category
2 is the one we are looking for. But how would we know if we had gotten to
this point? The only way that we could be absolutely certain that only the
non-halting programs remained in category 3 would be if we could somehow *recognize
non-halting programs.* We need a "halting-checker" subroutine in our
trash compactor that would do this job; we could then throw out from the
beginning all non-halting programs (since they cannot be in **P**(s) anyway)
and run all of the halting programs until they finally stopped.

If we could construct a "halting-checker" program, we could use it to construct the program FINDK that calculates the algorithmic complexity. But we know that no such FINDK program exists; therefore, we know that no "halting-checker" program exists. We have proven a famous result in the theory of computation! As Turing first showed in 1937, [9] the "halting problem" is an unsolvable one. That is, there is no computer program which can tell with certainty whether or not other programs will eventually halt. Consequently, we cannot use a halting-checker subroutine in our trash compactor program. Any actual subroutine we wrote as a halting-checker would not work perfectly. It would either fail to identify some non-halting programs (in which case the trash compactor would itself get caught in an "infinite loop"), or it would fail to identify some programs that did halt (one of which might be s*).

We have shown that the uncomputability of K(s) implies the unsolvability of the halting problem. It turns out that the converse is also true. That is, if we could construct a program FINDK, we would be able to use it to construct a "halting-checker" program.

Moreover, the unsolvability of the halting problem is equivalent to an even more famous result in mathematical logic: Gödel's Incompleteness Theorem. [10] Gödel's theorem essentially states that any formal mathematical system of sufficient complexity (complex enough to contain the positive integers and their properties) must contain statements whose truth cannot be determined from the axioms of the system. Such "undecidable propositions" cannot be proved either true or false. For example, the well-known Continuum Hypothesis in the theory of transfinite numbers is known to be undecidable within the context of ordinary set theory. [11] It is consistent with the axioms of set theory either to assert the Continuum Hypothesis or to deny it. Gödel's demonstration of the existence of undecidable propositions has had a profound impact upon mathematics and logic, but not, until now, upon physics.

The "Godelian" undecidability of formal systems is thus at the root of the uncomputability of K(s). From this uncomputability it follows that any data compression scheme ("trash compactor") is imperfect, and this gives rise to additional dissipation in the course of some computations. This effect has been called "logical friction" by Zurek, but it might better be termed "Godel friction". Godel friction is the physical dissipation of energy in a computation process due to the logical incompleteness of formal mathematical systems.
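To put a rough number on this, here is a back-of-envelope sketch of my own (the specific figures are illustrative assumptions, not from the lecture). By Landauer's principle, erasing one bit dissipates at least kT ln 2 of heat; so a trash compactor that squeezes a string down to K_est bits when K(s) bits would have sufficed dissipates an extra (K_est − K) kT ln 2 when the compressed string is erased.

```python
import math

k_B = 1.380649e-23  # Boltzmann's constant, in joules per kelvin

def excess_heat(k_est_bits, k_true_bits, temperature=300.0):
    """Extra heat (joules) from erasing an imperfectly compressed string:
    each surplus bit costs at least k_B * T * ln 2 (Landauer's principle)."""
    return (k_est_bits - k_true_bits) * k_B * temperature * math.log(2)

# A compactor that misses a hidden pattern by 100 bits, at room temperature:
q = excess_heat(1100, 1000)   # roughly 3e-19 joules
```

Even a hundred-bit miss costs only about 3 × 10⁻¹⁹ J, which is why Godel friction is ordinarily so difficult to notice.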

I have said so much about things that we cannot do that perhaps I should
list a few of the things that we *can* do, both for clarity and for reassurance.

1. We can devise trash compactors that work perfectly for some strings and pretty well for strings up to a certain length. After the length of the string becomes greater than the length of the trash compactor program, however, the program becomes less and less adept at finding "nearly-minimal" programs.
2. We can make estimates for K(s) that are close except in rare (but unpredictable) cases. We can always set an upper bound on K(s), so that we can always *overestimate* the entropy in Zurek's formula.
3. The troublesome strings are those which *look* random to our computer but in fact have some hidden pattern. We can set limits on the number of such strings that there might be under a certain length. (It turns out that most strings both look random and are random. Order is the exception, randomness the rule.)
4. Given any trash compactor or program for estimating K(s), we can always devise a more sophisticated one that works still better. Improvement is allowed; only perfection is impossible.
5. We can prove beautiful theorems about algorithmic information theory even though K(s) is uncomputable. These theorems can help our intuition and understanding of thermodynamics.
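The claim that we can always set an upper bound on K(s) can be made concrete with a toy run-length "trash compactor" (my own sketch, not the lecture's). Any lossless compressor gives an *upper* bound on K(s), because the compressed text plus a fixed-size decompressor is itself a program that prints s.

```python
def rle_compress(s):
    """Run-length encode s: 'aaaabbb' -> '4a3b'.
    (Assumes s contains no digit characters, so decoding is unambiguous.)"""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                      # scan to the end of the current run
        out.append(f"{j - i}{s[i]}")    # emit: run length, then the character
        i = j
    return "".join(out)

def k_upper_bound(s):
    """A crude overestimate of K(s): the shorter of s itself and its run-length
    encoding, ignoring the fixed cost of the decompressor."""
    return min(len(s), len(rle_compress(s)))
```

A string of fifty a's compresses to `"50a"`, so K(s) is at most 3 characters plus the decompressor's fixed overhead; a patternless string gains nothing, and the bound falls back to len(s). The overestimate is always safe; it is the *underestimates* that uncomputability forbids.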

I want to emphasize number 5. Zurek and others have used algorithmic information theory to shed an interesting new light on thermodynamics and statistical mechanics. I do not believe that we have come to the end of the insights that can be obtained in this way.

So here is the story in a nutshell: The thermodynamic efficiency of a computer depends on its effectiveness in compressing useless and unwanted data before erasing it; but because of Godel's incompleteness theorem, no computer can compress data perfectly. An abstruse theorem of mathematical logic affects the amount of waste heat produced by a computer!

This is a very strange story, one of the strangest I know. There are quite
a few questions that remain unanswered. For example, is there any way to
estimate the *magnitude* of the Godel friction of a given trash
compactor program? Since Godel friction arises because subtly-ordered strings
are mistaken for random ones, and since most strings of a given length are
nearly random, Godel friction should be a very small effect except in very
rare cases. How small? How rare? This whole frontier country between computation
and physics is still largely unexplored.

1. J. D. Bekenstein, *Phys. Rev.* **D23**, 287 ff. (1981).
2. Many elementary expositions of thermodynamics are available. See, for example, E. Fermi, *Thermodynamics*, Dover (New York), 1956.
3. See R. Landauer, *IBM J. Research* **5**, 183--191 (1961) and C. Bennett, *Int. J. Th. Phys.* **21**, 905--940 (1982).
4. Thanks to J. Lutton for this datum.
5. A. M. Turing, *Proc. Lond. Math. Soc. (ser. 2)* **42** (1936).
6. G. J. Chaitin, *Journal of the ACM* **13**, 547--569 (1966). See also Chaitin's article in *Scientific American*, May, 1975. A. N. Kolmogorov, *Information Transmission* **1**, 3--11 (1965). R. J. Solomonoff, *Information and Control* **7**, 1--22 (1964).
7. W. H. Zurek, "Algorithmic Randomness and Physical Entropy I", *Phys. Rev.* **A40**, 4731--4751 (1989).
8. See the book *Maxwell's Demon: Entropy, Information, Computing*, edited by Harvey S. Leff and Andrew F. Rex (Princeton, 1990). This collection of papers is the best single source available on the subject.
9. Turing, *op. cit.*
10. K. Godel, *On Formally Undecidable Propositions*, Basic Books (New York), 1962. This is a translation and expansion of Godel's original paper in *Monat. für Math. und Phys.* **38**, 173--198 (1931).
11. P. J. Cohen, *Set Theory and the Continuum Hypothesis*, W. A. Benjamin (Menlo Park, Calif.), 1966.

Contact: Benjamin Schumacher, Dept. of Physics. Updated 26-July-2001