Beyond Contact

A Guide to SETI and Communicating with Alien Civilizations

By Brian McConnell
March 2001
0-596-000375, Order Number: 0375
424 pages, $24.95

Chapter 12
Binary DNA

While it's interesting to be able to send a block of binary numbers across interstellar distances, this whole exercise is pointless unless we can use those bits to convey something meaningful. This is probably the biggest challenge SETI researchers face, because it forces them to think about how we share knowledge ourselves. Whether we are communicating verbally, in written form, or sending a file across the Internet, our communication can ultimately be reduced to a small set of basic elements--binary numbers, letters in the alphabet, or the basic utterances of our speech. By themselves, these basic elements have no meaning. Yet, when combined, they can be used to form words and phrases that represent everything we see and experience.

The challenge is to work out a system that enables someone to see structure (and, with the help of that structure, to derive meaning) within an otherwise endlessly repeating series of numbers. One way to visualize this is to imagine that someone who speaks a completely unknown language is sending you a coded radio message. This message consists of nothing more than short and long beeps, similar to Morse code, except the key to decoding the message is unknown. Since the sender speaks a language nobody else has encountered, you can't consult a dictionary as a guide. The only clues about how to read the message are embedded in the message itself.

So far, we've discussed the technology required to transmit a radio or lightwave carrier that can be detected at interstellar distances, as well as the techniques used to detect extraterrestrial signals and filter out various types of noise and local interference. This chapter focuses on the techniques that can be used to transmit useful information (images, text, multimedia, etc.) over a carrier signal. Throughout this chapter and the remainder of the book, we explore the various ways to organize numbers so that they convey meaning, and also contain hints about how to read the message itself.

Assumptions about alien communication

While we can only guess about what an intelligent alien species might look like, or how they might talk to each other, we can make some predictions about what types of signals and messages they will be capable of understanding. Among other things, we can assume that an alien civilization that is capable of communicating with us can understand electronics, digital computing, math, and Boolean arithmetic. We make these predictions based on the minimum level of technological capability required to detect an interstellar signal. An understanding of electronics is a prerequisite for building the sensitive amplifiers and signal processing circuitry needed to detect a weak radio signal. An understanding of math and geometry is also a prerequisite for building a properly shaped antenna for a radiotelescope.

We make these predictions because we understand the technological requirements for interstellar communication using electromagnetic radiation. These requirements define a minimum level of understanding of physics and telecommunications on both ends of the broadcast. Any civilization that is capable of communicating in this manner will, by definition, have reached a minimum level of technological sophistication.

This lowest common denominator, the understanding of electromagnetic radiation and digital computing, serves as a foundation upon which we can build a system for communication, even though we may know absolutely nothing else about the party on the other end of the line.

Aliens and electronics

One thing we do know is that communicative aliens will understand electronics. The process of transmitting or detecting radio or optical signals requires the use of electronic devices such as transistors and diodes, each of which performs a specific function within an electronic system. As we learned in Chapter 10, Teleporting Bits, transistors act like electronic valves, allowing a very small electric current to control the flow of a much larger electric current. This is the basic premise behind an amplifier.

There are several important devices required to build SETI transmission and detection systems, among them: basic electronic components (resistors, capacitors, transistors, and diodes), and digital computing devices (AND gates, memory registers, etc.).

Non-linear devices (electronic valves)

A non-linear device, or electronic valve, is a device that can adjust its resistance to carry an electrical current. A diode, for example, allows a current to flow easily in one direction through the device, but not the other; it is a one-way street for electrical current. A transistor (a basic component used to build amplifiers, logic circuits, and memory) behaves like an electrical valve. Current flows into the device through one connector and out through another. A third connector is used to control the amount of current that flows through the device by applying a small electrical signal to the device. This is conceptually similar to adjusting the valve on a water faucet, where a small mechanical force controls the flow of water through the faucet.

These devices make it possible to build a wide range of devices, including:

Amplifiers
A weak signal is used as the control signal to modulate the flow of a much larger current through a device.

Logic circuits
Transistors can be combined to form devices that process binary data (e.g., to add two binary numbers or compare two numbers).

Memory
Transistors can also be combined to form a special class of logic circuit that remembers the last binary number presented to it. This is the basis of random access memory (RAM).

Photoelectric devices

Photoelectric devices convert light into electrical signals, or convert electrical energy into light. These devices are especially important in Optical SETI (OSETI), and are important for astronomy and scientific research as a whole. These devices can be divided into two broad categories: light capturing devices and light emitting devices, which may be defined as follows:

Light capturing devices
These devices convert incoming light into an electrical current. They are based on the photoelectric effect, the phenomenon that Albert Einstein described, and for which he won the Nobel Prize in Physics in 1921. Among the devices we've developed in this category are solar cells and charge-coupled devices, or CCDs (sensitive light detectors that are used in a wide variety of applications in astronomy).

Light emitting devices
These devices convert electrical energy into light. One example is a light emitting diode (LED), which efficiently converts electrical energy into light. Unlike an incandescent light, which emits many colors of light, an LED emits light that is tuned to a narrow range of colors. Solid-state lasers, which were derived from the work done on LEDs, operate in a similar manner, except they emit a narrowly focused beam of coherent light.

It's not important for an alien inventor to build a carbon copy of our silicon transistor, or to invent each of these things in the same sequence. However, it is necessary that it build devices that are functionally equivalent to many of these components in order to build the equipment required to detect an interstellar signal. This is important because the civilization, while it may not have invented these devices in the same order, must have discovered most of them in order to build the hardware required to communicate wirelessly.

Aliens and digital computing

Electronic valves and computing are closely related, so the next thing we must assume is that aliens understand digital computers. A digital computer, with its ability to process information presented in the form of minute electrical signals, is built from a large array of simple circuits that are built using interconnected non-linear devices. One such example is the flip-flop memory register shown in Figure 12-1.

Figure 12-1. This circuit, composed of NAND gates, which are built with transistors, stores a single bit of information.

NOTE: A NAND gate is the equivalent of an AND circuit followed by a NOT circuit. This means that a NAND gate produces a 1 whenever at least one of its inputs, A or B, is 0, and produces a 0 only when both inputs are 1.
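To make the idea of a one-bit memory concrete, here is a minimal sketch of two cross-coupled NAND gates. The exact wiring of Figure 12-1 is not reproduced here; the class NandLatch, its active-low set_n and reset_n inputs, and the settling loop are illustrative assumptions rather than a description of the circuit in the figure.

```python
def nand(a: int, b: int) -> int:
    """Return 0 only when both inputs are 1; otherwise return 1."""
    return 0 if (a == 1 and b == 1) else 1


class NandLatch:
    """Two cross-coupled NAND gates that remember one bit (hypothetical model).

    The inputs are active-low: pulling set_n to 0 stores a 1,
    pulling reset_n to 0 stores a 0, and holding both at 1 keeps
    whatever bit was stored last.
    """

    def __init__(self) -> None:
        self.q = 0
        self.q_bar = 1

    def step(self, set_n: int, reset_n: int) -> int:
        # Re-evaluate both gates a few times so the feedback loop settles.
        for _ in range(4):
            q = nand(set_n, self.q_bar)
            q_bar = nand(reset_n, q)
            self.q, self.q_bar = q, q_bar
        return self.q


latch = NandLatch()
print(latch.step(0, 1))  # store a 1
print(latch.step(1, 1))  # hold: the stored bit is unchanged
print(latch.step(1, 0))  # store a 0
print(latch.step(1, 1))  # hold again
```

The point of the sketch is that memory appears as soon as the output of one gate is fed back into the other. No component beyond the NAND gate itself is needed.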

Once a civilization has invented an electronic valve, whether it is a vacuum tube or solid-state transistor, it will only be a matter of time before it also discovers that these devices can be connected together to build digital memory and information processing circuitry. While it's impossible to predict the sequence of invention, it's reasonable to conclude that a civilization capable of communicating by radio or laser has also discovered transistor-like devices, and is therefore likely to have discovered digital computing.

Any civilization that makes a serious attempt to communicate must understand the importance of computing in communication, especially computing related to the detection of radio signals. Because of this, we can expect that a civilization capable of building an interstellar beacon has probably discovered digital computing in some form.

Aliens who understand math

An understanding of mathematics and geometry, as we know them, is another implicit requirement for interstellar communication. The best shape to use for a radiotelescope is a parabolic dish. If you cut a cross-section through such an antenna, the shape of the cross-section would be a parabola (Figure 12-2). The reason for this has to do with geometry. A parabola is defined by the formula y = ax^2 + bx + c. This equation should be familiar to most people who've taken algebra or geometry. The curve this formula describes is also an ideal shape for a mirror: an antenna with this shape reflects incoming radiation arriving parallel to its axis so that it converges on a single focal point, much like a solar oven (Figure 12-3).
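The focusing property can be checked numerically. The sketch below assumes the simplest parabola, y = ax^2 (b and c set to zero), and rays arriving straight down, parallel to the axis of the dish. The function name axis_crossing is our own. For this curve the focal point sits at a height of 1/(4a) above the bottom of the dish, so every reflected ray should cross the axis at that height.

```python
def axis_crossing(a: float, x: float) -> float:
    """Height where a vertical ray that hits y = a*x**2 at position x
    crosses the axis of the dish after reflecting off the surface.

    x must be non-zero: a ray travelling along the axis itself
    simply bounces straight back.
    """
    slope = 2 * a * x                # slope of the dish at the point of impact
    norm_sq = 1 + slope ** 2         # squared length of the normal (-slope, 1)
    # Mirror the incoming direction (0, -1) about the surface normal.
    rx = -2 * slope / norm_sq
    ry = -1 + 2 / norm_sq
    t = -x / rx                      # distance along the ray to reach x = 0
    return a * x ** 2 + t * ry


a = 0.25
for x in (0.5, 1.0, 2.0, 5.0):
    print(x, axis_crossing(a, x), 1 / (4 * a))
```

Each ray, whether it strikes near the center or near the rim, should report the same crossing height, which is what makes the shape useful as an antenna.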

The development of electronics is closely tied to math because electricity is invisible. It's impossible to see the current flowing through a wire, or to see what's happening inside of a transistor, as shown in Figure 12-4 and Figure 12-5. To reliably design electronic devices, the inventors must be able to model their behavior using mathematical equations.

The following equations describe the behavior of the simplified transistor depicted in Figure 12-5:

Iab = Vab / R

R = Vcb * 1000

The first equation tells us that the current flowing through the device (Iab) is equal to the voltage applied to the device (Vab) divided by its resistance to electrical current (R). This value, R, is in turn determined by the voltage applied to the third connector (Vcb). The resistance of the device is equal to the control voltage multiplied by 1000. By increasing the control voltage applied to the device, we increase its resistance to electrical current, and decrease the amount of current flowing through the device.
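The two equations translate directly into a few lines of code. In this sketch the function names are our own, and the units (volts, ohms, and amperes) are an assumption, since the simplified model does not state them.

```python
def resistance(v_cb: float) -> float:
    """R = Vcb * 1000: the control voltage sets the resistance."""
    return v_cb * 1000


def current(v_ab: float, v_cb: float) -> float:
    """Iab = Vab / R: current through the device for a given control voltage."""
    return v_ab / resistance(v_cb)


# Hold the applied voltage at 10 and raise the control voltage step by step.
for v_cb in (1.0, 2.0, 5.0):
    print(v_cb, resistance(v_cb), current(10.0, v_cb))
```

Doubling the control voltage from 1 to 2 doubles the resistance from 1000 to 2000 and halves the current, which is the valve-like behavior described above.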

Figure 12-2. A parabola is the shape of the cross-section of an antenna used for radioastronomy.

Figure 12-3. A parabolic mirror reflects light from a distant source so that it converges on a central focal point.

Figure 12-4. A simple transistor.

Figure 12-5. A simple transistor shown as part of an electrical circuit.

This example greatly oversimplifies the behavior of a transistor; however, the basic idea is the same. By modifying the control voltage (Vcb), it is possible to control the flow of a much larger current through the device. The equations used to describe a real transistor are more complex and take a large number of factors into account, such as the temperature of the device, the effect of current leaking from the control input through the device, and many others. The important point in all of this is that to build transistor-like devices, it is very helpful to know how to use mathematics to first model their behavior.

Therefore, any civilization that is capable of interstellar communication must have discovered electronics. It is also likely that it understands math and geometry. While it's possible that a civilization could have discovered electricity without understanding math, it's hard to imagine it constructively using electricity without being able to model devices mathematically.

Aliens and binary code

We can conclude from these predictions that a communicating alien civilization will most likely understand electronics, digital computing, and mathematics. Therefore, it's a reasonable assumption that it will understand, or at least be capable of understanding, binary arithmetic. If it learns to build digital memory and logic circuits, it will learn binary code (since it is closely related to the behavior of these devices).

Even though we may have very little in common with an alien civilization, it is likely that both parties will be able to understand and manipulate digitally coded information. This is a fancy way of saying that we will be able to communicate by exchanging computer programs written in binary code.

The point isn't to speculate that aliens would instantly recognize a binary encoded message, but merely to point out that in order to detect a signal at all, they would need to have developed most of these skills. Even if they had stalled out at our level of technological development, a minimum requirement to detect an interstellar signal, they would already know most of what is required to detect and begin deciphering this type of interstellar message.

Lingua numerica

The real challenge in all of this is figuring out how to embed meaning in a string of binary numbers. This forces us to think about what communication and symbols really mean, without getting caught in the trap of assuming that aliens will automatically understand ideas or concepts that are obvious to us (like the concept of emotional states).

Instead of speculating about what message aliens might choose to send to us, we will look at how to construct a message that can convey large amounts of useful information while at the same time can be easily decoded. This is just one of many possible approaches we could use to do this. The goal of this section is not to provide an encyclopedic overview of every SETI communication system that has been proposed. Instead, we focus here on describing one system from start to finish and how it can be applied to the task of sending a detailed message to an alien civilization.

The system that this book describes is the outgrowth of the author's work on a system for developing and distributing software via public data networks, such as the Internet. The general approach borrows ideas from genetics and artificial intelligence research (especially semantic networks). By combining these ideas, it is possible to create a system that is useful not only as a way to demonstrate how an alien message might be organized, but also as a framework for a new system for developing and distributing computer software.

The basic objective of the technique is to devise a system for transmitting machine instructions (computer programs) and for building a symbolic vocabulary based on numbers. We do this because we can describe computer programs using a basic vocabulary of fewer than 100 mathematical and logical symbols, all of which can be easily described to a recipient who understands electronics and digital computing. Although their underlying instructions may be simple, computer programs can exhibit complex behavior. We can use computer programs to perform an infinite variety of tasks, from displaying a compressed image to illustrating a complex situation, such as a simulated fly-by of the Earth. These computer programs, embedded throughout the message, will be the built-in clues that guide the reader in deciphering the message itself and in learning the meaning of symbols that represent abstract ideas.

The important point here is that it is possible to use computer programs to describe complex ideas in an abbreviated format. Take the idea of gravity, for example. Gravity is an abstract concept that cannot be seen directly, and as such, is hard to describe to someone who does not already know what you're talking about. Imagine trying to explain the word gravity to an alien using still images or symbols alone. It's kind of like playing a game of Pictionary(TM) over the telephone. If you can describe the idea with a computer program, you can recreate the behavior of objects under the influence of gravity in a simulation.

Imagine, for a moment, that you write a short computer program that simulates the interaction of several objects according to the laws of gravity. Such a program can be quite brief, consisting primarily of the equations that describe the gravitational force between two objects based on their masses and the distance between them, together with the rules for updating each object's velocity and position.

The simulation would automatically run through an endless series of randomly generated examples. Some examples would depict two large objects orbiting each other at a distance, while others would depict a large number of smaller objects interacting with each other. This program could run for as long as the viewer allowed it. The program itself would only be a few hundred or a few thousand bytes long, yet it could depict an infinite variety of situations, all of which are consistent with the laws of gravity. The recipient viewing this program is going to notice a common theme in all of the simulated scenarios: they're all consistent with the laws of gravity. They will then be able to infer that the program is related to the concept of gravity. So, with one small program, we can convey a comprehensive description of gravity, and later use this to define symbols related to this idea. We can apply this approach to a wide range of ideas, and use it to build a vocabulary of practical and abstract concepts that can be used to build a simplified language.
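To give a sense of how short such a program can be, here is a rough sketch of a gravity simulation in two dimensions. It is written in a present-day programming language purely for illustration and is not the message format described in this book. The names, the arbitrary units (G = 1), the softening term that avoids division by zero, and the simple time-stepping rule are all assumptions made for the example.

```python
import random

G = 1.0  # gravitational constant in arbitrary simulation units (an assumption)


def random_bodies(count, seed=None):
    """Create bodies with random masses, positions, and velocities."""
    rng = random.Random(seed)
    return [
        {
            "mass": rng.uniform(1.0, 10.0),
            "pos": [rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)],
            "vel": [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)],
        }
        for _ in range(count)
    ]


def step(bodies, dt=0.01, softening=0.1):
    """Advance every body by one time step under mutual gravity."""
    accel = [[0.0, 0.0] for _ in bodies]
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            a, b = bodies[i], bodies[j]
            dx = b["pos"][0] - a["pos"][0]
            dy = b["pos"][1] - a["pos"][1]
            # Softening keeps the force finite if two bodies nearly collide.
            dist_sq = dx * dx + dy * dy + softening * softening
            pull = G / (dist_sq * dist_sq ** 0.5)
            # Each body accelerates toward the other, in proportion to
            # the other body's mass and the inverse square of the distance.
            accel[i][0] += b["mass"] * dx * pull
            accel[i][1] += b["mass"] * dy * pull
            accel[j][0] -= a["mass"] * dx * pull
            accel[j][1] -= a["mass"] * dy * pull
    for body, (ax, ay) in zip(bodies, accel):
        body["vel"][0] += ax * dt
        body["vel"][1] += ay * dt
        body["pos"][0] += body["vel"][0] * dt
        body["pos"][1] += body["vel"][1] * dt


def total_momentum(bodies):
    """Sum of mass times velocity over all bodies; gravity should preserve it."""
    px = sum(b["mass"] * b["vel"][0] for b in bodies)
    py = sum(b["mass"] * b["vel"][1] for b in bodies)
    return px, py
```

Calling random_bodies with a different seed and then calling step repeatedly produces a new scenario each time. Every scenario obeys the same rule, and that shared rule is what a viewer would be expected to notice.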

By using this approach, which we'll describe in detail in the upcoming chapters, we can create a message format that is very easy to decode and that can also convey an infinite diversity of messages, from computer programs to high-resolution color images. We're not limited to sending scientific equations or digital stick figures; we can build a general-purpose symbolic language that allows us to say pretty much anything we'd like to.

Just as we can use this general technique to tell our story to anyone who cares to listen, we can use this as a guide to anticipate how other civilizations might initiate contact with their neighbors.

Genes, memes, and igenes

The blueprint stored in DNA, an organism's genome, is, in effect, the program that describes how an organism builds itself and functions throughout its life. This information is subdivided into many discrete packages of instructions (genes). Each gene is typically associated with a particular function or trait (such as the instructions for producing the hemoglobin molecule used by red blood cells). An organism's DNA program is not read in its entirety from start to finish, but is broken down into many smaller units, each of which can be accessed as needed.

An igene, like a gene, is a set of computer instructions that can be incorporated into other, more complex programs. Just as the gene for hemoglobin doesn't describe how to build an entire blood cell, an igene that describes how to calculate the sine of a number is a component that deals with a small part of a larger task. This modular approach for packaging instructions allows us to create symbols that are shorthand for an otherwise complicated set of instructions, and to combine these symbols to describe complex processes in shorthand form. The key difference between an igene and a gene is that an igene contains computation instructions, whereas a gene describes how to build a protein that is used by an organism.

igenes can be used to build a symbolic vocabulary that allows us to perform a wide variety of math and computational tricks. The igenes can be combined to create complex systems, even though their building blocks are quite simple. We can use igenes to describe computer programs, and reusable segments of computer programs. We can then use these programs to do all sorts of things, like perform calculations, simulate systems, display images, or anything we normally do with conventional programs; the possibilities are limitless.

In Chapter 15, Concepts and igenes, we'll describe how to build programs composed of igenes to describe memes, or abstract symbols. Memes, as we discuss in Chapter 19, Abstract Symbols and Language, are shorthand for abstract ideas. We'll do this by writing computer programs that display pictures of objects, simulate the situations or processes we'd like to describe, and so forth. This memetic vocabulary will form the foundation for an abstract language that we can use to describe a wide variety of concepts. All of this can be done starting with an apparently meaningless sequence of ones and zeros.

Hiding structure in numbers

When an alien civilization first encounters our hypothetical radio message (or we detect a similar message ourselves), all they will see is an apparently endless stream of binary numbers. At first, the message will appear to be hopelessly jumbled and devoid of meaning. If it is not formatted so that it contains clues about its structure, the party on the receiving end may never figure out how it is organized, and will never be able to get beyond receiving the signal.

The first step on the path to comprehension is to figure out how the series of numbers is organized into the equivalent of words, groups of words, groups of groups of words, and so on. A good metaphor to use is the general format of the information encoded in DNA. Learning the format of DNA is not the same thing as learning the meaning of the instructions encoded in DNA. The first step is to figure out how the information encoded in a genome is broken down into smaller subunits of information. Knowing how genetic information is organized doesn't mean that we understand what every gene does, but merely that we know how to parse this data into the equivalent of words and sentences.

DNA does this by using special sequences of base pairs, the genetic equivalent of letters, to denote the end of a sequence of instructions. We will do something similar with our series of binary numbers by using special sequences of binary digits to play the role of parentheses. We'll use these parentheses to bracket other numbers and to create groups of numbers, and groups of groups of numbers, as in the following example:

(((1001)(1010)) ((1000)(1010)) ((1001)(0101))
  ((1000)(1100)) ((1001)(0101)))

Which can be read in decimal form as:

(((9)(10)) ((8)(10)) ((9)(5)) ((8)(12)) ((9)(5)))

This example, thanks to the parentheses, is easy to break down into words and groups of words, and groups of groups of words. While this doesn't tell us anything about what this statement means, it does tell us where to start in breaking the message down for analysis. Now compare the above example with the following statement:

1001101010001010100101011000110010010101

This is merely the first example without the parentheses. Imagine that you had received a message like this, except that instead of a few digits, you were looking at millions or billions of digits. Where does one word end and the next begin? How could you tell whether words are combined to form groups or groups of groups? Without a clue about how the message is organized into subunits, it would be very difficult to interpret. This type of structure tells the receiver how to parse the message, or break it down into its basic units.
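The value of the parentheses is easy to demonstrate. The sketch below is our own illustration, and the function name parse is hypothetical. It turns the bracketed example into a nested list of decimal numbers, then strips the parentheses to show what is left without them.

```python
def parse(text):
    """Turn a string of binary words and parentheses into nested lists.

    Each "(" opens a new group, each ")" closes the current group,
    and runs of binary digits are converted to decimal numbers.
    Spaces and line breaks are ignored.
    """
    stack = [[]]
    word = ""
    for ch in text:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if word:
                stack[-1].append(int(word, 2))
                word = ""
            finished = stack.pop()
            stack[-1].append(finished)
        elif ch in "01":
            word += ch
    return stack[0]


example = (
    "(((1001)(1010)) ((1000)(1010)) ((1001)(0101)) "
    "((1000)(1100)) ((1001)(0101)))"
)

print(parse(example))

# Without the parentheses, only an undivided run of digits remains.
print("".join(ch for ch in example if ch in "01"))
```

With the parentheses in place, a short loop recovers every word and every level of grouping. Without them, nothing in the digits themselves says where one word stops and the next begins.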

Binary biology

So, how can we describe the idea of a parenthesis to an alien? Biology can teach us a lot about how to pack an immense amount of information into a small package. The entire human genome, in effect the program required to build a human being, consists of about 3 billion DNA base pairs, or roughly 6 billion bits of information. This is roughly equivalent to the amount of information stored on a single CD-ROM. What this demonstrates is that nature can condense everything it takes to build a human into this space, whereas Microsoft can barely manage to squeeze its suite of Office software into the same real estate. We can learn a great deal from nature's economy of words.

The DNA molecule encodes the information needed to build most life forms on Earth. The DNA molecule uses different combinations of base pairs to represent different amino acids, which are used to assemble more complex molecules, called proteins. Special combinations of base pairs represent the start of a series of instructions to build a protein. The DNA molecule encodes information using four different molecules: adenine, thymine, cytosine, and guanine (Figure 12-6).

Figure 12-6. Adenine, thymine, cytosine, and guanine: the basic units used to form genetic words in DNA.

These four molecules, when found in groups of three (or triplets), form basic genetic words. These words can represent an amino acid (a building block for proteins), or they can represent a special "stop" word to mark the end of a sequence (or a word at the end of a sentence).

It's helpful to think of DNA encoding as we would an alphabet, just like the English alphabet. Forget, for a moment, about the fact that we're dealing with chemicals here. Think of each base pair as a letter: A for adenine, T for thymine, C for cytosine, and G for guanine.

By itself, a single letter means nothing. In order for them to mean something, these letters must be combined to form words. The information in DNA is organized in triplets, or sets of three base pairs. These triplets are analogous to words in English. For example, the sequence GAC is the word that represents aspartic acid. The format of DNA is more rigid than English. Every word in DNA is built from three base pairs (letters), no more, no less. These triplets are also referred to as codons. Since DNA uses three letters (molecules) to represent each word, and the letters have four possible values, DNA can encode a total of 64 different states in each three-character (letter) word. This means that DNA could, in theory, encode a maximum of 64 different words.
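The count of 64 follows from simple arithmetic: four choices for each of three positions gives 4 x 4 x 4 = 64. As a quick check, the short sketch below lists every possible three-letter word; the variable names are our own.

```python
from itertools import product

LETTERS = "ATCG"  # adenine, thymine, cytosine, guanine

# Every possible three-letter word that can be spelled with four letters.
codons = ["".join(triplet) for triplet in product(LETTERS, repeat=3)]

print(len(codons))      # number of distinct three-letter words
print(codons[:8])       # the first few words in the list
print("GAC" in codons)  # the word for aspartic acid is among them
```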

In practice, the standard genetic code specifies only 20 amino acids (see Table 12-1). This is because several different combinations can code for a single amino acid. (This appears to be an error-correction mechanism, so a random error in transcribing one letter in a DNA word will not necessarily produce the wrong amino acid.)

Table 12-1: DNA codons and their meanings

Amino Acid       DNA Codons
Aspartic Acid    GAC, GAT
Glutamic Acid    GAA, GAG
...

These words can, in turn, be combined to form groups of words. The German language offers a good analogy. Many German words are formed by combining several words to create a single compound word. For example, combining the words Arm, Band, and Uhr forms the word Armbanduhr, which translates into English as wristwatch, or literally, "arm + band + clock." In the same way, several DNA words can be strung together to describe a protein that is formed using several amino acids.

A typical DNA instruction can be read in a format such as the following:

STOP : Alanine + Aspartic Acid + Glycine + Alanine : STOP

This instruction can be read as: "Build a protein by combining alanine, aspartic acid, glycine, and alanine." It would be coded in DNA as:


The Stop instruction is important because an organism would otherwise produce infinitely long, tangled blobs of amino acids, instead of useful proteins that perform a specific function such as transporting oxygen in blood (i.e., hemoglobin).
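The role of the stop word can be sketched in a few lines of code. The small codon table below is an assumption made for illustration. It contains only four entries from the standard genetic code: GAC is the word for aspartic acid mentioned earlier, while GCT (alanine), GGT (glycine), and TAA (stop) were picked arbitrarily from the several codons that carry those meanings. The function name read_proteins is our own.

```python
# A tiny subset of the standard genetic code, enough for this example only.
CODONS = {
    "GCT": "Alanine",
    "GAC": "Aspartic Acid",
    "GGT": "Glycine",
    "TAA": "STOP",
}


def read_proteins(dna):
    """Split a DNA string into three-letter words and group them by STOP words."""
    words = [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]
    proteins, current = [], []
    for word in words:
        if word == "STOP":
            # A stop word closes off the protein built so far, if any.
            if current:
                proteins.append(current)
                current = []
        else:
            current.append(word)
    # Amino acids that are never followed by a STOP are left unfinished.
    return proteins


# One possible spelling of: STOP : Alanine + Aspartic Acid + Glycine + Alanine : STOP
print(read_proteins("TAAGCTGACGGTGCTTAA"))
```

Remove the final TAA and the function returns nothing at all, because it never learns where the protein is supposed to end.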

We're going to do something similar with our endless stream of binary digits by creating two special series of "start" and "stop" instructions. One series, 111000111000111000, will always indicate the start of a word. Another series, 101000101000101000, will always indicate the end of a word. So, whenever we see the sequence of digits 111000111000111000, we see an open parenthesis, "(", and whenever we see the sequence of digits 101000101000101000, it can be interpreted as a close parenthesis, ")".

NOTE: The number 111000111000111000 doesn't have an inherent meaning. This number was chosen at random to use as an example throughout the book. In a real system, the sender could use any string of digits as a delimiter to separate symbols and groups of symbols.
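As a sketch of how a sender might apply these two delimiters, the function below wraps words and groups of words in the start and stop sequences. The function name encode and the use of nested lists to represent groups are our own assumptions, not part of the system described in the book.

```python
OPEN = "111000111000111000"   # plays the role of "("
CLOSE = "101000101000101000"  # plays the role of ")"


def encode(item):
    """Wrap a binary word, or a list of words and groups, in delimiters.

    A string such as "1001" is treated as a single word.
    A list is treated as a group whose members are encoded in turn.
    """
    if isinstance(item, str):
        body = item
    else:
        body = "".join(encode(member) for member in item)
    return OPEN + body + CLOSE


print(encode("1001"))                 # one word: (1001)
print(encode(["1001", "1010"]))       # a group of two words: ((1001)(1010))
```

One caveat a real system would need to handle: if the data inside a word happened to contain one of the delimiter sequences, the receiver could mistake it for a parenthesis.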

Parentheses can also be described by assigning special states to the transmitted signal itself. If, for example, the message is embedded in a pulsed laser beacon, the sender could use special colors to describe "(" and ")" symbols (e.g., red=0, orange=1, yellow="open parenthesis," green="close parenthesis").

At first, the recipient of the message will see nothing more than a series of binary numbers with no apparent beginning or end. However, upon closer inspection, there will be certain sequences of digits that recur throughout the message. One of the first things our recipient will do is to start analyzing the series of digits to look for order or repeating patterns in the message. The open parenthesis and close parenthesis sequences will appear repeatedly throughout the message. The recipient will most likely look for repeating patterns in the message by analyzing the frequency with which different combinations of digits appear. This type of analysis, although it requires a lot of computation, is fairly easy to do.

The trend that this frequency analysis will reveal is that these two series of digits occur repeatedly throughout the message. They also occur in a predictable order. An open parenthesis symbol will be followed by some data, and then by a close parenthesis symbol. The open and close parenthesis symbols will also be encountered in equal numbers throughout the message as a whole.
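A rough sketch of this kind of frequency analysis appears below. It slides a window along the digits and counts how often each run of digits appears. The sample message, the function name count_windows, and the choice of an 18-digit window are assumptions made for the example. A real recipient would not know the window size in advance and would have to try many sizes.

```python
from collections import Counter

OPEN = "111000111000111000"
CLOSE = "101000101000101000"


def count_windows(bits, width):
    """Count every run of `width` consecutive digits, moving one digit at a time."""
    return Counter(bits[i:i + width] for i in range(len(bits) - width + 1))


# A sample message: five short words, each bracketed, inside one outer group.
words = ["1001", "11001100", "0001", "010101", "11001100"]
message = OPEN + "".join(OPEN + w + CLOSE for w in words) + CLOSE

counts = count_windows(message, len(OPEN))
for pattern, hits in counts.most_common(5):
    print(pattern, hits)

print("open:", counts[OPEN], "close:", counts[CLOSE])
```

In a message this short, shifted copies of the delimiters also score highly, because the delimiters are themselves repetitive. The pattern should stand out more clearly as the message grows longer and the data between the delimiters becomes more varied.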

This, by itself, does not reveal the meaning of the open parenthesis and close parenthesis sequences, but their use throughout the message is a strong indication that these sequences are important to deciphering the message.

The recipient will also know that, if the message contains useful information, it will most likely be organized into smaller units and subunits of information. So, the first thing the recipient will want to do is figure out how to parse the message, or to break it down into those smaller blocks of data.

Once the recipient discovers that the "(" and ")" sequences appear throughout the message, and sees that they almost always occur in pairs, it should be fairly easy to figure out that they are being used to bracket information--to define the start and end of a word or group of words. They will most likely try many different approaches before finding the right solution. When the recipient figures out that the "(" symbol equals 111000111000111000 and the ")" symbol equals 101000101000101000, the basic structure of the message will be revealed. For example:

1110001110001110001110001110001110001001101000101000101000 1110001110001110001100110010100010100010100011100011100011 1000000110100010100010100011100011100011100001010110100010 1000101000111000111000111000110011001010001010001010001010 00101000101000

Although we can see a repeating pattern in this sequence of digits, it is difficult to see how this message is organized. What we'll do now is to replace the sequence 111000111000111000 with a "(" symbol to denote the start of a word. We'll replace the sequence 101000101000101000 with the symbol ")" to denote the end of a word. When we perform the translation, the series of digits above is reduced to:

((1001)(11001100)(0001)(010101)(11001100))

Now the structure of this message is revealed. We can see relatively short words bracketed by "(" and ")" symbols, and can also see groups of words bracketed by "(" and ")" symbols to create expressions.
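The translation itself takes only a couple of lines. The sketch below applies a plain search and replace to the digits from the example; the function name reveal_structure is our own.

```python
OPEN = "111000111000111000"   # stands for "("
CLOSE = "101000101000101000"  # stands for ")"

# The digits from the example, joined into one unbroken string.
raw = (
    "1110001110001110001110001110001110001001101000101000101000"
    "1110001110001110001100110010100010100010100011100011100011"
    "1000000110100010100010100011100011100011100001010110100010"
    "1000101000111000111000111000110011001010001010001010001010"
    "00101000101000"
)


def reveal_structure(bits):
    """Swap each delimiter sequence for a parenthesis, as a word processor would."""
    return bits.replace(OPEN, "(").replace(CLOSE, ")")


print(reveal_structure(raw))
```

A plain search and replace is enough here because none of the words in this example resembles a delimiter. A more careful decoder would read the digits from left to right and keep track of how deeply the parentheses are nested.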

While this doesn't tell the recipient anything about what the words mean, it does reveal the basic structure of the message. Once the recipient can parse the message into individual words and groups of words, they can then set about the task of determining what these numeric words mean.

The basic trick we're using here is to introduce an obvious, repeating pattern into an otherwise unintelligible message. In effect, what we're doing is repeating the following message over and over:

{useful information starts here} 101010111100110110101010111111
{useful information ends here} {useful information starts here} 1011111100000111 {useful information ends here}

This approach allows the recipient to discern the basic structure of the message using only simple statistical analysis tools. This approach may also be easy to decipher because it mimics the way information is coded in biological systems, and therefore may look familiar to our distant recipient (assuming biological information is encoded in something similar to DNA on other planets, which it may not be).

Once these special symbols are known, a simple computer program could be employed to perform a search and replace operation, much like a word processor does. At this point, the goal is not to translate the message itself, but to figure out how it is organized into symbols, groups of symbols, groups of groups of symbols, and so on. Next, we'll look at how we can use this system to create a vocabulary of symbols that we can use to build a progressively more and more sophisticated message.

© 2001, O'Reilly & Associates, Inc.