Anticryptography: The Next Frontier in Computer Science

by Brian McConnell
02/26/2001

Example 1: Teaching addition using a set of examples

1?1=2
1?2=3
2?2=4
2?5=7

Conclusion: ? means "+"

The problem with Lincos and similar languages was their limited scope. While they could be used to describe a concept in purely mathematical terms, many ideas do not lend themselves to such a neat and tidy description. But what if you extended this concept to create a rudimentary programming language?

What Can You Say with Computer Programs That You Can't Say with Simple Equations?

A lot.

Think about all the different tasks that a desktop computer can perform. You can use it to communicate (using text, images, and sound). You can use it to process large volumes of data. You can use it to create a virtual environment (a flight simulator, for example). The number of potential uses for computers is limited only by the creativity of software developers. Yet, for all of their complexity, the basic language used by computers is very simple and consists of a few dozen to, at most, a few hundred basic machine instructions.

To create a rudimentary programming language, all you need is about two dozen symbols (a sketch of an interpreter for such a symbol set follows this list):

  • Delimiting characters (e.g., "(", ")")
  • Basic arithmetic operators (e.g., ADD, SUB, MULT, DIV)
  • Boolean arithmetic and logic (e.g., TRUE, FALSE, AND, OR, NOT)
  • Comparison operators (e.g., GT, GTE, LT, LTE, EQ, NEQ)
  • Variables (e.g., STORE{n}, RECALL{n})
  • Branching (e.g., JUMP{n}, GOSUB{n}, IF {x} DO {y})
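
For concreteness, here is what an interpreter for such a symbol set might look like, written as a short Python sketch. The opcode numbers and the nested prefix-expression encoding are assumptions made for illustration; the article does not commit to any particular machine format.

# A minimal sketch of an interpreter for the symbol set above. The
# opcode numbers and the nested prefix-expression encoding are
# illustrative assumptions, not a format the article commits to.

ADD, SUB, MULT, DIV = 1, 2, 3, 4       # arithmetic
NOT, GT, EQ = 14, 20, 24               # logic and comparison
STORE, RECALL, IF = 30, 31, 42         # variables and branching

def evaluate(expr, mem):
    """Evaluate a nested (opcode, argument, ...) tuple; bare numbers are literals."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    if op == STORE:                    # (STORE, n, expression)
        mem[args[0]] = evaluate(args[1], mem)
        return mem[args[0]]
    if op == RECALL:                   # (RECALL, n)
        return mem[args[0]]
    if op == IF:                       # (IF, condition, consequent)
        return evaluate(args[1], mem) if evaluate(args[0], mem) else None
    vals = [evaluate(a, mem) for a in args]
    if op == NOT:
        return not vals[0]
    a, b = vals
    if op == ADD:  return a + b
    if op == SUB:  return a - b
    if op == MULT: return a * b
    if op == DIV:  return a / b
    if op == GT:   return a > b
    if op == EQ:   return a == b

mem = {}
# "if 5 > 2, then store 1 + 2 in variable 0"
evaluate((IF, (GT, 5, 2), (STORE, 0, (ADD, 1, 2))), mem)
print(mem[0])   # prints 3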

All of these symbols can easily be taught to someone who understands math. While the behavior of a program may be complex, the individual symbols used to build it are easy to teach.

How Would You Teach Another Civilization to Recognize This Language?

Because the basic vocabulary is small (about two dozen symbols), and because the symbols are based on universal mathematical concepts, they can be taught using sets of true and false examples. This process can be fairly straightforward. By sending examples similar to those in Example 1, the sender can train the recipient to associate numbers with specific symbols or commands.
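
As a sketch of what this training looks like from the recipient's side, the short Python program below checks Example 1's statements against a set of candidate operations and keeps the ones that fit every example. The candidate set is an assumption made for illustration; a real recipient would draw on symbols it had already learned.

# Sketch of the recipient's side of the training process: given true
# statements of the form "a ? b = r", find which known operations "?"
# could stand for. The candidate set is assumed for illustration.

import operator

candidates = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "max": max}

examples = [(1, 1, 2), (1, 2, 3), (2, 2, 4), (2, 5, 7)]   # Example 1

def infer(examples, candidates):
    """Return the candidates consistent with every example."""
    return [name for name, fn in candidates.items()
            if all(fn(a, b) == r for a, b, r in examples)]

print(infer(examples, candidates))   # ['+'] -- the unknown symbol means addition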

This may seem like a big assumption, but any civilization capable of interstellar communication will need to understand radio or lightwave communication. This implies that it must also understand math and geometry, both of which are key to understanding the physics of communicating with electromagnetic radiation.

Structure

The first step in the process is to train the recipient to recognize a special series of digits or carrier signal states as delimiting characters. These are used as open- and close-parenthesis symbols, and are used to add structure to a linear (one dimensional) series of digits. They are repeated constantly throughout the message and appear more often than any other symbols.

Recognizing these symbols enables the recipient to parse the message into discrete parcels and subparcels of information, as shown in the example in Table 1. This is a critical step. Without it, the recipient will not know where to begin the process of analyzing and decoding the message.

Table 1: Example series of digits with and without delimiting symbols

Without delimiting symbols:

011011100101110111

With delimiting symbols:

(((0)(1)(10)(11))((100)(101)(110)(111)))

which can be read as

(((0)(1)(2)(3))
 ((4)(5)(6)(7)))

a two-dimensional array

The example shown in Table 1 illustrates the importance of delimiting symbols. Even if the recipient can recognize only two symbols, the equivalent of "(" and ")", they can see the general structure of the message. Without delimiters, it is impossible to see any structure in the message. With delimiters, it is obvious that the message contains a series of numbers nested within a two-dimensional array.
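
Once the two delimiters are recognized, recovering this structure is mechanical. The Python sketch below parses the delimited string from Table 1 into a nested array (it also converts the binary numerals to decimal for readability).

# Sketch: parse the delimited message from Table 1 into a nested array.
# Only two symbols need to be recognized, "(" and ")"; everything
# between them is read as a binary numeral.

def parse(message):
    stack = [[]]                          # one list per open parcel
    digits = ""
    for ch in message:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if digits:                    # close a pending numeral
                stack[-1].append(int(digits, 2))
                digits = ""
            parcel = stack.pop()
            stack[-1].append(parcel[0] if len(parcel) == 1 else parcel)
        else:
            digits += ch
    return stack[0][0]

print(parse("(((0)(1)(10)(11))((100)(101)(110)(111)))"))
# [[0, 1, 2, 3], [4, 5, 6, 7]] -- a two-dimensional array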

At this stage, the recipient's goal is not to learn the meaning of the message, but only its basic format. A good analogy is learning the basic encoding format employed in DNA. We learned fairly quickly that DNA encodes information using four bases (two bits per character), and that bases are grouped into three-character words called triplets (six bits per word). Each word either codes for an amino acid or serves as a special "stop" word that signifies the end of an instruction set. Knowing this does not reveal the meaning of the message encoded in DNA; it merely tells us how to parse the data into manageable parcels and subparcels of information.

Dewey Decimal System for Ideas

The next step in this process is to create a table of numeric symbols that are used as placeholders for the arithmetic and logic symbols we want to teach. This is very similar to machine language, where numbers are used to represent commands to be executed by a CPU. The numbers themselves are meaningless, and are used to provide a unique address to each idea or symbol.

Table 2: Example set of symbols and their numeric addresses

Table 2 depicts a partial list of the symbols needed to define a basic programming language. Each symbol is associated with a unique number that is used as its placeholder. Think of this number as an IP address for an idea (instead of a workstation). The number itself is meaningless, and serves only to prevent different symbols from being confused with each other.
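
Table 2 is not reproduced here, but in programming terms it is simply a lookup table keyed by number. The Python sketch below shows the shape of such a table; the specific addresses are hypothetical, invented for illustration (and kept consistent with the interpreter sketch earlier in this article).

# Hypothetical sketch of a Table 2-style lexicon: each numeric address
# is an arbitrary, unique placeholder for one symbol. The specific
# numbers are invented here; only their uniqueness matters.

LEXICON = {
    1:  "ADD",    2:  "SUB",     3:  "MULT",  4:  "DIV",
    10: "TRUE",   11: "FALSE",   12: "AND",   13: "OR",    14: "NOT",
    20: "GT",     21: "GTE",     22: "LT",    23: "LTE",
    24: "EQ",     25: "NEQ",
    30: "STORE",  31: "RECALL",
    40: "JUMP",   41: "GOSUB",   42: "IF",
}

# The address carries no meaning of its own; like an IP address, it
# only keeps one symbol from being confused with another.
print(LEXICON[1])    # "ADD"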

Each symbol is taught by sending a set of examples where there is one unknown (as shown in Example 1). The recipient is prompted to recognize a pattern within the example set, and by doing so learns the meaning of each new symbol. This process is repeated for each symbol in the basic lexicon.

Once the recipient has learned to recognize the basic symbols used to build the programming language, they can then capture and run programs without manually analyzing each instruction within each program. Programs can be built by linking to instructions in other programs (much like most software built today is based on modular design). Once they reach this point, all they need to do is archive programs and execute them to see what they do (much like you use desktop publishing software without having to understand the function of every DLL or class library used to build it).

Living Symbols

Programs are useful as a system for communication because they enable the sender to condense a lot of knowledge into a compact package. Take images as an example. A typical, uncompressed high-resolution image requires 8 to 32 million bits of data (for a 1000-by-1000 to 2000-by-2000 pixel image, 8 bits per pixel). Since the signal will have a finite information-carrying capacity (see the "Solving the Entropy Paradox" section later in this article), this is an unacceptable waste of space. Just as we have a strong incentive to compress images prior to transmitting them across networks, so too will the sender of an interstellar message.

Instead of sending the raw, uncompressed data used to describe an image, the sender can send efficiently compressed data to be processed by a previously transmitted algorithm (the equivalent of a JPEG image viewer, for example). What would otherwise require several megabits of data would then require a fraction of that amount.

What is especially interesting about using programs to communicate is that they enable the sender to deliver what are, in effect, living symbols. Instead of manually picking through every character in the message, the recipient would simply archive a program, run it, and observe what it does. The recipient could even interact with programs while they are running (by using variables as an input/output interface).

Imagine for a moment that you must define a set of symbols to describe the concept of gravity. This is a difficult idea to describe because it is an invisible force that acts over time. One way you can do this is to write a program that simulates the effects of gravity on two or more bodies. This program would run through a wide range of scenarios (e.g., small body orbiting a large body, many small bodies interacting with each other, etc.). The output from this program would, in turn, be associated with numeric symbols related to gravity (e.g., gravitational force, various types of orbits, etc.). One small program could encapsulate a lot of knowledge about this phenomenon and could be used to define additional symbols for use in an abstract language.

Figure 1: The output of a program is used to define or amplify the meaning of abstract symbols
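
As a sketch of what such a "living symbol" might contain, here is a short Python program that simulates a small body orbiting a large one. The units, masses, and step size are arbitrary choices made for illustration; the physics (an inverse-square attraction) is what the recipient would learn by running it.

# Sketch of a "living symbol" for gravity: a small body orbiting a
# large one under an inverse-square force. Units, masses, and the time
# step are arbitrary choices made for illustration.

import math

G = 1.0                                  # gravitational constant (arbitrary units)
M = 1000.0                               # mass of the large central body
x, y = 100.0, 0.0                        # small body starts 100 units out...
vx, vy = 0.0, math.sqrt(G * M / 100.0)   # ...moving at circular-orbit speed

dt = 0.01
for step in range(200001):
    r = math.hypot(x, y)
    ax, ay = -G * M * x / r**3, -G * M * y / r**3   # inverse-square attraction
    vx, vy = vx + ax * dt, vy + ay * dt
    x, y = x + vx * dt, y + vy * dt
    if step % 20000 == 0:
        print(f"t = {step * dt:7.1f}   x = {x:8.2f}   y = {y:8.2f}   r = {r:7.2f}")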

This is just one example. The variety of ideas that can be described using this general technique is limited only by the author's creativity. If you doubt this, just consider the variety of tasks a desktop computer can perform. This machine, at its core, is merely a very fast calculator.

Solving the Entropy Paradox

Programs are useful as a foundation for communication not only because of their versatility, but also because they can solve the so-called entropy paradox. The entropy paradox arises because efficiently encoded information is nearly indistinguishable from random noise.

Imagine for a moment that you are engineering a system for communicating with another civilization. You know nothing about the recipient, so you want to format the message so that the receiving party can decode it as easily as possible. You are also forced to work within the following constraints:

  1. You have a finite amount of energy with which to power your transmitter.

  2. Because of (1), you have a finite amount of energy with which to encode each bit of information within your signal.

  3. Because of (2), your signal has a finite information-carrying capacity (bandwidth).

  4. Because of (3), you must encode information as efficiently as possible within your message. This enables you to maximize the number of symbols (useful information) encoded per bit.

  5. Because of (4), this forces you to eliminate redundant information from the message. This increases the apparent randomness of the raw data within the message.

It's the last item in this list that leads to the entropy paradox. In order to make a message easy to decode, it is necessary to make the structure of the message obvious to the recipient. Doing so requires the inclusion of redundant information that guides the recipient in parsing and decoding the message. The problem is that this undermines the primary goal of encoding the message as efficiently as possible (which requires the elimination of redundant information from the data stream).

Algorithms can solve this apparent catch-22 by enabling the sender to embed instructions for decompressing data within the message itself. We routinely use algorithms to compress data and to detect and repair damage caused by transmission errors. The same basic idea can be applied to an interstellar message. To do this, the sender would transmit a small program, not unlike the widely used PKZip program, which decompresses data that has been encoded for compression and forward error correction. To decode efficiently coded data, the recipient would run the bulk of the message through this program.
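
A toy version of this scheme in Python: a short uncompressed primer is followed by a compressed payload, with zlib standing in for the PKZip-like decompressor that a real message would first have to teach the recipient to build. The two-byte delimiter is a simplifying assumption.

# Toy version of the scheme: an uncompressed plaintext primer followed
# by a compressed payload. zlib stands in for the decompressor that a
# real message would have to teach from scratch, and the two-byte
# delimiter is a simplifying assumption.

import zlib

primer = b"PRIMER: lexicon, examples, definition of the decompressor ..."
payload = b"the bulk of the message, efficiently encoded " * 1000

compressed = zlib.compress(payload, level=9)
message = primer + b"\x00\x00" + compressed

print(f"payload: {len(payload)} bytes -> compressed: {len(compressed)} bytes")

# The recipient splits off the plaintext primer, then runs the rest of
# the message through the decompressor the primer taught them to build.
received_primer, body = message.split(b"\x00\x00", 1)
assert zlib.decompress(body) == payload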

Using this technique, the sender only needs to send a small part of the message (the basic primer and the decompression program) in uncompressed "plaintext." The rest of the message is transmitted in the compressed, error-corrected format that the PKZip-like algorithm is designed to handle. As a result, the sender gets to have his cake and eat it too. Most of the message is transmitted in an efficient, compressed format that maximizes the amount of information that can be sent across the link. Yet it also contains a primer section that is easy to capture and decode.

Locating the Primer

How would the recipient know where to look for the primer, the part of the message that teaches them to recognize the basic lexicon of symbols? This is not as difficult as it first seems. The technique described above adds a clear signature to the message, one that betrays the location of the plaintext information through variations in the randomness of the data stream over time. Figures 2 and 3 depict how this signature reveals itself.

In the nearly random (efficiently encoded) segments of the message, there will be no correlation between a digit and those that follow it. Conversely, in the plaintext segments of the message, there will be many repeating patterns in the data. The degree of randomness in a series of numbers can be expressed as a value H, its Shannon entropy: H = -Σ p(x) log2 p(x), where p(x) is the relative frequency of the symbol x. The lower the value of H, the more redundancy there is in the data.

By graphing the value of H over time, the recipient will be able to see which regions in the message contain a lot of redundant information. These are the regions where the primer information is most likely to be.

Figure 2: Graph of H (randomness) over time

Figure 3: Graph of H (randomness) over time
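
The recipient's search for the primer might look like the following Python sketch: slide a window along the message, compute H for each window, and flag the regions where H drops. The window size, byte-level alphabet, and threshold are simplifying assumptions.

# Sketch of the primer search: compute Shannon entropy H over a sliding
# window and flag low-H (redundant) regions. Window size, threshold,
# and the byte-level alphabet are simplifying assumptions.

import math
import os
from collections import Counter

def entropy(window):
    """Shannon entropy in bits per symbol: H = -sum p(x) log2 p(x)."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())

# a message with a repetitive plaintext primer followed by noise-like,
# efficiently compressed data (random bytes stand in for the latter)
message = b"(((0)(1)(10)(11))((100)(101)(110)(111)))" * 32 + os.urandom(1280)

WINDOW = 256
for start in range(0, len(message) - WINDOW + 1, WINDOW):
    h = entropy(message[start:start + WINDOW])
    flag = "<-- low H: likely primer" if h < 6.0 else ""
    print(f"offset {start:5d}: H = {h:4.2f} bits/byte {flag}")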

Communicating in Depth

Using programs that are based on a minimum number of symbols, it will be possible to communicate in greater depth than most people assume is feasible. Because programs can be used to perform a wide variety of information-processing tasks, they can be used to describe many different types of objects and phenomena, and they can be used to process and render efficiently encoded information (e.g., compressed images, motion video, etc.). Like your desktop computer, programs designed for the purpose of communication perform a wide variety of tasks.

Furthermore, the output from these programs can be used to describe or amplify symbols that can then be used in an abstract language that is based on the same principle of using numbers as words. Images produced by programs can be associated with numeric symbols that represent objects or categories of objects. The output from a simulation program can be used to describe processes or forces (such as the gravitation example depicted in Figure 1). Interactive programs (which interact with their users by passing information through variables) will allow the recipient to explore the behavior of a program in detail.

The variety of information and symbols that can be described in this manner is essentially unlimited.

Anticryptography on Terra Firma

The idea of communicating with an extraterrestrial civilization may seem outlandish or irrelevant to software development here on terra firma, but there are many interesting applications for these techniques waiting to be discovered. One such application is a framework for distributing reusable software components via networks. Another is a framework for delivering compressed multimedia content across computer networks.

One of the key features of mathematical languages is the use of numeric addresses as placeholders for symbols or ideas. This is not a new idea. René Descartes proposed it in the seventeenth century as the basis for a system of multilingual communication. Though it is an old idea, it has been largely ignored, and it can be used to build a better system for designing and disseminating reusable software.

In this system, imagine that every software component has a unique numeric address. Think of this as an IP (intellectual property?) address for software. The number itself is meaningless, and serves to prevent one component from being confused with another. This is a subtle difference from the current system, in which software components are assigned short alphanumeric names, typically on the whim of their authors. Though it's a subtle difference, it has significant implications, as shown in the hypothetical program in Example 2.

Example 2: Hypothetical program that uses uniform addresses for software components

# declarations
Use lang.math.sine[100.1.2.3] as sine
Use graphics.2d.drawline[95.2.1.1] as drawline
Use lang.math.pi[100.1.2.1] as pi

# body of program

FirstPoint = True
OldY = 0
For X = 0 To pi() Step 0.01
   NewY = sine(X)
   If FirstPoint = False Then
      Drawline((X - 0.01) * 100, OldY * 100, X * 100, NewY * 100)
   End If
   OldY = NewY
   FirstPoint = False
Next X

The hypothetical program in Example 2, while similar to conventional programs, has one important difference. Notice how several reusable components are invoked at the beginning of the program. Each of these components has a uniform mnemonic or numeric address that uniquely identifies it. When this program is compiled, the resulting executable contains only the topmost layer of instructions.

A computer running this program would determine, at run-time, what lower level components were required to execute it. This computer would determine that the components 100.1.2.3 (lang.math.sine), 95.2.1.1 (graphics.2d.drawline), and 100.1.2.1 (lang.math.pi) were required to execute the program. It would obtain these components from a nearby object server (similar to a domain name server) if they had not already been cached locally. The computer would then inspect these components to determine what components they, in turn, needed to run. It would determine that component 100.1.2.3 (lang.math.sine) needs the component 100.1.1.4 (lang.math.factorial) to compute the sine of an angle, and would repeat the process of obtaining additional low-level components as needed.
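
The resolution step might look like the following Python sketch. The dependency table stands in for the object server, and all of the addresses are the hypothetical ones used in the text; a real system would fetch components over the network and cache them on disk.

# Sketch of run-time component resolution with caching. The table
# below stands in for the object server; the addresses are the
# hypothetical ones used in the text.

OBJECT_SERVER = {
    # address        (component name,        direct dependencies)
    "100.1.2.3": ("lang.math.sine",        ["100.1.1.4"]),
    "95.2.1.1":  ("graphics.2d.drawline",  []),
    "100.1.2.1": ("lang.math.pi",          []),
    "100.1.1.4": ("lang.math.factorial",   []),
}

cache = {}   # locally cached components

def resolve(address):
    """Fetch a component and, recursively, everything it depends on."""
    if address in cache:
        return                               # already cached locally
    name, deps = OBJECT_SERVER[address]      # stands in for a network fetch
    print(f"fetching {address} ({name})")
    cache[address] = name
    for dep in deps:
        resolve(dep)

# resolve the three components declared at the top of Example 2
for address in ("100.1.2.3", "95.2.1.1", "100.1.2.1"):
    resolve(address)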

Figure 4: Graphical depiction of the hierarchy of components used by the hypothetical program

This may seem like a trivial difference compared to the way programs and operating systems are packaged today, but it has important implications if it is ever widely adopted. The important difference is that the computers running these programs can automatically acquire and cache the low-level instructions required to run them. This approach would largely do away with the bloated, multimegabyte installation packages that are commonplace today. This can also be used to solve problems with component version control (through a simple subaddressing scheme), and to reduce the size of operating systems (additional components are obtained only on an as-needed basis).

Survival of the Fittest Codec: LGM (Little Green Men) Media Files

Video compression is a specific example of where these techniques can produce big dividends by overthrowing the "tyranny of standards." Industry standards for encoding images, such as the widely used JPEG format, are important because they allow systems from different vendors to interoperate, but they also tend to freeze the industry at a particular stage of development.

By migrating to a system where the instructions for decoding an image (or video clip or sound clip) are associated with the media itself, it is possible to create a system that can evolve much more quickly. In the current paradigm, we tend to stick with a particular system for encoding content until it becomes so obviously obsolete that people begin adopting a new system en masse.

Imagine a new system where the media and the instructions for viewing it are intertwined, as shown in Example 3.

Example 3: A hypothetical media file in which the content and the instructions for viewing it are intertwined

<?xml version="1.0"?>
<presentation>
<name>My First Image File</name>

<components_used>
_102.1.1.1
_102.1.1.2
</components_used>

<image>
<_102.1.1.1>
binhex encoded data goes here
</_102.1.1.1>
<_102.1.1.2>
binhex encoded data goes here
</_102.1.1.2>
</image>

<!-- client can drop connection here if it has 
already cached the underlying code that defines 
the algorithms used to compress the media file -->

<componentcode>
<_102.1.1.1>
code that defines component goes here
</_102.1.1.1>
<_102.1.1.2>
code that defines component goes here
</_102.1.1.2>
</componentcode>
</presentation>

In this framework, the viewer automatically learns new decompression algorithms as it encounters them. This would free multimedia developers to use whatever tools worked best for encoding a particular type of sound or image (or region within an image). It would also enable them to use multiple compression schemes concurrently (e.g., one codec for unfocused background regions of an image, another for highly detailed regions). The important point is that it puts the decision of which algorithm to use in the hands of the author, who is free to choose the algorithms that produce the best trade-off between compression and quality.
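
On the client side, the logic of Example 3 reduces to a cache check: decode with the codecs you have, and learn the ones you don't from the componentcode section of the file itself. In the Python sketch below the decoder bodies are trivial placeholders, and exec stands in for loading received code; a real implementation would have to sandbox untrusted decoder code before running it.

# Sketch of an Example 3-style client: known codecs are cached, unknown
# ones are learned from the media file itself. Decoder bodies are
# placeholders; real received code would need sandboxing before use.

codec_cache = {}   # component address -> decode function

def learn_codec(address, source):
    """Register a decoder shipped inside the media file."""
    namespace = {}
    exec(source, namespace)              # trusted here only for illustration
    codec_cache[address] = namespace["decode"]

def render(segments, component_code):
    for address, data in segments:
        if address not in codec_cache:   # client kept the connection open
            learn_codec(address, component_code[address])
        print(codec_cache[address](data))

# a media file that uses two codecs, one new to this client
segments = [("102.1.1.1", "encoded region 1"), ("102.1.1.2", "encoded region 2")]
component_code = {
    "102.1.1.1": "def decode(data): return '[decoded by 102.1.1.1] ' + data",
    "102.1.1.2": "def decode(data): return '[decoded by 102.1.1.2] ' + data",
}
codec_cache["102.1.1.1"] = lambda data: "[decoded by cached codec] " + data
render(segments, component_code)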

Boldly Going Where No Algorithm Has Gone Before

The basic goal in anticryptography is to create messages or programs that can be used by a recipient who has little or no prior knowledge about how they were created. This is a subtle but important difference from the way we do things today.

Most software today requires the user, or the client program downloaded by the user, to have prior knowledge about the system or information it is designed to access. For example, in order to view a QuickTime video clip, a user has to download viewer software specifically for the purpose of processing and rendering these files. Without the viewer software, the information contained in the .MOV file is unintelligible.

By applying techniques from anticryptography to the systems and software in use today, we can build software that automatically learns new procedures as they are required and that allows designers to decide which components to use based on technical merits instead of market share.