Published on O'Reilly (http://oreilly.com/)

Computers + Biology = Bioinformatics

by Cynthia Gibas

The glib answers to the question "What is bioinformatics?" run something like this: "Bioinformatics is the intersection of information technology and biology"; "Bioinformatics is information management for biology"; "Bioinformatics means tools for data mining in biological databases."

But what does that really mean? Those answers leave open a lot of questions. "Information technology" and "data mining" don't really mean a whole lot to a biologist, and they don't convey a sense of the possibilities that computers create for researchers. And from the opposite perspective, "biology" doesn't mean a whole lot to a computer professional. What is it biologists do? What do they want to find out? How do they go about finding it out? And finally, what are the benefits of applying information technology to biological research, and why is bioinformatics such a hot area as a result?

The Longer Answer

What are biologists trying to find out?

One of the big questions in biology is: how does the genomic code translate into a real, live human (or animal or plant or bacterium)? For a long time, biologists didn't have access to a complete version of that code, and they had to study it one letter (or word or sentence) at a time.

The genomic code breaks down into thousands of individual genes. Genes tell cells to make proteins, individual molecules that each have a unique chemical mission. Proteins interact with each other to carry out thousands of functions, from digesting your dinner to synthesizing the small molecules that form a barrier between the inside of your cells and the outside world.

Biologists want to collect all of the information they can about every gene in every genome, and from that information construct models of how genes work together to build up and maintain a living body, whether it's a bacterium or a star quarterback.

What are the data types that are collected to answer these questions?

There are as many kinds of biological data as there are experiments. Bioinformaticians, however, can only work easily with data types that are collected systematically from the entire biological research community. The Web makes it possible to collect such data by electronic submission.

Currently, gene and genome sequences are the most abundantly collected data types, followed by protein atomic coordinates. DNA sequences are reported as strings of characters, usually annotated with descriptions of features associated with particular regions of the string. Protein structures are reported as Cartesian coordinates of their atoms, with some (incompletely standardized) identifying information about the protein attached. New high-throughput experimental methods such as DNA microarrays produce large matrices of values that describe gene expression levels, protein-protein interactions, and other information about how genes and proteins interact in living cells.
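To make these three data types concrete for readers coming from the programming side, here is a minimal Python sketch of how each might be held in memory. The class names, field names, and toy values are all invented for illustration; real databases use their own, much richer file formats.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """An annotation on a region of a sequence (hypothetical layout)."""
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    label: str


@dataclass
class AnnotatedSequence:
    """A DNA sequence stored as a string, plus its feature annotations."""
    name: str
    residues: str
    features: list = field(default_factory=list)

    def feature_text(self, feature):
        # Slice out the part of the string that the feature describes.
        return self.residues[feature.start:feature.end]


@dataclass
class Atom:
    """One atom of a protein structure, located by Cartesian coordinates."""
    name: str
    x: float
    y: float
    z: float


def distance(a, b):
    # Straight-line distance between two atoms.
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2) ** 0.5


def mean_expression(matrix):
    # Average each gene's expression values across all experiments.
    return {gene: sum(values) / len(values) for gene, values in matrix.items()}


# Toy data, made up for this example.
seq = AnnotatedSequence("toy_gene", "ATGGCCTAAGGT",
                        [Feature(0, 9, "coding region")])
coding = seq.feature_text(seq.features[0])

atoms = [Atom("N", 0.0, 0.0, 0.0), Atom("CA", 3.0, 4.0, 0.0)]
d = distance(atoms[0], atoms[1])

# Rows are genes, columns are experiments (an expression matrix in miniature).
expression = {"geneA": [1.0, 2.0, 4.0], "geneB": [3.0, 3.0, 3.0]}
means = mean_expression(expression)
```

The point of the sketch is only that sequences behave like annotated strings, structures like lists of points in space, and expression data like a table of numbers; each shape of data invites a different kind of analysis.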

How does computation support the whole enterprise?

Computers play many roles in modern biology:

The Future of Computers and Biology: A Broader View

Bioinformaticians are professional data analysts: they work with data generated by the experimental biology community and by a growing number of "data factory" projects (e.g., genome sequencing projects). Mining this data to develop new hypotheses, new models of how biological systems function, and even rules and patterns (which can be used to screen new data sets) is the work of bioinformatics.
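As one small illustration of a "pattern" used to screen new data, the Python sketch below scans a protein sequence for the N-glycosylation sequence motif, written N-{P}-[ST]-{P} in PROSITE notation (an asparagine, then anything but proline, then serine or threonine, then anything but proline). The function name and the test sequence are made up for this example, and a match only flags a candidate site; it does not show that the site is actually used in a living cell.

```python
import re

# N-{P}-[ST]-{P} expressed as a regular expression. The lookahead wrapper
# lets overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif_sites(protein):
    """Return (0-based position, matched text) for every candidate site."""
    return [(m.start(), m.group(1))
            for m in N_GLYCOSYLATION.finditer(protein.upper())]


# Invented sequence, one-letter amino acid codes.
sites = find_motif_sites("MKNGTAANPSQNYSL")
print(sites)  # [(2, 'NGTA'), (11, 'NYSL')]
```

Note that the asparagine at position 7 is skipped because a proline follows it, which is exactly the kind of rule a pattern encodes.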

Bioinformatics is part of a larger trend toward applying systematic and quantitative methods to the analysis of biological systems, which is in turn part of computational science in general. Bioinformatics may be primarily about data storage and genome sequence analysis, but computational approaches are already in use across the whole spectrum of biological research. With the increasing automation of experimentation and data collection, this trend can only continue.

For Tim O'Reilly's thoughts on trends in computational science, see "Business Computing Isn't Where the Action Is Going to Be," in which Tim writes that a recent New York Times article, "All Science Is Computer Science," captures a trend he's been seeing for some time. "Every time we've had a radical lowering of the barriers of entry into a computing market, that market has exploded," says Tim. "Now, hackers and scientists are working together to break down the barriers to discovery."

Key Skills and Knowledge for Bioinformatics

As professional data analysts in a specialized field, bioinformaticians need a solid understanding of both computational analysis methods and the biological questions those methods are meant to answer. A lack of biological understanding can result in sophisticated computational methods being applied naively, in ways that aren't really helpful to biologists. A lack of analytical sophistication means that interesting features of biological data may go undiscovered.

In 1998, Dr. Russ Altman, now president of the International Society for Computational Biology, published an article called "A Curriculum for Bioinformatics: The Time is Ripe," which enumerated some of the many skills that are useful for aspiring bioinformaticians.

Critical knowledge and skills he identified for bioinformaticians include:

Even more basic, however, are the key skills pointed out to us by some of our colleagues in the bioinformatics field:

The first two areas are the province of biologists, the latter two of computer scientists. Both sets of knowledge are considered basic to their field and are usually the focus of a good deal of training, generally an entire undergraduate degree. It's rare to find a combination of these skills in one individual. It sometimes seems that in order to retrain for bioinformatics, you'd need an entire new degree. But that's not very practical.

So, How Do I Retrain Myself for Bioinformatics?

The answer to the retraining question depends on how far you want to go on the continuum from programming to scientific research.

If you're going to be a programmer on a bioinformatics project, you need to learn enough biology to talk to biological scientists, because they will be asking you to put their ideas into action on the computer. That means knowing on a general level what the important molecules of life are (DNA, RNA, proteins, metabolites), what they're made of, and what kinds of things they do. It's also helpful to understand how the information in the genome is used in living systems: it is transcribed into RNA and translated into proteins, molecules that subsequently interact with each other to carry out life processes.
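For a programmer, the flow of information from genome to protein can be sketched in a few lines of Python. This is a deliberately simplified illustration: the codon table below contains only the handful of entries needed for the toy gene (the full genetic code has 64 codons), the function names are invented here, and real genes involve complications such as introns and reading-frame detection that are ignored.

```python
# A tiny excerpt of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCC": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_dna):
    """DNA coding strand -> messenger RNA (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # Codons missing from this partial table are shown as "X".
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented toy gene: start codon, three more codons, then a stop codon.
mrna = transcribe("ATGGCCTGGAAATAA")
protein = translate(mrna)
print(mrna)     # AUGGCCUGGAAAUAA
print(protein)  # MAWK
```

The string-in, string-out shape of these functions is why sequence analysis feels familiar to programmers; the biology lies in knowing what the resulting protein does once it folds and interacts with other molecules.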

Once you know these basics, then you may want to learn about some existing bioinformatics and computational biology methods and how they work. Some universities offer bioinformatics certification programs for computer professionals.

If you see yourself making the transition from programmer to scientist and actually developing new bioinformatics methods, you'll need more than a thin gloss of biology over your computer competence. This is where bioinformatics graduate programs come in. Scientists go through the arduous and life-sucking process of graduate school to do more than just take a few more classes. They're there to learn the rules and process of scientific research from hypothesis to experiment to publication. If this is the road that you choose to take, consider applying to one of the many new graduate programs in bioinformatics and computational biology. And keep an eye on the O'Reilly books catalog.

Copyright © 2009 O'Reilly Media, Inc.