Preface
The Philosophy Behind the Book
Data structures and algorithms are among the most important inventions of the last 50 years, and they are fundamental tools software engineers need to know. But in my opinion, most of the books on these topics are too theoretical, too big, and too “bottom up”:
- Too theoretical
Mathematical analysis of algorithms is based on simplifying assumptions that limit its usefulness in practice. Many presentations of this topic gloss over the simplifications and focus on the math. In this book I present the most practical subset of this material and omit or de-emphasize the rest.
- Too big
Most books on these topics are at least 500 pages, and some are more than 1,000. By focusing on the topics I think are most useful for software engineers, I kept this book under 150 pages.
- Too “bottom up”
Many data structures books focus on how data structures work (the implementations), with less about how to use them (the interfaces). In this book, I go “top down”, starting with the interfaces. Readers learn to use the structures in the Java Collections Framework before getting into the details of how they work.
Finally, some books present this material out of context and without motivation: it’s just one damn data structure after another! I try to liven it up by organizing the topics around an application—web search—that uses data structures extensively, and is an interesting and important topic in its own right.
This application motivates some topics that are not usually covered in an introductory data structures class, including persistent data structures with Redis.
I have made difficult decisions about what to leave out, but I have made some compromises. I include a few topics that most readers will never use, but that they might be expected to know, possibly in a technical interview. For these topics, I present both the conventional wisdom and my reasons to be skeptical.
This book also presents basic aspects of software engineering practice, including version control and unit testing. Most chapters include an exercise that allows readers to apply what they have learned. Each exercise provides automated tests that check the solution. And for most exercises, I present my solution at the beginning of the next chapter.
Prerequisites
This book is intended for college students in computer science and related fields, as well as professional software engineers, people training in software engineering, and people preparing for technical interviews.
Before you start this book, you should know Java pretty well; in particular, you should know how to define a new class that extends an existing class or implements an interface. If your Java is rusty, here are two books you might start with:
- Downey and Mayfield, Think Java (O’Reilly Media, 2016), which is intended for people who have never programmed before
- Sierra and Bates, Head First Java (O’Reilly Media, 2005), which is appropriate for people who already know another programming language
If you are not familiar with interfaces in Java, you might want to work through the tutorial called “What Is an Interface?” at http://thinkdast.com/interface.
One vocabulary note: the word “interface” can be confusing. In the context of an application programming interface (API), it refers to a set of classes and methods that provide certain capabilities.
In the context of Java, it also refers to a language feature, similar to a class, that specifies a set of methods. To help avoid confusion, I’ll use “interface” in the normal typeface for the general idea of an interface, and interface in the code typeface for the Java language feature.
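As a quick self-check on this prerequisite, here is a minimal sketch of a Java interface, a class that implements it, and a class that extends an existing class. The names Shape, Rectangle, Square, and InterfaceDemo are hypothetical, chosen only for illustration; they are not taken from the book's repository.

```java
// Illustrative names only: Shape, Rectangle, Square, and InterfaceDemo are
// hypothetical and do not come from the book's repository.

// A Java interface: it specifies a method but provides no implementation.
interface Shape {
    double area();
}

// A class that implements the interface must define every method it specifies.
class Rectangle implements Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }
}

// A class that extends an existing class inherits its methods, including area().
class Square extends Rectangle {
    Square(double side) {
        super(side, side);
    }
}

class InterfaceDemo {
    // Works with any Shape, regardless of which class implements it.
    static double totalArea(Shape[] shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Rectangle(2.0, 3.0), new Square(2.0) };
        System.out.println(totalArea(shapes)); // prints 10.0
    }
}
```

If every line of this sketch reads naturally to you, your Java is probably in good enough shape for the rest of the book.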
You should also be familiar with type parameters and generic types. For example, you should know how to create an object with a type parameter, like ArrayList<Integer>. If not, you can read about type parameters at http://thinkdast.com/types.
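For reference, creating and using an object with a type parameter might look like the following sketch. The class name TypeParameterDemo and the method sum are hypothetical; only ArrayList<Integer> comes from the text above.

```java
import java.util.ArrayList;

// Hypothetical demo class; the name is an assumption made for illustration.
class TypeParameterDemo {
    // The type parameter <Integer> tells the compiler that this list
    // holds Integer objects, so no casts are needed when reading elements.
    static int sum(ArrayList<Integer> numbers) {
        int total = 0;
        for (int n : numbers) { // each Integer is unboxed to an int
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        ArrayList<Integer> numbers = new ArrayList<Integer>();
        numbers.add(1); // the int 1 is autoboxed to an Integer
        numbers.add(2);
        numbers.add(3);
        System.out.println(sum(numbers)); // prints 6
    }
}
```

Trying to add a String to numbers would be rejected at compile time, which is the main benefit of the type parameter.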
You should be familiar with the Java Collections Framework (JCF), which you can read about at http://thinkdast.com/collections. In particular, you should know about the List interface and the classes ArrayList and LinkedList.
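The relationship between the List interface and its two implementations can be sketched as follows. The class ListDemo and the method fill are hypothetical names used only for this illustration; List, ArrayList, and LinkedList are the standard java.util types.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class; the name is an assumption made for illustration.
class ListDemo {
    // Written against the List interface, so it accepts any implementation.
    static List<String> fill(List<String> list) {
        list.add("a");
        list.add("b");
        list.add(0, "c"); // insert at the front, shifting the others right
        return list;
    }

    public static void main(String[] args) {
        List<String> arrayList = fill(new ArrayList<String>());
        List<String> linkedList = fill(new LinkedList<String>());

        System.out.println(arrayList);  // prints [c, a, b]
        System.out.println(linkedList); // prints [c, a, b]

        // Lists are equal when they hold equal elements in the same order,
        // even if the implementing classes differ.
        System.out.println(arrayList.equals(linkedList)); // prints true
    }
}
```

Because fill depends only on the interface, the choice between ArrayList and LinkedList changes how the operations are carried out, not what results they produce.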
Ideally you should be familiar with Apache Ant, which is an automated build tool for Java. You can read more about Ant at http://thinkdast.com/anttut.
And you should be familiar with JUnit, which is a unit testing framework for Java. You can read more about it at http://thinkdast.com/junit.
Working with the Code
The code for this book is in a Git repository at http://thinkdast.com/repo.
Git is a version control system that allows you to keep track of the files that make up a project. A collection of files under Git’s control is called a repository.
GitHub is a hosting service that provides storage for Git repositories and a convenient web interface. It provides several ways to work with the code:
- You can create a copy of the repository on GitHub by pressing the Fork button. If you don’t already have a GitHub account, you’ll need to create one. After forking, you’ll have your own repository on GitHub that you can use to keep track of code you write. Then you can clone the repository, which downloads a copy of the files to your computer.
- Alternatively, you could clone the repository without forking. If you choose this option, you don’t need a GitHub account, but you won’t be able to save your changes on GitHub.
- If you don’t want to use Git at all, you can download the code in a ZIP archive using the Download button on the GitHub page, or this link: http://thinkdast.com/zip.
After you clone the repository or unzip the ZIP file, you should have a directory called ThinkDataStructures with a subdirectory called code.
The examples in this book were developed and tested using Java SE Development Kit 7. If you are using an older version, some examples will not work. If you are using a more recent version, they should all work.
Conventions Used in This Book
The following typographical conventions are used in this book:
- Italic
Indicates emphasis, keystrokes, menu options, URLs, and email addresses.
- Bold
Used for new terms where they are defined.
- Constant width
Used for program listings, as well as within paragraphs to refer to filenames, file extensions, and program elements such as variable and function names, data types, statements, and keywords.
- Constant width bold
Shows commands or other text that should be typed literally by the user.
Safari® Books Online
Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.
Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.
Safari Books Online offers a range of plans and pricing for enterprise, government, education, and individuals.
Members have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and hundreds more. For more information about Safari Books Online, please visit us online.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
- O’Reilly Media, Inc.
- 1005 Gravenstein Highway North
- Sebastopol, CA 95472
- 800-998-9938 (in the United States or Canada)
- 707-829-0515 (international or local)
- 707-829-0104 (fax)
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Contributors
This book is an adapted version of a curriculum I wrote for the Flatiron School in New York City, which offers a variety of online classes related to programming and web development. They offer a class based on this material, which provides an online development environment, help from instructors and other students, and a certificate of completion. You can find more information at http://flatironschool.com.
At the Flatiron School, Joe Burgess, Ann John, and Charles Pletcher provided guidance, suggestions, and corrections from the initial specification all the way through implementation and testing. Thank you all!
I am very grateful to my technical reviewers, Barry Whitman, Patrick White, and Chris Mayfield, who made many helpful suggestions and caught many errors. Of course, any remaining errors are my fault, not theirs!
Thanks to the instructors and students in Data Structures and Algorithms at Olin College, who read this book and provided useful feedback.
Charles Roumeliotis copyedited the book for O’Reilly Media and made many improvements.
If you have comments or ideas about the text, please send them to feedback@greenteapress.com.