Security Warrior
Security Warrior By Cyrus Peikari, Anton Chuvakin
January 2004
Pages: 552

Table of Contents

Chapter 1: Assembly Language
This chapter provides a brief introduction to assembly language (ASM), in order to lay the groundwork for the reverse engineering chapters in Part I. This is not a comprehensive guide to learning ASM, but rather a brief refresher for those already familiar with the subject. Experienced ASM users should jump straight to Chapter 2.
From a cracker's point of view, you need to be able to understand ASM code, but not necessarily program in it (although this skill is highly desirable). ASM is one step higher than machine code, and it is the lowest-level language that is considered (by normal humans) to be readable. ASM gives you a great deal of control over the CPU. Thus, it is a powerful tool to help you cut through the obfuscation of binary code. Expert crackers dream in assembly language.
In its natural form, a program exists as a series of ones and zeroes. While some operating systems display these numbers in a hex format (which is much easier to read than a series of binary data), humans need a bridge to make programming—or understanding compiled code—more efficient.
When a processor reads the program file, it converts the binary data into instructions. These instructions are used by the processor to perform mathematical calculations on data, to move data around in memory, and to pass information to and from inputs and outputs, such as the keyboard and screen. However, the number of instructions and how they work vary, depending on the processor type and how powerful it is. For example, an Intel processor, such as the Pentium 4, has an extensive set of instructions, whereas a RISC processor has a limited set. The difference can make one processor more desirable in certain environments. Issues such as space, power, and heat flux are considered before a processor is selected for a device. For example, in handheld devices, a RISC-based processor such as ARM is preferable. A Pentium 4 would not only eat the battery in a few minutes, but the user would have to wear oven mitts just to hold the device.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Registers
While it is possible for a processor to read and write data directly from RAM, or even the cache, doing so for every operation would create a bottleneck. To correct this problem, processors include a small amount of internal memory. The memory is split up into placeholders known as registers. Depending on the processor, each register may hold from 8 bits to 128 bits of information; the most common is 32 bits. The information in a register could include a value to be used directly by the processor, such as a decimal number. The value could also be a memory address representing the next line of code to execute. Having the ability to store data locally means the processor can more easily perform memory read and write operations. This ability in turn increases the speed of the program by reducing the amount of reading/writing between RAM and the processor.
In the typical x86 processor, there are several key registers that you will interact with while reverse engineering. Figure 1-1 shows a screenshot of the registers on a Windows XP machine using the debug -r command (the -u command provides a disassembly).
Figure 1-1: Example registers on an x86 processor shown using the debug -r command on Windows XP
The following list explains how each register is used:
AX
Principal register used in arithmetic calculations. Often called the accumulator, AX is frequently used to accumulate the results of an arithmetic calculation.
BX (BP)
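The base register (BX) is typically used to hold a base address for memory accesses. BP, the base pointer, is a separate register that normally points to the base of the current stack frame.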
CX
The count register is often used to hold a value representing the number of times a process is to be repeated.
DX
The data register simply holds general data.
SI and DI
The source and destination registers are used as offset addresses to allow a register to access various elements of a list or array.
SS, CS, ES, and DS
The stack segment, code segment, extra segment, and data segment registers are used to break up a program into parts. As the program executes, the segment registers are assigned the base values of each segment. From there, offset values are used to access each command in the program.
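As a rough illustration of how a segment base and an offset combine, the sketch below assumes 16-bit real-mode style addressing (the kind of segment:offset pair the debug tool displays), where the segment value is multiplied by 16 and the offset is added. The function name real_mode_address is hypothetical and is not taken from the book.

```c
#include <stdint.h>

/* Hypothetical helper: combine a 16-bit segment value and a 16-bit offset
 * into a real-mode linear address (segment * 16 + offset). */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Under these assumptions, the pair 1234:0010 refers to linear address 0x12350. Protected-mode operating systems resolve segments differently, so this applies only to the 16-bit case.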
ASM Opcodes
Now that you understand registers and how memory is accessed, here's a quick overview of how opcodes are used. This is a brief summary only, since each processor type and version will have a different instruction set. Some variations are minor, such as using JMP (jump) versus B (branch) to redirect the processor to code in memory. Other variations, such as the number of opcodes available to the processor, have a much larger impact on how a program works.
Opcodes are the actual instructions that a program performs. Each opcode is represented by one line of code, which contains the opcode and the operands that are used by the opcode. The number of operands varies depending on the opcode. However, the size of the line is always limited to a set length in a program's memory. In other words, a 16-bit program will have a 1-byte opcode and a 1-byte operand, whereas a 32-bit program will have a 2-byte opcode and a 2-byte operand. Note that this is just one possible configuration and is not the case with all instruction sets.
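To make the opcode/operand split concrete, here is a minimal sketch of a lookup table for a handful of one-byte opcodes. It assumes 32-bit x86, where the number of operand bytes depends on the opcode; the table covers only seven instructions, and the names opcode_info, lookup_opcode, and instruction_length are hypothetical. A real disassembler must also handle prefixes, multi-byte opcodes, and ModRM bytes, none of which are attempted here.

```c
#include <stddef.h>

/* One entry per opcode byte this toy table understands. */
struct opcode_info {
    unsigned char opcode;       /* first byte of the instruction */
    const char   *mnemonic;     /* human-readable name */
    size_t        operand_len;  /* operand bytes that follow the opcode */
};

static const struct opcode_info OPCODES[] = {
    { 0x55, "push ebp", 0 },
    { 0x90, "nop",      0 },
    { 0xC3, "ret",      0 },
    { 0xC9, "leave",    0 },
    { 0x74, "je",       1 },   /* 8-bit relative displacement */
    { 0xEB, "jmp",      1 },   /* 8-bit relative displacement */
    { 0xE8, "call",     4 },   /* 32-bit relative displacement */
};

/* Return the table entry for an opcode byte, or NULL if it is unknown. */
const struct opcode_info *lookup_opcode(unsigned char byte)
{
    size_t i;

    for (i = 0; i < sizeof OPCODES / sizeof OPCODES[0]; i++) {
        if (OPCODES[i].opcode == byte)
            return &OPCODES[i];
    }
    return NULL;
}

/* Total instruction length (opcode plus operand), or 0 if unknown. */
size_t instruction_length(unsigned char byte)
{
    const struct opcode_info *info = lookup_opcode(byte);

    return info ? 1 + info->operand_len : 0;
}
```

A table-driven design like this keeps the per-processor knowledge in data rather than in code, which is one way to cope with the differences between instruction sets mentioned above.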
As stated previously, the entire suite of opcodes available to a processor is called an instruction set. Each processor requires its own instruction set. You must be familiar with the instruction set a processor is using before reverse engineering on that device. Without understanding the vagaries among opcodes, you will spend countless hours trying to determine what a program is doing. This can be quite difficult when you're faced with such confusing opcodes as UMULLLS R9, R0, R0, R0 (discussed in Chapter 4). Without first being familiar with the ARM instruction set, you probably would not guess that it performs an unsigned multiply long if the LS status is set, and then updates the status flags accordingly after it executes.
One final note: when programs are disassembled, the ASM output syntax may vary according to the disassembler you are using. A particular disassembler may place operands in reverse order from another disassembler. In many of the Linux examples in this book, the equivalent command:
Chapter 2: Windows Reverse Engineering
Software reverse engineering, also known as reverse code engineering (RCE), is the art of dissecting closed-source binary applications. Unlike open source software, which theoretically can be more easily peer-reviewed for security, closed source software presents the user with a "black box." Historically, RCE has been performed on Windows platforms, but there is now a growing need for expert Linux reversers as well, as we will explain in Chapter 3.
RCE allows you to see inside the black box. By disassembling a binary application, you can observe the program execution at its lowest levels. Once the application is broken down to machine language, a skilled practitioner can trace the operation of any binary application, no matter how well the software writer tries to protect it.
As a security expert, why would you want to learn RCE? The most common reason is to reverse malware such as viruses or Trojans. The antivirus industry depends on the ability to dissect binaries in order to diagnose, disinfect, and prevent them. In addition, the proliferation of unethical commercial spyware and software antipiracy protections that "phone home" raises serious privacy concerns.
In this chapter, we work on desktop Windows operating systems. Since Windows is a closed source and often hostile platform, by Darwinian pressure Windows RCE has now matured to the pinnacle of its technology. In subsequent chapters, we touch upon the emerging science of RCE on other platforms, including Linux and Windows CE, in which RCE is still in its infancy.
The legality of RCE is still in question in many areas. Most commercial software ships with a "click-through" end-user license agreement (EULA). According to the software manufacturers, clicking "I AGREE" when you install software contractually binds you to accept their licensing terms. Most EULAs include a clause that prevents the end user from reverse engineering the application, in order to protect the intellectual property of the manufacturer. In fact, the Digital Millennium Copyright Act (DMCA) now provides harsh criminal penalties for some instances of reverse engineering.
History of RCE
"Modern" RCE started with programmers who circumvented copy protection on classic computer games, such as those written for the Apple II in the early 1980s. Although this trend quickly became a way to distribute pirated computer software, a core of experts remained who developed the RCE field purely for academic reasons.
One of the legendary figures of those heady days was the Old Red Cracker (+ORC). Not only was +ORC a genius software reverser, he was a prolific author and teacher of the subject. His classic texts are still considered mandatory reading for RCE students.
In order to further RCE research, +ORC founded the High Cracking University, or +HCU. The "+" sign next to a nickname, or "handle," designated members of the +HCU. The +HCU students included the most elite Windows reversers in the world. Each year the +HCU published a new reverse engineering challenge, and the authors of a handful of the best written responses were invited as students for the new school year.
One of the professors, known as +Fravia, maintained a motley web site known as "+Fravia's Pages of Reverse Engineering." In this forum +Fravia challenged not only programmers but society itself to "reverse engineer" the brainwashing of a corrupt and rampant materialism. At one point +Fravia's site was receiving millions of traffic hits per year, and its influence was widespread.
Today, most of the old +HCU has left Windows for the less occult Linux platform; only a few, such as +Tsehp, have remained to reverse Windows software. A new generation of reversers has rediscovered the ancient texts and begun to advance the science once again. Meanwhile, +Fravia himself can still be found wandering his endless library at http://www.searchlores.org.
Reversing Tools
As a software reverse engineer, you are only as good as your tools. Before diving into practical examples later in the chapter, we first review some of the classic Windows RCE tools. Some you can learn in a day, while others may take years to master.
To edit binaries in hexadecimal (or opcode patching), you need a good hex editor. One of the best is UltraEdit, by Ian Meade (http://www.ultraedit.com/), shown in Figure 2-1.
Figure 2-1: For opcode patching, we recommend UltraEdit, an advanced Windows hex editor
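What a hex editor does during opcode patching can be sketched in a few lines of C. The example below is an assumption-laden illustration, not a tool from the book: the hypothetical function patch_byte overwrites a single byte in an in-memory copy of a file, and only if the byte currently holds the expected value, so that a wrong offset does not silently corrupt the image.

```c
#include <stddef.h>

/* Hypothetical helper: replace the byte at `offset` with `replacement`,
 * but only if it currently equals `expected`. Returns 1 on success and
 * 0 if the offset is out of range or the existing byte does not match. */
int patch_byte(unsigned char *image, size_t size, size_t offset,
               unsigned char expected, unsigned char replacement)
{
    if (offset >= size || image[offset] != expected)
        return 0;
    image[offset] = replacement;
    return 1;
}
```

A typical use on x86 would be changing a short conditional jump (opcode 74, JE) into an unconditional short jump (opcode EB, JMP); both take a one-byte displacement, so the patch does not shift any following code.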
A disassembler attempts to dissect a binary executable into human-readable assembly language. The disassembler software reads the raw byte stream that the processor would execute and parses it into groups of instructions. These instructions are then translated into assembly language instructions. The disassembler makes a best guess at the assembly language code, often with variable results. Nevertheless, it is the most essential tool for a software cracker.
A popular disassembler, and one that is the tool of choice for many expert reverse engineers, is IDA Pro. IDA (http://www.datarescue.com) is a multiprocessor, multioperating-system, interactive disassembler. It has won numerous accolades, not the least being chosen as the official disassembler of the +HCU in 1997.
IDA treats an executable file as a structured object that has been created from a database representing the source code. In other words, it attempts to re-create viable source code (as opposed to W32DASM, which only displays the code it thinks is important).
One of the most powerful features of IDA is the use of FLIRT signatures. FLIRT stands for Fast Library Identification and Recognition Technology. This means that IDA uses a proprietary algorithm to attempt to recognize compiler-specific library functions.
Mastering IDA takes considerable time and effort. The company admits in the user's manual that IDA is difficult to understand. However, once you have mastered IDA, you'll probably prefer it to the combination of W32DASM + SoftICE (discussed next). This section walks you through a few basic IDA configuration and manipulation steps.
Reverse Engineering Examples
Before beginning your practical journey, there is one final issue to note. Similar to software debugging, reverse engineering by definition goes in reverse. In other words, you must be able to think backward. Zen meditation skills will serve you better than many years of formal programming education. If you are good at solving verbal brain-teaser riddles on long trips with friends, you will probably be good at RCE. In fact, master reversers like +Fravia recommend cracking while intoxicated with a mixture of strong alcoholic beverages. While for health reasons we cannot recommend this method, you may find that a relaxing cup of hot tea unwinds your mind and allows you to think in reverse. The following segments walk you through live examples of Windows reverse engineering.
Since it is illegal to defeat protections on copyrighted works, reverse engineers now program their own protection schemes for teaching purposes. Thus, crackmes are small programs that contain the heart of the protection scheme and little else.
Example 1 is Muad'Dib's Crackme #1.
The sample binaries (crackmes) used in this chapter may be downloaded from our web site at http://www.securitywarrior.com.
This is a simple program, with a twist. The program's only function is to keep you from closing it. For example, when you run the program you will see an Exit button. However, pressing the Exit button does not work (on purpose). Instead, it presents you with a nag screen that says, "Your job is to make me work as an exit button" (Figure 2-12).
Figure 2-12: Solving Muad'Dib's crackme
Thus, the crackme emulates shareware or software that has features removed or restricted to the user (i.e., crippleware). Your job is to enable the program in order to make it fully functional. Fortunately, the program itself gives you a great clue. By searching the disassembled program for the following string:
References
The example crackmes from this chapter are at http://www.securitywarrior.com. Due to their controversial nature, some of the references in this book have volatile URLs. Whenever possible, we list the updated links at http://www.securitywarrior.com.
  • Windows Internet Security: Protecting Your Critical Data, by Seth Fogie and Cyrus Peikari. Prentice Hall, 2001.
  • ".NET Server Security: Architecture and Policy Vulnerabilities." Paper presented at Defcon 10, August 2002.
  • "PE header Format." Iczelion's Win32 Assembly Homepage. (http://win32asm.cjb.net)
  • "Mankind comes into the Ice Age." Mammon_'s Tales to his Grandson.
  • "An IDA Primer." Mammon_'s Tales to Fravia's Grandson.
  • SoftICE breakpoints. (http://www.anticrack.de)
  • "WoRKiNG WiTH UCF's ProcDump32," by Hades.
  • Win32 Assembly Tutorial. Copyright 2000 by Exagone. (http://exagone.cjb.net)
  • SubSeven official site. (http://www.subseven.ws)
  • "Reversing a Trojan: Part I," by the Defiler. Published by +Tsehp.
  • Muad'dib's Crackme, published by +Tsehp.
Chapter 3: Linux Reverse Engineering
This chapter is concerned with reverse engineering in the Linux environment, a topic that is still sparsely covered despite years of attention from security consultants, software crackers, and programmers writing device drivers or Windows interoperability software. The question naturally arises: why would anyone be interested in reverse engineering on Linux, an operating system in which the applications that are not open source are usually available for no charge? The reason is worth noting: in the case of Linux, reverse engineering is geared toward "real" reverse engineering (such as understanding hardware ioctl() interfaces, proprietary network protocols, or potentially hostile foreign binaries) rather than toward the theft of algorithms or bypassing copy protections.
As mentioned in the previous chapter, the legality of software reverse engineering is an issue. While actually illegal in some countries, reverse engineering is for the most part a violation of a software license or contract; that is, it becomes criminal only when the reverse engineer is violating copyright by copying or redistributing copy-protected software. In the United States, the (hopefully temporary) DMCA makes it illegal to circumvent a copy protection mechanism; this means the actual reverse engineering process is legal, as long as protection mechanisms are not disabled. Of course, as shown in the grossly mishandled Sklyarov incident, the feds will go to absurd lengths to prosecute alleged DMCA violations, thereby driving home the lesson that if one is engaged in reverse engineering a copy-protected piece of software, one should not publish the matter. Oddly enough, all of the DMCA cases brought to court have been at the urging of commercial companies...reverse engineering Trojaned binaries, exploits, and viruses seems to be safe for the moment.
This material is not intended to be a magic "Reverse Engineering How-To." In order to properly analyze a binary, you need a broad background in computers, covering not only assembly language but high-level language design and programming, operating system design, CPU architecture, network protocols, compiler design, executable file formats, and code optimization. In short, it takes a great deal of experience to know what you're looking at in the disassembly of some random compiled binary. Little of that experience can be provided here; instead, the standard Linux tools and their usage are discussed, as well as their shortcomings. The final half of the chapter is mostly source code demonstrating how to write new tools for Linux.
Basic Tools and Techniques
One of the wonderful things about Unix in general and Linux in particular is that the operating system ships with a number of powerful utilities that can be used for programming or reverse engineering (of course, some commercial Unixes still try to enforce "licensing" of so-called developer tools, an odd choice of phrase since "developers" tend to use Windows and "coders" tend to use Unix, but packages such as the GNU development tools are available for free on virtually every Unix platform extant). A virtual cornucopia of additional tools can be found online (see Section 3.5 at the end of the chapter), many of which are under continual development.
The tools presented here are restricted to the GNU packages and utilities available in most Linux distributions: nm, gdb, lsof, ltrace, objdump, od, and hexdump. Other tools that have become fairly widely used in the security and reverse engineering fields (dasm, elfdump, hte, ald, IDA, and IDA_Pro) are not discussed, though the reader is encouraged to experiment with them.
One tool whose omission would at first appear to be a matter of great neglect is the humble hex editor. There are many of these available for Linux/Unix. biew is the best; hexedit is supplied with just about every major Linux distribution. Of course, as all true Unixers know in their hearts, you need no hex editor when you're in bed with od and dd.
The first tool that should be run on a prospective target is nm, the system utility for listing symbols in a binary. There are quite a few options to nm; the more useful are -C (demangle), -D (dynamic symbols), -g (global/external symbols), -u (only undefined symbols), --defined-only (only defined symbols), and -a (all symbols, including debugger hints).
There are notions of symbol type, scope, and definition in the nm listing. Type specifies the section where the symbol is located and usually has one of the following values:
A Good Disassembly
The output of objdump leaves a little to be desired. In addition to being a "dumb" or sequential disassembler, it provides very little information that can be used to understand the target. For this reason, a great deal of post-disassembly work must be performed in order to make a disassembly useful.
As a disassembler, objdump does not attempt to identify functions in the target; it merely creates code labels for symbols found in the ELF header. While it may at first seem appropriate to generate a function for every address that is called, this process has many shortcomings; for example, it fails to identify functions only called via pointers or to detect a "call 0x0" as a function.
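The "one function per called address" approach can be sketched as follows. This is a minimal illustration that assumes 32-bit x86 and handles only the five-byte near call (opcode E8 followed by a little-endian 32-bit displacement); the function name near_call_target and the idea of passing the load address as base are assumptions made for the example. It also shows why the approach misses indirect calls: a call through a pointer has no displacement to decode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: if code[off] starts a five-byte near call (opcode E8
 * followed by a little-endian 32-bit displacement), store the call target in
 * *target and return 1; otherwise return 0. `base` is the address at which
 * code[0] is assumed to be loaded. */
int near_call_target(const unsigned char *code, size_t size, size_t off,
                     uint32_t base, uint32_t *target)
{
    uint32_t rel;

    if (size < 5 || off > size - 5 || code[off] != 0xE8)
        return 0;

    rel = (uint32_t)code[off + 1]
        | (uint32_t)code[off + 2] << 8
        | (uint32_t)code[off + 3] << 16
        | (uint32_t)code[off + 4] << 24;

    /* The displacement is relative to the instruction after the call.
     * Unsigned arithmetic wraps, which also handles backward calls. */
    *target = base + (uint32_t)off + 5 + rel;
    return 1;
}
```

The caller is expected to pass only offsets that are known to be instruction boundaries; applied to arbitrary bytes, an E8 inside another instruction or inside data would produce a bogus target.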
On the Intel platform, functions or subroutines compiled from a high-level language usually have the following form:
55           pushl %ebp
89 E5        movl %esp, %ebp
83 EC ??     subl $??, %esp
...
89 EC        movl %ebp, %esp        ; these two could also be C9 leave
5D           popl %ebp
C3           ret
The series of instructions at the beginning and end of a function are called the function prologue and epilogue; they are responsible for creating a stack frame in which the function will execute, and are generated by the compiler in accordance with the calling convention of the programming language. Functions can be identified by searching for function prologues within the disassembled target; in addition, an arbitrary series of bytes could be considered code if it contains instances of the 55 89 E5 83 EC byte series.
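Searching for prologues can be sketched as a simple byte scan. The example below is a minimal illustration with a hypothetical function name, find_prologue; it looks only for the three-byte 55 89 E5 sequence and makes no attempt to confirm that a match falls on an instruction boundary, so the same bytes appearing inside data or in the middle of another instruction would be reported as well.

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first 55 89 E5 sequence
 * (push ebp; mov ebp, esp) at or after `start`, or `size` if none is found. */
size_t find_prologue(const unsigned char *code, size_t size, size_t start)
{
    size_t i;

    for (i = start; i + 3 <= size; i++) {
        if (code[i] == 0x55 && code[i + 1] == 0x89 && code[i + 2] == 0xE5)
            return i;
    }
    return size;
}
```

Calling the function repeatedly, each time starting one byte past the previous match, yields a list of candidate function entry points. Code compiled without frame pointers will not contain this prologue at all, which is one reason the heuristic is only a starting point.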
Performing automatic analysis on a disassembled listing can be quite tedious. It is much more convenient to do what more sophisticated disassemblers do: translate each instruction to an intermediate or internal representation and perform all analyses on that representation, converting back to assembly language (or to a higher-level language) before output.
This intermediate representation is often referred to as
Problem Areas
So far, the reverse engineering process that has been presented is an idealized one; all tools are assumed to work correctly on all targets, and the resulting disassembly is assumed to be accurate.
In most real-world reverse engineering cases, however, this is not the case. The tools may not process the target at all, or may provide an inaccurate disassembly of the underlying machine code. The target may contain hostile code, be encrypted or compressed, or simply have been compiled using nonstandard tools.
The purpose of this section is to introduce a few of the common difficulties encountered when using these tools. It's not an exhaustive survey of protection techniques, nor does it pretend to provide reasonable solutions in all cases; what follows should be considered background for the next section of this chapter, which discusses the writing of new tools to compensate for the problems the current tools cannot cope with.
The prevalence of open source software on Linux has hampered the development of debuggers and other binary analysis tools; the developers of debuggers still rely on ptrace, a kernel-level debugging facility that is intended for working with "friendly" programs. As has been more than adequately shown (see Section 3.5 for more information), ptrace cannot be relied on for dealing with foreign or hostile binaries.
The following simple—and by now, quite common—program locks up when being debugged by a ptrace-based debugger:
#include <sys/ptrace.h>
#include <stdio.h>

int main( int argc, char **argv ) {
    if ( ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0 ) {
        /* we are being debugged */
        while (1) ;
    }
    printf("Success: PTRACE_TRACEME works\n");
    return(0);
}
On applications that tend to be less obvious about their approach, the call to ptrace will be replaced with an int 0x80 system call:
asm("\t xorl %ebx, %ebx    \n"    /* PTRACE_TRACEME = 0 */
    "\t movl $26, %eax     \n"    /* __NR_ptrace, from /usr/include/asm/unistd.h */
    "\t int $0x80          \n"    /* system call trap */
    );
Writing New Tools
As seen in the previous section, the current tools based on binutils and ptrace leave a lot to be desired. While there are currently tools in development that compensate for these shortcomings, the general nature of this book and the volatile state of many of the projects preclude mentioning them here. Instead, what follows is a discussion of the facilities available for writing new tools to manipulate binary files.
The last half of this chapter contains a great deal of example source code. The reader is assumed to be familiar with C as well as with the general operation of binary tools such as linkers, debuggers, and disassemblers. This section begins with a discussion of parsing the ELF file header, followed by an introduction to writing programs using ptrace(2) and a brief look at the GNU BFD library. It ends with a discussion of using GNU libopcodes to create a disassembler.
The standard binary format for Linux and Unix executables is the Executable and Linkable Format (ELF). Documentation for the ELF format is easily obtainable; Intel provides PDF documentation at no charge as part of its Tool Interface Standards series (see Section 3.5 at the end of this chapter for more information).
Typical file types in ELF include binary executables, shared libraries, and the object or ".o" files produced during compilation. Static libraries, or ".a" files, consist of a collection of ELF object files linked by AR archive structures.
An ELF file is easily identified by examining the first four bytes of the file; they must be \177ELF, or 7F 45 4C 46 in hexadecimal. This four-byte signature is the start of the ELF file header, which is defined in /usr/include/elf.h:
typedef struct {                        /* ELF File Header */
    unsigned char   e_ident[16];        /* Magic number */
    Elf32_Half      e_type;             /* Object file type */
    Elf32_Half      e_machine;          /* Architecture */
    Elf32_Word      e_version;          /* Object file version */
    Elf32_Addr      e_entry;            /* Entry point virtual addr */
    Elf32_Off       e_phoff;            /* Prog hdr tbl file offset */
    Elf32_Off       e_shoff;            /* Sect hdr tbl file offset */
    Elf32_Word      e_flags;            /* Processor-specific flags */
    Elf32_Half      e_ehsize;           /* ELF header size in bytes */
    Elf32_Half      e_phentsize;        /* Prog hdr tbl entry size */
    Elf32_Half      e_phnum;            /* Prog hdr tbl entry count */
    Elf32_Half      e_shentsize;        /* Sect hdr tbl entry size */
    Elf32_Half      e_shnum;            /* Sect hdr tbl entry count */
    Elf32_Half      e_shstrndx;         /* Sect hdr string tbl idx */
} Elf32_Ehdr;
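The signature check and a first look at the header fields can be sketched without depending on elf.h. The example below is a minimal illustration that assumes a 32-bit, little-endian ELF file already read into memory; the function names are hypothetical, and the offsets 16 and 24 follow from the structure above (e_type comes directly after the 16-byte e_ident array, and e_entry follows e_type, e_machine, and e_version).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the Elf32_Ehdr structure shown above. */
#define ELF32_EHDR_SIZE 52

/* Return 1 if the buffer starts with the four-byte ELF signature. */
int has_elf_magic(const unsigned char *buf, size_t size)
{
    return size >= 4 && memcmp(buf, "\177ELF", 4) == 0;
}

/* Hypothetical helper: return e_type (1 = relocatable, 2 = executable,
 * 3 = shared object, 4 = core), or -1 if the buffer is not a complete
 * ELF header. Assumes little-endian byte order. */
int elf32_le_type(const unsigned char *buf, size_t size)
{
    if (size < ELF32_EHDR_SIZE || !has_elf_magic(buf, size))
        return -1;
    return buf[16] | (buf[17] << 8);
}

/* Hypothetical helper: return e_entry, the entry point virtual address,
 * or 0 if the buffer is not a complete ELF header. Assumes little-endian
 * byte order. */
uint32_t elf32_le_entry(const unsigned char *buf, size_t size)
{
    if (size < ELF32_EHDR_SIZE || !has_elf_magic(buf, size))
        return 0;
    return (uint32_t)buf[24]
         | (uint32_t)buf[25] << 8
         | (uint32_t)buf[26] << 16
         | (uint32_t)buf[27] << 24;
}
```

Reading the fields byte by byte avoids casting the buffer to Elf32_Ehdr, which would tie the tool to the host's byte order and alignment rules. A fuller version would consult e_ident[4] and e_ident[5] to pick the class and byte order instead of assuming them.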
Chapter 4: Windows CE Reverse Engineering
In the previous chapters, we covered reverse engineering on traditional platforms such as Win32 and Linux. However, what about the little guys? Can you reverse engineer software on embedded operating systems? Why would you want to?
Many embedded operating systems are stripped-down microversions of their big brothers. An embedded operating system brings the power of a complete OS to small devices such as mobile phones or watches, which suffer from severely restricted processing and memory resources. However, as embedded devices continue to increase in sophistication, their vulnerability to attack increases as well. Already the first computer viruses have hit embedded platforms, as we describe in Chapter 17. Corporate spyware will likely follow soon. With hundreds of millions of "smart" consumer appliances on the horizon, the potential for abuse keeps increasing.
Embedded RCE is still in its infancy. In this chapter, we introduce embedded OS architecture and how to crack the applications that run on it. For our example, we have chosen Windows CE, which powers many Windows Mobile OS flavors such as PocketPC and Smartphone. Windows CE is a semi-open, scalable, 32-bit, true-multitasking operating system that has been designed to run with maximum power on minimum resources. This OS is actually a miniature version of Windows 2000/XP that can run on appliances as small as a watch.
Why have we chosen Windows CE for our reverse engineering research, instead of friendly, open source, and free embedded Linux? For better or worse, CE is set to become one of the most prevalent operating systems of all time, thanks to aggressive marketing tactics by Microsoft. In addition, because of their closed nature, Windows platforms usually see the majority of viruses and unethical corporate spyware. Thus, the need to reverse engineer embedded Windows applications is more pressing. Download the free eMbedded Visual Tools (MVT) package from Microsoft.com and get cracking—literally.
Windows CE Architecture
Windows CE is the basis of all Windows Mobile PocketPC and Smartphone devices. In addition, using the CE Platform Builder, any programmer can create her own miniature operating system based on Windows CE. Consequently, CE is starting to control a vast array of consumer devices, ranging from toasters to exercise bicycles. Because of its growing prevalence, if you want to become proficient at reverse engineering applications on mobile devices it is important to understand the basics of how this operating system works. This segment briefly covers the Windows CE architecture, with a deeper look at topics important to understand when reversing.
In the world of miniature gadgets, physics is often the rate-limiting step. For example, the intense heat generated by high-speed processors in notebook PCs has been shown to be hot enough to fry eggs. In fact, News.com reported that one unfortunate man inadvertently burned his genitals with a laptop computer (http://www.news.com.au/common/story_page/0,4057,5537960%255E1702,00.html)!
Windows CE devices are likewise limited in their choice of processors. The following is a list of processors supported by Windows CE:
ARM
Supported processors include ARM720T, ARM920T, ARM1020T, StrongARM, and XScale. ARM-based processors are by far the most common choice of CE devices at the time of this writing.
MIPS
Supported processors include MIPS II/32 w/FP, MIPS II/32 w/o FP, MIPS16, MIPS IV/64 w/FP, and MIPS IV/64 w/o FP.
SHx
Supported processors include SH-3, SH-3 DSP, and SH-4.
x86
Supported processors include 486, 586, Geode, and Pentium I/II/III/IV.
If heat dissipation is a serious issue, the best choice is one of the non-x86 processors, which draw less power. Lower power consumption reduces the heat generated during processor operation, but it also limits the processor speed.
CE Reverse Engineering Fundamentals
To review: when a developer writes a program, he typically uses one of several languages. These include Visual Basic, C++, Java, and any one of the other, lesser-used languages. The choice of language depends on several factors; the most common are space and speed considerations. In the infamously bloated Windows environment, Visual Basic is arguably the king. This is because the hardware required to run Windows is usually more than enough to run any Visual Basic application. However, if a programmer needs a higher level of speed and power, he will probably select C++.
While these upper-level languages make programming easier by providing a large selection of Application Program Interfaces (APIs) and commands that are easy to understand, there are many occasions in which a programmer must create a program that can fit in a small amount of memory and operate quickly. To meet this goal, she may choose to use assembler, thus controlling the hardware of the computer directly. However, programming in assembler is tedious and must be done within an explicit set of rules.
Since every processor type uses its own set of assembler instructions, focus on one device (i.e., one processor type) and become fluent in the operation codes (opcodes), instruction sets, processor design, and methods by which the processor uses internal memory to read and write to RAM. Only after you master the basics of the processor operation can you start to reverse engineer a program. Fortunately, most processors operate similarly, with slight variations in syntax and use of internal processor memory.
Since our target in this chapter is the ARM processor used by PDAs, we provide some of the basic information you need to know, or at least to be familiar with, before attempting to study a program meant to run on this type of processor. The rest of this section describes the ARM processor, its major opcodes and their hex equivalents, and how its memory is used. If you do not understand this information, you may have some difficulty with the rest of this chapter.
Practical CE Reverse Engineering
For this section, you will need to use the tools described in previous chapters, including hex editors and disassemblers. We start by creating a simple "Hello World!" application, and we then use this program to demonstrate several cracking methods. After this discussion, we offer a hands-on tutorial that allows you to walk through real-life examples of how reverse engineering can be used to get to the heart of a program.
When learning a programming language, the first thing most people do is to create the famous "Hello, World" application. This program is simple, but it helps to get a new programmer familiar with the syntax structure, compiling steps, and general layout of the tool used to create the program. In fact, Microsoft's eMbedded Visual C++ goes so far as to provide its users with a wizard that creates a basic "Hello World" application with the click of a few buttons. The following are the required steps:
  1. Open Microsoft eMbedded Visual C++.
  2. Click File > New.
  3. Select the Projects tab.
  4. In the "Project Name:" field, type "test", as illustrated in Figure 4-2. Select WCE Application on the left.
Figure 4-2: WCE application creation window
By default, all compiled executables will be created in the C:\Program Files\Microsoft eMbedded Tools\Common\EVC\MyProjects\ directory.
  5. Click OK.
  6. Ensure the "A typical 'Hello World!' application" option is selected, and click Finish.
  7. Click OK.
We're running the programs on a PDA synchronized with our computer, but the beauty of Microsoft's eMbedded Visual Tools is that you don't need a real device. The free MVT has an emulator for virtual testing.
After a few seconds, a new "test" class appears on the left side of the screen, under which are all the classes and functions automatically created by the wizard. We aren't making any changes to the code, so next we compile and build the executable.
Reverse Engineering serial.exe
Now that you've had a simple introduction to RCE on Windows CE, the next section provides a legal and hands-on tutorial of how to bypass serial protection. We describe multiple methods of circumvention of the protection scheme, which shows there's more than one "right" way to do it. We use the previous discussion as a foundation.
For our example, we use our own program, called serial.exe. This program was written in Visual C++ to provide you with a real working product on which to test and practice your newly acquired knowledge. Our program simulates a simple serial number check that imitates those of many professional programs. You will see firsthand how a cracker can reverse engineer a program to allow any serial number, regardless of length or value. To obtain this embedded crackme, please download serial.exe from http://www.securitywarrior.com.

Section 4.4.1.1: Loading the target

You must first load the target file into a disassembler from the local computer, using the steps we covered earlier. In this case, we are targeting a file called serial.exe, written solely for this example (Figure 4-13).
Figure 4-13: serial.exe
Once the program is open, drill down to a point in the program where you can monitor what is happening. As previously discussed, there are several function calls that flag an event worth inspection. For example, using the Names window, we can locate a wcscmp call, which is probably used to compare the entered serial number with the correct serial number. Using this function's XREF, we can easily locate the chunk of code illustrated in Figure 4-13.
Since serial.exe is a relatively simple program, all the code we need to review and play with is located within a few lines. They are as follows:
.text:00011224             MOV   R4, R0
.text:00011228             ADD   R0, SP, #0xC
.text:0001122C             BL   CString::CString(void)
.text:00011230             ADD   R0, SP, #8
.text:00011234             BL   CString::CString(void)
.text:00011238             ADD   R0, SP, #4
.text:0001123C             BL   CString::CString(void)
.text:00011240             ADD   R0, SP, #0x10
.text:00011244             BL   CString::CString(void)
.text:00011248             ADD   R0, SP, #0
.text:0001124C             BL   CString::CString(void)
.text:00011250             LDR   R1, =unk_131A4
.text:00011254             ADD   R0, SP, #0xC
.text:00011258             BL   CString::operator=(ushort)
.text:0001125C             LDR   R1, =unk_131B0
.text:00011260             ADD   R0, SP, #8
.text:00011264             BL   CString::operator=(ushort)
.text:00011268             LDR   R1, =unk_131E0
.text:0001126C             ADD   R0, SP, #4
.text:00011270             BL   CString::operator=(ushort)
.text:00011274             LDR   R1, =unk_1321C
.text:00011278             ADD   R0, SP, #0
.text:0001127C             BL   CString::operator=(ushort)
.text:00011280             MOV   R1, #1
.text:00011284             MOV   R0, R4
.text:00011288             BL   CWnd::UpdateData(int)
.text:0001128C             LDR   R1, [R4,#0x7C]
.text:00011290             LDR   R0, [R1,#-8]
.text:00011294             CMP   R0, #8
.text:00011298             BLT   loc_112E4
.text:0001129C             BGT   loc_112E4
.text:000112A0             LDR   R0, [SP,#0xC]
.text:000112A4             BL   wcscmp
.text:000112A8             MOV   R2, #0
.text:000112AC             MOVS  R3, R0
.text:000112B0             MOV   R0, #1
.text:000112B4             MOVNE  R0, #0
.text:000112B8             ANDS  R3, R0, #0xFF
.text:000112BC             LDRNE  R1, [SP,#8]
.text:000112C0             MOV   R0, R4
.text:000112C4             MOV   R3, #0
.text:000112C8             BNE   loc_112F4
.text:000112CC             LDR   R1, [SP,#4]
.text:000112D0             B    loc_112F4
.text:000112E4 
.text:000112E4 loc_112E4                ; CODE XREF: .text:00011298
.text:000112E4                          ; .text:0001129C
.text:000112E4             LDR   R1, [SP]
.text:000112E8             MOV   R3, #0
.text:000112EC             MOV   R2, #0
.text:000112F0             MOV   R0, R4
.text:000112F4 
.text:000112F4 loc_112F4                ; CODE XREF: .text:000112C8
.text:000112F4                          ; .text:000112D0
.text:000112F4             BL   CWnd::MessageBoxW
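Read top to bottom, the listing appears to do three things: it checks that the entered string is exactly eight characters long (CMP R0, #8 followed by BLT and BGT), it compares the entered string with a stored one using wcscmp, and it picks one of three message strings before calling the message box. The following C++ sketch restates that control flow in portable code. It is a reading of the disassembly, not the original source of serial.exe: the names check_serial and SerialResult are hypothetical, and the roles assigned to the stack slots are inferred from how the registers are loaded.

```cpp
#include <cwchar>
#include <string>

// Hypothetical names: the listing shows stack slots, not source identifiers.
enum class SerialResult { WrongLength, Mismatch, Match };

SerialResult check_serial(const std::wstring &entered,
                          const std::wstring &expected)
{
    // CMP R0, #8 / BLT loc_112E4 / BGT loc_112E4:
    // any length other than exactly 8 takes the loc_112E4 path,
    // which loads the message stored at [SP].
    if (entered.size() != 8)
        return SerialResult::WrongLength;

    // BL wcscmp: the result is 0 only when both strings are identical.
    // A nonzero result falls through to LDR R1, [SP,#4].
    if (std::wcscmp(expected.c_str(), entered.c_str()) != 0)
        return SerialResult::Mismatch;

    // A zero result executes LDRNE R1, [SP,#8] and takes BNE loc_112F4.
    return SerialResult::Match;
}
```

Calling check_serial(L"12345678", L"12345678") yields SerialResult::Match, while any seven- or nine-character input yields SerialResult::WrongLength regardless of its contents. The value "12345678" is only a placeholder; the real serial is stored in the binary and is not shown in the listing.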
Chapter 5: Overflow Attacks
Attacking applications is a core technique for vulnerability researchers. Test engineers can spare a company from needless expense and public embarrassment by finding early exploitation points in the company's software. This chapter reviews a variety of application attack techniques, including buffer overflows and heap overflows. It also builds on the reverse engineering knowledge gained from the previous chapters.
Buffer Overflows
To exploit an overflow, you need a thorough knowledge of assembly language, C++, and the operating system you wish to attack. This chapter describes buffer overflows, traces their evolution, and even walks you through a live sample.
A buffer overflow attack deliberately enters more data than a program was written to handle. The extra data overflows the region of memory set aside to accept it, thus overwriting another region of memory that was meant to hold some of the program's instructions. In the ideal version of this attack, the overflow values introduced become new instructions that give the attacker control of the target processor.
Buffer overflow attacks are not a new phenomenon. For example, the original Morris worm in 1988 used a buffer overflow. In fact, the issue of buffer overflow risks to computer systems has been recognized since the 1960s.
Buffer overflows result from an inherent weakness in the C++ programming language. The problem (which is inherited from C and likewise found in other languages, such as Fortran) is that C++ does not automatically perform bounds-checking when passing data. To understand this concept, consider the following sample code that illustrates how a C/C++ function returns data to the main program:
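// lunch.cpp : Overflowing the stomach buffer
// Deliberately vulnerable sample; do not reuse this pattern in real code.

#include "stdafx.h"   // Visual C++ precompiled header; omit with other compilers
#include <stdio.h>
#include <string.h>

void bigmac(const char *p);

int main(int argc, char *argv[])
{
    bigmac("Could you supersize that please?"); // size > 9 overflows
    return 0;
}

void bigmac(const char *p)
{
    char stomach[10];       // room for 9 characters plus the terminating NUL
    strcpy(stomach, p);     // no bounds check: copies all of p past the end of stomach
    printf("%s", stomach);  // print the buffer as data, not as a format string
}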
To test this program, compile it with a C++ compiler. Although the program compiles without errors, executing it produces a crash similar to Figure 5-1.
Figure 5-1: Buffer overflow crash
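What happened? When this program executes, it calls the function bigmac and passes it the long string "Could you supersize that please?" Unfortunately, stomach has room for only 10 characters, and strcpy copies the entire string without checking the size of the destination.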
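For contrast, a common way to avoid this kind of crash is to tell the copy routine how large the destination is, so that it stops before running past the end. The sketch below is not from the book: bounded_copy and bigmac_checked are hypothetical names, and silently truncating the input is only one possible policy (rejecting oversized input is another).

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies at most dst_size - 1 characters from src and always
// NUL-terminates dst. Returns true only if all of src fit.
bool bounded_copy(char *dst, std::size_t dst_size, const char *src)
{
    if (dst == nullptr || src == nullptr || dst_size == 0)
        return false;

    const std::size_t len = std::strlen(src);
    const std::size_t n = (len < dst_size - 1) ? len : dst_size - 1;

    std::memcpy(dst, src, n);  // never writes more than dst_size - 1 bytes
    dst[n] = '\0';             // terminator stays inside the buffer
    return n == len;
}

// Same shape as bigmac, but the copy is limited to the buffer size.
std::string bigmac_checked(const char *p)
{
    char stomach[10];
    bounded_copy(stomach, sizeof stomach, p);
    return std::string(stomach);
}
```

With the same input as before, bigmac_checked keeps only the first nine characters ("Could you") plus the terminator, and nothing outside stomach is overwritten.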
Understanding Buffers
Buffer overflows are a leading type of security vulnerability. In order to understand how a hacker can use a buffer overflow to infiltrate or crash a computer, you need to understand exactly what a buffer is.
This section provides a basic introduction to buffers; experienced users should skip ahead to Section 5.3.
A computer program consists of code that accesses variables stored in various locations in memory. As a program is executed, each variable is assigned a specific amount of memory, determined by the type of information the variable is expected to hold. For example, a Short Integer only needs a little bit of memory, whereas a Long Integer needs more space in the computer's memory (RAM). There are many different possible types of variables, each with its own predefined memory length. The space set aside in the memory is used to store information that the program needs for its execution. The program stores the value of a variable in this memory space, then pulls the value back out of memory when it's needed. This virtual space is called a buffer.
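To make the idea of a predefined memory length concrete, the following C++ sketch reports how many bytes a compiler reserves for a short integer, a long integer, and a 10-character array. The Sizes structure and the variable_sizes function are hypothetical names used only for illustration, and the exact numbers depend on the compiler and platform.

```cpp
#include <cstddef>

// Hypothetical helper type: sizes in bytes of a few variable types.
struct Sizes {
    std::size_t short_bytes;   // typically 2
    std::size_t long_bytes;    // typically 4 or 8
    std::size_t buffer_bytes;  // exactly 10 for char[10]
};

Sizes variable_sizes()
{
    // sizeof is evaluated at compile time; no memory is touched here.
    return Sizes{sizeof(short), sizeof(long), sizeof(char[10])};
}
```

Whatever the platform, a long is never smaller than a short, and a char array of 10 elements always occupies exactly 10 bytes, which is all the room a buffer such as stomach has.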
A good analogy for a buffer is a categorized CD collection. You have probably seen the tall CD towers that hold about 300 CDs. Your computer's memory is similar to a CD holder. The difference is that a computer can have millions of slots that are used to store information, compared to the relatively limited space on a CD rack. Our example CD collection consists of three main categories: Oldies, Classical, and Pop Rock (Figure 5-2). Logically, we would separate the 300 slots into 3 parts, with 100 slots for each genre of music. The bottom 100 slots of the CD holder are set aside for Oldies, the middle 100 are for Classical, and the top 100 contain Pop Rock. Each slot is labeled with a number; you know where each type of music begins and ends based on the slot number.
Figure 5-2: A segmented CD rack is similar to a buffer
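The rack can be modeled in a few lines of C++. This is only an illustration of the analogy: the names Genre and genre_of_slot are hypothetical, and the sketch assumes slots are numbered from 0 at the bottom of the rack to 299 at the top.

```cpp
#include <cstddef>
#include <stdexcept>

enum class Genre { Oldies, Classical, PopRock };

constexpr std::size_t kSlotsPerGenre = 100;
constexpr std::size_t kTotalSlots = 3 * kSlotsPerGenre;

// Maps a slot number to the genre whose segment contains it.
Genre genre_of_slot(std::size_t slot)
{
    // A slot number past the end of the rack does not exist at all.
    if (slot >= kTotalSlots)
        throw std::out_of_range("slot is outside the 300-slot rack");

    if (slot < kSlotsPerGenre)
        return Genre::Oldies;      // slots 0-99
    if (slot < 2 * kSlotsPerGenre)
        return Genre::Classical;   // slots 100-199
    return Genre::PopRock;         // slots 200-299
}
```

Slot 99 is the last Oldies slot and slot 100 is the first Classical slot, so the segments sit directly next to each other with nothing in between.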
Smashing the Stack