Chapter 1. Overview of Ghidra
Ghidra (pronounced GEE-druh, with a hard g) is a reverse engineering framework developed by the United States National Security Agency (NSA). It is one of many tools the NSA has released as open source in recent years. In this section, we go over where the tool comes from and the ways it can be useful to you. We also walk through some use cases that illustrate reasons for using Ghidra, though the list is not meant to be comprehensive. As you read, you may think of other tasks you would like to set Ghidra to.
Note
If you’re interested in perusing the complete list of all the open source projects at the NSA, you can go to https://code.nsa.gov.
In a way, Ghidra is a unique tool. There are not many reverse engineering frameworks around, though reverse engineers use plenty of other tools, and there are certainly many frameworks for security testing, penetration testing in particular. Normally, if you wanted to reverse engineer software, you would use a disassembler like IDA Pro or a debugger like OllyDbg or the Immunity Debugger. Ghidra provides the disassembly functionality you would expect from those more traditional tools. It also provides additional features, including the ability to extend what you get, which is what makes it a framework rather than just a disassembler.
Quick Features Overview
Ghidra does the disassembly you would expect from any disassembler. It can handle programs compiled for different platforms, such as Linux, Windows, and macOS. It also supports multiple processor families, including the standard Intel-based processors, SPARC, PowerPC, ARM, Z80, and many others. Beyond hardware processors, it also handles Java bytecode.
One of the features that makes Ghidra a framework rather than just a disassembler, even a feature-rich one, is the ability to use scripting languages to interact with Ghidra and extend it. You can do something similar with the Immunity Debugger, which supports scripting in Python, but Ghidra provides quite a bit more functionality than a debugger/disassembler alone. Ghidra was developed by a group of professionals who spend a lot of time reverse engineering software. As a result, its focus has been on making a normally tedious workflow quite a bit easier. This is not to say that the streamlined workflow will benefit everyone, since different people approach reverse engineering in different ways. Reverse engineering is as much an art as it is a science.
A major advantage of Ghidra is that it can decompile object code back into source code. This means you don’t have to do the hard work of reading the assembly language. In Figure 1-1, for example, the lefthand side shows a section of object code translated into assembly language, and the righthand side shows that assembly turned back into C source code. The decompiled C code is very similar to the original source.
All of this functionality makes the job of reverse engineering software considerably easier. The fact that Ghidra is free makes it even more appealing. Even so, some people have concerns about its origins, in spite of the amount of functionality and the lack of acquisition costs.
Origins
Believe it or not, the NSA has a long history of offering open source software, to the security community in particular. Security Enhanced Linux (SELinux) was developed by the NSA to add Mandatory Access Control (MAC), an essential security feature that had been missing from Linux, as it was from the AT&T Unix that Linux was modeled on. This is not the only project the NSA has released to the public. The NSA has a Technology Transfer Program that is responsible for taking research performed at the NSA and transferring it to industry, academia, or other research organizations. Keep in mind that everything the NSA does is taxpayer funded, so the program is not entirely altruistic. As long as releasing a technology doesn’t expose or impair essential functions of the NSA, it has the potential to be handed off.
One might cynically assume that the Shadow Brokers’ leak of a cache of NSA-developed tools is what prompted the agency to release tools as open source. There is evidence, however, that the NSA is looking out for the community and giving back. If nothing else, the release of SELinux many years ago is a good example. As noted above, the NSA has made several projects open source so that anyone can download the software. This includes the source code, of course, so you can see exactly what the program is doing.
One comment you will often hear from security professionals when they learn the NSA has released software as open source is that the software will have backdoors. This may be said jokingly, though those of us who have been around for a while are well aware of the storied and secretive history of the NSA, especially its remit, which involves listening in on communications, known as signals intelligence (SIGINT). The advantage of releasing software as open source is that anyone can look at the source to determine whether there are any backdoors or other ways for the NSA to monitor what the software is being used for. To date, there is no evidence that any software released by the NSA has any such backdoor.
Use Cases
Why would you want to use Ghidra? Perhaps you already have some good ideas, since you are reading this. We cover a few use cases here. One is to analyze malware more efficiently. After all, malware has become a predominant means for modern adversaries to get what they are after. Another is to better understand the differences between compilers. Ghidra can also help you learn how a program works and find ways to make your own programs more efficient.
Malware Analysis
Malware is a serious problem in the world today. You can open your favorite news source or check Twitter, Reddit, or anywhere else that talks about what is happening and you will find stories about attacks on companies. As often as not, these attacks include the use of malicious software at some point. Malicious software is difficult to work with. For a start, it’s malicious. You can’t just run it on your system and see what it does and how it behaves. There are some traditional tools for assessing what programs are doing, such as strace and ltrace on Unix-like platforms. These execute the program and trap behaviors so they can be reported on. You don’t want to do that on your own system. You’ll end up with an infected system that won’t be especially reliable in what it reports back to you.
Additionally, malware developers will usually go to some lengths to obscure the actual functionality of the code. This is done so the program can get past anti-malware software. It also makes static analysis, such as reading the disassembled code, much harder. What you would generally see is a little stub program that either decrypts or decompresses the actual program, which is stored as data. A tool like Ghidra can make life a little easier for a reverse engineer because you can add in tools that may, for example, decrypt or decompress the data in the program. You can also get the decompiled stub program back rather than spending a lot of time trying to assess the compiled code.
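To make the stub-plus-data idea concrete, here is a minimal sketch in Python. It is not taken from any real sample or from Ghidra itself. The single-byte XOR key, the function names, and the harmless payload string are all assumptions made for illustration, and real packers are considerably more elaborate.

```python
import zlib

KEY = 0x5A  # hypothetical single-byte XOR key, chosen only for this example


def pack(payload: bytes) -> bytes:
    """Compress the payload, then XOR every byte so it no longer looks like code."""
    compressed = zlib.compress(payload)
    return bytes(b ^ KEY for b in compressed)


def unpack(blob: bytes) -> bytes:
    """What the stub does at run time: undo the XOR, then decompress."""
    compressed = bytes(b ^ KEY for b in blob)
    return zlib.decompress(compressed)


# A harmless stand-in for "the actual program, which is stored as data".
original = b"print('hello from the real program')"
blob = pack(original)

# Static analysis sees only `blob`; the original bytes reappear after unpacking.
recovered = unpack(blob)
```

A helper tool added to a reverse engineering workflow would play the role of `unpack` here: once you have worked out from the stub how the data was transformed, you apply the inverse and analyze the result.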
Compiler Comparison
Compilers are used to convert source code into object code. The object code is expressed as numeric values that first tell the processor which function inside the processor to call. The processor is actually composed of several sections of what are essentially hardware functions. If you call an ADD, for instance, the processor knows which set of integrated circuitry to run through. In addition to the processor instruction, or opcode, there is additional data. Again, this data is expressed numerically, and is often an address where a parameter to the operation is stored. The address is a location in memory. So the compiler has to take all the text that we can read and convert it to numeric values the processor understands. This is not a direct conversion. The code you can see in Figure 1-2, for instance, doesn’t convert directly to numeric values.
There is no for operation on your processor, for instance. That single statement has to be converted into a number of opcode steps. Essentially, the compiler has to create a loop that includes a comparison operation and then a jump operation. The comparison checks whether the loop counter has hit the value set in the for statement. This is not a straightforward task in most cases, and different compilers handle the conversion differently. As a result, seeing how two different compilers compile the same source code lets you compare the efficiency of the compilers. Being able to look at the assembly language, and especially the decompiled program, is useful. Ghidra may not return the exact same C code as was originally written, but in our experience it’s incredibly close.
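The following sketch shows one way a loop such as `for (i = 0; i < 5; i++) total += i;` could be lowered into compare and jump steps. The instruction set is invented for this example (the opcode names `MOV`, `CMP`, `JGE`, `ADD`, `INC`, `JMP`, and `HALT` are assumptions, not a real processor's encoding), and the small interpreter exists only so the lowered program can be run.

```python
# Toy instruction set, invented for this sketch; it is not a real processor.
# Each instruction is (opcode, operand_a, operand_b).
PROGRAM = [
    ("MOV", "i", 0),        # 0: i = 0          (loop initialization)
    ("MOV", "total", 0),    # 1: total = 0
    ("CMP", "i", 5),        # 2: compare i with the loop bound
    ("JGE", 7, None),       # 3: if i >= 5, jump past the loop
    ("ADD", "total", "i"),  # 4: total += i     (loop body)
    ("INC", "i", None),     # 5: i += 1         (loop increment)
    ("JMP", 2, None),       # 6: jump back to the comparison
    ("HALT", None, None),   # 7: end of program
]


def run(program):
    """Execute the toy program and return the final register values."""
    regs = {}
    flag_ge = False  # set by CMP, read by JGE
    pc = 0           # program counter: index of the next instruction
    while True:
        op, a, b = program[pc]
        pc += 1
        if op == "MOV":
            regs[a] = b
        elif op == "CMP":
            flag_ge = regs[a] >= b
        elif op == "JGE":
            if flag_ge:
                pc = a
        elif op == "ADD":
            regs[a] += regs[b]
        elif op == "INC":
            regs[a] += 1
        elif op == "JMP":
            pc = a
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode: {op}")


registers = run(PROGRAM)
```

Each tuple pairs an opcode with its operands, mirroring the opcode-plus-data layout described earlier. A different compiler might place the comparison at the bottom of the loop or unroll it entirely, which is exactly the kind of difference you can compare in a disassembly.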
Learning
Certainly you are learning with both use cases above. However, you may simply want to learn how a particular program works. This requires reverse engineering a compiled binary so you can look at its actual code and understand how it functions. You’ll be able to look at the assembly language and, where Ghidra is able to convert the assembly back to a higher-level programming language, at the source code as well. In this way, you can see a little of how some programs have been constructed. Once you have seen how the compiler generates code, this may help you improve your own programming practices.
Efficiency Improvements
Large programming projects can be challenging to understand in toto. This is another area where Ghidra can be helpful. As an example, you can look at a Function Graph to see how the different functions within the program interact with each other. You can see an example of a Function Graph from Ghidra in Figure 1-3. This shows you which functions call which other functions. If you have a function that is used very regularly, you may be able to improve the efficiency of your program by tuning up that function. What you see in Figure 1-3 may be hard to read. You can also take a look at a Function Call Graph, which uses nodes labeled only with the function names. It makes for an easier graph to read.
Additionally, you can take a look at other analysis features of Ghidra. For instance, you can highlight dead subroutines. These would be subroutines or functions that are unused. If they are unused, you might ask whether they are essential to the program. You can also get similar information from the graphs mentioned above. You may see outliers in your graph because they are functions that simply don’t get called. This may tell you something about how your program has been developed. On larger scale projects, this may be essential, since it’s harder to get a broader understanding of what your program looks like and what functions are in the program, as well as what they are being used for.
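The reasoning behind both ideas, finding heavily used functions and finding functions that never get called, can be sketched with a small call graph. This is an illustration under stated assumptions rather than a description of how Ghidra implements its analysis: the function names and the `CALLS` table are hypothetical, and a static call graph like this one would miss calls made through function pointers.

```python
# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "main": ["parse_args", "process"],
    "parse_args": ["log"],
    "process": ["log", "checksum"],
    "checksum": [],
    "log": [],
    "legacy_export": ["log"],  # nothing in the table calls this one
}


def reachable_from(entry, calls):
    """Return every function reachable from `entry` by following calls."""
    seen = set()
    stack = [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(calls.get(fn, []))
    return seen


def caller_counts(calls):
    """Count how many functions in the table call each function."""
    counts = {fn: 0 for fn in calls}
    for callees in calls.values():
        for callee in callees:
            counts[callee] = counts.get(callee, 0) + 1
    return counts


# Functions never reached from the entry point are candidates for dead code.
dead = sorted(set(CALLS) - reachable_from("main", CALLS))

# Functions with many callers are candidates for tuning.
counts = caller_counts(CALLS)
busiest = max(counts, key=counts.get)
```

In this table, `legacy_export` is the outlier that would sit disconnected from the rest of the graph, and `log` is the function with the most callers.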
What’s Next?
To get you immersed quickly into the world of Ghidra, we’re going to take you through obtaining and installing Ghidra, which isn’t just a simple “run an executable and it does the installation for you” installation; however, it’s also not so hard that you should be running away. Then, we’ll walk you through some functions of Ghidra so you can get comfortable with its capabilities before taking it for a spin with your own code, or someone else’s code you’d like to take a look at. There is a lot to see in Ghidra, so to prevent you from getting overwhelmed we’re not going too deep. This is meant to be a primer, not a deep dive. Think of it as a taster plate or a small flight of prime ales. Before we can get to really looking at any code, we have to get Ghidra and get it installed, so let’s start there.