Ghidra is a powerful tool. Its impressive scripting API can enable researchers to discover potentially vulnerable code within an application they’re testing without relying on long-running and power-intensive fuzzing techniques.
In the application security world, continuous testing of your company's applications – and the applications consistently used within your environment – is a must. Unfortunately, when testing other applications in your environment, you won’t typically have access to the source code. This limitation forces you to research in a blackbox environment.
Blackbox research comes with many challenges. For instance, without the source code you can never be sure you have covered the whole application. Fuzzers such as Honggfuzz and AFL++ are recommended to get adequate coverage of memory-based vulnerabilities. But some memory vulnerabilities, like a double-free (CWE-415), are structurally simple while still having a critical impact on the system running the application. That simplicity makes the double-free a prime candidate for automated code analysis.
What are Ghidra and Ghidra Scripting?
Ghidra is an open source reverse engineering tool that enables the analysis of compiled executables in many architectures. Ghidra has numerous features, but its primary use is as a disassembler and decompiler.
We use the disassembler with Ghidra scripting via Jython to detect double-free vulnerabilities. In addition, the Ghidra API enables you to automate simple research and tracing of instructions within the application and numerous other things. You can find the Ghidra API here.
What Is a Double-Free Vulnerability?
As described by MITRE, a double-free vulnerability occurs when a C/C++ application calls the free() function twice on the same memory address. The following code demonstrates a straightforward double-free vulnerability:
In this example, two handles, heap1 and heap2, are allocated with malloc. The heap2 handle is freed before it is ever used. After strncpy() executes on heap1, heap2 is freed a second time, resulting in a double-free vulnerability on the heap2 handle.
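The sequence described above can also be modeled without touching real memory. The following Python sketch is a toy model only, not real heap behavior: ToyHeap, DoubleFreeError, and vulnerable_sequence are hypothetical names invented for illustration, and the toy allocator simply remembers which handles have already been freed.

```python
class DoubleFreeError(Exception):
    """Raised when the toy heap sees the same handle freed twice."""


class ToyHeap(object):
    """Hypothetical stand-in for a C heap; it only tracks handle state."""

    def __init__(self):
        self._next_handle = 0x1000
        self._freed = set()

    def malloc(self, size):
        # Hand out a fresh, never-reused handle for every allocation.
        handle = self._next_handle
        self._next_handle += max(size, 1)
        return handle

    def free(self, handle):
        if handle in self._freed:
            raise DoubleFreeError("handle 0x%x freed twice" % handle)
        self._freed.add(handle)


def vulnerable_sequence(heap):
    heap1 = heap.malloc(16)
    heap2 = heap.malloc(16)
    heap.free(heap2)   # first free, before heap2 is ever used
    # ... the strncpy() into heap1 would happen here in the C program ...
    heap.free(heap2)   # second free of the same handle (CWE-415)
    return heap1
```

Calling vulnerable_sequence(ToyHeap()) raises DoubleFreeError on the second free. A real allocator gives no such guarantee, which is why the issue has to be found by analysis instead.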
How to Approach the Problem
Detecting potential double-free vulnerabilities within Ghidra can be broken up into multiple generic steps:
Identify all references to free() within the code
Backward trace instructions from the initial free call to the source memory allocation handle (such as malloc)
Compare all free traces to see if any share the same original memory allocation handle
Output from the trace comparisons doesn’t guarantee that a shared handle is a double-free vulnerability. For example, two free() calls on the same handle in different branches of a conditional will be flagged, even though only one of them may ever execute. However, this can be easily determined through a quick manual analysis.
Automation Through Ghidra Scripting
The source code used can be found here.
1.0 Obtaining All References to Free()
Before obtaining the references to free(), we first need to find the free() function external pointer. Due to the different structures of executables, the process for obtaining the external pointer is different between Windows (PE) executables and Linux (ELF) executables.
For Windows (PE) executables, you can utilize the symbol table to search for external symbols called “free.”
Linux (ELF) executables are slightly different as you'll need to obtain the initial pointer from the Relocation Table. Since, by design, the “free” in the relocation table is a pointer to the true free location, we need to grab the value/reference of that pointer to obtain the true free() references within the application.
With the free external pointer location, we can now loop through the external references to the free() function to find all references within the main application code. We want to grab as much information as possible from these references for potential future use, so we will grab the reference’s address, function name, and offset.
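As a rough illustration of the record kept for each reference, the sketch below works on plain Python data so that it runs outside Ghidra. The names FreeReference and collect_free_references, and the sample addresses, are assumptions made for this example. In a Ghidra script the inputs would presumably come from calls such as getReferencesTo() and getFunctionContaining().

```python
from collections import namedtuple

# One record per reference to free(): where it is, which function holds it,
# and how far into that function it sits.
FreeReference = namedtuple("FreeReference", ["address", "function", "offset"])


def collect_free_references(call_sites, functions):
    """call_sites: addresses that reference free().
    functions: {name: (start_address, end_address)} with an exclusive end.
    References that fall outside every known function are skipped."""
    results = []
    for address in sorted(call_sites):
        for name, (start, end) in sorted(functions.items()):
            if start <= address < end:
                results.append(FreeReference(address, name, address - start))
                break
    return results


# Hypothetical sample data for illustration only.
sample_functions = {"main": (0x1000, 0x1100), "cleanup": (0x1100, 0x1180)}
sample_call_sites = [0x1040, 0x1120, 0x9000]
sample_refs = collect_free_references(sample_call_sites, sample_functions)
```

Here sample_refs holds two records, one in main at offset 0x40 and one in cleanup at offset 0x20. The address 0x9000 is dropped because it is not inside any listed function.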
We’ve now successfully obtained all free references (if the application has any) and can begin tracing the instructions backward to get a full trace from memory allocation to free call.
2.0 Tracing Instructions
2.1 Basic Register Tracing
Tracing is the core of this detection: if a trace fails, we cannot determine which memory allocation handle a free() call is using. To start a trace, we need an initial register to track. The free() function takes a single parameter, the handle of the memory allocation being freed, so the register that holds this parameter is the one to follow. In x64 architecture, Linux and Windows use different calling conventions. For Windows, the first parameter is stored in the RCX register, whereas in Linux, the first parameter is stored in the RDI register.
With the first target register known, we can backtrack through the instructions preceding the free() call until we find a MOV instruction whose destination operand is the register dictated by the calling convention. For Windows, we may be looking at an initial instruction like the following:
The initial tracing was accomplished as the following via the Ghidra API.
We then need to continue tracing back until an instruction's destination operand (register) is equivalent to the last traced instruction's source operand (register). So, for example, the next traced instruction may be:
The traceInterFuncInstructions function loops through and traces all instructions until the source operand is equivalent to RAX.
If the source operand is RAX, there are no more registers to trace, because in both the Linux and Windows x64 calling conventions the RAX register holds the return value of a function call. Since we’re tracing the argument of a free call, that return value will come from a memory allocation call such as malloc. Therefore, we continue tracing back to the most recent CALL instruction, which is the origin of the memory allocation handle.
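The backward walk can be sketched on a simplified instruction list. This is a minimal model under stated assumptions, not the script's traceInterFuncInstructions function: the Instr tuple, the string operands, and trace_to_allocation are invented here, and only MOV and CALL are modeled.

```python
from collections import namedtuple

# For MOV, dst and src are operand strings. For CALL, dst holds the callee name.
Instr = namedtuple("Instr", ["address", "mnemonic", "dst", "src"])


def trace_to_allocation(instructions, free_index, first_arg_register):
    """Walk backward from the CALL to free() at free_index.
    Returns (trace, alloc_call); alloc_call is None if no origin was found."""
    target = first_arg_register
    trace = []
    for i in range(free_index - 1, -1, -1):
        ins = instructions[i]
        if ins.mnemonic != "MOV" or ins.dst != target:
            continue
        trace.append(ins)
        if ins.src == "RAX":
            # RAX holds a return value: the nearest earlier CALL produced it.
            for j in range(i - 1, -1, -1):
                if instructions[j].mnemonic == "CALL":
                    return trace, instructions[j]
            return trace, None
        target = ins.src  # keep following the value to where it came from
    return trace, None


# Hypothetical Windows-style sequence for illustration only.
sample = [
    Instr(0x1000, "MOV", "ECX", "0x10"),
    Instr(0x1005, "CALL", "malloc", None),
    Instr(0x100A, "MOV", "[RSP+0x28]", "RAX"),
    Instr(0x100F, "MOV", "RCX", "[RSP+0x28]"),
    Instr(0x1014, "CALL", "free", None),
]
sample_trace, sample_alloc = trace_to_allocation(sample, 4, "RCX")
```

In this sample the trace visits the MOV at 0x100F, then the MOV at 0x100A, and ends at the CALL to malloc at 0x1005.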
We have now completed a basic trace consisting of only register operands within the same function.
2.2 Tracing Through Function Calls
We have basic tracing within a single function working, but what if the memory allocation handle is passed as a parameter from another function? We will have to deal with two scenarios based on the x64 calling conventions for Windows and Linux. On both platforms, once the parameter registers are used up, any further parameters are placed on the stack, as each calling convention dictates. For x64 Windows, the calling convention looks like this:
RCX - First parameter
RDX - Second parameter
R8 - Third parameter
R9 - Fourth parameter
[RSP+0x20] - Fifth parameter
[RSP+0x28] - Sixth parameter
In the case of Windows, any parameter after the fourth will be added to the stack after the 32 byte required shadow space. On the other side, x64 Linux calling convention looks like this:
RDI - First parameter
RSI - Second parameter
RDX - Third parameter
RCX - Fourth parameter
R8 - Fifth parameter
R9 - Sixth parameter
[RSP] - Seventh parameter
[RSP+0x8] - Eighth parameter
In the case of Linux, any parameter after the sixth is pushed directly onto the top of the stack. For now, we’ll discuss how to track memory allocation handles passed in registers. Parameters passed on the stack are covered in section 2.3.
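The two lists above can be captured in a small lookup. This is a sketch covering only integer and pointer parameters as listed in this section. The names WINDOWS_X64, LINUX_X64, and parameter_location are hypothetical, and the stack locations are expressed as seen just before the CALL instruction executes.

```python
# Register order and first stack slot, taken from the lists above.
WINDOWS_X64 = {"registers": ["RCX", "RDX", "R8", "R9"], "stack_base": 0x20}
LINUX_X64 = {"registers": ["RDI", "RSI", "RDX", "RCX", "R8", "R9"],
             "stack_base": 0x0}


def parameter_location(convention, index):
    """Return where the 1-based parameter `index` lives before the CALL."""
    if index < 1:
        raise ValueError("parameter index is 1-based")
    registers = convention["registers"]
    if index <= len(registers):
        return registers[index - 1]
    # Each stack parameter occupies one 8-byte slot after the register ones.
    offset = convention["stack_base"] + 8 * (index - len(registers) - 1)
    return "[RSP]" if offset == 0 else "[RSP+0x%x]" % offset
```

For example, parameter_location(WINDOWS_X64, 5) gives "[RSP+0x20]" and parameter_location(LINUX_X64, 7) gives "[RSP]", matching the lists above.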
The first thing we must do is detect when to search outside the current function. That is the case when the backward trace reaches the current function's entry point without having found the RAX register in a source operand.
Once we are at the function's entry point, we need to grab all references to the current function. From the references, we should remove any instances of recursion as tracing those would be pointless and not find the memory allocation handle.
With the cleaned-up references, we can now begin an inter-function trace for each reference starting at the instruction before the CALL instruction. This method brute-forces each reference and searches them for the correct register within the source operand.
Once the inter-function tracing has successfully identified RAX within the source operand, it will know that the tracing successfully found the source of the memory allocation through the function calls.
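The caller search can be sketched as follows. This is a simplified model rather than the script's code: CallSite, follow_register, and trace_through_callers are hypothetical names, and each call site is reduced to the list of MOV operand pairs that precede it, ordered from nearest the CALL backward.

```python
from collections import namedtuple

CallSite = namedtuple("CallSite", ["caller", "address"])


def follow_register(moves, register):
    """moves: (dst, src) pairs in backward order, nearest the CALL first.
    Returns True if the value in `register` can be followed back to RAX."""
    target = register
    for dst, src in moves:
        if dst == target:
            if src == "RAX":
                return True
            target = src
    return False


def trace_through_callers(function_name, register, call_sites, moves_before):
    """Try every non-recursive reference and keep those that reach RAX."""
    hits = []
    for site in call_sites:
        if site.caller == function_name:
            continue  # recursive reference: tracing it would lead nowhere
        if follow_register(moves_before.get(site.address, []), register):
            hits.append(site)
    return hits


# Hypothetical data: release() frees its first parameter (RCX on Windows).
sites = [CallSite("main", 0x2010), CallSite("release", 0x3020),
         CallSite("helper", 0x4000)]
moves = {
    0x2010: [("RCX", "[RSP+0x30]"), ("[RSP+0x30]", "RAX")],
    0x3020: [("RCX", "RAX")],
    0x4000: [("RCX", "RDX")],
}
found = trace_through_callers("release", "RCX", sites, moves)
```

Only the call site in main is returned here. The recursive reference is skipped, and the one in helper never reaches RAX within that function, so it would need a further trace through helper's own callers.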
2.3 Tracing Through Stack Manipulation
In compiled code, parameters and variables are sometimes stored on the stack. For example, in code compiled with GCC on Linux, a called function's return value is placed in the RAX register per the standard calling convention. However, the compiled code will often then move the value of RAX onto the stack at an offset from RBP (the stack's base pointer).
We’re tracing the operand objects of the instructions and not registers specifically, so we won’t have any issues tracing this as the stack offsets won’t change. The value is directly placed into already open/available memory spaces.
Again, function parameters may be stored on the stack if the number of parameters in a function exceeds a specific amount. In Windows, for example, the fifth parameter may be stored on the stack with the following instruction:
We can see that the fifth parameter is placed 0x20 (32) bytes above the stack pointer, which is the fifth 8-byte slot when the slot at RSP itself is counted. If we continue tracing through the function call and look at where the value is read back after the call, we’ll see this.
Initially, the value was placed at RSP+0x20. But when grabbing it back, we are at RSP+0xA0. What happened? That’s a difference of 128 bytes! Between the time the value was placed on the stack and when it was retrieved, instructions were executed that caused the stack pointer, and more importantly, our data, to change in location. To know where our data will be on the stack before a function call, we need to track all stack pointer manipulating instructions to find the original stack offset of the data.
In Ghidra, we must first determine whether an instruction’s source operand is a pointer to an offset from the stack. We can do this by comparing the operand type against Ghidra’s operand type constants. Ghidra classifies such a pointer operand as both an ADDRESS and a DYNAMIC type, so we need to look for the combined value of the two types.
After confirming that a source operand is a dynamic address type, we must check whether the register it uses is RSP, which shows that the instruction is accessing memory relative to the top of the stack. If the register is not RSP but something like RBP (the base pointer), the program is storing the value at a fixed location in the stack frame for later use.
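The two checks above can be sketched as a small classifier. The numeric constants are an assumption intended to mirror OperandType.ADDRESS and OperandType.DYNAMIC from Ghidra's ghidra.program.model.lang.OperandType class. A real script should import those constants instead of hard-coding them, and would read the operand type and register from the instruction itself. The function name classify_stack_operand and its return labels are hypothetical.

```python
# Assumed values; verify against Ghidra's OperandType before relying on them.
OP_ADDRESS = 0x2000
OP_DYNAMIC = 0x400000
POINTER_OPERAND = OP_ADDRESS | OP_DYNAMIC


def classify_stack_operand(operand_type, base_register):
    """Decide how the tracer should treat a source operand."""
    if operand_type != POINTER_OPERAND:
        return "not-a-pointer"      # plain register or immediate operand
    if base_register == "RSP":
        return "rsp-relative"       # offset must be corrected while tracing
    return "fixed-slot"             # e.g. [RBP-0x8], offset stays stable
```

Only the "rsp-relative" case needs the stack offset bookkeeping. A "fixed-slot" operand can be traced like any other operand.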
Now that we know it’s a pointer address relative to RSP, we need to determine the original stack offset prior to the function call. To do this, we must track all instructions that change the value of RSP, either directly or indirectly. The direct modifications to RSP are ADD and SUB instructions, such as SUB RSP, 0x20 and ADD RSP, 0x10.
These directly subtract a set amount from, or add a set amount to, the stack pointer. The first instruction would subtract 0x20 (32) bytes from the stack address, while the second instruction would add 0x10 (16) bytes to the stack address. We need to add or subtract based on these direct changes to the stack to know the original stack offset. It’s important to note that because the stack grows down, subtracting from RSP actually increases the size of the stack. Conversely, adding to RSP decreases the size of the stack. Because we are tracing backward, when we meet a SUB instruction we subtract the same value from our stack offset, even though the instruction increases the stack size. Likewise, when we meet an ADD instruction we add its value to our stack offset.
The (more) indirect modifications to RSP would be from POP, PUSH, and the initial CALL instruction itself.
In x64 architecture, the POP instruction adds 8 bytes to the stack pointer (RSP), shrinking the stack, and the PUSH instruction subtracts 8 bytes from RSP, growing it. Since we are still tracing backward, a POP adds 8 to our offset, whereas a PUSH subtracts 8 from it. Another indirect modification to RSP results from the initial CALL instruction to the function. When the CALL instruction is performed, the return address of the call is pushed onto the stack. Thus, we also need to subtract 8 bytes from the offset when we reach the start of the called function.
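The offset bookkeeping can be written as a short worked example. This is a sketch with hypothetical names, not the script's implementation. The input is the list of RSP-changing instructions met while walking backward from the stack read, up to and including the CALL. The SUB RSP, 0x78 prologue in the sample is an assumed value, chosen so that the numbers line up with the RSP+0xA0 and RSP+0x20 example discussed earlier.

```python
def original_stack_offset(offset, backward_instructions):
    """offset: the RSP-relative offset seen at the read inside the callee.
    backward_instructions: (mnemonic, value) pairs in the order they are met
    while tracing backward; value is only used for ADD and SUB."""
    for mnemonic, value in backward_instructions:
        if mnemonic == "SUB":        # SUB RSP, value grew the stack
            offset -= value
        elif mnemonic == "ADD":      # ADD RSP, value shrank the stack
            offset += value
        elif mnemonic in ("PUSH", "CALL"):
            offset -= 8              # both push one 8-byte slot
        elif mnemonic == "POP":
            offset += 8              # pop released one 8-byte slot
    return offset


# Value read at [RSP+0xA0] in the callee; assumed prologue, then the CALL.
recovered = original_stack_offset(0xA0, [("SUB", 0x78), ("CALL", None)])
```

With these inputs, recovered is 0x20: 0xA0 minus 0x78 for the prologue, minus 8 for the return address pushed by the CALL. That accounts for the full 128-byte difference.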
Once the stack offset is calculated, we can execute an external function trace with the newly calculated stack offset, allowing for full tracing of parameters from function calls with stack parameters.
3.0 Free Trace Comparisons
After all tracing has been attempted, we should have an array of traces, one per free call, each leading back to its initial memory allocation. We can then check for multiple traces that start at the same memory allocation handle and end in a free call. Multiple frees of the same memory allocation handle can indicate that the handle could be freed twice, resulting in a double-free vulnerability. As stated above, this method of discovering double-free vulnerabilities is not foolproof. It may result in false positives due to multiple free calls within a conditional such as an if/else statement. However, a simple manual analysis of the trace can determine whether the vulnerability is real.
With the trace instruction arrays, we can check for double-free vulnerabilities by taking the address of the memory allocation CALL instruction from each trace and looking for duplicates. If two or more traces share the same allocation address, we have a potential double-free vulnerability within the application.
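The comparison amounts to grouping traces by the address of their allocation call. The sketch below assumes each trace has already been reduced to a dictionary with two hypothetical keys, free_address and alloc_call_address. The function name and sample addresses are likewise invented for illustration.

```python
from collections import defaultdict


def find_shared_allocations(traces):
    """Return {alloc_call_address: [free addresses]} for every allocation
    that is reached by more than one free trace."""
    by_allocation = defaultdict(list)
    for trace in traces:
        alloc = trace["alloc_call_address"]
        if alloc is not None:        # skip traces that never found an origin
            by_allocation[alloc].append(trace["free_address"])
    return dict((alloc, sorted(frees))
                for alloc, frees in by_allocation.items() if len(frees) > 1)


# Hypothetical trace summaries for illustration only.
sample_traces = [
    {"free_address": 0x1014, "alloc_call_address": 0x1005},
    {"free_address": 0x1040, "alloc_call_address": 0x1005},
    {"free_address": 0x1090, "alloc_call_address": 0x1060},
    {"free_address": 0x10C0, "alloc_call_address": None},
]
candidates = find_shared_allocations(sample_traces)
```

Here candidates contains a single entry: the allocation call at 0x1005, freed at both 0x1014 and 0x1040. Each entry is only a candidate and still needs the manual review described above.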
For each potential double-free vulnerability found, we can output the full trace instructions to the user for manual confirmation and analysis.
4.0 Testing with Different Application Test Cases
We used multiple test case executables during the testing process to ensure that the script could trace through different scenarios.
4.1 x64 Windows - Visual Studio - Simple Trace
The first test case was a simple 64-bit Windows executable compiled with Visual Studio that contained a double-free within the main function.
4.2 x64 Windows - Visual Studio - More Complex Trace
The more complex test case was another 64-bit Windows executable compiled with Visual Studio. In this case, however, the trace had to pass through function calls because the handle was passed as a parameter.
4.3 x64 Linux - GCC - More Complex Trace
This test case was similar to test case 4.2, only modified and compiled for GCC on Linux.
Double-Free Detection in Ghidra: The Takeaway
In the end, Ghidra can be a valuable tool to add to any application security engineer's arsenal. It’s especially beneficial when you face identifying vulnerabilities within a blackbox environment.
As with standard source code analysis, you can identify vulnerabilities within Ghidra via source-to-sink analysis. Ghidra scripting is a powerful way to automate the source-to-sink trace and to flag which traces are potentially vulnerable.