Developing Kernel Modules for Analysis and Fuzzing

June 19, 2023

By: Christopher Vella


Summary

One of my first Kernel projects, years ago, started as a learning exercise to get comfortable analyzing drivers at runtime via hooks and logging. It turned out to be useful as a tool to assist with fuzzing and with executing certain attack classes (including replay attacks), so much so that I still find myself leveraging it today.

As a method of examining a few Windows internals and Kernel programming techniques, alongside a form of documentation for the tool, this post will cover some of the design and implementation details of the driver and related internals.

Usefulness

Let’s start with a few of the reasons why you may leverage a custom driver for runtime analysis of other drivers.

Kernel drivers from a security perspective can contain attack surface reachable both remotely and locally, whether it’s from parsing functionality triggered by callbacks (such as filesystem or network filter drivers) or from being called directly with untrusted input (as is the case with IOCTLs). Either way, insight into the data a driver routinely parses, and the associated metadata (such as where the data came from), can be leveraged for vulnerability research.

Even disregarding memory corruption entirely, interesting logic bugs may present themselves when analyzing data obtained via introspection, including replay attacks. Consider an EDR installed on a system which includes a Kernel driver (as practically all do). When that EDR is uninstalled, the uninstaller could theoretically send an IOCTL to the driver that triggers cleanup and removal. If we capture that input on one machine, it’s possible the driver is designed such that we could replay it on other machines with the EDR, triggering the driver to remove itself without going through the uninstaller (assuming the uninstaller requires administrator privileges while interacting with the driver does not, as can be the case).

There are other cases where replay attacks could be useful against EDRs, including cases where IOCTLs are issued to add exclusions to any file or process scanning. These are only a few examples of capabilities that can be leveraged via introspection and logging for third-party drivers.

Recap on Drivers

Let’s take a moment to recap some of the common concepts around Windows Kernel drivers, noting that there’s plenty of public information for a lot of these topics already (including thorough books on device driver development).

Kernel and User Privilege Separation

We’re starting with a look at a few differences between the Kernel and user parts of an operating system such as how that separation is implemented and how transitions between Kernel and user code can occur.

Virtual Memory

While not always the case, most of the time (on x86-64 CPUs) our code in user or Kernel mode accesses memory through a translation layer. This layer controls which physical memory a (virtual) memory access actually reaches.

For example, take the following pseudocode:

 

    
    int x = 100;
    int* y = &x;
    *y = 5;
    
   

Pretty simple, we have a variable `x` that gets set to the value 100, we have another variable `y` that gets set to the address of the variable `x` (meaning, `y` will point to the address (likely on stack) containing `x`), then we dereference the pointer `y` to set the slot owned by `x` to the value `5`.

Consider how this may appear in assembly:

    
    mov    DWORD PTR [rsp],0x64     // store 100 in `x`, which lives at the top of the stack
    lea    rax,[rsp]                // load the address of `x` (the value of `rsp`) into `rax`
    mov    QWORD PTR [rsp+0x8],rax  // store that address in the stack slot for `y`
    mov    DWORD PTR [rax],0x5      // write 5 through the pointer in `rax`, i.e. into `x`
    
   

The textual representation of assembly above shows a few memory accesses happening via the square brackets, e.g. `[rsp]`,`[rax]` or `[rsp+0x8]`.

If we assume the stack pointer `rsp` has the value `0x1337`, then the line:

    
     mov    DWORD PTR [rsp],0x64
    
   

Can be rewritten (only for cases where `rsp` actually is the value `0x1337`) to:

    
     mov    DWORD PTR [0x1337],0x64
    
   

Which roughly translates to “Store the 32-bit representation of the value 0x64 in the 32-bit slot of memory located at address 0x1337”.

Without any translation layer, the address `0x1337` would directly correspond to slot `0x1337` in physical RAM. With virtual memory, however, a translation takes place to obtain the physical address.

The data structures that control this translation are located via the `CR3` register, which holds the physical address of the top-level page table (the PML4 under 4-level paging on x86-64). We won’t go too in-depth here; you can find more information in the official Intel/AMD documentation or other resources such as the OSDev wiki.

Ultimately the key to understand here is that different `CR3` register values may point to different page tables, and each page table can map a virtual address to any physical address.

This means that if two processes `A` and `B` both dereference the address `0x1337` while running with different `CR3` values, they may end up accessing different physical memory.

For example, if paging in process A translates virtual address `0x1337` to physical address `0x3000` and process B translates `0x1337` to `0x4120`, then even if both processes access the same virtual address `0x1337`, they’ll actually be accessing different physical memory as shown below:
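
To make the idea concrete, here is a minimal sketch in C of two address spaces resolving the same virtual address to different physical addresses. It is a toy model rather than a description of the hardware: the `AddressSpace` type and its functions are hypothetical, it uses a single-level table where x86-64 walks several levels, and it assumes 4 KiB pages. Under that assumption the low 12 bits of an address pass through unchanged, so `0x1337` resolves to `0x3337` or `0x4337` rather than the rounded figures used in the prose.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* assume 4 KiB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)  /* low 12 bits: offset within a page */
#define NUM_PAGES  16u                        /* toy address space of 16 virtual pages */
#define NOT_MAPPED UINT64_MAX

/* Hypothetical single-level stand-in for the tables that CR3 points at. */
typedef struct {
    uint64_t frame[NUM_PAGES];  /* virtual page number -> physical frame number */
} AddressSpace;

/* Start with no translations at all. */
void as_init(AddressSpace *as) {
    for (unsigned i = 0; i < NUM_PAGES; i++) as->frame[i] = NOT_MAPPED;
}

/* Install a translation from one virtual page to one physical frame. */
void as_map(AddressSpace *as, uint64_t vpage, uint64_t pframe) {
    assert(vpage < NUM_PAGES);
    as->frame[vpage] = pframe;
}

/* Translate a virtual address. Returns NOT_MAPPED where real hardware would fault. */
uint64_t translate(const AddressSpace *as, uint64_t vaddr) {
    uint64_t vpage = vaddr >> PAGE_SHIFT;
    if (vpage >= NUM_PAGES || as->frame[vpage] == NOT_MAPPED) return NOT_MAPPED;
    return (as->frame[vpage] << PAGE_SHIFT) | (vaddr & PAGE_MASK);
}
```

Mapping virtual page 1 to frame 3 in one `AddressSpace` and to frame 4 in another gives each "process" its own view of `0x1337`, and any page left unmapped is simply unreachable from that address space.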

This translation process prevents code from directly referencing physical memory. As a result, you can completely prevent processes from interacting with each other’s memory by ensuring no translation exists from the virtual memory of one process (such as process `A` above) to the physical addresses contained in another process’s translation mappings.

The same concept applies to the separation of user and Kernel memory, where typically the Kernel will not create a translation permitting user code to access Kernel memory. It should also be noted that Kernel components (such as different Drivers) usually share a CR3, or at least share translations such that they have access to the same memory, meaning one Driver may be able to access the memory of other Drivers.

There are other protection primitives available too. Kernel and user are CPU-supported modes distinguished by the current privilege level (`cpl`, held in the low two bits of the `cs` segment register), where a cpl of 0 can be viewed as Kernel (or “supervisor”) mode and a cpl of 3 is user mode. Virtual memory translation can mark certain mappings as only accessible when the CPU is in Kernel mode, further restricting access to parts of memory based on the CPU mode.
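
As a rough illustration of that per-mapping restriction, the sketch below checks an access against the three low flag bits of an x86-64 page-table entry (present, read/write, user/supervisor). The function name `access_allowed` is hypothetical, and the check is deliberately simplified: it ignores SMEP, SMAP, the no-execute bit and protection keys, and assumes `CR0.WP` is set so read-only pages are read-only for the Kernel too.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of an x86-64 page-table entry. */
#define PTE_PRESENT  (1ull << 0)  /* a translation exists */
#define PTE_WRITABLE (1ull << 1)  /* writes are allowed */
#define PTE_USER     (1ull << 2)  /* accesses from cpl 3 are allowed */

/* Simplified model of the permission check made during translation. */
bool access_allowed(uint64_t pte, unsigned cpl, bool is_write) {
    if (!(pte & PTE_PRESENT)) return false;               /* no mapping at all */
    if (cpl == 3 && !(pte & PTE_USER)) return false;      /* supervisor-only page */
    if (is_write && !(pte & PTE_WRITABLE)) return false;  /* read-only page */
    return true;
}
```

A page mapped without `PTE_USER` stays invisible to user mode even though it lives in the same set of page tables, which is how Kernel memory can share an address space with a user process.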

Instruction Execution Restrictions

While we won’t explore all the differences between Kernel and user (cpl 0 and cpl 3) modes, there’s one point we should clarify, otherwise you may be wondering what’s stopping a process from arbitrarily changing its page tables to make a translation possible from a virtual address to any target physical address (thereby breaking any security guarantees from the translation process).

This is the fact that the CPU prevents certain instructions from executing in cpl 3, meaning some operations such as modifying your CR3 register value in user mode won’t work. Instead, the CPU raises an exception (a general protection fault) that transfers control to the Kernel, essentially reporting the user code’s attempted restricted operation. The Kernel is likely to then forward an exception to the user process that attempted the restricted operation.

Overall, it should be clear the Kernel has more power (read: capabilities) than code executing in user, and why vulnerabilities in Kernel code can be particularly critical. 

IOCTLs and Device Driver Interfaces

Now we’ll examine common methods of communication between user code and Kernel drivers. These are also well known and publicly documented, so the below will not be overly in-depth but should provide a useful understanding of the components.

Drivers and Devices

Drivers (typically stored as “.sys” files on Windows) don’t necessarily need to communicate with user code, and may not expose options for user code to directly communicate with them.
Drivers that do want to offer such communication may create named Device objects (see the documentation for further information). User code can access these objects much like files (typically with the prefix `\\.\`, which appears as `\\\\.\\` when escaped in source code). For example, take the following code snippet:
    
    #define USR_DEVICE_NAME  L"\\\\.\\some_device_name"

    HANDLE hDevice = CreateFileW(USR_DEVICE_NAME,
      GENERIC_READ | GENERIC_WRITE,
      FILE_SHARE_READ | FILE_SHARE_WRITE,
      NULL,
      OPEN_EXISTING,
      0,
      NULL);
    
   

This user code snippet attempts to open the Device named “some_device_name” with read and write permissions. The Device may have been created by a Kernel driver with the following code snippet:

    
    #define NT_DEVICE_NAME      L"\\Device\\some_device_name"

    RtlInitUnicodeString( &ntUnicodeString, NT_DEVICE_NAME );

    ntStatus = IoCreateDevice(
        DriverObject,                   // Our Driver Object
        0,                              // We don't use a device extension
        &ntUnicodeString,               // Device name "\Device\some_device_name"
        FILE_DEVICE_UNKNOWN,            // Device type
        FILE_DEVICE_SECURE_OPEN,        // Device characteristics
        FALSE,                          // Not an exclusive device
        &deviceObject );
    
   

For the Driver to make the Device accessible from User code, however, it also has to expose it via a symbolic link in a namespace accessible from User code. An example of this may look like:

    
    #define DOS_DEVICE_NAME     L"\\DosDevices\\some_device_name"

    RtlInitUnicodeString( &ntWin32NameString, DOS_DEVICE_NAME );

    //
    // Create a symbolic link between our device name and the Win32 name
    //
    ntStatus = IoCreateSymbolicLink(
                        &ntWin32NameString, &ntUnicodeString );
    
   

This exposes our Device “\Device\some_device_name” to a user-accessible namespace “\DosDevices\some_device_name” which can be referenced by user code via “\\.\some_device_name”.

These code snippets are modified from Microsoft samples (see the sample here).

The above samples demonstrate how Drivers may create named Devices and expose them such that User code can obtain handles to them as they would a file through APIs such as CreateFileW.

Communicating via Device Handles

So now that user code has a handle to a Device object, there are a few ways to pass data between user code and the Device / Driver.

One such method is via file reads and writes. As you may infer, a read using the file APIs (such as `ReadFile`) gives the Device a way to send data to a user-provided buffer, and a write (such as `WriteFile`) gives the user a means to send data to a Device Driver.

These methods translate to different handler functions that may be implemented by a Driver. The requests are represented by their associated IRP major function codes, `IRP_MJ_READ` and `IRP_MJ_WRITE`.

Another method for communication, one that supports sending and receiving data in a single request, is referred to by the IRP major function code `IRP_MJ_DEVICE_CONTROL`. It is issued by the API `DeviceIoControl` and commonly referred to as an IOCTL (I/O Control). As the API signature shows, the user can provide both an input and an output buffer which will be sent to the Driver, enabling the driver to process data from the input buffer and place data in the output buffer.
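
Each IOCTL request also carries a 32-bit control code, and when reading captured requests it helps to split that code into its fields. The sketch below mirrors the bit layout of the `CTL_CODE` macro from the Windows headers; `MAKE_IOCTL`, `IoctlFields` and `decode_ioctl` are hypothetical names introduced here so the example compiles anywhere, not identifiers from the tool.

```c
#include <stdint.h>

/* Same bit layout as the CTL_CODE macro in the Windows headers. */
#define MAKE_IOCTL(type, function, method, access)               \
    (((uint32_t)(type) << 16) | ((uint32_t)(access) << 14) |     \
     ((uint32_t)(function) << 2) | (uint32_t)(method))

typedef struct {
    uint32_t device_type;  /* bits 31..16, e.g. 0x22 is FILE_DEVICE_UNKNOWN */
    uint32_t access;       /* bits 15..14, 0 is FILE_ANY_ACCESS */
    uint32_t function;     /* bits 13..2, driver-defined function number */
    uint32_t method;       /* bits 1..0, 0 is METHOD_BUFFERED, 3 is METHOD_NEITHER */
} IoctlFields;

/* Split a control code into its four fields. */
IoctlFields decode_ioctl(uint32_t code) {
    IoctlFields f;
    f.device_type = code >> 16;
    f.access      = (code >> 14) & 0x3;
    f.function    = (code >> 2) & 0xFFF;
    f.method      = code & 0x3;
    return f;
}
```

The transfer method is worth noting during analysis, since it determines how the input and output buffers reach the Driver and therefore how much validation the Driver has to do itself.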

Note that there is actually a second set of handlers, optionally used by Drivers, that intercepts these user requests (read/write/ioctl) first. Only if they return false will the request be forwarded to the Driver’s IRP handlers. These are referred to as “FastI/O routines”.

This matters more when we look into intercepting requests to Drivers, as we may have to intercept multiple locations to cover both IRP and FastI/O requests.
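
The ordering can be modelled in a few lines of C. This is a toy sketch only: `ToyDriver`, `send_ioctl` and the two example handlers are hypothetical and far simpler than the real I/O manager, but they show why a hook on the IRP handler alone would miss any request the FastI/O routine chooses to handle.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { PATH_FAST_IO, PATH_IRP } Path;

/* Hypothetical, heavily simplified stand-in for a driver's entry points. */
typedef struct {
    bool (*fast_io_device_control)(const void *in, size_t in_len);  /* may be NULL */
    int  (*irp_device_control)(const void *in, size_t in_len);
} ToyDriver;

/* FastI/O is tried first; the IRP handler only runs if it is absent or returns false. */
Path send_ioctl(const ToyDriver *drv, const void *in, size_t in_len) {
    if (drv->fast_io_device_control && drv->fast_io_device_control(in, in_len))
        return PATH_FAST_IO;  /* request fully handled, the IRP handler never sees it */
    drv->irp_device_control(in, in_len);
    return PATH_IRP;
}

/* Example FastI/O routine that only handles small requests itself. */
bool small_requests_only(const void *in, size_t in_len) {
    (void)in;
    return in_len <= 16;
}

/* Example IRP handler that accepts everything. */
int irp_handler(const void *in, size_t in_len) {
    (void)in;
    (void)in_len;
    return 0;
}
```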

While there are far more details to the inner workings of the above communication methods, it’s enough to know they exist and roughly how they work for us to continue ahead with our examination of our custom analysis driver.

Writing our Analysis Driver

At a high level I had an initial set of requirements for the Driver:
  • Intercept IRP and FastI/O routines of other Drivers without interfering with their operation
  • Ability to target Drivers from their name
  • Accept target Driver names from a user application
  • Log intercepted information to disk

Some of the above appears simple, though as with all Kernel code we have to think about safety, especially around concurrency, as Driver IRP and FastI/O routines may be called by multiple threads and operate in parallel.

To demonstrate this further, let’s look at the following example:
    
    UINT64* somePointer = 0;

    NTSTATUS
    IoDeviceControlHandler(
     PDEVICE_OBJECT DeviceObject,
     PIRP Irp
    )
    {
        if (somePointer == 0) {
            somePointer = ExAllocatePool2(POOL_FLAG_NON_PAGED, sizeof(UINT64),'Exml');
            *somePointer = 4;
        } else {
            ExFreePool(somePointer);
            somePointer = 0;
        }
        return STATUS_SUCCESS; // IRP completion omitted for brevity
    }
    
   

In this case, on the first call to `IoDeviceControlHandler` the global `somePointer` will be 0, so the if branch allocates memory and dereferences the pointer to set the pointed-to value to `4`. On the next call the else branch is taken: the memory is freed and the pointer is reset to 0. The third call then behaves the same as the first, and so on.

This is fine when called without concurrency. With concurrency, however, the possibility arises where one thread (A) is executing the first case and has finished line 10 (the allocation), such that `somePointer` is now valid. Before it executes line 11 (the write through the pointer), another thread (B) performs the check at line 9, goes down the else branch and executes line 13, which frees `somePointer`. When thread A continues and executes line 11, it will be dereferencing freed memory, a use-after-free (UAF) bug. Line numbers here count from the first line of the snippet.

There are multiple solutions here. A common one is to wrap the pointer in a structure that contains a lock and ensure all threads synchronize by obtaining the lock before operating on the pointer.
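
A user-mode analogue of that pattern is sketched below, with a pthread mutex standing in for a Kernel lock and `malloc`/`free` standing in for the pool APIs. The names `GuardedPointer` and `toggle_handler` are hypothetical. The point is only that the check, the allocation, the write and the free all happen while the lock is held, so no other thread can free the memory between the allocation and the write.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* The pointer and the lock that protects it travel together. */
typedef struct {
    pthread_mutex_t lock;
    uint64_t *ptr;
} GuardedPointer;

GuardedPointer g_state = { PTHREAD_MUTEX_INITIALIZER, NULL };

/* Same toggle logic as the racy handler, but every step runs under the lock. */
int toggle_handler(GuardedPointer *g) {
    int status = 0;
    pthread_mutex_lock(&g->lock);
    if (g->ptr == NULL) {
        g->ptr = malloc(sizeof *g->ptr);
        if (g->ptr != NULL)
            *g->ptr = 4;   /* safe: nobody else can free it while we hold the lock */
        else
            status = -1;   /* allocation failed */
    } else {
        free(g->ptr);
        g->ptr = NULL;
    }
    pthread_mutex_unlock(&g->lock);
    return status;
}
```

In a Kernel driver the same shape applies, with the extra constraint that the lock type must suit the IRQL the code runs at. A fast mutex, for example, can only be acquired at IRQL <= APC_LEVEL.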

This is the solution taken in various parts of our Driver, observable by the mutex calls, e.g.:

    
     ExAcquireFastMutex(hookList->lock);
    
   

Our Driver contains multiple components that roughly work as described:

  1. IOCTLDumpClient (“client”) accepts a Device path from the user (e.g. \Device\SomeDeviceToHook)
  2. Our client creates the `HookRequest` structure that gets sent to our Driver, in this case it mostly just contains the Device path we provide
  3. Our Driver receives our request and obtains a pointer to the Driver that owns the Device path we provided
  4. Our Driver obtains a pointer to the Device object associated with the Device path via the API `IoGetDeviceObjectPointer`
  5. The Device object (defined here) contains a pointer to its owning Driver (see the `DriverObject` field)
  6. By utilizing the Driver object pointer, we obtain pointers into the Driver’s FastI/O dispatch table and IRP/MajorFunction pointers
  7. Our Driver replaces the addresses stored in the dispatch routines with our own Driver’s intercepting functions (while storing the original pointers)

The above doesn’t cover some of the specifics of the implementation (the code is the best place to see that level of detail).

At this point, when certain operations are triggered on the target / hooked Drivers (such as IOCTL requests), they instead go to our own Driver’s intercepting functions (for example, see the function `DeviceIoHookD`).
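
The pointer swap at the heart of the steps above can be sketched in portable C. Everything here is a hypothetical stand-in: `ToyDriverObject` plays the role of the target’s Driver object, `TOY_MJ_DEVICE_CONTROL` is an arbitrary slot index rather than the real `IRP_MJ_DEVICE_CONTROL` value, and a call counter stands in for logging to disk. The repository remains the reference for how the actual Driver does this.

```c
#include <stddef.h>

#define TOY_MJ_DEVICE_CONTROL 0  /* hypothetical slot index, not the real IRP_MJ_* value */
#define TOY_MJ_COUNT          1

typedef int (*DispatchFn)(const void *in, size_t in_len);

/* Hypothetical stand-in for the target's Driver object and its dispatch table. */
typedef struct {
    DispatchFn major_function[TOY_MJ_COUNT];
} ToyDriverObject;

DispatchFn g_original;    /* saved so the hook can forward the request */
size_t     g_calls_seen;  /* stands in for "log the request to disk" */

/* The target's own handler: pretend result is the input length. */
int target_ioctl(const void *in, size_t in_len) {
    (void)in;
    return (int)in_len;
}

/* The intercepting function: record the call, then forward to the original. */
int hook_ioctl(const void *in, size_t in_len) {
    g_calls_seen++;
    return g_original(in, in_len);
}

/* Save the original pointer and replace it with our hook. */
void install_hook(ToyDriverObject *drv) {
    g_original = drv->major_function[TOY_MJ_DEVICE_CONTROL];
    drv->major_function[TOY_MJ_DEVICE_CONTROL] = hook_ioctl;
}

/* Restore the saved pointer. */
void remove_hook(ToyDriverObject *drv) {
    drv->major_function[TOY_MJ_DEVICE_CONTROL] = g_original;
}
```

Because the hook forwards every request and returns the original result unchanged, the target behaves as before. In a real Driver the swap would typically be done atomically (for example with `InterlockedExchangePointer`), since other threads may be calling through the table while it is being modified.
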
Now let’s cover roughly what happens when one of our Driver’s intercepting functions (such as `DeviceIoHookD`) is called:
  1. We create the folder path on disk for the Device Driver that received the request (under “C:\DriverHooks”) if it doesn’t exist
  2. We collect any applicable metadata depending on the hooked function, such as the IOCTL code and input/output buffer sizes, and store it in a .conf file
  3. We collect any bytes in the input buffer (if applicable) and store them on disk in a .data file
  4. We call the original function that our hook replaced and return its result

This means our intercepting functions collect enough information for us to replay the input (via the captured bytes and the associated information around the buffer length and provided IOCTLs) or to simply observe the range of calls and values provided to Device Drivers at runtime (useful for reverse engineering).

As replaying data is a common task, we also created and included the project `ioctl_replayer` (you may notice this tiny project is newer, as we wrote it in Rust!). This executable is relatively simple: it takes the Device’s path and a path to a target Device Driver’s logged contents on disk from our custom Driver (e.g. “C:\DriverHooks\Driver\SomeDriver”), then enumerates all the .conf and associated .data files to reconstruct inputs and send them to their respective Drivers. While useful in its own right, it also exists as a sample of working with the .conf and .data files that you could extend.
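
To illustrate the capture and replay round trip, here is a small sketch in C that writes one request as a metadata file plus a raw data file and reads it back. The on-disk layout is an assumption made purely for this example: the post does not describe the tool’s real .conf format, so the `key=value` lines, the `CaptureMeta` fields and both function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical metadata for one captured request; the tool's real format may differ. */
typedef struct {
    uint32_t ioctl_code;
    uint32_t input_len;
    uint32_t output_len;
} CaptureMeta;

/* Write metadata as text and the input buffer as raw bytes. Returns 0 on success. */
int save_capture(const char *conf_path, const char *data_path,
                 const CaptureMeta *m, const uint8_t *input) {
    FILE *conf = fopen(conf_path, "w");
    if (!conf) return -1;
    fprintf(conf, "ioctl=0x%08x\ninput_len=%u\noutput_len=%u\n",
            (unsigned)m->ioctl_code, (unsigned)m->input_len, (unsigned)m->output_len);
    fclose(conf);

    FILE *data = fopen(data_path, "wb");
    if (!data) return -1;
    size_t written = fwrite(input, 1, m->input_len, data);
    fclose(data);
    return written == m->input_len ? 0 : -1;
}

/* Read a capture back; `input` must have room for `cap` bytes. Returns 0 on success. */
int load_capture(const char *conf_path, const char *data_path,
                 CaptureMeta *m, uint8_t *input, size_t cap) {
    unsigned code, in_len, out_len;
    FILE *conf = fopen(conf_path, "r");
    if (!conf) return -1;
    int fields = fscanf(conf, "ioctl=%x input_len=%u output_len=%u",
                        &code, &in_len, &out_len);
    fclose(conf);
    if (fields != 3 || in_len > cap) return -1;  /* malformed or too large */

    FILE *data = fopen(data_path, "rb");
    if (!data) return -1;
    size_t got = fread(input, 1, in_len, data);
    fclose(data);
    if (got != in_len) return -1;                /* .data shorter than .conf claims */

    m->ioctl_code = code;
    m->input_len  = in_len;
    m->output_len = out_len;
    return 0;
}
```

A replayer built on this would open the target Device and pass the loaded control code, input bytes and an output buffer of `output_len` bytes to `DeviceIoControl`. Keeping the metadata as text also makes it easy to edit a capture by hand before replaying it, which is handy for fuzzing.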

Takeaway

Writing your own tooling is fun and can be more useful than off-the-shelf tooling (which has its place, even better if you can customize it). If I were writing this today I’d use Rust for much nicer safety primitives/wrapping.

Repository
