Recently, MWR intern Jérémy Fetiveau (@__x86) conducted a research project into the kernel protections introduced in Microsoft Windows 8 and newer. This blog post details his findings, and presents a generic technique for exploiting kernel vulnerabilities, bypassing SMEP and DEP. Proof-of-concept code is provided which reliably gains SYSTEM privileges, and requires only a single vulnerability that provides an attacker with a write-what-where primitive. We demonstrate this issue by providing a custom kernel driver, which simulates the presence of such a kernel vulnerability.
Before diving into the details of the bypass technique, we will quickly run through some of the technologies we will be breaking, and what they do. If you want to grab the code and follow along as we go, you can get the zip of the files here.
SMEP (Supervisor Mode Execution Prevention) is a mitigation that aims to prevent the CPU from running code from user-mode while in kernel-mode. SMEP is implemented at the page level, and works by setting flags on a page table entry, marking it as either U (user) or S (supervisor). When accessing this page of memory, the MMU can check this flag to make sure the memory is suitable for use in the current CPU mode.
DEP (Data Execution Prevention) operates much the same as it does in user-mode, and is also implemented at the page level by setting flags on a page table entry. The basic principle of DEP is that no page of memory should be both writeable and executable, which aims to prevent the CPU executing instructions provided as data from the user.
KASLR (Kernel Address Space Layout Randomization) is a mitigation that aims to prevent an attacker from successfully predicting the address of a given piece of memory. This is significant, as many exploitation techniques rely on an attacker being able to locate the addresses of important data such as shellcode, function pointers, etc.
With the use of virtual memory, the CPU needs a way to translate virtual addresses to physical addresses. There are several paging structures involved in this process. Let’s first consider a toy example where we only have page tables in order to perform the translation.
For each running process, the processor will use a different page table. Each entry of this page table will contain the information “virtual page X references physical frame Y”. Of course, these frames are unique, whereas pages are relative to their page table. Thus we can have a process A with a page table PA containing an entry “page 42 references frame 13” and a process B with a page table PB containing an entry “page 42 references frame 37”.
If we consider a format for virtual addresses that consists of a page number field followed by an offset referencing a byte within this page, the same address 4210 (page 42, offset 10) would correspond to two different physical locations according to which process is currently running (and which page table is currently active). For a 64-bit x86_64 processor, the virtual address translation is roughly the same.
However, in practice the processor is not only using page tables, but uses four different structures. In the previous example, we had physical frames referenced by PTEs (page table entries) within PTs (page tables). In reality, the actual format for virtual addresses looks more like the illustration below:
The cr3 register contains the physical address of the PML4. The PML4 field of a virtual address is used to select an entry within this PML4. The selected PML4 entry contains (with a few additional flags) the physical address of a PDPT (Page Directory Pointer Table). The PDPT field of a virtual address therefore references an entry within this PDPT. As expected, this PDPT entry contains the physical address of a PD (Page Directory). We can therefore use the PD field of the virtual address to reference an entry within the PD, and so on. This is well summarized by Intel’s schema:
It should now be clearer how the hardware actually translates virtual addresses to physical addresses. An interested reader who is not familiar with the inner workings of x64 paging can refer to section 4.5 of volume 3A of the Intel manuals for more in-depth explanations.
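As a rough illustration of the field layout described above, the following C++ sketch splits a 48-bit virtual address into its four 9-bit indices and its 12-bit offset. The `VaFields` struct and the `splitVirtualAddress` function are names invented for this example; they do not come from the accompanying code.

```cpp
#include <cstdint>

// Indices into the four paging structures plus the byte offset,
// for a 48-bit x86-64 virtual address that maps a 4 KB page.
struct VaFields {
    std::uint16_t pml4;    // bits 47..39: entry within the PML4
    std::uint16_t pdpt;    // bits 38..30: entry within the PDPT
    std::uint16_t pd;      // bits 29..21: entry within the PD
    std::uint16_t pt;      // bits 20..12: entry within the PT
    std::uint16_t offset;  // bits 11..0:  byte within the page
};

// Hypothetical helper: extract each field with a shift and a mask.
inline VaFields splitVirtualAddress(std::uint64_t va) {
    VaFields f;
    f.pml4   = static_cast<std::uint16_t>((va >> 39) & 0x1FF);
    f.pdpt   = static_cast<std::uint16_t>((va >> 30) & 0x1FF);
    f.pd     = static_cast<std::uint16_t>((va >> 21) & 0x1FF);
    f.pt     = static_cast<std::uint16_t>((va >> 12) & 0x1FF);
    f.offset = static_cast<std::uint16_t>(va & 0xFFF);
    return f;
}
```

For instance, the address 0x100804020001 used by the POC later in this post decodes to index 0x20 at every level, with a byte offset of 1.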
In the past, kernel exploits commonly redirected execution to memory allocated in user-land. Due to the presence of SMEP, this is now no longer possible. Therefore, an attacker would have to inject code into the kernel memory, or convince the kernel to allocate memory with attacker-controlled content.
This was commonly achieved by allocating executable kernel objects containing attacker-controlled data. However, due to DEP, most objects are now non-executable (for example, the “NonPagedPoolNx” pool type has replaced “NonPagedPool”). An attacker would now have to find a way to use a kernel payload based on return-oriented programming (ROP), which re-uses existing executable kernel code.
In order to construct such a payload, an attacker would need to know the location of certain “ROP gadgets”, which contain the instructions that will be executed. However, due to the presence of KASLR, these gadgets will be at different addresses on each run of the system, so locating these gadgets would likely require additional vulnerabilities.
The presented technique consists of writing a function to deduce the address of paging structures from a given user-land memory address. Once these structures are located, we are able to partially corrupt them to change the metadata, allowing us to “trick” the kernel into thinking a chunk that was originally allocated in user-mode is suitable for use in kernel-mode. We can also corrupt the flags checked by DEP to make the contents of the memory executable.
By doing this in a controlled manner, we can create a piece of memory that was initially allocated as not executable by a user-mode process, and modify the relevant paging structures so that it can be executed as kernel-mode code. We will describe this technique in more detail below.
When the kernel wants to access paging structures, it has to find their virtual addresses. The processor instructions only allow the manipulation of virtual addresses, not physical ones. Therefore, the kernel needs a way to map those paging structures into virtual memory. For that, several operating systems use a special self-referencing PML4 entry.
Instead of referencing a PDPT, this PML4 entry will reference the PML4 itself, and shift the other values to make space for the new self-reference field. Thus, instead of referencing a specific byte of memory within a memory page, a PTE will be referenced instead. It is possible to retrieve more structures by using the same self-reference entry several times.
A good description of this mechanism can also be found in the excellent book What makes it page? The Windows 7 (x64) virtual memory manager by Enrico Martignetti.
To better understand this process, let’s go through an example showing how to build a function that maps a virtual address to the address of its PTE. First, we should remind ourselves of the usual format of a virtual address. The low 48 bits of a canonical address are composed of four 9-bit fields and one 12-bit offset field. The PML4 field references an entry within the PML4, the PDPT field references an entry within the PDPT, and so on.
If we want to reference a PTE instead of a byte located within a page, we can use the special PML4 entry 0y111101101 (0x1ED in hexadecimal). We fill the PML4 field with the index of this special entry and then shift everything 9 bits to the right so that we get an address with the following format:
We use this technique to build a function which maps the address of a byte of memory to the address of its PTE. If you are following along in the code, this is implemented in the function “getPTfromVA” in the file “computation.cpp”. It should be noted that, even though the last offset field is 12 bits, we still do a 9-bit shift and set the remaining bits to 0 so that we have an aligned address.
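The computation can be sketched in a few lines of C++. This is a minimal illustration rather than the POC's actual “getPTfromVA”: the function name `pteAddressOf` and the constant names are assumptions made for this example, and the self-reference index 0x1ED is the value quoted above.

```cpp
#include <cstdint>

// Index of the self-referencing PML4 entry quoted in the text (0y111101101).
constexpr std::uint64_t kSelfRefIndex = 0x1ED;

// Hypothetical helper: map a virtual address to the virtual address of its PTE.
inline std::uint64_t pteAddressOf(std::uint64_t va) {
    // Keep the low 48 bits and shift every field 9 bits to the right.
    std::uint64_t a = (va & 0x0000FFFFFFFFFFFFULL) >> 9;
    // Clear the low 3 bits so the result is aligned on an 8-byte entry.
    a &= ~0x7ULL;
    // Put the self-reference index into the now-free PML4 field.
    a |= kSelfRefIndex << 39;
    // Bit 47 is set by that index, so bits 63..48 must be set too
    // for the address to be canonical.
    a |= 0xFFFF000000000000ULL;
    return a;
}
```

Under these assumptions, `pteAddressOf(0x100804020001)` evaluates to 0xFFFFF68804020100.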
To get the other structures, we can simply use the same technique several times. Here is an example for the PDE addresses:
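One way to express the repeated use of the self-reference is a loop that applies the same 9-bit shift once per level. The sketch below is an assumption-laden illustration, not the POC's code: the `Level` enum and the `entryAddressOf` function are invented names, and the index 0x1ED is the one given earlier in this post.

```cpp
#include <cstdint>

// Index of the self-referencing PML4 entry quoted in the text.
constexpr std::uint64_t kSelfRefIndex = 0x1ED;

// Number of times the self-reference is applied for each kind of entry.
enum class Level { Pte = 1, Pde = 2, Pdpte = 3, Pml4e = 4 };

// Hypothetical helper: virtual address of the paging entry of the given
// level that takes part in translating `va`.
inline std::uint64_t entryAddressOf(std::uint64_t va, Level level) {
    std::uint64_t a = va & 0x0000FFFFFFFFFFFFULL;  // low 48 bits only
    for (int i = 0; i < static_cast<int>(level); ++i) {
        // Shift all fields right by 9 bits and insert the self-reference
        // index into the PML4 field once more.
        a = (a >> 9) | (kSelfRefIndex << 39);
    }
    a &= ~0x7ULL;                      // align on an 8-byte entry
    return a | 0xFFFF000000000000ULL;  // make the address canonical
}
```

With the address 0x100804020001, this yields 0xFFFFF68804020100 for the PTE, 0xFFFFF6FB44020100 for the PDE, 0xFFFFF6FB7DA20100 for the PDPT entry and 0xFFFFF6FB7DBED100 for the PML4 entry.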
We use the term PXE as a generic term for paging structures, as many of them share the same structure, which is as follows:
There are a number of fields that are interesting here, especially the NX bit field, which defines how the memory can be accessed, and the flags at the end, which include the U/S flag. This U/S flag denotes whether the memory is for use in user-mode or supervisor-mode (kernel-mode).
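The flags mentioned above can be written down as bit masks. The bit positions below follow the x86-64 entry format documented by Intel (present, read/write and user/supervisor in bits 0 to 2, execute-disable in bit 63); the constant and function names, and the sample entry value in the note that follows, are made up for illustration.

```cpp
#include <cstdint>

// Flag bits shared by the paging-structure entries (PXEs).
constexpr std::uint64_t kPresent   = 1ULL << 0;   // P: entry is valid
constexpr std::uint64_t kWritable  = 1ULL << 1;   // R/W: writes allowed
constexpr std::uint64_t kUser      = 1ULL << 2;   // U/S: 1 = user, 0 = supervisor
constexpr std::uint64_t kNoExecute = 1ULL << 63;  // NX: 1 = not executable

// True if the entry marks its range as accessible from user-mode.
inline bool isUserEntry(std::uint64_t pxe) {
    return (pxe & kUser) != 0;
}

// True if the entry does not forbid instruction fetches.
inline bool isExecutableEntry(std::uint64_t pxe) {
    return (pxe & kNoExecute) == 0;
}

// Hypothetical helper: the value an entry would hold once it is marked
// supervisor and executable, leaving every other bit untouched.
inline std::uint64_t asSupervisorExecutable(std::uint64_t pxe) {
    return pxe & ~(kUser | kNoExecute);
}
```

For a made-up entry value of 0x8000000012345067 (present, writable, user, non-executable), `asSupervisorExecutable` returns 0x0000000012345063.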
When checking the rights of a page, the kernel will check every PXE involved in the address translation. That means that if we want to check if the U/S flag is set, we will check all entries relating to that page. If any of the entries do not have the supervisor flag set, any attempt to use this page from kernel mode will trigger a fault. If all of the entries have the supervisor flag set, the page will be considered a kernel-mode page.
Because DEP is set at the page granularity level, typically higher level paging structures will be marked as executable, with DEP being applied at the PTE level by setting the NX bit. Because of this, rather than starting by allocating kernel memory, it is easier to allocate user memory with the executable rights using the standard API, and then corrupt the paging structures to modify the U/S flag and cause it to be interpreted as kernel memory.
If we corrupt a random PXE, we are likely to be in a case where the target PXE is part of a series of PXEs that are contiguous in memory. In these cases, during exploitation it might mean that an attacker would corrupt adjacent PXEs, which has a high risk of causing a crash. Most of the time, the attacker can’t simply modify only 1 bit in memory, but has to corrupt several bytes (8 bytes in our POC), which will force the attacker to corrupt more than just the relevant flags for the exploit.
The easiest way to circumvent this issue is simply to target a PXE which is isolated (e.g., with unused PXE structures on either side of the target PXE). In 64-bit environments, a process has access to a huge virtual address space of 256TB as we are effectively using 48-bit canonical addresses instead of the full 64-bit address space.
A 48-bit virtual address is composed of several fields allowing it to reference different paging structures. As the PML4 field is 9 bits, it refers to one of 512 (2**9) PML4 entries. Each PML4 entry describes a range of 512GB (2**39). Obviously, a user process will not use so much memory that it will use all of the PML4 entries. Therefore, we can request the allocation of memory at an address outside of any 512GB used range. This will force the use of a new PML4 entry, which will reference structures containing only a single PDPT entry, a single PDE and a single PTE.
An interested reader can verify this idea using the “!address” and “!pte” WinDbg extensions to observe those “holes” in memory. In the presented POC, the address 0x100804020001 is used, as it is very likely to be in an unused area.
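The arithmetic behind this choice is easy to check. The following sketch, with invented names, computes which PML4 entry an address falls under and where the 512GB range covered by that entry begins.

```cpp
#include <cstdint>

// Each of the 512 PML4 entries covers 2**39 bytes (512GB),
// for a total of 2**48 bytes (256TB) of addressable space.
constexpr std::uint64_t kPml4EntrySpan = 1ULL << 39;
constexpr std::uint64_t kPml4Entries   = 1ULL << 9;
constexpr std::uint64_t kAddressSpace  = kPml4EntrySpan * kPml4Entries;

// Hypothetical helper: index of the PML4 entry covering `va`.
inline std::uint64_t pml4Slot(std::uint64_t va) {
    return (va >> 39) & 0x1FF;
}

// Hypothetical helper: first address of the 512GB range containing `va`.
inline std::uint64_t pml4RegionBase(std::uint64_t va) {
    return va & ~(kPml4EntrySpan - 1);
}
```

The address 0x100804020001 falls under PML4 entry 32, whose range starts at 0x100000000000 (16TB) and ends 512GB later. Whether that range is actually unused depends on the target process, which is why the post recommends checking with the debugger.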
The code for the mitigation bypass is very simple. Suppose that we’ve got a vulnerable kernel component for which we are able to exploit a vulnerability which gives us a write-what-where primitive from a user-land process (this is implemented within the “write_what_where” function in our POC). We choose a virtual address with isolated paging structures (such as 0x100804020001), allocate it and fill it with our shellcode. We then retrieve all of its paging structures using the mapping function described earlier in this post (using the field shifting and the self-referencing PML4). Finally, we perform unaligned overwrites of the 4 PXEs relating to our chosen virtual address to modify its U/S bit to supervisor.
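The post does not spell out the exact bytes written, so the following user-mode simulation is only one plausible reading of why an unaligned 8-byte write is convenient. It models a single paging structure as a byte array and places the write 7 bytes before the target entry, so that only the entry's low byte (which holds the U/S flag) changes, while the other 7 bytes land in the preceding, unused entry. Every name here, including `mockWrite8`, is hypothetical, and the new low-byte value is chosen by the caller.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One mock paging structure: 512 entries of 8 bytes, initially unused.
using Table = std::array<std::uint8_t, 4096>;

// Read entry `index` as a little-endian 64-bit value, as x86 would.
inline std::uint64_t loadEntry(const Table& t, std::size_t index) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | t[index * 8 + i];
    return v;
}

// Store a 64-bit value into entry `index`, little-endian.
inline void storeEntry(Table& t, std::size_t index, std::uint64_t v) {
    for (int i = 0; i < 8; ++i)
        t[index * 8 + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Stand-in for a write-what-where primitive: always writes 8 bytes.
inline void mockWrite8(Table& t, std::size_t where, std::uint64_t what) {
    for (int i = 0; i < 8; ++i)
        t[where + i] = static_cast<std::uint8_t>(what >> (8 * i));
}

// Replace only the low byte of entry `index` (index must be at least 1).
// The write starts 7 bytes early, so its last byte lands on the entry's
// low byte and its first 7 bytes (zeros) fall in the previous entry.
inline void overwriteLowByteUnaligned(Table& t, std::size_t index,
                                      std::uint8_t newLowByte) {
    mockWrite8(t, index * 8 - 7,
               static_cast<std::uint64_t>(newLowByte) << 56);
}
```

In this model, an isolated entry holding the made-up value 0x12345067 becomes 0x12345063 (U/S cleared) after `overwriteLowByteUnaligned(table, 32, 0x63)`, and its zeroed neighbours are left unchanged. Had the preceding entry been in use, its upper 7 bytes would have been destroyed, which is the risk the isolation is meant to avoid.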
Of course, other slightly different scenarios for exploitation could be considered. For instance, if we can decrement or increment an arbitrary value in memory, we could just flip the desired flags. Also, since we are using isolated paging structures, even in the case of a bug leading to the corruption of a lot of adjacent data, the technique can still be used because it is unlikely that any important structures are located in the adjacent memory.
With this blog post, we provide an exploit for a custom driver with a very simple write-what-where vulnerability so as to let the reader experiment with the technique. However, this document was originally submitted to Microsoft with a real-world use-after-free vulnerability. Indeed, in a lot of cases, it would be possible for an attacker to force a write-what-where primitive from a vulnerability such as a use-after-free or a pool overflow.
This technique is not affected by KASLR because it is possible to directly derive the addresses of paging structures from a given virtual address. If randomization was introduced into this mapping, this would no longer be possible, and this technique would be mitigated as a result. Randomizing this function would require having a different self-referencing PML4 entry each time the kernel boots. However, it is recognised that many of the core functions of the kernel memory management may rely on this mapping to locate and update paging structures.
It might also be possible to move the paging structures into a separate segment, and reference these structures using an offset in that segment. If we consider the typical write-what-where scenarios, unless the address specified already had a segment prefix, it would not be possible for an attacker to overwrite the paging structures, even if the offset within the segment was known.
If this is not possible, another approach might be to use a hardware debug register as a faux locking mechanism. For example, if a hardware breakpoint was set on the access to the paging structures (or key fields of the structures), a handler for that breakpoint could test the value of the debug register to assess whether this access is legitimate or not. For example, before a legitimate modification to the paging structures, the kernel can unset the debug register, and no exception would be thrown. If an attacker attempted to modify the memory without unsetting the debug register, an exception could be thrown to detect this.
We reported this issue to Microsoft as part of their Mitigation Bypass Bounty Program. However, they indicated that this did not meet all of their program guidelines as it cannot be used to remotely exploit user-mode vulnerabilities. In addition, Microsoft stated that their security engineering team were aware of this “limitation”, they did not consider this a security vulnerability, and that the development of a fix was not currently planned.
With this in mind, we have decided to release this post and the accompanying code to provide a public example of current Windows kernel research. However, we have chosen not to release the fully weaponised exploit we developed as part of the submission to Microsoft, as this makes use of a vulnerability that has only recently been patched.
The technique proposed in this post allows an attacker to reliably bypass both DEP and SMEP in a generic way. We showed that it is possible to derive the addresses of paging structures from a virtual address, and how an attacker could use this to corrupt paging structures so as to create executable kernel memory, even in low memory addresses. We demonstrated that this technique is usable without the fear of corrupting non-targeted PXEs, even if the attacker had to corrupt large quantities of memory. Furthermore, we showed that this technique is not specific to bugs that provide a write-what-where primitive, but can also be used for a broad range of bug classes.