Using userfaultfd in page guard manager

LunarG · Jul 28, 2023 · c5f14f5
1 parent 4fa32d6
Showing 14 changed files with 1,328 additions and 121 deletions.
diff --git a/USAGE_android.md b/USAGE_android.md
@@ -120,14 +120,90 @@ the layer to return `VK_ERROR_INITIALIZATION_FAILED` from its
 The Vulkan API allows Vulkan memory objects to be mapped by an application
 for direct modification.
 To successfully capture an application, the GFXReconstruct layer must be able to
-detect when the application modifies the mapped memory.
-
-The layer can be configured to detect memory modifications by marking the mapped
-memory as write protected, triggering an access violation when the application
-writes to the memory.
-The layer then uses a signal handler to intercept the signal generated by the
-access violation, where it removes the write protection, marks the modified
-memory page as dirty, and allows the application to continue.
+detect when the application modifies the mapped memory so that the changes can be
+written to the capture file and re-applied during replay.
+To achieve this, GFXR supports four different mechanisms:
+
+##### 1. `assisted`
+This method expects the application to call `vkFlushMappedMemoryRanges`
+after memory is modified; the memory ranges specified to the
+`vkFlushMappedMemoryRanges` call will be written to the capture file
+during the call.
+
+##### 2. `unassisted`
+This method writes the full content of mapped memory to the capture file
+on calls to `vkUnmapMemory` and `vkQueueSubmit`. It is very inefficient
+and inflates capture file size, and it may be unusable with real-world
+applications that map large amounts of memory.
+
+##### 3. `page_guard`
+`page_guard` tracks modifications to individual memory pages, which are
+written to the capture file on calls to `vkFlushMappedMemoryRanges`,
+`vkUnmapMemory`, and `vkQueueSubmit`. This method requires allocating
+shadow memory for all mapped memory. How the changes are tracked
+depends on the operating system.
+- On Windows, the `Vectored Exception Handling` mechanism is used on the shadow
+memory regions that correspond to the mapped device memory regions.
+- On Linux and Android, the shadow memory regions are similarly trapped by
+changing their access protection to `PROT_NONE`. Every access from the
+application then generates a `SIGSEGV`, which is handled by the signal handler
+installed by the page guard manager.
+
+Because shadow memory is allocated and returned to the application instead
+of the actual mapped memory returned by the driver, both reads and writes need
+to be tracked.
+- Writes need to be dumped to the capture file.
+- Reads must cause a memory copy from the actual mapped memory into the shadow
+memory so that the application will not be reading garbage.
+
+`page_guard` is the most efficient mechanism in terms of both performance and
+capture file size. However, as described in [Conflicts With Crash Detection Libraries](#conflicts-with-crash-detection-libraries),
+it has limitations when recording applications that install their own
+signal handler for the `SIGSEGV` signal. The `userfaultfd` mechanism exists
+to work around this limitation.
+
+##### 4. `userfaultfd`
+This is basically the same mechanism as `page_guard` but instead of trapping
+the shadow memory regions with the `PROT_NONE` + `SIGSEGV` trick, it utilizes
+the `userfaultfd` mechanism provided by the Linux kernel, making this
+mechanism available only on Linux and Android.
+
+Shadow memory regions are registered with the userfaultfd mechanism using the
+`UFFDIO_REGISTER_MODE_WP | UFFDIO_REGISTER_MODE_MISSING` flags,
+and a handler thread is started that polls for faults.
+The combination of those flags triggers a fault in two cases:
+- When an unallocated page is accessed with either a write or a read.
+- When a page is written to.
+
+This imposes a limitation. When the shadow memory is freshly allocated, all
+pages are unallocated, which makes tracking both reads and writes simple. But
+once the accesses have been tracked and dumped to the capture file for the first
+time, reads can no longer be tracked because the pages are already allocated.
+To work around this, each time the memory is examined and the changes are dumped
+to the capture file, new pages are requested from the OS at the
+same virtual address and the memory is unregistered and registered again. This
+has a performance penalty, as in this case both reads and writes need to be
+copied from the actual mapped device memory into the shadow memory.
+
+There is another limitation. Because new pages are requested each
+time and the regions are unregistered and registered again, this
+mechanism is prone to race conditions when there are multiple threads. If a
+thread is accessing a specific page within a region while
+that region is being reset, the access is not trapped and undefined
+behavior occurs.
+
+To work around this, a list of the IDs of the threads that access each
+region is kept. When a region is being reset, a signal is
+sent to each of those threads, forcing them to enter the signal
+handler that GFXR registers for that signal. The signal handler
+performs a form of synchronization between the thread that is triggering
+the reset and the rest of the threads that are potentially touching the
+pages that are being reset. The signal used is one of the real-time signals:
+the first in the range [`SIGRTMIN`, `SIGRTMAX`] that has no handler already
+installed.
+
+`userfaultfd` is slower than `page_guard` but
+should be fast enough for real-world applications and games.
 
 ##### Disabling Debug Breaks Triggered by the GFXReconstruct Layer
 
@@ -367,7 +443,7 @@ Log Break on Error | debug.gfxrecon.log_break_on_error | BOOL | Trigger a debug
 Log File Create New | debug.gfxrecon.log_file_create_new | BOOL | Specifies that log file initialization should overwrite an existing file when true, or append to an existing file when false. Default is: `true`
 Log File Flush After Write | debug.gfxrecon.log_file_flush_after_write | BOOL | Flush the log file to disk after each write when true. Default is: `false`
 Log File Keep Open | debug.gfxrecon.log_file_keep_open | BOOL | Keep the log file open between log messages when true, or close and reopen the log file for each message when false. Default is: `true`
-Memory Tracking Mode | debug.gfxrecon.memory_tracking_mode | STRING | Specifies the memory tracking mode to use for detecting modifications to mapped Vulkan memory objects. Available options are: `page_guard`, `assisted`, and `unassisted`. Default is `page_guard` <ul><li>`page_guard` tracks modifications to individual memory pages, which are written to the capture file on calls to `vkFlushMappedMemoryRanges`, `vkUnmapMemory`, and `vkQueueSubmit`. Tracking modifications requires allocating shadow memory for all mapped memory and that the `SIGSEGV` signal is enabled in the thread's signal mask.</li><li>`assisted` expects the application to call `vkFlushMappedMemoryRanges` after memory is modified; the memory ranges specified to the `vkFlushMappedMemoryRanges` call will be written to the capture file during the call.</li><li>`unassisted` writes the full content of mapped memory to the capture file on calls to `vkUnmapMemory` and `vkQueueSubmit`. It is very inefficient and may be unusable with real-world applications that map large amounts of memory.</li></ul>
+Memory Tracking Mode | debug.gfxrecon.memory_tracking_mode | STRING | Specifies the memory tracking mode to use for detecting modifications to mapped Vulkan memory objects. Available options are: `page_guard`, `userfaultfd`, `assisted`, and `unassisted`. See [Understanding GFXReconstruct Layer Memory Capture](#understanding-gfxreconstruct-layer-memory-capture) for more details. Default is `page_guard`.
 Page Guard Copy on Map | debug.gfxrecon.page_guard_copy_on_map | BOOL | When the `page_guard` memory tracking mode is enabled, copies the content of the mapped memory to the shadow memory immediately after the memory is mapped. Default is: `true`
 Page Guard Separate Read Tracking | debug.gfxrecon.page_guard_separate_read | BOOL | When the `page_guard` memory tracking mode is enabled, copies the content of pages accessed for read from mapped memory to shadow memory on each read. Can overwrite unprocessed shadow memory content when an application is reading from and writing to the same page. Default is: `true`
 Page Guard Persistent Memory | debug.gfxrecon.page_guard_persistent_memory | BOOL | When the `page_guard` memory tracking mode is enabled, this option changes the way that the shadow memory used to detect modifications to mapped memory is allocated. The default behavior is to allocate and copy the mapped memory range on map and free the allocation on unmap. When this option is enabled, an allocation with a size equal to that of the object being mapped is made once on the first map and is not freed until the object is destroyed.  This option is intended to be used with applications that frequently map and unmap large memory ranges, to avoid frequent allocation and copy operations that can have a negative impact on performance.  This option is ignored when GFXRECON_PAGE_GUARD_EXTERNAL_MEMORY is enabled. Default is `false`
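
As a usage sketch for the option row changed above: the table lists `debug.gfxrecon.memory_tracking_mode` as the property for this option, so the new mode could presumably be selected with `setprop` before launching the application to capture. The property name and the `userfaultfd` and `page_guard` values come from the table in this diff; the use of `adb shell setprop`/`getprop` and a connected device are assumptions about the local setup.

```bash
# Select the userfaultfd memory tracking mode
# (property name taken from the options table above).
adb shell setprop debug.gfxrecon.memory_tracking_mode userfaultfd

# Read the value back to confirm it was applied.
adb shell getprop debug.gfxrecon.memory_tracking_mode

# Switch back to the default mode once capture is finished.
adb shell setprop debug.gfxrecon.memory_tracking_mode page_guard
```

This may be worth trying when the application being captured installs its own `SIGSEGV` handler, which is the case the change describes `userfaultfd` as addressing.
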
@@ -507,13 +583,13 @@ This will download the file to the current directory.
 
 As described in
 [Understanding GFXReconstruct Layer Memory Capture](#understanding-gfxreconstruct-layer-memory-capture),
-the capture layer uses a signal handler to detect modifications to
-mapped memory.
+the capture layer, when using the `page_guard` mechanism, relies on a signal
+handler to detect modifications to mapped memory.
 Only one signal handler for that signal can be registered at a time, which can
 lead to a potential conflict with crash detection libraries that will also
 register a signal handler.
 
-Conflict between the capture layer and crash detection libraries depends on the
+Conflict between the `page_guard` mechanism and crash detection libraries depends on the
 order with which each component registers its signal handler.
 The capture layer will not register its signal handler until the first call to
 `vkMapMemory`.
@@ -532,10 +608,11 @@ After the crash detection library sets its signal handler, it immediately
 receives a SIGSEGV event generated by the concurrent write to mapped memory,
 which it detects as a crash and terminates the application.
 
+The `userfaultfd` mechanism was introduced to work around such conflicts.
 
 #### Memory Tracking Limitations
 
-There is a limitation with the page guard memory tracking method used by the
+There is a limitation with the `page_guard` memory tracking method used by the
 GFXReconstruct capture layer.
 The logic behind that method is to apply a memory protection to the
 guarded/shadowed regions so that accesses made by the user to trigger a
@@ -813,10 +890,10 @@ activity with the following:
 
 ```bash
 adb shell am force-stop com.lunarg.gfxreconstruct.replay
-adb shell am start -n "com.lunarg.gfxreconstruct.replay/android.app.NativeActivity" \ 
-                   -a android.intent.action.MAIN \ 
-                   -c android.intent.category.LAUNCHER \ 
-                   --es "args" \ 
+adb shell am start -n "com.lunarg.gfxreconstruct.replay/android.app.NativeActivity" \
+                   -a android.intent.action.MAIN \
+                   -c android.intent.category.LAUNCHER \
+                   --es "args" \
                    "<arg-list>"
 ```