Allow me to share a story of the worst thing in D3D12: Handling VRAM exhaustion.
If multiple programs are collectively using too much VRAM, Windows will start demoting VRAM to system memory, causing perf to catastrophically tank.
Fortunately, DXGI provides QueryVideoMemoryInfo which gives you usage+budget info for your program. Unfortunately, THE BUDGET NUMBER IS FAKE AND DOESN'T WORK. You can stay under it and still get demoted.
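[For reference, the call in question is IDXGIAdapter3::QueryVideoMemoryInfo. Here's a minimal sketch of the kind of logic you'd hang off its numbers; the struct below is a portable mock whose fields mirror DXGI_QUERY_VIDEO_MEMORY_INFO, so nothing here is the real API.]

```cpp
#include <cassert>
#include <cstdint>

// Portable mock mirroring DXGI_QUERY_VIDEO_MEMORY_INFO (all fields are UINT64
// in the real header); on Windows you'd fill it via
// IDXGIAdapter3::QueryVideoMemoryInfo.
struct VideoMemoryInfo {
    uint64_t Budget;                  // the OS's "safe" limit (unreliable, per this thread)
    uint64_t CurrentUsage;            // bytes this process currently has resident
    uint64_t AvailableForReservation;
    uint64_t CurrentReservation;
};

// Bytes we'd need to free to get back under budget (0 if already under it).
uint64_t BytesOverBudget(const VideoMemoryInfo& info) {
    return info.CurrentUsage > info.Budget ? info.CurrentUsage - info.Budget : 0;
}
```

[On Windows the real query is `adapter3->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &info)`; the whole point of this thread is that `BytesOverBudget(info) == 0` still doesn't protect you from demotion.]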
How are you supposed to actually detect demotion? ...
You have to set up a kernel trace with ETW and monitor the VidMmProcessDemotedCommitmentChange event. That event is not actually documented anywhere except for a random man file in WPT. Searching for it on Google yields SIX hits, and five of those are NSIGHT docs where it just tells you it logs it.
Oh wait you want to do this on a USER'S machine? Buckle up!
ETW in-process traces don't support real-time mode (only logging to a file), so you have to use a global trace instead. There's also an OS-wide limit of 64 trace sessions.
-
Global traces are also not automatically cleaned up on process exit, so if your tracing process crashes, the trace leaks.
Creating traces is also restricted by security permissions, but you won't find that out on a dev machine because Visual Studio installation adds NT AUTHORITY\INTERACTIVE (a.k.a. any locally logged-in user) to the Performance Log Users group.
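[Since nothing stops a leaked session from squatting one of those 64 slots, one common mitigation is: stop any stale session with your name before starting, and wrap the live session in an RAII guard so clean exits always release it. Sketch below, with the actual ETW calls (StartTrace, and ControlTrace with EVENT_TRACE_CONTROL_STOP) replaced by hypothetical stand-in functions so it stays self-contained.]

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for StartTrace / ControlTrace(EVENT_TRACE_CONTROL_STOP);
// the counter just lets us observe leaks in a test.
static int g_liveSessions = 0;
inline bool StartTraceSession(const std::string&) { ++g_liveSessions; return true; }
inline void StopTraceSession(const std::string&) { if (g_liveSessions > 0) --g_liveSessions; }

// RAII guard: best-effort stop of any stale session left over from a prior
// crash (sessions are named OS-wide), then start our own; destruction stops it
// so a *clean* exit never leaks a session slot.
class TraceSessionGuard {
public:
    explicit TraceSessionGuard(std::string name) : name_(std::move(name)) {
        StopTraceSession(name_);             // reclaim a slot leaked by a prior crash
        active_ = StartTraceSession(name_);
    }
    ~TraceSessionGuard() {
        if (active_) StopTraceSession(name_);
    }
    bool active() const { return active_; }
private:
    std::string name_;
    bool active_ = false;
};
```

[RAII only covers clean exits; a hard crash still leaks the session, which is exactly why you end up wanting a separate supervising process or service, as the next post describes.]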
-
So basically you're supposed to do something like set up a persistent service with the appropriate user permissions to manage a limited system resource tracking an undocumented kernel event just to get around DXGI giving you a "stay under this number and everything'll be okay" number that doesn't actually work.
-
@EricLasota yeah it’s pretty terrible. The original design was that if you didn’t stay under that budget the OS would stop scheduling your command list submissions to run at all, but then it was changed to the auto demotion stuff before release. I’m guessing some wires got crossed somewhere, but it’s wild it’s been like this for 10 years now with no updates. Even a D3D API to notify you when demotion occurs would be a huge improvement.
-
@mjp Personally I don't think being notified of demotion is that useful because there isn't a guarantee that freeing X amount of memory will make the OS give it back.
If you're over 250MB and you dump 250MB of allocations and... you've still got 250MB demoted... then what? (Did some other app's usage go up, or are we just waiting?)
If the OS dictates memory demotion, then it should be giving applications accurate budgets.
-
@EricLasota sure, of course an accurate budget would be ideal too. But getting notified would be useful for telemetry, and to understand exactly which allocation is getting demoted.
-
@mjp @EricLasota An accurate budget is also kind of an ill-defined thing when you're over-subscribing, since you've got two or more apps fighting for resources, all of which you can arbitrarily penalize, and you can't just split the available memory. Honestly it's pretty impressive it works at all, well, after a minimize and restore cycle to kick vidmm...
-
@mjp @EricLasota just think oneself lucky, on Linux there's no smart promotion or demotion, so once you get into the danger zone there's no recovery short of closing everything and trying again from the top.
-
@dotstdy @EricLasota I believe the memory VidMM doles out has to be physically contiguous as well, which means fragmentation could also mess with your ability to give accurate budgets.
-
@dotstdy @mjp It already gives some priority (i.e. more budget) to the in-focus program and that's probably a good baseline. Ultimately though, I think it's less important that it comes up with good budget numbers than that it tries to maintain the invariant that the in-focus program stays fully resident if it stays under its budget number.
If it can't do that, then yeah I at least want to know that it's demoting to provide feedback to the user that VRAM is critically low.
-
@EricLasota I'd argue such a number does not exist in any useful capacity, and the real mistake was pretending that it does. The kernel might evict buffers at pretty much any time, and processes can get preempted more or less at random, so any reported "safe capacity" is going to be completely useless because by the time your code gets to make any decisions based on the number, other processes could run which may completely change the circumstances the original capacity was determined under.
-
@pixelcluster The kernel can decide to just not evict anything from under-budget processes as long as the total's under physical memory.
It's kinda fine if the OS decides to lower the budget and start evicting memory without warning.
What's not fine is the memory becomes persistently evicted and the program has no (good) way of detecting the problem and recovering, turning what would be a short stall into an ongoing perf hit that makes it basically unusable.
-
@EricLasota @mjp I think if I were to imagine a better solution, it would be more like a static allocation in a GAME MODE for the OS. So the title can negotiate a fixed budget while it's in the foreground, and to hell with anything else running on the machine. It's always going to be pretty busted when somebody decides to multi-box or have something annoying running in the background, but a strong hint of "I'm going to monopolize the vram, how much can I have" is better than a dynamic budget.
-
@EricLasota @mjp You probably keep the flexible budget on top, you just want to carve out a big chunk when launching (foregrounding?) the game so that you can actually promise not to evict if you stay in the blessed range (heap flag to choose?). At the moment you can end up having the game evicted entirely, and then never end up being able to restore that baseline due to interim changes to the system state.
OTOH maybe it's just re-arranging deck chairs on the Titanic. Don't oversubscribe :'(
-
@dotstdy @EricLasota at the limit you basically end up with the console model where one “game” is allowed to run with guaranteed resources, and then any “apps” only ever get allocated enough to fit alongside a game. But that doesn’t really work on a windows PC where any number of apps might decide to be voracious consumers of RAM and VRAM (and someone will *always* complain loudly if their favorite thing doesn’t work while they’re gaming).
-
@mjp @EricLasota I think one advantage, even though that's absolutely the case, is you can kind of message it in the OS. "shootergame.exe wants to enter game mode, but there's not enough video memory because rivatuner.exe is using 6gb of video memory. do you want to launch in 'dogshit performance mode'?"
-
@mjp @EricLasota otoh when the thing eating all the memory is the nvidia app and steamwebhelper.exe maybe you might not make so many friends with this style of messaging :')
-
@mjp @dotstdy You can already request a minimum reservation. Problem is, how much can I use above the must-have? And what happens when the OS says "no you can't have all of that any more?"
And yeah in practice, I know the solution is to not oversubscribe VRAM, problem is I want to be able to either avoid oversubscribing or tell the user they're out of VRAM because otherwise guess who gets blamed for the game running at 2 FPS?
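[For readers following along: the minimum reservation referred to here is IDXGIAdapter3::SetVideoMemoryReservation, which echoes back as CurrentReservation in the budget query. Eric's open question, "how much can I use above the must-have?", only has an optimistic answer; a sketch with the types mocked so it stays self-contained.]

```cpp
#include <cassert>
#include <cstdint>

// Mocked view of the two numbers involved. On Windows the floor is requested
// via IDXGIAdapter3::SetVideoMemoryReservation(node, segmentGroup, bytes) and
// reported back in DXGI_QUERY_VIDEO_MEMORY_INFO::CurrentReservation.
struct BudgetState {
    uint64_t Budget;              // what the OS currently claims is safe
    uint64_t CurrentReservation;  // the "must-have" floor we reserved
};

// The only available answer to "how much above the floor?": budget minus
// reservation. Per the thread, the OS can shrink Budget at any time, so this
// headroom is optimistic rather than guaranteed.
uint64_t OptimisticHeadroom(const BudgetState& s) {
    return s.Budget > s.CurrentReservation ? s.Budget - s.CurrentReservation : 0;
}
```

[When the OS drops Budget below the reservation, this headroom collapses to zero and the honest answer is "none", which is the problem being described.]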
-
@EricLasota Well, as long as the total memory usage is below physical memory size, there is nothing to evict in any case.
On the problem of apps not recovering from persistent evictions, there's just not much that apps can do here. As long as there is memory contention, the kernel may have to evict "random" memory at "random" times. For every form of notification the kernel may provide, there is a case where apps acting on that notification does nothing at best and is harmful at worst.