Allow me to share a story of the worst thing in D3D12: Handling VRAM exhaustion.
-
So basically you're supposed to do something like set up a persistent service with the appropriate user permissions to manage a limited system resource tracking an undocumented kernel event just to get around DXGI giving you a "stay under this number and everything'll be okay" number that doesn't actually work.
@EricLasota I'd argue such a number does not exist in any useful capacity, and the real mistake was pretending that it does. The kernel might evict buffers at pretty much any time, and processes can get preempted more or less at random, so any reported "safe capacity" is going to be completely useless because by the time your code gets to make any decisions based on the number, other processes could run which may completely change the circumstances the original capacity was determined under.
-
@pixelcluster The kernel can decide to just not evict anything from under-budget processes as long as the total's under physical memory.
It's kinda fine if the OS decides to lower the budget and start evicting memory without warning.
What's not fine is when the memory becomes persistently evicted and the program has no (good) way of detecting the problem and recovering, turning what would be a short stall into an ongoing perf hit that makes the program basically unusable.
-
@dotstdy @mjp It already gives some priority (i.e. more budget) to the in-focus program and that's probably a good baseline. Ultimately though, I think it's less important that it comes up with good budget numbers than that it tries to maintain the invariant that the in-focus program stays fully resident if it stays under its budget number.
If it can't do that, then yeah I at least want to know that it's demoting to provide feedback to the user that VRAM is critically low.
@EricLasota @mjp I think if i were to imagine a better solution, it would be more like a static allocation in a GAME MODE for the OS. So the title can negotiate a fixed budget while it's in the foreground, and to hell with anything else running on the machine. It's always going to be pretty busted when somebody decides to multi-box or have something annoying running in the background, but a strong hint of "i'm going to monopolize the vram, how much can i have" is better than a dynamic budget.
-
@EricLasota @mjp You probably keep the flexible budget on top, you just want to carve out a big chunk when launching (foregrounding?) the game so that you can actually promise not to evict if you stay in the blessed range (heap flag to choose?). At the moment you can end up having the game evicted entirely, and then never end up being able to restore that baseline due to interim changes to the system state.
OTOH maybe it's just re-arranging deck chairs on the titanic. Don't oversubscribe :'(
-
@dotstdy @EricLasota at the limit you basically end up with the console model where one “game” is allowed to run with guaranteed resources, and then any “apps” only ever get allocated enough to fit alongside a game. But that doesn’t really work on a windows PC where any number of apps might decide to be voracious consumers of RAM and VRAM (and someone will *always* complain loudly if their favorite thing doesn’t work while they’re gaming).
-
@mjp @EricLasota I think one advantage, even though that's absolutely the case, is you can kind of message it in the OS. "shootergame.exe wants to enter game mode, but there's not enough video memory because rivatuner.exe is using 6gb of video memory. do you want to launch in 'dogshit performance mode'?"
-
@mjp @EricLasota otoh when the thing eating all the memory is the nvidia app and steamwebhelper.exe maybe you might not make so many friends with this style of messaging :')
-
@mjp @dotstdy You can already request a minimum reservation. Problem is, how much can I use above the must-have? And what happens when the OS says "no you can't have all of that any more?"
And yeah in practice, I know the solution is to not oversubscribe VRAM, problem is I want to be able to either avoid oversubscribing or tell the user they're out of VRAM because otherwise guess who gets blamed for the game running at 2 FPS?
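A minimal sketch of the "avoid oversubscribing or tell the user" check, assuming the numbers come from `IDXGIAdapter3::QueryVideoMemoryInfo`. The struct below is a plain stand-in for the fields of `DXGI_QUERY_VIDEO_MEMORY_INFO` so the snippet compiles anywhere; `VramState`, `classifyVram`, `bytesOverBudget`, and the 90% warning threshold are hypothetical names and values chosen for illustration. As the thread points out, staying under the budget does not guarantee residency, so this can only flag the over-budget case, not a persistent demotion.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in mirroring the fields of DXGI_QUERY_VIDEO_MEMORY_INFO (all in bytes).
// In a real program these values would be filled in by QueryVideoMemoryInfo.
struct VideoMemoryInfo {
    std::uint64_t budget;
    std::uint64_t currentUsage;
    std::uint64_t availableForReservation;
    std::uint64_t currentReservation;
};

// Hypothetical classification used only in this sketch.
enum class VramState { Comfortable, NearBudget, OverBudget };

// Classify the sampled usage against the sampled budget.
// warnFraction is an assumed tuning knob, not something DXGI defines.
VramState classifyVram(const VideoMemoryInfo& info, double warnFraction = 0.9) {
    assert(warnFraction > 0.0 && warnFraction <= 1.0);
    if (info.currentUsage > info.budget) {
        return VramState::OverBudget;
    }
    const double threshold = warnFraction * static_cast<double>(info.budget);
    if (static_cast<double>(info.currentUsage) >= threshold) {
        return VramState::NearBudget;
    }
    return VramState::Comfortable;
}

// Bytes the application would have to release to get back under the budget.
std::uint64_t bytesOverBudget(const VideoMemoryInfo& info) {
    return info.currentUsage > info.budget ? info.currentUsage - info.budget : 0;
}
```

An application might sample this once per frame and, on `OverBudget`, either shrink its streaming pools by `bytesOverBudget` or show a "VRAM is critically low" warning. The sample is stale the moment it is taken, which is the race discussed earlier in the thread.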
-
@EricLasota Well, as long as the total memory usage is below physical memory size, there is nothing to evict in any case.
On the problem of apps not recovering from persistent evictions, there's just not much that apps can do here. As long as there is memory contention, the kernel may have to evict "random" memory at "random" times. For every form of notification the kernel may provide, there is a case where apps acting on that notification does nothing at best and is harmful at worst.
-
@EricLasota The best you can probably do is give the kernel as much info as possible about which pieces of memory have the least bad effect when evicted (i.e. memory priorities), because the kernel is in a position to actually make meaningful decisions on what to evict.
Of course that doesn't replace app-side freeing of resources on memory contention, but I'm doubtful a kernel-side notif would be less racy/broken than e.g. detecting memory contention based on sampling total mem usage.
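The priority idea can be sketched on the application side as well: tag each allocation with a priority and release the least important ones first when usage has to drop. D3D12 exposes a comparable kernel-facing hint through `ID3D12Device1::SetResidencyPriority`; the `Priority` enum, `Allocation` struct, and `pickVictims` function below are hypothetical and only model the ordering, they do not call any D3D12 API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical priority levels, lowest is released first.
enum class Priority { Minimum = 0, Low, Normal, High, Maximum };

// Hypothetical bookkeeping record for one GPU allocation.
struct Allocation {
    std::string name;
    std::uint64_t bytes;
    Priority priority;
};

// Return the names of allocations to release, lowest priority first,
// until at least bytesNeeded would be freed (or nothing is left).
std::vector<std::string> pickVictims(std::vector<Allocation> allocs,
                                     std::uint64_t bytesNeeded) {
    // stable_sort keeps the caller's order among equal priorities.
    std::stable_sort(allocs.begin(), allocs.end(),
                     [](const Allocation& a, const Allocation& b) {
                         return a.priority < b.priority;
                     });
    std::vector<std::string> victims;
    std::uint64_t freed = 0;
    for (const Allocation& a : allocs) {
        if (freed >= bytesNeeded) {
            break;
        }
        victims.push_back(a.name);
        freed += a.bytes;
    }
    // Either enough was found, or every allocation was selected.
    assert(freed >= bytesNeeded || victims.size() == allocs.size());
    return victims;
}
```

This only decides what to give up; whether a released or demoted allocation actually hurts performance is a separate question.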
-
@pixelcluster @EricLasota I think one big issue is that while the kernel is the only one which can decide what to evict sensibly, there's no such thing as "less important memory" from the application point of view. The entire dynamic texture pool is often trivially evictable by the application, but if the OS demotes that memory the application's performance is going to tank. I'd almost rather have some kind of VM_DROPPABLE for that kind of memory, where the app is trivially able to deal with eviction.
-
@pixelcluster @EricLasota it works well in other cases where you oversubscribe, though. E.g. if you have a level editor and you want to preferentially demote (without freeing) all the editor side GPU allocations when you enter "game mode".
-
@pixelcluster What I mean is that if the kernel decides on per-process budgets that total less than the physical memory, then it can guarantee that only over-budget processes will get evicted, and that if it has to evict an under-budget one anyway, it will always update the budgets when it does.
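The invariant described here can be written down as a small model. This is a sketch of the policy as stated in the post, not of how the Windows video memory manager actually behaves; `ProcessVram`, `budgetsFit`, and `evictionCandidates` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-process view: the budget the kernel handed out
// and what the process currently has allocated (bytes).
struct ProcessVram {
    std::string name;
    std::uint64_t budget;
    std::uint64_t usage;
};

// True if the sum of all budgets fits in physical VRAM, which is the
// precondition for promising residency to under-budget processes.
bool budgetsFit(const std::vector<ProcessVram>& procs,
                std::uint64_t physicalBytes) {
    std::uint64_t total = 0;
    for (const ProcessVram& p : procs) {
        total += p.budget;
    }
    return total <= physicalBytes;
}

// Under the proposed invariant, only over-budget processes may be evicted.
std::vector<std::string> evictionCandidates(
        const std::vector<ProcessVram>& procs, std::uint64_t physicalBytes) {
    // If this fails, budgets must be lowered (and reported) before evicting.
    assert(budgetsFit(procs, physicalBytes));
    std::vector<std::string> candidates;
    for (const ProcessVram& p : procs) {
        if (p.usage > p.budget) {
            candidates.push_back(p.name);
        }
    }
    return candidates;
}
```

When `budgetsFit` is false, the model has no valid answer, which corresponds to the case in the post where the kernel must update the budgets before touching an under-budget process.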
-
@pixelcluster The problem with sampling total memory usage is that the applications don't talk to each other.
If one program is hogging all of the VRAM and another wants more than what's available, then there are really two options: Fail allocations in the new program (allowing the first-comer to monopolize the VRAM), or make the first one lower its memory usage.
So, the point of having budgets is to have some way of coordinating VRAM usage reduction across multiple programs.