I was looking at the phoronix benchmarks and why the ProjectPhysX "fp32" result is bad.
Turns out, it's doing fma in a loop. And because it goes through zink, and zink doesn't make use of VK_KHR_shader_fma, nor do we support it in any driver inside mesa, the only choice we have is to software-emulate CL's fma, because it's expected to be fused and the GLSL SPIR-V fma isn't required to be fused 🙃
It sounds like I'll have to dig out my 5 year old MR to sort out this mess https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6591
-
@karolherbst Couple of other things to consider, if you're trying to sort out the mess 🙂
OpenCL only requires a * b + c to remain unfused when FP_CONTRACT is OFF. Note that FP_CONTRACT is ON by default.
Might not help you, but LLVM uses fmuladd to indicate an operation that can be fused, or not, whatever the device prefers. LLVM fma must remain fused.
I thought you might be able to un-fuse fma with -cl-fast-relaxed-math, but it doesn't appear this is the case. mad can be un-fused, though.
-
@bashbaug yeah... atm in nir we don't really have fmad vs ffma opcodes, but we really should, because all those rules around multiply add are super annoying and not consistent across APIs
-
@karolherbst @bashbaug you can unfuse fma with a macro: https://github.com/ProjectPhysX/OpenCL-Benchmark/blob/master/src/opencl.hpp#L287
ARM GPUs need that, and Nvidia CMP 170HX mining GPU too as it has fma disabled through hardware or firmware.
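A rough sketch of the macro trick on the host side, in C: prepend a `#define` to the kernel source before building it, so every `fma(a, b, c)` in the kernel expands to a plain multiply and add. The helper names and the exact macro text are assumptions for illustration, not a copy of what the linked file does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical preamble: OpenCL C source text that replaces fma
 * with a plain multiply-add expression. */
static const char *fma_preamble(int unfuse_fma)
{
    return unfuse_fma ? "#define fma(a, b, c) ((a) * (b) + (c))\n" : "";
}

/* Hypothetical helper: writes preamble + kernel source into dst.
 * Returns 0 on success, -1 if dst is too small. */
static int build_kernel_source(char *dst, size_t cap,
                               const char *kernel, int unfuse_fma)
{
    const char *pre = fma_preamble(unfuse_fma);

    if (strlen(pre) + strlen(kernel) + 1 > cap)
        return -1;
    snprintf(dst, cap, "%s%s", pre, kernel);
    return 0;
}
```

The resulting string is what would be handed to `clCreateProgramWithSource`. Note that with FP_CONTRACT ON by default, the compiler is still allowed to fuse the expanded `a * b + c`; the macro only drops the requirement that the operation be fused.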