I was looking at the Phoronix benchmarks to see why the ProjectPhysX "fp32" result is bad.

karolherbst@chaos.social

Turns out it's doing fma in a loop. Because it goes through zink, and zink doesn't make use of VK_KHR_shader_fma (nor do we support it with any driver inside mesa), the only choice we have is to emulate CL's fma in software: CL expects it to be fused, and the GLSL SPIR-V fma isn't required to be fused 🙃

It sounds like I'll have to dig out my 5 year old MR to sort out this mess https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6591

bashbaug@mastodon.gamedev.place

@karolherbst Couple of other things to consider, if you're trying to sort out the mess 🙂

OpenCL only requires a * b + c to remain unfused when FP_CONTRACT is OFF. Note that FP_CONTRACT is ON by default.

Might not help you, but LLVM uses fmuladd to indicate an operation that can be fused, or not, whatever the device prefers. LLVM fma must remain fused.

I thought you might be able to un-fuse fma with -cl-fast-relaxed-math, but it doesn't appear this is the case. mad can be un-fused, though.

karolherbst@chaos.social

@bashbaug yeah... atm in nir we don't really have fmad vs ffma opcodes, but we really should, because all those rules around multiply add are super annoying and not consistent across APIs

projectphysx@mast.hpc.social

@karolherbst @bashbaug you can unfuse fma with a macro: https://github.com/ProjectPhysX/OpenCL-Benchmark/blob/master/src/opencl.hpp#L287
ARM GPUs need that, and so does the Nvidia CMP 170HX mining GPU, as it has fma disabled through hardware or firmware.
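A rough idea of what such a macro can look like, written as plain C. This is a hypothetical sketch and not the definition from the linked opencl.hpp; `UNFUSED_FMA` and `axpy_step` are names invented here.

```c
/* Hypothetical macro (not copied from opencl.hpp): spell the
   multiply-add as separate operations instead of calling fma. */
#define UNFUSED_FMA(a, b, c) ((a) * (b) + (c))

/* Example use: one a*x + y step written through the macro. */
float axpy_step(float a, float x, float y) {
    return UNFUSED_FMA(a, x, y);
}
```

Note that writing `a * b + c` only permits the unfused form, it does not force it: a compiler may still contract the expression unless contraction is turned off (FP_CONTRACT OFF in OpenCL C, or `-ffp-contract=off` with GCC and Clang).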
