I gave up on trying to make gcc emit branchless code on amd64 and switched to inline asm.
-
It's literally 4 fucking instructions, fewer than the C++ code. How hard can it be?
Clang has no trouble whatsoever, consistently delivering 10% better perf.
-
Actually I did manage to get a branchless version out of gcc with masks and bitwise arithmetic, but it was shit.
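For readers unfamiliar with the mask trick mentioned here, a minimal sketch follows. It is not the code from this thread, which was never posted; the helper names `select_mask` and `select_ternary` are hypothetical. It only illustrates the general idea of selecting between two values with bitwise arithmetic instead of a conditional.

```cpp
#include <cstdint>

// Hypothetical helper (not from the thread): returns a if cond is true,
// otherwise b, with no conditional in the source.
inline std::uint64_t select_mask(bool cond, std::uint64_t a, std::uint64_t b) {
  // cond converts to 0 or 1; 0 - 1 wraps to all ones, 0 - 0 stays zero.
  std::uint64_t mask = std::uint64_t{0} - static_cast<std::uint64_t>(cond);
  // Keep a where the mask is set and b where it is clear.
  return (a & mask) | (b & ~mask);
}

// The plain ternary form. Compilers may lower this to a conditional move
// on x86-64, but whether they do depends on the compiler and the context.
inline std::uint64_t select_ternary(bool cond, std::uint64_t a, std::uint64_t b) {
  return cond ? a : b;
}
```

Both functions return the same result. The mask form spends a few extra ALU instructions to avoid relying on the compiler's choice between a branch and a conditional move, which is one plausible reason such a version can end up slower than a hand-written `cmov`.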
-
Aren't compilers supposed to be good at optimizing code?
-
@vitaut I have found clang will emit branchless code come hell or high water. Even in situations where the branchless code is horrible.
-
@sabena Ideally it should be controllable
-
@sabena In case it wasn’t clear I wasn’t generalizing but speaking about my specific case.
-
@vitaut yeah, it should. Does GCC get it with PGO data?
-
@sabena Maybe, but it seems like even more pain than a few lines of inline asm.
-
@vitaut yeah, true. You should be able to annotate with both [[likely]] and [[unlikely]] at the same time tbh
-
@vitaut Is the clang code better on average or is it better for your benchmark? A branch is not necessarily bad, especially if it’s short and predictable. Your benchmark may be stressing the core in a way that makes the branch-based version underperform, while it might do better in a one-shot situation.