fixed point was a mistake
-
@eniko@mastodon.gamedev.place @lina@vt.social do you happen to own a Bulldozer CPU?
-
I have completed the triangle rasterizer fixed point conversion
The benchmarks have all improved between 0 and 35%
Except for threaded random triangles with flat color. That metric has increased 106%. As in it's twice as fast as before
I have no idea why but I'm fairly sure I've ruled out bugs in my benchmarking
I am very confused
@eniko mutex/locking/semaphores?
-
@eniko (Highly speculative, since FPUs are pretty good these days:) If the CPU has hyperthreading: Maybe two fixed-point threads share ALUs better than two floating-point threads share FPUs?
You might even find that combined fixed and float makes better overall use of a modern CPU, provided you can do both on a single thread or jump through the hoops to get both threads scheduled on the same core.
-
@lina not sure offhand and I'm in bed now but my CPU is a Ryzen 7 5600g
@eniko That doesn't exist... Ryzen 5 or different model number?
If it's the 5 5600G then that's 6 cores, so with 4 threads you shouldn't have HT effects as long as the OS scheduler isn't dumb about it...
-
fixed point was in fact *not* a mistake
@eniko these two messages are my constant bistable state about fixed points
-
@lina er yeah 5 sorry
-
@slyecho wouldn't that make it slower, not faster?
-
@eniko one would assume 4 times as fast with four threads, not 30% faster. But I don’t know exactly what the code is doing without seeing it
-
@eniko do you already have experience with the kind of profiler that lets you get performance counter values?
on linux my go-to first step is
perf stat -d ./myprogram
(-d for details gives a couple more numbers. gotta have numbers!) then you'll see a few numbers that may point at a drastic difference.
I'm thinking a higher instructions-per-cycle number probably means fewer instructions that take many cycles (though I hear integer division is much better nowadays?), or your cache hit rate for data or instruction cache may be a lot better, or maybe your code ends up with fewer total instructions for some reason?
-
@lina I don't know? >_> I just split the screen between 4 worker threads that all draw each triangle to their quadrant
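For readers following along, here is a minimal sketch of what "split the screen between 4 worker threads" could look like in terms of clip regions. It is not eniko's actual code: the `Rect` type, the 2x2 layout, and the `quadrants` and `clip` helpers are all assumptions made for illustration. Each worker would take every triangle's bounding box, clip it to its own quadrant, and skip the triangle if nothing is left.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Half-open pixel rectangle: x0 <= x < x1, y0 <= y < y1 (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def quadrants(width: int, height: int) -> list[Rect]:
    """Split a width x height screen into four quadrants (assumed 2x2 layout)."""
    mx, my = width // 2, height // 2
    return [
        Rect(0, 0, mx, my),           # top-left
        Rect(mx, 0, width, my),       # top-right
        Rect(0, my, mx, height),      # bottom-left
        Rect(mx, my, width, height),  # bottom-right
    ]


def clip(bbox: Rect, region: Rect) -> Optional[Rect]:
    """Intersect a triangle's bounding box with one worker's region.

    Returns None when the triangle does not touch the region at all,
    so that worker has nothing to draw for it.
    """
    x0, y0 = max(bbox.x0, region.x0), max(bbox.y0, region.y0)
    x1, y1 = min(bbox.x1, region.x1), min(bbox.y1, region.y1)
    if x0 >= x1 or y0 >= y1:
        return None
    return Rect(x0, y0, x1, y1)


def area(r: Optional[Rect]) -> int:
    """Pixel count of a clipped rectangle (0 if the clip was empty)."""
    return 0 if r is None else (r.x1 - r.x0) * (r.y1 - r.y0)
```

Because the regions are disjoint and cover the screen, the clipped pieces of any bounding box add up to the original box, so no pixel is drawn twice and workers never write to the same pixel.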
-
@timotimo I'm incredibly new at running benchmarks at this level so I don't really know what that is
-
@slyecho the 30% improvement was over the same implementation with floating point
-
@slyecho as in threaded flat color random triangles with fixed point is 2x as fast as threaded flat color random triangles with floating point
-
To be clear, everything improved 0-35% from the previous implementation that used floating point after switching to fixed point
So the current threaded random triangles with flat color metric (fixed point) is 2x as fast as the previous threaded random triangles with flat color metric (floating point)
-
@eniko from my experience the biggest upside of using fixed-point in rasterization is that you get exact sub-pixel precision with as many bits as you like (or need), and it doesn't depend on how far away from the origin you are. That alone would be worth it even without performance improvements
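A small sketch of the precision point, assuming 4 sub-pixel bits (1/16 pixel steps) and 32-bit floats on the floating-point side. The names `SUBPIXEL_BITS`, `to_fixed`, `to_f32`, `fixed_offset` and `f32_offset` are made up for this example. It nudges a coordinate by 1/16 of a pixel, once near the origin and once about two million pixels away, and checks how much of the nudge survives in each representation.

```python
import struct

SUBPIXEL_BITS = 4          # assumed: 1/16 pixel resolution
ONE = 1 << SUBPIXEL_BITS   # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Snap a coordinate to the nearest 1/16 pixel, stored as an integer."""
    return round(x * ONE)


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def fixed_offset(origin: float, offset: float) -> float:
    """Sub-pixel offset that survives when coordinates are fixed point."""
    return (to_fixed(origin + offset) - to_fixed(origin)) / ONE


def f32_offset(origin: float, offset: float) -> float:
    """Sub-pixel offset that survives when coordinates are 32-bit floats."""
    return to_f32(origin + offset) - to_f32(origin)


near, far, step = 0.0, 2097152.0, 1 / 16   # far = 2**21 pixels from the origin

print(fixed_offset(near, step), fixed_offset(far, step))
print(f32_offset(near, step), f32_offset(far, step))
```

At 2**21 a 32-bit float can only step in increments of 0.25, so the 1/16 nudge rounds away, while the fixed-point grid keeps the same 1/16 spacing everywhere it can represent. The trade-off is range: with 4 fractional bits in a signed 32-bit integer, coordinates have to stay below 2**27.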
-
@gabrielesvelto also helps if you wanna run it on really old CPUs >_>
-
@gabrielesvelto also to be clear getting a +100% performance boost is *good* I'm just having a hard time believing it's not a benchmarking bug. But if it is a bug I sure can't find it, and the threading code is only 300 lines so it's not like there's a lot of places it could be hiding
-
@eniko BTW are you using only scalar math or are you leveraging SIMD extensions? IIUC one of the advantages of fixed-point math is that you could implement some stuff on x86 even with the oldest, crustiest SIMD stuff (hello MMX!) and get at least some benefits
-
@midnaw not really no, they're rarely used nowadays
-