fixed point was a mistake

gabrielesvelto@mas.to

@eniko from my experience the biggest upside of using fixed-point in rasterization is that you get exact sub-pixel precision with as many bits as you like (or need), and it doesn't depend on how far away from the origin you are. That alone would be worth it even without performance improvements
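To illustrate the "doesn't depend on how far from the origin you are" point, here is a minimal Python sketch. It assumes a 32-bit integer format with 4 fractional bits ("28.4", a 1/16-pixel grid), and the helper names are made up for the example. It compares the gap between neighbouring float32 values at growing coordinates with the fixed-point step, which never changes.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_spacing(x: float) -> float:
    """Distance from float32(x) to the next larger float32, for finite x > 0."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_f32(x)

# Assumed format: signed 32-bit integers with 4 fractional bits ("28.4"),
# i.e. the same 1/16-pixel grid everywhere in the representable range.
SUBPIXEL_BITS = 4
FIXED_STEP = 1.0 / (1 << SUBPIXEL_BITS)
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def to_fixed(x: float) -> int:
    """Snap a coordinate to the 1/16-pixel grid, checking the int32 range."""
    v = round(x * (1 << SUBPIXEL_BITS))
    if not INT32_MIN <= v <= INT32_MAX:
        raise OverflowError(f"{x} does not fit in 28.4 fixed point")
    return v

if __name__ == "__main__":
    for coord in (1.0, 1_000.0, 1_000_000.0, 100_000_000.0):
        print(f"x = {coord:>13,.0f}: float32 step {f32_spacing(coord):g} px, "
              f"fixed step {FIXED_STEP:g} px")
```

At x = 1,000,000 a float32 can only move in 1/16-pixel increments, and at 100,000,000 its step is 8 pixels, so a 1/16 px offset simply vanishes. The 28.4 grid keeps its 1/16 px step until the integer part overflows at about ±134 million pixels.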

eniko@mastodon.gamedev.place

@gabrielesvelto also helps if you wanna run it on really old CPUs >_>

eniko@mastodon.gamedev.place

@gabrielesvelto also to be clear getting a +100% performance boost is *good* I'm just having a hard time believing it's not a benchmarking bug. But if it is a bug I sure can't find it, and the threading code is only 300 lines so it's not like there's a lot of places it could be hiding

gabrielesvelto@mas.to

@eniko BTW are you using only scalar math or are you leveraging SIMD extensions? IIUC one of the advantages of fixed-point math is that you could implement some stuff on x86 even with the oldest, crustiest SIMD stuff (hello MMX!) and get at least some benefits
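A rough scalar illustration of the packed-integer idea. MMX is integer-only (packed single-precision floats only arrived later with SSE), and its PADDW instruction adds four 16-bit lanes held in one 64-bit register, wrapping around in each lane. The sketch below emulates that in plain Python with the "SIMD within a register" trick. The 8.8 fixed-point colour values and all function names are assumptions for the example, not anything from eniko's renderer.

```python
# Hypothetical emulation of MMX PADDW: four 16-bit lanes in one 64-bit word.
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x8000_8000_8000_8000   # top bit of every 16-bit lane
WORD_MASK = (1 << 64) - 1

def pack4(values):
    """Pack four 16-bit lane values into one 64-bit word (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word

def unpack4(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]

def add_lanes_wrapping(a, b):
    """Lane-wise add modulo 2**16, so no carry crosses a lane boundary."""
    # Add the low 15 bits of every lane; a carry can reach bit 15 but never bit 16.
    low = (a & ~HIGH_BITS) + (b & ~HIGH_BITS)
    # Fold each lane's top bit back in with XOR (sum bit, carry-out discarded).
    return (low ^ ((a ^ b) & HIGH_BITS)) & WORD_MASK

# Example: step four 8.8 fixed-point colour values by per-pixel deltas.
colours = pack4([0x0100, 0x0280, 0xFFFF, 0x7FFF])  # 1.0, 2.5, ~255.996, ~127.996
deltas = pack4([0x0080, 0x0080, 0x0001, 0x0001])   # 0.5, 0.5, 1/256, 1/256
stepped = unpack4(add_lanes_wrapping(colours, deltas))
```

The third lane wraps from 0xFFFF to 0, exactly as PADDW would; a real colour interpolator would more likely use the saturating PADDUSW variant.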

etc@toot.wales

@eniko @midnaw if you stretch the definition a little, things like DateTime and TimeSpan in C# are fixed point… they have an underlying integer representation counting timer ticks, and a ratio value that converts ticks to human-friendly units.
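The tick idea fits in a few lines of Python, as an illustration only. The constant mirrors .NET's TimeSpan.TicksPerSecond (10,000,000, i.e. 100 ns per tick); the function names are made up for this sketch. The stored value is a plain integer and the scale factor is only applied when converting to seconds, which is the shape of a fixed-point number with a decimal rather than binary scale.

```python
from fractions import Fraction

# Mirrors .NET's TimeSpan.TicksPerSecond: one tick is 100 nanoseconds.
TICKS_PER_SECOND = 10_000_000

def ticks_from_seconds(seconds: Fraction) -> int:
    """Convert an exact number of seconds to the nearest whole tick."""
    return round(seconds * TICKS_PER_SECOND)

def seconds_from_ticks(ticks: int) -> Fraction:
    """The human-friendly view: ticks times the fixed scale factor 1/10**7."""
    return Fraction(ticks, TICKS_PER_SECOND)

# Ten 0.1 s steps accumulated as integer ticks land exactly on 1 s...
ticks = 0
for _ in range(10):
    ticks += ticks_from_seconds(Fraction(1, 10))

# ...while the same loop in binary floating point drifts slightly.
seconds = 0.0
for _ in range(10):
    seconds += 0.1
```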

midnaw@idtech.space

@etc @eniko oh i hadn't thought about that

eniko@mastodon.gamedev.place

@oblomov @lina yes

scalar
flat random triangles/sec 1,049,510 -> 1,210,082
flat random pixels/sec 460,701,636 -> 529,507,748
+15%

tiled (x4 workers)
flat random triangles/sec 5,440,951 -> 11,253,417
flat random pixels/sec 2,388,405,031 -> 4,924,271,126
+106%
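The percentages above can be re-derived from the posted throughputs. This sketch assumes each row is "float version -> fixed-point version", which is what the rest of the thread implies; the dictionary and function names are just for the example. It also computes the 4-worker vs single-thread ratio for each version.

```python
# Throughput figures as posted: (float version, fixed-point version).
RESULTS = {
    "scalar triangles/sec": (1_049_510, 1_210_082),
    "scalar pixels/sec": (460_701_636, 529_507_748),
    "tiled x4 triangles/sec": (5_440_951, 11_253_417),
    "tiled x4 pixels/sec": (2_388_405_031, 4_924_271_126),
}

def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0

def thread_scaling(scalar: float, tiled: float) -> float:
    """How many times faster the 4-worker run is than the single-threaded one."""
    return tiled / scalar

if __name__ == "__main__":
    for name, (old, new) in RESULTS.items():
        print(f"{name}: {percent_gain(old, new):+.1f}%")
    for idx, label in ((0, "float"), (1, "fixed")):
        s = RESULTS["scalar triangles/sec"][idx]
        t = RESULTS["tiled x4 triangles/sec"][idx]
        print(f"{label}: 4 workers = x{thread_scaling(s, t):.2f} single-threaded")
```

Both scalar rows come out near +15% and the tiled pixels/sec row at about +106%, as posted (tiled triangles/sec is closer to +107%). The 4-worker ratios, roughly x5.2 before and x9.3 after, are the superlinear figures oblomov flags further down.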

eniko@mastodon.gamedev.place

@gabrielesvelto i'm not using any SIMD atm so it's scalar only

ataylor@mastodon.gamedev.place

@eniko it would take some staring at assembly to know for sure, but one possibility is that the lack of needing to care about floating point specials (inf, nan) lets the optimizer do a better job. Float semantics are hard for compilers to work around without fastmath (do not use fastmath.)
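A small demonstration of why IEEE 754 specials tie an optimizer's hands. Each entry pairs an algebraic rewrite a compiler might want to make with a float value that breaks it; for integer (and so fixed-point) arithmetic none of these counterexamples exist. This only shows the language-level rules, not what any particular compiler emits. For reference, GCC/Clang's `-ffast-math` bundles options such as `-ffinite-math-only` and `-fno-signed-zeros` that let the compiler assume these cases away.

```python
import math

def signbit(x: float) -> bool:
    """True for negative values, including -0.0."""
    return math.copysign(1.0, x) < 0

nan, inf = math.nan, math.inf

# Rewrite an optimizer would like to make -> does a float counterexample exist?
COUNTEREXAMPLES = {
    "x - x -> 0": math.isnan(inf - inf),                        # inf - inf is NaN
    "x * 0 -> 0": math.isnan(inf * 0.0),                        # inf * 0 is NaN
    "x == x -> true": not (nan == nan),                         # NaN != NaN
    "!(x < y) -> x >= y": not (nan < 1.0) and not (nan >= 1.0), # both False
    "x + 0.0 -> x": not signbit(-0.0 + 0.0),                    # -0.0 + 0.0 is +0.0
}

if __name__ == "__main__":
    for rewrite, broken in COUNTEREXAMPLES.items():
        print(f"{rewrite:<20} unsafe for floats: {broken}")
```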

eniko@mastodon.gamedev.place

@ataylor but the single threaded random flat color triangles metric only improved by +15%

and all the threading does is take 4 worker threads, split the screen into 4 quadrants, and have each of them call the regular single-threaded renderer for every triangle in their quadrant
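The quadrant scheme described above can be sketched like this. `rasterize_count` is a hypothetical toy (an edge-function coverage counter with no fill rule) standing in for the real single-threaded renderer, and all names are assumptions. In CPython the GIL keeps these threads from rasterizing in parallel, so this shows the work split, not the speedup. Because every pixel centre falls in exactly one quadrant, the four partial counts must add up to the full-screen count, which is a cheap sanity check against the kind of benchmarking bug suspected here.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Half-open pixel rectangle: x0 <= x < x1, y0 <= y < y1."""
    x0: int
    y0: int
    x1: int
    y1: int

def quadrants(width: int, height: int) -> list:
    """Split the screen into four non-overlapping quadrants."""
    hw, hh = width // 2, height // 2
    return [Rect(0, 0, hw, hh), Rect(hw, 0, width, hh),
            Rect(0, hh, hw, height), Rect(hw, hh, width, height)]

def edge(ax, ay, bx, by, px, py):
    """Signed area term: which side of edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize_count(tri, clip: Rect) -> int:
    """Toy stand-in for the scalar renderer: count pixel centres covered by
    `tri` inside `clip` (no top-left fill rule, so only a sketch)."""
    (ax, ay), (bx, by), (cx, cy) = tri
    x0 = max(clip.x0, math.floor(min(ax, bx, cx)))
    x1 = min(clip.x1, math.floor(max(ax, bx, cx)) + 1)
    y0 = max(clip.y0, math.floor(min(ay, by, cy)))
    y1 = min(clip.y1, math.floor(max(ay, by, cy)) + 1)
    covered = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            px, py = x + 0.5, y + 0.5
            w0 = edge(bx, by, cx, cy, px, py)
            w1 = edge(cx, cy, ax, ay, px, py)
            w2 = edge(ax, ay, bx, by, px, py)
            # Accept either winding order.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered += 1
    return covered

def render_quadrant(clip: Rect, triangles) -> int:
    """One worker: run the scalar renderer for every triangle, clipped to its quadrant."""
    return sum(rasterize_count(tri, clip) for tri in triangles)

def render_tiled(width: int, height: int, triangles) -> int:
    """Four workers, one per quadrant; returns the total covered-pixel count."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        parts = pool.map(lambda clip: render_quadrant(clip, triangles),
                         quadrants(width, height))
        return sum(parts)
```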

ataylor@mastodon.gamedev.place

@eniko that is quite odd. What is the relative speedup between threaded and unthreaded for each? (Like, float single threaded versus multi threaded and so on.)

eniko@mastodon.gamedev.place

@ataylor
random flat color triangles x9.3
random gouraud triangles x4.7
fullscreen flat color triangles x2.2
fullscreen gouraud triangles x2.8

lina@vt.social

@eniko @oblomov Did the conversion only affect the pure computation part, or did some buffer formats/sizes change too?

I'm thinking if you did something like f32 to u8 per channel for the framebuffer, the tiles now fit in cache and that can be a huge speedup.

Verify by testing just one of the four quadrant threads, or alternatively just non threaded version with lower resolution.
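Some rough arithmetic behind lina's cache hypothesis. It assumes 4 channels (RGBA), tightly packed rows, and the 320x200 screen eniko mentions later in the thread, split into 160x100 quadrants; the function name is made up for the example.

```python
KIB = 1024

def framebuffer_bytes(width: int, height: int, channels: int, bytes_per_channel: int) -> int:
    """Raw size of a linear framebuffer with no row padding."""
    return width * height * channels * bytes_per_channel

# Assumptions: 4 channels (RGBA); 320x200 screen split into 160x100 quadrants.
LAYOUTS = {"f32 per channel": 4, "u8 per channel": 1}

if __name__ == "__main__":
    for name, bpc in LAYOUTS.items():
        full = framebuffer_bytes(320, 200, 4, bpc)
        quad = framebuffer_bytes(160, 100, 4, bpc)
        print(f"{name}: full {full / KIB:.1f} KiB, quadrant {quad / KIB:.1f} KiB")
```

Under these assumptions a quadrant shrinks from 250 KiB to 62.5 KiB, the kind of change that can move a tile's working set into a smaller, faster cache level. Actual cache sizes vary by CPU, so this only gives the order of magnitude.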

slyecho@mdon.ee

@eniko Yeah that makes sense, ints are faster.

ataylor@mastodon.gamedev.place

@eniko that seems… weird. I would expect close to 4x for all of these, tbh.

eniko@mastodon.gamedev.place

@lina @oblomov with the resolution changed from 320x200 to 160x100, running it single-threaded and multiplying the fillrate by 4, the threaded version is still almost 4x as fast

eniko@mastodon.gamedev.place

@lina @oblomov if i drop the number of threads to 2 then perf goes down to only 30% of 4 threads

oblomov@sociale.network

@eniko @lina moving to 4 threads gives you an over 5x performance improvement in the old code which is … odd, and for the new code it's over 9x. These are both very strange single/multithreaded numbers

eniko@mastodon.gamedev.place

@lina @oblomov if i turn 3 of the 4 worker threads idle so they don't render triangles at all, then divide the fillrate for the remaining thread by 4, the perf boost from single thread to "threaded" is x4.47

kate@hai.z0ne.social

@eniko@mastodon.gamedev.place @lina@vt.social @oblomov@sociale.network ..probably cache performance getting better due to more thread-wise allocation of space??
