Managed to shave off another 0.7ns from zmij on my EPYC Milan.

vitaut@mastodon.social

vitaut@mastodon.social

In related news: SSE sucks

oschonrock@mastodon.social

@vitaut

Not spent much time with it. Why?

I have tried the various AVX instr sets a few times and found it ok.

Is that aiming too high in the CPU requirements?

vitaut@mastodon.social

@oschonrock No integer FMA, plus other API limitations that make the SSE implementation in zmij twice as large as its NEON counterpart while being significantly slower. I have to work around those limitations in an awkward way, which affects other parts of the algorithm.
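To illustrate one reading of the "no integer FMA" point: NEON has a lane-wise integer multiply-accumulate (`vmlaq_u16`), while SSE needs a separate multiply and add. The sketch below is not taken from zmij; `u16x8` and `mul_add_u16x8` are hypothetical names made up for this example, and 16-bit lanes are chosen only because `_mm_mullo_epi16` is available in baseline SSE2.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical 8-lane vector of 16-bit unsigned integers (illustration only).
using u16x8 = std::array<std::uint16_t, 8>;

// Hypothetical helper: returns acc + a * b in each 16-bit lane, wrapping mod 2^16.
inline u16x8 mul_add_u16x8(const u16x8& acc, const u16x8& a, const u16x8& b) {
  u16x8 out{};
#if defined(__ARM_NEON)
  // NEON: a single multiply-accumulate intrinsic.
  uint16x8_t r = vmlaq_u16(vld1q_u16(acc.data()), vld1q_u16(a.data()),
                           vld1q_u16(b.data()));
  vst1q_u16(out.data(), r);
#elif defined(__SSE2__)
  // SSE2: no fused form, so multiply and add are separate intrinsics.
  __m128i vacc = _mm_loadu_si128(reinterpret_cast<const __m128i*>(acc.data()));
  __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
  __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
  __m128i prod = _mm_mullo_epi16(va, vb);   // low 16 bits of each product
  __m128i sum = _mm_add_epi16(vacc, prod);  // wrapping add
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), sum);
#else
  // Portable scalar fallback with the same wrapping semantics.
  for (std::size_t i = 0; i < out.size(); ++i) {
    std::uint32_t wide = static_cast<std::uint32_t>(a[i]) * b[i] + acc[i];
    out[i] = static_cast<std::uint16_t>(wide);
  }
#endif
  return out;
}
```

All three paths compute the same result. The difference is only in how many intrinsics the SIMD version takes, which is a small-scale version of the size gap described above.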

vitaut@mastodon.social

@oschonrock I guess part of it is general x86 terribleness.

oschonrock@mastodon.social

@vitaut

could be..

SSE is truly ancient.. like 2002? or something

what about AVX?

vitaut@mastodon.social

@oschonrock SSE4.1 is 2006 which is only slightly older than NEON

vitaut@mastodon.social

@oschonrock I think AVX is overkill

oblomov@sociale.network

@vitaut @oschonrock I'd suggest going for it anyway. AVX is basically when Intel finally started getting SIMD right.
