Managed to shave off another 0.7ns from zmij on my EPYC Milan.
-
In related news: SSE sucks
-
Not spent much time with it. Why?
I have tried the various AVX instruction sets a few times and found them OK.
Is that aiming too high in the CPU requirements?
-
@oschonrock No integer FMA, plus other API limitations, which leave the SSE implementation in zmij twice as large as its NEON counterpart and significantly slower. I have to work around those limitations in awkward ways that affect other parts of the algorithm.
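For context, here is a minimal sketch of the integer multiply-accumulate gap. It is not zmij's actual code: the helper name `mul_acc_u16x8` and the choice of 16-bit lanes are assumptions made purely for illustration. NEON offers a single multiply-accumulate intrinsic (`vmlaq_u16`), while with SSE the same operation has to be spelled as a separate multiply and add.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from zmij): out[i] = acc[i] + a[i] * b[i]
// for eight 16-bit lanes, with wrapping (mod 2^16) arithmetic.
void mul_acc_u16x8(const uint16_t* acc, const uint16_t* a,
                   const uint16_t* b, uint16_t* out) {
  assert(acc && a && b && out);
#if defined(__SSE2__)
  // SSE: no single integer multiply-accumulate, so multiply then add.
  __m128i vacc = _mm_loadu_si128(reinterpret_cast<const __m128i*>(acc));
  __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
  __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
  __m128i prod = _mm_mullo_epi16(va, vb);   // low 16 bits of each product
  __m128i sum = _mm_add_epi16(vacc, prod);  // wrapping add
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), sum);
#elif defined(__ARM_NEON)
  // NEON: one multiply-accumulate intrinsic covers both steps.
  uint16x8_t r = vmlaq_u16(vld1q_u16(acc), vld1q_u16(a), vld1q_u16(b));
  vst1q_u16(out, r);
#else
  // Scalar fallback so the sketch builds on any target.
  for (int i = 0; i < 8; ++i) {
    out[i] = static_cast<uint16_t>(acc[i] + static_cast<uint32_t>(a[i]) * b[i]);
  }
#endif
}
```

All three paths produce the same wrapped result; only the number of operations needed to express it differs.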
-
@oschonrock I guess part of it is general x86 terribleness.
-
@oschonrock SSE4.1 is from 2006, which is only slightly older than NEON
-
@oschonrock I think AVX is overkill
-
@vitaut @oschonrock I'd suggest going for it anyway. AVX is basically when Intel finally started getting SIMD right.