
Piero Bosio Social Web Site Personale

A social forum federated with the rest of the world. It's the people that count, not the instances.

I just enabled ASTC on Tegra with, effectively, one line of code and got it right on the first try.

  • I just enabled ASTC on Tegra with, effectively, one line of code and got it right on the first try.

    Do you have any idea how amazing this is?

    You know. ASTC. The one format that has a non-power-of-two block size. The only format group NVIDIA supports where the block size isn't square.

    First try. Every single Vulkan CTS test for ASTC passes.

    On Intel, getting ASTC working was utter hell. We had so many image layout bugs. Block sizes of 4 were assumed all over the place. There were x/y mixups that only mattered for non-square blocks. The whole concept of a non-power-of-two block size didn't exist. Our block-compressed format handling was also just plumb wrong.

    Then Lina and I wrote ISL and put a lot of time and thought into how we do layout calculations to avoid a lot of the anti-patterns in the old code. Everything has units. Everything is an isl_extent4d and those have helpers so you aren't banging on them manually. We carefully ensured that the calculation flow only went one direction and we never multiplied and then divided it back out later. ISL fixed so many bugs.

    With NVK, I wrote NIL which was based on our learnings from ISL. After Daniel helped me port NIL to Rust, I took it a step further and encoded the units in Rust types instead of just suffixes on variable names. This means you have to go out of your way to ever screw up a unit conversion.

    The result is ASTC in one line of code.
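
As a rough illustration of what "encoding the units in Rust types" can look like, here is a minimal sketch. Every name in it (`ExtentPx`, `ExtentBlocks`, `BlockSizePx`, `to_blocks`, `size_bytes`) is hypothetical and chosen for this example; it is not NIL's or ISL's actual API. The idea is that a pixel extent and a block extent are different types, so the only way from one to the other is an explicit conversion that takes the block size, with width and height handled separately so non-square blocks can't be mixed up.

```rust
// Hypothetical types for illustration only; not the real NIL/ISL API.

/// An extent measured in pixels.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ExtentPx {
    width: u32,
    height: u32,
}

/// An extent measured in compressed blocks.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ExtentBlocks {
    width: u32,
    height: u32,
}

/// The footprint of one compressed block, in pixels.
/// Width and height are separate, so non-square blocks are representable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BlockSizePx {
    width: u32,
    height: u32,
}

impl ExtentPx {
    /// The single, one-directional conversion from pixels to blocks.
    /// Rounds up so partially covered blocks at the edges are counted,
    /// and makes no power-of-two assumption about the block size.
    fn to_blocks(self, block: BlockSizePx) -> ExtentBlocks {
        ExtentBlocks {
            width: self.width.div_ceil(block.width),
            height: self.height.div_ceil(block.height),
        }
    }
}

impl ExtentBlocks {
    /// Storage for one 2D slice. Only defined on block extents, so
    /// passing a pixel extent here is a compile error, not a runtime bug.
    fn size_bytes(self, bytes_per_block: u32) -> u64 {
        u64::from(self.width) * u64::from(self.height) * u64::from(bytes_per_block)
    }
}

fn main() {
    // ASTC 10x5: a non-square, non-power-of-two block. ASTC blocks are 16 bytes.
    let astc_10x5 = BlockSizePx { width: 10, height: 5 };
    let image = ExtentPx { width: 64, height: 64 };

    let blocks = image.to_blocks(astc_10x5); // 7 x 13 blocks
    println!("{blocks:?}, {} bytes", blocks.size_bytes(16));
}
```

With this shape, writing `image.size_bytes(16)` on the pixel extent simply doesn't compile, which is the "go out of your way to screw up" property described above.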
