In the early days of personal computing CPU bugs were so rare as to be newsworthy.
-
All in all, modern CPUs are beasts of tremendous complexity and bugs have become inevitable. I wish the industry would spend more resources addressing them, improving design and testing before CPUs ship to users, but alas most of the tech sector seems keener on playing with unreliable statistical toys than on ensuring that the hardware users pay good money for works correctly. 31/31
@gabrielesvelto that was super fascinating. Thanks for the thread!
-
@gabrielesvelto thank you for this great, informative overview.
numerous times, i had asked myself if a reported crash could be caused by a hardware bug, and so far i would think i never saw a real case - possibly due to the software i work on running in more controlled environments.
but i would be curious how a crash from a real hardware bug could be classified automatically. do you have pointers to foss tools?
-
@slink oh yes, we have tools for that. First however I'd point you to my thread about memory errors because those are even more common when analyzing crashes: https://fosstodon.org/@gabrielesvelto/112407741329145666
For crash analysis we have a rust crate to analyze minidumps, which we generate when Firefox crashes. The crate can be used both as a tool and as a library:
@slink this crate can detect patterns that suggest a memory error was encountered or that the crash was inconsistent and thus most likely due to a hardware bug. If you check out the output schema of the tool you'll find two fields called "possible_bit_flips" and "crash_inconsistencies" that capture this information: https://github.com/rust-minidump/rust-minidump/blob/main/minidump-processor/json-schema.md
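A minimal sketch of how one might triage the tool's JSON output for those two fields. It assumes only that "possible_bit_flips" and "crash_inconsistencies" appear somewhere in the report as lists; it searches for them recursively rather than relying on their exact position in the schema. The helper names and the sample report (including its values) are made up for illustration and are not real crash data.

```python
import json

# Field names taken from the post above; where they sit in the schema is not assumed.
FIELDS = ("possible_bit_flips", "crash_inconsistencies")


def find_fields(node, names=FIELDS):
    """Walk a parsed JSON report and collect non-empty values stored under `names`."""
    found = {name: [] for name in names}
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            for key, value in current.items():
                if key in found:
                    if isinstance(value, list):
                        found[key].extend(value)
                    elif value:
                        found[key].append(value)
                else:
                    stack.append(value)
        elif isinstance(current, list):
            stack.extend(current)
    return found


def suspect_hardware(report):
    """Return (flag, hits): flag is True if either field holds any entry."""
    hits = find_fields(report)
    return any(hits.values()), hits


# Hypothetical report fragment: the structure and values are invented for this example.
SAMPLE = """
{
  "crash_info": {
    "type": "SIGSEGV",
    "possible_bit_flips": [{"address": "0x10", "confidence": 0.5}],
    "crash_inconsistencies": []
  }
}
"""

if __name__ == "__main__":
    flagged, hits = suspect_hardware(json.loads(SAMPLE))
    print("suspect hardware:", flagged)
    for name, entries in hits.items():
        print(f"{name}: {len(entries)} entries")
```

A non-empty result is only a hint that the crash deserves a closer look, not proof of a hardware fault; the linked schema documents what each entry actually contains.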
-
@gabrielesvelto yes, i know the memory error thread, thank you. ECC absolutely is a must and in this regard i am glad that my code (usually) does not run on consumer devices. fwiw, relying on every single bit in a multi-tb ram system still feels scary at times, and it is amazing that these machines actually work.
thank you for the links!