In the early days of personal computing CPU bugs were so rare as to be newsworthy.
-
In the early days of personal computing CPU bugs were so rare as to be newsworthy. The infamous Pentium FDIV bug is remembered by many, and even earlier CPUs had their own issues (the 6502 comes to mind). Nowadays they've become so common that I encounter them routinely while triaging crash reports sent from Firefox users. Given the nature of CPUs you might wonder how these bugs arise, how they manifest and what can and can't be done about them. 🧵 1/31
@gabrielesvelto I was just thinking about those bugs and something crept up my neck when I thought "now add hallucinating AIs to all that". Pretty sure they're already used in CPU development to deal with the increasing complexity.
So the problem will become much worse than it already is. 👌 for the article. Loved it.
-
@gabrielesvelto I actually keep meaning to find a decent reference text on FET construction and modelling. I've got plenty on SI/EMI, power delivery, etc. but everything I've found for FETs has been the sort of thing that presumes you're either someone with a deep background in semiconductor physics or a professional semiconductor/ASIC engineer just looking for a reference text. very little out there for EE folks who are coming at it from the practical side.
@gsuberland @gabrielesvelto You mean you don’t want my college device physics textbook that starts with solving Schrödinger's equation for a hydrogen atom? They should not have allowed that class at 7am.
For the practical side, I really like Jacob Baker’s books.
-
@gabrielesvelto As someone involved in CPU design in those days I would frame it slightly differently. The bugs were there but we did a better job of providing workarounds, usually through compiler changes to avoid code sequences that would trigger the bug. The FDIV bug was memorable because of how many Pentium chips were in customer hands before it was discovered.
-
@gabrielesvelto wow, and where does it get the microcode from? Another computer within the computer? (turtles and all that :)
@mdione @gabrielesvelto in the same flash that contains UEFI. There's a set of headers that describe what is in the flash. That typically includes microcode for the chip generations supported by the motherboard. For example, a board that supports Zen2 and Zen3 will have two microcodes in the flash and the one that matches the CPU installed will be used
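The selection step described above can be pictured with a toy sketch. Everything here is an assumption made for illustration: the `FlashEntry` layout, the offsets and sizes, and the idea of matching on a plain generation label. Real firmware uses its own header formats and matches on CPU identification data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FlashEntry:
    """Hypothetical header record describing one region of the flash."""
    kind: str            # "uefi" or "microcode" in this toy model
    cpu_generation: str  # made-up label; empty for non-microcode entries
    offset: int
    size: int


def pick_microcode(entries: List[FlashEntry], installed: str) -> Optional[FlashEntry]:
    """Return the microcode entry matching the installed CPU, if any."""
    for entry in entries:
        if entry.kind == "microcode" and entry.cpu_generation == installed:
            return entry
    return None


# Made-up flash contents for a board that supports Zen2 and Zen3.
flash_headers = [
    FlashEntry("uefi", "", 0x000000, 0x800000),
    FlashEntry("microcode", "Zen2", 0x800000, 0x4000),
    FlashEntry("microcode", "Zen3", 0x804000, 0x4000),
]

match = pick_microcode(flash_headers, "Zen3")
print(hex(match.offset) if match else "no matching microcode")
```

An unsupported CPU simply finds no entry in this model, which is why `pick_microcode` returns `None` rather than raising.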
-
@gabrielesvelto @eniko I shouldn’t have read that while sick. Now every time I’m between sleep and wake, I have one of these feverish hallucinations that I’m a little worker inside a CPU core, waiting for a branch prediction to resolve, my hand on the button that dumps everything that was wrongly preloaded.
That’s a very boring job.
-
@gabrielesvelto thank you for this great, informative overview.
numerous times, i had asked myself if a reported crash could be caused by a hardware bug, and so far i would think i never saw a real case - possibly due to the software i work on running in more controlled environments.
but i would be curious how a crash from a real hardware bug could be classified automatically. do you have pointers to foss tools?
-
@gabrielesvelto let’s assume .1 major bug per 1kByte binary code. For a 6502 or Z80, you get 6.4 bugs any given time. Now with 16 GByte main memory … it’s the scale that ruins it.
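To make the scale argument concrete, here is a quick back-of-the-envelope sketch. The rate of 0.1 bugs per KiB is the assumption stated above, not a measured figure. The 64 KiB figure is the full address space of a 6502 or Z80, which is what the 6.4 result implies. The function name is made up for the example.

```python
BUGS_PER_KIB = 0.1  # assumed rate from the post above, not a measurement


def expected_bugs(memory_bytes: int, bugs_per_kib: float = BUGS_PER_KIB) -> float:
    """Scale the assumed bug rate linearly with the amount of code in memory."""
    return memory_bytes / 1024 * bugs_per_kib


small = expected_bugs(64 * 1024)     # 64 KiB: a full 6502 or Z80 address space
large = expected_bugs(16 * 1024**3)  # 16 GiB of main memory

print(f"64 KiB: {small:.1f} bugs")
print(f"16 GiB: {large:,.1f} bugs")
print(f"ratio:  {large / small:,.0f}x")
```

Under that assumption the count grows from about 6.4 to roughly 1.7 million, a factor of 262,144, provided memory is actually filled with code at the same density.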
-
All in all modern CPUs are beasts of tremendous complexity and bugs have become inevitable. I wish the industry would be spending more resources addressing them, improving design and testing before CPUs ship to users, but alas most of the tech sector seems more keen on playing with unreliable statistical toys rather than ensuring that the hardware users pay good money for works correctly. 31/31
@gabrielesvelto that was super fascinating. Thanks for the thread!
-
@slink oh yes, we have tools for that. First however I'd point you to my thread about memory errors because those are even more common when analyzing crashes: https://fosstodon.org/@gabrielesvelto/112407741329145666
For crash analysis we have a rust crate to analyze minidumps, which we generate when Firefox crashes. The crate can be used both as a tool and as a library:
-
@slink this crate can detect patterns that suggest a memory error was encountered or that the crash was inconsistent and thus most likely due to a hardware bug. If you check out the output schema of the tool you'll find two fields called "possible_bit_flips" and "crash_inconsistencies" that capture this information: https://github.com/rust-minidump/rust-minidump/blob/main/minidump-processor/json-schema.md
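A minimal sketch of how one might scan the tool's JSON output for those two fields. It assumes only the field names mentioned above. Since the exact nesting is not spelled out here, the code searches the whole tree instead of hard-coding a path. The sample report and the contents of its entries are made up for illustration; consult the linked schema for the real layout.

```python
import json

# Field names taken from the post above; their location in the tree is not assumed.
FIELDS = ("possible_bit_flips", "crash_inconsistencies")


def find_fields(node, names=FIELDS):
    """Collect non-empty values stored under the given keys anywhere in the tree."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key in names and value:
                found.setdefault(key, []).append(value)
            for k, v in find_fields(value, names).items():
                found.setdefault(k, []).extend(v)
    elif isinstance(node, list):
        for item in node:
            for k, v in find_fields(item, names).items():
                found.setdefault(k, []).extend(v)
    return found


def looks_like_hardware_issue(report) -> bool:
    """True if either field is present and non-empty somewhere in the report."""
    return bool(find_fields(report))


# Hypothetical, heavily trimmed report; the entry contents are invented.
sample = json.loads("""
{
  "crash_info": {
    "type": "SIGSEGV",
    "possible_bit_flips": [{"address": "0x10", "details": "made-up entry"}],
    "crash_inconsistencies": []
  }
}
""")

print(looks_like_hardware_issue(sample))
print(sorted(find_fields(sample)))
```

In practice the report would come from running the processor on a minidump and loading its JSON output with `json.load`. A hit is only a hint that the crash deserves a closer look, not proof of faulty hardware.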
-
@gabrielesvelto yes, i know the memory error thread, thank you. ECC absolutely is a must and in this regard i am glad that my code (usually) does not run on consumer devices. fwiw, relying on every single bit in a multi-tb ram system still feels scary at times, and it is amazing that these machines actually work.
thank you for the links!