In the early days of personal computing CPU bugs were so rare as to be newsworthy. The infamous Pentium FDIV bug is remembered by many, and even earlier CPUs had their own issues (the 6502 comes to mind). Nowadays they've become so common that I encounter them routinely while triaging crash reports sent from Firefox users. Given the nature of CPUs you might wonder how these bugs arise, how they manifest and what can and can't be done about them. 🧵 1/31
-
The root of all these issues is fundamentally the same: complexity. Modern cores have become so complex that it's impossible to demonstrate at design time that they will work reliably under all possible conditions, and thoroughly testing them is also infeasible. In addition to ever-increasing logic complexity, the conditions in which they operate have also changed: fixed voltages and frequencies are a thing of the past, complicating physical design. 2/31
-
Let's start with logic bugs. As you probably know a CPU has a certain amount of visible state: a set of registers holding data manipulated by the instructions, an instruction pointer holding the address of the currently executing instruction, and a set of special registers that alter the core's behavior, for example by changing how floating-point operations round their results. 3/31
-
In the early days of integrated CPUs this state was not only visible to the user, but corresponded physically to what was in the core. The registers corresponded to entries in an actual bank of SRAMs inside the core, the instruction pointer was a physical register that would be read every cycle to fetch instructions from memory. In today's CPUs all these things are merely abstractions and the underlying physical reality is dramatically more complex. 4/31
-
Modern CPUs contain a tremendous amount of state that they need to track. Hundreds of instructions can be in-flight at any given moment; each of them operates on physical registers which are assigned just-in-time via a mechanism that maps the registers in the ISA to spare physical slots in very large banks. Each instruction is associated with a set of data that is entirely speculative for a very long time, including the instruction itself. 5/31
-
Instruction addresses, operands, and dependencies are tracked as the CPU fetches and executes a stream of instructions which might or might not have to be executed, depending on branch prediction. In the case of a misprediction the state of the "wrong" instructions needs to be discarded. Similarly, instruction timing is no longer predictable: memory accesses can take anywhere from a few cycles to hundreds, and fetch their data through different structures both inside and outside of the core. 6/31
-
Instruction faults cannot be predicted either: if a memory access fails because it touches a protected memory area, the flow of instructions must be stopped, undoing everything that came after the faulting instruction and steering the core towards executing code provided by the operating system for such cases, giving the impression that execution stopped right there in a perfectly sequential way. 7/31
-
And that's without mentioning the large amount of hidden state carried by a CPU purely for performance reasons: virtual-to-physical address translations are done via tables stored in memory, but this data needs to be cached in a translation lookaside buffer inside the core. Cache lines can be shared by different cores and must track their state: are they owned by a single core? Shared? Is the data dirty and does it need to be fetched from another cache? 8/31
-
The execution of every single instruction can alter a significant chunk of this enormous amount of state and must do so reliably. But as I mentioned before it's impossible to test all possible combinations and some sequences might lead to inconsistent or corrupted state, which in turn will manifest itself as a software bug. 9/31
-
Here are a few examples I've encountered: the instruction pointer is fetched from the stack while returning from a function call, but it appears wrong, possibly because the wrong instruction pointer was sent to the instruction fetch pipeline: https://bugzilla.mozilla.org/show_bug.cgi?id=1746270 10/31
-
The code expected a piece of data to be loaded from memory, but the load/store unit returned stale data from a previous fetch: https://bugzilla.mozilla.org/show_bug.cgi?id=1687914 11/31
-
The instruction pointer associated with an instruction is corrupted, so what appears to be the currently executing instruction is not the one actually running. You can tell because a load causes a store exception, or a jump causes an access exception: https://bugzilla.mozilla.org/show_bug.cgi?id=1820832 12/31
-
Other bugs can have even worse effects, such as AMD's infamous Barcelona TLB bug, which would put the core in a state from which recovery wasn't possible, effectively halting execution: https://arstechnica.com/gadgets/2007/12/linux-patch-sheds-light-on-amds-tlb-errata/ 13/31
-
In all these cases the likely culprit is a bug in the machinery that tracks the internal CPU state when an unlikely sequence of events happens: a rapid series of interrupts or context switches, or certain instructions executing just as the processor leaves or enters a particular mode of execution. These are not unlike software bugs where you missed checking a particular condition at a specific time, and most of the time it doesn't matter except for that one time when it does. 14/31
-
Reading the errata of any relatively recent CPU you will find the same wording applied to every known issue: "Under complex microarchitectural conditions...". That's hardwarese for "a state we had not anticipated we could end up in". Try looking it up yourself on an errata document such as this one: https://edc.intel.com/content/www/us/en/secure/design/confidential/products-and-solutions/processors-and-chipsets/tiger-lake/11th-generation-intel-core-processor-family-specification-update/errata-details/ 15/31
-
Now you might wonder if these kinds of bugs can be fixed after the fact. Well, sometimes they can, sometimes they can't. CPUs are not purely hard-coded beasts; they rely on microcode for part of their operation. Traditionally microcode was a set of internal instructions that the CPU ran to execute external instructions. That's mostly not the case anymore, and modern microcode ships not only with implementations of complex instructions but also a significant amount of configuration. 16/31
-
As an example microcode can be used to disable certain circuits. Imagine something like a loop buffer, a structure that captures decoded instructions and re-executes them in a loop bypassing instruction fetches. If it turns out to be buggy a microcode update might disable it entirely, effectively sacrificing an optimization for stability. 17/31
-
When designing a new core it is commonplace to implement new structures, and especially more aggressive performance features, in a way that makes it possible to disable them via microcode. This gives the design team the flexibility to ship a feature only once it's been proven reliable, or delay it to the next iteration. 18/31
-
Microcode can also be used to work around conditions caused by data races, by injecting bubbles in the pipeline under certain conditions. If the execution of two back-to-back operations is known to cause a problem it might be possible to avoid it by delaying the execution of the second operation by one cycle, again trading performance for stability. 19/31
-
However not all bugs can be fixed this way. Bugs within logic that sits on a critical path can rarely be fixed. Additionally, some microcode fixes only work if the microcode is loaded at boot time, right when the CPU is initialized. If the updated microcode is loaded by the operating system it might be too late to reconfigure the core's operation, so you'll need an updated UEFI firmware for some fixes to work. 20/31