A CPU should only predict when it's too early to know for sure, e.g. right after decoding a conditional branch, before fetching the register operands and comparing them.
An unconditional direct branch's target is known immediately after decode, so no, you shouldn't use a "prediction" you already know is wrong.
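To make that concrete, a minimal MIPS-style sketch (hypothetical labels, not from your code):

```asm
# The beq's outcome depends on register values, so at decode time the CPU
# can only keep fetching without a stall by guessing (predicting) taken / not-taken.
beq  $t0, $t1, equal_case   # taken or not? Not known until the compare in EX.

# The j's target is encoded in the instruction bits themselves,
# so it's known for certain as soon as the instruction is decoded.
j    some_label             # nothing left to predict after decode
```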
---
Prediction for unconditional direct branches is a thing in real CPUs, but only before decode, in the fetch stage. Without prediction, the fetch stage will fetch the instruction after the branch, because the decode stage hasn't yet decoded the branch to figure out where we should be fetching instead. So there's a bubble in the pipeline from that useless fetch.
Especially in a longer and wider pipeline, it's useful to predict which 16-byte chunk of machine code to fetch next, i.e. to predict the *existence* of a branch before it's even decoded. *https://stackoverflow.com/questions/38811901/slow-jmp-instruction* shows an example of this being fast (on a modern x86) with correct predictions vs. slow when there are so many `jmp`s that the BTB (branch target buffer) doesn't have room to hold all the predictions.
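A NASM sketch in the same spirit as that question (not the exact code from the link):

```nasm
; Each 'jmp $+2' assembles to a 2-byte short jump to the next instruction.
; To fetch this chain at full speed, the front-end has to predict, before
; decode, that each fetch chunk contains a taken branch and where it goes.
; With far more jumps than the BTB has entries, every jump re-steers fetch late.
jmp_chain:
    times 1000000 jmp $+2   ; a million back-to-back jumps, each to the next instruction
    ret
```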
Unconditional *indirect* jumps like `jr $a0` would also be predicted in high-performance CPUs, where again you have to predict the whole target address, not just taken / not-taken. Especially with out-of-order exec, the register value might not be available for many cycles.
In a simple MIPS that's not needed: the register value would already be fetched in parallel with the logic that looks at the opcode/funct fields of the R-type instruction and decides it's a jump.
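For example (MIPS sketch):

```asm
# The target of jr is a register value, not instruction bits. A big
# out-of-order CPU has to predict the full target address because the
# register might not be ready for many cycles; a classic 5-stage MIPS
# just reads it during decode, in parallel with recognizing the
# opcode/funct bits as a jump.
jr   $a0        # computed jump, e.g. through a function pointer in $a0
jr   $ra        # a function return is also an indirect jump through a register
```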
---
### Real MIPS classic-RISC pipelines don't need any branch prediction
MIPS has a [branch delay slot][1]: the instruction after the branch executes before control transfers to the branch target. This one delay slot is enough to fully hide branch latency in MIPS R2000 / R3000, thanks to clever design using half-cycles in EX and IF. See *https://stackoverflow.com/questions/56586551/how-does-mips-i-handle-branching-on-the-previous-alu-instruction-without-stallin*
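A quick sketch of what the delay slot means for software (hypothetical labels):

```asm
        beq   $t0, $t1, target   # compare and (maybe) branch
        addiu $s0, $s0, 1        # delay slot: executes whether or not the branch is taken
        ori   $v0, $zero, 1      # fall-through path (branch not taken)
        j     done
        nop                      # delay slot of the j, nothing useful to put here
target:
        ori   $v0, $zero, 2      # branch target (branch taken)
done:
```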
If the version of MIPS you're learning doesn't have that, it's a simplified MIPS or something, not the real MIPS ISA that commercial MIPS CPUs implement.
That's fine for teaching, but you might as well learn RISC-V, which doesn't have a branch delay slot but is a lot like MIPS in many ways. (RISC-V's machine-code instruction formats aren't as simple as MIPS's: they're designed to make hardware decoding of immediates need as few gates as possible, especially keeping the critical path as short as possible for sign-extension of immediates.)
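For example, compare a MIPS I-type branch, with one contiguous 16-bit immediate, against a RISC-V B-type branch, whose immediate bits are shuffled so the sign bit is always instruction bit 31 and most immediate bits sit in the same instruction bits across formats:

```
MIPS   beq (I-type): [31:26]=opcode  [25:21]=rs  [20:16]=rt  [15:0]=imm
RISC-V beq (B-type): [31]=imm[12]  [30:25]=imm[10:5]  [24:20]=rs2  [19:15]=rs1
                     [14:12]=funct3  [11:8]=imm[4:1]  [7]=imm[11]  [6:0]=opcode
```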
It wasn't until later higher-performance MIPS CPUs that branch prediction became useful. (And the branch-delay slot basically became an inconvenience: it exposes a pipeline detail of the classic 5-stage in-order design, which later CPUs had to stay compatible with even though they didn't work that way internally.)
[1]: https://en.wikipedia.org/wiki/Delay_slot
---

Speculating down the fall-through path does sometimes help in code with jump tables optimized following Intel's recommendation to make one of the targets (preferably the most common) a fall-through, which is often possible for `switch` statements. (Not so much for indirect `call`.)
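Roughly this layout (NASM sketch, hypothetical labels):

```nasm
        jmp   [dispatch_table + rdi*8]  ; rdi = case index, assumed already bounds-checked
case_common:                ; most frequent case placed as the fall-through target:
        add   eax, 1        ; if the front-end has no BTB entry and just keeps
        ret                 ; fetching sequentially, it's already on the right path
case_rare:
        sub   eax, 1
        ret

section .rodata
dispatch_table: dq case_common, case_rare
```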
I think that behaviour was designed into P6 long before hyperthreading existed, so stealing execution resources from the other logical core wasn't a concern for a path of execution that's unlikely to be the correct one.
Oldest-ready-first uop scheduling makes this speculative path of execution not very likely to steal cycles from code before the jump. I think an in-flight `div` or load can get cancelled without waiting for it to complete, so that shouldn't be a big factor; do you have any data to support your concern about resource conflicts with earlier work from the known-good path of execution? I guess if a mis-speculated load used up an LFB (line-fill buffer) waiting for a useless cache line, that could delay progress on a useful load whose address wasn't ready until just after that. And it can of course pollute caches and TLBs.
Spectre was only conceived around 2017; before that, CPU architects weren't even considering any kind of security threat from speculative execution that didn't affect the architectural state. If any Intel architects had had any conception of that kind of vulnerability back in the early 90s when P6 was being designed, Meltdown wouldn't have been a thing, nor would most MDS vulnerabilities.
---
If the CPU *did* stop fetching, something would need to restart it. I guess executing the indirect `jmp` / `call` that produces a correct address could trigger that, but it might need a special mechanism? (Or maybe not; by the time I finished writing this section, I was thinking probably not.)
`ud2` / `int` trap if they reach retirement, which is a whole complicated thing that always involves restarting fetch from a new location, with the ROB (reorder buffer) and scheduler already empty since those instructions always stop fetch. That's unlike an indirect call or jump, which in your proposed design would still speculate if a branch-target prediction was available.
So I suspect the current design has a benefit in simplicity of the CPU internals, with fewer special cases in different parts of the CPU. That might not be a big deal in terms of the number of transistors needed these days, but it might have been significant in first-gen P6.
The branch-recovery mechanism is obviously highly optimized to keep branch-miss latency as low as possible. IDK if there's any obstacle to hooking into that mechanism for something that stopped fetch/decode and needs to restart it. Probably not: a mis-speculated `int` or `ud2` could have stopped fetch, and executing the mispredicted branch needs to restart it. So an indirect `jmp` or `call` already needs to be capable of restarting fetch, and it's probably not a big deal.