If you’re still using Intel’s Itanium processors, you’d better get your orders in soon. Intel has announced that it will fulfill the final shipment of Itanium 9700 processors on July 29, 2021. The company says orders must be placed no later than January 30, 2020 (spotted by AnandTech).
The Itanium 9700 line of four- and eight-core processors represents the last vestiges of Intel’s attempt to switch the world to an entirely new processor architecture: IA-64. Instead of being a 64-bit extension to IA-32 (“Intel Architecture-32,” Intel’s preferred name for x86-compatible designs), IA-64 was an entirely new design built around what Intel and HP called “Explicitly Parallel Instruction Computing” (EPIC).
High-performance processors of the late 1990s, both the RISC processors in the Unix world and Intel’s IA-32 Pentium Pros, were becoming increasingly complicated pieces of hardware. The instruction sets the processors used were essentially serial, describing a sequence of operations to be performed one after the other. Executing instructions in that exact serial order limits performance (because each instruction must wait for its predecessor to be finished) and, it turns out, isn’t actually necessary.
There are often instructions that don’t depend on each other, and they can be executed simultaneously. Processors like the Pentium Pro and DEC Alpha analyzed the instructions they were running and the dependencies between them, and they used this information to execute instructions out of order. They extracted parallelism between independent instructions, breaking free from the strictly serial order that the program code implies. These processors also performed speculative execution: an instruction depending on the result of another instruction can still be executed if the processor can make a good guess at what the result of the first instruction is. If the guess is right, the speculative calculation is used; if the guess is wrong, the processor undoes the speculation and retries the calculation with the correct value.
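To make the idea of independent instructions concrete, here is a minimal Python sketch of the kind of dependency analysis described above. It is a toy model, not a description of how the Pentium Pro or Alpha actually worked: the `Instr` type, the register names, and the `issue_groups` function are all hypothetical, and the model only tracks read-after-write dependencies, assuming each register is written once.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """A toy instruction: dest = op(srcs). The operation itself is irrelevant here."""
    dest: str
    srcs: tuple


def issue_groups(program):
    """Group instructions into steps; everything within one step is independent.

    An instruction lands one step after the latest earlier instruction that
    produces one of its sources (a read-after-write dependency).
    Assumption: each register is written only once, so other hazards never arise.
    """
    ready_step = {}  # register name -> step in which it is produced
    groups = []
    for instr in program:
        step = max(
            (ready_step[s] + 1 for s in instr.srcs if s in ready_step),
            default=0,
        )
        if step == len(groups):
            groups.append([])
        groups[step].append(instr.dest)
        ready_step[instr.dest] = step
    return groups


program = [
    Instr("a", ("x", "y")),  # a = x + y
    Instr("b", ("z", "w")),  # b = z * w  (independent of a)
    Instr("c", ("a", "b")),  # c = a - b  (needs both a and b)
    Instr("d", ("x", "z")),  # d = x + z  (independent of a, b, and c)
]
groups = issue_groups(program)
print(groups)
```

In this sketch, `a`, `b`, and `d` end up in the first group and `c` in the second: four serial instructions collapse into two steps, even though `d` comes last in program order.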
The processor must still act “as if” it’s running instructions serially, one by one, in the exact order that the program determines. Considerable processor resources are dedicated to handling this: first figuring out which instructions can be run in parallel and out of order, and then putting things back together again when updating system memory, to ensure the illusion of serial execution is preserved. Instead of putting all this complexity in the processor, Intel’s idea for IA-64 was to put it into the compiler. Let the compiler identify which instructions can be run simultaneously, and let it tell the processor explicitly to run those independent instructions in parallel. With this approach, the processor’s transistors could be used for things like cache and functional units instead of for all the machinery needed to handle out-of-order, speculative execution. The first-generation IA-64 processors could run six instructions in parallel, and the current chips can run a whopping 12.
Theory meets reality
This was a nice idea, and indeed for some workloads (particularly heavy-duty floating-point number crunching) Itanium chips performed decently. But for common integer workloads, Intel discovered a problem that compiler developers had been warning the company about all along: it’s actually very hard to figure out, at compile time, all those dependencies and know which things can be done in parallel.
For example, loading a value from memory takes a varying amount of time. If the value is in the processor’s cache, it can be very quick, fewer than 10 cycles. If it is in main memory, it may take a few hundred cycles to load. If it’s been paged out to a hard disk, it could be billions of cycles before the value is actually available for the processor to use. An instruction that depends on that value might thus become ready for execution within a handful of nanoseconds, or a billion of them. When the processor is dynamically choosing which instructions to run and when, it can handle this kind of variation. But with EPIC, the scheduling of instructions is fixed and static. The processor has no way of carrying on with other work while waiting for a value to be fetched from memory, and it can’t easily fetch values “early” so that they’ll be available when they’re actually needed.
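The effect of unpredictable load latency can be illustrated with a small Python simulation. This is a deliberately crude sketch under stated assumptions, not a model of Itanium or of any real processor: one instruction issues per cycle, every instruction except the load takes one cycle, and the function names, instruction names, and latencies (3 cycles for a cache hit, 200 for a miss) are made up for illustration. The program order is what a compiler might emit if it assumed the load would hit the cache.

```python
def in_order_cycles(program, latency):
    """Issue strictly in program order, one per cycle, stalling on unready inputs."""
    done = {}  # instruction name -> cycle at which its result is available
    cycle = 0
    for name, deps in program:
        start = max([cycle] + [done[d] for d in deps])
        done[name] = start + latency[name]
        cycle = start + 1
    return max(done.values())


def out_of_order_cycles(program, latency):
    """Each cycle, issue the oldest instruction whose inputs are already available."""
    done = {}
    pending = list(program)
    cycle = 0
    while pending:
        for i, (name, deps) in enumerate(pending):
            if all(d in done and done[d] <= cycle for d in deps):
                done[name] = cycle + latency[name]
                del pending[i]
                break
        cycle += 1
    return max(done.values())


# Schedule laid out for an assumed 3-cycle load: two independent
# instructions fill the gap before the value is used.
program = [("load", []), ("w0", []), ("w1", []), ("use", ["load"])]
program += [(f"t{i}", []) for i in range(50)]  # independent work after the use


def latencies(load_latency):
    table = {name: 1 for name, _ in program}
    table["load"] = load_latency
    return table


hit, miss = latencies(3), latencies(200)
static_hit = in_order_cycles(program, hit)
dynamic_hit = out_of_order_cycles(program, hit)
static_miss = in_order_cycles(program, miss)
dynamic_miss = out_of_order_cycles(program, miss)
print(static_hit, dynamic_hit, static_miss, dynamic_miss)
```

When the load hits the cache, the fixed schedule is just as fast as the dynamic one. When it misses, the in-order model stalls at `use` and the 50 independent instructions behind it must wait, while the dynamic model finishes them during the miss.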
This problem alone was likely insurmountable, at least for general-purpose computing. But Itanium then faced challenges even in those fields where it showed some strength. The initial Itanium hardware included hardware-based IA-32 compatibility so that it could run existing x86 software, but in that mode it was much slower than contemporaneous x86 processors. For companies wanting to transition their software from 32-bit to 64-bit, this wasn’t very satisfactory. During the transition, the ability to run mixed workloads (some software 32-bit, some 64-bit) is valuable. IA-64 didn’t really offer this transitional path; it could run 64-bit software at native speed but took a big hit for 32-bit software, and the x86 chips that were good at 32-bit software couldn’t run IA-64 software at all.
Intel’s competitor AMD also wanted to build 64-bit processors, but without the resources to come up with an all-new 64-bit architecture, AMD did something different. Its AMD64 architecture was developed as an extension to x86 that supported 64-bit computation. AMD didn’t want to fundamentally change how processors and compilers worked; AMD64 processors continued to use the same out-of-order execution and complex hardware as was found in high-performance IA-32 chips (and which continues to be essential to high-performance processors to this day). Because AMD64 and IA-32 were so similar, the same hardware could be easily designed to handle both, and there was no performance hit to running 32-bit software on the 64-bit chips, so transitional, mixed workloads could run unhindered.
This made AMD64 much more appealing to developers and enterprises alike. Intel scrambled to create its own extension to IA-32, but Microsoft—which already supported IA-32, IA-64, and AMD64—told the company that it wasn’t willing to support a second 64-bit extension to x86, leaving Intel with little choice but to adopt AMD64 itself. It duly did so (albeit with some incompatibilities), under the name Intel 64.
IA-64 left with no place to go
This squeezed Itanium out of most markets. AMD64 offered the transitional path from IA-32, so it won over the enterprise and swiftly moved down into the consumer space, too. Itanium still had a few tricks up its sleeve: Intel’s most advanced reliability, availability, and serviceability (RAS) features made their debut with Itanium, so if you needed a system that could take serious problems like memory failures and processor failures in stride, Itanium was, for a time, the way to go. But for the most part, these features are now available in Xeon chips, eliminating even that advantage.
The proliferation of vector instruction sets (AMD64 made SSE2 mandatory, and Intel’s AVX-512 adds substantial new capabilities) also means that it’s still possible, in some ways, to explicitly instruct the processor to perform operations in parallel, albeit in a fashion that’s much more constrained. Rather than bundles of different instructions all meant to be performed simultaneously, the vector instruction sets apply the same instruction to multiple pieces of data simultaneously. This is not as rich and flexible as the EPIC idea, but it turns out to be good enough for many of those same number-crunching workloads that Itanium excelled at.
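The difference between the two styles of explicit parallelism can be sketched in a few lines of Python. This models only the semantics, since Python itself runs the lanes one after another; `simd_apply` and `bundle_apply` are hypothetical helper names, and the four-lane width is an arbitrary choice for illustration.

```python
import operator


def simd_apply(op, lanes_a, lanes_b):
    """One 'vector instruction': the same operation applied to every pair of lanes."""
    if len(lanes_a) != len(lanes_b):
        raise ValueError("vector operands must have the same number of lanes")
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]


def bundle_apply(ops, lanes_a, lanes_b):
    """One EPIC-style 'bundle': each slot may carry a different operation."""
    if not (len(ops) == len(lanes_a) == len(lanes_b)):
        raise ValueError("bundle needs one operation per pair of operands")
    return [op(a, b) for op, a, b in zip(ops, lanes_a, lanes_b)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Vector style: a single operation (add) across all four lanes.
vector_sum = simd_apply(operator.add, a, b)

# Bundle style: four unrelated operations scheduled side by side.
mixed = bundle_apply(
    [operator.add, operator.sub, operator.mul, operator.add], a, b
)
print(vector_sum, mixed)
```

The vector form is the less flexible of the two, but it needs no per-slot scheduling decisions: the hardware only has to know that one operation applies to every lane.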
Currently, the only vendor still selling Itanium machines is HPE (the enterprise company that came from HP’s 2015 split) in its Integrity Superdome line, which runs the HP-UX operating system. Superdome systems offer a particular emphasis on RAS, which once made Itanium a good fit, but now they can be equipped with Xeon chips. Those, rather than Itanium, have a long-term future. HPE will support Itanium systems until at least 2025, but with the end of manufacturing in 2021, the machines will be living on borrowed time.