Intel held a virtual Architecture Day presentation, revealing details of the engineering behind several upcoming products in the consumer and data center spaces. While the exact specs of the CPUs and GPUs will have to wait until they actually launch, we now have a better idea of the building blocks Intel uses to put them together. Intel SVP and GM of the Accelerated Computing Systems and Graphics group, Raja Koduri, led the presentation, with several senior Intel engineers also appearing.
The 12th Gen Core CPU lineup, codenamed ‘Alder Lake’, is expected to launch within the next few months, starting with desktop models. These will be the first mainstream Intel CPUs with a mix of high-performance and energy-efficient cores, an approach that is common in mobile SoCs today. This follows the experimental ‘Lakefield’ CPU, which has only had a limited release so far. Alder Lake will adopt a more modular approach than before, with different combinations of logic blocks for different product segments.
Intel will use the terms Performance core and Efficient core, often abbreviated to P-core and E-core. For Alder Lake, the E-cores are based on the ‘Gracemont’ architecture, while the P-cores use the ‘Golden Cove’ design. With Gracemont, Intel focused on physical silicon size and throughput efficiency, targeting multi-threaded performance across a large number of individual cores. These cores operate at low voltage and will mainly handle simpler processes.
The Golden Cove-based P-cores are designed for speed and low latency. Intel calls this the best performing core it has ever built. New to this generation is support for Advanced Matrix Extensions to accelerate deep learning training and inference.
Combined, this generation of P-cores and E-cores in the Alder Lake architecture will be highly scalable, from 9W to 125W, covering most of today’s mobile and desktop categories. It will be manufactured using the newly announced Intel 7 process, which is a rebrand of the 10nm ‘Enhanced SuperFin’ process. Different implementations will integrate different combinations of DDR5, PCIe Gen5, Thunderbolt 4 and Wi-Fi 6E.
The desktop implementation will use a new LGA1700 socket with up to eight performance cores (two threads each), eight efficient cores (single-threaded), and 30 MB of last-level cache. The integrated GPU has up to 32 execution units for basic display output and graphics capabilities. It doesn’t have integrated Thunderbolt or an image processing block, but it supports 16 lanes of PCIe Gen5 plus an additional four lanes of PCIe Gen4. The matching motherboard platform controllers have up to 12 additional PCIe Gen4 and 16 PCIe Gen3 lanes.
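The core and thread totals implied by that desktop configuration can be worked out with a few lines of Python. This is only an arithmetic sketch using the figures quoted above; the `CoreGroup` class and the helper names are made up for illustration and do not come from Intel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreGroup:
    """One class of core in a hybrid CPU (illustrative type, not an Intel API)."""
    name: str
    cores: int
    threads_per_core: int

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core


# Figures quoted for the top desktop configuration in the article.
DESKTOP = [
    CoreGroup("P-core", cores=8, threads_per_core=2),
    CoreGroup("E-core", cores=8, threads_per_core=1),
]


def total_cores(groups):
    return sum(g.cores for g in groups)


def total_threads(groups):
    return sum(g.threads for g in groups)


print(total_cores(DESKTOP), total_threads(DESKTOP))  # 16 24
```

In other words, the largest desktop part described here would expose 16 cores and 24 hardware threads to the operating system.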
Two mobile versions of Alder Lake were also discussed: a more mainstream chip with six P-cores and eight E-cores, and an ultra-compact chip with two P-cores and eight E-cores. Both will have 96-execution unit GPUs, as well as image processing units and integrated Thunderbolt controllers, and will target devices that don’t have discrete GPUs.
All of Alder Lake’s CPUs are made up of modular logic blocks – the CPU cores, GPU, memory controller, IO, and more. They support up to DDR5-4800, LPDDR5-5200, DDR4-3200 and LPDDR4X-4266 RAM, and it’s up to motherboard and laptop OEMs to decide which to implement. The modular blocks of each CPU are connected through three fabrics: Compute, Memory and IO. Intel describes 100GBps of compute fabric bandwidth per P-core or per cluster of four E-cores, for a total of 1000GBps between 10 such units. The last level cache can be dynamically adjusted between inclusive and exclusive depending on the load.
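The 1000GBps total is consistent with counting each P-core and each four-core E-core cluster as one unit on the compute fabric. The sketch below assumes that the "10 such units" are eight P-cores plus two E-core clusters, which matches the desktop configuration but is an inference rather than something Intel spelled out; the function names are hypothetical.

```python
PER_UNIT_GBPS = 100        # quoted bandwidth per P-core or per E-core cluster
E_CORES_PER_CLUSTER = 4    # E-cores are grouped in clusters of four


def fabric_units(p_cores: int, e_cores: int) -> int:
    """Count fabric attach points: one per P-core, one per E-core cluster."""
    clusters, leftover = divmod(e_cores, E_CORES_PER_CLUSTER)
    if leftover:
        raise ValueError("E-cores are assumed to come in clusters of four")
    return p_cores + clusters


def compute_fabric_gbps(p_cores: int, e_cores: int) -> int:
    """Aggregate compute fabric bandwidth under the assumptions above."""
    return fabric_units(p_cores, e_cores) * PER_UNIT_GBPS


print(fabric_units(8, 8))         # 10
print(compute_fabric_gbps(8, 8))  # 1000
```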
We now have some information on how workloads are distributed between P-cores and E-cores. Intel announced a new hardware scheduler called Thread Director, which will be completely transparent to software and will work with the OS scheduler to assign threads to different cores based on urgency and real-time conditions. Thread Director is designed to scale across mobile and desktop CPUs. It can adapt to thermal and power conditions, migrate threads from one type of core to another, and manage multi-threading on the P-cores, all with “nanosecond precision”.
Thread Director requires Windows 11, so Alder Lake will perform optimally under this upcoming operating system, although Windows 10, Linux, and other operating systems will also work. With Thread Director’s input, the OS scheduler can tell which types of threads require which types of resources, and can prioritize latency, power savings, or other parameters depending on operating conditions.
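To make the idea of hybrid scheduling more concrete, here is a deliberately simplified toy model. It is not Intel's Thread Director logic or the Windows 11 scheduler, neither of which is described in detail here; the `latency_sensitive` flag, the slot counts and the `place_threads` function are all hypothetical. It only illustrates the general principle of preferring P-cores for urgent threads and E-cores for everything else, with fallback when the preferred core type is full.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    latency_sensitive: bool  # hypothetical hint, standing in for real telemetry


def place_threads(threads, p_slots, e_slots):
    """Return {thread name: "P" | "E" | "waiting"} for a toy hybrid CPU."""
    free = {"P": p_slots, "E": e_slots}
    placement = {}
    # Latency-sensitive threads are placed first so they get first pick of P slots.
    for t in sorted(threads, key=lambda t: not t.latency_sensitive):
        preferred, fallback = ("P", "E") if t.latency_sensitive else ("E", "P")
        for kind in (preferred, fallback):
            if free[kind] > 0:
                free[kind] -= 1
                placement[t.name] = kind
                break
        else:
            # No free slot of either kind: the thread has to wait its turn.
            placement[t.name] = "waiting"
    return placement


demo = [
    Thread("game", True),
    Thread("audio", True),
    Thread("indexer", False),
    Thread("updater", False),
]
print(place_threads(demo, p_slots=1, e_slots=2))
# {'game': 'P', 'audio': 'E', 'indexer': 'E', 'updater': 'waiting'}
```

A real scheduler would also migrate threads as conditions change and weigh power and thermal limits, which this sketch leaves out entirely.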
Intel has been teasing its first high-end gaming GPU for a while, and is ramping up the hype with the recent announcement of a new Intel Arc brand for GPU hardware, software, and services. The first-generation product is codenamed ‘Alchemist’ and will be launched in early 2022. It belongs to the tier of the Xe architecture product stack known as Xe-HPG, or High Performance Gaming. Alchemist is produced by TSMC on its N6 node. It supports hardware ray tracing and DirectX 12 Ultimate features such as mesh shading and variable rate shading.
Each first-generation Xe-HPG core has 16 vector engines and 16 matrix engines plus caches, enabling common GPU workloads and AI acceleration. Four such cores, plus four ray tracing units and other rendering hardware, make up a “slice”. Each Alchemist GPU can have up to eight such slices.
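Multiplying those building blocks out gives the size of the largest possible Alchemist configuration. The snippet below is just that arithmetic, using only the per-core and per-slice counts quoted above; the constant and function names are invented for the example.

```python
VECTOR_ENGINES_PER_CORE = 16
MATRIX_ENGINES_PER_CORE = 16
CORES_PER_SLICE = 4
RT_UNITS_PER_SLICE = 4
MAX_SLICES = 8


def alchemist_totals(slices: int = MAX_SLICES) -> dict:
    """Total up the execution resources for a GPU with the given slice count."""
    if not 1 <= slices <= MAX_SLICES:
        raise ValueError(f"slices must be between 1 and {MAX_SLICES}")
    cores = slices * CORES_PER_SLICE
    return {
        "xe_cores": cores,
        "vector_engines": cores * VECTOR_ENGINES_PER_CORE,
        "matrix_engines": cores * MATRIX_ENGINES_PER_CORE,
        "rt_units": slices * RT_UNITS_PER_SLICE,
    }


print(alchemist_totals())
# {'xe_cores': 32, 'vector_engines': 512, 'matrix_engines': 512, 'rt_units': 32}
```

A fully enabled eight-slice GPU would therefore carry 32 Xe-HPG cores, 512 vector engines, 512 matrix engines and 32 ray tracing units.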
Now we also know that Intel will be rolling out its own version of AI upscaling, called XeSS (Xe Super Sampling), to take on Nvidia’s DLSS and AMD’s FSR. XeSS is an AI-based upscaling method that combines information from previous frames. Intel claims up to 2x better performance by rendering at lower resolutions and then scaling up to the target resolution. XeSS even runs on Xe LP integrated GPUs, and multiple game developers are on board to support it.
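The performance gain from upscaling comes from shading fewer pixels per frame. As a rough illustration, the snippet below compares pixel counts for a 1080p internal render scaled up to a 4K target. These two resolutions are assumptions chosen for the example, not XeSS modes confirmed in this article, and the function names are hypothetical.

```python
def pixel_count(width: int, height: int) -> int:
    return width * height


def render_savings(target, internal) -> float:
    """How many times fewer pixels are rendered at the internal resolution."""
    return pixel_count(*target) / pixel_count(*internal)


TARGET_4K = (3840, 2160)       # assumed output resolution
INTERNAL_1080P = (1920, 1080)  # assumed internal render resolution

print(render_savings(TARGET_4K, INTERNAL_1080P))  # 4.0
```

Rendering a quarter of the pixels does not translate into four times the frame rate, since the upscaling pass itself and other per-frame work still cost time, which is in line with the more modest "up to 2x" figure Intel quotes.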
While we don’t have GPU specs yet, Intel said it has been working to deliver “leadership performance” per Watt. We’ll be sure to find out more as the launch gets closer.
Intel also made several announcements related to its server and data center operations during Architecture Day, including a demonstration of the forthcoming Ponte Vecchio data center GPU architecture that will form the basis for the Aurora exascale supercomputer. Other highlights included the modular ‘Sapphire Rapids’ Xeon Scalable platform, the oneAPI software stack and an emerging product category, Infrastructure Processing Units (IPUs), designed to separate infrastructure overheads from customer data and processing requirements in cloud-centric data centers.