Double-Barreled Memory Controllers Boost PC Performance
Speedy CPUs grab most of the headlines, but as processors get faster, they put more pressure on other, less prominent parts of PC architecture to keep pace. That’s why this year’s biggest desktop technology trend may well be the move to higher-bandwidth memory architectures, and specifically to system chipsets with dual-channel memory controllers.
The memory controller, the interface between the CPU and system memory, is part of the Northbridge portion of the motherboard chipset. (See “Your PC’s Second Most Important Silicon” for an introduction to chipsets — Ed.) Besides handling data flow to and from the processor, the memory controller also governs the system’s support for different types (such as SDRAM, DDR, or RDRAM) and speeds (such as DDR333 or PC1066) of memory, along with the maximum module size and installable memory ceiling. In short, it determines the type, size, and overall performance of a PC’s memory subsystem.
There are many memory controller designs, each with its own pros and cons. The standard configuration for most of today’s PCs is a single-channel architecture, like that of the Intel 845 line; VIA KT333, KT400 and P4X400; and SiS 645DX and 648, among others. In addition to high availability, this design has the advantages of low cost and excellent memory compatibility and flexibility.
On the negative side, a single-channel memory controller becomes a performance bottleneck when it can’t keep up with the CPU bus, leaving the processor to waste clock cycles with nothing to process. This is a problem in all low-priced Pentium 4 chipsets: the 533MHz front-side bus of Intel’s flagship processors can move roughly 4.3GB/sec, which easily overwhelms even a single 400MHz DDR400 memory channel at 3.2GB/sec.
The appetite gap even affects relatively high-end chipsets like Intel’s 845PE, which supports the latest Pentium 4 chips and Hyper-Threading technology, but is limited to the 333MHz memory speed of DDR333 (a.k.a. PC2700). The latter type of memory is a great match for a 333MHz-bus Athlon XP, but pairing it with a 533MHz-bus Pentium 4 is like using a Porsche to drive in city traffic.
Dual Channel to the Rescue
To cure the bandwidth backup, chipset manufacturers have been quick to embrace the benefits of dual-channel memory architecture. Although there are many differences between models, the basic concept is as simple as opening a second checkout line at the supermarket — adding a second memory channel for theoretically double the bandwidth. If a single channel of DDR266 memory supplies 2.13GB/sec of bandwidth, then naturally a dual-channel memory controller (using two modules or sticks of DDR266) can enjoy 4.3GB/sec of bandwidth.
(Before going any further, let’s be sure to differentiate between dual-channel memory controller designs and the double-data-rate (DDR) memory that many of them use. The acronym DDR refers to RAM that transfers data on both the rising and falling edges of each clock cycle [see “Making Sense of System Memory” - Ed.], thereby turning 133MHz SDRAM into 266MHz DDR SDRAM. A dual-channel DDR controller implements two separate memory channels, each compatible with DDR memory. One is a memory design and the other is a chipset design.)
Sometimes the simplest solution can be the most elegant, and this old axiom certainly applies to dual-channel memory controllers. The concept is almost one of “reduce, recycle, reuse,” in that it turns older, slower, and cheaper memory into an up-to-date speed demon by adding a second, parallel memory pathway. Instead of ratcheting up memory clock speeds and creating timing and stability issues (e.g., waiting for someone to invent DDR533), dual-channel controllers simply take what’s widely available (e.g., DDR266) and double it.
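The doubling arithmetic is easy to check for yourself. Here is a minimal Python sketch, assuming a nominal 133.33MHz base clock for DDR266, a 64-bit data path per channel, and decimal gigabytes (1GB = 10^9 bytes); the function and constant names are ours, purely for illustration.

```python
def peak_bandwidth_gb_s(base_clock_mhz, transfers_per_clock=2,
                        bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_sec = base_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_sec * bytes_per_transfer * channels / 1e9


# Assumption: DDR266 runs on a nominal 133.33MHz base clock.
BASE_CLOCK_MHZ = 400 / 3

# Plain SDRAM moves data once per clock; DDR moves it twice.
sdram = peak_bandwidth_gb_s(BASE_CLOCK_MHZ, transfers_per_clock=1)
single = peak_bandwidth_gb_s(BASE_CLOCK_MHZ)
dual = peak_bandwidth_gb_s(BASE_CLOCK_MHZ, channels=2)

print(f"133MHz SDRAM:          {sdram:.2f} GB/sec")
print(f"DDR266 single-channel: {single:.2f} GB/sec")
print(f"DDR266 dual-channel:   {dual:.2f} GB/sec")
```

These are theoretical ceilings only; real-world throughput will be lower once latency and controller overhead enter the picture.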
Dual-Channel Memory Designs
Of course, there are technical challenges in getting a dual-channel memory controller to work smoothly, handling data flows and keeping the CPU fed without traffic jams or collisions. While there’s a ton of complex technical data available to explain the process, we’ll give a broader overview here.
Right now, dual-channel memory controllers follow one of two strategies. The first is a platform with two independent channels in hardware. This is the format of Nvidia’s nForce and nForce2 chipsets for AMD Athlon XP systems, where each memory bank has its own memory channel and an arbiter distributes the load between them and plays traffic cop for incoming data.
This has the benefit of high actual memory bandwidth, but comes at the cost of some system overhead or latency associated with the arbiter (a matter addressed or greatly reduced by the nForce2), along with greater limitations on installed memory. With the nForce2, for instance, sticking with two memory modules is the best option for full dual-channel performance and stability.
The second strategy is to actually create a wider memory channel, thereby “doubling up” on standard DDR’s 64-bit data paths. This is typical of Intel dual-channel memory controllers. In the case of the E7205 workstation chipset (diagrammed below), each pair of installed modules acts as a 128-bit dual-channel memory module, which can transfer twice as much data as a single-channel solution, with no need for an arbiter. Depending on the application, this solution may not be as flexible as two hardware channels, and it does require an innovative chipset design to handle the 128-bit incoming data streams.
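To make the difference between the two strategies concrete, here is a deliberately simplified Python model of how a memory address might be steered in each case. Everything in it is an assumption for illustration: the function names, the 64-byte interleave granularity, and the even/odd mapping are hypothetical and are not taken from Nvidia or Intel documentation.

```python
CHANNEL_WIDTH_BYTES = 8  # one standard 64-bit DDR data path


def independent_channel(address, interleave_bytes=64):
    """Two-channel style: an arbiter sends each request to ONE channel.

    Hypothetical policy: alternate channels every `interleave_bytes`.
    Returns the channel number (0 or 1) that services the address.
    """
    return (address // interleave_bytes) % 2


def wide_channel(address):
    """128-bit style: a module pair acts as one 16-byte-wide memory.

    Returns (word_index, module), where `module` (0 or 1) is the stick
    holding this byte's half of the 128-bit word.
    """
    word_bytes = 2 * CHANNEL_WIDTH_BYTES
    word_index = address // word_bytes
    module = (address % word_bytes) // CHANNEL_WIDTH_BYTES
    return word_index, module


for address in (0, 8, 16, 64, 128):
    print(f"address {address:3d}: "
          f"independent -> channel {independent_channel(address)}, "
          f"wide -> word/module {wide_channel(address)}")
```

In the first model, a request touches only one channel, so something has to decide which; that is the arbiter’s job. In the second, every 128-bit access draws on both modules at once, which is why no arbiter is needed.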
As for how the dual-channel memory controller interacts with the rest of the system, the change is mostly transparent to the other interfaces. For example, Intel confirms that its Pentium 4 CPU bus shares the same basic design across its corresponding chipset lines, and it’s the other interfaces that change. Basically, a Pentium 4’s CPU bus doesn’t know dual-channel from a TV channel. Nor does AMD’s system bus; in the case of the nForce family, Nvidia has design control of the memory component, while adhering to AMD’s specifications as far as CPU, AGP, and PCI interfaces are concerned.
There are also hardware limitations placed on dual-channel memory platforms, such as maintaining consistency between modules and how they are installed. Install memory modules incorrectly or mix types and speeds, and the system will either hang (i850E) or downshift to single-channel operation (E7205, nForce2). These are basic rules placed on the platform to ensure consistency and keep user errors to a minimum.
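Those rules boil down to a simple decision, sketched below in Python. This is a toy model, not vendor logic: the `Module` class and `resulting_mode` function are hypothetical, only type and speed are compared (module size and slot-placement rules are ignored), and an empty slot is treated the same as a mismatch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Module:
    mem_type: str  # e.g. "DDR" or "RDRAM"
    speed: str     # e.g. "DDR266" or "PC1066"


# What each chipset does with a bad pairing, per the behavior described above.
MISMATCH_BEHAVIOR = {
    "i850E": "hang",
    "E7205": "single-channel",
    "nForce2": "single-channel",
}


def resulting_mode(chipset: str,
                   slot_a: Optional[Module],
                   slot_b: Optional[Module]) -> str:
    """Return the mode a chipset ends up in for a given pair of slots."""
    if slot_a is not None and slot_a == slot_b:
        return "dual-channel"
    # Simplification: a missing module counts as a mismatch.
    return MISMATCH_BEHAVIOR[chipset]


ddr266 = Module("DDR", "DDR266")
ddr333 = Module("DDR", "DDR333")
pc1066 = Module("RDRAM", "PC1066")
pc800 = Module("RDRAM", "PC800")

print(resulting_mode("E7205", ddr266, ddr266))
print(resulting_mode("nForce2", ddr266, ddr333))
print(resulting_mode("i850E", pc1066, pc800))
```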
The Contenders: Past, Present and Future
In terms of dual-channel memory controllers for mainstream desktops, Intel really got the ball rolling (although we can argue about how “mainstream” it was) with the i850 dual-channel PC800 RDRAM chipset for the first, high-end Pentium 4 PCs. This was followed by the updated i850E, which added support for faster PC1066 RDRAM, 533MHz-bus processors, and some other goodies.
Officially, these dual-channel RDRAM platforms are to this day the top end of Intel’s desktop lineup. But recently, some dual-channel DDR chipsets, with up-to-the-minute bells and whistles such as AGP 8X support, have emerged.
The first of these was the impressive E7205 “Granite Bay” workstation chipset, whose dual-channel DDR266 support virtually matches the i850E/PC1066 combo in overall performance, and in some ways surpasses it. The key is that doubling up DDR266 matches the 4.3GB/sec bandwidth of a 533MHz-bus Pentium 4, while higher-end DDR333 or DDR400 modules can be used with ludicrously low latencies or memory timings. To be sure, E7205 motherboards don’t come cheap, but the incredible user interest in this workstation platform should spell success for the desktop-oriented, dual-channel DDR400 “Springdale” and max-performance “Canterwood” chipsets due from Intel later this spring.
Next up is the SiS 655, which again raised the bar with dual-channel DDR333 (and on some implementations, run-at-your-own-risk DDR400), though it also brought the usual SiS “A” versus “B” stepping issues where Hyper-Threading support is concerned. Both of these high-performance and very flexible dual-channel DDR platforms have all but closed the door on RDRAM as a viable, high-end Pentium 4 solution.
nForce2 and Beyond
On the AMD side, we have the nForce and nForce2, which are at this writing the only two chipsets to supply dual-channel goodness for the Athlon XP. The original nForce was in many ways too little too late, but the new and improved dual-channel DDR400 nForce2 has been a smash success; in fact, it is today’s de facto choice for performance-minded AMD desktop buyers. Neither VIA nor SiS has chimed in with a competitor, and the nForce2’s 400MHz CPU bus support may spell even higher sales if (or when) AMD moves its processor line to that level.
Here’s a chart that illustrates the maximum bandwidth for each memory controller type, along with similar totals for Pentium 4 and Athlon XP processors. Note that the chipset examples are not a complete list, and have been placed at their highest possible memory configuration (i.e., the nForce2 also supports dual-channel DDR333 but is listed under dual-channel DDR400).
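The theoretical peaks behind a comparison like this can be approximated from nominal transfer rates and bus widths. The Python sketch below does so for a handful of configurations named in this article. It assumes 64-bit DDR channels and CPU buses, 16-bit RDRAM channels, nominal rates (533MHz taken as 533.33 million transfers per second, and so on), and decimal gigabytes; the table and function names are ours, and the list is illustrative rather than complete.

```python
def peak_gb_s(mega_transfers_per_sec, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in GB/sec (1 GB = 10**9 bytes)."""
    return mega_transfers_per_sec * 1e6 * (bus_width_bits / 8) * channels / 1e9


# (million transfers/sec, bus width in bits, channels); nominal values.
MEMORY = {
    "Single-channel DDR333": (1000 / 3, 64, 1),
    "Single-channel DDR400": (400, 64, 1),
    "Dual-channel PC1066 RDRAM": (3200 / 3, 16, 2),
    "Dual-channel DDR266": (800 / 3, 64, 2),
    "Dual-channel DDR333": (1000 / 3, 64, 2),
    "Dual-channel DDR400": (400, 64, 2),
}

# (million transfers/sec, bus width in bits)
CPU_BUS = {
    "Pentium 4, 533MHz bus": (1600 / 3, 64),
    "Athlon XP, 333MHz bus": (1000 / 3, 64),
}


def coverage(memory_name, cpu_name):
    """Memory bandwidth as a fraction of the CPU bus bandwidth."""
    return peak_gb_s(*MEMORY[memory_name]) / peak_gb_s(*CPU_BUS[cpu_name])


for cpu_name, cpu_spec in CPU_BUS.items():
    print(f"{cpu_name}: {peak_gb_s(*cpu_spec):.2f} GB/sec")
    for memory_name, memory_spec in MEMORY.items():
        print(f"  {memory_name:<26} {peak_gb_s(*memory_spec):.2f} GB/sec "
              f"({coverage(memory_name, cpu_name):.0%} of CPU bus)")
```

A coverage figure of 100 percent means the memory subsystem can, in theory, keep pace with the CPU bus; anything below that marks the bottleneck described earlier.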
Real-World Performance
One of the main goals in designing dual-channel memory interfaces is to match the CPU bus to the memory bus, thereby creating a synchronous link between the two. This is most evident with the Pentium 4; when using a 533MHz-bus processor with anything less than a theoretical 533MHz memory bus, the system is not running at optimized speeds. That’s why the Intel i850E and E7205 and the SiS 655 are currently leading the Pentium 4 performance race: these chipsets can supply the bandwidth the CPU needs.
However, this is really only part of the equation. In the case of the Athlon XP, the nForce2’s dual-channel DDR400 memory system looks like overkill — after all, even the fastest Athlon XP has only a 333MHz front-side bus, which should be fine for a standard, single-channel DDR333 chipset.
But in fact, today’s AGP 8X/Serial ATA/USB 2.0 platforms are already memory-bandwidth-starved, and any architecture that can keep these components supplied with data without having to borrow bandwidth from the CPU is a good thing. This has been proven by benchmark tests of the nForce2, which show clear performance gains when the chipset is switched from single- to dual-channel mode.
In terms of real-world application throughput, actual gains are highly dependent on your individual workload. Basic business programs like Word and Excel won’t suddenly surge ahead on a dual-channel platform, but toss in some high-end image editing, demanding 3D games, and heavy multitasking, and the benefits become clear. It’s all about bandwidth — and it’s why innovations such as dual-channel memory controllers, Hyper-Threading, and faster system buses have joined raw CPU speeds in the PC performance spotlight.