AMD's Future in Servers: New 7000-Series CPUs Launched and EPYC Analysis
by Ian Cutress on June 20, 2017 4:00 PM EST
Posted in: CPUs, AMD, Enterprise CPUs, EPYC, Whitehaven, 1P, 2P
The big news out of AMD was the launch of Zen, the new high-performance core designed to underpin the company's product roadmap for the next few generations. To much fanfare, AMD launched consumer-level parts based on Zen, called Ryzen, earlier this year. There was a lot of discussion in the consumer space about these parts and their competitiveness, but despite the column inches dedicated to them, Ryzen was not designed to be the big story this year. That was left to AMD's server products, which are intended to take a sizeable chunk of market share and reinvigorate the company's bottom line. A few weeks ago AMD announced the name of its new line of enterprise-class processors, EPYC, and today marks the official launch, with configurations up to 32 cores and 64 threads per processor. We also got an insight into several features of the design, including the AMD Infinity Fabric.
What’s in a Processor?
Today's announcement of the AMD EPYC product line sees the launch of the top four CPUs, aimed primarily at dual socket systems. The full EPYC stack will contain twelve processors, three of which are for single socket environments, with the rest of the stack being made available at the end of July. It is worth taking a few minutes to look at how these processors are built under the hood.
On the package are four silicon dies, each one containing the same 8-core silicon we saw in the AMD Ryzen processors. Each silicon die has two core complexes, each of four cores, and supports two memory channels, giving a total maximum of 32 cores and 8 memory channels on an EPYC processor. The dies are connected by AMD’s newest interconnect, the Infinity Fabric, which plays a key role not only in die-to-die communication but also processor-to-processor communication and within AMD’s new Vega graphics. AMD designed the Infinity Fabric to be modular and scalable in order to support large GPUs and CPUs in the roadmap going forward, and states that within a single package the fabric is overprovisioned to minimize any issues with non-NUMA aware software (more on this later).
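As a quick sanity check on those totals, here is a minimal sketch in Python of how the per-package figures fall out of the four-die layout, using only the numbers given above:

```python
# Package topology as described above: four 8-core dies, two CCXes per die,
# four Zen cores per CCX, and two DDR4 channels per die.
dies_per_package = 4
ccx_per_die = 2
cores_per_ccx = 4
memory_channels_per_die = 2

cores = dies_per_package * ccx_per_die * cores_per_ccx   # 32 cores
threads = cores * 2                                      # 64 threads (SMT, two per core)
channels = dies_per_package * memory_channels_per_die    # 8 memory channels

print(cores, threads, channels)
```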
With a total of 8 memory channels, and support for 2 DIMMs per channel, AMD is quoting a maximum of 2TB of memory per socket, scaling up to 4TB in a dual processor system. Each CPU supports 128 PCIe 3.0 lanes, suitable for six GPUs with full bandwidth (plus IO) or up to 32 NVMe drives for storage. All the PCIe lanes can be used for IO devices, such as SATA drives or network ports, or as Infinity Fabric connections to other devices. There are also four IO hubs per processor for additional storage support.
In a dual socket configuration, each CPU dedicates 64 of its PCIe lanes to an Infinity Fabric link to the other processor. This still leaves a total of 128 PCIe lanes available to the rest of the system, while the total memory support doubles.
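The capacity and lane-count claims are simple arithmetic to verify; the sketch below works through them, with the 128 GB DIMM size being our assumption to reach the quoted 2TB-per-socket figure:

```python
# Memory: 8 channels x 2 DIMMs per channel; a 128 GB DIMM size is assumed
# here, since that is what is needed to hit AMD's quoted 2 TB per socket.
channels_per_socket = 8
dimms_per_channel = 2
dimm_size_gb = 128
per_socket_tb = channels_per_socket * dimms_per_channel * dimm_size_gb / 1024
print(per_socket_tb, per_socket_tb * 2)   # 2.0 TB per socket, 4.0 TB in a 2P system

# PCIe: 128 lanes per socket
lanes_per_socket = 128
print(lanes_per_socket // 16)             # 8 x16 devices possible; six GPUs leave 32 lanes for IO
print(lanes_per_socket // 4)              # or up to 32 x4 NVMe drives

# 2P: each CPU spends 64 lanes on the socket-to-socket Infinity Fabric link,
# leaving (128 - 64) per socket, i.e. 128 lanes total, exposed to the system.
print((lanes_per_socket - 64) * 2)
```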
Going BIG and Attacking The Market: All The Cores, Please
AMD is launching a total of nine parts aimed at dual socket use, and three parts for single socket servers. This is consistent with AMD's position that 90-95% of all servers in use today are either single or dual socket, and there will be no quad-socket option from AMD. The pitch is that some of AMD's single socket processors could replace today's dual-socket servers at a lower TCO, simplifying the environment while offering more memory and more IO than what is currently on the market.
The new processors from AMD are called the EPYC 7000 series, with names such as EPYC 7301 and EPYC 7551P. The naming of the CPUs is as follows:
EPYC 7551P
- EPYC = Brand
- 7 = 7000 Series
- 55 = Two-digit number indicative of stack positioning / performance (non-linear)
- 1 = Generation
- P = Single Socket, not present in Dual Socket
So in the future we may see EPYC 7302 processors, or if AMD scales the design to fewer silicon dies, there may be EPYC 5000 processors, or EPYC 3000 parts with a single die but using the EPYC platform socket (obviously, those last two are speculation).
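To make the scheme concrete, here is a small, purely illustrative Python helper that splits a model number into those fields; the decoding rules come from AMD's naming breakdown above, while the function itself is just our own convenience and not anything AMD ships:

```python
import re

def decode_epyc(name: str) -> dict:
    """Decode an EPYC 7000-series style model number per the scheme above."""
    m = re.fullmatch(r"EPYC (\d)(\d{2})(\d)(P?)", name)
    if m is None:
        raise ValueError(f"unrecognized EPYC model number: {name}")
    series, position, generation, suffix = m.groups()
    return {
        "series": series + "000",        # 7 -> 7000 series
        "positioning": position,         # two digits: stack position / performance (non-linear)
        "generation": int(generation),   # 1 = first generation
        "sockets": "1P only" if suffix == "P" else "up to 2P",
    }

print(decode_epyc("EPYC 7551P"))
print(decode_epyc("EPYC 7301"))
```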
But starting with the 2P processors:
AMD EPYC Processors (2P)

|           | Cores / Threads | Base (GHz) | All-Core (GHz) | Max (GHz) | L3    | DRAM           | PCIe        | TDP       | Price  |
|-----------|-----------------|------------|----------------|-----------|-------|----------------|-------------|-----------|--------|
| EPYC 7601 | 32 / 64         | 2.20       | 2.70           | 3.2       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 180W      | $4200  |
| EPYC 7551 | 32 / 64         | 2.00       | 2.55           | 3.0       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 180W      | >$3400 |
| EPYC 7501 | 32 / 64         | 2.00       | 2.60           | 3.0       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 155W/170W | $3400  |
| EPYC 7451 | 24 / 48         | 2.30       | 2.90           | 3.2       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 180W      | >$2400 |
| EPYC 7401 | 24 / 48         | 2.00       | 2.80           | 3.0       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 155W/170W | $1850  |
| EPYC 7351 | 16 / 32         | 2.40       | -              | 2.9       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 155W/170W | >$1100 |
| EPYC 7301 | 16 / 32         | 2.20       | -              | 2.7       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 155W/170W | >$800  |
| EPYC 7281 | 16 / 32         | 2.10       | -              | 2.7       | 32 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 155W/170W | $650   |
| EPYC 7251 | 8 / 16          | 2.10       | -              | 2.9       | 32 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 120W      | $475   |
All CPUs have 128 PCIe 3.0 lanes, most have access to the full 64MB of L3 cache (the bottom two have 32MB), and all support DDR4-2666. AMD has been keen to stress that every processor supports the full feature set, with the only differentiation being cores, frequencies, and power.
Sitting on top of the stack is the EPYC 7601, sporting 32 cores and 64 threads, a base frequency of 2.2 GHz, an all-core boost of 2.7 GHz, and a peak boost of 3.2 GHz. Depending on how software is distributed across the cores, the chip should run at the peak boost frequency when fewer than 12 cores are in use, although other factors such as localized temperature within a core may affect this.
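The precise boost policy has not been published, but as a rough mental model based only on the figures above (treating 'fewer than 12 active cores' as the threshold and ignoring power and thermal headroom), it would look something like this:

```python
def epyc_7601_clock(active_cores: int) -> float:
    """Crude model of EPYC 7601 clocks vs. load, using AMD's published figures.

    Real silicon is also constrained by power and localized temperature, and
    can fall back toward the 2.2 GHz base under heavy sustained load.
    """
    if active_cores < 12:
        return 3.2   # quoted maximum boost for lightly threaded workloads
    return 2.7       # quoted all-core boost
```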
The next two CPUs look similar on paper, but differ slightly. Both have a base frequency of 2.0 GHz and a peak frequency of 3.0 GHz, and again the peak frequency should be active when fewer than 12 cores are loaded. The differences come down to power: the EPYC 7551 is a 180W part, while the EPYC 7501 is listed at 155W/170W. We were told at the AMD Tech Day for EPYC that the 155W/170W listing reflects the fact that this CPU supports DDR4-2400 at 155W or DDR4-2666 at 170W. That leaves the EPYC 7551 at DDR4-2666 with a 180W TDP and the EPYC 7501 at DDR4-2666 with a 170W TDP; we are trying to extract from AMD whether there is any other difference, given that the EPYC 7501 is priced lower and has a lower TDP, and are waiting to hear back.
On the 24-core parts, the EPYC 7451 and EPYC 7401, there is a similar set of differences: the EPYC 7451 has a base frequency of 2.3 GHz, a maximum boost of 3.2 GHz, and a 180W TDP, while the EPYC 7401 runs a 2.0 GHz base with a 3.0 GHz turbo and the separate 155W/170W modes again. The EPYC 7401 has an all-core turbo of 2.8 GHz due to its lower core count, although in 155W mode the threshold for that turbo sits at eight cores. For the 24-core parts, AMD has disabled one core per core complex, leaving three per CCX (six per die, and 24 per chip).
The sixteen-core processors disable two cores per CCX, leaving four per die, but still with the full complement of memory channels and, except for the EPYC 7281, the full 64MB of L3 cache. These all have lower frequencies than the bigger chips, and all come in 155W/170W flavors. These processors will not be available on day one; we are told to expect OEM systems with these chips in late July.
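Because every SKU keeps all four dies and both CCXes per die active, the core counts across the stack fall straight out of how many cores are left enabled per CCX; a quick check (the EPYC 7251, discussed next, runs one core per CCX):

```python
dies, ccx_per_die = 4, 2
for enabled_per_ccx in (4, 3, 2, 1):
    cores = dies * ccx_per_die * enabled_per_ccx
    print(f"{enabled_per_ccx} core(s) per CCX -> {cores} cores / {cores * 2} threads")
# 4 -> 32C/64T, 3 -> 24C/48T, 2 -> 16C/32T, 1 -> 8C/16T
```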
The final processor is somewhat of an odd-ball. The EPYC 7251 is an eight-core processor, running at a 2.1 GHz base frequency and a 2.9 GHz boost frequency, but at 120W. By comparison, the Ryzen 7 1700 is an eight-core processor at 3.0/3.7 GHz in only 65W, so what is going on here? As mentioned above, all of these EPYC 7000-series parts are based on the quad-die design, so this processor still carries the full 700+ mm2 of silicon, 32MB of L3 cache, eight memory channels supporting up to 2TB of memory, and a full set of PCIe lanes. Only one core is active per CCX, meaning core-to-core latency will be higher than normal, but AMD's strategy here is a 'memory optimized' part: its justification is that some workloads are not compute bound but DRAM bound. This is the cheapest CPU in the stack at $475, and for software that is licensed per core but needs 2TB/4TB of memory, or is GPU bound, this is the processor to get.
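To illustrate the per-core licensing argument, here is a trivial worked example; the license price is a made-up placeholder and not tied to any real product:

```python
license_per_core = 1000   # USD per core -- placeholder figure, purely illustrative

for name, cores, cpu_price in [("EPYC 7251", 8, 475), ("EPYC 7601", 32, 4200)]:
    total = cpu_price + cores * license_per_core
    print(f"{name}: ${cpu_price} CPU + {cores} core licenses = ${total:,}")
# Both parts expose the same 8 channels and up to 2 TB per socket, so for a
# DRAM-bound, per-core-licensed workload the 8-core part is the cheaper route.
```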
The final three processors are for single socket systems:
AMD EPYC Processors (1P)

|            | Cores / Threads | Base (GHz) | All-Core (GHz) | Max (GHz) | L3    | DRAM           | PCIe        | TDP       | Price |
|------------|-----------------|------------|----------------|-----------|-------|----------------|-------------|-----------|-------|
| EPYC 7551P | 32 / 64         | 2.0        | 2.6            | 3.0       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 180W      | $2100 |
| EPYC 7401P | 24 / 48         | 2.0        | 2.8            | 3.0       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 155W/170W | $1075 |
| EPYC 7351P | 16 / 32         | 2.4        | -              | 2.9       | 64 MB | 8-Ch DDR4-2666 | 128 (8 x16) | 155W/170W | $750  |
These SKUs mirror the specifications of the 2P counterparts, but have a P in the name.
A Side Note on Performance Claims
In the briefings for the launch, AMD wanted to make two things clear: these parts are designed to offer substantially better raw performance (as defined by SPECint) at every price point, and they are not positioned against the current E5 v4 processors on the market, but against Skylake-SP. The slide presented showed this:
AMD is claiming up to +70% performance for a dual socket system, especially in the ~$800 CPU market, which it predicts will be the biggest segment for sales. Along with this, AMD claims that for some parts of the market only one AMD processor will be needed to replace two Intel processors:
In this case, an EPYC 7281 in single socket mode is listed as having +63% performance (in SPECint) over a dual socket E5-2609v4 system.
I must stress that these are AMD's numbers, and vendor numbers should always be taken with a grain of salt due to the risk of cherry picking. Furthermore, as AMD notes in its endnotes, the Intel numbers have been modified: "Scores for these E5 processors extrapolated from test results published at www.spec.org, applying a conversion multiplier to each published score." So we are waiting to get the chips ourselves to do our own comparison testing.
The next page in this analysis is on NUMA and the Infinity Fabric.
131 Comments
vladx - Tuesday, June 20, 2017 - link
Lol what a shady move from AMD to reduce Intel CPUs' benchmark numbers in order to make Epyc appear better than it actually is, never change AMD never change.

tamalero - Tuesday, June 20, 2017 - link
COUGH COUGH COUGH Yeah, because Intel never has done the same.. COUGH COUGH COUGH..
https://www.extremetech.com/computing/193480-intel...
https://www.theinquirer.net/inquirer/news/1567108/...
vladx - Tuesday, June 20, 2017 - link
First there's a big difference between straight-out misleading customers and making backside deals with OEMs, and second that compiler crippling stuff is still unsubstantiated and Intel has no obligation towards AMD with regards to Intel's own compiler. AMD should make their own compiler that offers better or at least equal to Intel's own optimizations instead of using disgraceful tactics like that.

galahad05 - Wednesday, June 21, 2017 - link
How's Intel doing fighting that enormous fine the EU levied against it for their underhanded tactics against AMD years ago?

vladx - Wednesday, June 21, 2017 - link
Afaik they paid billions which AMD squandered like it was nothing.

galahad05 - Wednesday, June 21, 2017 - link
Um.... Where to begin? The fine doesn't go to AMD. It goes to the European Commission....
So far Intel's lawyers have held the EC at bay all these years. Which technically means Intel got away with it....
Such is life.
Mugur - Wednesday, June 21, 2017 - link
What I don't understand from the slide with the prices: it looks like the 1P cpu is priced higher ($750 versus $650) than the 2P counterpart? I assume that any 2P cpu could be used in a 1P motherboard, but not the other way around.

Zizy - Wednesday, June 21, 2017 - link
Well, the corresponding 2P part is >1.1k, so 1P is cheaper. No idea why there isn't 7301P instead and slightly cheaper than the bottom 2P, but I guess that 7351P looks better on the 2P vs 1P.

1008anan - Wednesday, June 21, 2017 - link
Trying to calculate how many 32 bit floating point operations (FPO) a zen server completes per second:

Assume a 2 socket Zen server with two 32 core chips; operating at 2.5 gigahertz:
512 bits wide vector, Fused Multiply Add, two FPO per clock = 64 FPO per clock = 512/32 * 2 * 2.
64 FPO/clock * 32 cores = 2048 FPO/clock
2048 FPO/clock * 2 sockets = 4096 FPO/clock
4096 FPO/clock * 2.5 gigahertz = 10 trillion FPO/second = 10 teraflops
Is this accurate? Is Zen approximately the same number of FLOPS as Skylake E5/E7?
edzieba - Wednesday, June 21, 2017 - link
An interesting diagram lurking on the corner of this slide: http://images.anandtech.com/doci/11551/epyc_tech_d...

Could just be that the diagram is nonsense marketing bling, but that sure looks like external lanes are connected to only two of the 4 cores, with the remaining two getting 'passthrough' lanes.