Applied Micro's X-Gene: The First ARMv8 SoC
by Anand Lal Shimpi on November 14, 2011 1:44 PM EST - Posted in
- CPUs
- Arm
- AppliedMicro
- X-Gene
- SoCs
We covered the X-Gene announcement a couple of weeks ago when the news was first made public. I was in London at the time meeting with Nokia so I didn't get a chance to sit down with Applied Micro's engineers to discuss the SoC and its architecture. Thankfully, upon my return, they gave me the opportunity to do just that.
We've been hearing about ARM based servers for a while now, but their advantage has always been lower power consumption than beefy x86 servers for lighter workloads. You always sacrifice performance and memory addressability. APM hopes to change that with its X-Gene.
Development on X-Gene began three years ago. APM was originally a PowerPC house. The company was working on a 64-bit PowerPC core internally before meeting with ARM and eventually redirecting its efforts to a 64-bit ARM core. Together with ARM, APM started laying the foundation for ARM's first 64-bit instruction set - now known as ARMv8.
At a time when everyone else was working on ARMv7 cores, this gave APM a head start on the ARMv8 transition. As of now there is no officially announced, licensable ARMv8 core from ARM itself. I believe this makes the X-Gene the world's first ARMv8 SoC.
At a high level the X-Gene is pretty beefy. Each CPU core can fetch and decode up to four ARMv8 (or eight Thumb) instructions per clock. APM wouldn't reveal the depth of the pipeline, but it is targeting a 3GHz operating frequency at 28/40nm so it's safe to say that the pipeline is fairly deep. APM did add that it's not quite as deep as the Pentium 4, but rather in the sweet spot. I'd take that to mean we're looking at something around or just shy of 20 stages for the integer pipeline.
APM wouldn't go into detail on the back end configuration of the X-Gene, nor would it comment on other intricacies like branch predictors or cache configuration. We can learn a lot from the front end alone though. Cortex A15 features a 3-issue front end, and moving to 4-issue implies a generational gap in IPC. Note that we saw a similar transition going from the P6/NetBurst eras to Intel's Conroe (aka Core 2) architecture.
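As a rough illustration of what the wider front end buys, peak decode throughput is simply decode width multiplied by clock speed. The Python sketch below is a back-of-the-envelope upper bound, not an IPC estimate: it assumes every decode slot is filled every cycle, and the 2.5GHz Cortex A15 clock is a hypothetical comparison point, not a figure from APM.

```python
# Back-of-the-envelope peak decode throughput: width x clock.
# This is an upper bound only; sustained IPC depends on the back end,
# branch prediction and caches, none of which APM disclosed.

X_GENE_DECODE_WIDTH = 4      # up to four ARMv8 instructions per clock (APM)
CORTEX_A15_DECODE_WIDTH = 3  # 3-issue front end


def peak_decode_ginst_per_s(width: int, clock_ghz: float) -> float:
    """Peak instructions decoded per second, in billions, if every slot is used."""
    return width * clock_ghz


# Width advantage at equal clock speed: 4/3, i.e. about 33% more slots per cycle
width_ratio = X_GENE_DECODE_WIDTH / CORTEX_A15_DECODE_WIDTH

x_gene_peak = peak_decode_ginst_per_s(X_GENE_DECODE_WIDTH, 3.0)   # APM's 3GHz target
a15_peak = peak_decode_ginst_per_s(CORTEX_A15_DECODE_WIDTH, 2.5)  # hypothetical clock

print(f"Decode width ratio: {width_ratio:.2f}x")
print(f"X-Gene @ 3.0GHz: {x_gene_peak:.1f}G instructions/s peak")
print(f"Cortex A15 @ 2.5GHz (hypothetical): {a15_peak:.1f}G instructions/s peak")
```

Clock speed and width multiply, which is why APM can pair a fairly deep pipeline with a wide front end; whether the back end can actually keep four slots busy is exactly what APM declined to discuss.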
As the X-Gene implements the ARMv8 ISA it is a full 64-bit architecture that is backwards compatible with 32-bit ARMv7. The CPU features hardware virtualization acceleration, MMU virtualization, advanced SIMD instructions and what APM is calling a "very sophisticated" FPU, although once again details were scarce.
Despite the aggressive architecture, each core is estimated to consume only 2W. Like most mobile SoCs, the entire chip will idle at around 300mW.
At the SoC level, APM plans to integrate many of these CPU cores onto a single package. The range is officially 2 - 128 cores, although I expect we'll see something more reasonable than the extremes. The SoC also features integrated SATA (up to six 6Gbps ports per SoC) and two 10GbE controllers.
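Taking APM's 2W-per-core estimate at face value, the CPU cores alone would span a wide power range across the stated 2 - 128 core configurations. This Python sketch assumes core power scales linearly with core count and ignores the uncore (memory controllers, SATA, 10GbE and the chip-to-chip interconnect), so real SoC power would be higher than these figures.

```python
PER_CORE_W = 2.0           # APM's per-core power estimate
MIN_CORES, MAX_CORES = 2, 128  # APM's official core-count range


def core_power_w(cores: int, per_core_w: float = PER_CORE_W) -> float:
    """Estimated CPU-core power in watts, assuming linear scaling with core count.

    Excludes all uncore blocks, so this is a lower bound on SoC power.
    """
    if not MIN_CORES <= cores <= MAX_CORES:
        raise ValueError(f"APM's stated range is {MIN_CORES}-{MAX_CORES} cores")
    return cores * per_core_w


for n in (2, 4, 16, 32, 128):
    print(f"{n:>3} cores: {core_power_w(n):6.1f} W (cores only)")
```

The 128-core extreme lands in the same territory as a high-end x86 server CPU on core power alone, which is one reason to expect shipping parts well short of that ceiling.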
Each SoC can feature up to four 72-bit DDR3 (64-bit + ECC) memory controllers, although lower core count configurations will have fewer memory controllers.
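The 72-bit width breaks down as 64 data bits plus 8 ECC bits, so ECC adds 12.5% more bits to the bus without adding usable bandwidth. APM didn't say which DDR3 speed grade the X-Gene supports, so the Python sketch below uses DDR3-1600 purely as a hypothetical example to show how peak bandwidth scales with the number of controllers.

```python
DATA_BITS = 64
ECC_BITS = 8
CHANNEL_BITS = DATA_BITS + ECC_BITS  # 72-bit interface per controller

ecc_overhead = ECC_BITS / DATA_BITS  # extra bits relative to data bits


def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int) -> float:
    """Theoretical peak data bandwidth in GB/s.

    Only the 64 data bits count; the ECC bits carry no user data.
    """
    bytes_per_transfer = DATA_BITS // 8
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


# Hypothetical DDR3-1600 (1600 MT/s); not a speed grade APM has confirmed
for controllers in (1, 2, 4):
    print(f"{controllers} controller(s): {peak_bandwidth_gbs(1600, controllers):.1f} GB/s peak")
```

Because lower core count parts get fewer controllers, bandwidth per core stays roughly in proportion as the design scales down.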
You can plop multiple SoCs down on a single board, connected by a coherent interface that can deliver up to 400Gbps of bandwidth between chips.
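For scale, 400Gbps works out to 50GB/s. APM didn't specify whether that figure is aggregate or per direction, or whether it is raw signaling rate or usable payload, so treat the Python conversion below as a unit check only. The PCIe 3.0 comparison point uses that standard's 8GT/s per-lane rate with 128b/130b encoding.

```python
BITS_PER_BYTE = 8


def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / BITS_PER_BYTE


interconnect_gbs = gbps_to_gbytes_per_s(400)  # APM's stated coherent-link bandwidth

# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, per direction
pcie3_lane_gbs = gbps_to_gbytes_per_s(8 * 128 / 130)

print(f"Coherent interconnect: {interconnect_gbs:.1f} GB/s")
print(f"PCIe 3.0 lane: {pcie3_lane_gbs:.3f} GB/s per direction")
print(f"Equivalent PCIe 3.0 lanes: {interconnect_gbs / pcie3_lane_gbs:.1f}")
```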
APM's performance estimates put a 3GHz X-Gene at roughly half the integer performance of a 2.4GHz Sandy Bridge. The X-Gene advantage however is the ability to integrate many more cores. APM expects a quad-core X-Gene will be able to perform similarly to a dual-core Sandy Bridge Xeon, but with much lower power consumption.
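APM's claims can be unpacked with a little arithmetic. If a 3GHz X-Gene core delivers half the integer performance of a 2.4GHz Sandy Bridge core, its implied per-clock integer throughput is 0.5 × 2.4 / 3.0 = 0.4 of Sandy Bridge's. The quad-core comparison then follows from 4 × 0.5 = 2 Sandy Bridge cores, but only under the optimistic assumption that the workload scales perfectly across cores. The Python sketch below just reproduces that arithmetic; the inputs are APM's estimates, not measurements.

```python
SNB_CLOCK_GHZ = 2.4
XGENE_CLOCK_GHZ = 3.0
XGENE_PER_CORE_VS_SNB = 0.5  # APM estimate: half a 2.4GHz Sandy Bridge core's integer perf

# Per-clock integer throughput implied by the estimate, relative to Sandy Bridge
per_clock_ratio = XGENE_PER_CORE_VS_SNB * SNB_CLOCK_GHZ / XGENE_CLOCK_GHZ


def snb_core_equivalents(xgene_cores: int) -> float:
    """Sandy Bridge core-equivalents, assuming perfect scaling across cores."""
    return xgene_cores * XGENE_PER_CORE_VS_SNB


print(f"Implied per-clock integer throughput vs Sandy Bridge: {per_clock_ratio:.2f}")
print(f"Quad-core X-Gene ~ {snb_core_equivalents(4):.1f} Sandy Bridge cores")
```

The perfect-scaling assumption is the weak link: anything with serial sections or shared-resource contention will fall short of the 2:1 core ratio.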
Update: APM has since pulled the slide it originally shared with us comparing the X-Gene to Intel's Sandy Bridge architecture. The implication is that its performance estimates may have been a bit too aggressive; only time will tell...
These are all estimates today. The first customer evaluation boards will be available in March 2012. The X-Gene SoCs on the eval boards will be delivered as FPGAs. The ASIC version for actual deployment won't hit until the second half of next year. The first chips will be built on a 40nm process to get them to market quickly and cost effectively, but the design is expected to transition to 28nm afterwards. At 40nm we may not see such aggressive clocks or tons of cores per SoC.
APM expects that even with a late 2012 launch it will have a 1 - 2 year lead on the competition. If it can get the X-Gene out on time while hitting its power and clock targets (both very difficult goals), the head start will be tangible. Note that by the end of 2012 we'll only just begin to see the first Cortex A15 implementations. ARMv8 based competitors will likely be a full year out, at least.
There's also the question of whether or not enterprise customers want to move to an ARM based server platform. Unlike in the smartphone/tablet space, x86 is the incumbent in the server arena. Equal performance at lower power consumption is quite attractive, but there's still a lot of convincing that needs to be done. Not to mention that Intel does have the ability to build a competitive, Atom based solution.
More than anything it's good to see such strong competition at both the high end and low end of the microprocessor business. Threatening to disrupt the status quo in both is going to pave the way for progress in our industry.
13 Comments
bjacobson - Friday, November 18, 2011
What would keep them from moving to ARM? The internet runs Linux, so I don't see why it would be a big deal to recompile and jump.
dertechie - Saturday, November 19, 2011
I'll believe it when they ship product and have the implied claims verified by 3rd parties. I know the performance numbers are for a hand-picked benchmark (one that is apparently very friendly to large numbers of RISC cores), but the implication is "X-Gene is going to wipe the floor with x86 (specifically Intel)".
I have no problem believing they can build high performance ARM cores (they just have to architect for it, and they did that).
I do have a problem believing they can do 3 GHz clock speeds and vastly improved IPC for that small of a power penalty, with interconnects at 400 Gbps (that is 50 GB/s, or 50 lanes of PCIe 3.0). Interfaces that fast aren't cheap to run, power-wise. Something smells fishy. The ARM ISA may be nice and efficient, but it's not magic; it doesn't obviate physics.
Plus by 2013 when these come out, Haswell will integrate more onto the chip (more power savings), and Atom (the real competition here) will be at 22 nm, and probably much improved architecturally.
mitcoes - Monday, October 22, 2012
I have just read that the space required can be shrunk by more than 90%.
The energy costs are cut by almost half.
And we do not know the SoC prices, but I bet they will be far below Intel's.
Since you can put 2 or 3 SoCs together to beat Intel's performance, even a desktop computer built this way would be cheaper.
But we do not know the price yet.
I would not have any problem having a desktop system with 3 or 4 of these cheap SoCs, and since they say the space and energy costs are lower than x86 solutions, perhaps it will be a good product for servers too.