Low Power Server CPUs

When you read about micro servers, the spotlight is on very low power servers built around Atom and ARM CPUs. The reality in the server market, however, is very different. As you probably understand from the reasoning above, a server with an Atom or ARM CPU is hardly a good solution for homogeneous webfarms.

In fact, with the exception of Seamicro's, many micro servers have utterly failed in that market. Quite a few server vendors have offered dense Atom-based solutions, but those solutions were poorly received. Simply cramming tens of Atom-based servers into a small chassis is a pretty bad idea:

1. The single-threaded performance of the Atom is too low, even for webfarms.

2. The power savings you gain from lower power CPUs are negated by the cable management costs and the larger number of PCBs and PHYs.

To sum it up: the performance per watt of those servers did not and will not thrill anyone. Seamicro, the pioneer of this niche market, was successful despite, not thanks to, the Atom processor inside its micro servers. When Seamicro offered low power Xeons instead of Atoms inside their Borg Cube inspired server, the demand for their products really took off.

Even Calxeda, the champion of low power servers, admits that the current “ultra low power but low performance” microserver market is a small one. Calxeda has high expectations for its next generation of servers, as they will be based on the more powerful Cortex A15 and A57 CPUs.

Seamicro and Calxeda succeeded where others failed because they understood that an optimized PCB and network fabric were necessary to make the micro server concept work. Seamicro reduced the server board to its bare minimum (“credit card size”), turned unnecessary features off and connected all I/O via a high performance 3D torus interconnect.

Calxeda integrated several servers on one PCB and connected them together with a 2D torus network fabric. The result was a low power draw per server, not just per CPU.  Once you add a CPU with good enough single threaded performance, things get very interesting.
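To make the torus idea concrete, here is a minimal sketch of how nodes in such a fabric are wired: each node links to its two neighbors along every axis, and the edges wrap around so there is no "end of the row". The grid sizes and function names below are made up for illustration; they are not the actual dimensions or routing logic of the Seamicro or Calxeda fabrics.

```python
def torus_neighbors(node, dims):
    """Return the direct neighbors of `node` in a torus with the given axis sizes."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size  # wrap around at the edges
            neighbors.add(tuple(coord))
    neighbors.discard(tuple(node))  # a size-1 axis would wrap onto the node itself
    return sorted(neighbors)


def torus_hops(a, b, dims):
    """Minimum hop count between two nodes, taking the shorter way around each ring."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, dims)
    )


# Hypothetical 4x4 2D torus and 4x4x4 3D torus (sizes are assumptions).
print(torus_neighbors((0, 0), (4, 4)))        # 4 links per node in 2D
print(len(torus_neighbors((0, 0, 0), (4, 4, 4))))  # 6 links per node in 3D
print(torus_hops((0, 0), (3, 3), (4, 4)))     # opposite corners are close thanks to wraparound
```

The point of the wraparound is that every node gets the same small, fixed number of links while the worst-case hop count stays low, which is one way to connect many small servers without a separate cable and switch port for each one.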


80 Comments

  • zepi - Tuesday, June 18, 2013 - link

    The problem is, that AMD's big cores are so bad compared to Intel's big cores, that it makes next to no sense to compete against Intel with them. Perf/w is worse and manufacturing costs are far too high. To sell chips made out of these cores, they need to cut their prices so low, that they can't make any profits to pay for R&D or any other stuff. This applies to both server and desktop-market. Sure, you can sell expensive to manufacture multi-module phenoms for cheap-ass people who want best multicore performance per dollar, but what's the benefit when you can't make any money doing so?

    AMD is in dire need of a complete big-core CPU architecture renewal and with their R&D resources that probably isn't going to happen any time soon. Unless they can pull some kind of a magical bunny rabbit from their hat, I don't see them being competitive in big-core ever again.

    They are shifting their target to those markets where they hope they can still compete.
  • JDG1980 - Tuesday, June 18, 2013 - link

    The big question right now is if Steamroller can fix the problems with AMD's construction equipment architecture or not. Official estimates are quite bullish, promising 15%-30% gains on a clock-for-clock basis. No doubt these are overly generous estimates and I take them with a grain of salt, but if AMD can increase actual IPC by 10% or more with Steamroller (rather than just cranking the clock speed higher) then there may be hope for the construction equipment cores. If not, then AMD's best bet is ditching that line altogether, and scaling up Jaguar or its successor so it's reasonably competitive on the desktop. The good thing is that Jaguar is already optimized for low power (which is where the Bulldozer lineage really falls short) and its IPC is pretty good. And they've already got some nice design wins with the PS4 and Xbone, which demonstrates that these cores are suitable for gaming (an "enthusiast" use). Perhaps they could backport some of the features that Bulldozer and its successors actually got right, like the improved branch predictor. (Or did they already do that with Jaguar?) After all, this is basically what Intel did when they dropped Netburst in favor of a revised version of the P6 architecture.
  • JPForums - Tuesday, June 18, 2013 - link

    @zepi "Sure, you can sell expensive to manufacture multi-module phenoms for cheap-ass people who want best multicore performance per dollar, but what's the benefit when you can't make any money doing so?"

    Even if you can't make any money, if you can break even it is still useful for keeping your employees employed. While a business doesn't have an inherent need to employ someone for the sake of employing them, in this case, it is useful for maintaining your talent pool. For a company like AMD, the engineers' work cycles are most likely punctuated with periods of high demand and low demand. When you have fewer product lines, this means there could be periods of time where they have no work to do at all while waiting for work to be completed farther up the line. Even a small loss is better than paying a chunk of employees to do nothing while waiting for the next thing to come down the pipe. Having more product lines allows you to even out such lulls by staggering releases and thus filling in the gaps from one product line with work from another.

    As an example, if the employees responsible for the low power line's layout only worked the low power line, many of them would have been left with nothing to do as they were waiting for the Jaguar architecture to be developed and simulated. Tweaks to the bobcat layout and preparations for the next node change would have kept some of them busy, but it is quite likely that many found work in the mainstream or even FX lines in the interim.
  • TiredOldFart2 - Tuesday, June 18, 2013 - link

    If you take a look at AMD's assets, their portfolio and their current situation, it's not hard to see where they are headed. Money is usually made in the middle of the road. By this I mean most sales for enterprise class server CPUs in this economic scenario will target a balance of sufficient computing power, price and power consumption.

    What would you, as a business owner, opt for in your average VM server for your average medium business needs? The $600, 95W E5-2630 or the $290, 115W Opteron 6320? I won't even discuss the different standards when it comes to the TDP rating both companies have, it's a matter of cold hard cash.

    AMD will sell cheap, will move faster while listening to clients, will take more risks on niche markets, will leverage their gpu technologies onto the server market to make up for their less than stellar fpu performance.

    How big is the HPC market compared to the SME one?
  • JPForums - Tuesday, June 18, 2013 - link

    I'm not sure this is the correct time, but I do think that eventually we will see a merger or at least closer alignment of the FX line and the A series products. Consider that since before the bulldozer architecture was conceptualized, AMD had been looking to fuse the CPU and GPU into one chip. They wanted to allow people to program code for the "GPU" portions of the processor as easily as the "CPU" portions and even within the same code blocks. They've steadily (if slowly) progressed towards this goal since then culminating in their current HSA and hUMA technologies. When looked at from this perspective, the subpar floating point performance of bulldozer and its derivatives makes sense. If you have a set of "GPU" cores or "stream processors" available to handle floating point operations, then it seems less necessary to include them in the CPU cores.

    Unfortunately, this merger is taking longer than AMD's initial expectations. Even if AMD's intention was to leverage discrete GPUs, in the meantime, to cover the floating point gap, software hasn't yet progressed to the point to make it happen. For the moment, a GPU-less part is necessary to serve higher performance sectors. Eventually, though, I do expect to see GPU-ish elements in their high end parts to handle parallel operations and possibly augment the floating point characteristics of the processors. At that point, the transistors dedicated to the "GPU" portion will no longer be useless die space with regard to CPU performance. Such processors would have a much easier time with voice recognition, facial recognition, pattern recognition, neural algorithms (A.I. learning), etc.
  • JDG1980 - Tuesday, June 18, 2013 - link

    AMD can't expect third-party code to be rewritten to accommodate their processors. If they can leverage the GPU for floating point, then fine, but it has to work seamlessly with existing CPU opcodes. In other words, the APU has to *internally* see that a stream of (say) SSE2 floating point instructions is coming, and hand that off to the GPU portion, without requiring anything to be recoded.

    AMD doesn't have the market share to tell software vendors to do things their way.
  • Shadowmaster625 - Tuesday, June 18, 2013 - link

    That 2013 picture is some scary schiznit! Last time I went to a concert that was what it was like too. Those screens are right out of some science fiction horror novel. It is amazing what people cannot see, even when it is so plainly obvious.
  • bji - Tuesday, June 18, 2013 - link

    Yeah, people are more focused on burying their noses in their phones and capturing the moment than actually living the moment. I don't have a smart phone and I notice that I pay a lot more attention to what I'm actually doing than most people most of the time. I don't know why people think it's necessary to make a crappy smart phone recording of an event when you can almost certainly buy a professionally made recording of almost any important event after the fact for a few bucks.
  • silverblue - Tuesday, June 18, 2013 - link

    "Andrew Feldman told us that Berlin will offer at least twice CPU processing performance than the Opteron X-series."

    I'd damn well hope it was a lot more than this. If it's clocked at twice the speed then Berlin will be forgettable, however if the comparison is with Berlin clocked at, say, 3GHz, that's not so bad.

    All non-BD AMD architectures seem to scale very well with additional cores, and this is the main area that SR looks to improve upon.
  • nismotigerwvu - Tuesday, June 18, 2013 - link

    Small typo on page 4, on the very first sentence," The current Opteron 4310 EE (2 modules, 4 cores at 2.2-3 GHz, 40W TDP) and Opteron 4376 HE (4 modules, 89 cores at 2.6-3.6 GHz, 65W TDP) are about the best AMD can deliver for low power servers that need some more processing power." Unless I'm mistaken (which an 89 core chip would be pretty sweet, especially at just 65 watts) that should read 8 core. Otherwise great read Johan.
