Using a Mobile Architecture Inside a 145W Server Chip

About 15 months after the appearance of the Haswell core in desktop products (June 2013), the "optimized-for-mobile" Haswell architecture is now being adopted into Intel server products.

Left to right: LGA1366 (Xeon 5600), LGA2011 (Xeon E5-2600v1/v2) and LGA2011v3 (E5-2600v3) socket. 

Haswell is Intel's fourth tock, a new architecture on the same successful 22nm process technology (the famous P1270 process) that was used for Ivy Bridge EP, the Xeon E5-2600 v2. Anand discussed the new Haswell architecture in great detail back in 2012, but as a refresher, let's quickly go over the improvements that the Haswell core brings.

Very little has changed in the front-end of the core compared to Ivy Bridge, with the exception of the usual branch prediction improvements and enlarged TLBs. As you might recall, it is the back-end, the execution part, that is largely improved in the Haswell architecture:

  • Larger OoO Window (192 vs 168 entries)
  • Deeper Load and Store buffers (72 vs 64, 42 vs 36)
  • Larger scheduler (60 vs 54)
  • The big splash: 8 instead of 6 execution ports: more execution resources for store address calculation, branches and integer processing.
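To put the figures in the list above in proportion, the short Python sketch below computes how much each structure grew relative to Ivy Bridge. The dictionary keys are only illustrative labels, not Intel terminology.

```python
# Back-end resources as (Haswell, Ivy Bridge), taken from the list above.
resources = {
    "OoO window": (192, 168),
    "load buffers": (72, 64),
    "store buffers": (42, 36),
    "scheduler entries": (60, 54),
    "execution ports": (8, 6),
}


def growth_percent(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


growth = {name: growth_percent(hsw, ivb) for name, (hsw, ivb) in resources.items()}

for name, pct in growth.items():
    print(f"{name:18s} +{pct:.1f}%")
```

The buffers and the scheduler each grow by roughly 11 to 17%, while the port count grows by a third, which is why the extra ports are the headline change.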

All in all, Intel calculated that integer processing at the same clock speed should be about 10% better than on Ivy Bridge (Xeon E5-2600 v2, launched September 2013), 15-16% better than on Sandy Bridge (Xeon E5-2600, March 2012), and 27% better than on Nehalem (Xeon 5500, March 2009).
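Because all of these figures are quoted relative to Haswell, the step between any two older generations can be backed out by division. The sketch below does that under two assumptions of ours, not Intel's: that 15.5% is a fair midpoint for the "15-16%" range, and that the per-generation gains simply multiply. Treat the output as a rough estimate.

```python
# Haswell's same-clock integer speedup over each older core, from the figures
# quoted above. 1.155 is an assumed midpoint of the "15-16%" range.
haswell_over = {
    "Ivy Bridge": 1.10,
    "Sandy Bridge": 1.155,
    "Nehalem": 1.27,
}


def implied_step(newer, older):
    """Speedup of `newer` over `older`, assuming the ratios simply multiply."""
    return haswell_over[older] / haswell_over[newer]


ivb_over_snb = implied_step("Ivy Bridge", "Sandy Bridge")
snb_over_nhm = implied_step("Sandy Bridge", "Nehalem")

print(f"Ivy Bridge over Sandy Bridge: +{(ivb_over_snb - 1) * 100:.1f}%")
print(f"Sandy Bridge over Nehalem:    +{(snb_over_nhm - 1) * 100:.1f}%")
```

Under those assumptions the implied steps are about 5% for Ivy Bridge and about 10% for Sandy Bridge, so Haswell's 10% is in line with a typical tock.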

Even better performance improvements can be achieved by recompiling software and using the AVX2 SIMD instructions. The original AVX ISA extension was mostly about speeding up floating point intensive workloads, but AVX2 makes the SIMD integer instructions capable of working with 256-bit registers.
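As a quick illustration of what the wider registers mean for integer code, the sketch below counts how many elements a single SIMD instruction can process at 128 bits (the integer SIMD width before AVX2) and at 256 bits. It is plain arithmetic and does not execute any AVX2 instructions.

```python
SSE_BITS = 128   # integer SIMD register width before AVX2
AVX2_BITS = 256  # integer SIMD register width with AVX2


def lanes(register_bits, element_bits):
    """How many elements of `element_bits` fit in one register."""
    return register_bits // element_bits


for element_bits in (8, 16, 32, 64):
    print(f"int{element_bits}: {lanes(SSE_BITS, element_bits)} lanes -> "
          f"{lanes(AVX2_BITS, element_bits)} lanes")
```

The lane count doubles for every element size, but real-world gains depend on whether the code can be vectorized and on memory bandwidth.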

Unfortunately, in a virtualized environment, these ISA extensions are sometimes more of a curse than a blessing. Running AVX/SSE (and other ISA extensions) code can disable the best virtualization features such as high availability, load balancing, and live migration (vMotion). Therefore, administrators will typically force CPUs to "keep quiet" about their newest ISA extensions (VMware EVC). So if you want to integrate a Haswell EP server into an existing Sandy Bridge EP server cluster, none of the new features that were absent from Sandy Bridge EP, including AVX2, are available. The result is that in virtualized clusters, ISA extensions are rarely used.
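Conceptually, this kind of masking boils down to a set intersection: guests only get to see the features that every host in the cluster supports. The sketch below is a toy model of that idea, not VMware's implementation or API. The host names are hypothetical and the feature sets are heavily simplified.

```python
# Simplified, illustrative feature sets; real CPUs expose many more flags.
SANDY_BRIDGE_EP = {"sse4.2", "aes", "avx"}
HASWELL_EP = SANDY_BRIDGE_EP | {"avx2", "fma", "bmi1", "bmi2"}


def cluster_baseline(hosts):
    """Features that every host supports, i.e. what a guest may safely see."""
    return set.intersection(*hosts.values())


def hidden_features(host_features, baseline):
    """Features a host has but must keep quiet about."""
    return host_features - baseline


# Hypothetical two-node cluster mixing both generations.
hosts = {"old-node": SANDY_BRIDGE_EP, "new-node": HASWELL_EP}
baseline = cluster_baseline(hosts)
hidden = hidden_features(HASWELL_EP, baseline)

print("baseline:", sorted(baseline))
print("hidden on new-node:", sorted(hidden))
```

In this model a single older host is enough to pull the whole cluster down to its feature level, so the Haswell node's AVX2 and FMA stay invisible to guests.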

Instead, AVX2 code will typically run on a "native" OS. The best known use of AVX2 code is inside video encoders. However, the technology might still prove to be more useful to enterprises that don't work with pixels but with business data. Intel has demonstrated that the AVX2 instructions can also be used for accelerating the compression of data inside in-memory databases (SAP HANA, Microsoft Hekaton), so the integer flavor of AVX2 might become important for fast and massive data mining applications.

Last but not least, the new bit field manipulation instructions and the use of 256-bit registers can speed up quite a few cryptographic algorithms. Large websites will probably be the datacenter application that benefits most quickly from AVX2. Simply using the right libraries might speed up RSA-2048 (opening a secure connection), SHA-256 (hashing), and AES-GCM. We will discuss this in more detail in our performance review.

Floating point

Floating point code should benefit too, as Intel has finally included Fused Multiply Add (FMA) instructions. Peak FLOP performance is doubled once again. This should benefit a whole range of HPC applications, which also tend to be recompiled much more quickly than traditional server applications. The L1 and L2 cache bandwidth has also been doubled to better cope with the needs of AVX2 instructions.
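The doubling can be reproduced with a back-of-the-envelope calculation. The sketch below assumes the commonly cited issue rates, which are not stated in this article: one 256-bit add plus one 256-bit multiply per cycle on Ivy Bridge, and two 256-bit FMAs per cycle on Haswell. It counts double-precision operations per core per cycle.

```python
AVX_BITS = 256
DOUBLE_BITS = 64
LANES = AVX_BITS // DOUBLE_BITS  # doubles per 256-bit register


def peak_flops_per_cycle(units, flops_per_lane):
    """Peak double-precision FLOPs per cycle for one core."""
    return units * LANES * flops_per_lane


# Assumed: Ivy Bridge issues one 256-bit add and one 256-bit multiply per cycle.
ivy_bridge = peak_flops_per_cycle(units=2, flops_per_lane=1)

# Assumed: Haswell issues two 256-bit FMAs per cycle; each FMA counts as a
# multiply plus an add, so two FLOPs per lane.
haswell = peak_flops_per_cycle(units=2, flops_per_lane=2)

print(f"Ivy Bridge: {ivy_bridge} DP FLOPs/cycle/core")
print(f"Haswell:    {haswell} DP FLOPs/cycle/core")
```

The peak is only reached when code can pair every multiply with an add, so sustained gains on real HPC workloads will generally be smaller.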


85 Comments


  • coburn_c - Monday, September 8, 2014

    MY God - It's full of transistors!
  • Samus - Monday, September 8, 2014

    I wish there were socket 1150 Xeon's in this class. If I could replace my quad core with an Octacore...
  • wireframed - Saturday, September 20, 2014

    If you can afford an 8-core CPU, I'm sure you can afford a S2011 board - it's like 15% of the price of the CPU, so the cost relative to the rest of the platform is negligible. :)
    Also, s1150 is dual-channel only. With that many cores, you'll want more bandwidth.
  • peevee - Wednesday, March 25, 2015

    For many, if not most workloads it will be faster to run 4 fast (4GHz) cores on 4 fast memory channels (DDR4-2400+) than 8 slow (2-3GHz) cores on 2 memory channels. Of course, if your workload consists of a lot of trigonometry (sine/cosine etc), or thread worksets completely fit into 2nd level cache (only 256k!), you may benefit from 8/2 config. But if you have one of those, I am eager to hear what it is.
  • tech6 - Monday, September 8, 2014

    The 18 core SKU is great news for those trying to increase data center density. It should allow VM hosts with 512Gb+ of memory to operate efficiently even under demanding workloads. Given the new DDR4 memory bandwidth gains I wonder if the 18 core dual socket SKUs will make quad socket servers a niche product?
  • Kevin G - Monday, September 8, 2014

    In fairness, quad socket was already a niche market.

    That and there will be quad socket version of these chips: E5-4600v3's.
  • wallysb01 - Monday, September 8, 2014

    My lord. My thought is that this really shows that v3 isn’t the slouch many thought it would be. An added 2 cores over v2 in the same price range and turbo boosting that appears to be functioning a little better, plus the clock for clock improvements and move to DDR4 make for a nice step up when all combined.

    I’m surprised Intel went with an 18 core monster, but holy S&%T, if they can squeeze it in and make it function, why not.
  • Samus - Monday, September 8, 2014

    I feel for AMD, this just shows how far ahead Intel is :\
  • Thermogenic - Monday, September 8, 2014

    Intel isn't just ahead - they've already won.
  • olderkid - Monday, September 8, 2014

    AMD saw Intel behind them and they wondered how Intel fell so far back. But really Intel was just lapping them.
