Today is the big day in 2019 for Intel’s Enterprise product announcements, combining some products that should be available from today and a few others set to be available in the next few months. Rather than go for a staggered approach, we have it all in one: processors, accelerators, networking, and edge compute. Here’s a quick run-down of what’s happening today, along with links to all of our deeper dive articles, our reviews, and announcement analysis.

Cascade Lake: Intel’s New Server and Enterprise CPU

The headliner for this festival is Intel’s new second-generation Xeon Scalable processor, Cascade Lake. This is the processor that Intel will promote heavily across its enterprise portfolio, especially as OEMs such as Dell, HP, Lenovo, Supermicro, QCT, and others all update their product lines with the new hardware. (You can read some of the announcements here: Dell on AT, Supermicro on AT, Lenovo on AT, Lenovo at Lenovo.)

While these new CPUs do not use a new microarchitecture compared to the first-generation Skylake-based Xeon Scalable processors, Intel surprised most of the press at its Tech Day with the sheer number of improvements in other areas of Cascade Lake. Not only are there more hardware mitigations against Spectre and Meltdown than we expected, but we also get Optane DC Persistent Memory support. The high-volume processors get a performance boost from up to 25% more cores, and every processor gets double the memory support (and faster memory, too). The latest manufacturing technologies allow for frequency improvements, which, when combined with new AVX-512 modes, show some drastic increases in machine learning performance for those who can use them.
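
The headline AVX-512 addition is VNNI (Vector Neural Network Instructions), which collapses the multiply, widen, and accumulate steps of an INT8 dot product into a single instruction. As a rough NumPy sketch of the per-lane arithmetic of VPDPBUSD (a conceptual emulation, not the intrinsic itself):

```python
import numpy as np

def vpdpbusd_emulated(acc: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Emulate AVX-512 VNNI's VPDPBUSD: multiply groups of four unsigned
    8-bit values from `a` with four signed 8-bit values from `b`, sum each
    group of products, and accumulate into 32-bit lanes of `acc`."""
    a32 = a.astype(np.int32).reshape(-1, 4)   # u8 inputs, widened
    b32 = b.astype(np.int32).reshape(-1, 4)   # s8 inputs, widened
    return acc + (a32 * b32).sum(axis=1)

# 16 int32 lanes = one 512-bit register; 64 bytes of input per operand.
acc = np.zeros(16, dtype=np.int32)
a = np.random.randint(0, 256, 64).astype(np.uint8)    # e.g. activations
b = np.random.randint(-128, 128, 64).astype(np.int8)  # e.g. weights
print(vpdpbusd_emulated(acc, a, b))
```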

Intel Xeon Scalable: 2nd Gen Cascade Lake vs. 1st Gen Skylake-SP

| | 2nd Gen Cascade Lake | 1st Gen Skylake-SP |
|---|---|---|
| Released | April 2019 | July 2017 |
| Cores | [8200] Up to 28, [9200] Up to 56 | [8100] Up to 28 |
| Cache | 1 MB L2 per core, up to 38.5 MB shared L3 | 1 MB L2 per core, up to 38.5 MB shared L3 |
| PCIe 3.0 | Up to 48 lanes | Up to 48 lanes |
| DRAM Support | Six channels, up to DDR4-2933, 1.5 TB standard | Six channels, up to DDR4-2666, 768 GB standard |
| Optane Support | Up to 4.5 TB per processor | - |
| Vector Compute | AVX-512 VNNI with INT8 | AVX-512 |
| Spectre/Meltdown Fixes | Variants 2, 3, 3a, 4, and L1TF | - |
| TDP | [8200] Up to 205 W, [9200] Up to 400 W | Up to 205 W |

New to the Xeon Scalable family is the AP line of processors. Intel hinted at these late last year, but we finally have some of the details. The new Xeon Platinum 9200 family combines two 28-core dies into a single package, offering up to 56 cores and 112 threads with 12 channels of memory, in a thermal envelope of up to 400 W. This is essentially a 2P configuration on a single chip, and is designed for high-density deployments. These BGA-only CPUs will only be sold with an underlying Intel-designed platform straight from OEMs, and will not have a direct price – customers will pay for ‘the solution’, rather than the product.

For this generation, Intel will not be producing models with ‘F’ Omnipath fabric on board. Instead, users will have ‘M’ models with 2 TB memory support and ‘L’ models with 4.5 TB memory support, aimed at the Optane markets. There will also be other letter designations, some of them new:

  • M = Medium Memory Support (2.0 TB)
  • L = Large Memory Support (4.5 TB)
  • Y = Speed Select Models (see below)
  • N = Networking/NFV Specialized
  • V = Virtual Machine Density Value Optimized
  • T = Long Life Cycle / Thermal
  • S = Search Optimized

Out of all of these, the Speed Select ‘Y’ models are the most interesting. These have additional power monitoring tools that allow applications to be pinned to certain cores that can boost higher than others, distributing the available power budget across the chip based on what needs to be prioritized. These parts also allow for three different OEM-specified base and turbo frequency settings, so that one system can be focused on three different types of workloads.
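
Intel did not detail a software interface alongside the announcement, so purely as an illustration: on Linux, pinning a latency-critical application to designated high-priority cores is a simple affinity operation. A minimal sketch, where the core IDs are hypothetical placeholders for whichever cores a Speed Select profile favors:

```python
import os

# Hypothetical: assume cores 0-3 are the high-priority, higher-boost
# cores in this system's Speed Select profile.
PRIORITY_CORES = {0, 1, 2, 3}

def pin_to_priority_cores(pid: int = 0) -> None:
    """Restrict a process (default: the current one) to the cores the
    platform is configured to boost the highest. Linux-only API."""
    os.sched_setaffinity(pid, PRIORITY_CORES)

if __name__ == "__main__":
    pin_to_priority_cores()
    print("Now running on cores:", sorted(os.sched_getaffinity(0)))
    # Latency-critical work started from here inherits the affinity mask.
```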

We are currently in the process of writing our main review, and plan to tackle the topic from several different angles in a number of stories. Stay tuned for that. In the meantime, the SKU lists and our launch day news can be found here:

The Intel Second Generation Xeon Scalable:
Cascade Lake, Now with Up To 56-Cores and Optane!

The other key element to the processors is the Optane support, discussed next.

Optane DCPMM: Data Center Persistent Memory Modules

If you’re confused about Optane, you are not the only one.

Broadly speaking, Intel has two different types of Optane: Optane Storage, and Optane DIMMs. The storage products have already been in the market for some time, both in consumer and enterprise, showing exceptional random access latency above and beyond anything NAND can provide, albeit for a price. For users who can amortize the cost, it makes for a great product.

Optane in the memory module form factor works over the standard DDR4 interface using Intel’s DDR-T protocol. The product is aimed at the enterprise market, and while Intel has talked about ‘Optane DIMMs’ for a while, today is the ‘official launch’. Select customers are already testing and using it, while general availability is due in the next couple of months.


Me with a 128 GB module of Optane. Picture by Patrick Kennedy

Optane DC Persistent Memory, to give it its official title, comes in a DDR4 form factor and works with Cascade Lake processors to enable large amounts of memory in a single system – up to 6 TB in a dual-socket platform. Optane DCPMM is slightly slower than traditional DRAM, but allows for a much higher memory density per socket. Intel is set to offer modules in three sizes: 128 GB, 256 GB, and 512 GB. Optane doesn’t replace DDR4 entirely – you need at least one module of standard DDR4 in the system to get it to work (it acts like a buffer), but it means customers can pair 128 GB of DDR4 with 512 GB of Optane for 640 GB total, rather than looking at 256 GB of pure DDR4 backed with NVMe.

With Optane DCPMM in a system, it can be used in two modes: Memory Mode and App Direct.

The first mode is the simplest to think about: the system sees the Optane capacity as one large DRAM allocation, but in reality it uses the Optane DCPMM as the main memory store and the DDR4 as a buffer in front of it. If the buffer already contains the data needed, reads and writes happen at standard DRAM speed; if the data has to come from the Optane, access is slightly slower. How this is negotiated is handled between the DDR4 controller and the controller on the Optane DCPMM module, and it ultimately works well for large-memory installations, rather than keeping everything in slower NVMe.
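
As a conceptual model only – the real negotiation happens in the memory controllers, invisible to software – Memory Mode behaves like a direct-mapped DRAM cache in front of a much larger Optane store. A toy sketch of that behavior:

```python
class MemoryModeModel:
    """Toy model of Optane Memory Mode: the OS sees only the large
    Optane capacity, while DRAM silently caches recently used lines."""

    def __init__(self, dram_lines: int):
        self.dram_lines = dram_lines      # near-memory (DRAM) capacity
        self.dram = {}                    # slot -> (tag, data)
        self.optane = {}                  # backing store, full capacity

    def read(self, addr: int):
        slot = addr % self.dram_lines     # direct-mapped placement
        tag, data = self.dram.get(slot, (None, None))
        if tag == addr:
            return data, "fast (DRAM hit)"
        data = self.optane.get(addr, 0)   # slower far-memory access
        self.dram[slot] = (addr, data)    # fill the DRAM cache line
        return data, "slower (Optane miss, now cached)"

    def write(self, addr: int, data) -> None:
        self.optane[addr] = data                       # full-capacity store
        self.dram[addr % self.dram_lines] = (addr, data)

mem = MemoryModeModel(dram_lines=4)
mem.write(17, "hot data")
print(mem.read(17))  # served from the DRAM buffer
print(mem.read(21))  # same slot (21 % 4 == 17 % 4): evicts 17, hits Optane
```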

The second mode is App Direct. In this instance, the Optane acts like a big storage drive that is as fast as a RAM disk. This disk, while not bootable, will keep the data stored on it between startups (an advantage of the memory being persistent), enabling very quick restarts to avoid serious downtime. App Direct mode is a little more esoteric than ‘just a big amount of DRAM’, as developers may have to re-architect their software stack in order to take advantage of the DRAM-like speeds this disk will enable. It’s essentially a big RAM disk that holds its data. (ed: I’ll take two)
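
The usual route to App Direct is mapping the persistent region directly into the application’s address space, for example via a file on a DAX-aware filesystem. A minimal sketch of the idea – the mount point is a hypothetical example, and production code would typically go through a library such as PMDK rather than raw mmap:

```python
import mmap
import os

# Hypothetical mount point for an App Direct region exposed as a
# DAX filesystem (e.g. ext4 or xfs mounted with -o dax).
PMEM_FILE = "/mnt/pmem0/app_state.bin"
SIZE = 64 * 1024 * 1024  # 64 MB region

fd = os.open(PMEM_FILE, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)

# Map the region; loads and stores now target persistent memory directly.
buf = mmap.mmap(fd, SIZE)
buf[0:5] = b"hello"  # ordinary memory writes...
buf.flush()          # ...flushed so they survive a restart
buf.close()
os.close(fd)
```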

One of the issues when Optane was first announced was whether it would support enough read/write cycles to act as DRAM, given that the same technology was also being used for storage. To alleviate fears, Intel is going to guarantee every Optane module for three years, even if that module is run at peak writes for the entire warranty period. Not only does this mean Intel is placing real faith in its own product, it even convinced the very skeptical Charlie from SemiAccurate, who has been a long-time critic of the technology (mostly due to the lack of pre-launch information, but he seems satisfied for now).
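
To put that guarantee in perspective, a quick back-of-the-envelope calculation; the sustained write bandwidth used below is an assumed round number for illustration, not a published Intel specification:

```python
# Assumption for illustration only: ~2 GB/s of sustained writes per module.
write_bw_bytes_per_s = 2e9
seconds_in_3_years = 3 * 365 * 24 * 3600  # ~9.46e7 seconds

total_written = write_bw_bytes_per_s * seconds_in_3_years
print(f"Data written over the warranty: {total_written / 1e15:.0f} PB")

# For a 512 GB module, that corresponds to this many full overwrites:
module_bytes = 512e9
print(f"Full-module overwrites: {total_written / module_bytes:,.0f}")
```

Under that assumption, a single module would absorb roughly 189 PB of writes, or several hundred thousand full overwrites, within the warranty period – well beyond what NAND-based products are rated for.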

Pricing for Intel’s Optane DCPMM is undisclosed at this point. The official line is that there is no specific MSRP for the different sized modules – pricing is likely to depend on which customers buy into the platform, in what volume, with what level of support, and how Intel might work with them to optimize the setup. We’re likely to see cloud providers offer instances backed by Optane DCPMM, and OEMs like Dell say they have systems planned for general availability in June. Dell stated that it expects users who can take advantage of the large memory mode to start using it first, with those who might be able to accelerate a workflow with App Direct mode taking some time to rewrite their software.

It should be noted that not all of Intel's second-generation Xeon Scalable CPUs support Optane DCPMM: only the Xeon Platinum 8200, Xeon Gold 6200, and Xeon Gold 5200 families, plus the Xeon Silver 4215, do. The Xeon Platinum 9200 family does not.

Intel has given us remote access into a couple of systems with Optane DCPMM installed. We’re still going through the process of finding the best way to benchmark the hardware, so stay tuned for that.

Intel Agilex: The New Breed of Intel FPGA

The acquisition of Altera was big news for Intel. The idea was to introduce FPGAs into Intel’s product family and eventually realize a number of synergies between the two, integrating the portfolio while also taking advantage of Intel’s manufacturing facilities and corporate sales channels. Despite the acquisition closing in 2015, every FPGA product released since was developed prior to it, before the integration of the two companies – until today. The new Agilex family of FPGAs is the first developed and produced wholly under the Intel name.

The announcement for Agilex is today; however, the first 10nm samples will not be available until Q3. The role of the FPGA has been evolving of late, from general-purpose spatial compute hardware to offering hardened accelerators and enabling new technologies. With Agilex, Intel aims to offer that mix of acceleration and configurability, not only with the core array of gates, but also by virtue of additional chiplet extensions enabled through Intel’s Embedded Multi-Die Interconnect Bridge (EMIB) technology. These chiplets can be custom third-party IP, PCIe 5.0, HBM, 112G transceivers, or even Intel’s new Compute eXpress Link cache-coherent interconnect. Intel is promoting up to 40 TFLOPs of DSP performance, and is promoting its use in mixed-precision machine learning, with hardened support for bfloat16 and INT2 to INT8.
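
bfloat16 keeps float32’s sign bit and full 8-bit exponent but truncates the mantissa to 7 bits, so a bfloat16 value is literally the top 16 bits of a float32. A quick sketch of the conversion (simple truncation is shown for clarity; hardened hardware would typically round):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16: keep the sign bit, the full 8-bit
    exponent, and the top 7 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(bits: int) -> float:
    """Expand bfloat16 back to float32 by zero-filling the low mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return x

for v in (3.14159, 1e20, -0.001):
    b = float32_to_bfloat16_bits(v)
    print(f"{v:>12} -> 0x{b:04x} -> {bfloat16_bits_to_float32(b)}")
```

Because the exponent is unchanged, bfloat16 keeps float32’s full dynamic range and sacrifices only precision, which is why it has become popular for machine learning.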

Intel will be launching Agilex in three product families: F, I, and M, in that order of both time and complexity. The Intel Quartus Prime software to program these devices will be updated for support during April, but the first F models will be available in Q3.

Columbiaville: Going for 100GbE with Intel 800-Series Controllers

Intel currently offers a lot of 10 gigabit and 25 gigabit Ethernet infrastructure in the data center. The company launched 100G Omnipath a few years ago as an early alternative, and is looking towards a second generation of Omnipath to double that speed. In the meantime, Intel has developed and is set to launch Columbiaville, its controller offering for the 100G Ethernet market, branded as the Intel 800-Series.

Introducing faster networking to data center infrastructure is certainly a positive, and Intel is keen to promote a few new technologies with the product. Application Device Queues (ADQ) dedicate and prioritize hardware queues for specific applications to ensure consistent performance, while Dynamic Device Personalization (DDP) makes the packet-processing pipeline programmable, so unique networking setups can add functionality and/or security at the controller level.
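
Intel hasn’t published the software-side details here, but the general idea of ADQ is that an application’s traffic is steered onto dedicated, prioritized queues. Purely as a loose illustration of the application side (not Intel’s ADQ API): on Linux, a program can tag its socket traffic with a priority that the queueing layers, and priority-aware NICs, can honor:

```python
import socket

# Linux-only socket option; 12 is SO_PRIORITY in <asm/socket.h>.
SO_PRIORITY = getattr(socket, "SO_PRIORITY", 12)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Tag this socket's packets as high priority so traffic classification
# (and, with ADQ-style NICs, dedicated hardware queues) can service
# them ahead of best-effort flows.
sock.setsockopt(socket.SOL_SOCKET, SO_PRIORITY, 6)

print("Priority:", sock.getsockopt(socket.SOL_SOCKET, SO_PRIORITY))
sock.close()
```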

The dual-port 100G card will be called the E810-CQDA2, and we’re still waiting on information about the chip: die size, cost, process, etc. Intel states that its 100 GbE offerings will be available in Q3.

Xeon D-1600: A Generational Efficiency Improvement for Edge Acceleration

One of Intel’s key product areas is the edge, in terms of both compute and networking. One of the products Intel has focused on this area is Xeon D, which covers both high-efficiency compute with accelerated networking and cryptography (D-1500) and high-throughput compute with the same acceleration (D-2100). The former is Broadwell-based, while the latter is Skylake-based. Intel’s new Xeon D-1600 is a direct D-1500 successor: a true single-die solution that takes advantage of frequency and efficiency gains from a more mature version of the same manufacturing process as the D-1500, allowing Intel’s partners to easily drop in the new version without many functional changes.

Comments

  • rahvin - Tuesday, April 2, 2019 - link

    16GB is more than enough for a standard Linux install, but you need almost 30GB for an unmodified base Windows 10 install.
  • Diogene7 - Tuesday, April 2, 2019 - link

    For a better overall customer experience, I think at least 64GB could be needed in the coming years to have full Windows + some of the most used applications being stored in the RAMDisk.

    If you want to also store some big applications (like games) and some videos data (like 4K / 8K videos) to add more responsiveness, I think that you can quickly need 256GB of RAMDisk storage...
  • abufrejoval - Tuesday, April 2, 2019 - link

    well since this is a cache, the difference may not be that big: Much of the bloat in Windows is perhaps just stuff that rarely ever gets used but is also far from the critical path during reboots or normal operations. BTW, optimizing boot is optimizing a failure, and using persistent memory to support zero power standby is the much more attractive usage IMHO.
  • Diogene7 - Wednesday, April 3, 2019 - link

    @abufrejoval : I am interested in both using Persistent Memory (PM) for importantly lowering the latency of software launch & most used data access, AND also to support zero power standby, as the combination of both has the potential to importantly increase consumer end user experiences.

    In theory, with the development of <10ns very low latency persistent memory like Spin Orbit Torque - Magnetic Random Access Memory (SOT-MRAM) / Spin Torque Transfer - MRAM (STT-MRAM), and also <1000ns low latency Storage Class Memory (SCM), and also the development of technologies like 3D System-On-Chip (3D-SoC), we may at the horizon of 2025 begin to see chips combining compute logic + several Gigabytes of MRAM cache memory replacing L2 / L3 SRAM cache memory, and several 100’s of Gigabytes / a few Terabytes of SCM replacing storage, and no need for DRAM: I do believe that it could importantly lower latency and power consumption, and in the end, importantly increase the user experience, but one of the challenges is to be able to do this in a way that is reasonably cost competitive...

    It is really the kind of innovation I would like to see happening as soon as possible in smartphones, but as of 2019, smartphone manufacturers are more investing in flexible displays (Huawei Mate X) which should cost ~2000€ at launch, so roughly 1000€ more than Huawei premium P30Pro smartphone...

    On a personal basis, I would have no issue paying 1500€ (so a 500€ premium) for a much lower latency / zero power standby smartphone that would provide a much better consumer experience than what we have nowadays...
  • abufrejoval - Wednesday, April 3, 2019 - link

    I like the notion of NV-RAM to enable energy proportional compute on the server side. Terabytes of Memristor NV-RAM really got me excited, especially because they promised stacking at linear cost and no issues with energy density. Not sure the other technology will be able to deliver quite what the memristor failed to provide.

    But I see the smartphone as the least of worries or least to improve. Their mobile DRAM, even when active seems a minor energy draw compared to all the on-screen time batteries have to support and quite unnoticeable in standby: With networks all shut off, I’ve seen my Androids last weeks without charging on suspended DRAM, while I never need them to last more than a day without charging.

    The only reason they ever seem to commit application state to flash is that they run out of memory. I still manage that on really old devices like my Nexus 10, which combines a high-resolution display with just 1GB of OS usable DRAM.

    On my €500 phone with 8GB I have had real trouble just trying to reach the 50% mark. I only managed to fill significant parts by running a Ubuntu userland in a chroot() container with a full Mate desktop running a major compile job with CC-Cache via X2Go from my desktop.

    Unless we’re talking games or HPC I actually have zero performance complaints about my smartphone, even if as an 835 it’s already two generations behind. Adblockers and zero Facebook tolerance seem to keep CPU cycle suckers away.

    My major gripe there is that I’d really like to use it as a desktop and need external screen, an Ethernet port (security) and proper software support for desktop mode and dynamic DPI.

    DRAM power consumption is a concern on really small IoT devices that need to last a decade on a small battery or things like a pace maker. That’s where MRAM may have a real impact, especially because you can get both bigger caches on the same process node than using SRAM and you save transferring the cache contents for a logic that feels almost non-volatile itself.

    And it’s an issue on large servers in the data centers, where I have seen DRAM consume more power than the CPUs. I wonder if some of the power saving techniques invented for notebooks and mobile devices can be or have been applied to server RAM just yet: Reactivating DRAM from standby may simply be too problematic for server latencies.

    There NV-RAM mostly allows getting rid of all those HDD latencies, getting more compute into your 100ms interactive response time slot. And it allows putting those servers you cannot sell off as excess compute capacity like AWS does into a real standby, where they a) consume much less power than the 50% idle power that still seems normal today and b) come back into full service within milliseconds not minutes and allow a much more energy proportional computing.

    Servers may need to resume in a few milliseconds before we can allow them to go idle, because response times are why we buy them or they might be doing thousands of transactions in the blink of an eye. But my phone only needs to be as fast as I am and I don’t notice nanoseconds unless you accumulate millions of them.

    If we can get NV-RAM at DRAM latency, density and cost (or better), I’ll be the last one to be sorry, but in the mean-time I’ll just be happy to get two out of three and better than spinning rust.

    But even if Terabytes of NV-RAM cost zero power to maintain state and fit into a mobile phone form factor, they may be useless to keep around if you can’t ever do a single full scan without running out of juice or patience. At 25MB/sec or current mobile memory speeds that would take eleven hours and way more energy just for the CPU than any current battery could provide.

    That's why, as you hint, NV-RAM needs to do part of the compute itself to be useful.
  • DigitalFreak - Tuesday, April 2, 2019 - link

    The 56 core CPU is 400w! Holy shite
  • abufrejoval - Tuesday, April 2, 2019 - link

    you have 200 Watt Skylake SKUs today, it's really just two of those under a single hood. Package density is all it is.
  • xrror - Tuesday, April 2, 2019 - link

    "it even convinced the very skeptical Charlie from SemiAccurate" ....

    (checks calendar again to make sure it's still not the 1st)

    wow. And I'm not being sarcastic.
