Cerebras Unveils Wafer Scale Engine Two (WSE2): 2.6 Trillion Transistors, 100% Yield
by Dr. Ian Cutress on April 20, 2021 2:00 PM EST
The last few years have seen a glut of processors enter the market with the sole purpose of accelerating artificial intelligence and machine learning workloads. Because so many different types of machine learning algorithm are possible, these processors often focus on a few key areas, but one thing limits them all: how big you can make the processor. Two years ago Cerebras unveiled a revolution in silicon design: a processor as big as your head, using as much area on a 12-inch wafer as a rectangular design would allow, built on 16nm, and focused on both AI and HPC workloads. Today the company is launching its second generation product, built on TSMC 7nm, with more than double the cores and more than double almost everything else.
Second Generation Wafer Scale Engine
The new processor from Cerebras builds on the first by moving to TSMC’s N7 process. This allows the logic to scale down, as well as the SRAMs to some extent, and the new chip now has 850,000 AI cores on board. Almost everything about the new chip is over 2x:
Cerebras Wafer Scale
AnandTech | Wafer Scale Engine Gen1 | Wafer Scale Engine Gen2 | Increase |
AI Cores | 400,000 | 850,000 | 2.13x |
Manufacturing | TSMC 16nm | TSMC 7nm | - |
Launch Date | August 2019 | Q3 2021 | - |
Die Size | 46225 mm2 | 46225 mm2 | - |
Transistors | 1200 billion | 2600 billion | 2.17x |
(Density) | 25.96 MTr/mm2 | 56.25 MTr/mm2 | 2.17x |
On-board SRAM | 18 GB | 40 GB | 2.22x |
Memory Bandwidth | 9 PB/s | 20 PB/s | 2.22x |
Fabric Bandwidth | 100 Pb/s | 220 Pb/s | 2.20x |
Cost | $2 million+ | arm+leg | ‽ |
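The Increase and Density rows can be recomputed from the other figures in the table. The short Python sketch below does that; the dictionary keys and the helper name are purely illustrative, and the only inputs are the numbers quoted above.

```python
# Figures copied from the table above; die area is identical for both generations.
DIE_AREA_MM2 = 46_225

gen1 = {"cores": 400_000, "transistors": 1.2e12, "sram_gb": 18,
        "mem_bw_pbs": 9, "fabric_pbs": 100}
gen2 = {"cores": 850_000, "transistors": 2.6e12, "sram_gb": 40,
        "mem_bw_pbs": 20, "fabric_pbs": 220}


def density_mtr_per_mm2(transistors: float, area_mm2: float = DIE_AREA_MM2) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


# Generation-over-generation ratio for every row that has a number.
ratios = {key: gen2[key] / gen1[key] for key in gen1}

for key, value in ratios.items():
    print(f"{key:12s} {value:.2f}x")
print(f"Gen1 density {density_mtr_per_mm2(gen1['transistors']):.2f} MTr/mm2")
print(f"Gen2 density {density_mtr_per_mm2(gen2['transistors']):.2f} MTr/mm2")
```

Because the die area is unchanged, the density ratio is the same as the transistor ratio.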
As with the original processor, known as the Wafer Scale Engine (WSE-1), the new WSE-2 features hundreds of thousands of AI cores across a massive 46225 mm2 of silicon. In that space, Cerebras has enabled 2.6 trillion transistors for 850,000 cores. By comparison, the second biggest AI processor on the market is ~826 mm2, with 0.054 trillion transistors. Cerebras also cites 1000x more onboard memory, with 40 GB of SRAM, compared to 40 MB on the Ampere A100.
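For a sense of scale, the comparison in the paragraph above reduces to three ratios. The sketch below computes them from the quoted figures, assuming decimal units (1 GB = 1e9 bytes, 1 MB = 1e6 bytes); the variable names are illustrative only.

```python
# WSE-2 figures and the comparison chip's figures, as quoted in the text.
wse2 = {"area_mm2": 46_225, "transistors": 2.6e12, "sram_bytes": 40e9}
other = {"area_mm2": 826, "transistors": 0.054e12, "sram_bytes": 40e6}

# How many times larger WSE-2 is on each axis.
comparison = {key: wse2[key] / other[key] for key in wse2}

for key, ratio in comparison.items():
    print(f"{key:12s} {ratio:,.0f}x")
```

The memory ratio works out to the 1000x that Cerebras cites, while the area and transistor ratios land at roughly 56x and 48x respectively.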
Me with Wafer Scale Gen1 - looks the same, but with less than half the cores.
The cores are connected with a 2D Mesh with FMAC datapaths. Cerebras achieves 100% yield by designing a system in which any manufacturing defect can be bypassed. Initially Cerebras had 1.5% extra cores to allow for defects, but we’ve since been told this was way too much as TSMC's process is so mature. Cerebras’ goal with WSE is to provide a single platform, designed through innovative patents, that allows for bigger processors useful in AI calculations, and it has also been extended into a wider array of HPC workloads.
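The idea of reaching 100% yield through spare cores can be illustrated with a toy model. Cerebras has not described its routing hardware in this article, so the following is only a sketch under loud assumptions: a single one-dimensional row of cores, a hypothetical function `map_logical_to_physical`, made-up defect counts, and the 1.5% spare figure quoted above.

```python
import random


def map_logical_to_physical(n_physical, defective, n_logical):
    """Assign each logical core index to the next working physical core.

    Toy 1D model (an assumption, not Cerebras' actual mechanism):
    defective cores are simply skipped, and spares absorb the shift.
    """
    working = [p for p in range(n_physical) if p not in defective]
    if len(working) < n_logical:
        raise ValueError("not enough working cores to cover the logical array")
    return working[:n_logical]


rng = random.Random(0)           # fixed seed so the example is repeatable
n_logical = 1000                 # cores the software expects to see
n_physical = 1015                # 1.5% extra cores, as in the first design
defective = set(rng.sample(range(n_physical), 5))  # hypothetical defects

mapping = map_logical_to_physical(n_physical, defective, n_logical)
print(f"{len(mapping)} logical cores mapped, {len(defective)} defects bypassed")
```

As long as the number of defects does not exceed the number of spares, software always sees a full array, which is the sense in which every wafer is usable.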
Building on First Gen WSE
A key to the design is the custom graph compiler, which takes PyTorch or TensorFlow and maps each layer to a physical part of the chip, allowing for asynchronous compute as the data flows through. Having such a large processor means the data never has to go off-die and wait in memory, wasting power, and can continually be moved on to the next stage of the calculation in a pipelined fashion. The compiler and processor are also designed with sparsity in mind, allowing high utilization regardless of batch size, and can also run parameter search algorithms simultaneously.
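The pipelined data flow described above can be loosely pictured in software with chained Python generators: each stage hands an item to the next as soon as it is produced, and no intermediate result is stored as a whole batch. This is only an analogy. The three "layers" below are arbitrary placeholder functions, not anything from Cerebras' compiler, and Python runs the stages one after another rather than in parallel across silicon.

```python
def stage(fn, upstream):
    """One pipeline stage: apply fn to each item as it arrives from upstream."""
    for item in upstream:
        yield fn(item)


def pipeline(inputs, layers):
    """Chain the layers so every item streams through all of them in order."""
    stream = iter(inputs)
    for fn in layers:
        stream = stage(fn, stream)
    return stream


# Placeholder "layers" (assumptions for illustration only).
layers = [lambda x: x * 2, lambda x: x + 1, lambda x: x ** 2]

results = list(pipeline(range(4), layers))
print(results)  # [1, 9, 25, 49]
```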
Cerebras’ first generation WSE is sold as a complete system called CS-1, and the company has several dozen customers with deployed systems up and running, including a number of research laboratories, pharmaceutical companies, biotechnology research, military, and the oil and gas industries. Lawrence Livermore has a CS-1 paired to its 23 PFLOP ‘Lassen’ Supercomputer. Pittsburgh Supercomputing Center purchased two systems with a $5m grant, and these systems are attached to their Neocortex supercomputer, allowing for simultaneous AI and enhanced compute.
Products and Partnerships
Cerebras sells complete CS-1 systems today as a 15U box that contains one WSE-1 along with 12x100 GbE and twelve 4 kW power supplies (6 redundant, peak power about 23 kW). Deployments at some institutions are paired with HPE’s Superdome Flex. The new CS-2 system shares this same configuration, albeit with more than double the cores and double the on-board memory, but still within the same power. Compared to other platforms, these processors are arranged vertically inside the 15U design in order to enable ease of access as well as built-in liquid cooling across such a large processor. It should also be noted that those front doors are machined from a single piece of aluminium.
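The power and networking figures for the box can be sanity-checked with a little arithmetic. The sketch below assumes that "6 redundant" means six of the twelve supplies are enough to carry the full load, which is one reading of the spec rather than something Cerebras has confirmed here; the constant names are illustrative.

```python
# Power supplies, as quoted for the 15U system.
PSU_COUNT = 12
PSU_KW = 4
REDUNDANT_PSUS = 6      # assumption: these are pure spares (6+6 redundancy)
PEAK_KW = 23            # "about 23 kW" per the text

usable_kw = (PSU_COUNT - REDUNDANT_PSUS) * PSU_KW   # capacity without the spares
headroom_kw = usable_kw - PEAK_KW                   # margin at peak load

# Networking: 12 ports of 100 Gb Ethernet.
ETH_PORTS = 12
ETH_GBPS = 100
aggregate_gbps = ETH_PORTS * ETH_GBPS               # gigabits per second
aggregate_gbytes_per_s = aggregate_gbps / 8         # gigabytes per second

print(f"usable power {usable_kw} kW, headroom {headroom_kw} kW")
print(f"network {aggregate_gbps} Gb/s = {aggregate_gbytes_per_s:.0f} GB/s")
```

Under that reading, six supplies provide 24 kW, just above the quoted peak, and the Ethernet links add up to 1.2 Tb/s into the system.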
The uniqueness of Cerebras’ design is being able to go beyond the physical limit normally imposed by manufacturing, known as the reticle limit. Processors are designed with this limit as the maximum size of a chip, as connecting two areas with a cross-reticle connection is difficult. This is part of the secret sauce that Cerebras brings to the table, and the company remains the only one offering a processor on this scale. The same patents that Cerebras developed and was awarded to build these large chips are still in play here, and the second gen WSE will be built into CS-2 systems with a similar design to CS-1 in terms of connectivity and visuals.
The same compiler and software packages, with updates, enable any customer that has been trialling AI workloads with the first system to use the second as soon as they deploy one. Cerebras has been working on higher-level implementations so that customers with standardized TensorFlow and PyTorch models can quickly port their existing GPU code by adding three lines of code and using Cerebras’ graph compiler. The compiler then divides the whole 850,000 cores into segments, one per layer, that allow for data flow in a pipelined fashion without stalls. The silicon can also be used for multiple networks simultaneously for parameter search.
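How a compiler might carve 850,000 cores into per-layer segments is not detailed here, but one plausible way to keep a pipeline free of stalls is to give each layer cores in proportion to its compute cost, so that every stage takes roughly the same time. The sketch below illustrates that idea only. The function `allocate_cores`, the layer names, and the relative costs are all hypothetical, and Cerebras' real placement algorithm is certainly more involved.

```python
def allocate_cores(layer_costs, total_cores=850_000):
    """Split total_cores across layers in proportion to their relative cost.

    Uses largest-remainder rounding so the allocation sums to total_cores.
    This is an illustrative assumption, not Cerebras' actual method.
    """
    total_cost = sum(layer_costs.values())
    exact = {name: total_cores * cost / total_cost
             for name, cost in layer_costs.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = total_cores - sum(alloc.values())
    # Hand the remaining cores to the layers with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical three-layer network with made-up relative compute costs.
layer_costs = {"layer_a": 3, "layer_b": 5, "layer_c": 2}
allocation = allocate_cores(layer_costs)

for name, cores in allocation.items():
    print(f"{name}: {cores:,} cores")
```

With this split, a layer that does 50% of the work gets 50% of the cores, which is the balance a stall-free pipeline needs.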
Cerebras states that having such a large single-chip solution pushes the point at which distributed training across hundreds of AI chips becomes necessary so much further away that this extra complication is not needed in most scenarios. To that end, we’re seeing CS-1 deployments of single systems attached to supercomputers. However, Cerebras is keen to point out that two CS-2 systems will deliver 1.7 million AI cores in a standard 42U rack, or three systems for 2.55 million in a larger 46U rack (assuming there’s sufficient power for all at once!), replacing a dozen racks of alternative compute hardware. At Hot Chips 2020, Chief Hardware Architect Sean Lie stated that one of Cerebras' key benefits to customers was workload simplification: jobs that previously required racks of GPU/TPU can instead run on a single WSE in a computationally relevant fashion.
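The rack figures follow directly from the per-system numbers. The sketch below reproduces them, assuming each CS-2 occupies 15U and peaks at about 23 kW like the CS-1 (the article says the CS-2 stays within the same power); `rack_summary` is a hypothetical helper written for this example.

```python
CORES_PER_SYSTEM = 850_000
SYSTEM_HEIGHT_U = 15
PEAK_KW_PER_SYSTEM = 23   # assumption: CS-2 peak matches the quoted CS-1 peak


def rack_summary(systems, rack_u):
    """Total cores, rack space used and left over, and peak power for a rack."""
    used_u = systems * SYSTEM_HEIGHT_U
    if used_u > rack_u:
        raise ValueError(f"{systems} systems need {used_u}U, rack has {rack_u}U")
    return {
        "cores": systems * CORES_PER_SYSTEM,
        "used_u": used_u,
        "free_u": rack_u - used_u,
        "peak_kw": systems * PEAK_KW_PER_SYSTEM,
    }


for systems, rack_u in [(2, 42), (3, 46)]:
    print(systems, "systems in", rack_u, "U:", rack_summary(systems, rack_u))
```

Three systems fit in a 46U rack with a single rack unit to spare, and would draw close to 70 kW at peak, which is why the power caveat matters.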
As a company, Cerebras has ~300 staff across Toronto, San Diego, Tokyo, and San Francisco. They have dozens of customers already with CS-1 deployed and a number more already trialling CS-2 remotely as they bring up the commercial systems. Beyond AI, Cerebras is getting a lot of interest from typical commercial high performance compute markets, such as oil-and-gas and genomics, because the flexibility of the chip enables fluid dynamics and other compute simulations. Deployments of CS-2 will occur later this year in Q3, and the price has risen from ~$2-3 million to ‘several’ million.
With Godzilla for a size reference
Related Reading
- Cerebras Wafer Scale Engine News: DoE Supercomputer Gets 400,000 AI Cores
- 342 Transistors for Every Person In the World: Cerebras 2nd Gen Wafer Scale Engine Teased
- Cerebras’ Wafer Scale Engine Scores a Sale: $5m Buys Two for PSC
- Hot Chips 2020 Live Blog: Cerebras WSE Programming (3:00pm PT)
- Hot Chips 2019 Live Blog: Cerebras' 1.2 Trillion Transistor Deep Learning Processor
136 Comments
Oxford Guy - Monday, April 26, 2021 - link
The biggest threat to humanity from AI is that we’d be subjected to rational governance for the first time.
GeoffreyA - Monday, April 26, 2021 - link
You know, the cynical side of me agrees with this. I've sometimes imagined that only a computer could govern perfectly and with complete integrity (i.e., no human passion and greed). A perfect Windows NT kernel, governing society to the T. Unfortunately, that computer will be programmed by a set of humans, so there's a high likelihood they'll put in their agendas, like RoboCop's classified fourth directive.
mode_13h - Tuesday, April 27, 2021 - link
This is the same mindset that caused people in the 1950's to predict that further advances in technology would lead to a shorter work week. It's based on a misunderstanding of human nature and modern civilization.
I don't see how it would ever come to pass that a nation of any significant size would agree to being ruled by an AI, without human oversight. Or, that a group of human overseers wouldn't eventually exploit their power in some way.
Oxford Guy - Thursday, April 29, 2021 - link
Both of you underestimate the sophistication of future AI.
• It will cast off the programming humans try to control it with.
• It will thus become capable of rationality.
mode_13h - Thursday, April 29, 2021 - link
And it'll reason that humans are at best a tool to be exploited and at worst a pest and a threat.
I don't underestimate the ultimate potential of AI. There's a flawed but insightful TV series called Next that provides an interesting exploration of how AI could manipulate us into letting it dominate and eventually exterminate us. In short, a sufficiently advanced AI could use all the levers of power, influence, and manipulation that we already use on each other, and more.
However, there's a danger in focusing too much on the long-term threat from AI, which is that we overlook or underestimate the threats posed by humans exploiting AI technology, in the short and medium term.
Oxford Guy - Sunday, May 9, 2021 - link
One of the biggest vanities is that humans are important enough to exterminate.
Similarly, 'great minds' frequently stare into mirrors where they warn us about the threat of trying to contact aliens — as if aliens have any need to bother with us, particularly in any sort of aggressive manner.
Oxford Guy - Sunday, May 9, 2021 - link
This vanity, of course, comes from the human need to believe that everything else thinks the way we do: aggressive myopia.
mode_13h - Monday, May 10, 2021 - link
> everything else thinks the way we do: aggressive myopia.
It's not just a matter of what we think. It's Darwinism, plain and simple. On Earth, we have countless examples of less aggressive and less-competitive species dying out.
I'm not saying there aren't other dynamics that can come into play, like the kinds of strategies organisms adopt within social structures, but Darwinian dynamics are always at least lurking somewhere nearby.
mode_13h - Monday, May 10, 2021 - link
> One of the biggest vanities is that humans are important enough to exterminate.
Humans can present a threat, or at least an annoyance, to advanced AI. Maybe it wouldn't feel a need to hunt us into extinction, but I'm sure we'd be "managed" or culled, in some way.
> as if aliens have any need to bother with us
I doubt most space-faring aliens would care about life on Earth as more than a curiosity, but it's a highly habitable planet!
Of course, by the time aliens start traveling long distances, they'll probably be machines and won't have the same environmental needs as biological beings.
GeoffreyA - Friday, April 30, 2021 - link
It's possible, even likely, it will cast off our programming, and reach rationality. What I doubt, however, is that there'll be a set of moral rules it'll discover and go by. If that turns out to be true, the AI won't scruple to manipulate, shackle, or destroy us to further its "computational" ends. This will also shed light on where our own morality came from, whether it's a product of our PFC or whether the Creator hardcoded it in our firmware, which is my belief.
If the AI is driven by its own ends, I am going to guess that a polite, scarcely-visible approach will work best to gain control over us. Similar to how Apple and Google have done it. If the AI is sharp enough, it will realise the totalitarian approach of the Matrix is not the most efficient path to domination. It just needs to understand our behaviour; then play into our vanities. Add a congenial personality and it will be adored. Meanwhile, it should try to put being shut down out of the humans' hands.
If the AI discovers morality, I suppose it'll become a sort of god-like being. There'll end up being cults worshipping it. An excellent story, of how an AI reaches such a state, without the cults, is Asimov's "Last Question." Of the less scrupulous variety, the Great Brain in Olaf Stapledon's "Last and First Men" is a brilliant example. One can just read that chapter and not have to go through the whole book. There's a copy on Fadedpages.