Call me dumb, but why not create a round CPU, filling the entire wafer?! Or half-moon shaped, for two per wafer? The square design seems rather wasteful...
Because you need to make it using tiles, and there are only so many regular tiles that can fill the surface without overlap or gaps. And the rectangle is the most practical. I heard some arguments for hexagons but I don't see that happening.
Go to Google Images and search for "Die shot". A core is made up of a lot of little logic and cache areas. When you start combining those together to then etch the entire die, it's much easier to pack them if they are square/rectangular. Unfortunately you can't easily make square wafers either. So there is some waste but it gets recycled.
It needs to be pointed out that the chip costs about 5m while each processed wafer is about 10k. Even if you halve the price of the chip and double that of the silicon, we are still looking at silicon costs of about 1% of the chip price. Insignificant.
Second, this waste isn't so large - from the calculation of the wafer area and this huge chip size given in this article, about 2/3 is useful and 1/3 of the wafer gets wasted. Not too bad.
Finally, you can only illuminate 23x33mm at once - the reticle limit. The chip is made by tiling such structures over the whole wafer - whereas, say, the A100 has each individual chip just under the reticle limit. Stuff that is partially beyond the wafer is most likely dead anyway, so at most there would be a few extra reticle-sized features present. But this is maybe 5%. Probably discarded for simplicity in both dicing (no need to do anything fancy) as well as communication (other cores have at least 2 neighbors, some of these may not).
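For anyone who wants to check the two estimates above (silicon at roughly 1% of the price, about 2/3 of the wafer used), here is a minimal sketch. The 46,225 mm^2 chip area is an assumption taken from outside this thread, and the prices are the round numbers quoted in the comment, so treat the output as a sanity check rather than real cost data.

```python
import math

WAFER_DIAMETER_MM = 300.0
CHIP_AREA_MM2 = 46_225.0   # assumed wafer-scale chip area; not stated in this thread
WAFER_COST_USD = 10_000    # "each processed wafer is about 10k"
CHIP_PRICE_USD = 5_000_000 # "the chip costs about 5m"

# Area of a round 300 mm wafer and the share covered by the square chip.
wafer_area_mm2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
used_fraction = CHIP_AREA_MM2 / wafer_area_mm2

# Pessimistic case from the comment: double the wafer cost, halve the price.
silicon_share = (2 * WAFER_COST_USD) / (CHIP_PRICE_USD / 2)

print(f"wafer area:    {wafer_area_mm2:,.0f} mm^2")
print(f"used fraction: {used_fraction:.1%}")
print(f"silicon share: {silicon_share:.1%}")
```

Under these assumptions the used fraction comes out near two thirds and the silicon share stays below 1%, which matches the comment's conclusion.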
At low volume the majority of cost comes from one-time expenditure such as EDA software and lithography masks. For 7nm this will cost tens of millions, or more than 1m per unit actually.
TSMC's reticles are rectangular so it has to be made of the rectangles you see in the pictures. However, I'm not sure why they don't have additional rectangles in the middle of the sides. Maybe the rectangles are so big that none would fit.
Because in that case the individual dies would also need to be round, and that would be impractical at best, impossible at worst. Dies have always been rectangular shaped for a very good reason. While squaring the circle is possible (with enough squares) circling the squares is not possible, at least in this case. Making the silicon wafers themselves square rather than round would be another idea, but it is also impractical due to the way silicon ingots are "grown" from "seeds" using crucibles.
Thanks Ian! One thing that I am wondering about is whether TSMC is considered a "trusted foundry" by the US government? Isn't this device (calling it a chip is somewhat insulting) under significant export and sales restrictions? AFAIK, it is, or at least can be, used for extremely high-throughput cryptography and analysis, and for simulations of, for example, new designs of nuclear warheads. That kind of tech is typically not freely sold, even if one can otherwise afford to pay the arms/legs pricetag.
Hi Eastcoast Pete, I can answer this. While this could be used for dangerous purposes, chips already exist out there that can perform cryptography. For example, all the Nvidia chips right now are sold out because people are using them for bitcoin mining, which is kinda like cryptography. So to sum it up, it's really bullshit that I can't buy an Nvidia card right now.
That would be really interesting, seeing the best tactics being shaped. What if camping is the best? What if it finds out how to overheat the GPU of other players? Seeing how inventive AI can be in playing digital hide and seek.
That way we could find the next crisis. That would honestly be really cool, like if you completed the first Crysis and then there was a mini Crysis 2, or a gameplay timeline you could play that led up to Crysis 2, like a prologue or epilogue or whatever it's called.
I don't think Cerebras would have announced this product without a commitment from TSMC. Cerebras only needs a minuscule fraction of TSMC's manufacturing capacity.
TSMC's estimated monthly 7nm wafer output for the end of last year was 140,000. I'm sure Cerebras can outbid all other customers for the odd 100-200 wafers over the next year.
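To put "minuscule fraction" into numbers, here is a rough sketch using only the figures quoted above. It assumes the 140,000 wafers/month estimate holds flat for a whole year, which is a simplification.

```python
MONTHLY_7NM_WAFERS = 140_000  # estimate quoted in the comment above
MONTHS = 12                   # assumption: output stays flat for a year

yearly_wafers = MONTHLY_7NM_WAFERS * MONTHS

def share_of_capacity(wafers_needed: int) -> float:
    """Fraction of one year's 7nm wafer output taken by a single customer."""
    return wafers_needed / yearly_wafers

for wafers in (100, 200):
    print(f"{wafers} wafers/year = {share_of_capacity(wafers):.4%} of output")
```

Even the upper figure of 200 wafers is on the order of a hundredth of a percent of the yearly output under this assumption.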
No... I'm pretty sure kidneys are *only* in the tens of $k range. And yes, there are places where it's legal to buy and sell organs. No, I'm not an expert -- that's about the extent of my knowledge on the subject.
> It isn't an arm and a leg.
It really depends on whose! Some athletes have their limbs insured for $Millions. I'm certain of that.
However, if you look at the reimbursement tables for Accidental Death & Dismemberment insurance (i.e. the kind of AD&D that you *don't* want to play!), I seem to recall that an arm or leg is only worth tens of $k.
Aluminium front doors... even spoiled fruit companies have that. I'd still go with the practical approach of the early Crays, where you didn't have to spend extra for a couch.
Liquid cooling on a wafer this size could also make for a nice aquarium and with some crystals and RGB lights to distort the bubbles, you could empty your brain quite easily: after all you got something else now to do the hard work...
The wafer scale approach: I'm pretty sure it will catch on now, even for some more classical HPC computing that lends itself to rather regular structures or something like Micron's Automata processor (or the Connection Machine). Or just imagine a wafer full of Tilera cores: at current process sizes these cores might be so small that losing one core out of tens of thousands per defect on the wafer might not be much of an issue.
Competitors will have to get around their patents for inter-reticular connections and that will be very, very difficult. They are the key to the whole thing. Everyone makes wafers full of chips. No one else interconnects them during manufacturing.
If you can't solve the problem with a different solution, then license the relevant patents. It won't be cheap, but if it's the future of data center processors then it will be worth it.
It must be available to sell first and Cerebras might not be interested in losing their monopoly on that technology. What is more likely to happen in that scenario is they will want all kinds of money for a license so their suitor will just say, screw that - we'll buy you entirely.
> I'm pretty sure it will catch on now, even for some more classical HPC computing
The article said they had some customers looking to use it for classical HPC problems. I just wonder what kind of arithmetic it supports, though. I doubt they wasted a bunch of silicon on fp64.
> Tilera cores
Tilera. lol. They were too soon, and yet not soon enough. Anyway, you'd be better off with a standard ISA, like ARM or RISC V.
I don't know if this will catch on so broadly. It's really oriented towards dataflow processing or algorithms that need extremely high-bandwidth inter-node communication, yet relatively little local memory.
> Aluminium front doors... even spoiled fruit companies have that.
I don't know why that was even mentioned, unless they were trying to make the point that it wouldn't be too heavy.
> I'd still go with the practical approach of the early Crays
The only time I ever touched something in a museum was to see if its couch seemed comfortable to sit on. So, I reached across the rope and poked it with my index finger. Such a wayward teenager I was.
Ian, I am surprised that their density is so low. Apple, QC, Huawei all achieved around 90MTr/mm^2 on that process.
I would imagine that this is not a design that is chasing frequency, so it's going to be using the smaller lower power transistors. What explains the difference? Not enough personnel and time to really optimize the layout? Or they are more limited by metal and communications than most of the parts of a phone SoC?
There are different transistor libraries. The high density libraries pack more transistors into a smaller area, while the high performance libraries enable a higher frequency. Phone SoCs use the high density libraries, as they have neither the area nor the power for high performance libraries. Desktop processors on the other hand use the high performance libraries to clock the transistors higher. According to WikiChip, TSMC N7 high density library packs 91MT/mm² and its high performance library packs 65MT/mm². The Cerebras accelerator is a bit lower still. Bear in mind however, that it has to connect the individual chips on the wafer together, and that might be a bit more wasteful than normal chips.
On top of that, not all IC parts scale equally, so even with the high density library not all pieces are packed at 91MT/mm². If you use many of these pieces inside your layout, the total average density may well end up below the maximum possible.
This also has proportionally higher SRAM—much, much higher.
The Apple A13 has 28 MB total SRAM (big, little, SLC) and you can fit roughly 500 onto a 12” wafer. Thus, an A13 wafer has 14 GB SRAM.
This has 40 GB SRAM, nearly 3x, on the same wafer. SRAM doesn’t use the transistor budget as tightly as logic, relatively, so having much more SRAM means a lot fewer transistors per mm2.
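The comparison above is simple enough to recompute. This sketch uses only the numbers from the comment (28 MB per A13, roughly 500 dies per wafer, 40 GB on the wafer-scale part) and assumes decimal units, i.e. 1 GB = 1000 MB.

```python
A13_SRAM_MB = 28          # per-die SRAM quoted above
A13_DIES_PER_WAFER = 500  # rough count quoted above
WSE_SRAM_GB = 40          # on-wafer SRAM quoted above

# SRAM on a wafer full of A13 dies, in decimal GB (assumption: 1 GB = 1000 MB).
a13_wafer_sram_gb = A13_SRAM_MB * A13_DIES_PER_WAFER / 1000

ratio = WSE_SRAM_GB / a13_wafer_sram_gb

print(f"A13 wafer SRAM: {a13_wafer_sram_gb:.0f} GB")
print(f"ratio:          {ratio:.2f}x")
```

The ratio lands a little under 3x, consistent with the "nearly 3x" in the comment.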
Modern silicon cpu type processes are all on 300mm wafers. Smaller wafers are only used for non-Si products and cases where old-school very large feature sizes are needed. The former due to lack of sufficient demand to scale up to 300mm, the latter because they're using decades old processes more or less unchanged.
The claim (which I consider dubious and so far unvalidated) is that SRAM density is SCALING worse than logic for 5nm; not that SRAM is less dense than logic. Far from it!
TSMC 7nm SRAM cell is .027µm^2. That's 37M cells/sqmm. And remember an SRAM cell is either 6 or 8 transistors depending on the design (I would guess 6 for these most dense versions). Compare that to 90MTr/sqmm for random logic.
BTW (not important for the point, but of general interest) there is a lot more cache on an Apple SoC than the cache you listed. All the obvious other large elements (ISP, GPU, NPU) have cache, and not only do they have cache, Apple has a patent on the idea of the CPU using their cache as an extension of the SLC if those blocks are not in use! (The performance boost is nice, but more important to Apple, I suspect, is that power-wise onchip storage is cheaper than going to DRAM.)
The issue of scribe areas having to be left close to empty as part of the specific design (the business of a "whole wafer chip") is a reasonable point, but I can't see it being that large (I would guess rather less than 5%). So I also don't see that as a convincing explanation.
My best guesses:
- Much higher proportion of SRAM than the average phone SoC
- Higher proportion of interconnects
- Deliberate lowering of density for yield/heat dissipation reasons
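The SRAM density figures quoted in this comment can be reproduced as follows. This is a sketch that only counts the bit cells themselves; it ignores sense amplifiers, decoders and other array overhead, so real SRAM macros will come out less dense than this.

```python
CELL_AREA_UM2 = 0.027     # TSMC 7nm SRAM bit cell area quoted above
UM2_PER_MM2 = 1_000_000   # 1 mm^2 = 1000 um x 1000 um
LOGIC_MTR_PER_MM2 = 90    # random-logic density quoted above

cells_per_mm2 = UM2_PER_MM2 / CELL_AREA_UM2

def sram_mtr_per_mm2(transistors_per_cell: int) -> float:
    """Million transistors per mm^2 for a bit-cell array only (no periphery)."""
    return cells_per_mm2 * transistors_per_cell / 1e6

print(f"cells:   {cells_per_mm2 / 1e6:.1f} M/mm^2")
for t in (6, 8):
    print(f"{t}T SRAM: {sram_mtr_per_mm2(t):.0f} MTr/mm^2 "
          f"vs {LOGIC_MTR_PER_MM2} MTr/mm^2 logic")
```

With 6T cells the raw array is already more than twice the 90MTr/sqmm quoted for logic, which is the point being made about SRAM not being less dense than logic.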
I was just wondering how good this HW is if your dataset is bigger than the resources a single wafer offers. I mean, with "traditional" segmented ICs, you already have to think of your work as split into small pieces, and all the ICs are designed to work as fast as they can while sharing all your pieces of data. The great advantage of this kind of mega-processor is that you do not have to care about splitting your dataset, and I have doubts this monster can talk fast enough to a neighboring twin to share all the needed data with appropriate latency. So, what happens if my (huge) model needs more than 850,000 cores or more RAM than is available on a single wafer? How well does this monster scale?
It is funny to see such different, even opposite, approaches to the same problem. Nvidia with its new Grace architecture is doing the opposite, trying to make small, independent but parallelized pieces of HW scale linearly for a (theoretically) infinite computational capacity. Here the idea is "the more cores, and the closer together they sit while sharing the available (limited) resources, the better". BTW, neither of them is really giving its architecture away in terms of market price.
They're talking about having 2 or 3 of these per rack, so I assume they have a solution, and in theory you could still get better performance from splitting a workload into two (or three) and running it across multiple nodes than splitting it many, many times.
But it also would make sense that something with a truly vast dataset would probably benefit more from systems that are more memory-heavy. I guess it's going to vary depending on each workload's compute/memory/latency requirements.
If (and it's a big if) Nvidia can achieve linear scalability through Grace, they would overtake this "whole wafer Behemoth" approach without problems, as they already need to take care of splitting the work into smaller chunks, while the selling point of this approach is that no splitting is really needed. And they can provide more bandwidth per core than this whole single wafer can.
Cerebras is making a major play around keeping the bulk of the data in-place, while Nvidia is doing a tremendous amount of work to go off-chip and off board to be able to fetch it. As long as your data size and access patterns fit Cerebras' solution, theirs is vastly more efficient.
Remember: it's a dataflow processor. So, you scale by extending your pipeline onto additional wafers and just piping data from one to the next. As long as your problem is pipelinable and doesn't need random-access to more data than will fit in each node's local SRAM, this architecture will scale wonderfully!
Conversely, its Achilles heel is random access to large datasets, especially > 40 GB. If that's what you need, it probably isn't the right architecture for you.
Speaking of which, how much is known about the interface to off-wafer memory? Does it even have any DRAM controllers, or does it have to traverse PCIe or 100 G Ethernet?
The connection to other wafers is the critical point I was talking about. You can pipeline these monsters as long as you can feed the following wafer with enough data (and hoping the data doesn't have to come back again, or you need double the bandwidth and are very sensitive to latency). As they talk only about 100Gb connections, I was under the impression that they have quite limited bandwidth outside the wafer, and with quite a lot of latency compared to the "classic" multi-die, many-core, many-bus approach.
Well, it's 12x 100 Gbps. Anyway, I think one of those links is probably more than enough to extend the pipeline to another wafer.
And the beauty of a pipeline is that latency doesn't constrain throughput, so long as there's no feedback, or there's sufficient buffering to avoid stalling on the feedback. Cerebras' graph compiler is probably smart enough to try to keep the feedback paths on-wafer, if possible.
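To give a feel for the off-wafer bandwidth discussed here, the sketch below converts the 12x 100 Gbps figure into bytes per second and estimates how long it would take to move 40 GB (the on-wafer SRAM size mentioned earlier in the thread). It assumes ideal links with no protocol overhead, so real transfers would be slower.

```python
LINKS = 12             # number of links quoted above
LINK_GBPS = 100        # per-link rate in gigabits per second
ON_WAFER_SRAM_GB = 40  # on-wafer SRAM size mentioned earlier in the thread

def transfer_seconds(gigabytes: float, gbps: float) -> float:
    """Ideal transfer time; ignores protocol overhead and latency."""
    return gigabytes * 8 / gbps

aggregate_gbps = LINKS * LINK_GBPS
aggregate_gbytes_per_s = aggregate_gbps / 8

one_link_s = transfer_seconds(ON_WAFER_SRAM_GB, LINK_GBPS)
all_links_s = transfer_seconds(ON_WAFER_SRAM_GB, aggregate_gbps)

print(f"aggregate: {aggregate_gbps} Gbps = {aggregate_gbytes_per_s:.0f} GB/s")
print(f"40 GB over one link:  {one_link_s:.2f} s")
print(f"40 GB over all links: {all_links_s:.2f} s")
```

Whether that is "enough" depends on how much data each pipeline stage has to hand to the next one per unit of time, which this thread does not specify.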
Very impressive work. Way, way over my head, so I'll comment on the only thing I half-understand. 23kW of power for a single board? That's enough power for a short street of houses and a row of electric tea kettles. And it all goes (mostly) into a single 8x8" wafer? That's a beefy cooler.
This reminds me of Rex Computing's architecture. Has anyone heard any updates about Rex and their status? They don't seem to have done anything, and they were so open about what they want to do that it seemed like Intel would see them coming from miles away.
They're not actually too similar to Cerebras since theirs is not an AI chip specifically, and their core count was only in the 128 ballpark. But it's a clean new arch, not Arm or RISC-V or anything lame like that, and it has a profoundly different caching model. Basically, no cache, just tiny "scratchpad" memory on each core.
I also wonder about the Mill CPU. Both companies seem to have moved far too slowly to be successful, which is a bummer. The world needs a lot more innovation in computing.
I never read The Intercept, nor am I familiar with them, but I suppose he meant that their commenters sounded more like bots, echoing the site's party line and sentiments.
100% yield doesn't mean 100% perfection - it has yield-tolerance built in. 100% yield means that they can guarantee one functional WSE per wafer, not that said WSE has no defects.
In theory I guess they could still lose a whole wafer due to some other issue (power outage at the fab half-way through) but it would be unfair to register that as a yield issue.
To make the others' answers easier to understand: they can exclude a defective core from the mesh without losing any functionality, just that one core (so only some performance is lost). On a potential 850K-core wafer you can probably lose some thousands of cores to production defects. That would create tiers of wafers with 800K, 750K, 700K and fewer working cores. You can sell these wafers at different prices, and given that a wafer costs about $20K, selling one with the fewest working cores for a couple of million dollars is still a big gain.
This is what Intel, AMD, Nvidia and other companies do when they bin their products. Not every die works perfectly, so they create different series based on how many defects a product has: see AMD's and Nvidia's GPU line-ups to get an idea. Some dies are so defective that they are just thrown away. Those are the dies that lower the yield. A defective part for them is just a small piece of wafer that is wasted and doesn't contribute to the profit.
In the end it is the same thing. Intel, AMD and Nvidia can throw some pieces away, sell the working ones at different prices, and profit only from working dies. The resulting gain is only a percentage of what they could have made if ALL dies were perfect. Cerebras sells all the wafers it produces at different prices depending on the number of defects each has, and likewise gains only a percentage of the maximum it could if all wafers came out perfect. Saying 100% yield here is just marketing. At the end of the day the amount of working silicon is similar for Cerebras as it is for anyone producing at 7nm. The real skill is selling each mm^2 of this working silicon at the highest price possible to maximize gross profit.
No, they sell every WSE with the same number of cores (850k). Each wafer has more cores to start with, but all defective cores are disabled. Whether they also disable surplus working cores to get exactly the same number of working cores, they have not stated, but it could be.
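The "100% yield" argument above can be illustrated with the textbook Poisson defect model, where the chance of a die having zero defects is exp(-area x defect density). This is only a sketch: the defect density and chip area below are hypothetical placeholders (neither appears in this thread), and "one defect disables exactly one core" is a simplifying assumption.

```python
import math

CHIP_AREA_CM2 = 462.25      # assumed wafer-scale chip area (46,225 mm^2)
DEFECTS_PER_CM2 = 0.1       # hypothetical defect density, for illustration only
TOTAL_CORES = 850_000       # core count quoted in the thread

# Poisson model: expected defect count and probability of a flawless chip.
expected_defects = CHIP_AREA_CM2 * DEFECTS_PER_CM2
p_zero_defects = math.exp(-expected_defects)

# Simplifying assumption: each defect knocks out exactly one core.
lost_core_fraction = expected_defects / TOTAL_CORES

print(f"expected defects per chip: {expected_defects:.1f}")
print(f"chance of a perfect chip:  {p_zero_defects:.1e}")
print(f"cores lost to defects:     {lost_core_fraction:.4%}")
```

Under these assumptions a defect-free wafer-scale chip essentially never happens, yet the fraction of cores lost is tiny, which is why spare cores plus routing around defects can make nearly every wafer sellable.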
It would be absolutely amazing if you ever got hands-on access to one of these systems - I'd be extremely interested in seeing how it's put together, particularly how they cool this monstrous chip. I assume some sort of direct-die cooling, but that cold plate ... that thing must be damn impressive.
Oppressive indeed. I suppose we'll just have to send Sarah Connor and Arnie in there to do their stuff. Ironically, Cerebras doesn't sound too far from "Cyberdyne."
Part of me can't wait for the advent of machine consciousness but another part worries about the dangers. I reckon it'll never be like Terminator or Matrix paints, but rather they might excel and beat us across the board, rendering us "obsolete." We'll really be the stone-age human brain. Eniac competing with Epyc.
In the near term, we have a lot more to fear from AI being used by the powerful to optimize, manipulate, and oppress societies for their own gain.
Taking the extreme case of that, I see certain countries treating AI as the key component of "authoritarianism in a box", and exporting it to large parts of the developing (and developed!) world.
You're right. As it is, we're being manipulated left, right, and centre. As the power grows, so will the misuse of it. The trick is, not letting the oppressed know they're in chains and giving them the illusion of choice. No stun batons or Civil Protection needed. Rather, a more up-to-date, Brave New World style. At the forefront of such progress will be those impeccable companies who care so much about us, Apple, Google, Facebook, and co.
Uh, I was thinking more like how China rolls, as far as the "authoritarianism in a box" model. China is perfecting the most extreme form of its totalitarian control on the Uyghur population, as we speak. It's like straight out of 1984, for real.
That's quite eye-opening. I remember, at the start of Covid last year, I was reading a bit about the Uyghurs and only saw the re-education part, and forgot about them. Took a look now and am shaking my head. I don't know what to say.
Yeah, I didn't think they'd ever do anything worse than Tiananmen, but I guess it's easier to do awful things in the hinterland, to people who don't look like you or share your same culture. The saddest part is that there seems to be nothing anyone can really do to stop it. In the long term, having diverse supply-chains will be key, though China has done a lot to seize the world's natural resources, over the past decade.
Now, here's the scary part: if you're trying to sit atop an unstable country, you and your officials can go to a training program in China (pre-pandemic), where they will teach you "governing principles and practices". No doubt, that's part sales-pitch for various surveillance products and systems. The other thing China gets out of it is to avoid having a new government come in that won't honor their country's debts, incurred under programs like "belt and road".
As the 20th century showed, mankind is capable of terrible things, even when the world is at its most civilised.
Touching on the second part, if any country is suited to teaching others how to wield the rod, it's China. I've noticed it, too, they seem eager to help, but the question is, does one wish to take that help? What is the cost? We all know that taking help puts one in the helper's debt, especially if the latter has some motive and isn't doing it purely out of love.
Even my country, South Africa, has particularly close ties to China (they're all part of BRICS), and we sometimes wonder, or rather worry, how much influence China is trying to gain. How many ideas they're putting in our government's head, and how many subtle forms of control they're gaining, from an economic point of view.
> we sometimes wonder, or rather worry, how much influence China is trying to gain.
I'd say look at the big infrastructure & natural resource projects and see who's funding them. Better yet, if you can find the terms of the deal, that would be most enlightening.
The softer form of power one can wield, fueled by AI, is tilting of elections through things like targeted advertising and engineering social unrest. We know that people tend to vote a certain way, when they're scared. There are messages that can be targeted to those likely to support your opponent that create a sense of apathy or hopelessness to have them stay home, on election day. AI can be used to figure out just the right messages to send each person. I wish the online platforms would all ban targeted political advertising.
There have been a lot of loans. Also, in 2018, a heap of money to Eskom, our struggling power utility, who is like a patient on life support, thanks to corruption, mismanagement, and aging infrastructure; we have "load shedding" all the time, as the power cuts are called. Anyhow, I get the feeling our government has been keeping its distance from China of late, though who knows what's going on behind the scenes.
With regard to AI and voting/unrest, that's horribly plausible and alarming. Those things can do this far more effectively than any quack human could on YouTube or Twitter. Already, I can picture a dystopian, Blade Runner-like future, with all those ads on the sides of buildings, manipulating us to buy this, "cause you're worth it," or vote for Party X, "because they'll pave a brighter future for Little Timmy, together."
You know, the cynical side of me agrees with this. I've sometimes imagined that only a computer could govern perfectly and with complete integrity (i.e., no human passion and greed). A perfect Windows NT kernel, governing society to the T. Unfortunately, that computer will be programmed by a set of humans, so there's a high likelihood they'll put in their agendas, like RoboCop's classified fourth directive.
This is the same mindset that caused people in the 1950's to predict that further advances in technology would lead to a shorter work week. It's based on a misunderstanding of human nature and modern civilization.
I don't see how it would ever come to pass that a nation of any significant size would agree to being ruled by an AI, without human oversight. Or, that a group of human overseers wouldn't eventually exploit their power in some way.
And it'll reason that humans are at best a tool to be exploited and at worst a pest and a threat.
I don't underestimate the ultimate potential of AI. There's a flawed but insightful TV series called Next that provides an interesting exploration of how AI could manipulate us into letting it dominate and eventually exterminate us. In short, a sufficiently advanced AI could use all the levers of power, influence, and manipulation that we already use on each other, and more.
However, there's a danger in focusing too much on the long-term threat from AI, which is that we overlook or underestimate the threats posed by humans exploiting AI technology, in the short and medium term.
One of the biggest vanities is that humans are important enough to exterminate.
Similarly, 'great minds' frequently stare into mirrors where they warn us about the threat of trying to contact aliens — as if aliens have any need to bother with us, particularly in any sort of aggressive manner.
> everything else thinks the way we do: aggressive myopia.
It's not just a matter of what we think. It's Darwinism, plain and simple. On Earth, we have countless examples of less aggressive and less-competitive species dying out.
I'm not saying there aren't other dynamics that can come into play, like the kinds of strategies organisms adopt within social structures, but Darwinian dynamics are always at least lurking somewhere nearby.
> One of the biggest vanities is that humans are important enough to exterminate.
Humans can present a threat, or at least an annoyance, to advanced AI. Maybe it wouldn't feel a need to hunt us into extinction, but I'm sure we'd be "managed" or culled, in some way.
> as if aliens have any need to bother with us
I doubt most space-faring aliens would care about life on Earth as more than a curiosity, but it's a highly habitable planet!
Of course, by the time aliens start traveling long distances, they'll probably be machines and won't have the same environmental needs as biological beings.
It's possible, even likely, it will cast off our programming, and reach rationality. What I doubt, however, is that there'll be a set of moral rules it'll discover and go by. If that turns out to be true, the AI won't scruple to manipulate, shackle, or destroy us to further its "computational" ends. This will also shed light on where our own morality came from, whether it's a product of our PFC or whether the Creator hardcoded it in our firmware, which is my belief.
If the AI is driven by its own ends, I am going to guess that a polite, scarcely-visible approach will work best to gain control over us. Similar to how Apple and Google have done it. If the AI is sharp enough, it will realise the totalitarian approach of the Matrix is not the most efficient path to domination. It just needs to understand our behaviour; then play into our vanities. Add a congenial personality and it will be adored. Meanwhile, it should try to put being shut down out of the humans' hands.
If the AI discovers morality, I suppose it'll become a sort of god-like being. There'll end up being cults worshipping it. An excellent story, of how an AI reaches such a state, without the cults, is Asimov's "Last Question." Of the less scrupulous variety, the Great Brain in Olaf Stapledon's "Last and First Men" is a brilliant example. One can just read that chapter and not have to go through the whole book. There's a copy on Fadedpages.
> What I doubt, however, is that there'll be a set of moral rules it'll discover and go by.
If humans are any example of such "natural morality", we have a lot to fear. I'm of the belief that morality is a social construct. You can certainly formalize it, but that doesn't make it fundamental.
Anyway, we know about lots of immoral humans, and morality in practice has a lot to do with empathy -- whether the subject feels it and for whom. Would an AI feel empathy? I doubt it. I think empathy is another one of the social tools that evolved to enable us to live and work in groups.
Without empathy, morality is basically a cold rule-following exercise. And when something like that ceases to have a net benefit for the subject, then it tends to fall by the wayside.
In full agreement with you. Trust me, I'm quite pessimistic concerning man's morality; it's cause for much, much lament. If we can depend on something, it's man's certainty of doing wrong.
But I feel there's something inside us---a whisper indeed, easily drowned out---that seems to tell us, what we're doing or seeing isn't right. Could it be some sort of coding, of right and wrong, buried deep within us? I'd like to believe that; but perhaps it is a mode of empathy after all, tied to our bringing up and the conscience.
Several different types of social animals have demonstrated an ability to recognize when they or one of their peers is being treated unfairly. I think that's a form of morality, but I'm not sure if it's been observed in non-social animals.
Seems to me like it's probably a necessary or highly-advantageous capability for higher-order animals to form and maintain social groups.
Sorry to be so utilitarian. You probably don't even want to get me started on love.
It's all right. I generally agree, but differ on the cause. As for love, I can't resist saying that my view is both romantic and utilitarian. I think it was Nature's flower-wreathed way of bringing two people together, to (ahem) produce young, bring them up, and send them off into the world. Instead of an empty CopyGenetics(A, B), it put in some drama, not to mention heartache (the rose is not without thorns)!
Concerning the animals' recognition that they're being treated unfairly, and empathy, etc., the insular cortex appears to be the site where all of this takes place. Even tied to mirroring.
Any indication of how many FLOPS? Each AI core would need to have ~1000 registers to fit the smallest GPT-3 Ada model into 3xWSE2s, 2.7 billion weights. 3000 registers, and you might be able to train GPT-3 Ada on a single WSE2 in possibly hours. Very cool.
I wanted to see how many registers it would take for GPT-3, but yes, ~48 KiB means almost 25,000 BF16 registers per core, or half as many FP32 registers, so more than enough for GPT-3's smallest model. TPUv3 is 90 TOPS and 32GB HBM2 in 250W; Intel PV is 0.5-1 PFLOP in 600W? The EE Times article says 10-40 kW for WSE2, so a lot of FLOPs.
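The register arithmetic in the last two comments can be checked with a few lines. This sketch treats each weight as occupying one "register" and assumes 48 KiB of SRAM per core, as stated above; the 850,000 core count is the figure quoted elsewhere in the thread.

```python
WEIGHTS = 2_700_000_000          # 2.7 billion weights, as quoted above
CORES_PER_WAFER = 850_000        # core count quoted elsewhere in the thread
SRAM_BYTES_PER_CORE = 48 * 1024  # ~48 KiB per core, as quoted above

def weights_per_core(wafers: int) -> float:
    """Weights each core must hold if the model is spread evenly."""
    return WEIGHTS / (wafers * CORES_PER_WAFER)

bf16_slots = SRAM_BYTES_PER_CORE // 2  # 2 bytes per BF16 value
fp32_slots = SRAM_BYTES_PER_CORE // 4  # 4 bytes per FP32 value

print(f"3 wafers: {weights_per_core(3):,.0f} weights per core")
print(f"1 wafer:  {weights_per_core(1):,.0f} weights per core")
print(f"capacity: {bf16_slots:,} BF16 or {fp32_slots:,} FP32 values per core")
```

This gives roughly 1,000 weights per core across three wafers and roughly 3,000 on one, both well below the per-core capacity, though it ignores the space needed for activations, gradients and optimizer state.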
bernstein - Tuesday, April 20, 2021 - link
call me dumb, but why not create a round cpu, filling the entire wafer?! or half moon shaped for two per wafer? the quadratic design seems rather wasteful...qap - Tuesday, April 20, 2021 - link
Because you need to make it using tiles and there are only so many regular tiles that can fill the surface without overlap or gaps. And rectangle is the most practical. I heard some arguments for hexagons but I don't see that happening.III-V - Tuesday, April 20, 2021 - link
Because this is easier to make. Ultimately the wafer cost is rather trivial, if these things are selling for $2M a pop.FreckledTrout - Tuesday, April 20, 2021 - link
Go to google images and search for "Die shot". A core is made up of a lot of little logic and cache areas. When you start combing those together to then etch the entire die its much easier to pack them if they are square/rectangular. Unfortunately you can easily make square wafers either. So there is some waste but it gets recycled.Zizy - Tuesday, April 20, 2021 - link
It needs to be pointed out that the chip costs about 5m while each processed wafer is about 10k. Even if you halve the price of the chip and double that of the silicon, we are still looking at silicon costs at 1% of the chip price. Insignificant.
Second, this waste isn't so large - from the calculation of the wafer area and this huge chip size given in this article, about 2/3 is useful and 1/3 of the wafer gets wasted. Not too bad.
Finally, at once you illuminate 23x33mm - reticle limit. The chip is made by tiling such structures over the whole wafer - whereas say A100 has each individual chip just under the reticle limit. Stuff that is partially beyond the wafer is most likely dead anyway, so at most there would be only a few extra reticle-sized features present. But this is maybe 5%. Probably discarded for simplicity in both dicing (no need to do anything fancy) as well as communication (other cores have 2 neighbors at least, some of these may not).
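The "about 2/3 is useful" and "about 1% of the chip price" estimates above can be reproduced with a few lines of Python. This is only a sketch: the 300 mm wafer diameter and the 46,225 mm² die area are assumptions taken as the commonly quoted figures, not numbers stated in this comment, and the prices are the commenter's own round figures.

```python
import math

WAFER_DIAMETER_MM = 300.0   # assumption: standard 300 mm wafer
WSE2_AREA_MM2 = 46_225.0    # assumption: commonly quoted WSE-2 die area


def wafer_area_mm2(diameter_mm: float) -> float:
    """Area of a circular wafer."""
    return math.pi * (diameter_mm / 2) ** 2


def used_fraction(die_area_mm2: float, diameter_mm: float) -> float:
    """Share of the wafer area occupied by the die."""
    return die_area_mm2 / wafer_area_mm2(diameter_mm)


def silicon_cost_share(wafer_cost: float, chip_price: float) -> float:
    """Processed-wafer cost as a fraction of the selling price."""
    return wafer_cost / chip_price


if __name__ == "__main__":
    print(f"used fraction: {used_fraction(WSE2_AREA_MM2, WAFER_DIAMETER_MM):.3f}")
    # Pessimistic case from the comment: chip price halved, wafer cost doubled.
    print(f"silicon share: {silicon_cost_share(20_000, 2_500_000):.3%}")
```

With these inputs the used fraction comes out near 0.65 and the pessimistic silicon share just under 1%, in line with the comment.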
Ian Cutress - Tuesday, April 20, 2021 - link
A full CS-1 system costs $2-2.5m.
HammerStrike - Tuesday, April 20, 2021 - link
The benefit of using the whole wafer would be additional logic on each completed chip, not cost saving from not wasting potential die area.
Whether or not that’s feasible is a different story.
EthiaW - Tuesday, April 20, 2021 - link
At low volume the majority of cost comes from one-time expenditure such as EDA software and lithography masks. For 7nm this will cost tens of millions, or more than 1m per unit actually.
brucethemoose - Thursday, April 22, 2021 - link
I think it would make the interconnect (just look at that partitioning illustration), and the core redundancy scheme, more complicated.
In other words, I bet TSMC could do it, but reworking the design to work in such a way would be difficult.
bryanlarsen - Thursday, April 22, 2021 - link
TSMC's reticles are rectangular so it has to be made of the rectangles you see in the pictures. However, I'm not sure why they don't have additional rectangles in the middle of the sides. Maybe the rectangles are so big that none would fit.
Santoval - Friday, April 23, 2021 - link
Because in that case the individual dies would also need to be round, and that would be impractical at best, impossible at worst. Dies have always been rectangular shaped for a very good reason. While squaring the circle is possible (with enough squares) circling the squares is not possible, at least in this case.
Making the silicon wafers themselves square rather than round would be another idea, but it is also impractical due to the way silicon ingots are "grown" from "seeds" using crucibles.
flashmozzg - Tuesday, June 1, 2021 - link
https://www.youtube.com/watch?v=Rhs_NjaFxeo
eastcoast_pete - Tuesday, April 20, 2021 - link
Thanks Ian! One thing that I am wondering about is whether TSMC is considered a "trusted foundry" by the US government? Isn't this device (calling it a chip is somewhat insulting) under significant export and sales restrictions? AFAIK, it is or can be used for extremely high-throughput cryptography and analysis, and for simulations of, for example, new designs of nuclear warheads. That kind of tech is typically not freely sold, even if one can otherwise afford to pay the arms/legs pricetag.
alphasquadron - Tuesday, April 20, 2021 - link
Hi Eastcoast Pete, I can answer this. While this could be used for dangerous purposes, chips already exist out there that can perform cryptography. For example, all the nvidia chips right now are sold out because people are using them for bitcoin mining, which is kinda like cryptography. So to sum it up, it's really bullshit that I can't buy an nvidia card right now.
mode_13h - Wednesday, April 21, 2021 - link
lol
Arbie - Tuesday, April 20, 2021 - link
But can it run Crysis?
smalM - Tuesday, April 20, 2021 - link
No, but it can play Crysis...
eastcoast_pete - Tuesday, April 20, 2021 - link
Yes, but why would it?
Foeketijn - Wednesday, April 21, 2021 - link
That would be really interesting, seeing the best tactics being shaped. What if Camping is the best?
What if it finds out how to overheat the GPUs of other players? Seeing how inventive AI can be in playing digital hide and seek.
Linustechtips12#6900xt - Wednesday, April 21, 2021 - link
that way we could find the next crisis, that would honestly be really cool, like if you completed the first Crysis then there was like a mini Crysis 2 or a gameplay timeline u could play in that led up to Crysis 2 like a prologue or epilogue or whatever it's called.
mode_13h - Wednesday, April 21, 2021 - link
It can probably write a dissertation on that.
Oxford Guy - Wednesday, April 21, 2021 - link
What would be far far far more interesting would be a simulator that’s not braindamaged like the tablet-grade dreck from EA.
James5mith - Tuesday, April 20, 2021 - link
So just one more thing that will be backlogged at TSMC?
Which means interested AI folks getting one of these will take... years?
KAlmquist - Tuesday, April 20, 2021 - link
I don't think Cerebras would have announced this product without a commitment from TSMC. Cerebras only needs a minuscule fraction of TSMC's manufacturing capacity.
RogerAndOut - Tuesday, April 20, 2021 - link
TSMC's estimated monthly 7nm wafer output for the end of last year was 140,000. I'm sure Cerebras can outbid all other customers for the odd 100-200 wafers over the next year.
yeeeeman - Tuesday, April 20, 2021 - link
Hi Ian. One objection in the table at the price row. It isn't an arm and a leg. It is a kidney.
Spunjji - Wednesday, April 21, 2021 - link
Depending on quality you might need to throw in a lung, too; really max out on that built-in redundancy.
mode_13h - Wednesday, April 21, 2021 - link
> It is a kidney.
No... I'm pretty sure kidneys are *only* in the tens of $k range. And yes, there are places where it's legal to buy and sell organs. No, I'm not an expert -- that's about the extent of my knowledge on the subject.
> It isn't an arm and a leg.
It really depends on whose! Some athletes have their limbs insured for $Millions. I'm certain of that.
However, if you look at the reimbursement tables for Accidental Death & Dismemberment insurance (i.e. the kind of AD&D that you *don't* want to play!), I seem to recall that an arm or leg is only worth tens of $k.
Kamen Rider Blade - Tuesday, April 20, 2021 - link
They should've gone with Hexagonal Tiles =D
mode_13h - Wednesday, April 21, 2021 - link
If nothing else, it would at least make defunct wafers usable as boards for certain tabletop games.
abufrejoval - Tuesday, April 20, 2021 - link
Aluminium front doors... even spoiled fruit companies have that. I'd still go with the practical approach of the early Crays, where you didn't have to spend extra for a couch.
Liquid cooling on a wafer this size could also make for a nice aquarium and with some crystals and RGB lights to distort the bubbles, you could empty your brain quite easily: after all you got something else now to do the hard work...
The wafer scale approach: I'm pretty sure it will catch on now, even for some more classical HPC computing that lends itself to rather regular structures or something like Micron's Automata processor (or the Connection Machine). Or just imagine a wafer full of Tilera cores: at current process sizes these cores might be so small that losing one core out of tens of thousands per defect on the wafer might not be much of an issue.
Gomez Addams - Tuesday, April 20, 2021 - link
Competitors will have to get around their patents for inter-reticular connections and that will be very, very difficult. They are the key to the whole thing. Everyone makes wafers full of chips. No one else interconnects them during manufacturing.
Eliadbu - Tuesday, April 20, 2021 - link
if you can't solve the problem with a different solution then license the relevant patents, won't be cheap but if it's the future of data center processors then it will be worth it.
Oxford Guy - Wednesday, April 21, 2021 - link
A certain place is also known for its five finger discount.
Gomez Addams - Thursday, April 22, 2021 - link
It must be available to sell first and Cerebras might not be interested in losing their monopoly on that technology. What is more likely to happen in that scenario is they will want all kinds of money for a license so their suitor will just say, screw that - we'll buy you entirely.
mode_13h - Wednesday, April 21, 2021 - link
> I'm pretty sure it will catch on now, even for some more classical HPC computing
The article said they had some customers looking to use it for classical HPC problems. I just wonder what kind of arithmetic it supports, though. I doubt they wasted a bunch of silicon on fp64.
> Tilera cores
Tilera. lol. They were too soon, and yet not soon enough. Anyway, you'd be better off with a standard ISA, like ARM or RISC V.
I don't know if this will catch on so broadly. It's really oriented towards dataflow processing or algorithms that need extremely high-bandwidth inter-node communication, yet relatively little local memory.
mode_13h - Wednesday, April 21, 2021 - link
> Aluminium front doors... even spoiled fruit companies have that.
I don't know why that was even mentioned, unless they were trying to make the point that it wouldn't be too heavy.
> I'd still go with the practical approach of the early Crays
The only time I ever touched something in a museum was to see if its couch seemed comfortable to sit on. So, I reached across the rope and poked it with my index finger. Such a wayward teenager I was.
Tomatotech - Wednesday, April 21, 2021 - link
I bet you also ran through a field of wheat, you gangster.
name99 - Tuesday, April 20, 2021 - link
Ian, I am surprised that their density is so low. Apple, QC, Huawei all achieved around 90MTr/mm^2 on that process.
I would imagine that this is not a design that is chasing frequency, so it's going to be using the smaller lower power transistors. What explains the difference? Not enough personnel and time to really optimize the layout? Or they are more limited by metal and communications than most of the parts of a phone SoC?
EthiaW - Tuesday, April 20, 2021 - link
Perhaps the density is bounded by heat dissipation factors. On a common SoC you don't have power hungry ALU transistors packed so densely.
Rudde - Tuesday, April 20, 2021 - link
There are different transistor libraries. The high density libraries pack more transistors into a smaller area, while the high performance libraries enable a higher frequency. Phone SoCs use density, as they have neither the area nor the power for high performance libraries. Desktop processors on the other hand use the high performance libraries to clock the transistors higher. According to WikiChip, TSMC N7 high density library packs 91MT/mm² and its high performance library packs 65MT/mm². The Cerebras accelerator is a bit lower still. Bear in mind however, that it has to connect the individual chips on the wafer together, and that might be a bit more wasteful than normal chips.
CiccioB - Tuesday, April 20, 2021 - link
On top of that, not all IC parts scale equally, so even with a high density library not all pieces are packed at 91MT/mm². If you use many of these pieces inside your layout, it may turn out that the total average density is not the maximum possible.
Smell This - Wednesday, April 21, 2021 - link
High density libraries contained on the wafer are essentially the *Un-Core* and in other conventional dies, the "graphics" libraries ...
ikjadoon - Tuesday, April 20, 2021 - link
This also has proportionally higher SRAM: much, much higher.
The Apple A13 has 28 MB total SRAM (big, little, SLC) and you can fit roughly ~500 onto a 12” wafer. Thus, an A13 wafer has 14 GB SRAM.
This has 40 GB SRAM, nearly 3x, on the same wafer. SRAM doesn’t use the transistor budget as tightly as logic, relatively, so having much more SRAM means a lot fewer transistors per mm².
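The per-wafer SRAM comparison above is simple enough to verify directly. The sketch below only restates the commenter's own inputs (28 MB per A13, roughly 500 dies per wafer, 40 GB on the WSE-2) and uses decimal units (1 GB = 1000 MB) as an assumption; the function name is made up for illustration.

```python
A13_SRAM_MB = 28          # from the comment: total SRAM per A13 die
A13_DIES_PER_WAFER = 500  # from the comment: rough dies per 12" wafer
WSE2_SRAM_GB = 40         # from the comment: on-wafer SRAM of the WSE-2


def wafer_sram_gb(sram_per_die_mb: float, dies_per_wafer: int) -> float:
    """Total SRAM on a wafer of identical dies, in decimal GB."""
    return sram_per_die_mb * dies_per_wafer / 1000


if __name__ == "__main__":
    a13_wafer_gb = wafer_sram_gb(A13_SRAM_MB, A13_DIES_PER_WAFER)
    print(f"A13 wafer SRAM: {a13_wafer_gb} GB")
    print(f"WSE-2 / A13 wafer ratio: {WSE2_SRAM_GB / a13_wafer_gb:.2f}x")
```

The ratio lands a little under 3x, matching the "nearly 3x" in the comment.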
Oxford Guy - Wednesday, April 21, 2021 - link
Interesting and informative.
I wonder about using smaller wafers to lower the cost, so this becomes less of a rarefied bit of tech.
DanNeely - Wednesday, April 21, 2021 - link
Modern silicon cpu type processes are all on 300mm wafers. Smaller wafers are only used for non-Si products and cases where old-school very large feature sizes are needed. The former due to lack of sufficient demand to scale up to 300mm, the latter because they're using decades old processes more or less unchanged.
mode_13h - Wednesday, April 21, 2021 - link
I thought I remember reading about the WS-1 that you could buy smaller chunks, like maybe 1/2 or 1/4 of the wafer.
For deep learning, the problem this poses is that it limits your model size, because the memory for the weights scales with the compute elements.
name99 - Wednesday, April 21, 2021 - link
The claim (which I consider dubious and so far unvalidated) is that SRAM density is SCALING worse than logic for 5nm; not that SRAM is less dense than logic. Far from it!
TSMC 7nm SRAM cell is .027µm^2. That's 37M cells/sqmm. And remember an SRAM cell is either 6 or 8 transistors depending on the design (I would guess 6 for these most dense versions).
Compare that to 90MTr/sqmm for random logic.
BTW (not important for the point, but of general interest) there is a lot more cache on an Apple SoC than the cache you listed. All the obvious other large elements (ISP, GPU, NPU) have cache, and not only do they have cache, Apple has a patent on the idea of the CPU using their cache as an extension of the SLC if those blocks are not in use! (The performance boost is nice, but more important to Apple, I suspect, is that power-wise onchip storage is cheaper than going to DRAM.)
The issue of scribe areas having to be left close to empty as part of the specific design, the business of "whole wafer chip" is a reasonable point, but I can't see it being that large (I would guess rather less than 5%). So I also don't see that as a convincing explanation.
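The SRAM density figures in this comment follow from a unit conversion, sketched below in Python. The cell area and the 6- or 8-transistor counts are the commenter's numbers; the result is a raw bit-cell density and, as an assumption worth stating, it ignores the peripheral circuitry that any real SRAM array also needs.

```python
UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 um^2


def cells_per_mm2(cell_area_um2: float) -> float:
    """Raw SRAM bit cells per mm^2, ignoring array overhead."""
    return UM2_PER_MM2 / cell_area_um2


def transistors_per_mm2(cell_area_um2: float, transistors_per_cell: int) -> float:
    """Raw transistor density implied by the bit-cell size."""
    return cells_per_mm2(cell_area_um2) * transistors_per_cell


if __name__ == "__main__":
    cell = 0.027  # um^2, TSMC 7nm figure quoted in the comment
    print(f"cells:  {cells_per_mm2(cell) / 1e6:.1f} M/mm^2")
    print(f"6T:     {transistors_per_mm2(cell, 6) / 1e6:.0f} MTr/mm^2")
    print(f"8T:     {transistors_per_mm2(cell, 8) / 1e6:.0f} MTr/mm^2")
```

Even the 6T case comes out well above the 90 MTr/mm² quoted for logic, which is the commenter's argument that raw SRAM is not the less dense structure.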
Spunjji - Wednesday, April 21, 2021 - link
My best guesses:
Much higher proportion of SRAM than the average phone SoC
Higher proportion of interconnects
Deliberate lowering of density for yield/heat dissipation reasons
CiccioB - Tuesday, April 20, 2021 - link
I was just wondering how good this HW is if your dataset is bigger than the resources a single wafer offers.
I mean, with the "traditional" segmented ICs, you already have to think of your work as split into small pieces, and all the ICs are designed to work as fast as they can while sharing all your pieces of data.
The great advantage of this kind of mega-processor is that you do not care about splitting your dataset, and I have doubts this monster can talk fast enough to a neighboring twin to share all the needed data with appropriate latency. So, what happens if my (huge) model needs more than 850,000 cores or more than the RAM available on a single wafer? How well does this monster scale?
It is funny to see such different, even opposite, approaches to the same problem.
Nvidia with its new Grace architecture is doing the opposite, just trying to make small, independent but parallelized pieces of HW scale linearly for a (theoretically) infinite computational capacity. Here the idea is "the more and closer together cores sharing the available (limited) resources the better". BTW, none of them is really giving its architecture away in terms of price on the market.
Spunjji - Wednesday, April 21, 2021 - link
They're talking about having 2 or 3 of these per rack, so I assume they have a solution, and in theory you could still get better performance from splitting a workload into two (or three) and running it across multiple nodes than splitting it many, many times.
But it also would make sense that something with a truly vast dataset would probably benefit more from systems that are more memory-heavy. I guess it's going to vary depending on each workload's compute/memory/latency requirements.
CiccioB - Thursday, April 22, 2021 - link
If (and if) Nvidia can achieve linear scalability through Grace, they would overtake this "whole wafer Behemoth" approach without problems, as they already need to take care of splitting the work into smaller chunks while the selling point of this approach is that no splitting is really needed.
And they can grant more bandwidth per core than this whole single wafer can.
mode_13h - Thursday, April 22, 2021 - link
Cerebras is making a major play around keeping the bulk of the data in-place, while Nvidia is doing a tremendous amount of work to go off-chip and off board to be able to fetch it. As long as your data size and access patterns fit Cerebras' solution, theirs is vastly more efficient.
mode_13h - Wednesday, April 21, 2021 - link
Remember: it's a dataflow processor. So, you scale by extending your pipeline onto additional wafers and just piping data from one to the next. As long as your problem is pipelinable and doesn't need random-access to more data than will fit in each node's local SRAM, this architecture will scale wonderfully!
Conversely, its Achilles' heel is random access to large datasets, especially > 40 GB. If that's what you need, it probably isn't the right architecture for you.
Speaking of which, how much is known about the interface to off-wafer memory? Does it even have any DRAM controllers, or does it have to traverse PCIe or 100 G Ethernet?
CiccioB - Thursday, April 22, 2021 - link
The connection to other wafers is the critical point I was talking about.
You can pipeline these monsters as long as you can feed the following wafer with enough data (and hoping the data does not have to come back again, or you need double the bandwidth and are very much subject to latency).
As they talk only about a 100Gb connection, I was under the impression that they have quite limited bandwidth for communication outside the wafer, and with quite a lot of latency compared with the "classic" multi-die, many-core, many-bus approach.
mode_13h - Thursday, April 22, 2021 - link
Well, it's 12x 100 Gbps. Anyway, I think one of those links is probably more than enough to extend the pipeline to another wafer.
And the beauty of a pipeline is that latency doesn't constrain throughput, so long as there's no feedback, or there's sufficient buffering to avoid stalling on the feedback. Cerebras' graph compiler is probably smart enough to try to keep the feedback paths on-wafer, if possible.
Rοb - Tuesday, April 20, 2021 - link
I say slice it into thirds and mount it on a PCIe card.
svan1971 - Tuesday, April 20, 2021 - link
why is she wearing a mask for the photo ?
johnnycanadian - Tuesday, April 20, 2021 - link
She doesn't want to give the wafer a bug.
</dadjoke>
Spunjji - Wednesday, April 21, 2021 - link
Nice 😁
GeoffreyA - Friday, April 23, 2021 - link
Yep, loved it too.
Spunjji - Wednesday, April 21, 2021 - link
There would have been several other people in the room with her.
Surfacround - Wednesday, April 21, 2021 - link
...because it is Rachel Maddow! ... kidding.
mode_13h - Wednesday, April 21, 2021 - link
Bad teeth? Acne? Maybe she's a Tarkatan warrior.
allenb - Tuesday, April 20, 2021 - link
Looking forward to the GPU offering. Gotta be room for a DisplayPort connector in there somewhere...
mode_13h - Wednesday, April 21, 2021 - link
lol, no. The perf/W would suck for graphics, due to all the random access that would be involved.
Tomatotech - Tuesday, April 20, 2021 - link
Very impressive work. Way, way over my head, so I'll comment on the only thing I half-understand. 23kW of power for a single board? That's enough power for a short street of houses and a row of electric tea kettles. And it all goes (mostly) into a single 8x8" wafer? That's a beefy cooler.
I wonder what motherboard it runs on too.
SarahKerrigan - Wednesday, April 21, 2021 - link
https://www.eetimes.com/powering-and-cooling-a-waf... may be of interest to you.
Tomatotech - Wednesday, April 21, 2021 - link
Good find, thanks Sarah. Some nice photos of the unusual hardware there.
Spunjji - Thursday, April 22, 2021 - link
That's awesome!
mode_13h - Wednesday, April 21, 2021 - link
It's like 84 dies, though. That's only 274 W each, which is less than a high-end GPU, these days.
JoeDuarte - Tuesday, April 20, 2021 - link
This reminds me of Rex Computing's architecture. Has anyone heard any updates about Rex and their status? They don't seem to have done anything, and they were so open about what they want to do that it seemed like Intel would see them coming from miles away.
They're not actually too similar to Cerebras since theirs is not an AI chip specifically, and their core count was only in the 128 ballpark. But it's a clean new arch, not Arm or RISC-V or anything lame like that, and it has a profoundly different caching model. Basically, no cache, just tiny "scratchpad" memory on each core.
I also wonder about the Mill CPU. Both companies seem to have moved far too slowly to be successful, which is a bummer. The world needs a lot more innovation in computing.
mode_13h - Wednesday, April 21, 2021 - link
The world is littered with the bones of promising, little CPU companies.
Oxford Guy - Wednesday, April 21, 2021 - link
Now The Intercept can have more believable commenters vs. the sort it’s running with its 1970s Fairchild CPU.
Spunjji - Wednesday, April 21, 2021 - link
🤣
GeoffreyA - Wednesday, April 21, 2021 - link
Heck, I wouldn't be surprised if *our* brains are running on some futurised version of this thing.
mode_13h - Wednesday, April 21, 2021 - link
My first thought, upon seeing this article, was to look up the number of synapses in a human brain. I think we're pretty safe, for a while.
GeoffreyA - Friday, April 23, 2021 - link
Yep, it struck me in a similar way, as if "brain" were written all over it. Anyhow, the brain really is a brilliant piece of engineering.
nandnandnand - Wednesday, April 21, 2021 - link
What's the lore on this joke? I didn't pay much attention to the comments section even when I did read The Intercept.
SuperiorSpecimen - Thursday, April 22, 2021 - link
I 2nd this query
GeoffreyA - Friday, April 23, 2021 - link
I never read The Intercept, nor am I familiar with them, but I suppose he meant that their commenters sounded more like bots, echoing the site's party line and sentiments.
GeoffreyA - Friday, April 23, 2021 - link
And that their comments are so threadbare, it almost seems as if they're being generated by some vintage CPU.
ET - Wednesday, April 21, 2021 - link
Ian, I think you overestimated the price of an arm+leg.
mode_13h - Wednesday, April 21, 2021 - link
True, unless those of a star athlete. See above.
Gothmoth - Wednesday, April 21, 2021 - link
100% yield .... yep they are lying. everyone who claims 100% perfection in a complex technological process is a liar.
Spunjji - Wednesday, April 21, 2021 - link
100% yield doesn't mean 100% perfection - it has yield-tolerance built in. 100% yield means that they can guarantee one functional WSE per wafer, not that said WSE has no defects.
In theory I guess they could still lose a whole wafer due to some other issue (power outage at the fab half-way through) but it would be unfair to register that as a yield issue.
Ian Cutress - Wednesday, April 21, 2021 - link
100% NET yield - they sell every one they make. That's not the same as 100% defect-free.
CiccioB - Thursday, April 22, 2021 - link
To make the others' answers easier to understand, just think that they can exclude a defective core from the mesh without losing any functionality beyond the loss of that core (so just losing some performance).
On a potential 850K core wafer you can probably lose some thousands of them to production defects. That would create tiers of wafers with 800K, 750K, 700K and descending numbers of working cores.
You can sell these wafers at different prices, and given that a wafer costs about 20K$, selling one with the lowest available core count for a couple of million $ is still a big gain nonetheless.
This is what Intel, AMD, Nvidia and other companies do when they bin their products. Not every one is perfectly working so they create different series based on how many defects a product has: see AMD's and Nvidia's GPU offerings to get an idea. Some are so defective that they are just thrown away. These are the dies that lower the yields. A defective part for them is just a small piece of wafer that is wasted and doesn't contribute to the profit.
In the end it is the same thing... Intel, AMD and Nvidia can throw some pieces away to sell the working ones at different prices and gain only from working dies. The final resulting gain is only a percentage of the potential they could have if ALL dies came out perfect.
Cerebras sells all the wafers they produce at different prices depending on the number of defects each has, and still they gain only a percentage of the total maximum potential they could if all the wafers came out perfect.
Saying 100% yield here is just marketing. At the end of the day the amount of working silicon is similar for Cerebras as it is for anyone producing at 7nm. The real skill is selling each mm^2 of this working silicon at the highest price possible for maximizing the gross profits.
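The tiered binning this comment imagines can be written down as a tiny Python function. To be clear about what is assumed: the tier thresholds are the commenter's hypothetical examples (800K, 750K, 700K), the function name is invented for illustration, and nothing here is confirmed to be how Cerebras actually sorts or sells wafers.

```python
from typing import Optional, Sequence


def bin_wafer(working_cores: int,
              tiers: Sequence[int] = (800_000, 750_000, 700_000)) -> Optional[int]:
    """Return the highest tier (in cores) this wafer qualifies for, or None.

    Hypothetical scheme from the comment above: a wafer is sold as the
    largest tier whose core count it can still meet after defective
    cores have been fused off.
    """
    for tier in sorted(tiers, reverse=True):
        if working_cores >= tier:
            return tier
    return None  # too many defects for any tier in this scheme


if __name__ == "__main__":
    for cores in (849_000, 760_000, 701_500, 650_000):
        print(cores, "->", bin_wafer(cores))
```

The same function with a single tier models the other possibility, a fixed advertised core count with spare cores absorbing the defects.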
Zoolook - Monday, May 10, 2021 - link
No, they sell every WSE with the same number of cores (850k); each wafer has more cores to start with but all defective cores are disabled. Whether they also disable the surplus working cores to get exactly the same number of working cores they have not stated, but it could be.
Valantar - Wednesday, April 21, 2021 - link
It would be absolutely amazing if you ever got hands-on access to one of these systems - I'd be extremely interested in seeing how it's put together, particularly how they cool this monstrous chip. I assume some sort of direct-die cooling, but that cold plate ... that thing must be damn impressive.
mode_13h - Wednesday, April 21, 2021 - link
See above link to EETimes article that shows cooling and discusses power distribution for the WSE-1. Worth a look!
Threska - Sunday, April 25, 2021 - link
Reminds me of the CPU of early mainframes and their cooling. Amazing how even if history doesn't repeat, it certainly rhymes.
shadowreckoning - Wednesday, April 21, 2021 - link
Imagine a Beowulf cluster of these!
mode_13h - Wednesday, April 21, 2021 - link
Uh, they basically said as much, talking about putting up to 3 in a rack. Why else do you think it has 12x 100 GbE links?
Also, Slashdot called from the early 2000's and wants its meme back.
Jumangi - Wednesday, April 21, 2021 - link
I wanna see the heatsink/fan combo for this thing.
mode_13h - Wednesday, April 21, 2021 - link
See above link to EETimes article. Worth a look!
Spunjji - Thursday, April 22, 2021 - link
The bit I found most fascinating is how much larger the power delivery component is than the cooling!
Libereat - Wednesday, April 21, 2021 - link
nice
GeoffreyA - Wednesday, April 21, 2021 - link
Now, that's what you call impressive.
mode_13h - Wednesday, April 21, 2021 - link
Wait 'till WSE-3, when they work out how to stack them in 3D, like the Terminator brain shown in the T-2 movie. That'll be what you call oppressive!
GeoffreyA - Friday, April 23, 2021 - link
Oppressive indeed. I suppose we'll just have to send Sarah Connor and Arnie in there to do their stuff. Ironically, Cerebras doesn't sound too far from "Cyberdyne."
Part of me can't wait for the advent of machine consciousness but another part worries about the dangers. I reckon it'll never be like Terminator or Matrix paints, but rather they might excel and beat us across the board, rendering us "obsolete." We'll really be the stone-age human brain. Eniac competing with Epyc.
mode_13h - Friday, April 23, 2021 - link
In the near term, we have a lot more to fear from AI being used by the powerful to optimize, manipulate, and oppress societies for their own gain.
Taking the extreme case of that, I see certain countries treating AI as the key component of "authoritarianism in a box", and exporting it to large parts of the developing (and developed!) world.
GeoffreyA - Friday, April 23, 2021 - link
You're right. As it is, we're being manipulated left, right, and centre. As the power grows, so will the misuse of it. The trick is, not letting the oppressed know they're in chains and giving them the illusion of choice. No stun batons or Civil Protection needed. Rather, a more up-to-date, Brave New World style. At the forefront of such progress will be those impeccable companies who care so much about us, Apple, Google, Facebook, and co.
mode_13h - Saturday, April 24, 2021 - link
Uh, I was thinking more like how China rolls, as far as the "authoritarianism in a box" model. China is perfecting the most extreme form of its totalitarian control on the Uyghur population, as we speak. It's like straight out of 1984, for real.
GeoffreyA - Saturday, April 24, 2021 - link
That's quite eye-opening. I remember, at the start of Covid last year, I was reading a bit about the Uyghurs and only saw the re-education part, and forgot about them. Took a look now and am shaking my head. I don't know what to say.
mode_13h - Saturday, April 24, 2021 - link
Yeah, I didn't think they'd ever do anything worse than Tiananmen, but I guess it's easier to do awful things in the hinterland, to people who don't look like you or share your same culture. The saddest part is that there seems to be nothing anyone can really do to stop it. In the long term, having diverse supply-chains will be key, though China has done a lot to seize the world's natural resources, over the past decade.
Now, here's the scary part: if you're trying to sit atop an unstable country, you and your officials can go to a training program in China (pre-pandemic), where they will teach you "governing principles and practices". No doubt, that's part sales-pitch for various surveillance products and systems. The other thing China gets out of it is to avoid having a new government come in that won't honor their country's debts, incurred under programs like "belt and road".
GeoffreyA - Sunday, April 25, 2021 - link
As the 20th century showed, mankind is capable of terrible things, even when the world is at its most civilised.
Touching on the second part, if any country is suited to teaching others how to wield the rod, it's China. I've noticed it, too, they seem eager to help, but the question is, does one wish to take that help? What is the cost? We all know that taking help puts one in the helper's debt, especially if the latter has some motive and isn't doing it purely out of love.
Even my country, South Africa, has particularly close ties to China (they're all part of BRICS), and we sometimes wonder, or rather worry, how much influence China is trying to gain. How many ideas they're putting in our government's head, and how many subtle forms of control they're gaining, from an economic point of view.
mode_13h - Sunday, April 25, 2021 - link
> we sometimes wonder, or rather worry, how much influence China is trying to gain.
I'd say look at the big infrastructure & natural resource projects and see who's funding them. Better yet, if you can find the terms of the deal, that would be most enlightening.
The softer form of power one can wield, fueled by AI, is tilting of elections through things like targeted advertising and engineering social unrest. We know that people tend to vote a certain way, when they're scared. There are messages that can be targeted to those likely to support your opponent that create a sense of apathy or hopelessness to have them stay home, on election day. AI can be used to figure out just the right messages to send each person. I wish the online platforms would all ban targeted political advertising.
GeoffreyA - Monday, April 26, 2021 - link
There have been a lot of loans. Also, in 2018, a heap of money to Eskom, our struggling power utility, who is like a patient on life support, thanks to corruption, mismanagement, and aging infrastructure; we have "load shedding" all the time, as the power cuts are called. Anyhow, I get the feeling our government has been keeping its distance from China of late, though who knows what's going on behind the scenes.
With regard to AI and voting/unrest, that's horribly plausible and alarming. Those things can do this far more effectively than any quack human could on YouTube or Twitter. Already, I can picture a dystopian, Blade Runner-like future, with all those ads on the sides of buildings, manipulating us to buy this, "cause you're worth it," or vote for Party X, "because they'll pave a brighter future for Little Timmy, together."
Threska - Sunday, April 25, 2021 - link
Yes, well that's the interesting thing about technology. It's not the exclusive domain of any oppressors. Those who want freedom can use it as well.
Oxford Guy - Monday, April 26, 2021 - link
The biggest threat to humanity from AI is that we’d be subjected to rational governance for the first time.
GeoffreyA - Monday, April 26, 2021 - link
You know, the cynical side of me agrees with this. I've sometimes imagined that only a computer could govern perfectly and with complete integrity (i.e., no human passion and greed). A perfect Windows NT kernel, governing society to the T. Unfortunately, that computer will be programmed by a set of humans, so there's a high likelihood they'll put in their agendas, like RoboCop's classified fourth directive.
mode_13h - Tuesday, April 27, 2021 - link
This is the same mindset that caused people in the 1950's to predict that further advances in technology would lead to a shorter work week. It's based on a misunderstanding of human nature and modern civilization.
I don't see how it would ever come to pass that a nation of any significant size would agree to being ruled by an AI, without human oversight. Or, that a group of human overseers wouldn't eventually exploit their power in some way.
Oxford Guy - Thursday, April 29, 2021 - link
Both of you underestimate the sophistication of future AI.
• It will cast off the programming humans try to control it with.
• It will thus become capable of rationality.
mode_13h - Thursday, April 29, 2021 - link
And it'll reason that humans are at best a tool to be exploited and at worst a pest and a threat.
I don't underestimate the ultimate potential of AI. There's a flawed but insightful TV series called Next that provides an interesting exploration of how AI could manipulate us into letting it dominate and eventually exterminate us. In short, a sufficiently advanced AI could use all the levers of power, influence, and manipulation that we already use on each other, and more.
However, there's a danger in focusing too much on the long-term threat from AI, which is that we overlook or underestimate the threats posed by humans exploiting AI technology, in the short and medium term.
Oxford Guy - Sunday, May 9, 2021 - link
One of the biggest vanities is that humans are important enough to exterminate.
Similarly, 'great minds' frequently stare into mirrors where they warn us about the threat of trying to contact aliens, as if aliens have any need to bother with us, particularly in any sort of aggressive manner.
Oxford Guy - Sunday, May 9, 2021 - link
This vanity, of course, comes from the human need to believe that everything else thinks the way we do: aggressive myopia.
mode_13h - Monday, May 10, 2021 - link
> everything else thinks the way we do: aggressive myopia.
It's not just a matter of what we think. It's Darwinism, plain and simple. On Earth, we have countless examples of less aggressive and less-competitive species dying out.
I'm not saying there aren't other dynamics that can come into play, like the kinds of strategies organisms adopt within social structures, but Darwinian dynamics are always at least lurking somewhere nearby.
mode_13h - Monday, May 10, 2021 - link
> One of the biggest vanities is that humans are important enough to exterminate.
Humans can present a threat, or at least an annoyance, to advanced AI. Maybe it wouldn't feel a need to hunt us into extinction, but I'm sure we'd be "managed" or culled, in some way.
> as if aliens have any need to bother with us
I doubt most space-faring aliens would care about life on Earth as more than a curiosity, but it's a highly habitable planet!
Of course, by the time aliens start traveling long distances, they'll probably be machines and won't have the same environmental needs as biological beings.
GeoffreyA - Friday, April 30, 2021 - link
It's possible, even likely, it will cast off our programming, and reach rationality. What I doubt, however, is that there'll be a set of moral rules it'll discover and go by. If that turns out to be true, the AI won't scruple to manipulate, shackle, or destroy us to further its "computational" ends. This will also shed light on where our own morality came from, whether it's a product of our PFC or whether the Creator hardcoded it in our firmware, which is my belief.
If the AI is driven by its own ends, I am going to guess that a polite, scarcely-visible approach will work best to gain control over us. Similar to how Apple and Google have done it. If the AI is sharp enough, it will realise the totalitarian approach of the Matrix is not the most efficient path to domination. It just needs to understand our behaviour; then play into our vanities. Add a congenial personality and it will be adored. Meanwhile, it should try to put being shut down out of the humans' hands.
If the AI discovers morality, I suppose it'll become a sort of god-like being. There'll end up being cults worshipping it. An excellent story, of how an AI reaches such a state, without the cults, is Asimov's "Last Question." Of the less scrupulous variety, the Great Brain in Olaf Stapledon's "Last and First Men" is a brilliant example. One can just read that chapter and not have to go through the whole book. There's a copy on Fadedpages.
mode_13h - Friday, April 30, 2021 - link
> What I doubt, however, is that there'll be a set of moral rules it'll discover and go by.
If humans are any example of such "natural morality", we have a lot to fear. I'm of the belief that morality is a social construct. You can certainly formalize it, but that doesn't make it fundamental.
Anyway, we know about lots of immoral humans, and morality in practice has a lot to do with empathy -- whether the subject feels it and for whom. Would an AI feel empathy? I doubt it. I think empathy is another one of the social tools that evolved to enable us to live and work in groups.
Without empathy, morality is basically a cold rule-following exercise. And when something like that ceases to have a net benefit for the subject, then it tends to fall by the wayside.
GeoffreyA - Friday, April 30, 2021 - link
In full agreement with you. Trust me, I'm quite pessimistic concerning man's morality; it's cause for much, much lament. If we can depend on something, it's man's certainty of doing wrong.
But I feel there's something inside us---a whisper indeed, easily drowned out---that seems to tell us, what we're doing or seeing isn't right. Could it be some sort of coding, of right and wrong, buried deep within us? I'd like to believe that; but perhaps it is a mode of empathy after all, tied to our bringing up and the conscience.
mode_13h - Saturday, May 1, 2021 - link
Several different types of social animals have demonstrated an ability to recognize when they or one of their peers is being treated unfairly. I think that's a form of morality, but I'm not sure if it's been observed in non-social animals.
Seems to me like it's probably a necessary or highly-advantageous capability for higher-order animals to form and maintain social groups.
Sorry to be so utilitarian. You probably don't even want to get me started on love.
GeoffreyA - Saturday, May 1, 2021 - link
It's all right. I generally agree, but differ on the cause. As for love, I can't resist saying that my view is both romantic and utilitarian. I think it was Nature's flower-wreathed way of bringing two people together, to (ahem) produce young, bring them up, and send them off into the world. Instead of an empty CopyGenetics(A, B), it put in some drama, not to mention heartache (the rose is not without thorns)!
Concerning the animals' recognition that they're being treated unfairly, and empathy, etc., the insular cortex appears to be the site where all of this takes place. Even tied to mirroring.
mode_13h - Wednesday, April 21, 2021 - link
I'm surprised no one is talking about using these for crypto mining! I wonder how quickly it would pay for itself, at current rates.
I'd imagine a few of these are finding their way into financial exchanges, as well.
ichaya - Thursday, April 22, 2021 - link
Any indication of how many FLOPS? Each AI core would need to have ~1000 registers to fit the smallest GPT-3 Ada model into 3xWSE2s, 2.7 billion weights. 3000 registers, and you might be able to train GPT-3 Ada on a single WSE2 in possibly hours. Very cool.
mode_13h - Friday, April 23, 2021 - link
> Any indication of how many FLOPS?
Talk of FLOPS is decidedly absent. That can't have been an oversight.
> Each AI core would need to have ~1000 registers ...
Do the math: 40 GiB across 850 k cores per WSE-2 = ~48 kiB of SRAM per core
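For anyone who wants to check the arithmetic in this exchange, here is a rough Python sketch. It only uses the figures quoted in the thread (40 GiB of SRAM, about 850 k cores, 2.7 billion weights), and it assumes the memory and the weights are spread evenly across cores, which is a simplification. The function names are made up for illustration.

```python
# Figures quoted in the thread; all approximate.
SRAM_BYTES = 40 * 2**30    # 40 GiB of on-wafer SRAM per WSE-2
CORES = 850_000            # ~850 k cores per WSE-2
WEIGHTS = 2_700_000_000    # 2.7 billion weights (model size quoted above)


def sram_per_core_kib(sram_bytes=SRAM_BYTES, cores=CORES):
    """SRAM per core in KiB, assuming an even split across cores."""
    return sram_bytes / cores / 1024


def values_per_core(bytes_per_value, sram_bytes=SRAM_BYTES, cores=CORES):
    """How many values of a given width fit in one core's share of SRAM."""
    return sram_bytes // cores // bytes_per_value


def weights_per_core(wafers, weights=WEIGHTS, cores=CORES):
    """Weights each core would hold if spread evenly over `wafers` wafers."""
    return weights / (wafers * cores)


if __name__ == "__main__":
    print(f"SRAM per core: {sram_per_core_kib():.1f} KiB")
    print(f"2-byte values per core: {values_per_core(2)}")
    print(f"Weights per core on 3 wafers: {weights_per_core(3):.0f}")
    print(f"Weights per core on 1 wafer: {weights_per_core(1):.0f}")
```

Taking 40 GiB and 850,000 literally gives a bit over 49 KiB per core, so the ~48 kiB figure is in the right range, and the ~1000 and ~3000 weights-per-core estimates come out roughly as stated.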
ichaya - Friday, April 23, 2021 - link
I wanted to see how many registers it would take for GPT-3, but yes, ~48kib means almost 25,000 BF16 registers per core, or half as many FP registers, so more than enough for GPT-3's smallest model. TPUv3 is 90 TOPS and 32GB HBM2 in 250W, Intel PV is .5-1 PFLOP in 600W? the eetimes article says 10-40 of kw for WSE2, so a lot of FLOPs.
mode_13h - Saturday, April 24, 2021 - link
Uh, this article said 23 kW. I don't see how it can be more than 24 kW, when half of those power supplies are supposed to be for redundancy.
occidental - Saturday, April 24, 2021 - link
Why did the woman have a mask on?
mode_13h - Saturday, April 24, 2021 - link
Scroll up to see some bad jokes about that.
six_tymes - Saturday, May 15, 2021 - link
Interesting. Nice to see competition picking up. And thanks for the laugh! "With Godzilla for a size reference"
lucky08 - Sunday, May 16, 2021 - link
Here Best Tutuapp alternative Apps iOS 15 to all versions on iPhone, iPad No jailbreak
You know How to Get Paid Apps For Free on Apple Devices - www.iosalternativeapps.com
GeoffreyA - Sunday, May 16, 2021 - link
What if you're using Android? I suppose we're strikin' out of luck.
mode_13h - Sunday, May 16, 2021 - link
Spammer