Power Consumption: Hot Hot HOT

I won’t rehash the full ongoing issue with how companies report power vs TDP in this review – we’ve covered it a number of times before. In short, Intel uses one published value for sustained performance and an unpublished ‘recommended’ value for turbo performance, the latter of which is routinely ignored by motherboard manufacturers. Most high-end consumer motherboards also ignore the sustained value, often 125 W, and allow the CPU to consume as much as it needs, with the real limits being the power consumption at full turbo, the thermals, or the power delivery.
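To make the difference between a sustained limit and a turbo limit concrete, here is a minimal sketch of a power budget. It is a simplified model and not Intel's actual algorithm: the function name, the 250 W turbo limit, and the 56-second averaging window are illustrative assumptions, and only the 125 W sustained value comes from the text above.

```python
import math


def granted_power(demand_w, sustained_w, turbo_w, window_s, step_s=1.0):
    """Return the power granted at each time step under a simple budget model.

    demand_w    -- power the workload would draw if unconstrained, per step
    sustained_w -- long-term limit (125 W in the text above)
    turbo_w     -- short-term limit (hypothetical value in this sketch)
    window_s    -- averaging window for the running average (hypothetical)
    """
    running_avg = 0.0
    alpha = step_s / window_s  # weight of each new sample in the average
    granted = []
    for want in demand_w:
        # Turbo is allowed only while the running average is under the
        # sustained limit; after that the chip is clamped to the sustained value.
        cap = turbo_w if running_avg < sustained_w else sustained_w
        draw = min(want, cap)
        running_avg += alpha * (draw - running_avg)
        granted.append(draw)
    return granted


# Two minutes of a workload that wants 200 W the whole time.
demand = [200.0] * 120

# A board that enforces the limits (turbo limit and window are assumptions).
limited = granted_power(demand, sustained_w=125.0, turbo_w=250.0, window_s=56.0)

# A board with 'infinite turbo': no limit is ever enforced.
unlimited = granted_power(demand, sustained_w=math.inf, turbo_w=math.inf, window_s=56.0)

print(limited[0], limited[-1])
print(unlimited[0], unlimited[-1])
```

With these assumed numbers, the limited run starts at 200 W and ends clamped to 125 W, while the run with no enforced limits stays at 200 W throughout.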

One of the dimensions of this we don’t often talk about is that the power consumption of a processor always depends on the actual instructions running through the core. A core can be ‘100%’ active while sitting around waiting for data from memory or doing simple addition; however, a core has multiple ways to run instructions in parallel, with the most complex instructions consuming the most power. This became noticeable in the desktop consumer space when Intel introduced vector extensions, AVX, to its processor design. The subsequent introduction of AVX2 and AVX-512 means that running these instructions draws the most power.

AVX-512 comes with its own discussion, because even going into an ‘AVX-512’ mode causes additional issues. Intel’s introduction of AVX-512 on its server processors showed that in order to remain stable, the core had to reduce the frequency and increase the voltage while also pausing the core to enter the special AVX-512 power mode. This made the advantage of AVX-512 suitable only for strong high-performance server code. But now Intel has enabled AVX-512 across its product line, from notebook to enterprise, allowing these chips to run AI code faster and enabling new use cases. We’re also a couple of generations on from then, and AVX-512 doesn’t take quite the same hit as it did, but it still requires a lot of power.

For our power benchmarks, we’ve taken several tests that represent a real-world compute workload, a strong AVX2 workload, and a strong AVX-512 workload. Note that Intel lists the Core i7-11700K as a 125 W processor.

Motherboard 1: Microcode 0x2C

Our first test, using Agisoft Photoscan 1.3, shows a peak power consumption of around 180 W, although depending on the part of the test, we have sustained periods at 155 W and 130 W. Peak temperatures flirt with 70ºC, but the CPU spends most of the time at around the 60ºC mark.

For the AVX2 workload, we use POV-Ray. This is the workload on which we saw the previous generation 10-core processors exceed 260 W.

At idle, the CPU is consuming under 20 W while touching 30ºC. When the workload kicks in after 200 seconds or so, the power consumption rises very quickly to the 200-225 W band. This motherboard implements the ‘infinite turbo’ strategy, and so we get a sustained 200-225 W for over 10 minutes. Through this time, our CPU peaks at 81ºC, which is fairly reasonable for some of the best air cooling on the market. During this test, all cores sustained 4.6 GHz.

Our AVX-512 workload is 3DPM. This is a custom in-house test, accelerated to AVX2 and AVX-512 by an ex-Intel HPC guru several years ago (for disclosure, AMD has a copy of the code, but hasn’t suggested any changes).

This test runs for 10-15 seconds and then idles for 10 seconds, and it rapidly exhausts the turbo of any system that doesn’t run an infinite turbo. What we see in this power-only graph are alarming peaks of 290-292 W. Looking at our data, the all-core turbo under AVX-512 is 4.6 GHz, sometimes dipping to 4.5 GHz. Ouch. But that’s not all.

Our temperature graph looks quite drastic. Within a second of running AVX-512 code, we are in the high-90ºC range, or in some cases at 100ºC. Our temperatures peak at 104ºC, and here’s where we get into a discussion about thermal hotspots.

There are a number of ways to report CPU temperature. We can either take the instantaneous value of a single spot of the silicon while it is going through a high current density event, like compute, or we can consider the CPU as a whole with all of its thermal sensors. While the overall CPU might accept operating temperatures of 105ºC, individual elements of the core might actually reach 125ºC instantaneously. So what is the correct value, and what is safe?
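The two ways of reporting can give quite different numbers for the same chip at the same instant. The sketch below is only an illustration: the function name and the sensor readings are hypothetical, not measurements from this review, and real monitoring tools may aggregate sensors differently.

```python
from statistics import mean


def summarize_temperatures(sensor_readings_c):
    """Summarize one snapshot of per-sensor temperatures in degrees Celsius.

    'hotspot_c' is the single hottest sensor at this instant, and
    'average_c' treats the CPU as a whole by averaging every sensor.
    """
    return {
        "hotspot_c": max(sensor_readings_c),
        "average_c": mean(sensor_readings_c),
    }


# Hypothetical snapshot: one sensor sits next to a busy vector unit.
readings = [88.0, 92.0, 104.0, 90.0]
summary = summarize_temperatures(readings)
print(summary)
```

Which of the two values a tool reports determines whether the same snapshot reads as alarming or as comfortable.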

The cooler we’re using on this test is arguably the best air cooling on the market – a 1.8 kilogram full copper ThermalRight Ultra Extreme, paired with a 170 CFM high static pressure fan from Silverstone. This cooler has been used for Intel’s 10-core and 18-core high-end desktop variants over the years, even the ones with AVX-512, and has not skipped a beat. Because we’re seeing 104ºC here, are we failing in some way?

Another issue we’re coming across with new processor technology is the ability to effectively cool a processor. I’m not talking about cooling the processor as a whole, but more about those hot spots of intense current density. We are going to get to a point where we can’t remove the thermal energy fast enough, or with this design, we might be there already.

Smaller Packaging

I will point out an interesting fact down this line of thinking though, which might go unnoticed by the rest of the press – Intel has reduced the total vertical height of the new Rocket Lake processors.

The z-height, or total vertical height, of the previous Comet Lake generation was 4.48-4.54 mm. This number was taken from a range of 7 CPUs I had to hand. However, this Rocket Lake processor is over 0.1 mm thinner, at 4.36 mm. The smaller height of the package plus heatspreader could be a small indicator of the required thermal performance, especially if the gap (filled with solder) between the die and the heatspreader is smaller. If it aids cooling and doesn’t disturb how coolers fit, then great; however, at some point in the future we might have to consider different, better, or more efficient ways to remove these thermal hotspots.

Motherboard 2: Microcode 0x34

As an addendum to this review a week after our original numbers, we obtained a second motherboard that offered a newer microcode version from Intel.

On this motherboard, the AVX-512 response was different enough to warrant mentioning. Rather than hold a 4.6 GHz all-core turbo for AVX-512, it initially ramped up that high, peaking at 276 W, before dropping to 4.4 GHz all-core at 225 W. This is quite a substantial change in behaviour.

This means that at 4.4 GHz, we are running 200 MHz slower (which gives a 3% performance decrease), but we are saving 60-70 W. This is indicative of how far away from the peak efficiency point these processors are.
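As a rough worked example of that trade-off, the arithmetic can be sketched as follows. The inputs are taken from the figures quoted in this section, under the assumption that the comparison is between the 290 W AVX-512 peak on the first motherboard and the 225 W level on the second; the function name is illustrative.

```python
def efficiency_change(freq_before_ghz, freq_after_ghz,
                      power_before_w, power_after_w, perf_ratio):
    """Return (frequency drop, power drop, performance-per-watt gain) as fractions.

    perf_ratio is performance after the change divided by performance before.
    """
    freq_drop = 1 - freq_after_ghz / freq_before_ghz
    power_ratio = power_after_w / power_before_w
    power_drop = 1 - power_ratio
    perf_per_watt_gain = perf_ratio / power_ratio - 1
    return freq_drop, power_drop, perf_per_watt_gain


# 4.6 GHz at about 290 W versus 4.4 GHz at about 225 W, with the quoted
# 3% performance decrease (perf_ratio = 0.97).
freq_drop, power_drop, gain = efficiency_change(4.6, 4.4, 290.0, 225.0, 0.97)
print(f"frequency: -{freq_drop:.1%}, power: -{power_drop:.1%}, perf/W: +{gain:.1%}")
```

Under these assumptions, giving up roughly 4% of frequency removes over a fifth of the power, which works out to about a quarter more performance per watt.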

There was hope that this would adjust the temperature curve a little. Unfortunately we still see peaks at 103ºC when AVX-512 is first initiated; however, during the 4.4 GHz period we are closer to 90ºC, which is far more palatable.

On AVX2 workloads with the new 0x34 microcode, the results were very similar to the 0x2C microcode. The workload ran at 4.6 GHz all-core, reached a peak power of 214 W, and the processor temperature was sustained around 82ºC.

Peak Power Comparison

For completeness, here is our peak power consumption graph. These are the peak power consumption numbers taken from a series of benchmarks on which we run our power monitoring tools.

[Graph: Peak Power]

541 Comments

  • blppt - Saturday, March 13, 2021

    Not really---the Bulldozer design had far more problems than just the inadequate Windows scheduler---while "optimized" linux kernels gave better performance, the true issue with Bulldozer was that you had 8 relatively weak cores versus 4 strong cores from Intel, and at the time, coders were still struggling to optimize for anything over 2 cores.

    In a sense, the designers of Bulldozer REALLY misread the timeline of highly multithreaded coding taking over the market. Heck, even if it was released now alongside the equivalent Core 2 quads, it would still stink for the majority of users, because no game gets any significant advantage of 4+ physical cores even today---and most games still value high single thread/core performance.
  • Oxford Guy - Sunday, March 28, 2021

    'the true issue with Bulldozer was that you had 8 relatively weak cores versus 4 strong cores from Intel, and at the time, coders were still struggling to optimize for anything over 2 cores. In a sense, the designers of Bulldozer REALLY misread the timeline of highly multithreaded coding taking over the market.'

    My guess is that AMD designed Bulldozer for the enterprise market and didn't want to invest in an additional design more suited to the consumer desktop space. Instead, its additional design priority was the console scam (Jaguar). While that was a good move for AMD it wasn't beneficial for consumers, as consoles are a parasitic redundancy.

    One thing many ignore is that Piledriver supercomputers occupied quite high spots in the world performance lists. Mostly that was due to the majority of their work being done by the GPUs, though. Even the original Bulldozer, in Opteron branding, was used in some.

    The cheapness of Piledriver chips was also probably a factor in the adoption of the design for supercomputers. Turn down the voltage/wattage so that you're in the efficient part of the improved 32nm SOI node and rely almost completely on heavily threaded code when not running GPU-specific code... and voila — you have an alternative to the monopoly-priced Intel stuff.

    But, on the desktop, Piledriver was a bad joke. That's because of its very poor single-thread performance mainly. Not everything can be multi-threaded and even if it is that can mean a speed regression sometimes. The slowness of the L3, the lack of enough operations caching... the design wasn't even all that optimized for multi-thread performance — especially FPU stuff. The cores were very deeply pipelined, designed to use very high clocks. They were not efficient with avoiding bubbles and such. I read that AMD relied too heavily on automated tools due to cost sensitivity.

    My vague understanding of the design is that it was narrow and deep like the Pentium 4. Why AMD tried NetBurst 2.0 is beyond me. Even for the enterprise market it's a bad move because power efficiency is important there, especially with servers (rather than supercomputers which, I think, were more tolerant of high power usage – in terms of acceptable design requirements). Even turning down the clocks/voltage to get the best efficiency from the node doesn't fix the issue of the pipelining inefficiency (although hand-tuned code used for some enterprise/scientific stuff would mask that weakness more than general-purpose consumer-grade apps would).
  • usiname - Monday, March 8, 2021

    Intel knows very well that it is a mistake, but Alder Lake's big cores are not much better in terms of power consumption and 10nm wafer quality, so their only choice is to keep mainstream production at a maximum of 8 cores. This is bad for them, because even 3 years after AMD showed a 16-core mainstream part they can't match it, and being the Intel we know very well, they will cheat by introducing their 16-core CPU with 8 fake cores. What's more, when they introduce their "super duper" 12900K with 16 cores they will set the price higher than every AMD mainstream part, and this is a triple win: a cheap 8-core CPU to manufacture, better binned with higher clocks, and priced at $800-1000. If you think Intel is trying to do something new and innovative, you are wrong.
  • GeoffreyA - Monday, March 8, 2021

    You may well turn out to be right about 16 cores having 8 junk ones, and knowing Intel, that's how they operate, with smoke and mirrors when they can't compete properly.
  • Hifihedgehog - Tuesday, March 9, 2021

    The irony here is how Intel used to give ARM smack for having inferior single core performance while they were surpassed in multicore by the likes of Qualcomm. I believe—paraphrasing—what they would say is not all cores are created equal. Well, it looks like Intel is trying to look like they are maintaining parity when they are really just giving us mostly crappy cores that can’t perform well at all.
  • GeoffreyA - Tuesday, March 9, 2021

    Also, as others have pointed out before, the nomenclature is just there to obfuscate the whole picture. Not knowing anything much about Alder Lake, I did some searching and saw that it's Golden Cove + Gracemont. Wondering what exactly GC was, I searched a little but couldn't find the answer, so I'll guess it's just Sunny Cove with a new name stuck on.
  • Bagheera - Monday, March 8, 2021

    The more I read about Intel's 10nm (check relevant articles on SemiWiki and SemiAccurate), the more I feel ADL is designed as a workaround for the power/thermal limitations of their 10nm process, instead of being some sort of revelation for next gen performance.

    Note how TGL remains Intel's only viable product on 10nm at the moment, with Ice Lake SP now two years late. I think Intel knows their 10nm may never be ready for desktop parts, so ADL is a way to have a desktop product on 10nm except not really (it's more akin to a mobile part).

    It will probably do fine for gaming, but highly doubtful it will be a meaningful competition to Zen 4 for the prosumer space.
  • blppt - Monday, March 8, 2021

    Bulldozer/Piledriver were a dumb design because they relied HEAVILY on highly-threaded applications to achieve their performance. Almost none of which existed in the late 2000s when they launched. Single thread/core performance was absolutely pathetic compared to Intel's offerings at the time (Sandy Bridge and on).
  • Zan Lynx - Sunday, March 14, 2021

    And yet here we are today with 8 cores in game consoles and AMD's Mantle API being the basis of both DirectX 12 and Vulkan. Multicore was always the future and it was obvious even in 2003.
  • blppt - Monday, March 15, 2021

    "And yet here we are today with 8 cores in game consoles and AMD's Mantle API being the basis of both DirectX 12 and Vulkan. Multicore was always the future and it was obvious even in 2003."

    You're missing the point. Even TODAY, few (if any) games gain an advantage of more than 4 physical cores + 4 Virtual. It is still, to this day, far more advantageous for games to have 4 strong physical cores than 8 weak ones. The latter was Bulldozer.

    Never mind back in 2011/2012.
