Microbenchmarks

Core-to-Core Latency

As the core count of modern CPUs grows, the time it takes one core to reach another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies for reaching the nearest core compared to the furthest core. This holds especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first-generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and while we know there are competing tests out there, we feel ours most accurately reflects how quickly an access between two cores can happen.

The Ryzen 7 5700G has the quickest thread-to-thread latency, although it also shows the single slowest core-to-core result. Compared to the 4000G series, however, having a single unified L3 cache reduces the core-to-core latency by a good amount. The Ryzen 3 5300G has the slowest intra-core latency, but the fastest average core-to-core latency.
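
To make terms like "average core-to-core" and "single slowest" concrete, here is a minimal Python sketch of how a latency matrix could be summarized. The matrix values are made-up placeholders for a hypothetical 4-thread part, not our measurements, and `summarize` is a hypothetical helper rather than part of our in-house test. It assumes a symmetric matrix and reads only the upper triangle.

```python
from itertools import combinations

def summarize(matrix):
    """Return (mean, slowest_pair, slowest_value) over all distinct pairs.

    matrix[i][j] is the latency in ns between logical CPUs i and j.
    The diagonal is ignored and the matrix is assumed to be symmetric.
    """
    pairs = list(combinations(range(len(matrix)), 2))
    values = {(i, j): matrix[i][j] for i, j in pairs}
    mean = sum(values.values()) / len(values)
    slowest = max(values, key=values.get)  # first pair with the highest latency
    return mean, slowest, values[slowest]

# Placeholder numbers in ns for a hypothetical 4-thread part, NOT measured data.
example = [
    [0, 7, 18, 18],
    [7, 0, 18, 18],
    [18, 18, 0, 7],
    [18, 18, 7, 0],
]
mean, pair, worst = summarize(example)
print(f"mean {mean:.1f} ns, slowest pair {pair} at {worst} ns")
```

In this toy layout, CPUs 0/1 and 2/3 stand in for sibling threads on the same core, so the low values play the role of the intra-core latency and the higher ones the core-to-core latency.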

Per-Core Power

One other angle to examine is how much power each core is drawing with respect to the rest of the chip. In this test, we run POV-Ray with a specific thread mask for a minute, and take a power reading 30 seconds into the test. We output the core power values from all cores, and compare them to the reported total package power.
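
As an illustration of what a thread mask means here, the Python sketch below builds a bitmask that selects one logical thread on each of the first n cores. It assumes two threads per core with sibling threads numbered consecutively (0 and 1 on core 0, 2 and 3 on core 1, and so on), which is not guaranteed on every operating system. `core_mask` and `mask_to_cpus` are hypothetical helpers, not part of our test harness.

```python
def core_mask(n_cores, threads_per_core=2):
    """Bitmask with the first logical thread of each of the first n_cores set.

    Assumes sibling threads are numbered consecutively (an assumption;
    check the OS topology before relying on it).
    """
    mask = 0
    for core in range(n_cores):
        mask |= 1 << (core * threads_per_core)
    return mask

def mask_to_cpus(mask):
    """Expand a bitmask into the set of logical CPU indices it selects."""
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}

for n in (1, 3, 8):
    m = core_mask(n)
    print(n, hex(m), sorted(mask_to_cpus(m)))
```

On Linux, a set like the one returned by `mask_to_cpus` can be passed to `os.sched_setaffinity(0, cpus)` before launching the workload, so that only the chosen threads are loaded.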

The peak per-core power is shown as 15.2 W when one core is loaded on the Ryzen 7 5700G, and that comes down to ~8.8 W when all cores are loaded. Interestingly, this processor draws its highest total power when six cores are loaded.

The Ryzen 3 5300G starts at 11.5 W for a single core, then moves up to 12.3 W when three cores are loaded. It comes back down to 11.5 W when all four cores are loaded, which keeps the frequency consistent (the 5300G has a 4.2 GHz Base and 4.4 GHz Turbo, explaining the small variation in loading).

Frequency Ramping

Over the past few years, both AMD and Intel have introduced features that reduce the time it takes a CPU to move from idle into a high-powered state. This means users can get peak performance sooner, but the biggest knock-on effect is on battery life in mobile devices: if a system can turbo up quickly and turbo down quickly, it stays in the lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.

One of the issues with this technology, though, is that the adjustments in frequency can sometimes be so fast that software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency in milliseconds (or seconds), then quick changes will be missed. Not only that, as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency rate of the whole core.
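
The sampling problem can be shown with a small Python model. This is only a sketch under stated assumptions: the linear ramp shape, the 200 µs start time, and the 4400 MHz turbo value are placeholders chosen for illustration (real ramps tend to move in discrete steps), and all function names are hypothetical.

```python
def frequency_at(t_us, idle_mhz=2700, turbo_mhz=4400,
                 ramp_start_us=200, ramp_len_us=1000):
    """Modelled core frequency in MHz at time t_us: idle, linear ramp, turbo."""
    if t_us <= ramp_start_us:
        return idle_mhz
    if t_us >= ramp_start_us + ramp_len_us:
        return turbo_mhz
    frac = (t_us - ramp_start_us) / ramp_len_us
    return idle_mhz + frac * (turbo_mhz - idle_mhz)

def sample(period_us, duration_us=20_000):
    """Poll the modelled frequency every period_us microseconds."""
    return [frequency_at(t) for t in range(0, duration_us, period_us)]

def intermediate_points(samples, idle_mhz=2700, turbo_mhz=4400):
    """Count samples that landed on the ramp rather than at either end."""
    return sum(1 for f in samples if idle_mhz < f < turbo_mhz)

coarse = sample(period_us=10_000)  # polling every 10 ms
fine = sample(period_us=50)        # polling every 50 us
print(intermediate_points(coarse), intermediate_points(fine))  # prints "0 19"
```

With 10 ms polling, the observer only ever sees the idle and turbo values and learns nothing about the ramp in between, while 50 µs polling catches 19 points along it.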

We wrote an extensive analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probe itself the workload that triggers the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how quickly a system reaches its boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
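
The idea of letting the probe double as the load can be sketched in Python. This is not our Frequency Ramp tool: a Python loop carries interpreter overhead and cannot resolve microsecond-scale steps, so treat it purely as an illustration of the principle. The `probe` function and its parameters are hypothetical.

```python
import time

def probe(chunks=200, iters_per_chunk=20_000):
    """Run fixed-size busy-loop chunks and time each one.

    Returns a list of (elapsed_us_since_start, iterations_per_us). Every
    chunk does the same work, so a rising rate suggests the core clocked up.
    """
    results = []
    start = time.perf_counter_ns()
    for _ in range(chunks):
        t0 = time.perf_counter_ns()
        x = 0
        for i in range(iters_per_chunk):
            x += i  # fixed amount of integer work per chunk
        t1 = time.perf_counter_ns()
        elapsed_ns = max(t1 - t0, 1)  # guard against a zero-length reading
        rate = iters_per_chunk * 1000 / elapsed_ns
        results.append(((t1 - start) / 1000, rate))
    return results

trace = probe()
print(f"first chunk: {trace[0][1]:.1f} it/us, last chunk: {trace[-1][1]:.1f} it/us")
```

Because the timing loop is also what keeps the core busy, there is no separate observer polling a frequency counter and disturbing the result. Smaller chunks give finer time resolution at the cost of noisier readings.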

In our test, the Ryzen 5 5600G jumps from 2700 MHz to the turbo frequency in around a millisecond.

135 Comments

  • mode_13h - Monday, August 9, 2021 - link

    > RAMBUS was supposed to unlock the true power of the Pentium 4

    I think Intel underestimated what the DDR consortium was capable of doing. Perhaps they were right, as the DDR makers were eventually forced to license some RAMBUS patents, as I recall.

    > the Willamette I used for a decade had plain SDRAM, not even DDR.

    Northwood was the best. Sadly, I bought a Prescott because I wanted hyperthreading and hoped the 2x L2 cache would compensate for the longer pipeline. But, it turns out you could even get hyperthreading and 800 MHz FSB, in a couple Northwoods. I also thought SSE3 might be useful, but never got around to doing anything with it.

    BTW, I also used DDR400 in my P4.
  • GeoffreyA - Tuesday, August 10, 2021 - link

    For me, both the Prescott and A64 were available, but I went with the latter because I always wanted an Athlon. Originally, was looking at the XP 3200+ and dreamt of coupling that with an nForce2 motherboard. As for Northwood, masterpiece of a CPU. P4 would have put up a respectable defence against the A64 had they continued with it. My aunt had a 2.4 GHz Northwood back then, and my school friend a 2.66 GHz one. His struggled at first, but once he got more RAM and a GeForce FX 5700, it really flew. Still remember running through Delta Labs in Doom 3 at 60 fps!
  • mode_13h - Wednesday, August 11, 2021 - link

    Prescott was rumored to have 64-bit support, though it wasn't enabled. I think that explains some of the additional pipeline depth.

    When Core 2 first launched, I was skeptical the IPC could increase so much that so much lower-clocked CPUs would really outperform their predecessors. It took me a little while to fully accept it. I was hopeful the final 65 nm iteration of Pentium 4 would finally let the Netburst architecture stretch its legs, but even that couldn't overcome its inefficiencies and other deficits.
  • GeoffreyA - Friday, August 13, 2021 - link

    Quite likely. Come to think of it, didn't the Pentium Ds have x64? And they were Prescotts.

    Indeed, 65 nm might have taken Northwood further. Would've made an interesting processor which we'll never see. As for their 31-stage brethren, the 65 nm Cedar Mills dropped power a fair bit.
  • GeoffreyA - Friday, August 13, 2021 - link

    "65 nm might have taken Northwood further"

    Well, we didn't even get to see a 90 nm one.
  • coolrock2008 - Wednesday, August 4, 2021 - link

    In the Ryzen 5 APUs table, there is a typo: the 5600G is listed as an 8-core part, whereas it's listed as a 6-core part in the previous table.
  • Wereweeb - Wednesday, August 4, 2021 - link

    I know how hard it is to actually publish something that is both excellently researched and at a moment the matter is still relevant. Thank you for your coverage.

    Plus, it's AnandTech, the important parts here are the data and analysis, not how well a tired writer proofreads their own text.
  • Fulljack - Friday, August 6, 2021 - link

    I disagree. Any researcher would say that proofreading is just as important as the analysis itself. It's how you serve the data and the analysis to a broader audience, after all.
  • dsplover - Wednesday, August 4, 2021 - link

    Three times the IPC of my beloved i7 4790k’s. I’ll try one, maybe a few as I don’t need the fastest.
    The cooler, fast enough is fine for my 1U builds.

    Thanks AMD. Tiger Lake never appeared, you win.
  • dsplover - Wednesday, August 4, 2021 - link

    I meant 30% more IPC…
