Microbenchmarks

Core-to-Core Latency

As the core count of modern CPUs grows, the time to access one core from another is no longer a constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies to reach the nearest core compared to the furthest core. This holds true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first-generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most accurate measure of how quickly an access between two cores can happen.
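To illustrate how a matrix of pairwise latencies can reveal groups of cores, here is a minimal Python sketch. It is not the in-house test described above: `build_latency_matrix`, `group_cores`, and `fake_measure` are hypothetical names, the 20 ns and 80 ns figures are invented placeholders, and the measurement itself is replaced by a stand-in function. It only shows the bookkeeping: measure every pair, then merge cores whose latency falls below a chosen threshold.

```python
from itertools import combinations


def build_latency_matrix(n_cores, measure):
    """Fill an n x n matrix by calling measure(a, b) for every core pair."""
    matrix = [[0.0] * n_cores for _ in range(n_cores)]
    for a, b in combinations(range(n_cores), 2):
        latency = measure(a, b)
        matrix[a][b] = matrix[b][a] = latency  # assumes symmetric latency
    return matrix


def group_cores(matrix, threshold_ns):
    """Merge cores whose pairwise latency is below threshold_ns."""
    n = len(matrix)
    parent = list(range(n))  # simple union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(range(n), 2):
        if matrix[a][b] < threshold_ns:
            parent[find(a)] = find(b)

    groups = {}
    for core in range(n):
        groups.setdefault(find(core), []).append(core)
    return sorted(groups.values())


def fake_measure(a, b):
    """Stand-in for a real measurement: two hypothetical 4-core clusters.

    The 20 ns / 80 ns values are invented for illustration only.
    """
    return 20.0 if a // 4 == b // 4 else 80.0


matrix = build_latency_matrix(8, fake_measure)
print(group_cores(matrix, threshold_ns=50.0))
```

With a real measurement function in place of `fake_measure`, the threshold would need to be chosen by looking at the measured distribution rather than fixed in advance.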

The Ryzen 7 5700G has the quickest thread-to-thread latency, though it also shows the single slowest core-to-core latency. Compared to the 4000G series, having a single unified L3 cache reduces the core-to-core latency by a good amount. The Ryzen 3 5300G has the slowest intracore latency, but the fastest average core-to-core latency.

Per-Core Power

One other angle to examine is how much power each core is drawing with respect to the rest of the chip. In this test, we run POV-Ray with a specific thread mask for a minute, and take a power reading 30 seconds into the test. We output the core power values from all cores, and compare them to the reported total package power.
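As a rough sketch of the bookkeeping this kind of test involves, the Python below builds a thread mask for each loading step and computes how much of the package power is not accounted for by the cores. The function names are hypothetical, and the assumption that logical CPUs 2k and 2k+1 belong to physical core k is a common enumeration but not a universal one. Reading the power values themselves is platform-specific and is not shown.

```python
def thread_mask(logical_cpus):
    """Return an affinity bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


def loading_steps(n_cores, threads_per_core=2):
    """Yield (cores_loaded, mask) for 1..n_cores loaded cores.

    Assumes logical CPUs are numbered so that the first
    threads_per_core indices belong to core 0, the next to core 1, etc.
    """
    for loaded in range(1, n_cores + 1):
        cpus = range(loaded * threads_per_core)
        yield loaded, thread_mask(cpus)


def non_core_power(package_watts, core_watts):
    """Package power not accounted for by the per-core readings."""
    return package_watts - sum(core_watts)


for loaded, mask in loading_steps(4):
    print(loaded, bin(mask))
```

Each mask would be handed to the workload (or to the OS affinity API) before the run, and `non_core_power` applied to the readings taken at the sampling point.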

The peak per-core power is shown as 15.2 W when one core is loaded on the Ryzen 7 5700G, and that comes down to ~8.8 W when all cores are loaded. Interestingly, this processor uses more power when six cores are loaded.

The Ryzen 3 5300G starts at 11.5 W for a single core, but then moves up to 12.3 W when three cores are loaded. It comes back down to 11.5 W when all four cores are loaded, but this ensures a consistent frequency (the 5300G has a 4.2 GHz Base and 4.4 GHz Turbo, explaining the small variation in loading).

Frequency Ramping

Over the past few years, both AMD and Intel have introduced features to their processors that shorten the time it takes a CPU to move from idle into a high-powered state. This means users get peak performance quicker, but the biggest knock-on effect is on battery life in mobile devices: if a system can turbo up quickly and turbo down quickly, it stays in the lowest and most efficient power state for as long as possible.

Intel’s technology is called SpeedShift, although SpeedShift was not enabled until Skylake.

One of the issues with this technology, though, is that sometimes the adjustments in frequency can be so fast that software cannot detect them. If the frequency is changing on the order of microseconds, but your software is only probing frequency in milliseconds (or seconds), then quick changes will be missed. Not only that, as an observer probing the frequency, you could be affecting the actual turbo performance. When the CPU is changing frequency, it essentially has to pause all compute while it aligns the frequency of the whole core.
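The sampling problem can be shown with a small simulation. The trace below is entirely synthetic: the 1000 MHz and 4000 MHz values and the 300-microsecond burst are invented for illustration and are not measurements of any processor. The point is only that a probe polling once per millisecond can step straight over a burst that a microsecond-scale probe sees.

```python
def frequency_at(t_us):
    """Synthetic trace (invented numbers): idle clock with one 300 us burst."""
    return 4000 if 1200 <= t_us < 1500 else 1000  # MHz


def observed_peak(sample_interval_us, duration_us=10_000):
    """Highest frequency seen when polling every sample_interval_us."""
    samples = range(0, duration_us, sample_interval_us)
    return max(frequency_at(t) for t in samples)


print(observed_peak(10))    # polls every 10 us, lands inside the burst: 4000
print(observed_peak(1000))  # polls at 0, 1000, 2000, ... us and misses it: 1000
```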

We wrote an extensive analysis piece on this, called ‘Reaching for Turbo: Aligning Perception with AMD’s Frequency Metrics’, due to an issue where users were not observing the peak turbo speeds for AMD’s processors.

We got around the issue by making the frequency probing itself the workload that causes the turbo. The software is able to detect frequency adjustments on a microsecond scale, so we can see how well a system can get to those boost frequencies. Our Frequency Ramp tool has already been in use in a number of reviews.
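The idea of letting the probe be the workload can be sketched as follows. This is not the Frequency Ramp tool itself, whose implementation is not described here; `fixed_work`, `ramp_trace`, and `relative_speed` are hypothetical names. The sketch runs a fixed chunk of work back to back, timestamps each chunk, and treats shorter chunk times as higher effective clock speed. Python's interpreter overhead makes it far coarser than a native tool, so treat it as an outline of the approach only.

```python
import time


def fixed_work(iterations=2000):
    """A small, constant amount of integer work."""
    total = 0
    for i in range(iterations):
        total += i
    return total


def ramp_trace(duration_s=0.05):
    """Run fixed_work back to back; record (elapsed_ns, chunk_ns) pairs."""
    trace = []
    start = time.perf_counter_ns()
    while True:
        t0 = time.perf_counter_ns()
        fixed_work()
        t1 = time.perf_counter_ns()
        trace.append((t0 - start, t1 - t0))
        if t1 - start >= duration_s * 1e9:
            return trace


def relative_speed(trace):
    """Speed of each chunk relative to the fastest chunk (1.0 = peak)."""
    fastest = min(chunk for _, chunk in trace)
    return [(t, fastest / chunk) for t, chunk in trace]


speeds = relative_speed(ramp_trace())
for elapsed_ns, speed in speeds[:5]:
    print(f"{elapsed_ns / 1000:.0f} us: {speed:.2f} of peak")
```

Because the loop that takes the timestamps is also the load that triggers the boost, there is no separate observer polling in the background. Scheduler noise and interpreter effects will show up in the trace alongside any frequency change, so a single run says little on its own.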

In our test, the Ryzen 5 5600G jumps from 2700 MHz to the turbo frequency in around a millisecond.


135 Comments


  • abufrejoval - Thursday, August 5, 2021 - link

    There are indeed so many variables and at least as many shortages these days. And it's becoming a playground for speculators, who are just looking for such fragilities in the supply chain to extort money.

    I remember some Kaveri type chips being sold by AMD, which had the GPU parts chopped off by virtue of being "borderline dies" on a round 300mm wafer. Eventually they also had enough of these chips with the CPU (and SoC) portion intact, to sell them as a "GPU-less APU".

    Don't know if the general layout of the dies allows for such "halflings" on the left or right of a wafer...
  • mode_13h - Wednesday, August 4, 2021 - link

    Ian, please publish the source of 3DPM, preferably to github, gitlab, etc.
  • mode_13h - Wednesday, August 4, 2021 - link

    For me, the fact that 5600X always beats 5600G is proof that the non-APUs' lack of an on-die memory controller is no real deficiency (nor is the fact that the I/O die is fabbed on an older process node).
  • GeoffreyA - Thursday, August 5, 2021 - link

    The 5600X's bigger cache and boost could be helping it in that regard. But, yes, I don't think the on-die memory controller makes that much of a difference compared to the on-package one.
  • mode_13h - Friday, August 6, 2021 - link

    I wrote that knowing about the cache difference, but it's not going to help in all cases. If the on-die memory controller were a real benefit over having it on the I/O die, I'd expect to see at least a couple benchmarks where the 5600G outperformed the 5600X. However, they didn't switch places, even once!

    I know the 5600X has a higher boost clock, but they're both 65W and the G has a higher base frequency. So, even on well-threaded, non-graphical benchmarks, it's quite telling that the G can never pass the X.
  • GeoffreyA - Friday, August 6, 2021 - link

    Remember how the Core 2 Duo left the Athlon 64 dead on the floor? And that was without an on-die MC.
  • mode_13h - Saturday, August 7, 2021 - link

    That's not relevant, since there were incredible differences in their uArch and fab nodes.

    In this case, we get to see Zen 3 cores on the same manufacturing process. So, it should be a very well-controlled comparison. Still not perfect, but about as close as we're going to get.

    Also, the memory controller is in-package, in both cases. The main difference of concern is whether or not it's integrated into the 7 nm compute die.
  • GeoffreyA - Saturday, August 7, 2021 - link

    In agreement with what you are saying, even in my first comment. I think Cezanne shows that having the memory controller on the package gets the critical gains (vs. the old northbridge), and going onto the main die doesn't add much more.

    As for K8 and Conroe, I always felt it was notable in that C2D was able to do such damage, even without an IMC. Back when K8 was the top dog, the tech press used to make a big deal about its IMC, as if there were no other improvements besides that.
  • mode_13h - Sunday, August 8, 2021 - link

    One bad thing about moving it on-die is that this gave Intel an excuse to tie ECC memory support to the CPU, rather than just the motherboard. I had a regular Pentium 4 with ECC memory, and all it required was getting a motherboard that supported it.

    As I recall, the main reason Intel lagged in moving it on-die is that they were still flirting with RAMBUS, which eventually went pretty much nowhere. At work, we built one dual-CPU machine that required RAMBUS memory, but that was about the only time I touched the stuff.

    As for the benefits of moving it on-die, it was seen as one of the reasons Opteron was able to pull ahead of Pentium 4. Then, when Nehalem eventually did it, it was seen as one of the reasons for its dominance over Core 2.
  • GeoffreyA - Sunday, August 8, 2021 - link

    Intel has a fondness for technologies that go nowhere. RAMBUS was supposed to unlock the true power of the Pentium 4, whatever that meant. Well, the Willamette I used for a decade had plain SDRAM, not even DDR. But that was a downgrade, after my Athlon 64 3000+ gave up the ghost (cheapline PSU). That was DDR400. Incidentally, when the problems began, they were RAM related. Oh, those beeps!
