

Original Link: https://www.anandtech.com/show/13627/memory-frequency-scaling-in-sff-systems-an-investigation-with-sodimms-and-coffee-lake



Overclocking has generally been the domain of enthusiasts with desktop rigs. More recently, though, even SFF PCs have joined the bandwagon: Intel's Hades Canyon NUC, for example, supports overclocking the CPU as well as the GPU. However, increasing the CPU frequency beyond the official specifications is not the only way to extract more performance from a computing system. Memory-bound workloads can benefit from memory hierarchies with increased bandwidth and/or lower latencies.

We last looked at DDR4 memory scaling effects on SFF PCs when we experimented with different SO-DIMMs in the Skull Canyon NUC (NUC6i7KYK) based on the Skylake platform. Current SFF PCs are based on Coffee Lake, which brings in more cores while keeping power efficiency in mind. Compared to the Skylake memory controller's official limit of 2133 MT/s, the Coffee Lake memory controller ships with DDR4-2666 supported out of the box. In this article, we explore the effects of varying DDR4 SO-DIMM frequencies and timings on an SFF PC with a standard Coffee Lake desktop CPU.

Introduction

Since the introduction of DDR4 support in the Skylake platform, we have seen expanded support for overclocked memory kits on both the desktop and notebook segments. On standard non-overclocked systems, the DDR4 memory controller in Coffee Lake desktop CPUs operates at 2666 MT/s, while the U-series CPUs have a 2400 MT/s interface. DDR4 DIMMs operating as high as 4266 MT/s are available for desktop systems with full-sized memory slots. On the SO-DIMM side, we have seen various vendors introduce kits operating between 2133 MT/s and 3200 MT/s. While 2400 MT/s has become the de-facto SO-DIMM frequency for current systems, the usage of desktop CPUs in SFF PCs such as the ASRock DeskMini has ensured that the higher frequency SO-DIMMs also see adoption.

At the time of our Skull Canyon NUC memory scaling experiments, very few SFF PCs supported overclocked SO-DIMMs. Since then, we have had numerous SFF systems utilizing desktop CPUs go through our test bench, with the Zotac ZBOX MAGNUS EN1080K and the ASRock DeskMini GTX being prime examples. For today's article we chose the latter for our experiments; the BIOS includes XMP support, making it very easy to get the SO-DIMM kits running at their higher-rated frequencies.

With memory prices still running quite high, we shifted from the 2x16GB configurations used in our Skull Canyon memory scaling piece to 2x8GB configurations for the Coffee Lake experiments. To this end, we procured some of Team Group's T-Force Vulcan SO-DIMMs, which are available in both 2400 MT/s and 2666 MT/s speed grades. Meanwhile, to look at higher memory speeds, we also picked up one of Kingston's HyperX Impact 3200 kits.

The rest of this review deals with the quantitative measurement of the effectiveness of different types of DRAM in the ASRock DeskMini GTX (Z370). In order to do this, we processed various benchmarks while keeping everything other than the DRAM SO-DIMMs constant. Each configuration was booted to BIOS multiple times to ensure that the SPD information was properly parsed and the optimal frequency / timing parameters chosen. Once the OS was booted, we also checked with multiple hardware monitoring tools that the parameters indicated by the BIOS for the DRAM SO-DIMMs were indeed what the OS was also seeing.

ASRock DeskMini Z370 GTX1060 Specifications

Processor: Intel Core i7-8700
    Coffee Lake-S, 6C/12T, 3.2 GHz (Turbo 4.6 GHz), 14nm++, 12 MB L3, 65W TDP
Memory - Option 1: Team Group T-Force Vulcan TLRD416G2400HC15BDC-S01 DDR4 SO-DIMM
    15-17-17-35 @ 2400 MT/s
    2x8 GB
Memory - Option 2: Team Group T-Force Vulcan TLRD416G2666HC18F-SBK DDR4 SO-DIMM
    18-18-18-38 @ 2666 MT/s
    2x8 GB
Memory - Option 3: Kingston HyperX KHX3200C20S4 DDR4 SO-DIMM
    17-19-19-39 @ 2933 MT/s
    2x8 GB
Memory - Option 4: Kingston HyperX KHX3200C20S4 DDR4 SO-DIMM
    20-22-22-42 @ 3066 MT/s
    2x8 GB
Graphics: NVIDIA GeForce GTX 1060 (6GB GDDR5)
Disk Drive(s): Team Group Cardea Zero TM8FP2240G0C111
    (240 GB; M.2 Type 2280 PCIe 3.0 x4 NVMe; Toshiba 15nm; MLC)
Networking: Intel Dual Band Wireless-AC 3165
    (1x1 802.11ac - 433 Mbps)
    1x Intel I219V Gigabit LAN
Audio: 3.5mm Headphone Jack
    Capable of 5.1/7.1 digital output with HD audio bitstreaming (HDMI)
Miscellaneous I/O Ports: 2x USB 2.0
    5x USB 3.0 (Type-A)
    1x USB 3.0 (Type-C)
Operating System: Retail unit is barebones, but we installed Windows 10 Pro x64
Pricing (As configured): $1100 - $1138
Full Specifications: ASRock DeskMini Z370 GTX1060 Specifications

In the next section, we will first take a look at the specifications of the three SO-DIMM kits / four configurations that were evaluated in the ASRock DeskMini GTX Z370, along with the AIDA64 Memory Bench for each. Following this, we present the results from our updated test suite for SFF PCs and workstations - SYSmark 2018, Futuremark / UL benchmarks, SPECworkstation 3 etc. Prior to our concluding remarks, we take a look at a few miscellaneous aspects such as pricing.



DRAM Options for the ASRock DeskMini GTX (Z370)

There are a number of DDR4 SO-DIMM kits compatible with the ASRock DeskMini GTX. In addition to the frequency, the timing parameters / latency numbers also influence the performance. For our evaluation today, we are going to look at four different options with three different kits:

  • Kingston HyperX Impact DDR4 SO-DIMM 2x8GB @ 3066 MT/s, 20-22-22-42
  • Kingston HyperX Impact DDR4 SO-DIMM 2x8GB @ 2933 MT/s, 17-19-19-39
  • Team Group T-Force Vulcan DDR4 SO-DIMM 2x8GB @ 2666 MT/s, 18-18-18-38
  • Team Group T-Force Vulcan DDR4 SO-DIMM 2x8GB @ 2400 MT/s, 15-17-17-35

Out of these configurations, the HyperX Impact ones were both from the same kit rated for operation at up to 3200 MT/s. Unfortunately, the ASRock DeskMini GTX refused to boot at 3200 MT/s. However, setting that XMP profile and manually tweaking the frequency to 3066 MT/s enabled the system to operate normally. The two T-Force Vulcan configurations were enabled using distinct kits.

Kingston HyperX Impact [ 2933 MT/s, 17-19-19-39 ] [ 3066 MT/s, 20-22-22-42 ]

The Kingston HyperX Impact HX432S20IB2K2/16 SO-DIMM kit used in our review supports two XMP profiles - one at 2933 MT/s with a 17-19-19-39 timing, and another at 3200 MT/s with a 20-22-22-42 timing. While the former profile was a one-click process in the BIOS to activate, the latter caused some issues and had to be run at 3066 MT/s. We could have spent more time optimizing the latency numbers for the 3066 MT/s operating point. However, keeping in mind the number of benchmarks to process, we decided to make do with minimal changes to the BIOS after the XMP activation.

AIDA64 includes a quick memory benchmark that provides some basic bandwidth and latency numbers to ensure that the kits are operating as expected.

At 3066 MT/s, the RAM-to-RAM copy (representative of both read and write speeds) comes in at 36924 MBps, with an average latency of 54.6 ns.

The relevant BIOS screenshots for the 2933 MT/s configuration are provided below.

The AIDA64 benchmark for the 2933 MT/s configuration shows lower read and write bandwidth numbers compared to the 3066 MT/s configuration. However, the copy benchmark comes in a tad higher at 37685 MBps, and this is in part due to the 51.6 ns latency metric (compared to 54.6 ns for the DDR4-3066 configuration).

The kit is currently being sold for $168.

Team Group T-Force Vulcan [ 2666 MT/s, 18-18-18-38 ]

The Team Group T-Force Vulcan TLRD416G2666HC18FDC-S01 SO-DIMM kit supports one XMP profile at 2666 MT/s with a 18-18-18-38 timing.

The AIDA64 Cache and Memory Benchmark numbers are presented below. The RAM performance in terms of bandwidth is a bit lower than the DDR4-3066 and DDR4-2933, as expected. The tighter timing numbers (in terms of clock cycles) are unfortunately not able to make up for the loss from the frequency component, and the latency ends up being slightly worse at 55.7 ns.
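As a rough illustration of why cycle counts alone can mislead, the CAS latency of each tested configuration can be converted from clock cycles to nanoseconds. The sketch below assumes the usual first-order relation for DDR memory (two transfers per clock); the function name `cas_latency_ns` is ours, and the result covers only the tCL component, not the full load-to-use latency that AIDA64 reports.

```python
# First-order conversion of CAS latency from clock cycles to nanoseconds.
# DDR memory performs two transfers per clock, so the clock period in ns
# is 2000 / (data rate in MT/s).

def cas_latency_ns(cl_cycles: int, data_rate_mts: int) -> float:
    """Return the tCL portion of the access latency in nanoseconds."""
    clock_period_ns = 2000.0 / data_rate_mts
    return cl_cycles * clock_period_ns

# (data rate in MT/s, tCL in cycles) for the four tested configurations
configs = {
    "DDR4-2400": (2400, 15),
    "DDR4-2666": (2666, 18),
    "DDR4-2933": (2933, 17),
    "DDR4-3066": (3066, 20),
}

for name, (rate, cl) in configs.items():
    print(f"{name}: tCL = {cl} cycles = {cas_latency_ns(cl, rate):.2f} ns")
```

Under this simplification, the 18 cycles of the DDR4-2666 kit work out to about 13.5 ns, slightly longer than the 20 cycles of the DDR4-3066 configuration (about 13.0 ns), while the DDR4-2933 profile has the shortest CAS delay in absolute terms (about 11.6 ns). The measured latency also depends on the other timing parameters and the memory controller, so tCL alone should not be expected to reproduce the AIDA64 ordering exactly.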

The 8GB member of the kit is currently being sold for $65. The kit itself doesn't seem to be available for a reasonable price at any e-tailer in the US, but, it is fine to buy a couple of the 8GB modules for use in a system. In our pricing table, we have assumed that this kit costs $130.

Team Group T-Force Vulcan [ 2400 MT/s, 15-17-17-35 ]

The Team Group T-Force Vulcan TLRD416G2400HC15BDC-S01 SO-DIMM kit works out of the box at 2400 MT/s with a 15-17-17-35 timing. It also carries a single XMP profile. Since all Coffee Lake systems support DDR4-2400 at a minimum, the profile is not needed here, but it is useful for older platforms.

The AIDA64 Cache and Memory Benchmark numbers are presented below. As expected, this base frequency configuration comes in with the lowest bandwidth numbers and highest latency among all the tested configurations.

The 4GB single SO-DIMM version in this family is currently being sold for $37. No current listing for the 8GB kit could be found. In our pricing table, we have assumed that the kit being evaluated at 2400 MT/s costs $130, though its retail price is likely lower than that of the 2666 MT/s kit (as it is the lower-performing set in the T-Force Vulcan DDR4 SO-DIMM product line).

A note about the performance numbers from the artificial AIDA64 memory benchmark - the bandwidth numbers are as expected, varying from 35043 MBps for the DDR4-2400 kit to 43377 MBps for the DDR4-3066 configuration. The latency numbers have a surprise, though - the best is 51.6 ns for the DDR4-2933 configuration. In the rest of this review, we will see how these numbers influence day-to-day usage and computing workloads using both artificial and real-world benchmarks.
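To put the bandwidth figures in context, they can be compared against the theoretical peak of the memory interface. The sketch below is our own back-of-the-envelope calculation, not part of the AIDA64 output. It assumes a 64-bit (8-byte) DDR4 channel, assumes that the two SO-DIMMs run in dual-channel mode, and assumes that AIDA64 reports decimal megabytes per second.

```python
# Theoretical peak bandwidth of a DDR4 interface versus the measured figures.
BYTES_PER_TRANSFER = 8   # one 64-bit DDR4 channel moves 8 bytes per transfer
CHANNELS = 2             # assumption: both SO-DIMMs operate in dual-channel mode

def peak_bandwidth_mbps(data_rate_mts: int) -> int:
    """Peak bandwidth in MB/s (10^6 bytes per second) for a given data rate."""
    return data_rate_mts * BYTES_PER_TRANSFER * CHANNELS

# Bandwidth numbers quoted in the text, keyed by data rate in MT/s
measured_mbps = {2400: 35043, 3066: 43377}

for rate, mbps in measured_mbps.items():
    peak = peak_bandwidth_mbps(rate)
    print(f"DDR4-{rate}: {mbps} MBps measured, {peak} MBps peak ({mbps / peak:.1%})")
```

Under these assumptions, both configurations reach roughly 90% of their theoretical peak, with the DDR4-3066 configuration (running looser timings) slightly further from its ceiling than the DDR4-2400 kit.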



BAPCo SYSmark 2018

The different ASRock DeskMini Z370 GTX configurations were evaluated using our Fall 2018 test suite for small-form factor PCs. In the first performance section, we will be looking at SYSmark 2018.

BAPCo's SYSmark 2018 is an application-based benchmark that uses real-world applications to replay usage patterns of business users in the areas of productivity, creativity, and responsiveness. The 'Productivity Scenario' covers office-centric activities including word processing, spreadsheet usage, financial analysis, software development, application installation, file compression, and e-mail management. The 'Creativity Scenario' represents media-centric activities such as digital photo processing, AI and ML for face recognition in photos and videos for the purpose of content creation, etc. The 'Responsiveness Scenario' evaluates the ability of the system to react in a quick manner to user inputs in areas such as application and file launches, web browsing, and multi-tasking.

Scores are meant to be compared against a reference desktop (the SYSmark 2018 calibration system, a Dell Optiplex 5050 tower with a Core i3-7100 and 4GB of DDR4-2133 memory to go with a 128GB M.2 SATA III SSD). The calibration system scores 1000 in each of the scenarios. A score of, say, 2000, would imply that the system under test is twice as fast as the reference system.
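The score normalization described above can be expressed in a few lines. This is only a sketch of how to read the numbers, not BAPCo's scoring code. The helper names are ours, and the pair of scores in the usage example is made up purely for illustration.

```python
# Interpreting SYSmark 2018 scores relative to the calibration system.
CALIBRATION_SCORE = 1000  # the reference desktop scores 1000 in every scenario

def speed_relative_to_reference(score: float) -> float:
    """How many times faster than the calibration system a score implies."""
    return score / CALIBRATION_SCORE

def relative_gain_percent(score_a: float, score_b: float) -> float:
    """Percentage by which configuration A outscores configuration B."""
    return (score_a / score_b - 1.0) * 100.0

# A score of 2000 means twice as fast as the reference system.
print(speed_relative_to_reference(2000))

# Hypothetical scores for two memory configurations (not measured values).
print(f"{relative_gain_percent(1500, 1470):.1f}%")
```

Expressing the gap between two configurations as a percentage makes it easier to judge whether a difference exceeds the run-to-run variation of the benchmark.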

SYSmark 2018 - Productivity

SYSmark 2018 - Creativity

SYSmark 2018 - Responsiveness

SYSmark 2018 - Overall

Other than the responsiveness workload, the DDR4-2933 configuration ends up with the best scores. This correlates well with the minimum latency observed in the AIDA64 Cache and Memory Benchmark testing. The scores generally follow the expected trend, and in cases where they don't, they are within the margins of variation observed in multiple runs with the same configuration.

SYSmark 2018 also adds energy measurement to the mix. A high score in the SYSmark benchmarks might be nice to have, but, potential customers also need to determine the balance between power consumption and the efficiency of the system. For example, in the average office scenario, it might not be worth purchasing a noisy and power-hungry PC just because it ends up with a 2000 score in the SYSmark 2018 benchmarks. In order to provide a balanced perspective, SYSmark 2018 also allows vendors and decision makers to track the energy consumption during each workload. In the graphs below, we find the total energy consumed by the PC under test for a single iteration of each SYSmark 2018 workload. For reference, the calibration system consumes 5.36 Wh for productivity, 7.71 Wh for creativity, 5.61 Wh for responsiveness, and 18.68 Wh overall.

SYSmark 2018 - Productivity Energy Consumption

SYSmark 2018 - Creativity Energy Consumption

SYSmark 2018 - Responsiveness Energy Consumption

SYSmark 2018 - Overall Energy Consumption

The energy consumption is not reflective of the average power consumption at load or idle. Rather, the faster the benchmark is able to complete, the lower the energy recorded for the workload. This is the reason for the DDR4-3066 configuration (expected to be the most power-hungry of the lot) consuming the least energy in the 'Responsiveness' workload. Overall, the DDR4-2400 configuration fares the worst due to the workloads needing a longer time to complete. The DDR4-2666 configuration seems to achieve the best balance of power consumption and speed.
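The relationship between power, completion time, and recorded energy can be sketched as follows. The wattages and durations are hypothetical values chosen only to show the effect and are not measurements from our testbed. The helper name `energy_wh` is ours.

```python
# Energy is average power multiplied by time, so a configuration that draws
# more power can still consume less energy if it finishes sooner.

def energy_wh(avg_power_w: float, duration_s: float) -> float:
    """Energy in watt-hours for a workload at a given average power."""
    return avg_power_w * duration_s / 3600.0

# Hypothetical example: a slower, lower-power run versus a faster run
# that draws slightly more power.
slow = energy_wh(avg_power_w=40.0, duration_s=1800.0)
fast = energy_wh(avg_power_w=42.0, duration_s=1650.0)

print(f"slow run: {slow:.2f} Wh, fast run: {fast:.2f} Wh")
```

In this made-up case the faster run uses 5% more power but finishes about 8% sooner, so it records less energy (19.25 Wh versus 20.00 Wh).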



UL Benchmarks - PCMark and 3DMark

This section deals with a selection of the UL Futuremark benchmarks - PCMark 10, PCMark 8, and 3DMark. While the first two evaluate the system as a whole, 3DMark focuses on the graphics capabilities.

PCMark 10

UL's PCMark 10 evaluates computing systems for various usage scenarios (generic / essential tasks such as web browsing and starting up applications, productivity tasks such as editing spreadsheets and documents, gaming, and digital content creation). We benchmarked select PCs with the PCMark 10 Extended profile and recorded the scores for various scenarios. These scores are heavily influenced by the CPU and GPU in the system, though the RAM and storage device also play a part. The power plan was set to Balanced for all the PCs while processing the PCMark 10 benchmark.

The DDR4-2933 configuration seems to perform better than the others in general. However, the scores really don't vary that much across the different configurations for each scenario.

Futuremark PCMark 10 - Essentials

Futuremark PCMark 10 - Productivity

Futuremark PCMark 10 - Gaming

Futuremark PCMark 10 - Digital Content Creation

Futuremark PCMark 10 - Extended

PCMark 8

We continue to present PCMark 8 benchmark results (as those have more comparison points) while our PCMark 10 scores database for systems grows in size. PCMark 8 provides various usage scenarios (home, creative and work) and offers ways to benchmark both baseline (CPU-only) as well as OpenCL accelerated (CPU + GPU) performance. We benchmarked select PCs for the OpenCL accelerated performance in all three usage scenarios. These scores are heavily influenced by the CPU in the system.

Futuremark PCMark 8 - Home OpenCL

Futuremark PCMark 8 - Creative OpenCL

Futuremark PCMark 8 - Work OpenCL

The scores are bunched together for the most part, and a clear delineation between the different configurations is hard to find.

3DMark

UL's 3DMark comes with a diverse set of graphics workloads that target different Direct3D feature levels. Correspondingly, the rendering resolutions are also different. We use 3DMark 2.4.4264 to get an idea of the graphics capabilities of the system. In this section, we take a look at the performance of the ASRock DeskMini Z370 GTX configurations across the different 3DMark workloads.

Since the main system RAM characteristics don't affect GPU-limited workloads, we expect that most benchmark scores are going to be the same for all the tested configurations. There may be some graphics workloads that are CPU / CPU-memory bandwidth limited, but, such characteristics may be hard to target with the GPU-limited 3DMark workloads.

3DMark Ice Storm

This workload has three levels of varying complexity - the vanilla Ice Storm, Ice Storm Unlimited, and Ice Storm Extreme. It is a cross-platform benchmark (which means that the scores can be compared across different tablets and smartphones as well). All three use DirectX 11 (feature level 9) / OpenGL ES 2.0. While the Extreme renders at 1920 x 1080, the other two render at 1280 x 720. The graphs below present the various Ice Storm workloads' numbers for different configurations that we have evaluated.

UL 3DMark - Ice Storm Workloads

3DMark Cloud Gate

The Cloud Gate workload is meant for notebooks and typical home PCs, and uses DirectX 11 (feature level 10) to render frames at 1280 x 720. The graph below presents the overall score for the workload across all the systems that are being compared.

UL 3DMark Cloud Gate Score

3DMark Sky Diver

The Sky Diver workload is meant for gaming notebooks and mid-range PCs, and uses DirectX 11 (feature level 11) to render frames at 1920 x 1080. The graph below presents the overall score for the workload across all the systems that are being compared.

UL 3DMark Sky Diver Score

3DMark Fire Strike Extreme

The Fire Strike benchmark has three workloads. The base version is meant for high-performance gaming PCs. Similar to Sky Diver, it uses DirectX 11 (feature level 11) to render frames at 1920 x 1080. The Ultra version targets 4K gaming systems, and renders at 3840 x 2160. However, we only deal with the Extreme version in our benchmarking; it renders at 2560 x 1440, and targets multi-GPU systems and overclocked PCs. The graph below presents the overall score for the Fire Strike Extreme benchmark across all the systems that are being compared.

UL 3DMark Fire Strike Extreme Score

3DMark Time Spy

The Time Spy workload has two levels with different complexities. Both use DirectX 12 (feature level 11). However, the plain version targets high-performance gaming PCs with a 2560 x 1440 render resolution, while the Extreme version renders at 3840 x 2160 resolution. The graphs below present both numbers for all the systems that are being compared in this review.

UL 3DMark - Time Spy Workloads

3DMark Night Raid

The Night Raid workload is a DirectX 12 benchmark test. It is less demanding than Time Spy, and is optimized for integrated graphics. The graph below presents the overall score in this workload for different system configurations.

UL 3DMark Night Raid Score

The 'low-end' benchmarks - Sky Diver, Cloud Gate, and Ice Storm - show the DDR4-2933 configuration faring better. These benchmarks represent workloads that the GTX 1060 should have no problem with (they are actually meant for integrated graphics), and the CPU / DRAM characteristics can influence the scores easily. In the GPU-limited cases, we see no variations between the different tested configurations.



SPECworkstation 3 Benchmark

The ASRock DeskMini GTX Z370 is not specifically marketed as a workstation. However, with the increasing popularity of workstation notebooks as well as SFF systems targeting the market, it makes sense to study the effects of SO-DIMM characteristics on the performance of typical workstation workloads.

The SPECworkstation 3 benchmark measures workstation performance based on a number of professional applications. It includes more than 140 tests based on 30 different workloads that exercise the CPU, graphics, I/O and memory hierarchy. These workloads fall into different categories.

  • Media and Entertainment (3D animation, rendering)
  • Product Development (CAD/CAM/CAE)
  • Life Sciences (medical, molecular)
  • Financial Services
  • Energy (oil and gas)
  • General Operations
  • GPU Compute

Individual scores are generated for each test and a composite score for each category is calculated based on a reference machine (HP Z240 tower workstation using an Intel E3-1240 v5 CPU, an AMD Radeon Pro WX3100 GPU, 16GB of DDR4-2133, and a SanDisk 512GB SSD). This composite score, also called the SPEC Ratio, is presented in the graphs below for different categories.

One aspect to note here is that SPECworkstation 3 requires 190GB of free space on the disk from which it is run. The 240GB SSD in our configuration couldn't meet the requirements, and we had to use a 1TB SanDisk Extreme Pro NVMe 3D SSD for the benchmarks in this section.

SPECworkstation 3 - Media and Entertainment

SPECworkstation 3 - Product Development

SPECworkstation 3 - Life Sciences

SPECworkstation 3 - Financial Services

SPECworkstation 3 - Energy

SPECworkstation 3 - General Operations

SPECworkstation 3 - GPU Compute

The official SPECworkstation 3 run results for the different configurations are linked below.

A look at the detailed scores from the links above also shows that the only anomalous number for DDR4-3066 in the Life Sciences category is due to the WPCStorage component for the lammps workload (which should ideally not be affected by the DRAM in the system). Other than that, the behavior seen in the previous benchmark sections is repeated here. Either the scores are bunched very close together, or, the DDR4-2933 configuration comes out on top.



Miscellaneous Performance Metrics

This section looks at some of the other commonly used benchmarks representative of the performance of specific real-world applications.

3D Rendering - CINEBENCH R15

We use CINEBENCH R15 for 3D rendering evaluation. The program provides three benchmark modes - OpenGL, single threaded and multi-threaded. Evaluation of different PC configurations in all three modes provided us with the following results.

3D Rendering - CINEBENCH R15 - Single Thread

3D Rendering - CINEBENCH R15 - Multiple Threads

3D Rendering - CINEBENCH R15 - OpenGL

This benchmark is generally CPU-limited, and we do not see any significant benefit from moving to the higher speed grades, at least within the set of configurations that we tested.

x265 Benchmark

Next up, we have some video encoding benchmarks using x265 v2.8. The appropriate encoder executable is chosen based on the supported CPU features. In the first case, we encode 600 1080p YUV 4:2:0 frames into a 1080p30 HEVC Main-profile compatible video stream at 1 Mbps and record the average number of frames encoded per second.

Video Encoding - x265 - 1080p

Our second test case is 1200 4K YUV 4:2:0 frames getting encoded into a 4Kp60 HEVC Main10-profile video stream at 35 Mbps. The encoding FPS is recorded.
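For reference, the parameters of the two encoding test cases translate into clip durations and approximate output sizes as sketched below. The helper names are ours, the size estimate assumes the encoder hits its average bitrate target exactly, and the elapsed time in the last line is a hypothetical value used only to show how the reported metric is derived.

```python
# Arithmetic behind the two x265 test cases described in the text.

def clip_duration_s(frames: int, playback_fps: float) -> float:
    """Playback duration of the encoded clip in seconds."""
    return frames / playback_fps

def target_size_mb(frames: int, playback_fps: float, bitrate_mbps: float) -> float:
    """Approximate output size in megabytes (10^6 bytes) at the target bitrate."""
    return clip_duration_s(frames, playback_fps) * bitrate_mbps / 8.0

def encode_fps(frames: int, elapsed_s: float) -> float:
    """Average encoding speed, the metric recorded in the graphs."""
    return frames / elapsed_s

print(clip_duration_s(600, 30), target_size_mb(600, 30, 1))      # 1080p30 at 1 Mbps
print(clip_duration_s(1200, 60), target_size_mb(1200, 60, 35))   # 4Kp60 at 35 Mbps

# Hypothetical elapsed time (not a measured result): 600 frames in 40 seconds.
print(encode_fps(600, 40.0))
```

Both clips are 20 seconds long, so the two cases differ in resolution, bit depth, and bitrate rather than in duration.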

Video Encoding - x265 - 4K 10-bit

x265 is again a CPU-limited benchmark, and the memory speeds have negligible impact on the performance for our benchmarking encoding operations.

7-Zip

7-Zip is a very effective and efficient compression program, often beating out OpenCL accelerated commercial programs in benchmarks even while using just the CPU power. 7-Zip has a benchmarking program that provides tons of details regarding the underlying CPU's efficiency. In this subsection, we are interested in the compression and decompression rates when utilizing all the available threads for the LZMA algorithm.

7-Zip LZMA Compression Benchmark

7-Zip LZMA Decompression Benchmark

The 7-Zip compression benchmark is probably the best real-world representative of what a good DRAM configuration can deliver. The compression rate is highly dependent on the memory latency, and we see that the DDR4-2933 configuration (that had the best latency numbers in the AIDA64 Cache and Memory Benchmark testing) comes out on top. Decompression is CPU-limited, and the memory speeds don't impact it much.

Cryptography Benchmarks

Cryptography has become an indispensable part of our interaction with computing systems. Almost all modern systems have some sort of hardware-acceleration for making cryptographic operations faster and more power efficient. In this sub-section, we look at two different real-world applications that may make use of this acceleration.

BitLocker is a Windows feature that encrypts entire disk volumes. While drives that offer encryption capabilities are dealt with using that feature, most legacy systems and external drives have to use the host system implementation. Windows has no direct benchmark for BitLocker. However, we cooked up a BitLocker operation sequence to determine the adeptness of the system at handling BitLocker operations. We start off with a 2.5GB RAM drive in which a 2GB VHD (virtual hard disk) is created. This VHD is then mounted, and BitLocker is enabled on the volume. Once the BitLocker encryption process gets done, BitLocker is disabled. This triggers a decryption process. The times taken to complete the encryption and decryption are recorded. This process is repeated 25 times, and the average of the last 20 iterations is graphed below.
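The averaging step at the end of this sequence (25 iterations, with only the last 20 counted) can be sketched as below. This is not our actual benchmark script. The function name and the timing values are hypothetical and only illustrate how discarding the first few iterations keeps warm-up effects out of the reported number.

```python
from statistics import mean

def steady_state_average(times_s, warmup=5):
    """Average the iteration times after discarding the first `warmup` runs."""
    if len(times_s) <= warmup:
        raise ValueError("need more iterations than warm-up runs")
    return mean(times_s[warmup:])

# Hypothetical timings in seconds: 5 slower warm-up runs, then 20 steady runs.
encryption_times = [12.0] * 5 + [10.0] * 20

print(steady_state_average(encryption_times))   # average of the last 20 runs
print(mean(encryption_times))                   # average of all 25, for contrast
```

With these made-up values the steady-state average is 10.0 s, while averaging all 25 iterations would give 10.4 s.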

BitLocker Encryption Benchmark

BitLocker Decryption Benchmark

Due to the use of a RAM drive in this benchmark, the usually CPU-speed limited cryptographic operations of BitLocker seem to favor the DDR4-2933 configuration. We move on to a couple of other benchmarks to see if the RAM drive is indeed the cause for the significant gulf seen in the above graphs.

Creation of secure archives is best done through the use of AES-256 as the encryption method while password protecting ZIP files. We re-use the benchmark mode of 7-Zip to determine the AES256-CBC encryption and decryption rates using pure software as well as AES-NI. Note that the 7-Zip benchmark uses a 48KB buffer for this purpose.

7-Zip AES256-CBC Encryption Benchmark

7-Zip AES256-CBC Decryption Benchmark

Once the data is in the CPU cache, we see that the RAM has no impact on the cryptography operations.

Yet another cryptography application is secure network communication. OpenSSL can take advantage of the acceleration provided by the host system to make operations faster. It also has a benchmark mode that can use varying buffer sizes. We recorded the processing rate for an 8KB buffer using the hardware-accelerated AES256-CBC-HMAC-SHA1 feature.

OpenSSL Encryption Benchmark

OpenSSL Decryption Benchmark

As expected, this benchmark also shows that there is nothing to gain in performance by moving to SO-DIMMs with better speeds or timing characteristics.

Agisoft Photoscan

Agisoft PhotoScan is a commercial program that converts 2D images into 3D point maps, meshes and textures. The program designers sent us a command line version in order to evaluate the efficiency of various systems that go under our review scanner. The command line version has two benchmark modes, one using the CPU and the other using both the CPU and GPU (via OpenCL). We present the results from our evaluation using the CPU mode only. The benchmark (v1.3) takes 84 photographs and does four stages of computation:

  • Stage 1: Align Photographs (capable of OpenCL acceleration)
  • Stage 2: Build Point Cloud (capable of OpenCL acceleration)
  • Stage 3: Build Mesh
  • Stage 4: Build Textures

We record the time taken for each stage. Since various elements of the software are single threaded, and others multithreaded, it is interesting to record the effects of CPU generations, speeds, number of cores, and DRAM parameters using this software.

Agisoft PhotoScan Benchmark - Stage 1

Agisoft PhotoScan Benchmark - Stage 2

Agisoft PhotoScan Benchmark - Stage 3

Agisoft PhotoScan Benchmark - Stage 4

The raw bandwidth provided by the DDR4-3066 configuration seems to work well for the Agisoft PhotoScan workload. The latency doesn't seem to be much of a factor.

Dolphin Emulator

Wrapping up our application benchmark numbers are the results from the new Dolphin Emulator (v5) benchmark mode. This is again a test of the CPU capabilities.

Dolphin Emulator Benchmark

The memory characteristics don't seem to affect the benchmark much, though we see the DDR4-2933 configuration coming up with the best performance.



Final Words

Improvements in memory bandwidth and reductions in latency are the performance benefits provided by the usage of high-performance SO-DIMM kits. In the previous pages, we looked at the performance of four different SO-DIMM configurations in an SFF desktop PC. Power / energy consumption is one of the last remaining aspects to consider.

In addition to performance, idle power consumption is also an important metric – particularly for usage in battery-constrained systems such as workstation notebooks. The higher the DRAM operating frequency, the greater the power consumption. Some high-frequency kits use 1.35V (instead of the 1.2V standard), which tends to drive up the power consumption further. Fortunately, all our tested configurations had SO-DIMMs that operated at 1.2V.

To get a full view of the impact of different memory kits on this aspect, we monitored the power consumed at the wall by the ASRock DeskMini Z370 GTX with an idle desktop and averaged it over a duration of 1 hour.
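Averaging a power trace over a measurement window is straightforward once the samples are logged. The sketch below is not the tooling we used. It assumes a hypothetical log of (timestamp, watts) pairs from a wall power meter and applies the trapezoidal rule so that unevenly spaced samples are weighted by the time they cover.

```python
def average_power_w(samples):
    """Time-weighted average of (timestamp_s, power_w) samples (trapezoidal rule)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    energy_j = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Energy in joules for this interval: mean power times elapsed time
        energy_j += 0.5 * (p0 + p1) * (t1 - t0)
    return energy_j / (samples[-1][0] - samples[0][0])

# Hypothetical one-hour idle trace (not measured data): three samples only.
trace = [(0, 20.0), (1800, 22.0), (3600, 20.0)]
print(average_power_w(trace))
```

For the made-up trace above, the result is 21.0 W over the one-hour window.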

Idle Power Consumption

As expected, the higher-frequency operating points consume more power, but not by a huge margin. While considering the above graph, it is also helpful to keep in mind the energy consumption for typical workloads. Kits that help get work done faster may consume less total energy than kits that draw less power while operating.

SYSmark 2018 - Overall Energy Consumption

Coming to the business end of the article, it is clear that the choice of memory is dictated by the expected workloads. For most common workloads, the differences between DDR4-2400 and DDR4-3066 are minimal. However, there are cases where the memory characteristics can play a significant role - 7-Zip compression is a real-world example where the kits with lower latencies deliver better compression rates.

For the average user who keeps the system idle most of the time, it is probably better to save money and have lower idle power in the long run. On the other hand, power users who keep their systems busy may want to opt for kits that deliver low latencies (and, in the case of some workloads, just higher speeds).

DRAM Options for the ASRock DeskMini GTX (Z370)

Memory Kit                                          Tested Frequency (MT/s)   Tested Timing (tCL-tRCD-tRP-tRAS-tRFC)   Price (USD)
Kingston HyperX Impact HX432S20IB2K2/16             3066                      20-22-22-42-560                          $168
Kingston HyperX Impact HX432S20IB2K2/16             2933                      17-19-19-39-514                          $168 (same kit)
Team Group T-Force Vulcan TLRD416G2666HC18FDC-S01   2666                      18-18-18-38-467                          $130
Team Group T-Force Vulcan TLRD416G2400HC15BDC-S01   2400                      15-17-17-35-421                          $130

Our tests have also shown that a higher speed-grade kit is not necessarily better – the HyperX Impact DDR4 operating at 2933 MT/s has the best latency numbers. It edges out the same kit operating at 3066 MT/s with different timing parameters in almost all of the benchmarks. Conversely, the Team Group T-Force Vulcan SO-DIMMs offer good value for money and can boast of better idle power characteristics.
