System Performance: Multi-Tasking

One of the key drivers of advancements in computing systems is multi-tasking. On mobile devices, this is quite lightweight - cases such as background email checks while the user is playing a mobile game are quite common. To optimize the user experience in such scenarios, mobile SoC manufacturers started integrating heterogeneous CPU cores - some with high performance for demanding workloads, and others that are frugal in power consumption and die area at the cost of performance. This trend is now slowly making its way into the desktop PC space.

Multi-tasking in typical PC usage is much more demanding compared to phones and tablets. Desktop OSes allow users to launch and utilize a large number of demanding programs simultaneously. Responsiveness is dictated largely by the OS scheduler allowing different tasks to move to the background. The processor is required to work closely with the OS thread scheduler to optimize performance in these cases. Keeping these aspects in mind, the evaluation of multi-tasking performance is an interesting subject to tackle.

We have augmented our systems benchmarking suite to quantitatively analyze the multi-tasking performance of various platforms. The evaluation involves triggering an ffmpeg transcoding task that converts 1716 frames at 3840x1714, encoded as a 24fps AVC video (Blender Project's 'Tears of Steel' 4K version), into a 1080p HEVC version in a loop. The transcoding rate is monitored continuously. One full transcoding pass is allowed to complete before starting the first multi-tasking workload - the PCMark 10 Extended bench suite. A comparative view of the PCMark 10 scores for various scenarios is presented in the graphs below. Also available for concurrent viewing are scores in the normal case where the benchmark was processed without any concurrent load, and a graph presenting the loss in performance.
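The loop-and-monitor step can be sketched roughly as follows. This is a minimal sketch and not the harness used for the review: the file names, the `libx265` encoder, the scale filter, and the decision to drop audio are all assumptions, since only the source and target formats are stated above. It relies on ffmpeg's `-progress` option, which writes `key=value` lines (including `fps=`) to the given destination.

```python
import subprocess

def build_command(src, dst):
    # Assumed settings: the exact ffmpeg flags used in the review are not given.
    return [
        "ffmpeg", "-y", "-nostats",
        "-i", src,
        "-vf", "scale=1920:-2",   # 1920 wide, height chosen to keep aspect ratio
        "-c:v", "libx265",        # software HEVC encoder (assumption)
        "-an",                    # ignore audio (assumption)
        "-progress", "pipe:1",    # machine-readable progress on stdout
        dst,
    ]

def parse_fps(line):
    """Return the fps value from a progress line, or None for any other line."""
    key, sep, value = line.strip().partition("=")
    if key != "fps" or not sep:
        return None
    try:
        return float(value)
    except ValueError:
        return None

def transcode_once(src, dst):
    """Run one transcoding pass and yield each fps value ffmpeg reports."""
    proc = subprocess.Popen(
        build_command(src, dst),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    for line in proc.stdout:
        fps = parse_fps(line)
        if fps is not None:
            yield fps
    proc.wait()
```

Calling `transcode_once` repeatedly would give the looped behavior. To our understanding, the `fps` figure ffmpeg reports is an average over the pass so far, so a per-interval rate would have to be derived from successive `frame=` counts instead.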

UL PCMark 10 Load Testing - Digital Content Creation Scores

UL PCMark 10 Load Testing - Productivity Scores

UL PCMark 10 Load Testing - Essentials Scores

UL PCMark 10 Load Testing - Gaming Scores

UL PCMark 10 Load Testing - Overall Scores

The addition of a concurrent load obviously reduces the PCMark 10 scores, but the relative ordering remains the same.
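The two quantities discussed here, the loss in performance and whether the relative ordering is preserved, can be computed along the following lines. This is only an illustrative sketch: the system names and scores are made up, and the percentage-loss formula is an assumption about how the loss graph is derived.

```python
def performance_loss_pct(idle_score, loaded_score):
    """Percentage of the no-load score lost when the transcode runs alongside."""
    if idle_score <= 0:
        raise ValueError("idle_score must be positive")
    return 100.0 * (idle_score - loaded_score) / idle_score

def ranking(scores):
    """System names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical scores, not results from the review.
idle = {"System A": 5000, "System B": 4500, "System C": 4000}
loaded = {"System A": 4000, "System B": 3700, "System C": 3300}

for name in idle:
    print(name, round(performance_loss_pct(idle[name], loaded[name]), 1))

print("ordering preserved:", ranking(idle) == ranking(loaded))
```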

Following the completion of the PCMark 10 benchmark, a short delay is introduced prior to the processing of Principled Technologies WebXPRT4 on MS Edge. Similar to the PCMark 10 results presentation, the graph below shows the scores recorded with the transcoding load active. Available for comparison are the scores recorded with the CPU dedicated to the benchmark, and a measure of the performance loss.

Principled Technologies WebXPRT4 Load Testing Scores (MS Edge)

The relative performance numbers stay largely the same even in the presence of additional loading. Given that most of the other configurations are also based on the same hybrid processor technology, that is not a surprise.

The final workload tested as part of the multitasking evaluation routine is CINEBENCH R23.

3D Rendering - CINEBENCH R23 Load Testing - Single Thread Score

3D Rendering - CINEBENCH R23 Load Testing - Multiple Thread Score

While the relative performance numbers remain the same for the multi-threaded case, the Mind Premium systems suffer greatly in the single-threaded scenario - likely due to the limited power budget compared to other systems in the mix.

After the completion of all the workloads, we let the transcoding routine run to completion. The monitored transcoding rate throughout the above evaluation routine (in terms of frames per second) is graphed below.

ffmpeg Transcoding Rate and Processor Usage

Khadas Mind Premium (Core i7-1360P) ffmpeg Transcoding Rate (Multi-Tasking Test)
Task Segment            Transcoding Rate (FPS)
                        Minimum   Average   Maximum
Transcode Start Pass    3         11.42     40
PCMark 10               0         10.33     36
WebXPRT 4               3         10.41     20
Cinebench R23           1         10.39     37
Transcode End Pass      3         11.35     39.5

Khadas Mind Premium + Mind Dock (Core i7-1360P) ffmpeg Transcoding Rate (Multi-Tasking Test)
Task Segment            Transcoding Rate (FPS)
                        Minimum   Average   Maximum
Transcode Start Pass    2.5       11.44     37.5
PCMark 10               0         10.32     35
WebXPRT 4               2         10.51     20
Cinebench R23           1         10.46     34.5
Transcode End Pass      2.5       11.47     39.5
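A per-segment summary of this kind can be produced from the monitored frame rate by grouping samples by task segment. The snippet below is a sketch under assumptions: the `(segment, fps)` sample format is hypothetical, and the sample values are invented for illustration rather than taken from the measurements above.

```python
from collections import defaultdict

def summarize_segments(samples):
    """Group (segment, fps) samples and return {segment: (min, avg, max)}."""
    grouped = defaultdict(list)
    for segment, fps in samples:
        grouped[segment].append(fps)
    return {
        segment: (min(values), sum(values) / len(values), max(values))
        for segment, values in grouped.items()
    }

# Hypothetical samples, not measurements from the review.
samples = [
    ("Transcode Start Pass", 10.0),
    ("Transcode Start Pass", 12.0),
    ("PCMark 10", 0.0),
    ("PCMark 10", 9.0),
    ("PCMark 10", 12.0),
]

for segment, (lowest, average, highest) in summarize_segments(samples).items():
    print(f"{segment:<22} {lowest:>6.2f} {average:>6.2f} {highest:>6.2f}")
```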

On the positive side, the drop in transcoding frame rate for the Khadas Mind configurations is not as heavy as what was seen for systems such as the ASRock 4X4-BOX-7735U. Overall, the RPL-P systems seem to prioritize foreground tasks better than the AMD systems, but with the right power budget, the end user may not even notice the difference (the Beelink GTR7 with its 65W Ryzen 7 7840HS has a much lower delta).


  • peterfares - Friday, September 15, 2023

    This is really cool, but who is this for?
  • abufrejoval - Friday, September 15, 2023

    I’d say it’s mostly for commuters, who’ll oscillate between two or more workplaces with high frequency and regularity, but don’t want to carry more than the “soul” of the computer with them.

    When I do that, I tend to make that a VM I keep on a high-speed USB stick and I then suspend the VM when I commute. Gets the job done with a bit of overhead and in a smaller form factor, but it means having a physical computer at every workplace and other compromises.

    Having a full-sized GPU dock on every location might be somewhat difficult in terms of budget, but a power primary and somewhat after-hours secondary, might be enough to satisfy a large part of the user base.

    Being able to just pick up the running machine right in the middle of something and then try catching a train or plane running might seem attractive, but Windows tends to glitch in far too many ways to make that realistic.

    I’ve had far too many Windows laptops being woken up from some powersave or even hibernation slumber in the middle of a flight, ostensibly for scheduled maintenance, only to then have them cook themselves and their battery to death for lack of cooling in the onboard luggage: I guess I should be glad they didn’t go as far as combusting, but generally I wound up without a working machine on the busy end of the trip…

    If you own a tiny home, operate in a boat, trailer or some other space constrained place this could be cool, but with an eye on longevity I’d not risk anything that wasn’t standards based and if TB isn’t enough, including dGPU, it’s really just tough luck.

    BTW, I do believe they offer TB and not just USB4, they just don’t have the certification done, because they do mention eGPU on their website.

    Pre-configured only: 32GB is certainly better than 16, even if I’d go for 64, especially at current prices for DRAM. The M.2 slots are only 30mm length so there wasn’t that much variety in terms of updates anyway, but that is currently changing because of these Steam console class devices. I guess the reason they won’t let you open the device is mostly to cut down on customer service issues, because there are just too many people out there who overestimate their technical skills and dexterity.

    I guess mostly it just inspired me to look into using a NUC for this commute-style use case instead. They aren’t really that much bigger than this, especially if you don’t carry a power brick. Having to make sure they are properly hibernated isn’t that much of an issue and they are far more economical.
  • brucethemoose - Friday, September 15, 2023

    > 64-48-48-112 @ 5200

    Whoa, is this a typo?

    The timings seem awfully loose, like waay above default JEDEC.
  • meacupla - Friday, September 15, 2023

    I can't even find what JEDEC specifies for LPDDR5
    Timings for LPDDR have always been looser than their regular counterparts.

    They clock higher at lower voltages, and the timings are loose as a result.
  • Kamen Rider Blade - Friday, September 15, 2023

    So, instead of creating a "Proprietary Standard"?

    Why don't they use the existing PC/104 stacking Board standard that has been around for decades?
  • meacupla - Saturday, September 16, 2023

    PC/104 is meant for internal only. It has exposed and unsupported pins, which makes it easy to bend the pins. It's fragile.
    This slot connector is a more robust design.

    Having said that, oculink and TB4 are plenty robust and have an existing market.
  • Kamen Rider Blade - Saturday, September 16, 2023

    You do know that PC/104 has updated to PCIe/104 and uses PCIe connectors that are plenty strong. Version 3 of the spec has been ratified since Feb 17, 2015.

    Also it would be pretty easy for them to figure out how to create a base board to stack modules onto given the modular nature.

    They could've used an EPIC or EBC MoBo base board and stacked modules on top.
  • meacupla - Saturday, September 16, 2023

    not my fault you named the wrong spec
  • sjkpublic@gmail.com - Sunday, September 17, 2023

    One main difference between this and other NUCs is the LPDDR5. This could have been a show stopper if they broke the 64GB barrier. Would consider a 128GB LPDDR5 memory version. Otherwise not much to see here.
  • xol - Wednesday, September 20, 2023

    Polite reminder that "industrial design" means designing a product to be mass produced, not edgelord brushed titanium designed to convince suckers a product is worth twice its equivalent value
