Today Futuremark is pulling the covers off of their new Time Spy benchmark, which is being released for all Windows editions of 3DMark. A showcase of sorts of the last decade or so of 3DMark benchmarks, Time Spy is a modern DirectX 12 benchmark implementing a number of the API's important features. All of this comes together in a demanding test for those who think their GPU hasn't earned its keep yet.

DirectX 12 support in game engines has been coming along for a few months now. To join in the fray, Futuremark has written the Time Spy benchmark on top of a pure DirectX 12 engine. This brings features such as asynchronous compute, explicit multi-adapter, and of course multi-threading/multi-core work submission improvements. All of this comes together into a test that is not only visually interesting, but one that also borrows a large number of assets from 3DMark benchmarks past.

For those who have been following the 3DMark franchise for more than a decade, there are portions of the prior benchmarks showcased as shrunken museum exhibits. These exhibits come to life as the titular Time Spy wanders the hall, in a throwback to past demos. I must admit a bit of fun was had watching to see what I recognized. I personally couldn't spot anything older than 3DMark 2005, but I would be interested in hearing about anything I missed.

Unlike many of the benchmarks exhibited in this museum, the entirety of this benchmark takes place in a single environment. Fortunately, the large variety of eye candy present gives a varied backdrop for the tests. To add a bit of story, a crystalline ivy entangles the entire museum, and in parts of the exhibit there are bodies in orange hazmat suits, showing signs of a previous struggle. Meanwhile, the Time Spy examines the museum with a handheld time portal; through said portal she can view a bright and clean museum, with bustling air traffic visible outside. I'll not spoil the entire brief story here, but the benchmark does a good job of providing both eye candy for newcomers and tributes for the enthusiasts who will spend ample time watching the events unfold.

From a technical perspective, this benchmark is, as you might imagine, designed to be the successor to Fire Strike. The system requirements are higher than ever, and while Fire Strike Ultra could run at 4K, 1440p is enough to bring even the latest cards to their knees with Time Spy.

Under the hood, the engine only makes use of FL 11_0 features, which means it can run on video cards as far back as the GeForce GTX 680 and Radeon HD 7970. At the same time, it doesn't use any features from the newer feature levels; so while this ensures a consistent test across all cards, it doesn't push the very newest graphics features such as conservative rasterization.
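To put that in concrete terms, targeting FL 11_0 under DirectX 12 comes down to the feature level requested at device creation. Below is a minimal sketch of such a check; this is a generic DX12 pattern rather than Futuremark's actual code, and the helper name is mine:

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical helper: a GPU can run an FL 11_0 DX12 engine if a device can
// be created at D3D_FEATURE_LEVEL_11_0. Kepler (GTX 680) and GCN (HD 7970)
// parts and newer pass this check; newer feature levels are simply never
// requested, keeping the test consistent across all cards.
bool SupportsFeatureLevel11_0(IDXGIAdapter1* adapter)
{
    ComPtr<ID3D12Device> device;
    return SUCCEEDED(D3D12CreateDevice(
        adapter, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device)));
}
```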

That said, Futuremark has definitely set out to make full use of FL 11_0. Futuremark has published an excellent technical guide for the benchmark, which should go live at the same time as this article, so I won't recap it verbatim. But in brief, everything from asynchronous compute to resource heaps gets used. In the case of async compute, Futuremark is using it to overlap rendering passes, though they do note that "the asynchronous compute workload per frame varies between 10-20%." On the work submission front, they're making full use of multi-threaded command queue submission, noting that every logical core in a system is used to submit work.
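As a rough sketch of what this looks like at the API level - a generic DX12 async compute pattern, not Futuremark's actual code - overlappable work goes on a dedicated compute queue alongside the direct (graphics) queue, with fences ordering any passes that depend on one another:

```cpp
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Create a direct (graphics) queue plus a dedicated compute queue. Command
// lists submitted on the compute queue may execute concurrently with
// graphics work, filling otherwise-idle execution slots on the GPU.
void CreateAsyncQueues(ID3D12Device* device,
                       ComPtr<ID3D12CommandQueue>& graphicsQueue,
                       ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC graphicsDesc = {};
    graphicsDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&graphicsDesc, IID_PPV_ARGS(&graphicsQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));
}

// Dependent passes are ordered with a fence rather than serialized on a
// single queue: the compute queue signals when its pass completes, and the
// graphics queue waits before consuming the results.
void SyncComputeToGraphics(ID3D12CommandQueue* computeQueue,
                           ID3D12CommandQueue* graphicsQueue,
                           ID3D12Fence* fence, UINT64 value)
{
    computeQueue->Signal(fence, value);
    graphicsQueue->Wait(fence, value);
}
```

Note that nothing here forces concurrent execution; the second queue merely permits it, which is why any gains depend on how much overlappable work the engine submits and how much idle capacity the GPU has.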

Meanwhile on the multi-GPU front, Time Spy is also mGPU capable. Futuremark is essentially meeting the GPUs half-way here, using DX12 explicit multi-adapter's linked-node mode. Linked-node mode is designed for matching GPUs - so there aren't any Ashes-style wacky heterogeneous configurations supported here - trading off some of the fine-grained power of explicit multi-adapter for the simplicity of matching GPUs, along with useful features that can only be done with matching GPUs, such as cross-node resource sharing. For their mGPU implementation Futuremark is using the otherwise common AFR (alternate frame rendering), which for a non-interactive demo should offer the best performance.
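For a sense of what linked-node AFR involves at the API level, here's a simplified, illustrative sketch - not Futuremark's code, and a real engine would create its per-node queues and resources once up front - in which each GPU in the link is addressed via a node mask and frames alternate between nodes:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

// Under linked-node explicit multi-adapter, matching GPUs appear as a single
// device with multiple nodes. A node mask pins a queue (or resource) to one
// GPU; AFR then simply alternates which node renders each frame.
void CreatePerNodeQueues(ID3D12Device* device,
                         std::vector<ComPtr<ID3D12CommandQueue>>& queues)
{
    const UINT nodeCount = device->GetNodeCount(); // 1 on single-GPU systems
    for (UINT node = 0; node < nodeCount; ++node)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;
        desc.NodeMask = 1u << node; // pin this queue to one GPU in the link

        ComPtr<ID3D12CommandQueue> queue;
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
        queues.push_back(queue);
    }
}

// Per frame, AFR just picks the next node in the link:
//   queues[frameIndex % nodeCount]->ExecuteCommandLists(...);
```

Cross-node resource sharing is what lets the node that rendered a frame hand its output to the node doing the presenting, which is part of what linked-node mode's matching-GPU requirement buys you.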

3DMark Time Spy Benchmark: 1440p

To take a quick look at the benchmark, we ran the full test on a small number of cards at the default 1440p setting. In our previous testing, AMD's RX 480 and R9 390 traded blows with each other and NVIDIA's GTX 970. Here though, the RX 480 pulls a small lead over the R9 390, while both open a slightly larger gap ahead of the GTX 970, only for the GeForce GTX 1070 to appropriately zip past the lot of them.

The graphics tests scale similarly to the overall score in this case, and if these tests were a real game, anything less than the GTX 1070 would provide a poor gameplay experience, with framerates under 30 fps. While we didn't get any 4K numbers off our test bench, I ran a GTX 1080 in my personal rig (i7-2600K @ 4.2GHz) and saw 4K scores that were about half of my 1440p scores. So while this is a synthetic test, the graphical demands this benchmark can place on a system will provide a plenty hefty workload for anyone seeking one out.

Meanwhile, for the Advanced and Professional versions of the benchmark, there's an interesting ability to run it with async compute disabled. Since this is one of the only pieces of software out right now that can use async on Pascal GPUs, I went ahead and quickly ran the graphics test on the GTX 1070 and RX 480. It's not an apples-to-apples comparison in that the two cards have very different performance levels, but for now it's the best look we can take at async on Pascal.

3DMark Time Spy Benchmark: Async Compute

Both cards pick up 300-400 points in score from enabling async compute. On a relative basis this is a 10.8% gain for the RX 480, and a 5.4% gain for the GTX 1070. As always when working with async, I should note that the primary performance benefit as implemented in Time Spy comes via concurrency, so everything here depends on a game having additional work to submit and a GPU having execution bubbles to fill.

The new Time Spy test is coming today to Windows users of 3DMark. This walk down memory lane not only puts demands on the latest gaming hardware, but also provides another showcase of the benefits DX12 can bring to our games. To anyone who's found Fire Strike too easy of a benchmark, keep an eye out for Time Spy.

Comments

  • Ryan Smith - Thursday, July 14, 2016 - link

    "Wait, isn't Nvidia doing async, just via pre-emption? "

    No. They are doing async - or rather, concurrency - just as AMD does. Work from multiple tasks is being executed on GTX 1070's various SMs at the same time.

    Pre-emption, though a function of async compute, is not concurrency, and is best not discussed in the same context. It's not what you use to get concurrency.
  • bcronce - Friday, July 15, 2016 - link

    Nvidia is doing inter-SM concurrency; AMD supports intra-SM concurrency. AMD can do single clock-cycle context switching to fill in pipeline holes in the SMs. This comes at a transistor cost and is only a benefit if there are a lot of "holes".

    My limited understanding is that the graphics pipeline is riddled with holes, leaving an average of 10%-30% of untapped compute power even if all of the SMs are in use. AMD's async allows their compute engine to fill in these holes.

    Based on how competitive Nvidia is, the nature of graphics processing may benefit more from focusing on making the holes smaller rather than filling them. But it's hard to tell whether one architecture or the other is better, or whether it's the game engines or the drivers.
  • Scali - Friday, July 15, 2016 - link

    "AMD can do single clock-cycle context switching to fill in pipeline holes in the SMs."

    Pascal can do this as well.

    "My limited understanding is that the graphics pipeline is riddled with holes, leaving an average of 10%-30% of untapped compute power even if all of the SMs are in use."

    All signs point to nVidia having better efficiency than AMD does. If you look at hardware with the same TFLOPS rating, you'll find that nVidia's hardware delivers significantly better performance than AMD's.
    Which would imply that nVidia has fewer 'holes' to begin with... which in turn implies that async compute may not see the same gains on their hardware as it does on AMD's.
  • powerarmour - Thursday, July 14, 2016 - link

    Does look slightly odd, especially considering the recent Doom/Vulkan numbers.
  • Eden-K121D - Thursday, July 14, 2016 - link

    BTW, Doom is a real-world scenario rather than a synthetic benchmark
  • donkay - Thursday, July 14, 2016 - link

    Keep in mind that AMD's OpenGL drivers are notoriously bad, so Vulkan bringing a big boost wasn't that huge of a surprise. DX11 and DX12 vs. Vulkan is an entirely different story.
  • powerarmour - Thursday, July 14, 2016 - link

    AMD's OpenGL performance has nothing to do with their Vulkan results; compare Vulkan to Vulkan, not OpenGL to Vulkan.

    I'd be very surprised if, when fully optimised, DX12 was any faster than Vulkan in a real-world scenario.
  • donkay - Thursday, July 14, 2016 - link

    You said Doom/Vulkan numbers. Comparing Doom is comparing OpenGL and Vulkan performance, so OpenGL optimization is a big part of why those gains were so large - which is why I find it silly that people even compare it to this benchmark with async on/off. Both Vulkan and DX12 include far more optimizations than just async; Doom's Vulkan async-on vs. async-off numbers are very close to what happens in this bench.
  • powerarmour - Thursday, July 14, 2016 - link

    Vulkan has been built from the ground up as a new API; I still don't get your point?
  • donkay - Thursday, July 14, 2016 - link

    I don't get your point saying that these numbers look odd considering the Doom/Vulkan numbers. For the reasons I just explained.
