Battlefield 4 Mantle Performance Preview
by Ryan Smith on February 1, 2014 12:40 PM EST
After a false start or two, AMD is finally getting the first beta of Mantle out the door. With EA DICE having shipped their Mantle patch for Battlefield 4 and developer Oxide having released their Star Swarm technical demo, the first Mantle-enabled applications have landed. Meanwhile AMD for their part is still hammering out an installation issue on their new Mantle-enabled Catalyst drivers, which has led to them missing their previously scheduled January release date.
In the interim, AMD has released a slightly finickier set of drivers to the press for us to play around with ahead of the public Mantle driver release. These drivers should be identical to the public drivers in both functionality and performance; they just have an outstanding installation bug that requires a workaround, something that AMD doesn’t want in the shipping version. AMD hasn’t provided a public release date for these drivers – at this point it’s in their best interest to avoid providing release dates they don’t know if they can keep – but given that this is the sole showstopper issue in our press drivers, we certainly don’t expect they’ll take much longer.
In any case, we’re hard at work at the moment putting together our full evaluation of this first version of Mantle. That article won’t be ready until next week, but in the meantime given the immense interest in Mantle, we wanted to quickly publish our first batch of numbers for Battlefield 4. We will have a much wider selection of benchmarks for our full article, including many more video cards and results for Star Swarm, but we wanted to quickly bring you what’s almost certainly going to be the most interesting set of data: Mantle performance with a high-end video card.
For that we’re turning to AMD’s Radeon R9 290X, testing the performance of that card under both Direct3D and Mantle in EA’s Battlefield 4. Battlefield 4 is Mantle’s showcase title and accordingly the first real world use case for AMD’s new API, making it the best place to start. As an application retrofitted with Mantle support we don’t expect Battlefield 4 to tap the complete potential of Mantle right out of the door – certainly not when the Mantle SDK and driver stack itself is still in development – but it can give us an idea of what kind of performance gains we can expect if developers chase the low-hanging fruit offered by Mantle.
What is that low-hanging fruit? For the most part that is going to be CPU bottlenecks, specifically bottlenecking in issuing draw calls. Of all of the bottlenecks that can impact a high performance GPU, keeping it fed can be the biggest bottleneck, and in turn bottlenecking in the draw call submission phase can be the biggest culprit. In the long term Mantle will also benefit GPU performance more directly by optimizing workflows within a GPU, and we already see a small bit of that today in Battlefield 4, but the bulk of the optimizations for these earliest titles have been made around the draw call bottleneck.
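To make the draw call bottleneck a little more concrete, here is a deliberately simplified sketch in which a frame takes as long as the slower of CPU draw call submission and GPU rendering. Every number in it (draw call count, per-call CPU cost, GPU time) is a made-up assumption for illustration, not a measurement from our testing or a figure from AMD, and the model ignores the GPU-side optimizations mentioned above.

```python
# Toy model: a frame is done when both the CPU (submitting draw calls)
# and the GPU (rendering) are done, so the slower side sets the pace.
# All inputs are hypothetical illustration values.

def frame_time_ms(draw_calls, cpu_us_per_call, gpu_ms):
    """Frame time in milliseconds under the max(CPU, GPU) assumption."""
    cpu_ms = draw_calls * cpu_us_per_call / 1000.0
    return max(cpu_ms, gpu_ms)


def gain_pct(draw_calls, old_us_per_call, new_us_per_call, gpu_ms):
    """Percent framerate gain from lowering the per-draw-call CPU cost."""
    old_fps = 1000.0 / frame_time_ms(draw_calls, old_us_per_call, gpu_ms)
    new_fps = 1000.0 / frame_time_ms(draw_calls, new_us_per_call, gpu_ms)
    return (new_fps / old_fps - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical scene: 5000 draw calls, 3 us per call before, 1 us after.
    for label, gpu_ms in (("heavy GPU load", 20.0), ("light GPU load", 8.0)):
        print(label, round(gain_pct(5000, 3.0, 1.0, gpu_ms), 1), "%")
```

In this model a cheaper draw call path does nothing while the GPU is the slower side, and only pays off once lower quality settings or a slower CPU push the bottleneck onto submission, which is the general pattern we would expect to see.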
For our Mantle preview we’re taking a look at two sections of the Battlefield 4 single player game, the first being from the Tashgar mission and the second being from the South China Sea mission. As was the case with Battlefield 3 the use of single player is less than ideal, but as Battlefield 4 lacks a formal benchmark or for that matter the ability to record multiplayer matches, we’re left with single player if we want to have reasonably repeatable benchmarks. And we’ll definitely want a high degree of repeatability if we’re to be able to distinguish Mantle gains from variability in GPU bound scenarios.
Meanwhile to cover a wider spectrum of possibilities, we’re running our 290X against 3 CPU configurations on our GPU testbed. The first is our standard configuration: our i7-4960X with all cores and Hyper-Threading enabled (6C/12T), running at 4.2GHz. Our second configuration drops that down to 4C/4T at 2GHz, to test for the benefits of Mantle on a still relatively large core count at lower clockspeeds. Our final configuration takes the core count down further, to 2C/4T at 3GHz, so that we can see what performance is like for processors with fewer cores but higher clockspeeds.
Finally, on a quick note, for measuring Battlefield 4's performance we're using the game's newly built in PerfOverlay.FrameFileLogEnable feature, which replaces FRAPS in this game due to the fact that FRAPS only works with Direct3D and OpenGL. FrameFileLogEnable logs frame times for later analysis, and from this we can reconstruct the minimum and average framerates, and even the full frame pacing performance of the game (but only from the perspective of the game, not the video card). Today we'll be looking at just the average framerates, but be sure to come back next week for our full evaluation, where we'll have frame pacing data and minimum framerates ready to go.
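As a rough illustration of how average and minimum framerates can be reconstructed from logged frame times, here is a minimal sketch. The input format is an assumption (one frame time in milliseconds per line, with any non-numeric lines skipped); the actual layout of the file written by PerfOverlay.FrameFileLogEnable may differ, and the helper names are hypothetical. "Minimum framerate" here is taken from the single slowest frame, which is only one of several possible definitions.

```python
# Sketch: derive average and minimum framerate from a frame time log.
# ASSUMPTION: one frame time in milliseconds per line; the real log
# layout may differ and would need its own parser.

def parse_frame_times(lines):
    """Return positive frame times (ms), skipping blanks and non-numeric lines."""
    times = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            value = float(text)
        except ValueError:
            continue  # e.g. a header line
        if value > 0:
            times.append(value)
    return times


def summarize(frame_times_ms):
    """Return (average fps, minimum fps) for a list of frame times in ms."""
    if not frame_times_ms:
        raise ValueError("no frame times to summarize")
    # Average fps is frames rendered divided by total elapsed time,
    # not the mean of per-frame instantaneous framerates.
    avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
    # Minimum fps corresponds to the slowest single frame.
    min_fps = 1000.0 / max(frame_times_ms)
    return avg_fps, min_fps


if __name__ == "__main__":
    sample = ["frametime_ms", "10.0", "10.0", "20.0"]  # hypothetical log
    print(summarize(parse_frame_times(sample)))
```

Dividing frame count by total time, rather than averaging per-frame rates, keeps a handful of very fast frames from inflating the average.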
CPU: | Intel Core i7-4960X @ 4.2GHz |
Motherboard: | ASRock Fatal1ty X79 Professional |
Power Supply: | Corsair AX1200i |
Hard Disk: | Samsung SSD 840 EVO (750GB) |
Memory: | G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26) |
Case: | NZXT Phantom 630 Windowed Edition |
Monitor: | Sharp PN-K321 |
Video Cards: | AMD Radeon R9 290X (Uber) |
Video Drivers: | AMD Catalyst 14.1 Beta |
OS: | Windows 8.1 Pro |
SP-Tashgar
Our first test comes from the Tashgar mission, and is the benchmark we will be using for day-to-day GPU benchmarking. This benchmark takes place immediately at the start of the mission, with our character driving out of the mountains and into the city of Tashgar. This benchmark has a limited CPU load and is GPU-bound in most situations, which potentially limits the benefits of Mantle in alleviating CPU bottlenecks, but gives us an idea of what kind of performance benefits we can expect in GPU-bound scenarios.
Battlefield 4 Tashgar: Mantle Performance Gains
CPU Configuration | Ultra | High | Low
i7-4960X 6C/12T @ 4.2GHz | 8% | 10% | -14%
i7-4960X 4C/4T @ 2GHz | 8% | 13% | 26%
i7-4960X 2C/4T @ 3GHz | 8% | 13% | 28%
Even at 1080p Ultra, where the Radeon R9 290X is clearly GPU-bound, we can see that switching to Mantle offers some performance improvements. With our i7-4960X fully powered up, this leads to an 8% performance increase, and we see similar performance increases even with other CPU configurations. Since we don’t appear to be CPU-bound in any appreciable way, this gives us a decent idea of what kind of GPU performance benefits Mantle can offer.
Meanwhile if we switch to High and Low settings, the higher framerates are able to tease out the CPU benefits of Mantle. With BF4’s High settings this is 10-13% depending on the CPU configuration, which indicates we’re still significantly GPU bound here.
Using Low quality settings on the other hand significantly widens the gap in both directions, with the minimum gain being -14%, and the maximum gain being 28%. In the case of our 6C/12T CPU configuration, Mantle actually has a detrimental impact on performance, bringing down our framerate from a positively absurd 216fps to a slightly less absurd 181fps. This was unexpected to say the least. We’re not particularly concerned about it, given that we have little reason to use this setting in day-to-day gaming, but it does point to a weakness in the current builds of BF4 and the Mantle drivers.
Otherwise if we move to our slower CPU configurations, the benefits are 26% and 28% for 4C/4T and 2C/4T respectively. Despite the fact that the 4C/4T setup has more real cores to work with, which under normal circumstances would be the stronger setup for a highly threaded application, it’s the 2C/4T setup that technically benefits the most. The difference is quite small, but it’s an interesting outcome nonetheless.
SP-South China Sea
Our second test comes from the South China Sea mission of Battlefield 4, where our character and his squad are on the quickly disintegrating USS Titan. Whereas our first test is rather uniformly GPU-bound, the breakup of the USS Titan offers us the chance to look at a more CPU-bound scenario. Even this scene isn’t exclusively CPU-bound, but with ship parts and other debris flying around everywhere, it’s going to be one of the more strenuous CPU workloads in the single player game.
Battlefield 4 South China Sea: Mantle Performance Gains
CPU Configuration | Ultra | High | Low
i7-4960X 6C/12T @ 4.2GHz | 7% | 8% | 7%
i7-4960X 4C/4T @ 2GHz | 10% | 26% | 17%
i7-4960X 2C/4T @ 3GHz | 10% | 30% | 28%
Starting once again at 1080p Ultra, even with the greater CPU workload presented by this test, we are unsurprisingly still GPU-bound on Ultra settings. The benefits aren’t as uniform as last time – they now range from 7% to 10% – but it’s safe to say that we’re once again seeing what are mostly the GPU performance benefits of Mantle.
However shifting to High quality shows much greater performance gains, indicating that we’re at least partially (if not fully) CPU-bound here. Once we reduce our CPU performance from 6C/12T to 4C/4T, the performance gains from using Mantle jump from 8% to 26%, and then to 30% when using our 2C/4T configuration. For a game that’s not immensely CPU bound in the first place and has been retrofitted for Mantle, this is towards the upper bound of what we would expect.
Finally switching over to our Low quality settings causes our performance gains to actually taper off some. We’re still CPU-bound on our 4C/4T setup, leading to a 17% performance gain, but apparently we’re not as CPU-bound as we were at High quality settings. Meanwhile the performance gains for 2C/4T remain similar to last time, at 28%. Battlefield 4 has multiple CPU tasks going on here, not the least of which is the simulation itself, so in the case of our 4C/4T setup it’s likely we’ve stumbled onto a situation where the game is more strongly CPU-bound by the simulation and other aspects of the game than it is by the submission of draw calls.
First Thoughts
As this is only a brief preview of our results we don’t intend to read too much into this limited data set, but even just looking at the 290X does provide us with some interesting data. For the pure high-end scenario – a 290X or similar GPU with a high-end CPU – Mantle can still offer performance benefits from the GPU workflow optimizations it provides. A 7-10% performance increase is not a dramatic difference, but it is 7-10% better performance than AMD had yesterday.
Meanwhile it comes as little surprise that the greatest performance benefits in our limited BF4 testing come in the mixed performance scenarios, pairing up a high-end GPU with slower CPUs. Since the lowest hanging fruit for Mantle optimizations is going to be CPU draw call bottlenecks, it’s going to be the weaker CPUs that have the most to gain here. In this case we still need to go out of our way to create CPU-bound scenarios – the 290X is rarely held back by the CPU on Ultra quality settings – but when we do create them we can see some of the potential that Mantle can offer. At High and Low quality settings, and excluding our one Mantle performance regression, we see performance gains anywhere between 7% and 30%. This shows (if nothing else) that even a retrofit game with a highly optimized Direct3D rendering path, like Battlefield, can still be bottlenecked by draw call performance, and that consequently some of Mantle’s CPU overhead reduction capabilities do in fact pan out.
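For readers who want to check the ranges quoted here against the two results tables, the following minimal sketch transcribes the published percentage gains and computes the minimum and maximum for a chosen set of quality settings. The data structure and the helper name `gain_range` are our own illustrative choices; only the percentages come from the tables above.

```python
# Percentage gains transcribed from the Tashgar and South China Sea tables.
GAINS = {
    "Tashgar": {
        "6C/12T @ 4.2GHz": {"Ultra": 8, "High": 10, "Low": -14},
        "4C/4T @ 2GHz": {"Ultra": 8, "High": 13, "Low": 26},
        "2C/4T @ 3GHz": {"Ultra": 8, "High": 13, "Low": 28},
    },
    "South China Sea": {
        "6C/12T @ 4.2GHz": {"Ultra": 7, "High": 8, "Low": 7},
        "4C/4T @ 2GHz": {"Ultra": 10, "High": 26, "Low": 17},
        "2C/4T @ 3GHz": {"Ultra": 10, "High": 30, "Low": 28},
    },
}


def gain_range(gains, settings, exclude_regressions=True):
    """Return (min, max) gain in percent across the given quality settings."""
    values = [
        pct
        for cpus in gains.values()
        for by_setting in cpus.values()
        for setting, pct in by_setting.items()
        if setting in settings and not (exclude_regressions and pct < 0)
    ]
    return min(values), max(values)


if __name__ == "__main__":
    print(gain_range(GAINS, {"High", "Low"}))  # High and Low, regression excluded
    print(gain_range(GAINS, {"Ultra"}))        # Ultra only
```

With the one regression excluded, High and Low span 7% to 30%, and Ultra on its own spans 7% to 10%, matching the figures in the text.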
Whether all of this is worth the costs and tradeoffs of Mantle, from both a consumer perspective and a developer perspective, is a longer discussion that we’ll be having next week, alongside our expanded benchmark results. But at first glance it looks like AMD has cleared the first hurdle, which is showcasing that there are tangible benefits to having a low-level graphics API. Now AMD just needs to further hammer out their Mantle drivers and get them into a publicly consumable state, so that the wider community of end-users can test and evaluate AMD’s Mantle offering. Outside of the known installation issue we have not encountered any issues with Mantle thus far – this being despite the fact that AMD is being very explicit about the beta nature of the Mantle stack – so hopefully this is a good omen for the company after the delays leading up to this point.
AMD's Official Performance Data
Finally, we’ll quickly close with some of AMD’s performance numbers, which they’ve published in their reviewer’s guide. We feel that vendor-provided numbers should always be taken with a grain of salt, but they do serve their purpose, especially for getting an idea of what performance is like under a best case scenario. To that end we can quickly see that AMD was able to top out at a 41% performance improvement on a 290X paired with an A10-7700K. This is a greater performance gain than the peak gain of 30% we’ve seen in our own results, but not immensely so. More importantly it can give us a good idea of what to reasonably expect for performance under Battlefield 4. If AMD’s results are accurate, then a 40% performance improvement is the most we should be expecting out of Battlefield 4’s Mantle renderer.
135 Comments
Plesman - Saturday, February 1, 2014
Objectively: I think it was EA who caused the delay. They wanted to fix the other bugs in Battlefield 4, before they wanted to finish implementing Mantle.
Objectively: Everybody promises the biggest "up to" performance boosts all of the time don't they?
TheJian - Saturday, February 1, 2014
No, AMD's beta driver came 2nd, the BF4 Mantle patch was first. And since it's STILL a beta driver we're still waiting on AMD's finished product here right?
I'm unimpressed. I have no intention of buying anything less than quad core Intel today, and a high end card on top so would yield about what every driver rev gives in games.
Johan Andersson said they worked for 2 years with AMD on Mantle for Frostbite 3, so I doubt many will use this if AMD isn't paying them. Such a small market, and earns you NOTHING above everyone else. You can't charge more for an AMD game buyer vs. everyone else. You are essentially coding to speed up some people that won't give you a dime. Maybe if they start selling Mantle patches for games for $5-10 we'll see it used. AMD can't afford $8mil per game, or even $2mil.
I'm guessing Nvidia is prepping their response to AMD's proprietary Mantle as we speak. I don't think they want it used really, but it's what you'd do to get Mantle killed. Just put something out there so nobody wants to code for either then let them both get killed. We need games in OpenGL, WebGL, HTML5, Java etc that are portable to everywhere in days or weeks. Not MONTHS or years. Making anything Mantle is the same as DirectX. It isn't easy to port from either to anywhere else. Nobody wants to code for DirectX (xbox1), Mantle, then to OpenGL (ps3/ps4), then to whatever on mobile, then port to PC (a game made on consoles is hard to port from Directx, but easier from PS3/PS4 OpenGL now that K1 is coming out for mobile).
Any time wasted on Mantle junk, takes away from the GAME itself. Think about it. If you could possibly code ONCE (say openGL) and run everywhere you get a ton more sales, and have a lot more resources to say, making a 30hr game vs. today's 7-15hr games. You might have the money to put into a decent AI for your game etc.
Now that steam will be pushing linux, you have yet another reason to use OpenGL etc that can port easily to anywhere. This was a waste of money by AMD. They should have spent this on avoiding the phase1,2,3 (more?) drivers this year and last, and on making better GPU's (290/290x was a botched launch and hardocp says this chip wasn't really designed for 28nm and needs 20) and also enhancing the CPU's. Instead we got console crap (which also stole from gpu/cpu/drivers) and Mantle that they just can't afford to support, along with Free-Sync which may never even come out as AMD even said it isn't a product for market yet, if ever.
Inteli - Saturday, February 1, 2014
Last I checked, AMD offered Mantle to Nvidia for use. It really isn't proprietary.
chizow - Saturday, February 1, 2014
It's proprietary, the SDK and source code isn't available for download and AMD has said it might be open to releasing/licensing it to other IHVs at some point in the future.
AMD's talk about opening up to other IHVs is just lip service so they don't sound hypocritical after beating the Open standard drum for so long and to help alleviate industry concerns by likening it to an "open standard".
rarson - Sunday, February 2, 2014
Mantle can run on anyone's hardware, it's just that no one else has an architecture design like GCN that would actually support it. It's the same reason that Mantle doesn't run on AMD's older GPUs.
chizow - Sunday, February 2, 2014
So in other words, it's proprietary to GCN, just as I stated.
Mstngs351 - Sunday, February 2, 2014
Andy why wouldn't they offer it to Nvidia to use? It would be like handing someone who only reads/writes English an instruction manual written in German. They cant use it.
If Mantle offered the widespread support of DX or Ogl then I'd be more excited. Sadly I fear that with only moderate gains (when you need them) it will fall by the wayside like other fast API's already have. I suppose if it's build is close enough to console API's then we might see enough support (developers tend to do the least work they can get away with) for it to be successful.
Mstngs351 - Sunday, February 2, 2014
Andy should have been "And"... Please tell me your name is Andy, because that would likely be the most serendipitous moment of my day!
daniel142005 - Saturday, February 1, 2014
Can OpenGL implement these optimizations? If it's abstracted by a good graphics engine/api then it wouldn't affect the game designer. Frostbite is a full blown game "engine" for game designers, but if they can do it why can't it be implemented into OpenGL for game developers?
jwcalla - Saturday, February 1, 2014
John Carmack made the claim that you can get similar (to Mantle) draw call performance in OpenGL with certain NVIDIA extensions... FWIW.
One of the problems for developers is that there's a lot of pressure to target the lowest common denominator. Even in OpenGL we see this; e.g., targeting something like OpenGL 3.x, etc.