The Snapdragon 855 Performance Preview: Setting the Stage for Flagship Android 2019
by Andrei Frumusanu on January 15, 2019 8:00 AM EST
Inference Performance: Good, But Missing Tensor APIs
Beyond CPU and GPU, the one aspect of the Snapdragon 855 that Qualcomm made a lot of noise about is the new Hexagon 690 accelerator block.
The new unit doubles its vector pipelines, essentially doubling performance in traditional image processing tasks as well as machine inferencing workloads. Most importantly, Qualcomm now includes a dedicated “Tensor Accelerator” block, which promises to offload inferencing tasks even more effectively.
I’ve queried Qualcomm about the new Tensor Accelerator and got some interesting answers. First of all, Qualcomm isn’t willing to disclose much about the performance of this IP block: the company has advertised a total of “7 TOPS” of compute throughput for the platform as a whole, but it would not break this figure down and attribute it to the individual IP blocks.
What was actually most surprising, however, was the API situation for the new Tensor Accelerator. Unfortunately, the block will not be exposed through the NNAPI until sometime later in the year with Android Q, and for the time being the accelerator is only reachable via Qualcomm’s in-house frameworks. This means that none of our very limited set of “AI” benchmarks is actually able to exercise the Tensor block, and most of the improvements we’re going to see in the results come from the Hexagon’s vector cores.
Inference Performance
First off we have “AI-Benchmark”, a workload we first featured in our Mate 20 review. To quote myself:
“AI-Benchmark” is a new tool developed by Andrey Ignatov from the Computer Vision Lab at ETH Zürich in Switzerland. The new benchmark application is, as far as I’m aware, one of the first to make extensive use of Android’s new NNAPI, rather than relying on each SoC vendor’s own SDK tools and APIs. This is an important distinction from AIMark, as AI-Benchmark should be better able to accurately represent the NN performance an application would see when using the NNAPI.
Andrey extensively documents the workloads, including the NN models used and what their function is, and has also published a paper on his methods and findings.
One thing to keep in mind is that the NNAPI isn’t some universal translation layer that can magically run any neural network model on an NPU: both the API and the SoC vendor’s underlying driver must support the functions a model uses, and must be able to run them on the IP block in question. The distinction here lies between models which use features that are to date not yet supported by the NNAPI, and thus have to fall back to a CPU implementation, and models which can be hardware accelerated and operate on quantised INT8 or FP16 data. There are also models relying on FP32 data, and here again, depending on the underlying driver, these can run either on the CPU or, for example, on the GPU.
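To make the CPU-fallback behaviour more concrete, below is a minimal sketch of how an Android application would opt into NNAPI acceleration through TensorFlow Lite. This is purely an illustration rather than how AI-Benchmark is actually implemented; the model file and the 1001-class output are hypothetical placeholders, and whether the graph actually lands on the Hexagon, the GPU, or the CPU is decided entirely by the vendor’s NNAPI driver.

```kotlin
import org.tensorflow.lite.Interpreter
import java.io.File
import java.nio.ByteBuffer

// Minimal sketch: run one inference through TensorFlow Lite with NNAPI enabled.
// "modelFile" and the 1001-class output are hypothetical placeholders.
fun classifyWithNnapi(modelFile: File, input: ByteBuffer): FloatArray {
    val options = Interpreter.Options()
        .setUseNNAPI(true)   // hand supported ops to the vendor's NNAPI driver
        .setNumThreads(4)    // CPU threads used for anything that falls back
    val interpreter = Interpreter(modelFile, options)
    try {
        val output = Array(1) { FloatArray(1001) }   // e.g. ImageNet-style labels
        interpreter.run(input, output)               // unsupported ops run on TFLite's CPU kernels
        return output[0]
    } finally {
        interpreter.close()
    }
}
```

Any operation the driver cannot map to an accelerator (or, in the Snapdragon 855’s case, anything that would need the Tensor Accelerator before the Android Q NNAPI update) is transparently executed by TensorFlow Lite’s own CPU kernels instead.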
In the first set of workloads, which I’ve categorised as running on the CPU, the Snapdragon 855 performs well, although not extraordinarily so. Because these workloads are of a short burst nature, performance here is impacted far more by the system’s scheduler and by how quickly the CPU is allowed to reach its maximum operating performance point.
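As a rough illustration of why the burst behaviour matters, the hypothetical helper below times a single cold inference against the average of a number of warmed-up runs; on a conservatively tuned scheduler the first figure can look noticeably worse simply because the big cores haven’t yet reached their maximum frequency. None of this is taken from AI-Benchmark itself.

```kotlin
import org.tensorflow.lite.Interpreter
import java.nio.ByteBuffer

// Hypothetical helper: compare a "cold" first inference against the
// steady-state average once the CPU governor has ramped the cores up.
fun coldVsWarmMillis(interpreter: Interpreter, input: ByteBuffer, output: Any, warmRuns: Int = 10): Pair<Double, Double> {
    fun runOnce(): Double {
        input.rewind()                             // reset the input buffer position
        val start = System.nanoTime()
        interpreter.run(input, output)
        return (System.nanoTime() - start) / 1e6   // milliseconds
    }
    val cold = runOnce()                                   // cores may still be at low clocks
    val warm = (1..warmRuns).map { runOnce() }.average()   // steady-state figure
    return cold to warm
}
```

A benchmark that only reports the cold number ends up measuring the scheduler’s ramp-up behaviour almost as much as the CPU itself.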
Moving on to the 8-bit integer quantised models, these are hardware accelerated on most devices. The Snapdragon 855 leads in all of these benchmarks. In the Pioneers benchmark we most clearly see the doubling of the HVX units’ performance, as the new hardware posts inference times a little under half those of the Snapdragon 845.
The Cartoons benchmark is interesting as it showcases the API and driver aspect of NNAPI benchmarks: the Snapdragon 855 appears to have massively better acceleration here than its predecessors and competing devices. It may be that Qualcomm has notably improved its drivers and is much better able to take advantage of the hardware compared to past chipsets.
The FP16 workloads finally see some competition for Qualcomm, as the Kirin’s NPU exposes hardware support here. Qualcomm should be running these workloads on the GPU, and we see massive gains as the new platform’s NNAPI support is much more mature.
The FP32 workload shows a similar improvement for the Snapdragon 855; here Qualcomm is finally able to take full advantage of GPU acceleration, which gives the new chipset a considerable lead.
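For the float models it is the NNAPI driver that decides whether work is dispatched to the Adreno GPU. As a point of comparison, an application can also target the GPU explicitly through TensorFlow Lite’s GPU delegate (in developer preview around this time), including relaxing FP32 graphs to FP16 arithmetic. The sketch below assumes a TFLite build that ships the GPU delegate; it illustrates the same idea, not the path the benchmark takes.

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import java.io.File

// Sketch: build an interpreter that runs float models on the GPU.
// The model file is a hypothetical placeholder.
fun buildGpuInterpreter(modelFile: File): Interpreter {
    val gpuDelegate = GpuDelegate()               // requires the tensorflow-lite-gpu dependency
    val options = Interpreter.Options()
        .addDelegate(gpuDelegate)
        .setAllowFp16PrecisionForFp32(true)       // let FP32 graphs use faster FP16 arithmetic
    return Interpreter(modelFile, options)        // remember to close() the interpreter and delegate when done
}
```

Whether this or the NNAPI path ends up faster on a given phone comes down to the maturity of the respective drivers, which is exactly the difference the Snapdragon 855 results highlight.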
AIMark
Alongside AI-Benchmark, it is still useful to have comparisons with AIMark. Rather than using the NNAPI, this benchmark uses Qualcomm’s SNPE framework for acceleration. It also gives us a rare comparison against Apple’s iPhones, where the benchmark uses CoreML for acceleration.
Overall, the Snapdragon 855 is able to post 2.5-3x performance boosts over the Snapdragon 845.
At the event, Qualcomm also showcased an in-house benchmark running InceptionV3, accelerated by both the HVX units and the new Tensor block. Here the phone was able to achieve 148 inferences per second – which, although possibly an apples-to-oranges comparison, represents a 26% boost over the same model run in AIMark (implying roughly 117 inferences per second there).
Overall, even though the Tensor Accelerator wasn’t directly tested in today’s benchmark results, the Snapdragon 855’s inference performance is outstanding thanks to the much improved driver stack as well as the doubling of the Hexagon’s vector execution units. It will be interesting to see what vendors do with this performance, and we should definitely see some exciting camera applications in the future.
Comments
Spunjji - Wednesday, January 16, 2019 - link
Sorry, but that's just not true. I have yet to use a phone that feels consistently faster than the OnePlus 6 I'm currently using as a daily driver, and I've done a whole bunch of messing with custom ROMs / kernels, starting back with Cyanogenmod 6 on a Dell Streak.

gijames1225 - Tuesday, January 15, 2019 - link
Sounds very positive given that phones already perform great at the flagship level. The single core improvement is greatly welcomed given how much that matters for javascript.

fred666 - Tuesday, January 15, 2019 - link
I like their performance over time graph on page 1. It shows the 855 to be faster than the 845, which is faster than the 835, which is slower than the 820. What? Their performance dropped in that generation?
yeeeeman - Wednesday, January 16, 2019 - link
Yes. In floating point, the SD820 based on their own custom cores (built on an evolution of Krait cores called Kryo) was much better than everything, including next gen SD835 which used an IP from ARM, the Cortex A72.

fred666 - Wednesday, January 16, 2019 - link
so it pretty much means their graph is worthless. Floating point should not be the primary indicator of performance, integers are much more used by most popular use cases

Spunjji - Wednesday, January 16, 2019 - link
He didn't say the graph shows FP performance, he just mentioned that 820 was unusually strong in that area. My guess is it's a representation of overall performance based on some or other standard benchmark. That doesn't make it "worthless", because it's literally only there to show a rough comparison between historical chipsets.

cpkennit83 - Thursday, January 17, 2019 - link
Actually it was the A73. The A72 is actually stronger in fp but slower in integer workloads

stennan - Tuesday, January 15, 2019 - link
Please do a podcast soon. There has been so much going on with PC CPU/GPU and now incoming mobile CPU that I miss having the AnandTech deep dive!

melgross - Tuesday, January 15, 2019 - link
Well, it’s all very interesting, but still the elephant in the room is Apple’s A series, no matter what. Take that out, and the 855 and 980 are excellent chips, but with it in, they are just mediocre.

cpkennit83 - Tuesday, January 15, 2019 - link
They are excellent chips no matter what. A12 big cores are twice as large or more than A76 cores. No Android OEM is willing to pay a big premium for their flagship SoCs, so the Qualcomms and Huaweis of the world don't pressure ARM to spend the big $$ needed to fund the development of truly wide cores. The only one who seems interested in going big is Samsung, but they can't get their act together.
Still, performance is more than adequate in the A76 flagship SoCs, and efficiency is slightly better than the A12, so for me this generation is the best in the Android space since the SD800.