06:04PM EDT - This is the first talk on edge computing

06:05PM EDT - Xuantie-910 of Alibaba

06:06PM EDT - Innovating Cloud and Edge Computing by RISC-V

06:06PM EDT - Xuantie refers to a heavy iron sword from Chinese folklore

06:07PM EDT - T-Head Semiconductor - a young Alibaba organization specializing in circuit design for next-gen compute in various areas, with a strong commitment to open source

06:08PM EDT - RISC-V is very attractive for the IoT era

06:08PM EDT - Extensibility and modularity allow for customization for domain-specific workloads

06:09PM EDT - RISC-V Mainline platform in Linux, fully supported in AlibabaOS

06:09PM EDT - Xuantie's goal is to contribute to the open source community

06:09PM EDT - AI Vector Engine

06:10PM EDT - Similar in performance to Arm A73

06:10PM EDT - Xuantie-902 (M0+ like) with hardware TEE up to Xuantie-910

06:10PM EDT - 903, 907, 908 coming

06:11PM EDT - 4 cores per cluster in 910

06:11PM EDT - HMP cluster

06:11PM EDT - Each core supports 32-64 KB L1 D and 32-64 KB L1 I

06:11PM EDT - Each single core is 3-decode 8-issue OoO

06:11PM EDT - Hybrid branch predictor

06:11PM EDT - vector engine

06:12PM EDT - One of the first commercial processors to use RISC-V vector extension proposals

06:12PM EDT - CoreMark performance is 7.1 per MHz. This workload runs entirely out of cache (all hits)

06:13PM EDT - Highest performance RISC-V on market now

06:13PM EDT - SiFive has the U84 processor, which might be higher performance, but no details as yet

06:13PM EDT - waiting for info to become available

06:13PM EDT - X910 supports RISC-V 0.7.1 Vector Extension

06:13PM EDT - FP16-FP64, INT8-INT64

06:14PM EDT - MMU, CLINT, PPC

06:14PM EDT - Supports unaligned memory data access

06:14PM EDT - Supports custom extensions

06:14PM EDT - RISC-V Turbo extensions

06:15PM EDT - bit operations, memory access, core sync

06:15PM EDT - Can be disabled to be completely compatible with RISC-V

06:15PM EDT - but Alibaba toolchain can use the new instructions

06:16PM EDT - Two vector pipes, 1 ALU/MUL, 1 ALU/DIV, 1 Branch, 1 dual-issue Load/Store unit

06:16PM EDT - 128-bit instruction fetch unit

06:16PM EDT - can fetch 8 instructions at once
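
A quick check on the fetch numbers: 128 bits spread over 8 instructions works out to 16 bits each, which matches the width of RISC-V compressed (C extension) instructions. The sketch below is not from the talk; it assumes the "8 instructions" figure refers to 16-bit compressed encodings, and all names in it are illustrative.

```python
# Sketch: how many instructions fit in one 128-bit fetch, by encoding width.
FETCH_WIDTH_BITS = 128  # fetch width quoted in the talk

# Standard RISC-V encoding widths: 32-bit base instructions and
# 16-bit compressed instructions (C extension).
ENCODING_BITS = {"base_32bit": 32, "compressed_16bit": 16}


def instructions_per_fetch(fetch_bits: int, encoding_bits: int) -> int:
    """Number of whole instructions of a given width in one fetch block."""
    return fetch_bits // encoding_bits


per_fetch = {
    name: instructions_per_fetch(FETCH_WIDTH_BITS, bits)
    for name, bits in ENCODING_BITS.items()
}
print(per_fetch)  # {'base_32bit': 4, 'compressed_16bit': 8}
```

Under that assumption, a fetch block made up purely of 32-bit instructions would deliver 4 per cycle rather than 8.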

06:17PM EDT - Hybrid multi-mode branch prediction

06:17PM EDT - Cache Way prediction

06:17PM EDT - Loop accelerator

06:18PM EDT - Can do one load and one store in parallel

06:18PM EDT - 3-cycle load-to-use

06:19PM EDT - Unique multi-mode, multi-stream prefetcher for RISC-V: it works by pattern matching and backfills the L1/L2 caches

06:19PM EDT - 4 cores per cluster, up to 4 clusters

06:20PM EDT - All clusters share L2, up to 8 MB

06:20PM EDT - Two 128-bit Vector ALU ops/cycle

06:21PM EDT - More than 300 GFLOPs FP16 per cluster (32 FLOPs/core/cycle x 2.5 GHz x 4-cores)

06:21PM EDT - FP32 perf is 0.5x FP16

06:21PM EDT - So 150 GFLOP of FP32 per cluster - up to 600 GFLOP of FP32 in a 4-cluster design
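
The throughput figures above can be reproduced with a little arithmetic. The sketch below uses only numbers quoted in the talk, plus one assumption that was not stated on stage: that the 32 FLOPs/core/cycle figure comes from two 128-bit vector ops per cycle, each a fused multiply-add counted as two FLOPs per FP16 lane. All names are illustrative.

```python
# Sketch: peak-throughput arithmetic from the figures quoted in the talk.
VECTOR_OPS_PER_CYCLE = 2   # "Two 128-bit Vector ALU ops/cycle"
VECTOR_WIDTH_BITS = 128
FP16_BITS = 16
FLOPS_PER_FMA = 2          # ASSUMPTION: each lane op is a fused multiply-add
CLOCK_GHZ = 2.5
CORES_PER_CLUSTER = 4
MAX_CLUSTERS = 4
FP32_TO_FP16_RATIO = 0.5   # "FP32 perf is 0.5x FP16"

# FP16 lanes processed per cycle across both vector ops.
fp16_lanes = VECTOR_OPS_PER_CYCLE * (VECTOR_WIDTH_BITS // FP16_BITS)
fp16_flops_per_core_cycle = fp16_lanes * FLOPS_PER_FMA

fp16_gflops_cluster = fp16_flops_per_core_cycle * CLOCK_GHZ * CORES_PER_CLUSTER
fp32_gflops_cluster = fp16_gflops_cluster * FP32_TO_FP16_RATIO
fp32_gflops_chip = fp32_gflops_cluster * MAX_CLUSTERS

print(fp16_flops_per_core_cycle)  # 32
print(fp16_gflops_cluster)        # 320.0
print(fp32_gflops_cluster)        # 160.0
print(fp32_gflops_chip)           # 640.0
```

The unrounded products (320 / 160 / 640 GFLOPs) sit slightly above the rounded 300 / 150 / 600 figures given in the talk.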

06:22PM EDT - Also integrated IDE with profiling for Xuantie-910

06:22PM EDT - Compiler has been co-optimized for the hardware improvements

06:22PM EDT - Compared to Arm A73

06:23PM EDT - A73 CPU is from Huawei Kirin 970

06:23PM EDT - Xuantie is configured to same L1 cache sizes

06:24PM EDT - 'on par in this config'

06:24PM EDT - The benchmarks don't mean that Xuantie-910 is as polished as the A73; it's still new and needs more collaboration

06:25PM EDT - Here's an AI workload

06:25PM EDT - on an FPGA simulation of X910

06:25PM EDT - Here's a floor plan

06:26PM EDT - TSMC 12FF

06:26PM EDT - FPGA X910 already deployed in Alibaba cloud

06:27PM EDT - FPGA runs at 200 MHz

06:27PM EDT - July 2020: 28 HPC version at 1.6 GHz, 0.3 mW/MHz
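
For a rough sense of scale, the power-efficiency figure can be multiplied out at the quoted clock. This is a back-of-the-envelope sketch only: it assumes the 0.3 mW/MHz figure applies at the full 1.6 GHz and scales linearly with frequency, and the talk did not say whether it covers a single core or something larger. Names are illustrative.

```python
# Sketch: power implied by the quoted efficiency figure at the quoted clock.
POWER_MW_PER_MHZ = 0.3   # quoted in the talk
CLOCK_MHZ = 1600         # 1.6 GHz, quoted in the talk

# ASSUMPTION: linear scaling, i.e. the per-MHz figure holds at 1.6 GHz.
estimated_power_mw = POWER_MW_PER_MHZ * CLOCK_MHZ
print(f"{estimated_power_mw:.0f} mW")  # 480 mW
```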

06:27PM EDT - September, 12nm FinFET due

06:28PM EDT - Helping external customers adopt X910 through the Wujian SoC platform

06:30PM EDT - Now for Q&A

06:32PM EDT - Q: What applications are you using it for?

06:33PM EDT - A: It's a full chip - a high-end core for embedded SoCs

06:34PM EDT - Q: Source code? A: We are actively working on open source procedures. It's not straightforward for a high-performance core - legal work is required. We are talking to open source companies to find the best way to do this. Also repository management and such. Once it is available, we will let you know!

06:34PM EDT - Q: Plans to support RVV 1.0? A: 0.7.1 for now - when we designed it, the spec was still at that level. We are following and working on that, yes.

06:36PM EDT - That's a wrap. My next live blog will be NVIDIA A100 at 5pm PT.

6 Comments

  • eSyr - Monday, August 17, 2020

    Suddenly, C-SKY.
  • watersb - Tuesday, August 18, 2020

    Pandemic distancing borders on the surreal right now, kids returning to school while staying at home.

    Strange times, but as always I get a lot out of your Hot Chips coverage, even though you are unlikely to be able to insert any wafers into your mouth this year.

    Thanks!
  • name99 - Wednesday, August 19, 2020

    What is "RISC-V Turbo"? The web shows nothing useful.
  • MetalPenguin - Wednesday, August 19, 2020

    That is the name that Alibaba gave to their own custom instruction extension. The slide around "6:14PM EDT" kind of gives a brief overview. They basically added custom instructions to accelerate certain functions.
  • name99 - Wednesday, August 19, 2020

    OK, so point is it's an Alibaba extension, not a "standard" RISC-V extension.
    Oh RISC-V, you and your crazy extensions. It's hard to imagine this will play out well long term...

    So do we validate these as good choices (ie they should have been in the base spec!) if other people copy them? Or are they (C) Alibaba so one step on the way to Balkanization?
  • green.holden - Tuesday, September 20, 2022

    300 GFLOPS of FP16 and 150 of FP32.

    That's got to be enough for a basic GPU.
