BenchCouncil’s View On Benchmarking AI and Other Emerging Workloads (Technical Report, Slides presented by Prof. Jianfeng Zhan at BenchCouncil SC BoF). This paper outlines BenchCouncil’s view on the challenges, rules, and vision of benchmarking modern workloads.

BenchCouncil AI Benchmarks

BenchCouncil AI benchmarks cover the benchmarking of datacenter AI (AIBench), high performance computing (HPC AI500), edge computing (Edge AIBench), and AIoT (AIoTBench). The relationship of these four AI benchmarks is as follows.

AIBench proposes a scenario-distilling benchmarking methodology and an industry-scale AI benchmark framework and suite. It abstracts nine representative application scenarios from seventeen industry partners and identifies seventeen prominent AI-related tasks. The reusable AIBench framework includes AI and non-AI component libraries plus data input, online inference, offline training, and deployment tool modules. Based on this framework, the benchmark suite includes two scenario benchmarks (E-commerce Search and Online Translation); seventeen AI component benchmarks, which reflect AI model performance and quality targets while representing a wide variety of workload characteristics; and fourteen micro benchmarks for fine-grained optimization.

HPC AI500 proposes a comprehensive HPC AI benchmarking methodology that achieves the goals of being equivalent, relevant, representative, affordable, and repeatable. Following this methodology, HPC AI500 presents open-source benchmarks and a Roofline performance model for benchmarking and optimizing systems. It proposes two new metrics, Valid FLOPS and Valid FLOPS per watt, to rank the performance and energy efficiency of HPC AI systems. The evaluations show that the methodology, benchmarks, performance models, and metrics can measure, optimize, and rank HPC AI systems in a scalable, simple, and affordable way.
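To make the Roofline idea concrete, here is a minimal sketch of the classic Roofline bound: attainable performance is the smaller of peak compute and memory bandwidth multiplied by operational intensity. This is the textbook form of the model, not necessarily the exact formulation HPC AI500 uses. The function names and hardware numbers are hypothetical, chosen only for illustration.

```python
def roofline_attainable_flops(peak_flops, mem_bandwidth, operational_intensity):
    """Upper bound on FLOP/s under the classic Roofline model.

    peak_flops            -- peak compute rate in FLOP/s
    mem_bandwidth         -- peak memory bandwidth in bytes/s
    operational_intensity -- FLOPs performed per byte moved from memory
    """
    if peak_flops <= 0 or mem_bandwidth <= 0 or operational_intensity < 0:
        raise ValueError("inputs must be positive (intensity may be zero)")
    # Memory-bound region: bandwidth * intensity; compute-bound region: peak.
    return min(peak_flops, mem_bandwidth * operational_intensity)


def ridge_point(peak_flops, mem_bandwidth):
    """Operational intensity at which a kernel stops being memory-bound."""
    return peak_flops / mem_bandwidth


# Hypothetical accelerator: 10 TFLOP/s peak, 500 GB/s memory bandwidth.
PEAK = 10e12
BANDWIDTH = 500e9

low_intensity = roofline_attainable_flops(PEAK, BANDWIDTH, 4.0)    # memory-bound
high_intensity = roofline_attainable_flops(PEAK, BANDWIDTH, 50.0)  # compute-bound
print(ridge_point(PEAK, BANDWIDTH), low_intensity, high_intensity)
```

On this assumed hardware, a kernel whose operational intensity falls below the ridge point is limited by memory traffic, so optimization effort is better spent on data movement than on arithmetic throughput.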

AIoTBench provides an open-source benchmark suite for mobile and embedded device intelligence. It includes 3 representative real-world data sets and 12 benchmarks: 9 micro benchmarks and 3 component benchmarks. The benchmarks cover 3 application domains: image recognition, speech recognition, and natural language processing. The suite supports different platforms, including Android devices and Raspberry Pi, and different development tools, including TensorFlow and Caffe2.

Edge AIBench provides a comprehensive end-to-end edge computing benchmark suite. It includes 5 representative real-world data sets and 16 benchmarks: 8 micro benchmarks and 8 component benchmarks. The benchmarks cover 4 application scenarios: ICU Patient Monitor, Surveillance Camera, Smart Home, and Autonomous Vehicle. Moreover, it provides an edge computing AI testbed combined with federated learning.