Tutorial: BigDataBench 5.0

--- a Scalable Big Data and AI (Artificial Intelligence) Benchmark suite for IoT, Edge, Datacenter and HPC, BenchCouncil

This tutorial is aimed at presenting BigDataBench 5.0---a scalable big data and data-driven artificial intelligence (in short, AI) benchmark suite. We are glad to introduce the following interesting topics:

(1) The challenges and motivation for characterizing modern Big Data and AI workloads.

(2) Why does a scalable benchmarking methodology matter? What is a data motif? How many data motifs can characterize and compose modern comprehensive Big Data and AI workloads? Micro benchmarks of modern Big Data and AI workloads.

(3) The essentials of modern big data and AI workloads, e.g., Image classification, Image generation, Text-to-Text Translation, Image-to-text, Image-to-image, Speech-to-text, Word embedding, Face embedding, Object detection, Recommendation, Graph Model, Clustering, Classification, PageRank, Feature extraction, and Search engine indexing, together with their data sets, algorithms, and involved data motifs.

(4) Two end-to-end application benchmarks: Mixed data center workloads, and E-commerce search (Joint work with Alibaba).

(5) The proxy benchmarks for modern Big Data and AI workloads, which shorten the execution time by hundreds of times.

(6) How to use BigDataBench 5.0?

(7) How to contribute to BigDataBench? How to join BenchCouncil (http://www.benchcouncil.org)?

Location and Date

We will give a tutorial on BigDataBench at HPCA 2019 in Washington D.C., USA.

February 16, 2019 (Saturday), 08:30 - 17:10 (Full Day)

Room: University of DC

Organizers and Presenters

Organizer: Dr. Jianfeng Zhan, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Dr. Jianfeng Zhan, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Dr. Chen Zheng, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Dr. Wanling Gao, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences

Abstract

BigDataBench (IISWC'13, HPCA'14, PACT'16, TPDS'17, PACT'18, IISWC'18) is a scalable big data and AI benchmark suite. It is a multi-disciplinary research and engineering effort, spanning systems, architecture, data management, and machine learning, with contributors from both industry and academia. By a rough estimate, more than 600 published papers have cited or used BigDataBench since 2014.

We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs. We call each such class a data motif. For the first time, across a wide variety of big data and AI workloads, we identify eight data motifs: Matrix, Sampling, Logic, Transform, Set, Graph, Sort, and Statistic computation. Each motif captures the common requirements of its class of unit of computation while being reasonably divorced from individual implementations. Unlike traditional kernels, a data motif's behavior is affected by the sizes, patterns, types, and sources of its data inputs. Moreover, it reflects not only computation and memory access patterns, but also disk and network I/O patterns.
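To make the "pipeline of data motifs" view concrete, here is a minimal Python sketch. It is not BigDataBench code: the three functions are toy stand-ins for the Sampling, Sort, and Statistic motifs, and the names `run_pipeline` and `TOY_PIPELINE` are hypothetical. It only illustrates how each stage consumes the intermediate data produced by the previous one, so that the input size seen by a motif depends on its position in the pipeline.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Toy stand-ins for three of the eight data motifs.
# These are illustrative assumptions, not BigDataBench implementations.

def sampling_motif(data: Sequence[float], k: int = 100, seed: int = 0) -> List[float]:
    """Sampling: draw up to k elements without replacement (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(list(data), min(k, len(data)))

def sort_motif(data: Sequence[float]) -> List[float]:
    """Sort: order the intermediate data."""
    return sorted(data)

def statistic_motif(data: Sequence[float]) -> List[float]:
    """Statistic: reduce the data to [mean, median]."""
    return [statistics.mean(data), statistics.median(data)]

Stage = Tuple[str, Callable[[Sequence[float]], List[float]]]

# A hypothetical workload expressed as an ordered list of motif stages.
TOY_PIPELINE: List[Stage] = [
    ("Sampling", sampling_motif),
    ("Sort", sort_motif),
    ("Statistic", statistic_motif),
]

def run_pipeline(data: Sequence[float], stages: List[Stage]):
    """Run each stage on the previous stage's output.

    Returns the final output and a trace of (motif name, input size) pairs.
    """
    trace = []
    current = list(data)
    for name, fn in stages:
        trace.append((name, len(current)))
        current = fn(current)
    return current, trace
```

Calling `run_pipeline(range(1000), TOY_PIPELINE)` returns a two-element result together with a trace showing that the Sort and Statistic stages each see 100 elements rather than the original 1000, which is the sense in which a motif's behavior depends on its data input.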

The current version, BigDataBench 5.0, is a significant upgrade and provides a comprehensive Big Data and AI benchmark suite, including online services, offline analytics, graph analytics, AI, data warehouse, NoSQL, and streaming workloads. The suite includes micro benchmarks, each of which is a single data motif; component benchmarks, which consist of combinations of data motifs; and end-to-end application benchmarks, which are combinations of component benchmarks. We also propose using combinations of the eight data motifs with different weights to mimic the benchmarks in BigDataBench. The resulting proxy benchmarks shorten the execution time by hundreds of times on real systems, and they are suitable for both early-stage architecture design and later system evaluation across different architectures.
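The idea of combining the eight data motifs with different weights can be sketched as follows. This is only an illustration under stated assumptions: the function name `allocate_iterations`, the notion of an "iteration budget", and the example weights are all hypothetical and are not taken from BigDataBench or its proxy benchmark generator. The sketch splits a fixed budget across the motifs in proportion to their weights, using largest-remainder rounding so the shares add up exactly.

```python
from typing import Dict, Mapping

# The eight data motifs named in the abstract.
MOTIFS = ("Matrix", "Sampling", "Logic", "Transform",
          "Set", "Graph", "Sort", "Statistic")

def allocate_iterations(weights: Mapping[str, float], budget: int) -> Dict[str, int]:
    """Split `budget` iterations across motifs in proportion to `weights`.

    Motifs missing from `weights` get weight 0. Largest-remainder rounding
    guarantees that the allocations sum to exactly `budget`.
    """
    unknown = set(weights) - set(MOTIFS)
    if unknown:
        raise ValueError(f"unknown motifs: {sorted(unknown)}")
    if budget < 0 or any(w < 0 for w in weights.values()):
        raise ValueError("budget and weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # quota[m] / total is the ideal (fractional) share of the budget.
    quotas = {m: weights.get(m, 0) * budget for m in MOTIFS}
    alloc = {m: int(quotas[m] // total) for m in MOTIFS}

    # Hand out the iterations lost to flooring, largest remainder first.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(MOTIFS, key=lambda m: quotas[m] % total, reverse=True)
    for m in by_remainder[:leftover]:
        alloc[m] += 1
    return alloc

# Hypothetical weights for an imaginary workload (not measured values).
EXAMPLE_WEIGHTS = {"Sort": 5, "Statistic": 3, "Sampling": 2}
```

With `EXAMPLE_WEIGHTS` and a budget of 1000, the Sort, Statistic, and Sampling motifs receive 500, 300, and 200 iterations, and the remaining motifs receive none. How the weights of a real proxy benchmark are chosen is outside the scope of this sketch.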

Schedule

Time | Agenda | Presenter | Resources
08:30-09:00 | Challenges and motivation for characterizing modern Big Data and AI workloads | Jianfeng Zhan | [Slides]
09:00-09:30 | Benchmarking methodology, models, and metrics | Jianfeng Zhan | [Slides]
09:30-10:00 | Summary of benchmarks | Jianfeng Zhan | [Slides]
10:00-10:30 | Micro benchmarks | Wanling Gao | [Slides]
10:30-10:50 | Coffee Break
10:50-12:00 | Essentials of modern big data and AI workloads | Wanling Gao | [Slides]
12:00-14:00 | Lunch
14:00-14:30 | Two end-to-end application benchmarks | Jianfeng Zhan | [Slides Available Soon]
14:30-15:40 | How to use BigDataBench | Chen Zheng | [Slides Available Soon]
15:40-16:00 | Coffee Break
16:00-16:50 | Big Data and AI proxy benchmarks | Wanling Gao | [Slides]
16:50-17:00 | How to contribute to BigDataBench? How to join BenchCouncil (http://www.benchcouncil.org)? | Chen Zheng | [Slides]
17:00-17:10 | Wrap-Up | Jianfeng Zhan

Publications

Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads. [PDF]
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Fei Tang, Biwei Xie, Chen Zheng, Xu Wen, Xiwen He, Hainan Ye and Rui Ren. The 27th International Conference on Parallel Architectures and Compilation Techniques (PACT 2018).

BigDataBench: a Big Data Benchmark Suite from Internet Services. [PDF]
Lei Wang, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, Cheng Zhen, Gang Lu, Kent Zhan, Xiaona Li, and Bizhu Qiu. The 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando, Florida, USA.

Data Motif-based Proxy Benchmarks for Big Data and AI Workloads. [PDF]
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Zhen Jia, Daoyi Zheng, Chen Zheng, Xiwen He, Hainan Ye, Haibin Wang, and Rui Ren. 2018 IEEE International Symposium on Workload Characterization (IISWC 2018).

BigDataBench: a Scalable and Unified Big Data and AI Benchmark Suite. [PDF]
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Rui Ren, Chen Zheng, Gang Lu, Jingwei Li, Zheng Cao, Shujie Zhang, and Haoning Tang. Technical Report, arXiv preprint arXiv:1802.08254, January 27, 2018.

BOPS, Not FLOPS! A New Metric and Roofline Performance Model For Datacenter Computing. [PDF]
Lei Wang, Jianfeng Zhan, Wanling Gao, Zihan Jiang, Rui Ren, Xiwen He, Chunjie Luo, Gang Lu, Jingwei Li. Technical Report, arXiv preprint arXiv:1801.09212, May 3, 2018.

Understanding Big Data Analytics Workloads on Modern Processors. [PDF]
Zhen Jia, Jianfeng Zhan, Lei Wang, Chunjie Luo, Wanling Gao, Yi Jin, Rui Han and Lixin Zhang. IEEE Transactions on Parallel and Distributed Systems, 28(6), 1797-1810, 2017.

Understanding Processors Design Decisions for Data Analytics in Homogeneous Data Centers. [PDF]
Zhen Jia, Wanling Gao, Yingjie Shi, Sally A. McKee, Jianfeng Zhan, Lei Wang, Lixin Zhang. IEEE Transactions on Big Data, 2017.

A Dwarf-based Scalable Big Data Benchmarking Methodology. [PDF]
Wanling Gao, Lei Wang, Jianfeng Zhan, Chunjie Luo, Daoyi Zheng, Zhen Jia, Biwei Xie, Chen Zheng, Qiang Yang, and Haibin Wang. arXiv preprint arXiv:1711.03229.

Characterizing data analysis workloads in data centers. [PDF]
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best paper award).

Characterizing and Subsetting Big Data Workloads. [PDF]
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo, Ninghui Sun. 2014 IEEE International Symposium on Workload Characterization (IISWC 2014).

Identifying Dwarfs Workloads in Big Data Analytics. [PDF]
W Gao, C Luo, J Zhan, H Ye, X He, L Wang, Y Zhu, X Tian. arXiv preprint arXiv:1505.06872.

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. [PDF]
Zijian Ming, Chunjie Luo, Wanling Gao, Rui Han, Qiang Yang, Lei Wang, and Jianfeng Zhan. In Advancing Big Data Benchmarks (pp. 138-154). Springer International Publishing.

Biographies

Jianfeng Zhan
Jianfeng Zhan is a Professor of Computer Science and Engineering at the Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences. Since 2002, he has been working on datacenter computing benchmarks, OS, resource management, programming models, performance optimization, and system availability. He has published over 150 papers in major journals and international conferences related to these research areas, and filed 40 patents. From 2004 to 2010, he led the R&D efforts on innovative cluster and cloud systems software for the Dawning-series supercomputers (which ranked No. 2 and No. 10 on the TOP500 list in 2010 and 2004, respectively). Among them, GridView was transferred to Sugon, a premier supercomputing company in China, and became one of its popular software products. Currently, he is leading the research efforts on modern datacenter software stacks, including BigDataBench, an open source big data and AI benchmark suite, and RainForest, an operating system for datacenter computing. He received the second-class Chinese National Technology Promotion Prize in 2006, the Distinguished Achievement Award of the Chinese Academy of Sciences in 2005, the IISWC Best Paper Award in 2013, and the Huawei Contribution Prize in 2013. More details about Prof. Zhan are available at http://prof.ict.ac.cn/jfzhan

Chen Zheng
Chen Zheng is a postdoctoral researcher at the Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences. His research focuses on operating systems, virtualization, benchmarks, and data center workload characterization. He received his PhD degree in 2017 from the Institute of Computing Technology in China.

Wanling Gao
Wanling Gao is an Assistant Professor in computer science at the Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences. Her research interests focus on big data benchmarking and big data analytics. She received her B.S. degree in 2012 from Huazhong University of Science and Technology and her PhD degree in 2019 from the Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences in China.

Related Links

BigDataBench: http://prof.ict.ac.cn/BigDataBench/