Tutorial: BigDataBench 5.0

A Scalable Big Data and AI (Artificial Intelligence) Benchmark Suite for IoT, Edge, Datacenter and HPC
BenchCouncil

This tutorial presents BigDataBench 5.0, a scalable benchmark suite for big data and data-driven artificial intelligence (AI). It covers the following topics:

(1) The challenges of, and motivation for, characterizing modern big data and AI workloads.
(2) Why does a scalable benchmarking methodology matter? What is a data motif? How many data motifs are needed to characterize and compose modern, comprehensive big data and AI workloads? Micro benchmarks of modern big data and AI workloads.
(3) The essentials of modern big data and AI workloads, e.g., image classification, image generation, text-to-text translation, image-to-text, image-to-image, speech-to-text, word embedding, face embedding, object detection, recommendation, graph model, clustering, classification, PageRank, feature extraction, and search engine indexing: their data sets, their algorithms, and the data motifs they involve.

Location and Date

We will give a tutorial on BigDataBench at ASPLOS 2018 in Williamsburg, VA, USA.

March 24, 2018 (Saturday), 09:00 - 12:00 (Half Day)

Room: TBD

Organizers and Presenters

Organizer: Jianfeng Zhan, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Jianfeng Zhan, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Chen Zheng, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Wanling Gao, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences

Abstract

BigDataBench (IISWC’13, HPCA’14, PACT’16, TPDS’17) is an open-source big data and AI benchmark suite, developed as a multidisciplinary research and engineering effort (spanning systems, architecture, data management, and machine learning) by both industry and academia. By a rough estimate, more than 400 published papers have cited or used BigDataBench since 2014.

In this tutorial, we first present the challenges in defining modern datacenter workloads, which are the foundation of system and architecture research. Second, we summarize our work on identifying eight data dwarfs in comprehensive big data and AI workloads; each data dwarf captures the common requirements of one class of unit of computation while remaining reasonably independent of individual implementations. Third, we present BigDataBench, a data dwarf-based big data and AI benchmark suite. The suite includes three kinds of benchmarks: micro benchmarks, each of which is a single data dwarf; component benchmarks, each of which is a combination of data dwarfs; and end-to-end application benchmarks, each of which is a combination of component benchmarks.

The current version, BigDataBench 4.0, is a significant upgrade: it provides 13 representative real-world data sets and 47 benchmarks covering online services, offline analytics, graph analytics, AI, data warehouse, NoSQL, and streaming workloads. Finally, we propose mimicking the benchmarks in BigDataBench with combinations of the eight data dwarfs under different weights. These proxy benchmarks shorten execution time on real systems by hundreds of times, while remaining suitable for both early-stage architecture design and later system evaluation across different architectures.

Schedule

Time | Agenda | Presenter | Resources
08:30-09:00 | Challenges and motivation for characterizing modern big data and AI workloads | Jianfeng Zhan | [Slides]
09:00-09:30 | Benchmarking methodology, models, and metrics | Jianfeng Zhan | [Slides]
09:30-10:00 | Summary of benchmarks | Jianfeng Zhan | [Slides]
10:00-10:30 | Micro benchmarks | Wanling Gao | [Slides]
10:30-10:50 | Coffee break | |
10:50-12:00 | Essentials of modern big data and AI workloads | Wanling Gao | [Slides]
12:00-14:00 | Lunch | |
14:00-14:30 | Two end-to-end application benchmarks | Jianfeng Zhan | [Slides available soon]
14:30-15:40 | How to use BigDataBench | Chen Zheng | [Slides available soon]
15:40-16:00 | Coffee break | |
16:00-16:50 | Big data and AI proxy benchmarks | Wanling Gao | [Slides]

Publications

BigDataBench: a Dwarf-based Big Data and AI Benchmark Suite. [PDF]
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Rui Ren, Chen Zheng, Gang Lu, Jingwei Li, Zheng Cao, Shujie Zhang, and Haoning Tang. Technical Report, arXiv preprint arXiv:1802.08254, January 27, 2018.

BOPS, Not FLOPS! A New Metric and Roofline Performance Model For Datacenter Computing. [PDF]
Lei Wang, Jianfeng Zhan, Wanling Gao, Zihan Jiang, Rui Ren, Xiwen He, Chunjie Luo, Gang Lu, Jingwei Li. Technical Report, arXiv preprint arXiv:1801.09212, May 3, 2018.

Data Dwarfs: A Lens Towards Fully Understanding Big Data and AI Workloads. [PDF]
Wanling Gao, Lei Wang, Jianfeng Zhan, Chunjie Luo, Daoyi Zheng, Fei Tang, Biwei Xie, Chen Zheng and Qiang Yang. Technical Report, arXiv preprint arXiv:1802.00699, May 3, 2018.

BigDataBench: a Big Data Benchmark Suite from Internet Services. [PDF]
Lei Wang, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, Cheng Zhen, Gang Lu, Kent Zhan, Xiaona Li, and Bizhu Qiu. The 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando, Florida, USA.

Understanding Big Data Analytics Workloads on Modern Processors. [PDF]
Zhen Jia, Jianfeng Zhan, Lei Wang, Chunjie Luo, Wanling Gao, Yi Jin, Rui Han and Lixin Zhang. IEEE Transactions on Parallel and Distributed Systems, 28(6), 1797-1810, 2017.

Understanding Processors Design Decisions for Data Analytics in Homogeneous Data Centers. [PDF]
Zhen Jia, Wanling Gao, Yingjie Shi, Sally A. McKee, Jianfeng Zhan, Lei Wang, Lixin Zhang. IEEE Transactions on Big Data, 2017.

A Dwarf-based Scalable Big Data Benchmarking Methodology. [PDF]
Wanling Gao, Lei Wang, Jianfeng Zhan, Chunjie Luo, Daoyi Zheng, Zhen Jia, Biwei Xie, Chen Zheng, Qiang Yang, and Haibin Wang. arXiv preprint arXiv:1711.03229.

Characterizing data analysis workloads in data centers. [PDF]
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best Paper Award).

Characterizing and Subsetting Big Data Workloads. [PDF]
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo, Ninghui Sun. 2014 IEEE International Symposium on Workload Characterization (IISWC 2014).

Identifying Dwarfs Workloads in Big Data Analytics. [PDF]
W. Gao, C. Luo, J. Zhan, H. Ye, X. He, L. Wang, Y. Zhu, X. Tian. arXiv preprint arXiv:1505.06872.

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. [PDF]
Zijian Ming, Chunjie Luo, Wanling Gao, Rui Han, Qiang Yang, Lei Wang, and Jianfeng Zhan. In Advancing Big Data Benchmarks (pp. 138-154). Springer International Publishing.

Biographies

Jianfeng Zhan
Jianfeng Zhan is a Professor of Computer Science and Engineering at the Institute of Computing Technology, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. Since 2002, he has been working on datacenter computing benchmarks, operating systems, resource management, programming models, performance optimization, and system availability. He has published over 100 papers in major journals and international conferences in these research areas and has filed 40 patents. From 2004 to 2010, he led the R&D of innovative cluster and cloud systems software for the Dawning series of supercomputers, which ranked second and tenth on the TOP500 list in 2010 and 2004, respectively. Among these systems, GridView was transferred to Sugon, a premier supercomputing company in China, and became one of its popular software products. Currently, he leads research on modern datacenter software stacks, including BigDataBench, an open-source big data and AI benchmark suite, and RainForest, an operating system for datacenter computing. He received the second-class Chinese National Technology Promotion Prize in 2006, the Distinguished Achievement Award of the Chinese Academy of Sciences in 2005, the IISWC Best Paper Award in 2013, and the Huawei Contribution Prize in 2013. More details about Prof. Zhan are available at http://prof.ict.ac.cn/jfzhan

Chen Zheng
Chen Zheng is a postdoctoral researcher at the Institute of Computing Technology, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. His research focuses on operating systems, virtualization, benchmarks, and datacenter workload characterization. He received his Ph.D. degree in 2017 from the Institute of Computing Technology.

Wanling Gao
Wanling Gao is a Ph.D. candidate in computer science at the Institute of Computing Technology, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. Her research interests include big data benchmarks and big data analytics. She received her B.S. degree in 2012 from Huazhong University of Science and Technology.

Related Links

BigDataBench: http://prof.ict.ac.cn/BigDataBench/