OVERVIEW

News: Bench18 Call for Papers (co-located with Big Data 2018). BigDataBench 4.0 released on April 1. Technical report available on BigDataBench 4.0, the big data and AI motifs, and BOPS, a new metric for datacenter computing.

Summary

As the architecture, system, data management, and machine learning communities pay greater attention to innovative big data and data-driven artificial intelligence (in short, AI) algorithms, architectures, and systems, the pressure to benchmark them rises. However, the complexity, diversity, frequently changing workloads, and rapid evolution of big data and especially AI systems pose great challenges to benchmarking. First, for the sake of conciseness, benchmarking scalability, portability cost, reproducibility, and better interpretation of performance data, we need to understand which classes of units of computation are the most time-consuming in big data and AI workloads. Second, for the sake of fairness, the benchmarks must include diverse data and workloads. Third, for co-design of software and hardware, the benchmarks should be consistent across different communities.

We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs; we call each such class a data motif. For the first time, among a wide variety of big data and AI workloads, we identify eight data motifs: Matrix, Sampling, Logic, Transform, Set, Graph, Sort, and Statistic computation. Each motif captures the common requirements of its class of unit of computation while being reasonably divorced from individual implementations. Significantly different from traditional kernels, a data motif's behavior is affected by the sizes, patterns, types, and sources of its data inputs; moreover, it reflects not only computation and memory access patterns but also disk and network I/O patterns.

As a joint research and engineering effort across multiple disciplines, i.e., the architecture, system, data management, and machine learning communities in both industry and academia, we set up an open-source big data and AI benchmark suite, BigDataBench. The current version, BigDataBench 4.0, provides 13 representative real-world data sets and 47 benchmarks. Rather than creating a new benchmark or proxy for every possible workload, we propose using data motif-based benchmarks, i.e., combinations of the eight data motifs, to represent the diversity of big data and AI workloads. Our benchmark suite includes micro benchmarks, each of which is a single data motif; component benchmarks, which are data motif combinations; and end-to-end application benchmarks, which are combinations of component benchmarks.

The benchmarks cover seven workload types, including AI, online services, offline analytics, graph analytics, data warehouse, NoSQL, and streaming, from important application domains, namely search engines, social networks, e-commerce, multimedia processing, and bioinformatics. Meanwhile, data sets have great impact on workload behavior and performance (CGO'18). Hence, we consider data variety across the whole spectrum of data types, including structured, semi-structured, and unstructured data. Currently, the included data sources are text, graph, table, and image data. Using real data sets as seeds, our data generators (BDGS) produce synthetic data by scaling the seed data while keeping the characteristics of the raw data.
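As a minimal illustration of this seed-and-scale idea (the actual BDGS generators learn richer models of the seed data, so the sketch below is only illustrative and all names in it are hypothetical), a synthetic text generator can expand a seed corpus while preserving its empirical word-frequency distribution:

```python
# Minimal sketch of seed-and-scale synthetic text generation. The real
# BDGS generators are more sophisticated (they learn a model of the seed
# data); here we simply preserve the empirical word-frequency distribution
# of the seed corpus. All names are illustrative.
import random
from collections import Counter

def generate_synthetic_text(seed_corpus, target_words, words_per_line=20):
    """Yield synthetic lines whose word distribution follows the seed corpus."""
    counts = Counter(w for line in seed_corpus for w in line.split())
    vocab = list(counts.keys())
    weights = [counts[w] for w in vocab]
    emitted = 0
    while emitted < target_words:
        n = min(words_per_line, target_words - emitted)
        yield " ".join(random.choices(vocab, weights=weights, k=n))
        emitted += n

if __name__ == "__main__":
    seed = ["the quick brown fox jumps over the lazy dog"] * 3
    for line in generate_synthetic_text(seed, target_words=40):
        print(line)
```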

To achieve consistency of the benchmarks across different communities, we adopt state-of-the-art algorithms from the machine learning community, taking each model's prediction accuracy into account. For the benchmarking requirements of the system and data management communities, we provide diverse implementations using state-of-the-art techniques. For offline analytics, we provide Hadoop, Spark, Flink, and MPI implementations. For graph analytics, we provide Hadoop, Spark GraphX, Flink Gelly, and GraphLab implementations. For AI, we provide TensorFlow and Caffe implementations. For data warehouse, we provide Hive, Spark-SQL, and Impala implementations. For NoSQL, we provide MongoDB and HBase implementations. For streaming, we provide Spark Streaming and JStorm implementations.
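To give a concrete flavor of one such implementation, a minimal PySpark WordCount, similar in spirit to the suite's Spark offline-analytics micro benchmark (the input and output paths below are illustrative), might look as follows:

```python
# Minimal PySpark WordCount, illustrating an offline-analytics micro
# benchmark; the HDFS paths and session name are illustrative.
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
lines = spark.read.text("hdfs:///data/wikipedia_entries").rdd.map(lambda r: r[0])
counts = (lines.flatMap(lambda line: line.split())   # tokenize
               .map(lambda word: (word, 1))          # emit (word, 1)
               .reduceByKey(add))                    # sum counts per word
counts.saveAsTextFile("hdfs:///output/wordcount")
spark.stop()
```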

For the architecture community, whether early in the architecture design process or later in system evaluation, running a comprehensive benchmark suite is time-consuming, and the complex software stacks of big data and AI workloads aggravate this issue. To tackle this challenge, we propose data motif-based simulation benchmarks for the architecture community, which speed up simulation by about 100 times while accurately preserving system and micro-architectural characteristics. We also propose another way to reduce benchmarking cost: we select a small number of representative benchmarks, called the BigDataBench subset, according to workload characteristics from an architecture perspective. We provide MARSSx86, gem5, and Simics versions of the BigDataBench architecture subset.
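The subsetting idea can be sketched as follows, assuming (as is common in benchmark-subsetting practice) PCA over per-benchmark microarchitectural metrics followed by clustering; the metric matrix and benchmark list below are hypothetical:

```python
# Hedged sketch of benchmark subsetting: reduce per-benchmark
# microarchitectural metrics with PCA, cluster the benchmarks, and keep
# the benchmark closest to each cluster centroid as its representative.
# The metrics matrix and benchmark names are hypothetical.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

names = np.array(["Sort", "Grep", "WordCount", "PageRank", "Kmeans", "Read"])
metrics = np.random.rand(len(names), 20)   # rows: benchmarks, cols: metrics

X = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(metrics))
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    print(f"cluster {c}: representative = {names[members[np.argmin(dists)]]}")
```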

On a typical state-of-practice processor, the Intel Xeon E5-2620 v3, we also perform comprehensive characterizations of the benchmarks of all seven workload types in BigDataBench 4.0, together with traditional benchmarks such as SPEC CPU, PARSEC, and HPCC, in a hierarchical manner, drilling down five levels using Top-Down analysis from an architecture perspective. We make the following observations. First, as shown in Figure 1, the ILP (instruction-level parallelism) of the AI benchmarks is 1.26 on average, slightly lower than that of SPEC CPU (1.32). The MLP (memory-level parallelism) of AI is 2.65, similar to HPCC (2.78). Big data has lower ILP (0.85 on average) and MLP (1.86 on average) than AI for almost all workload types, except that the Hive-based data warehouse has slightly higher ILP than AI. Further, performance varies across workload types and software stacks.

Figure 1: Average Execution Performance.

Second, as shown in Figure 2, in terms of the uppermost-level breakdown, the AI benchmarks exhibit pipeline behavior similar to the traditional benchmarks, with approximately equal retiring (35% vs. 39.8%), bad speculation (6.3% vs. 6.1%), frontend bound (both about 9%), and backend bound (49.7% vs. 45.1%) fractions. The frontend bound of big data is more severe than that of the traditional benchmarks (9% on average). However, the frontend bound varies across workload types: NoSQL has the highest fraction at 35%, data warehouse has 25%, and the others average only 15%. Please see our technical report for more details.
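For reference, these four uppermost-level categories come from standard pipeline-slot accounting (Yasin's Top-Down method). A sketch of the level-1 formulas for a four-wide machine such as the Xeon E5-2620 v3, using Intel's event names (this is the generic formulation, not something specific to BigDataBench):

```latex
% Level-1 Top-Down breakdown; S is the total number of pipeline slots.
\begin{align*}
S &= 4 \times \mathrm{CPU\_CLK\_UNHALTED.THREAD}\\
\mathrm{FrontendBound} &= \mathrm{IDQ\_UOPS\_NOT\_DELIVERED.CORE} / S\\
\mathrm{BadSpeculation} &= \big(\mathrm{UOPS\_ISSUED.ANY} - \mathrm{UOPS\_RETIRED.RETIRE\_SLOTS}\\
&\qquad + 4 \times \mathrm{INT\_MISC.RECOVERY\_CYCLES}\big) / S\\
\mathrm{Retiring} &= \mathrm{UOPS\_RETIRED.RETIRE\_SLOTS} / S\\
\mathrm{BackendBound} &= 1 - \mathrm{FrontendBound} - \mathrm{BadSpeculation} - \mathrm{Retiring}
\end{align*}
```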

Figure 2: Uppermost Level Breakdown of All Benchmarks.

To model and reproduce multi-application or multi-user scenarios on clouds or in datacenters, we provide a multi-tenancy version of BigDataBench, which allows flexible setting and replaying of mixed workloads according to real workload traces: the Facebook, Google, and Sogou traces.
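A minimal sketch of this kind of trace-driven replay (the CSV trace format and benchmark commands below are hypothetical; the actual multi-tenancy version provides richer configuration):

```python
# Hedged sketch of replaying a mixed-workload trace: each record gives a
# submission offset (seconds) and a benchmark launch command. The trace
# format and commands are hypothetical.
import csv, subprocess, time

def replay(trace_path, time_scale=1.0):
    start = time.time()
    with open(trace_path) as f:
        for row in csv.DictReader(f):        # columns: offset_s, command
            due = float(row["offset_s"]) * time_scale
            delay = due - (time.time() - start)
            if delay > 0:
                time.sleep(delay)            # wait until submission time
            subprocess.Popen(row["command"], shell=True)  # fire and forget

if __name__ == "__main__":
    replay("facebook_trace.csv", time_scale=0.1)  # replay 10x faster
```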

Together with several industry partners, including Telecom Research Institute Technology, Huawei, Intel (China), Microsoft (China), IBM CDL, Baidu, Sina, INSPUR, and ZTE, we also released China's first industry-standard big data benchmark suite, BigDataBench-DCA, which is a subset of BigDataBench.

Why BigDataBench?

As shown in Table 1, across seven desired properties, BigDataBench is more comprehensive than the other state-of-the-art big data benchmark suites.

Table 1: The Differences of BigDataBench from Other Benchmark Suites.

| Benchmark Suite | Benchmarking Target | Methodology | Application Domains | Workload Types | Workloads | Real Data Sets and Scalable Data Sets | Software Stacks |
| BigDataBench | Big data and AI systems and architecture | Data motif-based | five | seven [1] | forty-seven | 13 real data sets; 6 scalable data sets | sixteen |
| BigBench 2.0 | Big data systems | Application model | one | five | Proposal | Proposal | Proposal |
| CloudSuite 3.0 | Cloud services | Popularity | N/A | four | eight | 3 data generators | three |
| HiBench 6.0 | Big data systems | Popularity | N/A | six | nineteen | Randomly generated or with specific distributions | five |
| CALDA | MapReduce systems and parallel DBMSs | Popularity | N/A | one | five | N/A | three |
| YCSB | Cloud serving systems | Performance model | N/A | one | six | N/A | four |
| LinkBench | Database systems | Application model | N/A | one | ten | one data generator | two |
| AMP Benchmarks | Data analytic systems | Popularity | N/A | one | four | N/A | five |
| Fathom | AI systems | Popularity | N/A | one | eight | N/A | one |

[1] The seven workload types are online service, offline analytics, graph analytics, artificial intelligence, data warehouse, NoSQL, and streaming.

What’s New?

BigDataBench 4.0 proposes a data motif-based benchmarking methodology and provides a set of micro, component, and end-to-end application benchmarks to fulfill different benchmarking requirements. Currently, BigDataBench includes 13 real-world data sets and 47 big data benchmarks, covering seven workload types: AI, online service, offline analytics, graph analytics, data warehouse, NoSQL, and streaming. Also, for simulation-based architecture research, we provide data motif-based simulation benchmarks that preserve system and micro-architectural characteristics with high accuracy (above 90% on average) while greatly shortening simulation time (about a 100x speedup).

Methodology

Figure 7 summarizes our data motif-based benchmarking methodology for BigDataBench 4.0, which separates the specification from the implementation. Centered on the data motifs, we define specifications for micro, component, and end-to-end application benchmarks.

Figure 7: BigDataBench Benchmarking Methodology.

Micro Benchmark Specification. Data motifs are the fundamental units of computation in a majority of big data and AI workloads. We design a suite of micro benchmarks, each of which is a single data motif, as listed in Table 3. These micro benchmark implementations have different characteristics, e.g., CPU-intensive, memory-intensive, or I/O-intensive.

Component Benchmark Specification. Combinations of data motifs can compose the original complex workloads. In addition to the micro benchmarks, each consisting of a single data motif, we also consider component benchmarks, which are representative workloads in different application domains. Component benchmarks are combinations of one or more data motifs arranged in a DAG-like structure, as listed in Table 4. For example, SIFT is a combination of five data motifs, including matrix, sampling, transform, sort, and statistic computations.
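Such a combination can be sketched as a small DAG whose nodes are data motifs. The following illustrative Python snippet (not the suite's actual specification format; the edge structure is a simplification) encodes a motif DAG for SIFT and derives an execution order:

```python
# Illustrative sketch: a component benchmark as a DAG of data motifs.
# The edge structure shown for SIFT is a simplification, not the suite's
# actual specification format.
from graphlib import TopologicalSorter   # Python 3.9+

# adjacency: motif -> set of motifs it depends on
sift_motifs = {
    "sampling":   set(),
    "transform":  {"sampling"},          # e.g., DoG pyramid after scale sampling
    "matrix":     {"transform"},
    "sort":       {"matrix"},
    "statistics": {"sort"},
}

print(list(TopologicalSorter(sift_motifs).static_order()))
# -> ['sampling', 'transform', 'matrix', 'sort', 'statistics']
```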

Application Benchmark Specification. To model an application domain, we define an end-to-end application benchmark specification that considers user characteristics and processing logic, based on the real processes of that domain. Because benchmarking a real application domain in full is complex and difficult, we simplify and model its primary processes and provide portable and usable end-to-end benchmarks. We use combinations of component benchmarks to represent the processing logic. For online services, for example, we generate queries considering query number, rate, distribution, and locality to reflect user characteristics.
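A hedged sketch of such a query generator, using Poisson arrivals for the query rate and a Zipf-like popularity distribution for locality (all parameters and terms below are illustrative):

```python
# Hedged sketch of online-service query generation: Poisson inter-arrival
# times model the request rate, and a Zipf-like popularity distribution
# models locality. All parameters are illustrative.
import random

def generate_queries(query_terms, num_queries, rate_qps, zipf_s=1.0):
    """Yield (arrival_time, term) pairs."""
    # Zipf-like weights: the i-th most popular term has weight 1/(i+1)^s.
    weights = [1.0 / (i + 1) ** zipf_s for i in range(len(query_terms))]
    t = 0.0
    for _ in range(num_queries):
        t += random.expovariate(rate_qps)            # Poisson process
        yield t, random.choices(query_terms, weights=weights, k=1)[0]

if __name__ == "__main__":
    terms = ["news", "weather", "sports", "movies", "music"]
    for arrival, term in generate_queries(terms, num_queries=5, rate_qps=100):
        print(f"{arrival:.4f}s  {term}")
```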

Benchmarks

BigDataBench is expanding and evolving rapidly. Currently, we have proposed benchmark specifications modeling five typical application domains. The current version, BigDataBench 4.0, includes 13 real-world data sets and 47 big data workloads, covering seven workload types. Table 2 summarizes the real-world data sets and scalable data generation tools included in BigDataBench 4.0, covering the whole spectrum of data types (structured, semi-structured, and unstructured) and different data sources (text, graph, image, audio, video, and table data). Table 3 and Table 4 present the micro benchmarks and component benchmarks in BigDataBench 4.0 from the perspectives of involved data motifs, application domain, workload type, data set, and software stack. Some end users may only pay attention to big data applications of a specific type; for example, to perform an apples-to-apples comparison of software stacks for offline analytics, they need only choose the benchmarks of the offline analytics type. But users who want to measure or compare big data systems and architectures should cover all the benchmarks.

Table 2. Summary of the data sets and data generation tools

| No. | Data Set | Data Size | Scalable Data Set |
| 1 | Wikipedia Entries | 4,300,000 English articles (unstructured text) | Text Generator of BDGS |
| 2 | Amazon Movie Reviews | 7,911,684 reviews (semi-structured text) | Text Generator of BDGS |
| 3 | Google Web Graph | 875,713 nodes, 5,105,039 edges (unstructured graph) | Graph Generator of BDGS |
| 4 | Facebook Social Network | 4,039 nodes, 88,234 edges (unstructured graph) | Graph Generator of BDGS |
| 5 | E-commerce Transaction Data | table 1: 4 columns, 38,658 rows; table 2: 6 columns, 242,735 rows (structured table) | Table Generator of BDGS |
| 6 | ProfSearch Person Resumes | 278,956 resumes (semi-structured table) | Table Generator of BDGS |
| 7 | CIFAR-10 | 60,000 color images of dimension 32x32 | Ongoing development |
| 8 | ImageNet | ILSVRC2014 DET image data set (unstructured image) | Ongoing development |
| 9 | LSUN | One million labelled images, classified into 10 scene categories and 20 object categories | Ongoing development |
| 10 | TED Talks | Translated TED talks provided by the IWSLT evaluation campaign | Ongoing development |
| 11 | Sogou Data | Corpus and search query data from Sogou Labs (unstructured text) | Ongoing development |
| 12 | MNIST | Handwritten digit database with 60,000 training examples and 10,000 test examples (unstructured image) | Ongoing development |
| 13 | MovieLens Dataset | Users' ratings of movies, with 9,518,231 training examples and 386,835 test examples (semi-structured text) | Ongoing development |

Table 3. Summary of the micro benchmarks in BigDataBench 4.0

| Micro Benchmark | Involved Data Motif | Application Domain | Workload Type | Data Set | Software Stack |
| Sort | Sort | SE, SN, EC, MP, BI [1] | Offline analytics | Wikipedia entries | Hadoop, Spark, Flink, MPI |
| Grep | Set | SE, SN, EC, MP, BI | Offline analytics | Wikipedia entries | Hadoop, Spark, Flink, MPI |
| Grep | Set | SE, SN, EC, MP, BI | Streaming | Randomly generated | Spark Streaming |
| WordCount | Basic statistics | SE, SN, EC, MP, BI | Offline analytics | Wikipedia entries | Hadoop, Spark, Flink, MPI |
| MD5 | Logic | SE, SN, EC, MP, BI | Offline analytics | Wikipedia entries | Hadoop, Spark, MPI |
| Connected Component | Graph | SN | Graph analytics | Facebook social network | Hadoop, Spark, Flink, GraphLab, MPI |
| RandSample | Sampling | SE, MP, BI | Offline analytics | Wikipedia entries | Hadoop, Spark, MPI |
| FFT | Transform | MP | Offline analytics | Two-dimensional matrix | Hadoop, Spark, MPI |
| Matrix Multiply | Matrix | SE, SN, EC, MP, BI | Offline analytics | Two-dimensional matrix | Hadoop, Spark, MPI |
| Read | Set | SE, SN, EC | NoSQL | ProfSearch resumes | HBase, MongoDB |
| Write | Set | SE, SN, EC | NoSQL | ProfSearch resumes | HBase, MongoDB |
| Scan | Set | SE, SN, EC | NoSQL | ProfSearch resumes | HBase, MongoDB |
| Convolution | Transform | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| Fully Connected | Matrix | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| Relu | Logic | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| Sigmoid | Matrix | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| Tanh | Matrix | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| MaxPooling | Sampling | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| AvgPooling | Sampling | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| CosineNorm | Basic statistics | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| BatchNorm | Basic statistics | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| Dropout | Sampling | SN, EC, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |

[1] SE (Search Engine), SN (Social Network), EC (E-commerce), BI (Bioinformatics), MP (Multimedia Processing).
Table 4. Summary of the component benchmarks in BigDataBench 4.0

| Component Benchmark | Involved Data Motif | Application Domain | Workload Type | Data Set | Software Stack |
| Xapian Server | Get, Put, Post | SE | Online service | Wikipedia entries | Xapian |
| PageRank | Matrix, Sort, Basic statistics, Graph | SE | Graph analytics | Google web graph | Hadoop, Spark, Flink, GraphLab, MPI |
| Index | Logic, Sort, Basic statistics, Set | SE | Offline analytics | Wikipedia entries | Hadoop, Spark |
| Rolling top words | Sort, Basic statistics | SN | Streaming | Randomly generated | Spark Streaming, JStorm |
| Kmeans | Matrix, Sort, Basic statistics | SE, SN, EC, MP, BI | Offline analytics | Facebook social network | Hadoop, Spark, Flink, MPI |
| Kmeans | Matrix, Sort, Basic statistics | SE, SN, EC, MP, BI | Streaming | Randomly generated | Spark Streaming |
| Collaborative Filtering | Graph, Matrix | EC | Offline analytics | Amazon movie review | Hadoop, Spark |
| Collaborative Filtering | Graph, Matrix | EC | Streaming | MovieLens dataset | JStorm |
| Naive Bayes | Basic statistics, Sort | SE, SN, EC | Offline analytics | Amazon movie review | Hadoop, Spark, Flink, MPI |
| SIFT | Matrix, Sampling, Transform, Sort, Basic statistics | MP | Offline analytics | ImageNet | Hadoop, Spark, MPI |
| LDA | Matrix, Graph, Sampling | SE | Offline analytics | Wikipedia entries | Hadoop, Spark, MPI |
| OrderBy | Set, Sort | EC | Data warehouse | E-commerce transaction | Hive, Spark-SQL, Impala |
| Aggregation | Set, Basic statistics | EC | Data warehouse | E-commerce transaction | Hive, Spark-SQL, Impala |
| Project | Set | EC | Data warehouse | E-commerce transaction | Hive, Spark-SQL, Impala |
| Filter | Set | EC | Data warehouse | E-commerce transaction | Hive, Spark-SQL, Impala |
| Select | Set | EC | Data warehouse | E-commerce transaction | Hive, Spark-SQL, Impala |
| Union | Set | EC | Data warehouse | E-commerce transaction | Hive, Spark-SQL, Impala |
| Alexnet | Matrix, Transform, Sampling, Logic, Basic statistics | SN, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| Googlenet | Matrix, Transform, Sampling, Logic, Basic statistics | SN, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| Resnet | Matrix, Transform, Sampling, Logic, Basic statistics | SN, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| Inception Resnet V2 | Matrix, Transform, Sampling, Logic, Basic statistics | SN, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| VGG16 | Matrix, Transform, Sampling, Logic, Basic statistics | SN, MP, BI | AI | Cifar, ImageNet | TensorFlow, Caffe |
| DCGAN | Matrix, Sampling, Logic, Basic statistics | SN, MP, BI | AI | LSUN | TensorFlow, Caffe |
| WGAN | Matrix, Sampling, Logic, Basic statistics | SN, MP, BI | AI | LSUN | TensorFlow, Caffe |
| GAN | Matrix, Sampling, Logic, Basic statistics | SN, MP, BI | AI | LSUN | TensorFlow, Caffe |
| Seq2Seq | | SN, EC, BI | AI | TED Talks | TensorFlow, Caffe |
| Word2vec | Matrix, Basic statistics, Logic | SE, SN, EC | AI | Wikipedia entries, Sogou data | TensorFlow, Caffe |
Evolution

As shown in Figure 3, the evolution of BigDataBench has gone through three major stages. At the first stage, we released three benchmark suites: BigDataBench 1.0 (6 workloads from search engines), DCBench 1.0 (11 workloads from data analytics), and CloudRank 1.0 (mixed data analytics workloads).

At the second stage, we merged the previous three benchmark suites and released BigDataBench 2.0, after investigating the top three application domains of internet services in terms of page views and daily visitors. BigDataBench 2.0 includes 6 real-world data sets and 19 big data workloads with different implementations, covering six application scenarios: micro benchmarks, Cloud OLTP, relational query, search engine, social networks, and e-commerce. Moreover, BigDataBench 2.0 provides several big data generation tools (BDGS) to generate scalable big data, e.g., at PB scale, from small-scale real-world data while preserving the original data characteristics.

At the third stage, BigDataBench 3.0 became a multidisciplinary effort. It includes 6 real-world data sets, 2 synthetic data sets, and 32 big data workloads, covering micro and application benchmarks from typical application domains, e.g., search engine, social networks, and e-commerce. To generate representative and diverse big data workloads, BigDataBench 3.0 focuses on units of computation that frequently appear in Cloud OLTP, OLAP, interactive analytics, and offline analytics.

Figure 3: BigDataBench Evolution

Previous releases

BigDataBench 3.2 http://prof.ict.ac.cn/BigDataBench/old/3.2/

BigDataBench 3.1 http://prof.ict.ac.cn/BigDataBench/old/3.1/

BigDataBench 3.0 http://prof.ict.ac.cn/BigDataBench/old/3.0/

BigDataBench 2.0 http://prof.ict.ac.cn/BigDataBench/old/2.0/

BigDataBench 1.0 http://prof.ict.ac.cn/BigDataBench/old/1.0/

DCBench 1.0 http://prof.ict.ac.cn/DCBench/

CloudRank 1.0 http://prof.ict.ac.cn/CloudRank/

Handbook

Handbook of BigDataBench [BigDataBench-handbook]

Q&A

More questions & answers are available from the handbook of BigDataBench.

Contacts (Email)

  • gaowanling@ict.ac.cn
  • lijingwei@mail.bafst.com
  • zhanjianfeng@ict.ac.cn
  • wl@ncic.ac.cn

People

  • Prof. Jianfeng Zhan, ICT, CAS
  • Dr. Lei Wang, ICT, CAS
  • Wanling Gao, ICT, CAS
  • Chunjie Luo, ICT, CAS
  • Qiang Yang, BAFST
  • Jingwei Li, BAFST
  • Xinhui Tian, ICT, CAS
  • Dr. Gang Lu, BAFST
  • Xinlong Lin, BAFST
  • Rui Ren, ICT, CAS
  • Dr. Rui Han, ICT, CAS
  • Daoyi Zheng, ICT, CAS
  • Dr. Chen Zheng, ICT, CAS
  • Zheng Cao, Alibaba
  • Shujie Zhang, Huawei
  • Haoning Tang, Tencent
Alumni

  • Dr. Zhen Jia, Princeton University
  • Hainan Ye, BAFST
  • Dr. Yingjie Shi
  • Zijian Ming, Tencent
  • Yuanqing Guo, Sohu
  • Yongqiang He, Dropbox
  • Kent Zhan, WUBA
  • Xiaona Li
  • Bizhu Qiu, Yahoo

License

BigDataBench is available to researchers interested in big data. Software components of BigDataBench are all available as open-source software and governed by their own licensing terms; researchers intending to use BigDataBench are required to fully understand and abide by the licensing terms of the various components. BigDataBench itself is open source under the Apache License, Version 2.0. Please use all files in compliance with the License.

Software developed externally (not by the BigDataBench group) is governed by its own licenses.

Software developed internally (by the BigDataBench group) is covered by the BigDataBench 4.0 license:

BigDataBench 4.0 Suite. Copyright (c) 2013-2018, ICT, Chinese Academy of Sciences. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must comply with the license and notice disclaimers.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimers in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ICT CHINESE ACADEMY OF SCIENCES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.