(1) BigDataBench
The latest version, BigDataBench 5.0, provides 13 representative real-world data sets and 27 big data benchmarks. The benchmarks cover six workload types (online services, offline analytics, graph analytics, data warehouse, NoSQL, and streaming) drawn from three important application domains: Internet services (including search engines, social networks, and e-commerce), recognition sciences, and medical sciences. The suite has three levels: micro benchmarks, each of which is a single data motif; component benchmarks, which combine data motifs; and end-to-end application benchmarks, which combine component benchmarks. Because data sets strongly affect workload behavior and runtime performance, the suite covers the whole spectrum of data types, including structured, semi-structured, and unstructured data; the current data sources are text, graph, table, and image data. Using real data sets as seeds, the BDGS data generators produce synthetic data at larger scales while preserving the characteristics of the raw data.
Homepage: /BigDataBench/
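The seed-scaling idea behind BDGS can be illustrated with a minimal sketch for text data. It assumes the only characteristic to preserve is the word-frequency (unigram) distribution of the seed; BDGS itself uses richer data models, so this is a deliberately simplified stand-in, and the function name `scale_text_seed` is hypothetical.

```python
import random
from collections import Counter


def scale_text_seed(seed_text: str, target_words: int, rng_seed: int = 0) -> list[str]:
    """Generate target_words synthetic words whose frequencies follow the seed text.

    Simplified assumption: only the unigram distribution of the seed is preserved
    (no word order, topics, or document structure, unlike real generators).
    """
    counts = Counter(seed_text.split())        # empirical word frequencies of the seed
    vocab = list(counts)
    weights = [counts[w] for w in vocab]
    rng = random.Random(rng_seed)              # fixed seed so runs are reproducible
    # Sample with replacement, weighted by seed frequency, to reach the target size.
    return rng.choices(vocab, weights=weights, k=target_words)


if __name__ == "__main__":
    synthetic = scale_text_seed("big data big benchmark data big", 10_000)
    print(Counter(synthetic).most_common())
```

Because sampling is weighted by the seed's counts, the output can be made arbitrarily large while the relative word frequencies stay close to those of the seed.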
(2) TPCx-BB (BigBench)
TPCx-BB (also known as BigBench) is a standardized benchmark from the Transaction Processing Performance Council (TPC) for assessing the performance, scalability, and price-performance of big data systems. Its workloads simulate real-world big data processing tasks such as data ingestion, transformation, and analysis. The suite consists of 30 queries covering scenarios such as customer behavior analysis, social network analysis, and text processing. The benchmark uses a comprehensive data set that includes structured, semi-structured, and unstructured data, testing a system's ability to handle varied data types and formats.
Homepage: https://www.tpc.org/tpcx-bb/
(3) HiBench
HiBench is a benchmark suite developed by Intel for evaluating big data frameworks across the workloads commonly found in big data applications. It includes micro and macro benchmarks covering areas such as data generation, sorting, machine learning, graph processing, and web search. The workloads are designed to simulate real-world usage and provide a standardized way to compare the performance of different big data systems.
Homepage: https://github.com/Intel-bigdata/HiBench
(4) CloudSuite
CloudSuite is a benchmark suite for cloud services. The fourth release consists of eight first-party applications that have been selected based on their popularity in today's datacenters. The benchmarks are based on real-world software stacks and represent real-world setups.
Homepage: https://github.com/parsa-epfl/cloudsuite
(5) CALDA
CALDA is a benchmarking effort targeting MapReduce systems and parallel DBMSs. Its workload includes the Grep task from the original MapReduce paper [34] plus four more complex analytical tasks (selection, aggregation, join, and UDF aggregation).
Reference: https://www.cs.cmu.edu/~pavlo/papers/benchmarks-sigmod09.pdf
(6) YCSB
YCSB (Yahoo! Cloud Serving Benchmark), released by Yahoo!, is a benchmark for data storage systems that includes only online service workloads, i.e., cloud OLTP. Its workloads are mixes of read and write operations chosen to cover a wide performance space.
Homepage: https://github.com/brianfrankcooper/YCSB
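A YCSB-style workload mix can be sketched as a stream of operations whose type follows a configured read/write ratio and whose keys follow a skewed (Zipfian) popularity distribution. This is a minimal illustration, not YCSB's implementation: the function names, the `user<N>` key format, and the skew exponent `theta=0.99` are assumptions chosen for the example.

```python
import bisect
import itertools
import random


def zipf_cdf(n_keys: int, theta: float = 0.99) -> list[float]:
    """Cumulative probabilities for ranks 1..n_keys with p(k) proportional to 1 / k**theta."""
    weights = [1.0 / (k ** theta) for k in range(1, n_keys + 1)]
    total = sum(weights)
    return list(itertools.accumulate(w / total for w in weights))


def generate_ops(n_ops: int, n_keys: int, read_fraction: float,
                 rng_seed: int = 0) -> list[tuple[str, str]]:
    """Return (operation, key) pairs for a hypothetical read/update mix."""
    rng = random.Random(rng_seed)
    cdf = zipf_cdf(n_keys)
    ops = []
    for _ in range(n_ops):
        # Inverse-CDF sampling: low ranks (the "hot" keys) are drawn most often.
        # min() guards against float rounding leaving cdf[-1] slightly below 1.0.
        rank = min(bisect.bisect_left(cdf, rng.random()), n_keys - 1)
        op = "READ" if rng.random() < read_fraction else "UPDATE"
        ops.append((op, f"user{rank}"))
    return ops


if __name__ == "__main__":
    # A read-mostly mix (95% reads, 5% updates) over 1,000 keys.
    ops = generate_ops(10_000, 1_000, read_fraction=0.95)
    print(ops[:5])
```

Varying `read_fraction` and the skew of the key distribution is what lets a single generator sweep across the performance space, from read-heavy to update-heavy and from uniform to highly skewed access.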
(7) AMP Benchmarks
The AMP benchmark, proposed by UC Berkeley's AMPLab, is a big data benchmark focused on real-time analytic applications. It measures response time on a handful of relational queries (scans, aggregations, joins, and UDFs) across different data sizes.
Homepage: https://amplab.cs.berkeley.edu/benchmark/
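The measurement approach (response time for scan, aggregation, join, and UDF queries) can be sketched with Python's built-in `sqlite3` module. All table names, column names, data sizes, filter thresholds, and the `url_len` UDF below are illustrative assumptions, not the official AMP benchmark schema or queries. The harness records the best of several runs per query.

```python
import random
import sqlite3
import time


def build_db(n_pages: int = 1000, n_visits: int = 10000, seed: int = 0) -> sqlite3.Connection:
    """Create an in-memory database with two small synthetic tables (hypothetical schema)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rankings (pageURL TEXT PRIMARY KEY, pageRank INTEGER)")
    conn.execute("CREATE TABLE uservisits (sourceIP TEXT, destURL TEXT, adRevenue REAL)")
    conn.executemany("INSERT INTO rankings VALUES (?, ?)",
                     [(f"url{i}", rng.randint(1, 100)) for i in range(n_pages)])
    conn.executemany("INSERT INTO uservisits VALUES (?, ?, ?)",
                     [(f"10.0.{rng.randint(0, 255)}.{rng.randint(0, 255)}",
                       f"url{rng.randrange(n_pages)}", rng.uniform(0, 1))
                      for _ in range(n_visits)])
    # Register a Python function as a SQL UDF (returns the length of a URL string).
    conn.create_function("url_len", 1, len)
    return conn


# One query per category; the prefix length 6 and threshold 50 are arbitrary choices.
QUERIES = {
    "scan": "SELECT pageURL, pageRank FROM rankings WHERE pageRank > 50",
    "aggregation": "SELECT SUBSTR(sourceIP, 1, 6), SUM(adRevenue) "
                   "FROM uservisits GROUP BY SUBSTR(sourceIP, 1, 6)",
    "join": "SELECT r.pageURL, AVG(r.pageRank), SUM(v.adRevenue) "
            "FROM rankings r JOIN uservisits v ON r.pageURL = v.destURL "
            "GROUP BY r.pageURL",
    "udf": "SELECT url_len(destURL), COUNT(*) FROM uservisits GROUP BY url_len(destURL)",
}


def time_queries(conn: sqlite3.Connection, repeats: int = 3) -> dict[str, tuple[int, float]]:
    """Return {query name: (result row count, best wall-clock time in seconds)}."""
    results = {}
    for name, sql in QUERIES.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            rows = conn.execute(sql).fetchall()   # fetch everything so the query fully runs
            best = min(best, time.perf_counter() - start)
        results[name] = (len(rows), best)
    return results


if __name__ == "__main__":
    for name, (n_rows, seconds) in time_queries(build_db()).items():
        print(f"{name:12s} rows={n_rows:5d} best={seconds * 1000:.2f} ms")
```

Rerunning `time_queries` with larger `n_pages` and `n_visits` mirrors, on a toy scale, the benchmark's practice of reporting response times across different data sizes.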
(8) SPEC Cloud® IaaS 2018
The SPEC Cloud® IaaS 2018 benchmark measures the performance of infrastructure-as-a-service (IaaS) cloud platforms, which may be either public or private.
Homepage: https://www.spec.org/cloud_iaas2018/