BigDataBench: A Big Data Benchmark Suite, BenchCouncil


Selected Research Papers Using or Citing BigDataBench

1. Architecture

Lee, B. C. (2016). Datacenter Design and Management: A Computer Architect’s Perspective. Synthesis Lectures on Computer Architecture, 11(1), 1-121.

Tseng, H. W., Zhao, Q., Zhou, Y., Gahagan, M., & Swanson, S. Morpheus: Creating Application Objects Efficiently for Heterogeneous Computing. ISCA 2016.

Dong, J., Hou, R., Huang, M., Jiang, T., Zhao, B., McKee, S. A., … & Zhang, L. (2016, March). Venice: Exploring server architectures for effective resource sharing. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) (pp. 507-518). IEEE.

Kanev S, Darago J P, Hazelwood K, et al. Profiling a warehouse-scale computer[C]//Proceedings of the 42nd Annual International Symposium on Computer Architecture. ACM, 2015: 158-169.

Zhu Y, Richins D, Halpern M, Reddi V J. Microarchitectural Implications of Event-driven Server-side Web Applications. In MICRO 2015.

Jung, D.; Li, S.; Ahn, J., “Large Pages on Steroids: Small Ideas to Accelerate Big Memory Applications,” in Computer Architecture Letters. 2015

Xie, B., Liu, X., Zhan, J., Jia, Z., Zhu, Y., Wang, L., & Zhang, L. (2015, October). Characterizing Data Analytics Workloads on Intel Xeon Phi. In Workload Characterization (IISWC), 2015 IEEE International Symposium on (pp. 114-115). IEEE.

Wang, W. (2015). Addressing Processor Over-provisioning on Large-scale Multi-core Platforms (Doctoral dissertation, University of Virginia).

Huang, Jen-Cheng. “Efficient Simulation Techniques For Large-scale Applications.” (2015). (Doctoral dissertation, Georgia Institute of Technology).

Manu Awasthi, Tameesh Suri, Zvika Guz, Anahita Shayesteh, Mrinmoy Ghosh, Vijay Balakrishnan. System-Level Characterization of Datacenter Applications. ICPE’15, Jan. 31–Feb. 4, 2015, Austin, Texas, USA. ACM.

Zou Q, Poremba M, He R, et al. Heterogeneous architecture design with emerging 3D and non-volatile memory technologies[C]//Design Automation Conference (ASP-DAC), 2015 20th Asia and South Pacific. IEEE, 2015: 785-790.

Zhen Jia, Jianfeng Zhan, Lei Wang, Rui Han, Sally A. McKee, Qiang Yang, Chunjie Luo, and Jingwei Li. Characterizing and subsetting big data workloads. In 2014 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2014.

Tao Jiang, Qianlong Zhang, Rui Hou, Lin Chai, Sally A. McKee, Zhen Jia, and Ninghui Sun. Understanding the Behavior of In-Memory Computing Workloads. In 2014 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2014.

Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo. Characterizing data analysis workloads in data centers. In 2013 IEEE International Symposium on Workload Characterization (IISWC 2013). Best paper award.

Ryoo, Jee Ho; LeBeane, Michael; Iqbal, Muhammad Faisal; John, Lizy K., “Control flow behavior of cloud workloads,” Workload Characterization (IISWC), 2014 IEEE International Symposium on, pp. 71-73, 26-28 Oct. 2014

2. Data Management

Rabl T, Danisch M, Frank M, et al. Just can’t get enough: Synthesizing Big Data[C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015: 1457-1462.

Baru C, Bhandarkar M, Curino C, et al. Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data[M]//Performance Characterization and Benchmarking. Traditional to Big Data. Springer International Publishing, 2014: 44-63.

Mesmoudi, A., Hacid, M. S., & Toumani, F. (2015). Benchmarking SQL on MapReduce systems using large astronomy databases. Distributed and Parallel Databases, 1-32.

Liang, F., Feng, C., Lu, X., & Xu, Z. (2014). Performance benefits of DataMPI: a case study with BigDataBench. In Big Data Benchmarks, Performance Optimization, and Emerging Hardware (pp. 111-123). Springer International Publishing.

3. Systems

Lin, F. X., & Liu, X. (2016, March). memif: Towards Programming Heterogeneous Memory Asynchronously. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 369-383). ACM.

Lu, G., Zhan, J., Lin, X., Tan, C., & Wang, L. (2016). On Horizontal Decomposition of the Operating System. arXiv preprint arXiv:1604.01378.

Zhang, Y., Meisner, D., Mars, J., & Tang, L. Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference. ISCA 2016.

Yazdanbakhsh, A., Mahajan, D., Lotfi-Kamran, P., & Esmaeilzadeh, H. AXBENCH: A Multi-Platform Benchmark Suite for Approximate Computing.

Cui Y, Chen Q, Yang J. Automatic In-vivo Evolution of Kernel Policies for Better Performance[J]. arXiv preprint arXiv:1508.06356, 2015.

Adhinarayanan, V., & Feng, W. C. (2015). An Automated Framework for Characterizing and Subsetting GPGPU Workloads.

Han, R., Wang, J., Huang, S., Shao, C., Zhan, S., Zhan, J., & Vazquez-Poletti, J. L. (2015, September). PCS: Predictive Component-level Scheduling for Reducing Tail Latency in Cloud Online Services. In Parallel Processing (ICPP), 2015 44th International Conference on (pp. 490-499). IEEE.

Nai L, Xia Y, Tanase I, et al. GraphBIG: Understanding graph computing in the context of industrial solutions[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC. 2015, 15.

Huang J, Zhang X, Schwan K. Understanding issue correlations: a case study of the Hadoop system[C]//Proceedings of the Sixth ACM Symposium on Cloud Computing. ACM, 2015: 2-15.

Li M, Tan J, Wang Y, et al. SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark[C]//Proceedings of the 12th ACM International Conference on Computing Frontiers. ACM, 2015: 53.

Malik, M., Rafatirah, S., Sasan, A., & Homayoun, H. (2015). System and Architecture Level Characterization of Big Data Applications on Big and Little Core Server Architectures. In IEEE International Conference on Big Data. IEEE BigData.

Geoffrey C. Fox, Shantenu Jha, Judy Qiu, Saliya Ekanayake, and Andre Luckow. Towards a Comprehensive Set of Big Data Benchmarks. Technical report, submitted for publication, February 15, 2015.

Sadasivam, S. K., & Selvi, S. T. (2015, July). Performance evaluation of Data Mining algorithms on three generations of Intel® microarchitecture. In High Performance Computing & Simulation (HPCS), 2015 International Conference on (pp. 334-341). IEEE.

Spicuglia S, Chen L Y, Birke R, et al. Optimizing capacity allocation for big data applications in cloud datacenters[C]//Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on. IEEE, 2015: 511-517.

Sangroya, A., Bouchenak, S., & Serrano, D. (2016). Experience with benchmarking dependability and performance of MapReduce systems. Performance Evaluation.

Han, R., Wang, J., Ge, F., Vazquez-Poletti, J. L., & Zhan, J. (2015, May). SARP: producing approximate results with small correctness losses for cloud interactive services. In Proceedings of the 12th ACM International Conference on Computing Frontiers (p. 22). ACM.

Zeng, X., Ranjan, R., Strazdins, P., Garg, S. K., & Wang, L. (2015, May). Cross-Layer SLA Management for Cloud-hosted Big Data Analytics Applications. In Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on (pp. 765-768). IEEE.

Rezgui, A., White, M., Rezgui, S., & Malik, Z. (2014, December). Evaluation of Linux I/O Schedulers for Big Data Workloads. In Big Data and Cloud Computing (BdCloud), 2014 IEEE Fourth International Conference on (pp. 227-234). IEEE.

Ning, F., Weng, C., & Luo, Y. (2013, October). Virtualization I/O optimization based on shared memory. In Big Data, 2013 IEEE International Conference on (pp. 70-77). IEEE.

4. Storage

Raúl Gracia-Tinedo, Universitat Rovira i Virgili; Danny Harnik, Dalit Naor, and Dmitry Sotnikov, IBM Research Haifa; Sivan Toledo and Aviad Zuck, Tel-Aviv University. SDGen: Mimicking Datasets for Content Generation in Storage Benchmarks. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST ’15), February 16–19, 2015, Santa Clara, CA, USA.

Yunpeng Chai; Zhihui Du; Xiao Qin; Bader, D.A. “WEC: Improving Durability of SSD Cache Drives by Caching Write-Efficient Data”, IEEE Transactions on Computers, vol. 64, no. 11, pp. 3304-3316, Nov. 2015

Choi I S, Kee Y S. Energy Efficient Scale-In Clusters with In-Storage Processing for Big-Data Analytics[C]//Proceedings of the 2015 International Symposium on Memory Systems. ACM, 2015: 265-273.

Zhou, W., Feng, D., Tan, Z., & Zheng, Y. (2015). PAHDFS: Preference-Aware HDFS for Hybrid Storage. In Algorithms and Architectures for Parallel Processing (pp. 3-17). Springer International Publishing.

Chen, X.; Chen, W.; Lu, Z.; Long, P.; Yang, S.; Wang, Z., “A Duplication-Aware SSD-Based Cache Architecture for Primary Storage in Virtualization Environment,” in IEEE Systems Journal, vol. PP, no. 99, pp. 1-12, doi: 10.1109/JSYST.2015.2494377

Xu, T. C., & Leppänen, V. (2015, October). Analysing emerging memory technologies for big data and signal processing applications. In Digital Information Processing and Communications (ICDIPC), 2015 Fifth International Conference on (pp. 104-109). IEEE.

Liu, J., Chai, Y., Qin, X., & Xiao, Y. PLC-Cache: Endurable SSD Cache for Deduplication-based Primary Storage. In Proceedings of MSST 2014 (30th International Conference on Massive Storage Systems and Technology).

Zujie Ren, Weisong Shi and Jian Wan. Towards Realistic Benchmarking in Cloud File Systems: Early Experiences. In Proceedings of the 2014 IEEE International Symposium on Workload Characterization (IISWC ’14), Raleigh, NC, Oct. 26-28, 2014.

5. Security & Privacy

Interlandi, M., Shah, K., Tetali, S. D., Gulzar, M. A., Yoo, S., Kim, M., … & Condie, T. (2015). Titian: Data Provenance Support in Spark. Proceedings of the VLDB Endowment, 9(3).

Sherif Akoush, Lucian Carata, Ripduman Sohan and Andy Hopper, Computer Laboratory, University of Cambridge. MrLazy: Lazy Runtime Label Propagation for MapReduce. In 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud ’14).

Whitham B. Towards a set of metrics to guide the generation of fake computer file systems[J]. 2014.

6. Networking

Shankar, D., Lu, X., Wasi-ur-Rahman, M., Islam, N., & Panda, D. K. D. (2014). A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks. In Big Data Benchmarks, Performance Optimization, and Emerging Hardware (pp. 19-33). Springer International Publishing.

Lu, X., Wasi-ur-Rahman, M., Islam, N. S., & Panda, D. K. D. (2014). A Micro-benchmark suite for evaluating Hadoop RPC on high-performance networks. In Advancing Big Data Benchmarks (pp. 32-42). Springer International Publishing.

7. Algorithms

Xu, T. C., & Leppänen, V. (2015). DBFS: Dual Best-First Search Mapping Algorithm for Shared-Cache Multicore Processors. In Algorithms and Architectures for Parallel Processing (pp. 185-198). Springer International Publishing.

Chatzigeorgakidis, G., Karagiorgou, S., Athanasiou, S., & Skiadopoulos, S. (2015, October). A MapReduce based k-NN joins probabilistic classifier. In Big Data (Big Data), 2015 IEEE International Conference on (pp. 952-957). IEEE.

Corrales, D. C., Ledezma, A., & Corrales, J. C. A Conceptual Framework for Data Quality in Knowledge Discovery Tasks (FDQ-KDT): A Proposal.