BPOE-6 | The Sixth Workshop on Big data benchmarks
Introduction
Big data has emerged as a strategic asset of nations and organizations, and there is a pressing need to generate value from it. However, the sheer volume of big data requires significant storage capacity, transmission bandwidth, computation, and power. Systems of unprecedented scale are expected to resolve the problems caused by the variety and daunting volume of big data. Nevertheless, without big data benchmarks, it is very difficult for big data owners to choose the system that best meets their specific requirements. They also face challenges in optimizing systems and solutions for specific or even comprehensive workloads. Meanwhile, researchers are working on innovative data management systems, hardware architectures, operating systems, and programming systems to improve performance in dealing with big data.
This workshop, the sixth in its series, focuses on architecture and system support for big data systems. It aims to bring together researchers and practitioners from the data management, architecture, and systems research communities to discuss research issues at the intersection of these areas.
Topics
The workshop seeks papers that address hot topics in benchmarking, designing, and optimizing big data systems. Specific topics of interest include, but are not limited to:
- Big data workload characterization and benchmarking
- Performance analysis of big data systems
- Workload-optimized big data systems
- Innovative prototypes of big data infrastructures
- Emerging hardware technologies in big data systems
- Operating systems support for big data systems
- Interactions among architecture, systems and data management
- Hardware and software co-design for big data
- Practice reports on evaluating and optimizing large-scale big data systems
Papers should present original research. As big data spans many disciplines, papers should provide sufficient background material to make them accessible to the broader community.
Papers must be submitted in PDF. We accept 12-page papers in Springer LNCS style: http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0. Submissions will be judged on their originality, significance, relevance, and clarity of presentation. The workshop proceedings will be published in Springer LNCS (indexed by EI).
Submission site: https://easychair.org/conferences/?conf=bpoe6
The best paper award will be announced at the end of the workshop!
Important dates
Papers due June 15, 2015 (extended from June 7, 2015)
Notification of acceptance July 1, 2015
Camera-ready copies July 30, 2015
Workshop session September 4, 2015
The best paper award will be announced at the end of the workshop, based on online audience feedback.
Every accepted paper is automatically eligible for this award.
The winner of this award will receive a certificate with the name of the award, to be awarded at the BPOE Workshop that year.
Program
9:00-9:05 |
Speaker: Prof. Jianfeng Zhan, ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences. |
Keynote I | Title: Accelerating and Benchmarking Big Data Processing on Modern HPC Clusters HPC and Big Data |
9:05-10:05 |
Speaker: Prof. Dhabaleswar K. (DK) Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 350 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, iWARP and RoCE) software packages for modern clusters, developed by his research group (http://mvapich.cse.ohio-state.edu), are currently used by more than 2,400 organizations in 75 countries. More than 272,000 downloads of this software have taken place from the project's site. These software packages have enabled a large number of clusters to achieve TOP500 rankings during the last 13 years. Examples in the latest TOP500 list include the 7th, 11th and 15th ranked systems. The new RDMA-enabled Apache Hadoop package, RDMA-enabled Memcached package, and OSU HiBD benchmarks (OHB) are publicly available from the High-Performance Big Data (HiBD) project site (http://hibd.cse.ohio-state.edu). These packages are currently used by more than 100 organizations in 19 countries. More than 6,000 downloads of this software have taken place from the project's site. Prof. Panda's research has been supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including IBM, Intel, Cisco, Cray, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.
Abstract: Modern HPC clusters have many advanced features, such as multi-/many-core architectures, high-performance RDMA-enabled interconnects, SSD-based storage devices and parallel file systems. However, current-generation Big Data middleware (such as Hadoop, Spark, and Memcached) has not fully exploited these features, which has become a major hurdle to processing Big Data efficiently on modern HPC clusters. This talk will first provide an overview of the challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. Then an in-depth overview of advanced designs based on RDMA and heterogeneous storage architecture for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark and Memcached will be presented. Benefits of these designs on various cluster configurations will be shown. The talk will also address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of this middleware. |
Session I: Performance Optimization and Evaluation
Session Chair: Dr. Xiaoyi Lu, The Ohio State University |
|
Presentation I: Evolution From Shark To Spark SQL: Preliminary Analysis and Qualitative Evaluation |
|
10:05-10:35 |
Speaker: Xinhui Tian (ICT, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Xiexuan Zhou (ICT, Chinese Academy of Sciences; University of Chinese Academy of Sciences) |
10:35-11:00 |
Coffee Break |
Session II: Benchmarks and Workload Characterization
Session Chair: Dr. Xiaoyi Lu, The Ohio State University |
|
11:00-11:30 |
Presentation II: Towards a Big Data Benchmarking and Demonstration Suite for the Online Social Networks Era with Realistic Workloads and Live Data
Speaker: Rui Zhang (IBM Research), Irene Manotas (University of Delaware), Min Li (IBM Research), and Dean Hildebrand (IBM Research) |
11:30-12:00 |
Presentation III: On Statistical Characteristics of Real-life Knowledge Graphs
Speaker: Wenliang Cheng (East China Normal University), Bing Xiao (East China Normal University), Chengyu Wang (East China Normal University), Yonghao Su (East China Normal University), Weining Qian (East China Normal University), Minqi Zhou (East China Normal University) and Aoying Zhou (East China Normal University) |
12:00-12:30 |
Presentation IV: Mbench: Benchmarking a Multicore Operating System Using Mixed Workloads
Speaker: Gang Lu (ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences) |
12:30-14:00 |
Lunch Break |
Keynote II | Title: Fast Data Accesses in Memory and Storage |
14:00-15:00 |
Speaker: Prof. Xiaodong Zhang is the Robert M. Critchfield Professor in Engineering and Chair of the Computer Science and Engineering Department at the Ohio State University. His research interests focus on data management in computer and distributed systems. He has made strong efforts to transfer his academic research into advanced technology to update the design and implementation of major general-purpose computing systems. He received his Ph.D. in Computer Science from the University of Colorado at Boulder, where he received the Distinguished Engineering Alumni Award in 2011. He is a Fellow of the ACM and a Fellow of the IEEE.
Abstract: The basic requirement of data processing is to read and write data, in memory and/or in storage, at high speed and low cost. As the volume of data generated by society grows ever more rapidly, conventional data access methods, such as LSM-trees, key-value stores, and others, show their limits and cannot handle big data at a sustained high speed in a scalable fashion. In this talk, I will present several critical issues that conventional data access methods face in the big data environment. I will also present two new research results: (1) providing fast online data accesses by reserving buffer cache locality while retaining efficient writes to storage in logged and sequential structures; and (2) maximizing the throughput of in-memory key-value stores using GPUs. |
15:00-15:30 | Coffee Break |
Session III: Emerging Hardware
Session Chair: Ahsan Javed Awan (KTH Royal Institute of Technology) |
|
15:30-16:00 |
Presentation V: A Plugin-based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS
Speaker: Adithya Bhat (The Ohio State University), Nusrat Islam (The Ohio State University), Xiaoyi Lu (The Ohio State University), Md. Wasi-Ur-Rahman (The Ohio State University), Dipti Shankar (The Ohio State University) and Dhabaleswar K Panda (The Ohio State University) |
16:00-16:30 |
Presentation VI: Stream-based Lossless Data Compression Hardware using Adaptive Frequency Table Management
Speaker: Shinichi Yamagiwa (Faculty of Engineering, Information and Systems/Department of Computer Science, University of Tsukuba), Koichi Marumo (Faculty of Engineering, Information and Systems/Department of Computer Science, University of Tsukuba) and Hiroshi Sakamoto (Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology) |
Session IV: Performance Analysis and Optimization
Session Chair: Gang Lu (ICT, Chinese Academy of Sciences) |
|
16:30-17:00 |
Presentation VII: How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
Speaker: Ahsan Javed Awan (KTH Royal Institute of Technology, Software and Computer Systems Department), Mats Brorsson (KTH Royal Institute of Technology, Software and Computer Systems Department), Vladimir Vlassov (KTH Royal Institute of Technology, Software and Computer Systems Department) and Eduard Ayguade (Technical University of Catalunya (UPC), Computer Architecture Department) |
17:00-17:30 |
Presentation VIII: An Optimal Reduce Placement Algorithm for Data Skew based on Sampling
Speaker: Zhuo Tang (College of Computer Science and Electronic Engineering, Hunan University), Wen Ma (College of Computer Science and Electronic Engineering, Hunan University), Kenli Li (College of Computer Science and Electronic Engineering, Hunan University), and Keqin Li (The Department of Computer Science, State University of New York) |
17:30-17:40 |
Announcement of Best Paper Award and Closing Remarks
Speaker: Prof. Jianfeng Zhan, Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences. |
Organization
Steering committee:
- Christos Kozyrakis, Stanford
- Xiaofang Zhou, University of Queensland
- Dhabaleswar K Panda, Ohio State University
- Aoying Zhou, East China Normal University
- Raghunath Nambiar, Cisco
- Lizy K John, University of Texas at Austin
- Xiaoyong Du, Renmin University of China
- Ippokratis Pandis, IBM Almaden Research Center
- Xueqi Cheng, ICT, Chinese Academy of Sciences
- Bill Jia, Facebook
- Lidong Zhou, Microsoft Research Asia
- H. Peter Hofstee, IBM Austin Research Laboratory
- Alexandros Labrinidis, University of Pittsburgh
- Cheng-Zhong Xu, Wayne State University
- Jianfeng Zhan, ICT, Chinese Academy of Sciences
- Guang R. Gao, University of Delaware
- Yunquan Zhang, ICT, Chinese Academy of Sciences
PC Co-Chairs:
- Prof. Jianfeng Zhan, Institute of Computing Technology (ICT), Chinese Academy of Sciences and University of Chinese Academy of Sciences
- Prof. Roberto V. Zicari, Frankfurt Big Data Lab, Goethe University, Frankfurt, Germany
- Dr. Rui Han, ICT, Chinese Academy of Sciences
Web Chair:
- Lei Wang, ICT, Chinese Academy of Sciences
Publicity Chair:
- Zhen Jia, ICT, Chinese Academy of Sciences
- Yingjie Shi, Shandong University of Science and Technology
Program committee (confirmed):
- Bingsheng He, Nanyang Technological University
- Xu Liu, College of William and Mary
- Rong Chen, Shanghai Jiao Tong University
- Weijia Xu, Texas Advanced Computing Center, University of Texas at Austin
- Lijie Wen, School of Software, Tsinghua University
- Xiaoyi Lu, The Ohio State University
- Yuqing Zhu, Institute of Computing Technology, Chinese Academy of Sciences
- Yueguo Chen, Renmin University
- Edwin Sha, Chongqing University
- Mingyu Chen, Institute of Computing Technology, Chinese Academy of Sciences
- Zhenyu Guo, Microsoft
- Tilmann Rabl, University of Toronto
- Farhan Tauheed, EPFL
- Chaitan Baru, San Diego Supercomputer Center, UC San Diego
- Seetharami Seelam, IBM
- Rene Mueller, IBM Research
- Cheqing Jin, East China Normal University
October 7, 2013 | IEEE BigData Conference, San Jose, CA |
October 31, 2013 | CCF HPC China, Guilin, China |
December 5, 2013 | CCF Big Data Technology Conference 2013, Beijing, China |
March 1, 2014 | ASPLOS 2014, Salt Lake City, Utah, USA |
September 5, 2014 | VLDB 2014, Hangzhou, Zhejiang Province, China |