Introduction

Big data has emerged as a strategic asset for nations and organizations, and there is a pressing need to generate value from it. However, the sheer volume of big data demands significant storage capacity, transmission bandwidth, computation, and power. Systems of unprecedented scale are expected to address the problems posed by the variety and daunting volume of big data. Nevertheless, it is very difficult for big data owners to choose the system best suited to their specific requirements, and they also face challenges in optimizing those systems and their solutions. Meanwhile, systems researchers are working on new hardware architectures, operating systems, programming systems, and data management systems to improve performance in dealing with big data.

This workshop, the fourth in its series, aims to bring together researchers and practitioners in related areas to discuss the research issues at the intersection of these areas, and to draw the attention of the architecture, systems, programming, and data management research communities to this new and highly promising field.

Topics

The workshop seeks papers that address timely issues in benchmarking, designing, and optimizing big data systems. Early-stage work, new ideas, and unconventional approaches are encouraged. Specific topics of interest include, but are not limited to:

  • Big data workload characterization and benchmarking
  • Innovative computer and memory architectures for big data
  • Emerging hardware technologies in big data systems
  • Innovative operating systems and programming systems for big data
  • Interactions among architecture, systems and data management
  • Performance analysis and optimization of big data systems
  • Innovative prototypes of big data infrastructures
  • Practice reports on evaluating and optimizing large-scale big data systems

Papers should present original research. As big data spans many disciplines, papers should provide sufficient background material to make them accessible to the broader community.

Venue Information

Salt Lake City, Utah, USA

Important Dates

Submission Deadline: Extended to January 14, 2014 (11:59pm PST)

Author Notification: January 25, 2014

Final Copy Due: February 17, 2014

Workshop: Saturday, March 1, 2014

Workshop home page:

Submission Web page: https://www.easychair.org/conferences/?conf=bpoe4

Organization

Steering committee:

  • Christos Kozyrakis, Stanford
  • Xiaofang Zhou, University of Queensland
  • Dhabaleswar K Panda, Ohio State University
  • Aoying Zhou, East China Normal University
  • Raghunath Nambiar, Cisco
  • Lizy K John, University of Texas at Austin
  • Xiaoyong Du, Renmin University of China
  • Ippokratis Pandis, IBM Almaden Research Center
  • Xueqi Cheng, ICT, Chinese Academy of Sciences
  • Bill Jia, Facebook
  • Lidong Zhou, Microsoft Research Asia
  • H. Peter Hofstee, IBM Austin Research Laboratory
  • Haibo Chen, Shanghai Jiaotong University
  • Alexandros Labrinidis, University of Pittsburgh
  • Cheng-Zhong Xu, Wayne State University
  • Jianfeng Zhan, ICT, Chinese Academy of Sciences

 

Program Chairs:

  • Jianfeng Zhan, ICT, Chinese Academy of Sciences
  • Chuliang Weng, Shannon (IT) Lab, Huawei

 

Web Chair:

  • Lei Wang, ICT, Chinese Academy of Sciences

 

Publicity Chairs:

  • Yuqing Zhu (Data management), ICT, CAS
  • Gang Lu (Operating systems), ICT, CAS
  • Zhen Jia (Architecture), ICT, CAS

 

Program Committee

  • Onur Mutlu, Carnegie Mellon University
  • Xu Liu, Rice University
  • Yunquan Zhang, ICT, Chinese Academy of Sciences
  • Meikel Poess, Oracle Corporation
  • Dejun Jiang, ICT, Chinese Academy of Sciences
  • Yueguo Chen, Renmin University
  • Rene Mueller, IBM
  • Xiaoyi Lu, Ohio State University
  • Yongqiang He, Dropbox
  • Edwin Sha, University of Texas at Dallas
  • Kun Wang, IBM Research China
  • Rong Chen, Shanghai Jiaotong University
  • Jens Teubner, TU Dortmund University
  • Yinliang Yue, ICT, Chinese Academy of Sciences
  • Mauricio Breternitz, AMD Research
  • Seetharami Seelam, IBM
  • Zhenyu Guo, MSRA
  • Farhan Tauheed, EPFL
  • Gansha Wu, Intel
  • Bingsheng He, Nanyang Technological University
  • Zhibin Yu, SIAT, Chinese Academy of Sciences
  • Lei Wang, ICT, Chinese Academy of Sciences
  • Yuanchun Zhou, CNIC, Chinese Academy of Sciences
  • Tilmann Rabl, University of Toronto
  • Weijia Xu, TACC, University of Texas at Austin
  • Mingyu Chen, ICT, Chinese Academy of Sciences
  • Jian Ouyang, Baidu
  • Wentao Qu, Google, US

Paper Submission

Online submission site: https://www.easychair.org/conferences/?conf=bpoe4

 

Papers must be submitted in PDF and be no more than 6 pages in the standard two-column SIGPLAN conference format, including figures and tables but not including references. Submissions will be judged on the merit of their ideas rather than their length. Submissions must be made through the online submission site. Final papers and presentations will be accessible from the workshop website, but to facilitate resubmission to more formal venues, no archival proceedings will be published, and papers will not be sent to the ACM Digital Library. After the workshop, revised papers will be published in Springer LNCS (www.springer.com/lncs, indexed by EI).

Program

March 1, 2014

Morning

Session chair: Professor Jianfeng Zhan

Time | Topic | People
9:00-9:05 | Opening remarks | Professor Jianfeng Zhan
9:05-10:00 | Keynote presentation: Big Data Workloads: An Architect’s Perspective | Professor Lizy K. John, University of Texas at Austin
10:00-10:30 | Morning break |
10:30-11:25 | Keynote presentation: Accelerating Big Data Processing with RDMA-Enhanced Apache Hadoop | Professor Dhabaleswar K. (DK) Panda, The Ohio State University
11:25-11:45 | On Big Data Benchmarking | Rui Han and Xiaoyi Lu
11:45-12:05 | Performance Benefits of DataMPI: A Case Study with BigDataBench | Liang Fan, Feng Chen, Xiaoyi Lu and Zhiwei Xu

 

Afternoon

Session chair: Dr. Chuliang Weng

Time | Topic | People
14:00-14:55 | Keynote presentation: Resource Efficient Cloud Computing | Professor Christos Kozyrakis, Stanford
15:00-15:30 | Afternoon break |
15:30-16:25 | Keynote presentation: Power Technology For a Smarter Future | Dr. Jeff Stuecheli, IBM
16:25-16:45 | Exploring Opportunities for Non-Volatile Memories in Big Data Applications | Wei Wei, Dejun Jiang, Jin Xiong and Mingyu Chen
16:45-17:05 | Tuning Hadoop map slot value using CPU metric | Kamal Kc and Vincent Freeh
17:05-17:25 | I/O Characterization of Big Data Workloads in Data Centers | Fengfeng Pan, Yinliang Yue and Jin Xiong
17:25-17:45 | Characterizing Workload of Web Applications on Virtualized Servers | Xiajun Wang, Song Huang, Song Fu and Krishna Kavi
17:45-18:00 | Benchmarking Trajectory Data for Trip Recommendation | Kuien Liu, Yaguang Li, Shuo Shang and Kai Zheng
18:00-18:05 | Closing remarks | Dr. Chuliang Weng

Keynote Speakers

1. Big Data Workloads: An Architect’s Perspective

Professor Lizy Kurian John

University of Texas at Austin

http://users.ece.utexas.edu/~ljohn/

Abstract:

Much of modern big data is unstructured, as opposed to traditional table-based structured data. While traditional relational databases may not disappear for years to come, it is clear that other computing paradigms that process unstructured (and semi-structured) data will gain momentum in light of the nature of emerging big data. Applications that perform advanced analytics and machine learning on graphs will become more prevalent, and the analysis of web-scale graphs from social networks will become important because of its commercial significance. An understanding of emerging big data workloads is required to drive hardware and software development. Several questions need to be answered in order to design computing systems that support these applications in a performance-, power-, and energy-efficient fashion. It would be interesting to analyze communication patterns in these applications and to study the relevance of mechanisms supporting localized communication (fog computing as opposed to cloud). This talk will describe our ongoing research investigating these and related questions. Efforts in workload characterization and big data performance evaluation will be described.

Biography:

Lizy Kurian John is the B. N. Gafford Professor of Electrical and Computer Engineering at UT Austin. She received her Ph.D. in Computer Engineering from the Pennsylvania State University. Her research interests include workload characterization, performance evaluation, benchmarking, and high-performance processor and memory architectures for emerging workloads. She is a recipient of the NSF CAREER Award, the UT Austin Engineering Foundation Faculty Award, the Halliburton, Brown and Root Engineering Foundation Young Faculty Award, the University of Texas Alumni Association (Texas Exes) Teaching Award, and The Pennsylvania State University Outstanding Engineering Alumnus award, among others. She has coauthored a book on Digital Systems Design using VHDL (Thomson Publishers) and has edited 4 books, including a book on Computer Performance Evaluation and Benchmarking. She holds 8 US patents and is a Fellow of the IEEE.

 

2. Accelerating Big Data Processing with RDMA-Enhanced Apache Hadoop

Professor Dhabaleswar K. (DK) Panda

The Ohio State University

http://www.cse.ohio-state.edu/~panda/

Abstract:

The Hadoop framework has become one of the most popular open-source solutions for Big Data processing. Traditionally, Hadoop communication calls are implemented over sockets and do not deliver the best performance on modern clusters with high-performance interconnects. This talk will examine opportunities and challenges in accelerating Hadoop with Remote DMA (RDMA) support, as available with InfiniBand, RoCE (RDMA over Converged Enhanced Ethernet), and other modern interconnects. The talk will start with an overview of the RDMA for Apache Hadoop project (http://hadoop-rdma.cse.ohio-state.edu). Then, high-performance designs using RDMA to accelerate the Hadoop framework on InfiniBand and RoCE clusters will be demonstrated. Specific designs and case studies to accelerate multiple components of Hadoop (such as HDFS, MapReduce, RPC, and HBase) will be presented, along with in-depth performance results and trends from a range of low-level micro-benchmarks (OSU Hadoop Micro-Benchmarks) and higher-level benchmarks from the BigDataBench, PUMA, and SWIM suites.

Biography:

Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. His research interests include parallel computer architecture, high-performance networking, InfiniBand, exascale computing, Big Data, programming models, GPUs and accelerators, high-performance file systems and storage, virtualization, and cloud computing. He has published over 300 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies, including InfiniBand, High-Speed Ethernet, and RDMA over Converged Enhanced Ethernet (RoCE). The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X software libraries, developed by his research group (http://mvapich.cse.ohio-state.edu), are currently used by more than 2,100 organizations worldwide (in 71 countries). This software has enabled several InfiniBand clusters to enter the TOP500 rankings during the last decade. More than 202,000 downloads of this software have taken place from the project’s website alone. The software is also available in the software stacks of many network and server vendors and Linux distributors. The new RDMA for Apache Hadoop package, consisting of acceleration for HDFS, MapReduce, and RPC, is publicly available from http://hadoop-rdma.cse.ohio-state.edu. Dr. Panda’s research has been supported by funding from the US National Science Foundation, the US Department of Energy, and several industry sponsors, including Intel, Cisco, Cray, Sun, Mellanox, QLogic, NVIDIA, and NetApp. He is an IEEE Fellow and a member of the ACM. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.

 

3. Resource Efficient Cloud Computing

Professor Christos Kozyrakis

Stanford University

http://csl.stanford.edu/~christos/

 

Abstract:

Cloud computing promises flexibility, high performance, and cost efficiency for both users and operators. Nevertheless, users frequently experience high variability in performance, and most cloud facilities operate at very low utilization, greatly reducing their cost effectiveness. The primary sources of low utilization and performance jitter are the poor understanding of the relationship between resource usage and performance, the interference that makes it difficult to share resources between workloads, and the hardware heterogeneity of modern datacenters. Moreover, neither user-level applications nor the system software stack is structured in a manner that helps alleviate these issues. This talk will discuss techniques to improve resource efficiency in cloud facilities. We will focus primarily on cluster management challenges such as resource provisioning and allocation, workload co-scheduling, and cluster-level energy management. We will also briefly discuss how resource efficiency interacts with benchmarking and workload evaluation.

Biography:

Christos Kozyrakis is an Associate Professor of Electrical Engineering & Computer Science at Stanford University. He works on architectures, runtime environments, and programming models for parallel computing systems. At Berkeley, he developed the IRAM architecture, a novel media-processor system that combined vector processing with embedded DRAM technology. At Stanford, he co-led the Transactional Coherence and Consistency (TCC) project, which developed hardware and software mechanisms for programming with transactional memory. He also led the Raksha project, which developed practical hardware support and security policies to deter high-level and low-level security attacks against deployed software. Dr. Kozyrakis is currently working on hardware and software techniques for resource-efficient cloud computing. He is also a member of the Pervasive Parallelism Lab at Stanford, a multi-faculty effort to make parallel computing practical for the masses.

Christos received a BS degree from the University of Crete (Greece) and a PhD degree from the University of California at Berkeley (USA), both in Computer Science. He is the Willard R. and Inez Kerr Bell faculty scholar at Stanford and a senior member of the ACM and the IEEE. Christos has received the NSF CAREER Award, an IBM Faculty Award, the Okawa Foundation Research Grant, and a Noyce Family Faculty Scholarship.

 

4. Power Technology For a Smarter Future

Dr. Jeff Stuecheli

STSM, POWER Systems Hardware Architect, IBM

http://www.linkedin.com/pub/jeff-stuecheli/2/664/a0a

 

Abstract:

TBA.

Biography:

Jeff is a computer architect on the POWER line of systems. His primary area of focus is developing performance enhancements for the memory subsystem, including caches, prefetching, networks, memory, and I/O structures in the system. Day to day, Jeff works closely with research peers, bringing new ideas and suggestions to the development organization. Thanks to his leadership in the design and development of POWER products, he has helped IBM clients around the world innovate in a myriad of commercial and scientific realms.

Previous events

Workshop | Date | Location
BPOE-1 | October 7, 2013 | IEEE BigData Conference, San Jose, CA
BPOE-2 | October 31, 2013 | CCF HPC China, Guilin, China
BPOE-3 | December 5, 2013 | CCF Big Data Technology Conference 2013, Beijing, China

Next event

Workshop | Date | Location
BPOE-5 | September 5, 2014 | VLDB 2014, Hangzhou, Zhejiang Province, China