Venue of Bench'19

2019 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench'19)

Denver, Colorado, USA
Nov 14-16, 2019

Program


The Best Paper Award will be accompanied by a prize of $1,000.


Bench'19 Location: Copper 1
Registration: 07:30-17:00


Thursday, November 14th, 2019

Time Event
07:30-17:00 Registration @ Copper 1 meeting room
08:35-08:45 Opening Remarks (Dr. Dan Stanzione & Dr. Xiaoyi Lu)
08:45-09:15 BenchCouncil: Present and Future
Prof. Jianfeng Zhan
09:15-09:55 Keynote 1: Benchmarking Supercomputers in the Post-Moore era
Dr. Dan Stanzione, Associate Vice President for Research at The University of Texas at Austin
Abstract: In this talk, we will cover the increasing gaps between headline performance and application performance on Frontera and the last several generations of TACC supercomputers. We will also discuss the challenges of developing a new benchmark suite for the upcoming Leadership-Class Computing Facility, and solicit community input on capability benchmarks.
Bio: Dr. Dan Stanzione, Associate Vice President for Research at The University of Texas at Austin since 2018 and Executive Director of the Texas Advanced Computing Center (TACC) since 2014, is a nationally recognized leader in high performance computing. He is the principal investigator (PI) for a National Science Foundation (NSF) grant to deploy Frontera, which is the fastest supercomputer at any U.S. university. Stanzione is also the PI of TACC's Stampede2 and Wrangler systems, supercomputers for high performance computing and for data-focused applications, respectively. For six years he was co-PI of CyVerse, a large-scale NSF life sciences cyberinfrastructure. Stanzione was also a co-PI for TACC's Ranger and Lonestar supercomputers, large-scale NSF systems previously deployed at UT Austin. Stanzione received his bachelor's degree in electrical engineering and his master's degree and doctorate in computer engineering from Clemson University.
09:55-10:10 Coffee Break
10:10-11:00 Best Paper Session I (2 papers)
Early Experience in Benchmarking Edge AI Processors with Object Detection Workloads by Yujie Hui (The Ohio State University), Jeffrey Lien (NovuMind Inc.) and Xiaoyi Lu (The Ohio State University)

GraphBench: A Benchmark Suite for Graph Computing Systems by Lei Wang and Minghe Yu (Institute of Computing Technology, Chinese Academy of Sciences)
11:00-11:50 Session I: Scientific Computing (2 regular papers)
Apache Spark Streaming, Kafka and HarmonicIO: A Performance Benchmark and Architecture Comparison for Enterprise and Scientific Computing by Ben Blamey, Andreas Hellander and Salman Zubair Toor (Uppsala University)

Benchmark researches from the perspective of Metrology by Kun Yang, Tong Wu and Qingfei Shen (National Institute of Metrology)
12:00-13:30 Lunch @ Copper 2
13:30-14:00 Invited Talk I: FloraBench: an end-to-end application benchmark suite for datacenter
Dr. Zheng Cao, Alibaba
14:00-14:50 Session II: Performance Analysis (2 regular papers)
NTP: A Neural Net Topology Profiler by Pravin Chandran (Intel), Raghavendra Bhat (Intel), Juby Jose (Intel), Viswanath Dibbur (Ex Intel) and Prakash Sirra Ajith (Ex Sasken)

MCC: a Predictable and Scalable Massive Client Load Generator by Wenqing Wu, Xiao Feng, Wenli Zhang and Mingyu Chen (ICT, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
14:50-15:05 Coffee Break
15:05-16:20 Best Paper Session II (3 papers)
Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories by Khaled Ibrahim, Samuel Williams and Leonid Oliker (Lawrence Berkeley National Laboratory)

Anomaly Analysis and Diagnosis for Co-located Datacenter Workloads in the Alibaba Cluster by Rui Ren (Institute of Computing Technology, Chinese Academy of Sciences)

SSH-Backed API Performance Case Study by Anagha Jamthe, Mike Packard, Joe Stubbs (Texas Advanced Computing Center, Austin TX), Gilbert Curbelo (California State University of Monterey Bay, Marina CA), Roseline Shapi (Mississippi Valley State University, Itta Bena, MS) and Elias Chalhoub (The University of Texas at Austin)
16:20-17:10 Session III: Benchmark (2 regular papers)
Building the DataBench Workflow and Architecture by Todor Ivanov, Timo Eichhorn (Goethe University Frankfurt) and Arne Berre (SINTEF AS)

Benchmarking Solvers for The One Dimensional Cubic Nonlinear Klein Gordon Equation on a Single Core by Benson Muite (University of Tartu) and Samar Aseeri (KAUST)

Friday, November 15th, 2019

Time Event
07:30-17:00 Registration @ Copper 1 meeting room
08:35-09:15 Keynote 2: Benchmarks and Middleware for Designing Convergent HPC, Big Data and Deep Learning Software Stacks for Exascale Systems
Prof. Dhabaleswar K. (DK) Panda, OSU
Abstract: This talk will focus on challenges in designing benchmarks and middleware for convergent HPC, Deep Learning, and Big Data Analytics software stacks for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the OSU Micro-Benchmarks (OMB) Suite and associated middleware for designing runtime environments for MPI+X programming models, taking into account support for multi-core systems (x86, OpenPOWER, and ARM), high-performance networks, and GPGPUs (including GPUDirect RDMA). Features and sample performance numbers from the MVAPICH2 libraries (http://mvapich.cse.ohio-state.edu) will be presented. For Big Data Analytics, an overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, and Memcached, together with the OSU HiBD benchmarks (http://hibd.cse.ohio-state.edu), will be presented. For the Deep Learning domain, we will focus on a set of benchmarks and profiling tools for delivering scalable DNN training with Horovod and TensorFlow using the MVAPICH2-GDR MPI library (http://hidl.cse.ohio-state.edu).
Bio: Dhabaleswar K. (DK) Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 450 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3,025 organizations worldwide (in 89 countries). More than 600,000 downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 3rd, 5th, 8th, 15th, 16th, 19th, and 31st ranked ones) in the TOP500 list. The RDMA packages for Apache Spark, Apache Hadoop and Memcached, together with OSU HiBD benchmarks from his group (http://hibd.cse.ohio-state.edu), are also publicly available. These libraries are currently being used by more than 315 organizations in 35 countries. More than 31,300 downloads of these libraries have taken place. High-performance and scalable versions of the Caffe and TensorFlow frameworks are available from https://hidl.cse.ohio-state.edu. Prof. Panda is an IEEE Fellow. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.
09:15-09:45 Invited Talk II
Dr. Dong Li
09:45-10:00 Coffee Break
10:00-10:50 Session IV: Big Data (2 regular papers)
Benchmarking Database Ingestion Ability with Real-Time Big Astronomical Data by Qing Tang, Chen Yang, Xiaofeng Meng (Renmin University) and Zhihui Du (Tsinghua University)

A Practical Data Repository for Causal Learning with Big Data by Lu Cheng (Arizona State University), Ruocheng Guo (Arizona State University), Raha Moraffah (Arizona State University), K.S. Candan (Arizona State University), Adrienne Raglin (US Army Research Laboratory) and Huan Liu (Arizona State University)
10:50-12:00 Challenge Session I (4 talks)
Track 1: International AI System Challenge based on RISC-V
RVTensor: A light-weight neural network inference framework based on the RISC-V architecture by Yu Jiageng (Institute of Software, CAS)

Track 2: International AI System Challenge based on Cambricon
XDN: Towards Efficient Inference of Residual Neural Networks on Cambricon Chips by Guangli Li (Institute of Computing Technology, CAS)

Track 3: International AI System Challenge based on X86 Platform
Exploiting Parallelism, Sparsity and Locality to Accelerate ALS-WR on x86 Platforms by Pengyu Wang (Shanghai Jiao Tong University)

Track 4: International 3D Face Recognition Algorithm Challenge
Improving RGB-D face recognition via transfer learning from a pretrained 2D network by Xingwang Xiong (Institute of Computing Technology, CAS)
12:00-13:30 Lunch
13:30-14:00 Invited Talk III: Harmonizing High-Level Abstraction and High Performance for Graph Mining
Prof. Bo Wu, Colorado School of Mines
Abstract: Graph mining algorithms, which aim to identify structural patterns in graphs, are typically more complex than graph computation algorithms such as breadth-first search. Researchers have implemented several systems with high-level and flexible interfaces customized for tackling graph mining problems. However, we found that for triangle counting, one of the simplest graph mining problems, such systems can be several times slower than a single-threaded implementation of a straightforward algorithm. In this talk, I will reveal the root causes of the severe inefficiency of state-of-the-art graph mining systems and the challenges in addressing these performance problems. I will describe AutoMine, a system we developed to automatically generate both specialized algorithms and high-performance low-level code for arbitrary patterns.
Bio: Bo Wu is an Associate Professor in the Department of Computer Science at Colorado School of Mines. His research focuses on leveraging compiler and runtime techniques to build efficient software systems for large-scale graph analytics and machine learning applications on heterogeneous platforms. He received the best paper award at SC’15, an NSF CRII Award, an NSF Early Career Award, and an NSF SPX Award.
14:00-14:30 Invited Talk IV: Deep Learning on HPC: performance factors and lessons learned
Dr. Weijia Xu, University of Texas at Austin
Abstract: In this talk, we report on several ongoing efforts to deploy and run deep learning applications on high performance computing clusters at the Texas Advanced Computing Center. Drawing on both lessons learned in practice and designed experiments, we discuss several factors affecting deep learning performance, in terms of both accuracy and execution time, at various stages of the analysis pipeline, from low-level data storage to the high-level deep learning framework. The talk will conclude with a discussion of the future outlook for developing, deploying and benchmarking deep learning applications at scale.
Bio: Dr. Weijia Xu is a research scientist and leads the Scalable Computational Intelligence group at the Texas Advanced Computing Center at the University of Texas at Austin. He received his Ph.D. from the Computer Science Department at UT Austin and is an experienced data scientist. Dr. Xu's main research interest is enabling data-driven discoveries by developing new computational methods and applications that facilitate the data-to-knowledge transfer process. Dr. Xu leads the group that supports large-scale data-driven analysis and machine learning applications using computing resources at TACC. His projects have been funded by various federal and state agencies, including NIH, NSF, the City of Austin, and USDA. He has served on program committees for several workshops and conferences in the Big Data, Cloud Computing and HPC areas.
14:30-14:45 Coffee Break
14:45-15:10 Session V: Data Center (1 regular paper)
LCIO: Large Scale Filesystem Aging by Matthew Bachstein (University of Tennessee, Knoxville), Feiyi Wang and Sarp Oral (Oak Ridge National Laboratory)
15:10-16:25 Session VI: AI and Edge (3 regular papers)
Deep Reinforcement Learning for Auto-optimization of I/O Accelerator Parameters by Trong-Ton Pham and Dennis Djan (Bull - ATOS technologies)

Causal Learning in Question Quality Improvement by Yichuan Li, Ruocheng Guo, Weiying Wang and Huan Liu (Arizona State University)

SparkAIBench: A Benchmark to Generate AI Workloads on Spark by Zifeng Liu, Xiaojiang Zuo, Zeqing Li and Rui Han (Beijing Institute of Technology)
16:25-17:55 Tutorial: AIBench, Edge AIBench & HPC AI500
16:25-16:55 AIBench: An Industry Standard AI Benchmark Suite
By Dr. Wanling Gao, Fei Tang, ICT, CAS
16:55-17:25 Edge AIBench: Towards Comprehensive End-to-End Edge
By Tianshu Hao, ICT, CAS
17:25-17:55 HPC AI500: A Benchmark Suite for HPC AI Systems
By Zihan Jiang, ICT, CAS
18:15 Banquet (on-site)
Award Ceremony (Prof. Jianfeng Zhan, Dr. Dan Stanzione, Prof. Geoffrey Fox, & Prof. D.K. Panda)
BenchCouncil Achievement Award Lecture
Best Paper Award
BenchCouncil Contribution Award Lecture
Challenge Awards

Saturday, November 16th, 2019

Time Event
07:30-17:00 Registration @ Copper 1 meeting room
08:40-09:20 BenchCouncil Achievement Award Presentation
By Award Winner
09:20-10:00 Keynote 3: InfiniBand In-Network Computing Technology for Scalable HPC/AI
Dr. Gilad Shainer, Senior VP, Mellanox
Abstract: The ever-increasing demand for higher computational performance is driving the creation of new datacenter accelerators and processing units. Previously, CPUs and GPUs were the main sources of compute power. The exponential increase in data volume and in problem complexity drove the creation of a new processing unit: the I/O processing unit, or IPU. IPUs are interconnect elements that include In-Network Computing engines, engines that can participate in the application runtime and analyze application data as it is being transferred within the data center or at the edge. The combination of CPUs, GPUs, and IPUs creates the next generation of data center and edge computing architectures. The first generations of IPUs are already in use in leading HPC and deep learning data centers, have been integrated into multiple MPI frameworks, NVIDIA NCCL, Charm++ and others, and have demonstrated performance acceleration of nearly 10X.
Bio: Gilad Shainer serves as Mellanox's senior vice president of marketing, focusing on high-performance computing. Mr. Shainer joined Mellanox in 2001 as a design engineer and has served in senior marketing management roles since 2005. Mr. Shainer serves as the chairman of the HPC-AI Advisory Council organization and as the president of the UCF and CCIX consortiums; he is also a board member of the OpenCAPI and OpenFabrics organizations, a member of IBTA, and a contributor to the PCISIG PCI-X and PCIe specifications. Mr. Shainer holds multiple patents in the field of high-speed networking. He is a recipient of the 2015 R&D100 award for his contribution to the CORE-Direct In-Network Computing technology and the 2019 R&D100 award for his contribution to the UCX technology. Gilad Shainer holds MSc and BSc degrees in Electrical Engineering from the Technion Institute of Technology in Israel.
10:00-10:15 Coffee Break
10:15-10:45 Invited Talk V: Towards a Methodology for Benchmarking Edge Processing Frameworks
Dr. Gabriel Antoniu, INRIA
Abstract: With the spectacular growth of the Internet of Things, edge processing has emerged as a relevant means to offload data processing and analytics from centralized Clouds to the devices that serve as data sources (often provided with some processing capabilities). While a plethora of frameworks for edge processing have recently been proposed, the distributed systems community today has no clear means to discriminate between them. Some preliminary surveys exist, focusing on a feature-based comparison. We claim that a further step is needed to enable a performance-based comparison, and for this purpose, the definition of a benchmark is a necessity. In this talk, we take this step by discussing the definition of a methodology for benchmarking edge processing frameworks.
Bio: Gabriel Antoniu is a Senior Research Scientist at Inria, Rennes. He leads the KerData research team, focusing on storage and I/O management for Big Data processing on scalable infrastructures (clouds, HPC systems). His current interests focus on HPC-Big Data convergence for data storage and processing. He currently serves as Vice Executive Director of JLESC (Joint Inria-Illinois-ANL-BSC-JSC-RIKEN/AICS Laboratory for Extreme-Scale Computing) on behalf of Inria. He received his Ph.D. degree in Computer Science in 2001 from ENS Lyon. He leads several international projects in partnership with Microsoft Research, IBM, Argonne National Lab, the University of Illinois at Urbana-Champaign, and Huawei. He served as Program Chair for the IEEE Cluster conference in 2014 and 2017 and regularly serves as a PC member of major conferences in the areas of HPC, cloud computing and Big Data (SC, HPDC, CCGRID, Cluster, Big Data, etc.). He has advised 19 PhD theses and has co-authored over 140 international publications in the aforementioned areas.
10:45-11:25 Keynote 4: Benchmarking Perspectives on emerging HPC workloads
Prof. Geoffrey Fox, Indiana University, APS and ACM Fellow
Bio: Geoffrey Charles Fox (https://www.engineering.indiana.edu/, http://www.dsc.soic.indiana.edu/, gcf@indiana.edu) received a Ph.D. in Theoretical Physics from Cambridge University, where he was Senior Wrangler. He is now a distinguished professor of Engineering, Computing, and Physics at Indiana University, where he is director of the Digital Science Center. He previously held positions at Caltech, Syracuse University, and Florida State University after being a postdoc at the Institute for Advanced Study at Princeton, Lawrence Berkeley Laboratory, and Peterhouse College Cambridge. He has supervised the Ph.D. of 73 students and published around 1300 papers (over 500 with at least ten citations) in physics and computing, with an h-index of 78 and over 35000 citations. He is a Fellow of APS (Physics) and ACM (Computing) and works on the interdisciplinary interface between computing and applications. Current work is in Biology, Pathology, Sensor Clouds and Ice-sheet Science, Image processing, Deep Learning and Particle Physics. His architecture work is built around High-performance computing enhanced Software Defined Big Data Systems on Clouds and Clusters. The analytics focuses on scalable parallel machine learning. He is an expert on streaming data and robot-cloud interactions. He is involved in several projects to enhance the capabilities of Minority Serving Institutions. He has experience in online education and its use in MOOCs for areas like Data and Computational Science.
11:25-12:05 Keynote 5: Lightweight Requirements Engineering for Exascale Co-design
Prof. Felix Wolf
12:05-12:10 Closing Remarks

Program Details

Length of Presentations (including Q&A):

Keynotes: 40 minutes

Invited Talks: 30 minutes

Regular Papers: 25 minutes

Challenge Talks: TBD

Best Paper Session I

Early Experience in Benchmarking Edge AI Processors with Object Detection Workloads by Yujie Hui (The Ohio State University), Jeffrey Lien (NovuMind Inc.) and Xiaoyi Lu (The Ohio State University)

GraphBench: A Benchmark Suite for Graph Computing Systems by Lei Wang and Minghe Yu (Institute of Computing Technology, Chinese Academy of Sciences)

Best Paper Session II

Performance Analysis of GPU Programming Models using the Roofline Scaling Trajectories by Khaled Ibrahim, Samuel Williams and Leonid Oliker (Lawrence Berkeley National Laboratory)

Anomaly Analysis and Diagnosis for Co-located Datacenter Workloads in the Alibaba Cluster by Rui Ren (Institute of Computing Technology, Chinese Academy of Sciences)

SSH-Backed API Performance Case Study by Anagha Jamthe, Mike Packard, Joe Stubbs (Texas Advanced Computing Center, Austin TX), Gilbert Curbelo (California State University of Monterey Bay, Marina CA), Roseline Shapi (Mississippi Valley State University, Itta Bena, MS) and Elias Chalhoub (The University of Texas at Austin)

Session I: Scientific Computing

Apache Spark Streaming, Kafka and HarmonicIO: A Performance Benchmark and Architecture Comparison for Enterprise and Scientific Computing by Ben Blamey, Andreas Hellander and Salman Zubair Toor (Uppsala University)

Benchmark researches from the perspective of Metrology by Kun Yang, Tong Wu and Qingfei Shen (National Institute of Metrology)

Session II: Performance Analysis

NTP: A Neural Net Topology Profiler by Pravin Chandran (Intel), Raghavendra Bhat (Intel), Juby Jose (Intel), Viswanath Dibbur (Ex Intel) and Prakash Sirra Ajith (Ex Sasken)

MCC: a Predictable and Scalable Massive Client Load Generator by Wenqing Wu, Xiao Feng, Wenli Zhang and Mingyu Chen (ICT, Chinese Academy of Sciences; University of Chinese Academy of Sciences)

Session III: Benchmark

Building the DataBench Workflow and Architecture by Todor Ivanov, Timo Eichhorn (Goethe University Frankfurt) and Arne Berre (SINTEF AS)

Benchmarking Solvers for The One Dimensional Cubic Nonlinear Klein Gordon Equation on a Single Core by Benson Muite (University of Tartu) and Samar Aseeri (KAUST)

Session IV: Big Data

Benchmarking Database Ingestion Ability with Real-Time Big Astronomical Data by Qing Tang, Chen Yang, Xiaofeng Meng (Renmin University) and Zhihui Du (Tsinghua University)

A Practical Data Repository for Causal Learning with Big Data by Lu Cheng (Arizona State University), Ruocheng Guo (Arizona State University), Raha Moraffah (Arizona State University), K.S. Candan (Arizona State University), Adrienne Raglin (US Army Research Laboratory) and Huan Liu (Arizona State University)

Session V: Data Center

LCIO: Large Scale Filesystem Aging by Matthew Bachstein (University of Tennessee, Knoxville), Feiyi Wang and Sarp Oral (Oak Ridge National Laboratory)

Session VI: AI and Edge

Deep Reinforcement Learning for Auto-optimization of I/O Accelerator Parameters by Trong-Ton Pham and Dennis Djan (Bull - ATOS technologies)

Causal Learning in Question Quality Improvement by Yichuan Li, Ruocheng Guo, Weiying Wang and Huan Liu (Arizona State University)

SparkAIBench: A Benchmark to Generate AI Workloads on Spark by Zifeng Liu, Xiaojiang Zuo, Zeqing Li and Rui Han (Beijing Institute of Technology)