News: Call for BenchCouncil (HPC AI) System Award nominations. The deadline has been extended to October 29. The reference implementation of the EWA workload is now available on BenchHub. Papers on HPC AI500 (Bench18).
In recent years, as deep learning (DL) has been increasingly applied to scientific computing, physical simulation is no longer the only class of problems the HPC community must solve. The unique characteristics of emerging scientific DL workloads make benchmarking considerably harder, so the community needs a new yardstick for evaluating future HPC systems.
Consequently, we propose HPC AI500, a benchmark suite for evaluating HPC systems that run scientific DL workloads. Each workload in HPC AI500 is based on a real scientific DL application, and together the workloads cover the most representative scientific fields: climate analysis, cosmology, high energy physics, gravitational wave physics, and computational biology. Currently, we have chosen 18 scientific DL benchmarks (for details, see Specification), selected with respect to application scenarios, datasets, and software stacks. Furthermore, we propose a set of metrics for comprehensively evaluating HPC systems that consider accuracy and performance as well as power and cost. In addition, we provide a scalable reference implementation of HPC AI500.
Our benchmarking methodology is shown in Figure 1. Because HPC AI is an emerging and evolving domain, we take an incremental and iterative approach. First, we investigate the scientific fields that use DL widely (the five fields mentioned above). Then, we study the typical DL workloads and datasets in these application fields.
To cover the diversity of workloads, we extract four important component benchmarks that represent modern HPC AI: Image Recognition, Object Detection, Image Generation, and Sequence Prediction. For each component, we choose a state-of-the-art software stack and model. We also select hotspot DL operators as micro benchmarks for evaluating the upper-bound performance of the system.
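As a rough illustration of the micro benchmark idea, the sketch below times a single operator and reports the achieved floating-point rate. It is not taken from the HPC AI500 reference implementation: the naive matrix multiply standing in for a hotspot DL operator, the function names, and the problem sizes are all assumptions made for this example.

```python
import time


def matmul(a, b):
    """Naive dense matrix multiply on lists of lists.

    Hypothetical stand-in for a hotspot DL operator; a real micro
    benchmark would call the operator from the DL framework under test.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        row_o = out[i]
        for p in range(k):
            a_ip = a[i][p]
            row_b = b[p]
            for j in range(m):
                row_o[j] += a_ip * row_b[j]
    return out


def matmul_flops(n, k, m):
    """Floating-point operations for an (n x k) by (k x m) multiply:
    one multiply and one add per inner-loop step."""
    return 2 * n * k * m


def measure_flops_per_second(n=32, k=32, m=32, repeats=3):
    """Run the operator several times and report FLOP/s for the best run."""
    a = [[1.0] * k for _ in range(n)]
    b = [[1.0] * m for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer.
    return matmul_flops(n, k, m) / max(best, 1e-12)


if __name__ == "__main__":
    print(f"achieved rate: {measure_flops_per_second():.3e} FLOP/s")
```

Taking the best of several runs reduces the influence of warm-up and scheduling noise, which is why it is a common choice when the goal is an upper-bound estimate rather than a typical-case one.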
We choose five representative scientific datasets from the aforementioned scientific fields and consider their diversity from the perspective of data format. Accordingly, we classify the datasets into three formats: 2D sparse matrix, 2D dense matrix, and 3D matrix. Within each format, we also consider the characteristics that are unique to scientific data (e.g., more channels than RGB, high resolution).
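The three-way format classification can be sketched as follows. This is only an illustration under stated assumptions: samples are represented as nested Python lists, a 2D sample counts as sparse when at least half of its entries are zero, and the function names and the threshold are hypothetical rather than part of the HPC AI500 specification. Channel count and resolution are not modeled here.

```python
def shape_of(data):
    """Return the shape of a regular nested list, e.g. [[0, 1], [2, 3]] -> (2, 2)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return tuple(shape)


def flatten(data):
    """Yield every scalar in a nested list, in order."""
    if isinstance(data, list):
        for item in data:
            yield from flatten(item)
    else:
        yield data


def classify_format(data, sparse_threshold=0.5):
    """Assign a sample to one of the three data formats.

    sparse_threshold is an assumed cutoff on the fraction of zero
    entries; the source does not define one.
    """
    shape = shape_of(data)
    if len(shape) == 3:
        return "3D matrix"
    if len(shape) != 2:
        raise ValueError(f"expected a 2D or 3D sample, got shape {shape}")
    values = list(flatten(data))
    if not values:
        raise ValueError("sample has no entries")
    zero_fraction = sum(1 for v in values if v == 0) / len(values)
    if zero_fraction >= sparse_threshold:
        return "2D sparse matrix"
    return "2D dense matrix"


if __name__ == "__main__":
    print(classify_format([[0, 0], [0, 1]]))
    print(classify_format([[1, 2], [3, 0]]))
    print(classify_format([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))
```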
The full description of the specification and the associated metrics is given in the Specification section.
Prof. Jianfeng Zhan, ICT, Chinese Academy of Sciences, and BenchCouncil
Zihan Jiang, ICT, Chinese Academy of Sciences
Dr Wanling Gao, ICT, Chinese Academy of Sciences
Dr Lei Wang, ICT, Chinese Academy of Sciences
Xingwang Xiong, ICT, Chinese Academy of Sciences
Yuchen Zhang, State University of New York at Buffalo
Xu Wen, ICT, Chinese Academy of Sciences
Chunjie Luo, ICT, Chinese Academy of Sciences
Hainan Ye, BenchCouncil and Beijing Academy of Frontier Sciences and Technology
Xiaoyi Lu, The Ohio State University
Yunquan Zhang, National Supercomputing Center in Jinan, China
Shengzhong Feng, National Supercomputing Center in Shenzhen, China
Kenli Li, National Supercomputing Center in Changsha, China
Weijia Xu, Texas Advanced Computing Center, The University of Texas at Austin
HPC AI500 is available to researchers interested in HPC and AI. The software components of HPC AI500 are all available as open-source software and are governed by their own licensing terms. Researchers intending to use HPC AI500 are required to fully understand and abide by the licensing terms of the various components. HPC AI500 itself is open source under the Apache License, Version 2.0. Please use all files in compliance with the License.
Software developed externally (not by the HPC AI500 group):
- TensorFlow: https://www.tensorflow.org
- PyTorch: https://pytorch.org/
- Horovod: https://github.com/horovod/horovod
- Open MPI: https://www.open-mpi.org/
- Redistribution of source code must comply with the license and notice disclaimers.
- Redistribution in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimers in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ICT CHINESE ACADEMY OF SCIENCES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.