Biography
I have been an assistant professor at the Institute of Computing Technology,
Chinese Academy of Sciences, since 2019.
My research focuses on big data and AI benchmarking, computer architecture, and proxy benchmarks for simulation.
I received my Ph.D. degree in 2019 from the Institute of Computing Technology,
Chinese Academy of Sciences, and the University of Chinese Academy of Sciences.
Prof. Jianfeng Zhan is my advisor.
I received my B.S. degree in 2012 from Huazhong University of Science and Technology in China.
Research
My research interests cover the following areas of data center computing and benchmarking:
- Big data and AI benchmarking.
- Computer Architecture.
- Profiling and tracing in data centers.
- Proxy benchmarks for simulation.
- AIBench: An End-to-end Datacenter AI Benchmark Suite. AIBench is the first industry-scale AI benchmark suite. We present a highly extensible, configurable, and flexible benchmark framework containing multiple loosely coupled modules: data input, prominent AI problem domains, online inference, offline training, and automatic deployment tools. We analyze typical AI application scenarios from the three most important Internet service domains (search engine, social network, and e-commerce), and from them we identify sixteen prominent AI problem domains: classification, image generation, text-to-text translation, image-to-text, image-to-image, speech-to-text, face embedding, 3D face recognition, object detection, video prediction, image compression, recommendation, 3D object reconstruction, text summarization, spatial transformer, and learning to rank. AIBench consists of 12 micro benchmarks, 16 component benchmarks, and one end-to-end application benchmark: E-commerce AI, a business AI benchmark. The benchmarks can run collectively as a whole end-to-end application, to reveal the time breakdown across modules, or individually as micro or component benchmarks, for fine-tuning hot-spot functions or kernels.
- Data Motifs: A Lens towards Fully Understanding Big Data and AI Workloads. Identifying abstractions of time-consuming units of computation is an important step toward domain-specific hardware and software co-design. The straightforward approach is to tailor the architecture to the characteristics of a single application, several applications, or even a whole domain of applications. Data motifs are a new approach to modelling and characterizing big data and AI workloads. We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs, each of which captures the common requirements while being reasonably divorced from individual implementations. We call this abstraction a data motif. Unlike traditional kernels, a data motif’s behavior is affected by the sizes, patterns, types, and sources of its data inputs. Moreover, it reflects not only computation and memory access patterns but also disk and network I/O patterns.
- BigDataBench 4.0 project. BigDataBench 4.0 adopts a scalable, data motif-based benchmarking methodology and contains 13 representative real-world data sets and 47 benchmarks. The latest version of BigDataBench is available from http://www.benchcouncil.org/BigDataBench/index.html.
- Proxy Benchmark Generation for Big Data and AI. We propose a data motif-based proxy benchmark generation methodology that combines data motifs with different weights to mimic big data and AI workloads. The proxy benchmarks shorten execution time by hundreds of times on real systems while keeping the average accuracy of system and micro-architecture performance data above 90%, even when the input data sets or cluster configurations change. Moreover, the generated proxy benchmarks reflect consistent performance trends across different architectures.
- High Throughput Computers project, Institute of Computing Technology, Chinese Academy of Sciences, Beijing. System and micro-architectural level analysis of high throughput workloads.
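As a rough illustration of the weighted-combination idea in the proxy benchmark item above, here is a minimal Python sketch. It is not the published methodology: the motif names, the metric values, the accuracy definition (one minus the mean relative error), and the grid-search fitting are all assumptions made for this example.

```python
from itertools import product

# Hypothetical per-motif metric profiles, e.g. [IPC, cache miss rate, memory
# bandwidth share]. Motif names and numbers are invented for illustration.
MOTIF_PROFILES = {
    "sort":     [1.0, 0.20, 0.50],
    "matrix":   [2.0, 0.05, 0.30],
    "sampling": [0.5, 0.40, 0.80],
}


def combine(weights, profiles):
    """Return the weighted sum of the motif metric vectors."""
    n_metrics = len(next(iter(profiles.values())))
    return [
        sum(weights[name] * profiles[name][i] for name in profiles)
        for i in range(n_metrics)
    ]


def accuracy(proxy, target):
    """Assumed definition: 1 minus the mean relative error over all metrics."""
    errors = [abs(p - t) / abs(t) for p, t in zip(proxy, target)]
    return 1.0 - sum(errors) / len(errors)


def fit_weights(target, profiles, steps=20):
    """Grid-search weights that are multiples of 1/steps and sum to 1."""
    names = list(profiles)
    best_weights, best_acc = None, float("-inf")
    for combo in product(range(steps + 1), repeat=len(names) - 1):
        if sum(combo) > steps:
            continue  # the last weight would be negative
        parts = list(combo) + [steps - sum(combo)]
        weights = {name: k / steps for name, k in zip(names, parts)}
        acc = accuracy(combine(weights, profiles), target)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


if __name__ == "__main__":
    # Pretend these metrics were measured on a real workload.
    measured = combine(
        {"sort": 0.5, "matrix": 0.25, "sampling": 0.25}, MOTIF_PROFILES
    )
    weights, acc = fit_weights(measured, MOTIF_PROFILES)
    print(weights, acc)
```

A grid search keeps the sketch dependency-free and easy to follow. A real tuning loop would more likely adjust the weights iteratively against metrics measured on the target system.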
Publications
- AIBench: An Industry Standard Internet Service AI Benchmark Suite. arXiv preprint arXiv:1908.08998, 2019.
- AIBench: Towards Scalable and Comprehensive Datacenter AI Benchmarking. 2018 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench18).
- Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads. The 27th International Conference on Parallel Architectures and Compilation Techniques (PACT18).
- Data Motif-based Proxy Benchmarks for Big Data and AI Workloads. IEEE International Symposium on Workload Characterization (IISWC 2018).
- BigDataBench: a Data Motif-based Big Data and AI Benchmark Suite. arXiv preprint arXiv:1802.08254.
- BigDataBench: a Big Data Benchmark Suite from Web Search Engines. Third Workshop on Architectures and Systems for Big Data (ASBD 2013) in conjunction with The 40th International Symposium on Computer Architecture, May 2013.
- BigDataBench: An Open Source Big Data Benchmark Suite. Chinese Journal of Computers, 2016.
- Landscape of big medical data: a pragmatic survey on prioritized tasks. IEEE Access, 2019.
- Understanding Processors Design Decisions for Data Analytics in Homogeneous Data Centers. IEEE Transactions on Parallel and Distributed Systems, 28(6), 1797-1810, 2017.
- CVR: Efficient Vectorization of SpMV on X86 Processors. CGO 2018.
- BigDataBench: a Big Data Benchmark Suite from Internet Services. HPCA 2014, Industry Session.
- Understanding Big Data Analytics Workloads on Modern Processors. TPDS 2016.
- Trends on Methods for Prediction of Tandem Mass Spectra of Peptides. Progress in Biochemistry and Biophysics, 2018.
Invited Talks & Tutorials
Invited Talks
- Data Motif: A Benchmark Proposal for Big Data and AI
2018 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench18).
- BigDataBench: A Dwarf-based Big Data and AI Benchmark Suite
BPOE-9: The ninth workshop on Big data benchmarks, Performance, Optimization, and Emerging hardware, in conjunction with Architectural Support for Programming Languages and Operating Systems (ASPLOS 2018).
- Big Data Dwarfs: Methodology, Dwarf Library and Simulation Benchmarks
BPOE-8: The eighth workshop on Big data benchmarks, Performance, Optimization, and Emerging hardware.
Tutorials
- AIBench Tutorial in conjunction with High Performance Computer Architecture (HPCA 2020)
- BigDataBench Tutorial at the 2018 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench18)
- BigDataBench Tutorial in conjunction with Architectural Support for Programming Languages and Operating Systems (ASPLOS 2018)
- High Volume Computing: The Motivations, Metrics, and Benchmarks Suites for Data Center Computer Systems, in conjunction with the 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013)
Professional Activities
Honors
- 2019.1 Outstanding Graduate of the University of Chinese Academy of Sciences
- 2018.6 Merit Student of the University of Chinese Academy of Sciences
- 2017.10 National Scholarship for Ph.D.
- 2015.12 Doctoral scholarship, ICT, CAS, Beijing