WPC: Whole-picture Workload Characterization



We present a whole-picture workload characterization (WPC) methodology and its accompanying tool. WPC integrates microarchitecture-dependent, microarchitecture-independent, and ISA-independent characterization methodologies. It performs a whole-picture analysis of hierarchical profile data across the Intermediate Representation (IR), ISA, and microarchitecture levels to summarize the inherent workload characteristics and to explain the reasons behind the numbers.


As shown in Fig. 1, our methodology covers three levels across IR, ISA, and microarchitecture.

We collect the IR stream with the LLVM tool and feed it to a logical processor model that has neither a cache model nor a pipeline model. We collect the binary stream with the Pin tool and feed it to a perfect processor model that has no pipeline model but has a perfect cache model, which always completes every memory reference in a single cycle. We use a performance monitoring unit (PMU) tool such as Perf to obtain microarchitecture metrics on specific processors. The IR-level analysis is independent of the runtime environment, the OS, and the ISA, while the ISA-level analysis is independent of the OS and the microarchitecture. The microarchitecture-level analysis is affected by the OS and is microarchitecture-dependent. At each level we use a series of performance metrics, e.g., instruction execution behavior and instruction locality, to depict workload characteristics in a combined and comprehensive way.
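To make the two example metrics concrete, the following is a minimal Python sketch, not the WPC implementation. It assumes a trace is a list of (instruction address, category) pairs, uses the instruction mix as one view of instruction execution behavior, and uses LRU reuse distance as one possible measure of instruction locality. The function names, the trace format, and the choice of reuse distance are assumptions made for illustration.

```python
from collections import Counter, OrderedDict


def instruction_mix(trace):
    """Return the fraction of instructions in each category.

    `trace` is a hypothetical list of (address, category) pairs.
    """
    counts = Counter(category for _, category in trace)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def reuse_distances(addresses):
    """Return the LRU reuse distance of every access.

    The distance is the number of distinct addresses touched since the
    previous access to the same address; first accesses yield None.
    """
    stack = OrderedDict()  # least recently used first, most recent last
    distances = []
    for addr in addresses:
        if addr in stack:
            keys = list(stack)
            distances.append(len(keys) - 1 - keys.index(addr))
            del stack[addr]
        else:
            distances.append(None)
        stack[addr] = True  # (re)insert as most recently used
    return distances


# Synthetic five-instruction trace, purely illustrative.
trace = [
    (0x400, "int"),
    (0x404, "load"),
    (0x408, "branch"),
    (0x400, "int"),
    (0x404, "load"),
]
mix = instruction_mix(trace)
distances = reuse_distances([addr for addr, _ in trace])
```

Small reuse distances indicate that the same instructions are re-executed soon after their last use, which is the sense of locality assumed in this sketch.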


To facilitate our analysis, we develop the WPC tool to collect, analyze, and visualize performance metrics. It mainly consists of two parts: a multi-level profiler and a performance data analyzer. The profiler profiles the workloads and gathers performance data, which the performance data analyzer then processes with its different modules. At the IR level, the profiler transforms Java or Python bytecode into LLVM bitcode and profiles that bitcode. At the ISA and microarchitecture levels, we use the Pin and Perf tools, respectively. After each workload run, the performance data analyzer's collector module gathers all the data from the profiler and stores it in the database. The analyzer module reads the raw data from the database and calculates metrics. The figure plotter plots different kinds of figures, which help users analyze the performance metrics.
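The collector-then-analyzer flow described above could look roughly like the following Python sketch. It is an illustration under stated assumptions rather than the WPC code: the SQLite backend, the `raw` table schema, the function names, and the counter values are all hypothetical placeholders.

```python
import sqlite3


def collect(conn, rows):
    """Collector step: store raw profiler output in the database.

    Each row is (workload, level, counter, value); the schema is hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw "
        "(workload TEXT, level TEXT, counter TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO raw VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def ipc(conn, workload):
    """Analyzer step: derive instructions per cycle from raw counters."""
    cur = conn.execute(
        "SELECT counter, value FROM raw WHERE workload = ? AND level = 'uarch'",
        (workload,),
    )
    counters = dict(cur.fetchall())
    return counters["instructions"] / counters["cycles"]


# Placeholder counter values, not measurements from any real workload.
conn = sqlite3.connect(":memory:")
collect(
    conn,
    [
        ("demo", "uarch", "instructions", 2000.0),
        ("demo", "uarch", "cycles", 1000.0),
    ],
)
demo_ipc = ipc(conn, "demo")
```

Keeping raw counters in the database and deriving metrics on read means new metrics can be added to the analyzer without re-running the profiler.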


Link: https://github.com/BenchCouncil/WPC