Driven by growing volumes of data and compute resources, deep learning has achieved many successes across domains. Recently, researchers and engineers have been working to bring these algorithms to mobile and embedded devices, e.g. smartphones, self-driving cars, and smart-home devices. On one hand, neural networks are made more lightweight to suit such devices, either by using simpler architectures or by quantizing, pruning, and compressing the networks. On the other hand, mobile and embedded devices provide additional hardware acceleration through GPUs or NPUs to support AI applications. As AI applications on mobile and embedded devices attract more and more attention, benchmarking the AI ability of those devices has become an urgent problem.
AIoTBench is a comprehensive benchmark suite for evaluating the AI ability of mobile and embedded devices. Our benchmark 1) covers different application domains, e.g. image recognition, speech recognition, and natural language processing; 2) covers different platforms, including Android devices and Raspberry Pi; 3) covers different development tools, including TensorFlow and Caffe2; 4) offers both end-to-end application workloads and micro workloads.

The workloads in AIoTBench are implemented with both TensorFlow Lite and Caffe2, on Android as well as Raspberry Pi. Only the prediction (inference) procedure is included, since training is usually carried out in datacenters.
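Because only prediction is measured, a workload can be reduced to timing repeated inference calls. The following is a minimal sketch of such a timing loop, not the harness AIoTBench itself uses: the function name `benchmark_inference`, the `warmup` and `runs` parameters, and the reported statistics are all illustrative assumptions, and `predict` stands in for whatever framework call actually runs the model.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_inference(predict: Callable[[], object],
                        warmup: int = 3,
                        runs: int = 20) -> Dict[str, float]:
    """Time repeated calls to `predict` and summarize latency in milliseconds.

    `predict` is a hypothetical zero-argument callable that performs one
    inference; it is an assumption of this sketch, not an AIoTBench API.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are executed but not timed, so one-off costs such as
    # lazy initialization or cache filling do not distort the results.
    for _ in range(warmup):
        predict()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; a real run would invoke a model here instead.
    def dummy_predict() -> int:
        return sum(i * i for i in range(10_000))

    print(benchmark_inference(dummy_predict))
```

Reporting the median alongside the mean is a design choice of this sketch: on mobile devices, background activity and thermal throttling can produce outliers that skew the mean.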
Image classification workload. This is an end-to-end application workload in the vision domain: it takes an image as input and outputs the image label. The model we use for image classification is MobileNet, a lightweight convolutional network designed for mobile and embedded devices.
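To give a sense of why such a network is lightweight, the sketch below compares multiply-accumulate (MAC) counts for one layer. It assumes the depthwise separable design for which MobileNet is known, i.e. a depthwise convolution followed by a pointwise (1x1) convolution, both of which also appear among the micro workloads. The function names and the example layer sizes are illustrative assumptions, not values taken from AIoTBench.

```python
def standard_conv_macs(k: int, c_in: int, c_out: int,
                       h_out: int, w_out: int) -> int:
    """MACs for a standard k x k convolution producing an h_out x w_out map."""
    return k * k * c_in * c_out * h_out * w_out


def depthwise_separable_macs(k: int, c_in: int, c_out: int,
                             h_out: int, w_out: int) -> int:
    """MACs for a depthwise k x k convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * c_in * h_out * w_out   # one k x k filter per input channel
    pointwise = c_in * c_out * h_out * w_out   # 1x1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 32 -> 64 channels, 112x112 output map.
    std = standard_conv_macs(3, 32, 64, 112, 112)
    sep = depthwise_separable_macs(3, 32, 64, 112, 112)
    print(f"standard: {std} MACs, separable: {sep} MACs, "
          f"reduction: {std / sep:.2f}x")
```

The ratio of separable to standard cost works out to 1/c_out + 1/k^2, so with 3x3 kernels the saving approaches 9x as the number of output channels grows.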
Speech recognition workload. This is an end-to-end application workload in the speech domain: it takes spoken words and phrases as input and converts them to text. The model we use is DeepSpeech 2, which consists of 2 convolutional layers, 5 bidirectional RNN layers, and a fully connected layer.
Transformer translation workload. This is an end-to-end application workload in the NLP domain: it takes text in one language as input and translates it into another language. The model we use is the Transformer translation model, which solves sequence-to-sequence problems using attention mechanisms, without the recurrent connections used in traditional neural seq2seq models.
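The attention mechanism at the core of such a model can be illustrated with scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below uses plain Python lists to stay dependency-free; it shows a single attention head with no masking or learned projections, and the function names are illustrative rather than taken from the benchmark code.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(q k^T / sqrt(d_k)) v for a single attention head.

    q has shape (n_queries, d_k), k has shape (n_keys, d_k),
    and v has shape (n_keys, d_v).
    """
    scale = 1.0 / math.sqrt(len(k[0]))
    d_v = len(v[0])
    output = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row))
                  for k_row in k]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        output.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                       for j in range(d_v)])
    return output


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

Every query attends to every key in one step, which is what lets the model drop recurrence: there is no hidden state carried from one position to the next.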
Micro workloads. Our benchmark also provides micro workloads, which are the basic operations from which different networks are composed. Specifically, the micro workloads include standard convolution, pointwise convolution, depthwise convolution, matrix multiplication, pointwise addition, ReLU activation, sigmoid activation, max pooling, and average pooling.
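For reference, several of these operations can be written in a few lines each. The sketch below gives plain-Python versions of ReLU, sigmoid, pointwise addition, matrix multiplication, and max/average pooling; the convolution variants are omitted for brevity. These are illustrative definitions of what each operation computes, not the optimized TensorFlow Lite or Caffe2 kernels that the benchmark exercises, and all function names are assumptions of this sketch.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def relu(xs: Vector) -> Vector:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in xs]


def sigmoid(xs: Vector) -> Vector:
    """Element-wise logistic function, written to avoid overflow in exp."""
    out = []
    for x in xs:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out


def pointwise_add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [x + y for x, y in zip(a, b)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x m) matrix and an (m x p) matrix."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def pool2d(image: Matrix, size: int,
           reduce_fn: Callable[[Vector], float]) -> Matrix:
    """Non-overlapping pooling with a square window (stride equals size)."""
    rows, cols = len(image), len(image[0])
    return [[reduce_fn([image[r + i][c + j]
                        for i in range(size) for j in range(size)])
             for c in range(0, cols - size + 1, size)]
            for r in range(0, rows - size + 1, size)]


def max_pool2d(image: Matrix, size: int = 2) -> Matrix:
    return pool2d(image, size, max)


def avg_pool2d(image: Matrix, size: int = 2) -> Matrix:
    return pool2d(image, size, lambda window: sum(window) / len(window))
```

Timing each of these operations in isolation helps attribute end-to-end latency to specific layer types, which is the role micro workloads play next to the application workloads.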