The end-to-end application benchmark consists of four modules, as shown in Fig. 1: online server, offline analyzer, query generator, and data storage. The online server receives query requests and performs personalized search and recommendation, integrated with AI inference.

Offline analyzer chooses the appropriate AI algorithm implementations and performs a training stage to generate learning models. The offline analyzer is also responsible for building data indexes to accelerate data access.

Query generator simulates concurrent users and sends query requests to the online server according to a specific configuration. The configuration designates parameters such as concurrency, query arrival rate, arrival distribution, and user think time, so as to simulate different query characteristics and support multiple generation strategies. We implement our query generator on top of JMeter.
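As a rough illustration of these parameters, the per-user query schedule can be sketched in Python; the function and parameter names below are hypothetical, not part of AIBench or JMeter:

```python
import random

def generate_schedule(n_queries, arrival_rate, think_time, seed=42):
    """Produce send timestamps (seconds) for one simulated user.

    Inter-arrival gaps follow an exponential distribution (Poisson
    arrivals at `arrival_rate` queries/sec); after each query the user
    pauses for a fixed `think_time` before issuing the next one.
    """
    rng = random.Random(seed)
    t, schedule = 0.0, []
    for _ in range(n_queries):
        t += rng.expovariate(arrival_rate)  # random gap until next arrival
        schedule.append(round(t, 3))
        t += think_time                     # simulated user think time
    return schedule

# Example: 5 queries at ~2 queries/sec with 0.5 s think time.
print(generate_schedule(5, arrival_rate=2.0, think_time=0.5))
```

Varying the distribution, rate, and think time yields the different query characteristics the configuration is meant to cover.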

Data storage module stores all kinds of data: the user database, which saves all attributes of user information; the product database, which holds all attributes of product information; logs, which record complete query histories; text data, which contains product description text and user comments; image and video data, which depict the appearance and usage of products; and audio data, which stores voice search and voice chat data. Overall, the data storage covers structured, semi-structured, and unstructured data, and diverse data sources, including tables, text, images, audio, and video.

To support scalable deployment on clusters of different sizes, each module can be deployed on multiple nodes. In addition, a series of data generators are provided to generate e-commerce data at different scales by setting several parameters, e.g., the number of products and product attribute fields, and the number of users and user attribute fields.
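A minimal sketch of such a data generator, with hypothetical parameter and field names, might look like:

```python
import random
import string

def generate_products(num_products, num_attr_fields, seed=0):
    """Synthesize a product table of the requested scale.

    Each product gets an id plus `num_attr_fields` random string
    attributes; a real generator would draw realistic values per field.
    """
    rng = random.Random(seed)
    fields = [f"attr_{i}" for i in range(num_attr_fields)]
    products = []
    for pid in range(num_products):
        row = {"product_id": pid}
        for field in fields:
            row[field] = "".join(rng.choices(string.ascii_lowercase, k=8))
        products.append(row)
    return products

catalog = generate_products(num_products=100, num_attr_fields=4)
```

An analogous generator with `num_users` and `num_user_attr_fields` parameters would populate the user database at the desired scale.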

Figure 1: AIBench specification.

Online Server

Online server provides personalized searching and recommendations, combining traditional machine learning and deep learning technologies. The online server consists of four submodules: search planner, recommender, searcher, and ranker.

Search Planner is the entrance of the online server. It receives query requests from the query generator, dispatches them to the other online components, and collects the returned results. We use the Spring Boot framework to implement the search planner.
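The fan-out logic of the planner can be sketched as follows. This is an illustrative Python sketch (the benchmark's planner is a Spring Boot service), and the mock callables stand in for the real online components:

```python
def plan_query(query, recommender, searcher, ranker):
    """Dispatch a query to the online components and gather results."""
    category_probs, pref_weights = recommender(query)  # inference results
    candidates = searcher(query, category_probs)       # retrieve candidates
    return ranker(candidates, pref_weights)            # personalized order

# Mock components standing in for the real services.
mock_recommender = lambda q: ({"phone": 0.9, "case": 0.1}, {"brand": 0.7})
mock_searcher = lambda q, probs: ["p1", "p2", "p3"]
mock_ranker = lambda cands, w: sorted(cands, reverse=True)

print(plan_query("smartphone", mock_recommender, mock_searcher, mock_ranker))
```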

Recommender analyzes the query item and provides personalized recommendations according to the user information obtained from the user database. It first performs query spelling correction and query rewriting, then predicts the category of the query item with a classification model (FastText). Using a deep neural network proposed by Alibaba, the recommender then runs an inference pass with the offline-trained model to provide personalized recommendations. It returns two vectors: the probability vector of the predicted categories, and the user preference score vector over product attributes, such as the user's preference for brand, color, etc. We use the Flask web framework and Nginx to build the category prediction service of our recommender, and adopt TensorFlow Serving to implement online recommendation.
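The shape of the recommender's output can be illustrated with a small Python sketch; the softmax and the hard-coded logits and preference scores below are stand-ins for the FastText classifier and the Alibaba network, not the benchmark's actual models:

```python
import math

def softmax(logits):
    """Normalize raw scores into a probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def recommend(category_logits, preference_scores):
    """Return the two vectors the recommender produces:
    category probabilities and per-attribute preference scores."""
    return softmax(category_logits), preference_scores

probs, prefs = recommend(
    category_logits=[2.0, 1.0, 0.1],                 # e.g. three candidate categories
    preference_scores={"brand": 0.8, "color": 0.3},  # user attribute preferences
)
```

The ranker then consumes the preference vector as its initial weights.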

To guarantee scalability and service efficiency, Searcher follows an industry-scale architecture. The searcher is deployed on three separate clusters that hold the inverted indexes of product information in memory, to guarantee high concurrency and low latency. Based on click-through rate and purchase rate, the products are divided by popularity into three categories (high, medium, and low) accounting for 15%, 50%, and 50% of products, respectively. Note that the high popularity category is the top 15% of popular products chosen from the medium popularity category, so the proportions overlap. The indexes of products with different popularities are stored in different clusters. Given a search request, the searcher queries these three clusters one by one until it collects a specified number of results. In a realistic scenario, the cluster holding low popularity products is rarely searched, so the searcher adopts a different deployment strategy for each category: the high popularity cluster contains more nodes and more replicas to guarantee search efficiency, while the low popularity cluster is deployed with the fewest nodes and replicas. We use Elasticsearch to set up and manage the three searcher clusters.
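The search-until-enough logic over the popularity tiers can be sketched as follows, with in-memory lists standing in for the three Elasticsearch clusters:

```python
def tiered_search(clusters, keyword, limit):
    """Search the high, medium, and low popularity tiers in order,
    stopping as soon as `limit` matching products are collected."""
    results = []
    for tier in ("high", "medium", "low"):
        for product in clusters.get(tier, []):
            if keyword in product:
                results.append(product)
                if len(results) == limit:
                    return results  # enough hits; skip the colder tiers
    return results

clusters = {
    "high":   ["red phone", "blue phone"],
    "medium": ["red case", "green phone"],
    "low":    ["old phone manual"],
}
print(tiered_search(clusters, "phone", limit=3))
```

Because the limit is usually reached in the hot tiers, the low popularity cluster is rarely touched, which is what justifies giving it the fewest nodes and replicas.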

Ranker uses the weights returned by the recommender as initial weights, and ranks the scores of products through a personalized learning-to-rank (L2R) neural network. The ranker also uses Elasticsearch to implement product ranking.
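As a degenerate linear stand-in for this step (the real ranker is an L2R neural network; the field names below are illustrative), the role of the recommender's weights can be shown as:

```python
def rank_products(products, init_weights):
    """Score each product as a weighted sum of its attribute features,
    using the recommender's preference vector as initial weights, and
    return the products in descending score order."""
    def score(p):
        return sum(init_weights.get(k, 0.0) * v
                   for k, v in p["features"].items())
    return sorted(products, key=score, reverse=True)

products = [
    {"id": "a", "features": {"brand": 1.0, "color": 0.0}},
    {"id": "b", "features": {"brand": 0.0, "color": 1.0}},
]
weights = {"brand": 0.8, "color": 0.3}  # from the recommender
print([p["id"] for p in rank_products(products, weights)])
```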

Offline Analyzer

Offline analyzer is responsible for training models and building indexes to improve online serving performance. It consists of three parts: the AI trainer, the job scheduler, and the indexer.

AI trainer trains models using the related data stored in the database. To extract features from product data (e.g., text, image, audio, and video) and improve the efficiency of the online server, the AI trainer currently chooses ten AI algorithms (component benchmarks) from the AIBench framework: classification for category prediction, recommendation for personalized recommendation, learning to rank for result scoring and ranking, image-to-text for image captioning, image-to-image and image generation for image resolution enhancement, face embedding for face detection within an image, spatial transformer for image rotation and resizing, object detection for detecting objects in video data, and speech recognition for audio data recognition.

Job scheduler provides two kinds of training mechanisms: batch processing and streaming-like processing. In a realistic scenario, some models need to be updated frequently. For example, when a user searches for an item and clicks a product shown on the first page, the application immediately trains a new model based on the product just clicked, and shows new recommendations on the second page. Our benchmark implementation considers this situation and adopts a streaming-like approach that updates the model every few seconds. For batch processing, the trainer updates the models every few hours.

Indexer builds indexes for product information. In total, the indexer provides three kinds of indexes: inverted indexes with a few product fields for searching, forward indexes with a few fields for ranking, and forward indexes with the majority of fields for summary generation.
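The first of these index kinds, an inverted index over a few product fields, can be sketched as a toy version of what the searcher holds in memory (field choice and tokenization here are illustrative):

```python
def build_inverted_index(products):
    """Map each token in a product's title to the set of product ids
    containing it, so search becomes a dictionary lookup."""
    index = {}
    for pid, title in products:
        for token in title.lower().split():
            index.setdefault(token, set()).add(pid)
    return index

index = build_inverted_index([(1, "Red Phone"), (2, "Blue Phone Case")])
```

A forward index is simply the reverse mapping, from product id to the stored fields, which is what the ranking and summary-generation indexes provide with few and many fields, respectively.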