LinkedIn Pensieve: A Deep Supervised Learning Embedding Feature
Among its many opportunities and features, LinkedIn offers an embedding platform based on a supervised learning approach that produces better matches between job seekers and employers. This article aims to guide you through the how and what of this model.
Before getting into anything, it is important to know what supervised learning is.
Supervised learning is a machine learning technique in which a machine learns classification and recognition parameters from a set of labeled data. Think of a child learning under guidance: the child is told what the correct goal is and is corrected whenever its answers stray from that goal. Other types of learning include unsupervised and reinforcement learning.
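A toy sketch of this idea, using a simple perceptron rather than Pensieve's actual model: the labeled examples play the role of the teacher, and every wrong prediction triggers a correction. All data and parameters here are illustrative.

```python
# Minimal supervised learning sketch: a perceptron trained on labeled
# (features, label) examples. Purely illustrative, not LinkedIn's code.

def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn weights from labeled data by correcting each mistake."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in examples:  # y is the "correct goal" the teacher supplies
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # nonzero feedback only when the learner is wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Toy labeled set: label 1 when the first feature dominates the second.
data = [([2.0, 0.5], 1), ([0.3, 1.8], 0), ([1.9, 0.2], 1), ([0.1, 2.2], 0)]
w, b = train_perceptron(data)

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

After training, the learned weights generalize to unseen points drawn from the same pattern.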
LinkedIn aims to provide a hassle-free medium in which job seekers and employers reach each other appropriately, leading to hiring. Several AI models work together to extract appropriate jobs and display them. To make this task even more efficient and accurate, LinkedIn created a new embedding feature, Pensieve, which uses supervised learning to train models and produce entity embeddings. Alternatively, representation learning could have been used, but it is computationally very expensive for latency-sensitive applications. So, Pensieve is built on supervised learning to pre-compute and publish entity embeddings.
Pensieve can be divided into three parts:
Offline Training Pipeline
The pipeline lets developers focus on deep learning itself by building training data generation into the infrastructure, with an emphasis on agile experimentation and on scaling training to millions of instances.
The neural networks are trained to recognize the lower-frequency features of our entities and encode them efficiently into low-dimensional semantic embeddings. Most of the iteration cycles are spent here, improving embedding quality.
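The encoding step can be sketched as follows. This is a hypothetical stand-in, not Pensieve's network: sparse string features are hashed into a dense input vector, and a single fixed-weight layer stands in for the trained deep network; a real model would learn its weights during training.

```python
import hashlib
import math

# Illustrative sketch: encode an entity's sparse features into a
# low-dimensional embedding via feature hashing + one dense tanh layer.

DIM_IN, DIM_OUT = 16, 4  # assumed sizes for this sketch

def featurize(sparse_features):
    """Hash sparse string features (e.g. skill or title IDs) into a dense vector."""
    x = [0.0] * DIM_IN
    for f in sparse_features:
        h = int(hashlib.md5(f.encode()).hexdigest(), 16)
        x[h % DIM_IN] += 1.0
    return x

def embed(x, weights):
    """One tanh layer standing in for the trained deep network."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

# Deterministic stand-in weights; a real model learns these.
W = [[math.sin(i * DIM_IN + j) for j in range(DIM_IN)] for i in range(DIM_OUT)]
emb = embed(featurize(["skill:python", "title:engineer"]), W)
```

The output is a small fixed-length vector regardless of how many sparse features the entity has, which is what makes it cheap to store and serve.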
Embedding Serving Framework
Once trained, the neural network is packaged for embedding serving. Parallel nearline and offline embedding serving pipelines satisfy the multi-model computation required for A/B testing. Their output, the pre-computed embeddings, is published to the Feature Marketplace for use by other AI models.
LinkedIn defines relationships between entities such as members, job postings, titles, skills, and companies. Building the edges for these relationships from member-provided information alone is known as “Feature Standardization”: each entity is assigned an ID value that is used to reference it.
However, geolocations and companies have high cardinality: a single location or company can be related to millions of members. To avoid the slow convergence such large models suffer from, the features are subset before use, exploiting the co-occurrence between job–member pairs.
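One plausible form of that subsetting, sketched with hypothetical names (not LinkedIn's implementation): keep only the high-cardinality values, such as company IDs, that co-occur often enough in member–job training pairs, and drop the rarely seen rest.

```python
from collections import Counter

# Hypothetical sketch of co-occurrence-based feature subsetting for
# high-cardinality features such as company IDs.

def subset_features(pairs, min_count=2):
    """pairs: iterable of (member_company_id, job_company_id) co-occurrences.
    Returns the set of IDs seen at least min_count times."""
    counts = Counter()
    for member_co, job_co in pairs:
        counts[member_co] += 1
        counts[job_co] += 1
    return {cid for cid, c in counts.items() if c >= min_count}

# Toy co-occurrence data: c1 and c2 recur, c3 and c4 are too rare to keep.
pairs = [("c1", "c2"), ("c1", "c3"), ("c2", "c1"), ("c4", "c1")]
kept = subset_features(pairs)
```

Training then only needs parameters for the retained IDs, which shrinks the model and speeds convergence.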
The chief idea is to find semantic concept embeddings that match entities for computing a relevance score. The Deep Structured Semantic Model (DSSM) does this by passing each entity's features through a deep neural network that computes an embedding; relevance is then scored with cosine similarity.
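The scoring step is just cosine similarity between the two towers' outputs; a minimal sketch, with toy embedding values:

```python
import math

# DSSM-style relevance scoring: each tower produces an embedding, and
# relevance is the cosine similarity between the two vectors.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

member_emb = [0.2, 0.9, 0.1]    # toy member-tower output
job_emb = [0.25, 0.85, 0.0]     # toy job-tower output
score = cosine(member_emb, job_emb)
```

Because the score depends only on the two vectors, embeddings can be pre-computed once per entity and compared cheaply at ranking time.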
For LinkedIn, the relationship between members and job postings is what must be identified, so these features are passed through a multilayer perceptron to produce the embeddings. Having separate towers for members and job postings pays off when scaling, because it enables independent pre-computation for new members and new job postings.
However, the model showed limitations in performance and scalability as more layers were added, which motivated the introduction of skip connections that pass all inputs from previous layers to the next. The resulting networks converged faster because of feature reuse and shorter gradient paths during backpropagation. The final prediction is a logistic regression over the job seeker and posting pair scores. The model uses a Hadamard product rather than other distance functions, giving it more flexibility to learn.
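A sketch of that scoring head, with illustrative weights and embeddings (not the production model's): the two tower outputs are combined element-wise with a Hadamard product, then a logistic-regression layer maps the result to a relevance score.

```python
import math

# Sketch: Hadamard product of the two tower embeddings, followed by a
# logistic-regression layer. All values here are illustrative.

def hadamard(u, v):
    """Element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]

def score(member_emb, job_emb, w, b):
    """Logistic regression over the Hadamard product of the embeddings."""
    h = hadamard(member_emb, job_emb)
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))  # probability-like relevance score

s = score([0.5, -0.2, 0.8], [0.6, -0.1, 0.7], w=[1.0, 1.0, 1.0], b=0.0)
```

Unlike a plain dot product, the learned weights over the Hadamard product let the model weight each embedding dimension differently, which is the flexibility the text refers to.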
The trained model is split into two subgraphs, one for the LinkedIn member pyramid and one for the job pyramid. These subgraphs are then packaged, versioned, and distributed into the serving framework for embedding pre-computation.
Scaling is a fundamental requirement for this model. If incoming messages are handled with delay, consumers are left with stale or missing embeddings when computing ranking scores, which hurts the effectiveness of the ranking models and, ultimately, the members.
Two approaches can be used to overcome message delay. First, increase run-loop parallelization across tasks by growing the thread pool that holds the jobs. Second, design a multi-data-center strategy that lets consumers work with the embeddings of jobs from every data center, regardless of origin, which also combats unavailability during downtime.
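The first approach can be sketched as follows; the function names and pool size are illustrative, not LinkedIn's actual serving code. A wider thread pool lets embedding messages be processed in parallel instead of queueing behind one another.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of run-loop parallelization: a thread pool processes incoming
# embedding messages concurrently. Names and sizes are illustrative.

def compute_embedding(entity_id):
    """Stand-in for the real (I/O- or compute-heavy) embedding computation."""
    return entity_id, [float(entity_id) * 0.1] * 4

def process_messages(entity_ids, pool_size=8):
    # Raising pool_size increases parallelism across the run-loop tasks.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return dict(pool.map(compute_embedding, entity_ids))

embeddings = process_messages(range(100))
```

In a real serving pipeline the per-message work dominates, so throughput scales with the pool size until the workers saturate the underlying compute or I/O.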
Current and Future Development
With Pensieve, the embedding feature now accounts for the majority of total feature importance, which was originally consumed by sparse title, seniority, skill, and even location features. Each new version of Pensieve has improved performance as measured by the product's key metrics.
Future models are expected to incorporate computationally expensive pre-trained techniques like BERT, enabling the processing of raw text data. It is also hoped that entity embeddings become foundational pieces for representing members' job-seeking activities.