Establishing a Large Scale Learned Retrieval System at Pinterest | by Pinterest Engineering | Pinterest Engineering Blog | Jan, 2025


Bowen Deng | Machine Learning Engineer, Homefeed Candidate Generation; Zhibo Fan | Machine Learning Engineer, Homefeed Candidate Generation; Dafang He | Machine Learning Engineer, Homefeed Relevance; Ying Huang | Machine Learning Engineer, Curation; Raymond Hsu | Engineering Manager, Homefeed CG Product Enablement; James Li | Engineering Manager, Homefeed Candidate Generation; Dylan Wang | Director, Homefeed Relevance; Jay Adams | Principal Engineer, Pinner Curation & Growth

At Pinterest, our mission is to bring everyone the inspiration to create a life they love. Finding the right content online and serving the right audience plays a key role in this mission. Modern large-scale recommendation systems usually include multiple stages, where retrieval aims at retrieving candidates from billions of candidate pools, and ranking predicts which item a user tends to engage with from the trimmed candidate set retrieved from early stages [2]. Fig 1 illustrates a general multi-stage recommendation funnel design in Pinterest.

Fig 1. General multi-stage recommendation system design in Pinterest. We retrieve candidates from a corpus of billions of Pins and narrow it down to thousands of candidates for the ranking model to score, and finally generate the feeds for Pinners. “CG” is short for candidate generation and “LWS” is short for Lightweight Scoring, which is our pre-ranking model.

The Pinterest ranking model is a powerful transformer-based model learned from a raw user engagement sequence with mixed device serving [3]. It is powerful at capturing users’ long and short term engagement and gives instant predictions. However, Pinterest’s retrieval systems in the past differ, as many of them are based on heuristic approaches such as those based on Pin-Board graphs or user-followed interests. This work illustrates our effort in successfully building Pinterest an internal embedding-based retrieval system for organic content, learned purely from logged user engagement events and served in production. We have deployed our system for homefeed as well as notification.

Fig. 2. Two-Tower Models for Training and Serving.

A two-tower-based approach has been widely adopted in industry [6], where one tower learns the query embedding and one tower learns the item embedding. Online serving is also cheap, since it reduces to a nearest neighbor search between the query embedding and the item embeddings. This section illustrates the current machine learning design of the two-tower model for learned retrieval at Pinterest.

The general two-tower model architecture, with training objective and serving illustration, is shown in Fig 2.

To train an efficient retrieval model, many works model it as an extreme multi-class classification problem. While in practice we cannot compute a softmax over the whole item corpus, we can easily leverage in-batch negatives, which provide a memory-efficient way of sampling negatives. To put it more formally, a retrieval model should optimize the softmax likelihood of the true labels, where C is the entire corpus and T is the set of all true labels.

However, in practice we can only compute a sampled softmax over a set of negative items S.

Given a sampled set D, the sampled softmax can be formulated as:
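The formulas for these objectives are not reproduced in this text. A plausible reconstruction, assuming the standard softmax formulation with dot-product logits between a user embedding and an item embedding, and assuming that D denotes the sampled set of (user, item) training pairs, could look like the following. The exact form used in production may differ.

```latex
% Full softmax objective over the entire corpus C and true labels T (assumed form)
\max \sum_{(u, i) \in T} \log
  \frac{\exp(\mathbf{e}_u \cdot \mathbf{e}_i)}
       {\sum_{j \in C} \exp(\mathbf{e}_u \cdot \mathbf{e}_j)}

% Sampled softmax probability over the positive item i and negatives S (assumed form)
P_S(i \mid u) =
  \frac{\exp(\mathbf{e}_u \cdot \mathbf{e}_i)}
       {\exp(\mathbf{e}_u \cdot \mathbf{e}_i) + \sum_{j \in S} \exp(\mathbf{e}_u \cdot \mathbf{e}_j)}

% Sampled softmax loss over the sampled set D (assumed form)
\mathcal{L} = -\frac{1}{|D|} \sum_{(u, i) \in D} \log P_S(i \mid u)
```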

As we sample items from our training set, which may have popularity bias, it is important for us to correct the sample probability [1]. We use simple logit tuning based on the estimated probability for each item.

$$L(\text{user}, \text{item}) = \mathbf{e}_{\text{user}} \cdot \mathbf{e}_{\text{item}} - \log P(\text{item is in the batch})$$

where $\mathbf{e}_{\text{user}}$ and $\mathbf{e}_{\text{item}}$ are the user embedding and item embedding, respectively.
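As a rough illustration of how the corrected logit could be combined with in-batch negatives, here is a minimal pure-Python sketch. The function names and the toy embeddings are hypothetical, and the per-item batch probabilities are treated as given inputs, since how they are estimated is not described here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def corrected_logit(user_emb, item_emb, item_batch_prob):
    """L(user, item) = e_user . e_item - log P(item is in the batch)."""
    return dot(user_emb, item_emb) - math.log(item_batch_prob)

def in_batch_softmax_loss(user_embs, item_embs, item_batch_probs):
    """Mean sampled-softmax loss with in-batch negatives.

    Position i of each list forms one positive (user, item) pair; every
    other item in the batch acts as a negative for user i.
    """
    losses = []
    for i, user in enumerate(user_embs):
        logits = [corrected_logit(user, item, p)
                  for item, p in zip(item_embs, item_batch_probs)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two positive pairs (hypothetical values).
users = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 0.0], [0.0, 1.0]]
loss_uniform = in_batch_softmax_loss(users, items, [0.5, 0.5])
```

When every item has the same batch probability, the correction shifts all logits by the same constant and cancels inside the softmax. It only changes the loss when some items are sampled more often than others, in which case a frequently sampled item has its logit lowered relative to a rare one.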

In our model design, we encode user long-term engagement [11], user profile, and context as input [2] in the user tower (as shown later in Fig 4).

Fig 3. User sequence modeling in the two-tower architecture. PinnerSage [11] encodes long-term user representations, while the realtime user sequence modeled with a sequence transformer makes the model able to capture instant user intention.

As Pinterest serves over 500 million MAUs, designing and implementing an ANN-based retrieval system is not trivial. At Pinterest, we have our in-house ANN serving system designed based on algorithms [5, 7]. In order to serve the item embeddings online, we break the system down into two pieces: online serving and offline indexing. In online serving, the user embedding is computed at request time so it can leverage the most up-to-date features to do personalized retrieval. In offline indexing, millions of item embeddings are computed and pushed to our in-house Manas serving system for online serving. Fig. 4 illustrates the system architecture for embedding-based retrieval with auto retraining adopted.
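The split between offline indexing and online serving can be sketched as follows. This is only an illustration: it uses exact dot-product search as a stand-in for approximate nearest neighbor search, and `build_index`, `retrieve`, and the Pin ids are hypothetical names that do not reflect the Manas API.

```python
import heapq

def build_index(item_embeddings):
    """Offline indexing: snapshot a mapping of item id -> embedding.

    A production system would build an ANN structure (for example an
    HNSW graph) here instead of keeping a plain dictionary.
    """
    return dict(item_embeddings)

def retrieve(index, user_embedding, k):
    """Online serving: score every item by dot product, return top-k ids.

    Exact search stands in for approximate nearest neighbor search.
    """
    def score(item_id):
        return sum(u * v for u, v in zip(user_embedding, index[item_id]))
    return heapq.nlargest(k, index, key=score)

# Offline: item embeddings are computed in batch and indexed.
index = build_index({
    "pin_a": [1.0, 0.0],
    "pin_b": [0.7, 0.7],
    "pin_c": [0.0, 1.0],
})

# Online: the user embedding is computed at request time, then queried.
top = retrieve(index, user_embedding=[1.0, 0.2], k=2)
```

Keeping the item side precomputed means the request path only pays for one user tower inference plus the index lookup.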

Fig 4. Full Serving Pipeline of Learned Retrieval with Auto Retraining

In a real-world recommendation system, it is necessary to frequently retrain the models to refresh the learned knowledge of users and capture recent trends. We established an auto retraining workflow to retrain the models periodically and validate the model performance before deploying them to the model and indexing services.
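A validation step of this kind could be sketched as a simple gate in front of deployment. The metric and threshold used in the actual workflow are not stated, so recall@k, `should_deploy`, and the `max_regression` tolerance below are assumptions made purely for illustration.

```python
def recall_at_k(retrieved, engaged, k):
    """Fraction of engaged items that appear in the top-k retrieved list."""
    if not engaged:
        return 0.0
    hits = len(set(retrieved[:k]) & set(engaged))
    return hits / len(engaged)

def should_deploy(candidate_recall, production_recall, max_regression=0.01):
    """Allow the retrained model through only if it does not regress by
    more than max_regression against the model currently in production."""
    return candidate_recall >= production_recall - max_regression

# Hypothetical holdout example: two engaged items, one of them retrieved.
candidate_recall = recall_at_k(["a", "b", "c"], ["b", "d"], k=2)
deploy = should_deploy(candidate_recall, production_recall=0.505)
```

A failed check would leave the previous model and index in place until the next retraining run.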

However, unlike ranking models, two-tower models are split into two model artifacts and deployed to separate services. When a new model is retrained, we need to ensure that the serving model version is synchronized between the two services. If we do not consider version synchronization, then because of the difference in deployment speed (the Pin indexing pipeline usually takes much longer than the viewer model takes to become ready), candidate quality will drop drastically whenever the embedding spaces are mismatched. From the infrastructure perspective, any rollback on either service would also be detrimental. Moreover, when a new index is built and rolled out to production, the hosts of the ANN search service do not all change at once, so we need to ensure that during the rollout period a certain percentage of the traffic does not suffer from model version mismatch.

To tackle the problem, we attach a piece of model version metadata to each ANN search service host, which contains a mapping from model name to the latest model version. The metadata is generated together with the index. At serving time, the homefeed backend first gets the version metadata from its assigned ANN service host and uses the model of the corresponding version to compute the user embeddings. This ensures “anytime” model version synchronization: even if some ANN hosts have model version N and others have version N+1 during the index rollout period, the model version is still synchronized. In addition, to ensure rollback capability, we keep the latest N versions of the viewer model so that we can still compute the user embeddings from the right model even if the ANN service is rolled back to its last build.
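The version lookup described above can be sketched as follows. All class and function names here (`ViewerModelStore`, `AnnHost`, `embed_user_for_host`) are hypothetical, and the "models" are toy functions. The point is only to show the order of operations: read the host's metadata first, then pick the matching viewer model from a store that retains several recent versions.

```python
from collections import OrderedDict

class ViewerModelStore:
    """Keeps the latest `keep` versions of each viewer (user tower) model."""

    def __init__(self, keep=3):
        self.keep = keep
        self.models = {}  # model name -> OrderedDict(version -> model)

    def publish(self, name, version, model):
        versions = self.models.setdefault(name, OrderedDict())
        versions[version] = model
        while len(versions) > self.keep:
            versions.popitem(last=False)  # evict the oldest version

    def get(self, name, version):
        return self.models[name][version]

class AnnHost:
    """An ANN host carries metadata generated together with its index."""

    def __init__(self, metadata):
        self.metadata = metadata  # model name -> model version of the index

def embed_user_for_host(store, host, model_name, user_features):
    """Read the host's version metadata, then use the matching model."""
    version = host.metadata[model_name]
    model = store.get(model_name, version)
    return version, model(user_features)

store = ViewerModelStore(keep=2)
store.publish("learned_retrieval", 1, lambda f: [x * 1.0 for x in f])
store.publish("learned_retrieval", 2, lambda f: [x * 2.0 for x in f])

# During an index rollout, hosts may be on different versions.
old_host = AnnHost({"learned_retrieval": 1})
new_host = AnnHost({"learned_retrieval": 2})
result_old = embed_user_for_host(store, old_host, "learned_retrieval", [1.0, 3.0])
result_new = embed_user_for_host(store, new_host, "learned_retrieval", [1.0, 3.0])
```

Because the version is resolved per request from the assigned host, a request never mixes a user embedding from one model version with an index built from another, and a rolled-back host keeps working as long as its version is still retained in the store.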

Homefeed in Pinterest is probably the most complicated system, as it needs to retrieve items for different cases: Pinner engagement, content exploration, interest diversification, etc. It has over 20 candidate generators served in production with different retrieval strategies. Currently, the learned retrieval candidate generator aims at driving user engagement. It has the highest user coverage and is among the top three in save rate. Since launch, it has helped deprecate two other candidate generators with huge overall site engagement wins.

In this blog, we presented our work in building our learned retrieval system across different surfaces in Pinterest. The machine learning based approach enables fast feature iteration and further consolidates our system.

We would like to thank all of our collaborators across Pinterest: Zhaohui Wu, Yuxiang Wang, Tingting Zhu, Andrew Zhai, Chantat Eksombatchai, Haoyu Chen, Nikil Pancha, Xinyuan Gui, Hedi Xia, Jianjun Hu, Daniel Liu, Shenglan Huang, Dhruvil Badani, Liang Zhang, Weiran Li, Haibin Xie, Yaonan Huang, Keyi Chen, Tim Koh, Tang Li, Jian Wang, Zheng Liu, Chen Yang, Laksh Bhasin, Xiao Yang, Anna Kiyantseva, Jiacheng Hong.

References:

[1] On the Effectiveness of Sampled Softmax Loss for Item Recommendation

[2] Deep Neural Networks for YouTube Recommendations

[3] TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest

[4] Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time

[5] Manas HNSW Streaming Filters

[6] Pinterest Home Feed Unified Lightweight Scoring: A Two-tower Approach

[7] Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs

[8] Sample Selection Bias Correction Theory

[9] PinnerFormer: Sequence Modeling for User Representation at Pinterest