Airbnb Brandometer: Powering Brand Perception Measurement on Social Media Data with AI | by Tiantian Zhang | The Airbnb Tech Blog | Apr, 2024

How we quantify brand perceptions from social media platforms through deep learning

By Tiantian Zhang, Shuai Shao (Shawn)

At Airbnb, we have developed Brandometer, a state-of-the-art natural language understanding (NLU) technique for understanding brand perception based on social media data.

Brand perception refers to the general feelings and experiences of customers with a company. Measuring brand perception quantitatively is an extremely challenging task. Traditionally, we rely on customer surveys to find out what customers think about a company. The downsides of such a qualitative study are the bias in sampling and the limitation in data scale. Social media data, on the other hand, is the largest consumer database where users share their experiences, and it is the ideal complementary consumer data for capturing brand perceptions.

Compared with traditional approaches that extract co-occurrence and count-based top relevant topics, Brandometer learns word embeddings and uses embedding distances to measure the relatedness of brand perceptions (e.g., 'belonging', 'connected', 'reliable'). A word embedding represents a word as a real-valued vector, and it performs well at preserving the semantic meaning and relatedness of words. Word embeddings obtained from deep neural networks are arguably the most popular and revolutionary approaches in NLU. We explored a variety of word embedding models, from the quintessential algorithms Word2Vec and FastText to the more recent language model DeBERTa, and compared them in terms of generating reliable brand perception scores.

For a concept represented as a word, we use the similarity between its embedding and that of "Airbnb" to measure how important the concept is with respect to the Airbnb brand. We call this the Perception Score. Brand Perception is defined as the cosine similarity between "Airbnb" and the specific keyword:

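$$\text{CosSim}(w) = \cos(\mathbf{v}_{\text{Airbnb}}, \mathbf{v}_w) = \frac{\mathbf{v}_{\text{Airbnb}} \cdot \mathbf{v}_w}{\lVert \mathbf{v}_{\text{Airbnb}} \rVert \, \lVert \mathbf{v}_w \rVert} \qquad \text{(Eq. 1)}$$

where $\mathbf{v}_{\text{Airbnb}}$ and $\mathbf{v}_w$ are the word embeddings of "Airbnb" and of the keyword $w$.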
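As a minimal sketch of the cosine-similarity perception score, the computation needs only two embedding vectors. The toy vectors and the helper name `perception_score` below are illustrative assumptions, not Brandometer code.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def perception_score(embeddings, keyword, brand="airbnb"):
    """Hypothetical helper: similarity between the brand vector and a keyword vector."""
    return cosine_similarity(embeddings[brand], embeddings[keyword])

# Toy 3-dimensional embeddings (made-up values, for illustration only).
toy_embeddings = {
    "airbnb": [1.0, 2.0, 2.0],
    "belonging": [2.0, 4.0, 4.0],   # same direction as "airbnb"
    "unrelated": [2.0, -1.0, 0.0],  # orthogonal to "airbnb"
}

score_belonging = perception_score(toy_embeddings, "belonging")
score_unrelated = perception_score(toy_embeddings, "unrelated")
```

Because cosine similarity ignores vector length, "belonging" scores the maximum even though its vector is twice as long as the brand vector; only direction matters.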

In this blog post, we'll introduce how we process and understand social media data, how we capture brand perceptions via deep learning, and how we 'convert' the cosine similarities into calibrated Brandometer metrics. We will also share the insights derived from Brandometer metrics.

Problem Setup and Data

In order to measure brand perception on social media, we assessed all Airbnb-related mentions from 19 platforms (e.g., X (formerly known as Twitter), Facebook, Reddit, etc.) and generated word embeddings with state-of-the-art models.

In order to use social media data to generate meaningful word embeddings for the purpose of measuring brand perception, we overcame two challenges:

  • Quality: Social media posts are mostly user-generated, with varied content such as status sharing and reviews, and can be very noisy.
  • Quantity: Social media post sparsity is another challenge. Considering that it typically takes some time for social media users to generate data in response to certain activities and events, a monthly rolling window maintains a balance of promptness and detectability. Our monthly dataset is relatively small (around 20 million words) compared to a typical dataset used to train good-quality word embeddings (e.g., about 100 billion words for the Google News Word2Vec model). Warm-starting from pre-trained models didn't help, since the in-domain data barely moved the learned embeddings.

We developed multiple data cleaning processes to improve data quality. At the same time, we innovated on the modeling techniques to mitigate the impact of data quantity and quality on word embedding quality.

In addition to the data, we explored and compared multiple word embedding training techniques with the goal of generating reliable brand perception scores.

Word2Vec

Word2Vec has been by far the simplest and most widely used word embedding model since 2013. We started by building CBOW-based Word2Vec models using Gensim. Word2Vec produced decent in-domain word embeddings and, more importantly, the concept of analogies. In our domain-specific word embeddings, we are able to capture analogies in the Airbnb domain, such as "host" - "provide" + "guest" ~= "need" and "city" - "mall" + "nature" ~= "park".
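An analogy query of this form is plain vector arithmetic followed by a nearest-neighbour search. The sketch below uses hand-picked 2-dimensional toy vectors (not learned embeddings) and a hypothetical `analogy` helper. With Gensim, a comparable query on trained vectors is usually issued through `most_similar(positive=[...], negative=[...])`, which additionally normalizes the vectors first.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(embeddings, positive, negative, topn=1):
    """Rank words by similarity to sum(positive) - sum(negative), excluding query words."""
    dim = len(next(iter(embeddings.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, embeddings[word])]
    for word in negative:
        query = [q - x for q, x in zip(query, embeddings[word])]
    excluded = set(positive) | set(negative)
    ranked = sorted(
        ((word, cosine(query, vec)) for word, vec in embeddings.items() if word not in excluded),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:topn]

# Toy vectors chosen by hand so that city - mall + nature points toward park.
toy_vectors = {
    "city": [3.0, 1.0],
    "mall": [2.0, 0.0],
    "nature": [0.0, 2.0],
    "park": [1.0, 3.0],
    "hotel": [2.0, 1.0],
    "beach": [3.0, -1.0],
}

result = analogy(toy_vectors, positive=["city", "nature"], negative=["mall"])
```

Excluding the query words from the candidates matters: without it, one of the input words (here "nature") is often the nearest neighbour of the query vector.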

FastText

FastText takes into account the internal structure of words, and is more robust to out-of-vocabulary words and smaller datasets. Moreover, inspired by Sense2Vec, we associate words with sentiments (i.e., POSITIVE, NEGATIVE, NEUTRAL), which forms brand perception concepts at the sentiment level.
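One way to picture the Sense2Vec-style idea is to rewrite each token as `token|SENTIMENT` before training, so that the same word gets a separate vector per sentiment. The sketch below is an assumption-heavy illustration: it presumes each post already carries a post-level sentiment label, and it keeps the brand token untagged so there is a single "airbnb" vector to compare against. Neither choice is confirmed by the source, and `tag_tokens` is a hypothetical helper.

```python
VALID_SENTIMENTS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}

def tag_tokens(tokens, sentiment, keep_untagged=("airbnb",)):
    """Append a sentiment tag to every token except those in keep_untagged.

    Assumption: `sentiment` is a label for the whole post, produced upstream.
    """
    if sentiment not in VALID_SENTIMENTS:
        raise ValueError(f"unknown sentiment label: {sentiment!r}")
    return [tok if tok in keep_untagged else f"{tok}|{sentiment}" for tok in tokens]

# Two made-up posts with made-up sentiment labels.
posts = [
    (["airbnb", "host", "was", "friendly"], "POSITIVE"),
    (["airbnb", "host", "cancelled"], "NEGATIVE"),
]
tagged_corpus = [tag_tokens(tokens, label) for tokens, label in posts]
```

After tagging, "host|POSITIVE" and "host|NEGATIVE" are distinct vocabulary entries, so each can receive its own perception score.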

DeBERTa

Recent progress in transformer-based language models (e.g., BERT) has significantly improved the performance of NLU tasks, with the advantage of generating contextualized word embeddings. We developed DeBERTa-based word embeddings, which work better with smaller datasets and pay more attention to surrounding context via disentangled attention mechanisms. We trained everything from scratch (including the tokenizer) using Transformers, and the concatenated last attention layer embeddings resulted in the best word embeddings for our case.

Brand Perception Score Stabilization and Calibration

The variability of word embeddings has been widely studied (Borah, 2021). The causes range from the underlying stochastic nature of deep learning models (e.g., random initialization of word embeddings, embedding training that leads to a local optimum for global optimization criteria) to changes in the quantity and quality of the data corpus across time.

With Brandometer, we need to reduce the variability in embedding distances to generate stable time series tracking. Stable embedding distances help preserve the inherent patterns and structures present in the time series data, and hence contribute to better predictability of the tracking process. Additionally, they make the tracking process more robust to noisy fluctuations. We studied the influential factors and took the following steps to reduce variability:

  1. Score averaging over repetitive training with bootstrap sampling
  2. Rank-based perception score

Score averaging over repetitive training with upsampling

For each month's data, we trained N models with the same hyper-parameters, and took the average of the N perception scores as the final score for each concept. Meanwhile, we did upsampling to make sure that each model iterated on an equal number of data points across months.
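The averaging procedure can be sketched as follows. This is a simplified illustration under stated assumptions: `train_and_score` is a hypothetical callable standing in for "train one embedding model and return a perception score per concept", upsampling is done by sampling posts with replacement up to a fixed `target_size`, and each of the N runs draws its own sample. The source does not specify these details.

```python
import random
import statistics

def upsample(corpus, target_size, rng):
    """Pad the corpus with posts sampled with replacement until it has target_size items."""
    if not corpus:
        raise ValueError("corpus must not be empty")
    extra = max(0, target_size - len(corpus))
    return list(corpus) + rng.choices(corpus, k=extra)

def averaged_scores(corpus, concepts, train_and_score, n_models, target_size, seed=0):
    """Train n_models times and average the per-concept perception scores."""
    per_model = []
    for i in range(n_models):
        rng = random.Random(seed + i)  # a different seed per run
        sample = upsample(corpus, target_size, rng)
        per_model.append(train_and_score(sample, concepts, rng))
    return {c: statistics.fmean(scores[c] for scores in per_model) for c in concepts}

def fake_train_and_score(sample, concepts, rng):
    """Stand-in for real training: returns a fixed score plus random noise."""
    return {c: 0.3 + rng.gauss(0.0, 0.05) for c in concepts}

monthly_posts = ["post one", "post two", "post three"]  # made-up data
final_scores = averaged_scores(
    monthly_posts, ["belonging", "reliable"], fake_train_and_score,
    n_models=30, target_size=10,
)
```

Fixing `target_size` across months is what keeps the number of training examples per model constant even when the raw monthly volume changes.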

We defined variability as:

Eq. 2

where

CosSim(w) refers to the cosine similarity based perception score defined in Eq. 1, A refers to the algorithm, M refers to the time window (i.e., month), V refers to the vocabulary and |V| is the vocabulary size, and n refers to the number of repetitively trained models.

As N approaches 30, the score variability values converge and settle within a narrow interval. Hence, we picked N = 30 for all.
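The intuition behind this convergence can be illustrated with a small simulation. The sketch below does not reproduce the variability metric of Eq. 2. As a stand-in, it measures the standard deviation of an N-model average over repeated simulated runs, with each single-model score drawn from a Gaussian around a made-up true value. All parameters are hypothetical.

```python
import random
import statistics

def spread_of_average(n_models, noise_sd=0.05, true_score=0.3, trials=200, seed=0):
    """Standard deviation of the n_models-average across repeated simulated runs."""
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(true_score, noise_sd) for _ in range(n_models))
        for _ in range(trials)
    ]
    return statistics.stdev(averages)

# Spread of the averaged score for increasing numbers of models.
spreads = {n: spread_of_average(n) for n in (1, 5, 10, 30)}
```

Under this independent-noise assumption the spread shrinks roughly with the square root of N, so each additional model contributes less than the one before it, which is consistent with the score settling into a narrow band as N grows.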