Sequential A/B Testing Keeps the World Streaming Netflix Part 2: Counting Processes | by Netflix Technology Blog | Mar, 2024

The counting processes are functions that increment by 1 each time a new event arrives. Clearly, there are fewer events occurring in the treatment than in the control. If these were login events, this would suggest that the new code contains a bug that prevents some users from logging in successfully.
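To make the definition concrete, here is a minimal sketch of a counting process built from a list of event timestamps. The timestamps are hypothetical and purely illustrative; N(t) counts the events in the half-open interval [0, t), matching the convention used in the next section.

```python
from bisect import bisect_left


def counting_process(timestamps):
    """Return N(t): the number of events with timestamp strictly before t."""
    ts = sorted(timestamps)

    def N(t):
        # bisect_left returns how many sorted timestamps are < t,
        # i.e. the number of events in [0, t).
        return bisect_left(ts, t)

    return N


# Hypothetical event times (e.g. in minutes since the test started).
control = counting_process([0.5, 1.2, 2.0, 3.7])
treatment = counting_process([1.9, 3.1])
print(control(2.5), treatment(2.5))  # 3 1
```

Each curve is a step function that jumps by 1 at every event, so comparing the two groups amounts to comparing how quickly these step functions climb.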

This is a common situation when dealing with event timestamps. To give another example, if events corresponded to errors or crashes, we would want to know whether these are accruing faster in the treatment than in the control. Moreover, we want to answer that question as quickly as possible to prevent any further disruption to the service. This necessitates sequential testing methods, which were introduced in Part 1.

Time-Inhomogeneous Poisson Process

Our data for each treatment group is a realization of a one-dimensional point process, that is, a sequence of timestamps. Because the rate at which events arrive varies over time (in both treatment and control), we model the point process as a time-inhomogeneous Poisson point process. This point process is defined by an intensity function λ: ℝ → [0, ∞). The number of events in the interval [0, t), denoted N(t), has the following Poisson distribution:

N(t) ~ Poisson(Λ(t)), where Λ(t) = ∫₀ᵗ λ(s) ds.
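One way to build intuition for this model is to simulate it and check that the average count matches Λ(t). The sketch below uses Lewis–Shedler thinning, a standard simulation technique that is not described in this article: propose events from a homogeneous Poisson process with a rate λ_max that bounds λ, then keep each proposal at time s with probability λ(s)/λ_max. The intensity λ(s) = 5 + 4 sin(s), the bound λ_max = 9 and the horizon t = 10 are hypothetical choices for illustration.

```python
import math
import random


def simulate_inhomogeneous_poisson(intensity, lam_max, t_end, rng):
    """Lewis-Shedler thinning on [0, t_end).

    Assumes intensity(s) <= lam_max for all s in [0, t_end).
    """
    events = []
    t = 0.0
    while True:
        # Next proposal from a homogeneous Poisson process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t >= t_end:
            return events
        # Accept the proposal with probability intensity(t) / lam_max.
        if rng.random() < intensity(t) / lam_max:
            events.append(t)


def intensity(s):
    # Hypothetical time-varying rate, e.g. a periodic traffic pattern.
    return 5.0 + 4.0 * math.sin(s)


def cumulative_intensity(t):
    # Λ(t) = ∫₀ᵗ (5 + 4 sin s) ds = 5t + 4(1 - cos t)
    return 5.0 * t + 4.0 * (1.0 - math.cos(t))


rng = random.Random(0)
t_end = 10.0
counts = [len(simulate_inhomogeneous_poisson(intensity, 9.0, t_end, rng))
          for _ in range(2000)]
mean_count = sum(counts) / len(counts)
print(mean_count, cumulative_intensity(t_end))  # both close to 57.4
```

Because N(t) is Poisson with mean Λ(t), the empirical mean over many runs should sit close to Λ(10) ≈ 57.4, even though the rate itself swings between 1 and 9 events per unit time.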

We seek to test the null hypothesis H₀: λᴬ(t) = λᴮ(t) for all t, i.e., the intensity functions for control (A) and treatment (B) are the same. This can be done semiparametrically, without making any assumptions about the intensity functions λᴬ and λᴮ. Moreover, the novelty of the research is that this can be done sequentially, as described in Section 4 of our paper. Conveniently, the only data required to test this hypothesis at time t are Nᴬ(t) and Nᴮ(t), the total number of events observed so far in control and treatment. In other words, all you need to test the null hypothesis is two integers, which can easily be updated as new events arrive. Here is an example from a simulated A/A test, in which we know by design that the intensity function is the same for the control (A) and the treatment (B), albeit nonstationary.
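To see why two integers can be enough, note that under H₀ (and assuming equal-sized groups), given that an event has occurred, it is equally likely to have come from A or B, independently of when it occurred. The nonstationary intensity therefore cancels out, and Nᴮ(t) given Nᴬ(t) + Nᴮ(t) behaves like a Binomial count with success probability 1/2. The sketch below builds one standard anytime-valid test on that observation: a Beta(1, 1) mixture likelihood ratio, which is a nonnegative martingale under H₀, so Ville's inequality bounds the false-positive rate when monitoring continuously. This is an illustrative construction under those assumptions, not necessarily the one in Section 4 of the paper; the function names are hypothetical.

```python
import math


def log_beta(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_mixture_bayes_factor(n_a, n_b, a=1.0, b=1.0):
    """Log of the Beta(a, b)-mixture likelihood ratio against H0: p = 1/2.

    p is the probability that an event belongs to treatment (B) given that
    an event occurred. Only the two running counts are needed.
    """
    n = n_a + n_b
    # Mixture marginal B(a + n_b, b + n_a) / B(a, b) divided by (1/2)^n.
    return log_beta(a + n_b, b + n_a) - log_beta(a, b) + n * math.log(2.0)


def reject_h0(n_a, n_b, alpha=0.05):
    # Ville's inequality: under H0 the mixture starts at 1 and is a
    # nonnegative martingale, so P(it ever reaches 1/alpha) <= alpha.
    # Comparing on the log scale avoids overflow for large counts.
    return log_mixture_bayes_factor(n_a, n_b) >= math.log(1.0 / alpha)


print(reject_h0(10, 60), reject_h0(50, 50))  # True False
```

Since the statistic changes only when a new event arrives, it can be re-evaluated after every increment of Nᴬ(t) or Nᴮ(t) and the test stopped the first time it rejects.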