A fine-grained network traffic analysis with Millisampler

What the research is:

Millisampler is one of Meta’s latest characterization tools. It lets us observe, characterize, and debug network performance efficiently at high-granularity timescales. This lightweight network traffic characterization tool for continual monitoring operates at fine, configurable timescales. It collects time series of ingress and egress traffic volumes, the number of active flows, incoming ECN marks, and ingress and egress retransmissions. Millisampler can also distinguish in-region traffic from cross-region traffic (longer RTT). Millisampler runs on our server fleet, collecting short, periodic snapshots of this data at 100μs, 1ms, and 10ms time granularities, storing them on local disk, and keeping them available for several days for on-demand analysis. Since the data is only aggregated flow-level header information, it does not contain any personally identifiable information (PII). Even with the minimal amount of information it collects, Millisampler data has proven very useful in practice, particularly when combined with existing coarser-grained data: we can see clearly how switch buffers or host NICs, for example, might be unable to handle the ingress traffic pattern.

 

How it works:

Millisampler comprises userspace code to schedule runs, store data, and serve data, and an eBPF-based tc filter that runs in the kernel to collect fine-timescale data. The userspace code attaches the tc filter and enables data collection. A tc filter is among the first programmable steps on receipt of a packet and near the last step on transmission. On ingress, this means that the eBPF code executes on the CPU core that is processing the soft IRQ (bottom half) as the packet is directed toward the owning socket. Because processing happens on many CPU cores, we avoid locks by using per-CPU variables, which trade a larger memory footprint for eliminating the risk of contention. To minimize overhead, we sample periodically and for short periods of time. Userspace therefore configures two parameters in Millisampler: the sampling interval and the number of samples. We schedule runs with three sampling intervals (10ms, 1ms, and 100μs), with the number of samples fixed at 2,000 for all sampling intervals. This means that our observation periods range from 200ms (100μs sampling interval) to 20s (10ms sampling interval), allowing us to observe events at sub-RTT to cross-region RTT timescales while fixing the memory footprint of each run at 2,000 64-bit counters per CPU core for each value we measure.
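To make the arithmetic above concrete, the following is a minimal Python sketch of fixed-size, per-CPU time-series counters. It is an illustration only, not Millisampler's eBPF code: the names `PerCpuSeries`, `record`, and `aggregate` are hypothetical, and a plain list per core stands in for a per-CPU kernel map.

```python
NUM_SAMPLES = 2_000   # fixed number of samples per run (from the text)
COUNTER_BYTES = 8     # one 64-bit counter per sample


def observation_window_ms(interval_us, num_samples=NUM_SAMPLES):
    """Length of one run: sampling interval times number of samples."""
    return interval_us * num_samples / 1_000


def per_core_footprint_bytes(num_samples=NUM_SAMPLES):
    """Memory needed per CPU core for each measured value."""
    return num_samples * COUNTER_BYTES


class PerCpuSeries:
    """Hypothetical stand-in for a per-CPU counter array (illustrative only)."""

    def __init__(self, num_cpus, interval_us, num_samples=NUM_SAMPLES):
        self.interval_us = interval_us
        self.num_samples = num_samples
        self.start_us = None  # set by the first packet seen in the run
        # One independent array per core, so no core ever needs a lock.
        self.buckets = [[0] * num_samples for _ in range(num_cpus)]

    def record(self, cpu, timestamp_us, nbytes):
        """Add a packet's bytes to the bucket its timestamp falls into."""
        if self.start_us is None:
            self.start_us = timestamp_us
        index = (timestamp_us - self.start_us) // self.interval_us
        if 0 <= index < self.num_samples:
            self.buckets[cpu][index] += nbytes
            return True
        return False  # outside the observation window: the run is over

    def aggregate(self):
        """Userspace-side step: sum the per-core arrays bucket by bucket."""
        return [sum(column) for column in zip(*self.buckets)]


for label, interval_us in (("100us", 100), ("1ms", 1_000), ("10ms", 10_000)):
    print(label, observation_window_ms(interval_us), "ms window,",
          per_core_footprint_bytes(), "bytes per core per metric")
```

With 2,000 samples, each measured value costs 16,000 bytes per core regardless of the sampling interval, which is why fixing the sample count also fixes the memory footprint.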

Millisampler collects a variety of metrics. It computes ingress and egress total bytes and ingress ECN-marked bytes from the lengths and CE bits of the packets. Millisampler also counts TTL-marked retransmits. Millisampler uses a 128-bit sketch to estimate the number of active (incoming and outgoing) connections. The sketch yields an approximation of the connection count that is precise up to about a dozen connections and saturates at around 500 connections per sampling interval. Although there is room for additional precision, in practice the exact number of connections matters less than the qualitative difference between a few connections and dozens or hundreds of connections, which has helped us distinguish traffic patterns with many connections (heavy incast) from patterns with more traffic over fewer connections.
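The post does not say which estimator the 128-bit sketch uses. As one plausible illustration, the sketch below uses linear counting over a 128-bit bitmap: each flow hashes to one bit, and the count is estimated from the fraction of bits still zero. The hash choice, the function names, and the saturation cap are all assumptions made for this example.

```python
import hashlib
import math

SKETCH_BITS = 128  # sketch size given in the text


def flow_bit(flow):
    """Map a flow identifier to one of the 128 bit positions (assumed hash)."""
    digest = hashlib.blake2b(repr(flow).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % SKETCH_BITS


def add_flow(sketch, flow):
    """Set the flow's bit; seeing the same flow again changes nothing."""
    return sketch | (1 << flow_bit(flow))


def estimate_flows(sketch):
    """Linear-counting estimate: m * ln(m / zero_bits)."""
    zeros = SKETCH_BITS - bin(sketch).count("1")
    if zeros == 0:
        # Saturated: every bit is set, so only a lower bound is known.
        # This cap (about 621) is an assumption of this example.
        return SKETCH_BITS * math.log(SKETCH_BITS)
    return SKETCH_BITS * math.log(SKETCH_BITS / zeros)


sketch = 0
for port in range(5):
    # Hypothetical flows, identified by (src, dst, src_port, dst_port).
    sketch = add_flow(sketch, ("10.0.0.1", "10.0.0.2", 40_000 + port, 443))
print(round(estimate_flows(sketch), 1))
```

With only 128 bits, hash collisions are rare for a handful of flows, so small counts are close to exact, while large counts converge on the cap. That matches the qualitative behavior described above, although the exact saturation point depends on the estimator actually used.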

Why it matters:

Millisampler is a powerful tool for troubleshooting and performance analysis. Two contrasting network performance faults that we solved at Meta in the last few years required a fine-grained view of traffic. The first problem featured synchronized traffic bursts at fine timescales, and seeing this motivated us to build and deploy Millisampler to catch it quickly if it happened again. The second, which an early Millisampler prototype helped root-cause, featured a NIC driver bug that caused the NIC to stop delivering packets for milliseconds at a time, proving the value of Millisampler in complex investigations. While Millisampler (or Millisampler-like data) played an important role in these investigations, it did so only as part of our rich ecosystem of data collection tools, which monitor a dizzying array of metrics across hosts and the network.

Beyond such incidents, Millisampler data has also proven useful in characterizing and analyzing the traffic characteristics of services, allowing us to design and deploy a range of solutions to improve their performance. For example, we have been able to characterize the nature of bursts across a range of services in order to understand the intensity of incast and tune transport performance accordingly. We have also been able to look at complex interactions between short-RTT and long-RTT flows and understand how bursts of either affect fairness for the other. In a following post, we will look at an extension of Millisampler, Syncmillisampler, in which we run Millisampler synchronously across all hosts in a rack and use that data to identify buffer contention in the top-of-rack ASICs.

Read the full paper:

Acknowledgements:

Ehab Ghabashneh, Cristian Lumezanu, Raghu Nallamothu, and Rob Sherwood additionally contributed to the design and implementation of Millisampler.