For your eyes only: improving Netflix video quality with neural networks | by Netflix Technology Blog | Nov, 2022
by Christos G. Bampis, Li-Heng Chen and Zhi Li
When you’re binge-watching the latest season of Stranger Things or Ozark, we strive to deliver the best possible video quality to your eyes. To do so, we continuously push the boundaries of streaming video quality and leverage the best video technologies. For example, we invest in next-generation, royalty-free codecs and sophisticated video encoding optimizations. Recently, we added another powerful tool to our arsenal: neural networks for video downscaling. In this tech blog, we describe how we improved Netflix video quality with neural networks, the challenges we faced and what lies ahead.
There are, roughly speaking, two steps to encode a video in our pipeline:
- Video preprocessing, which encompasses any transformation applied to the high-quality source video prior to encoding. Video downscaling is the most pertinent example here: it tailors our encoding to the screen resolutions of different devices and optimizes picture quality under varying network conditions. With video downscaling, multiple resolutions of a source video are produced. For example, a 4K source video will be downscaled to 1080p, 720p, 540p and so on. This is typically done by a conventional resampling filter, like Lanczos (a minimal sketch of this step follows the list below).
- Video encoding using a conventional video codec, like AV1. Encoding drastically reduces the amount of video data that needs to be streamed to your device, by leveraging spatial and temporal redundancies that exist in a video.
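For concreteness, here is a minimal Python sketch of the conventional downscaling step that the rest of this post sets out to replace: it drives FFmpeg’s `scale` filter with Lanczos resampling to produce a few lower-resolution renditions of a source. The ladder of resolutions and the output naming are illustrative, not the actual Netflix encoding ladder.

```python
import subprocess

# Hypothetical ladder of output resolutions derived from a 4K source;
# real ladders are chosen per title and per codec.
LADDER = [(1920, 1080), (1280, 720), (960, 540)]

def downscale_with_lanczos(source, out_template="rendition_{h}p.mp4"):
    """Produce one Lanczos-downscaled rendition per ladder rung using
    FFmpeg's conventional scale filter."""
    for w, h in LADDER:
        subprocess.run(
            ["ffmpeg", "-y", "-i", source,
             "-vf", f"scale={w}:{h}:flags=lanczos",
             out_template.format(h=h)],
            check=True,
        )

# Example usage (assumes an ffmpeg binary on the PATH and a local 4K file):
# downscale_with_lanczos("source_4k.mp4")
```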
We identified that we can leverage neural networks (NN) to improve Netflix video quality, by replacing conventional video downscaling with a neural network-based one. This approach, which we dub the “deep downscaler,” has a few key advantages:
- A learned approach for downscaling can improve video quality and be tailored to Netflix content.
- It can be integrated as a drop-in solution, i.e., we do not need any other changes on the Netflix encoding side or the client device side. Millions of devices that support Netflix streaming automatically benefit from this solution.
- A distinct, NN-based video processing block can evolve independently, be used beyond video downscaling and be combined with different codecs.
Of course, we believe in the transformative potential of NN throughout video applications, beyond video downscaling. While conventional video codecs remain prevalent, NN-based video encoding tools are flourishing and closing the performance gap in terms of compression efficiency. The deep downscaler is our pragmatic approach to improving video quality with neural networks.
The deep downscaler is a neural network architecture designed to improve end-to-end video quality by learning a higher-quality video downscaler. It consists of two building blocks, a preprocessing block and a resizing block. The preprocessing block aims to prefilter the video signal prior to the subsequent resizing operation. The resizing block yields the lower-resolution video signal that serves as input to an encoder. We employed an adaptive network design that is applicable to the wide variety of resolutions we use for encoding.
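The exact network design is not published here, but the two-block structure can be illustrated with a minimal PyTorch sketch. The layer counts, channel widths and the use of a fixed bilinear resampler in the resizing block are assumptions made for illustration only, not the production architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepDownscalerSketch(nn.Module):
    """Illustrative two-block downscaler: a convolutional preprocessing block
    that prefilters the full-resolution luma plane, followed by a resizing
    block that maps it to an arbitrary target resolution."""

    def __init__(self, channels=16):
        super().__init__()
        # Preprocessing block: a shallow stack of convolutions acting as a
        # learned prefilter (layer count is an assumption).
        self.prefilter = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, luma, target_size):
        # Residual prefiltering keeps the output close to the input signal.
        filtered = luma + self.prefilter(luma)
        # Resizing block: here a fixed bilinear resampler to the requested
        # resolution, which keeps the sketch applicable to any ladder rung.
        return F.interpolate(filtered, size=target_size, mode="bilinear",
                             align_corners=False)
```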
During training, our goal is to generate the best downsampled representation such that, after upscaling, the mean squared error is minimized. Since we cannot directly optimize for a conventional video codec, which is non-differentiable, we exclude the effect of lossy compression from the loop. We focus on a robust downscaler that is trained given a conventional upscaler, like bicubic. Our training approach is intuitive and results in a downscaler that is not tied to a specific encoder or encoding implementation. Nevertheless, it requires extensive evaluation to demonstrate its potential for broad use in Netflix encoding.
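Under the same illustrative assumptions as the sketch above, the training objective can be written as: downscale with the network, upscale back to the source resolution with a conventional bicubic upscaler, and minimize the mean squared error against the source frame. The helper name and tensor shapes below are hypothetical.

```python
import torch.nn.functional as F

def training_loss(model, source_luma, target_size):
    """Sketch of the end-to-end objective: NN downscale -> bicubic upscale ->
    MSE against the original frame. Lossy compression is deliberately left
    out of the loop, as described above."""
    downscaled = model(source_luma, target_size)
    upscaled = F.interpolate(downscaled, size=source_luma.shape[-2:],
                             mode="bicubic", align_corners=False)
    return F.mse_loss(upscaled, source_luma)
```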
The goal of the deep downscaler is to improve end-to-end video quality for the Netflix member. Through our experimentation, involving objective measurements and subjective visual tests, we found that the deep downscaler improves quality across various conventional video codecs and encoding configurations.
For example, for VP9 encoding and assuming a bicubic upscaler, we measured an average VMAF Bjøntegaard-Delta (BD) rate gain of ~5.4% over conventional Lanczos downscaling. We also measured a ~4.4% BD rate gain for VMAF-NEG. We showcase an example result from one of our Netflix titles below. The deep downscaler (red points) delivered higher VMAF at similar bitrates or yielded similar VMAF scores at lower bitrates.
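For readers unfamiliar with the metric, a BD-rate figure like the one quoted above is conventionally computed by fitting the two rate-quality curves and comparing the average bitrate needed to reach the same quality. Below is a minimal sketch of the standard Bjøntegaard computation; the cubic fit over log-bitrate is the common convention, not a Netflix-specific implementation.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjøntegaard-Delta rate: average bitrate difference (%) at equal
    quality. Negative values mean the test curve needs less bitrate than
    the anchor for the same quality score (e.g. VMAF).
    Expects at least four rate-distortion points per curve."""
    log_r_a = np.log(rate_anchor)
    log_r_t = np.log(rate_test)

    # Fit log-bitrate as a cubic polynomial of the quality score.
    p_a = np.polyfit(quality_anchor, log_r_a, 3)
    p_t = np.polyfit(quality_test, log_r_t, 3)

    # Integrate both fits over the overlapping quality range.
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)

    # Average log-bitrate difference, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100
```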
Besides objective measurements, we also conducted human subject studies to validate the visual improvements of the deep downscaler. In our preference-based visual tests, we found that the deep downscaler was preferred by ~77% of test subjects, across a wide range of encoding recipes and upscaling algorithms. Subjects reported better detail preservation and a sharper visual appearance. A visual example is shown below.
We also performed A/B testing to understand the overall streaming impact of the deep downscaler, and to detect any device playback issues. Our A/B tests showed QoE improvements without any adverse streaming impact. This demonstrates the benefit of deploying the deep downscaler for all devices streaming Netflix, without playback risks or quality degradation for our members.
Given our scale, applying neural networks can lead to a significant increase in encoding costs. In order to have a viable solution, we took several steps to improve efficiency:
- The neural network architecture was designed to be computationally efficient and also to avoid any negative visual quality impact. For example, we found that just a few neural network layers were sufficient for our needs. To reduce the input channels even further, we only apply NN-based scaling on luma and scale chroma with a standard Lanczos filter (see the sketch after this list).
- We implemented the deep downscaler as an FFmpeg-based filter that runs together with other video transformations, like pixel format conversions. Our filter can run on both CPU and GPU. On CPU, we leveraged oneDNN to further reduce latency.
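To make the luma/chroma split from the first bullet concrete, the sketch below applies a learned downscaler to the luma plane only and falls back to Lanczos resampling for the chroma planes of a 4:2:0 frame. The plane handling, the model interface (reusing the hypothetical sketch from earlier) and the use of Pillow for chroma resampling are all assumptions for illustration, not the production FFmpeg filter.

```python
import numpy as np
import torch
from PIL import Image

def downscale_yuv420_frame(model, y, u, v, target_size):
    """Apply an NN downscaler to the luma plane and a standard Lanczos
    resampler to the chroma planes. `y`, `u`, `v` are uint8 numpy arrays;
    `target_size` is (height, width) of the output luma plane."""
    # Luma: single-channel tensor through the neural network.
    y_tensor = torch.from_numpy(y).float().unsqueeze(0).unsqueeze(0) / 255.0
    with torch.no_grad():
        y_down = model(y_tensor, target_size)
    y_out = (y_down.squeeze().clamp(0, 1).numpy() * 255.0).astype(np.uint8)

    # Chroma: conventional Lanczos resampling (here via Pillow), at half the
    # luma resolution for 4:2:0 content. Pillow expects (width, height).
    chroma_size = (target_size[1] // 2, target_size[0] // 2)
    u_out = np.asarray(Image.fromarray(u).resize(chroma_size, Image.LANCZOS))
    v_out = np.asarray(Image.fromarray(v).resize(chroma_size, Image.LANCZOS))
    return y_out, u_out, v_out
```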
The Encoding Technologies and Media Cloud Engineering teams at Netflix have jointly innovated to bring Cosmos, our next-generation encoding platform, to life. Our deep downscaler effort was an excellent opportunity to showcase how Cosmos can drive future media innovation at Netflix. The following diagram shows a top-down view of how the deep downscaler was integrated within a Cosmos encoding microservice.
A Cosmos encoding microservice can serve multiple encoding workflows. For example, a service can be called to perform complexity analysis for a high-quality input video, or to generate encodes meant for actual Netflix streaming. Within a service, a Stratum function is a serverless layer dedicated to running stateless and computationally intensive functions. Within a Stratum function invocation, our deep downscaler is applied prior to encoding. Fueled by Cosmos, we can leverage the underlying Titus infrastructure and run the deep downscaler on all our multi-CPU/GPU environments at scale.
The deep downscaler paves the path for more NN applications for video encoding at Netflix. But our journey is not finished yet and we strive to improve and innovate. For example, we are studying a few other use cases, such as video denoising. We are also looking at more efficient solutions for applying neural networks at scale. We are interested in how NN-based tools can shine as part of next-generation codecs. At the end of the day, we are passionate about using new technologies to improve Netflix video quality. For your eyes only!
We would like to acknowledge the following individuals for their help with the deep downscaler project:
Lishan Zhu, Liwei Guo, Aditya Mavlankar, Kyle Swanson and Anush Moorthy (Video Image and Encoding team), Mariana Afonso and Lukas Krasula (Video Codecs and Quality team), Ameya Vasani (Media Cloud Engineering team), Prudhvi Kumar Chaganti (Streaming Encoding Pipeline team), Chris Pham and Andy Rhines (Data Science and Engineering team), Amer Ather (Netflix performance team), the Netflix Metaflow team and Prof. Alan Bovik (University of Texas at Austin).