Scalable Annotation Service — Marken | Netflix TechBlog


At Netflix, we have hundreds of microservices, each with its own data models or entities. For example, we have a service that stores a movie entity's metadata and a service that stores metadata about images. At some point, all of these services want to annotate their objects or entities. Our team, Asset Management Platform, decided to create a generic service called Marken, which allows any microservice at Netflix to annotate its entities.


Sometimes people describe annotations as tags, but that is a limited definition. In Marken, an annotation is a piece of metadata that can be attached to an object from any domain. There are many different kinds of annotations our client applications want to generate. A simple annotation, like the one below, would describe that a particular movie has violence.

  • Movie Entity with id 1234 has violence.

However, there are more interesting cases where users want to store temporal (time-based) data or spatial data. In Pic 1 below, we have an example of an application used by editors to review their work. They want to change the color of the gloves to rich black, so they want to be able to mark up that area, in this case using a blue circle, and store a comment for it. This is a typical use case for a creative review application.

An example of storing both time-based and space-based data would be an ML algorithm that can identify characters in a frame and wants to store the following for a video:

  • In a particular frame (time)
  • In some area in the image (space)
  • A character name (annotation data)

Pic 1: Editors requesting changes by drawing shapes like the blue circle shown above.

Goals for Marken

We wanted to create an annotation service with the following goals.

  • Allow annotating any entity. Teams should be able to define their own data model for annotations.
  • Annotations can be versioned.
  • The service should be able to serve real-time (that is, UI) applications, so CRUD and search operations should be achieved with low latency.
  • All data should also be available for offline analytics in Hive/Iceberg.


Since the annotation service would be used by anyone at Netflix, we needed to support different data models for the annotation object. A data model in Marken can be described using a schema, just like how we create schemas for database tables.

Our team, Asset Management Platform, owns a different service that has a JSON-based DSL to describe the schema of a media asset. We extended this service to also describe the schema of an annotation object.

{
  "type": "BOUNDING_BOX", ❶
  "version": 0, ❷
  "description": "Schema describing a bounding box",
  "properties": { ❸
    "boundingBox": {
      "type": "bounding_box",
      "mandatory": true
    },
    "boxTimeRange": {
      "type": "time_range",
      "mandatory": true
    }
  }
}

In the above example, the application wants to represent a rectangular area in a video that spans a range of time.

  1. The schema's name is BOUNDING_BOX.
  2. Schemas can have versions. This allows users to add or remove properties in their data model. We don't allow incompatible changes; for example, users cannot change the data type of a property.
  3. The data stored is represented in the "properties" section. In this case, there are two properties:
     • boundingBox, with type "bounding_box". This is basically a rectangular area.
     • boxTimeRange, with type "time_range". This allows us to specify the start and end time for this annotation.
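
The versioning rule above (properties may be added or removed, but the data type of an existing property may not change) can be illustrated with a small check. This is only a sketch under assumptions: the function name `incompatible_changes` and the extra `comment` property are hypothetical, and the actual validation inside the schema service is not described in this article.

```python
def incompatible_changes(old_props: dict, new_props: dict) -> list:
    """Return the names of properties whose data type differs between two versions.

    Properties that exist in only one version (added or removed) are allowed,
    so only names present in both versions are compared.
    """
    return sorted(
        name
        for name in old_props.keys() & new_props.keys()
        if old_props[name]["type"] != new_props[name]["type"]
    )


# Version 0 of the BOUNDING_BOX properties, as in the schema example.
v0 = {
    "boundingBox": {"type": "bounding_box", "mandatory": True},
    "boxTimeRange": {"type": "time_range", "mandatory": True},
}

# Adding a property is a compatible change ("comment" is a hypothetical property).
v1_ok = {**v0, "comment": {"type": "string", "mandatory": False}}

# Changing the type of an existing property is an incompatible change.
v1_bad = {**v0, "boxTimeRange": {"type": "string", "mandatory": True}}

print(incompatible_changes(v0, v1_ok))   # []
print(incompatible_changes(v0, v1_bad))  # ['boxTimeRange']
```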

Geometry Objects

To represent spatial data in an annotation, we use the Well Known Text (WKT) format. We support the following objects:

  • Point
  • Line
  • MultiLine
  • BoundingBox
  • LinearRing

Our model is extensible, allowing us to easily add more geometry objects as needed.
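
For readers unfamiliar with WKT, here is a minimal sketch of how a few geometries can be written as WKT strings. The helper names are hypothetical. Standard WKT has no dedicated bounding-box type, so encoding the box as a closed POLYGON ring is an assumption made for illustration; the article does not say how Marken encodes its BoundingBox object.

```python
def point_wkt(x, y):
    """WKT for a single point."""
    return f"POINT ({x} {y})"


def line_wkt(points):
    """WKT for a line made of two or more (x, y) points."""
    coords = ", ".join(f"{x} {y}" for x, y in points)
    return f"LINESTRING ({coords})"


def bounding_box_wkt(left, top, right, bottom):
    """Assumed encoding: a rectangle as a POLYGON whose ring ends where it starts."""
    ring = [(left, top), (right, top), (right, bottom), (left, bottom), (left, top)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


print(point_wkt(20, 30))
print(line_wkt([(20, 30), (40, 60)]))
print(bounding_box_wkt(20, 30, 40, 60))
```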

Temporal Objects

Several applications need to store annotations for videos that include time. We allow applications to store time as frame numbers or nanoseconds.

To store data in frames, clients must also store frames per second. We call this a SampleData, with the following components:

  • sampleNumber, aka frame number
  • sampleNumerator
  • sampleDenominator
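
To show how the two time representations relate, the sketch below converts a frame number to nanoseconds. It assumes that sampleNumerator / sampleDenominator expresses the frame rate in frames per second as a rational number (for example 30000/1001); the article only states that frames per second must be stored, so this reading and the function name `sample_to_nanos` are assumptions.

```python
from fractions import Fraction

NANOS_PER_SECOND = 1_000_000_000


def sample_to_nanos(sample_number: int, sample_numerator: int, sample_denominator: int) -> int:
    """Convert a frame number to the start time of that frame in nanoseconds.

    Assumes sample_numerator / sample_denominator is the frame rate in frames
    per second. Fractions keep the arithmetic exact for rates like 30000/1001.
    """
    frames_per_second = Fraction(sample_numerator, sample_denominator)
    seconds = sample_number / frames_per_second
    return int(seconds * NANOS_PER_SECOND)


# Frame 24 at 24 fps starts exactly one second into the video.
print(sample_to_nanos(24, 24, 1))
# Frame 30000 at 30000/1001 fps starts exactly 1001 seconds in.
print(sample_to_nanos(30000, 30000, 1001))
```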

Annotation Object

Just like a schema, an annotation object is also represented in JSON. Here is an example of an annotation for BOUNDING_BOX, which we discussed above.

{
  "annotationId": { ❶
    "id": "188c5b05-e648-4707-bf85-dada805b8f87",
    "version": "0"
  },
  "associatedId": { ❷
    "entityType": "MOVIE_ID",
    "id": "1234"
  },
  "annotationType": "ANNOTATION_BOUNDINGBOX", ❸
  "annotationTypeVersion": 1,
  "metadata": { ❹
    "fileId": "identityOfSomeFile",
    "boundingBox": {
      "topLeftCoordinates": {
        "x": 20,
        "y": 30
      },
      "bottomRightCoordinates": {
        "x": 40,
        "y": 60
      }
    },
    "boxTimeRange": {
      "startTimeInNanoSec": 566280000000,
      "endTimeInNanoSec": 567680000000
    }
  }
}

  1. The first component is the unique id of this annotation. An annotation is an immutable object, so the identity of the annotation always includes a version. Whenever someone updates this annotation, we automatically increment its version.
  2. An annotation must be associated with some entity that belongs to some microservice. In this case, this annotation was created for a movie with id "1234".
  3. We then specify the schema type of the annotation. In this case it is BOUNDING_BOX.
  4. The actual data is stored in the metadata section of the JSON. As discussed above, there is a bounding box and a time range in nanoseconds.
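
The immutability and versioning described in point 1 can be sketched with frozen dataclasses: an update never modifies an existing annotation, it produces a new object with the same id and the next version. The class and function names here are hypothetical, and the version is modeled as an integer for simplicity even though the JSON example shows it as a string; this is not Marken's actual implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AnnotationId:
    id: str
    version: int


@dataclass(frozen=True)
class Annotation:
    annotation_id: AnnotationId
    annotation_type: str
    metadata: dict


def update(annotation: Annotation, new_metadata: dict) -> Annotation:
    """Return a new annotation with the same id and an incremented version."""
    current = annotation.annotation_id
    next_id = replace(current, version=current.version + 1)
    return replace(annotation, annotation_id=next_id, metadata=new_metadata)


original = Annotation(
    AnnotationId("188c5b05-e648-4707-bf85-dada805b8f87", 0),
    "BOUNDING_BOX",
    {"fileId": "identityOfSomeFile"},
)
updated = update(original, {"fileId": "identityOfAnotherFile"})

print(original.annotation_id.version, updated.annotation_id.version)
```

Because both versions remain addressable by (id, version), readers holding the old identity still see the data they originally read.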

Base schemas

Just like in object-oriented programming, our schema service allows schemas to inherit from each other. This allows our clients to create an "is-a-type-of" relationship between schemas. Unlike Java, we support multiple inheritance as well.

We have several ML algorithms that scan Netflix media assets (images and videos) and create very interesting data, for example identifying characters in frames or identifying match cuts. This data is then stored as annotations in our service.

As a platform service, we created a set of base schemas to ease creating schemas for different ML algorithms. One base schema (TEMPORAL_SPATIAL_BASE) has the following optional properties. This base schema can be used by any derived schema and is not limited to ML algorithms.

  • Temporal (time-related data)
  • Spatial (geometry data)

Another base schema, BASE_ALGORITHM_ANNOTATION, has the following optional properties and is typically used by ML algorithms.

  • label (String)
  • confidenceScore (double): denotes the confidence of the generated data from the algorithm.
  • algorithmVersion (String): version of the ML algorithm.
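
One way to picture schema inheritance with multiple parents is as a merge of property maps, where a derived schema ends up with the properties of every base schema plus its own. The sketch below is an illustration only: the `parents` key, the `CHARACTER_DETECTION` schema, and the `temporal` and `spatial` property names and types are hypothetical, and the article does not describe how the schema service actually resolves inheritance.

```python
def resolve_properties(schema_name: str, registry: dict) -> dict:
    """Collect properties from all parent schemas, then apply the schema's own."""
    schema = registry[schema_name]
    merged = {}
    for parent in schema.get("parents", []):
        merged.update(resolve_properties(parent, registry))
    merged.update(schema.get("properties", {}))
    return merged


registry = {
    "TEMPORAL_SPATIAL_BASE": {
        "properties": {
            "temporal": {"type": "time_range", "mandatory": False},
            "spatial": {"type": "bounding_box", "mandatory": False},
        }
    },
    "BASE_ALGORITHM_ANNOTATION": {
        "properties": {
            "label": {"type": "string", "mandatory": False},
            "confidenceScore": {"type": "decimal", "mandatory": False},
            "algorithmVersion": {"type": "string", "mandatory": False},
        }
    },
    # Hypothetical derived schema that inherits from both base schemas.
    "CHARACTER_DETECTION": {
        "parents": ["TEMPORAL_SPATIAL_BASE", "BASE_ALGORITHM_ANNOTATION"],
        "properties": {},
    },
}

print(sorted(resolve_properties("CHARACTER_DETECTION", registry)))
```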

By using multiple inheritance, a typical ML algorithm schema derives from both the TEMPORAL_SPATIAL_BASE and BASE_ALGORITHM_ANNOTATION schemas.

{
  "type": "BASE_ALGORITHM_ANNOTATION",
  "version": 0,
  "description": "Base Schema for Algorithm based Annotations",
  "properties": {
    "confidenceScore": {
      "type": "decimal",
      "mandatory": false,
      "description": "Confidence Score"
    },
    "label": {
      "type": "string",
      "mandatory": false,
      "description": "Annotation Tag"
    },
    "algorithmVersion": {
      "type": "string",
      "description": "Algorithm Version"
    }
  }
}


Given the goals of the service, we had to keep the following in mind.

  • Our service will be used by a lot of internal UI applications, hence the latency for CRUD and search operations must be low.
  • Besides applications, we will have ML algorithm data stored. Some of this data can be at the frame level for videos, so the amount of data stored can be large. The databases we pick should be able to scale horizontally.
  • We also anticipated that the service will have high RPS.

Some other goals came from search requirements.

  • Ability to search the temporal and spatial data.
  • Ability to search with different associated and additional associated Ids, as described in our Annotation Object data model.
  • Full text searches on many different fields in the Annotation Object.
  • Stem search support.

As time progressed, the requirements for search only increased, and we will discuss these requirements in detail in a different section.

Given the requirements and the expertise in our team, we decided to choose Cassandra as the source of truth for storing annotations. For supporting different search requirements, we chose Elasticsearch. Besides these, to support various features we have a bunch of internal auxiliary services, e.g. a ZooKeeper service, an internationalization service, etc.

Marken architecture

The picture above represents the block diagram of the architecture for our service. On the left we show data pipelines, which are created by several of our client teams to automatically ingest new data into our service. The most important of these data pipelines is created by the Machine Learning team.

One of the key initiatives at Netflix, Media Search Platform, now uses Marken to store annotations and perform the various searches explained below. Our architecture makes it possible to easily onboard and ingest data from media algorithms. This data is used by various teams, e.g. creators of promotional media (such as trailers and banner images), to improve their workflows.


The success of the annotation service (data labels) depends on effective search of those labels without knowing much about the details of the input algorithms. As mentioned above, we use the base schemas for every new annotation type (depending on the algorithm) indexed into the service. This helps our clients search across the different annotation types consistently. Annotations can be searched either simply by data labels or with additional filters like movie id.

We have defined a custom query DSL to support searching, sorting, and grouping of the annotation results. Different types of search queries are supported using Elasticsearch as the backend search engine.

  • Full Text Search: Clients may not know the exact labels created by the ML algorithms. For example, the label can be 'shower curtain'. With full text search, clients can find the annotation by searching with the label 'curtain'. We also support fuzzy search on the label values. For example, if clients want to search for 'curtain' but mistakenly type 'curtian', the annotation with the 'curtain' label will be returned.
  • Stem Search: With global Netflix content supported in different languages, our clients need stem search support for different languages. The Marken service contains subtitles for the full catalog of titles on Netflix, which can be in many different languages. As an example of stem search, `clothing` and `clothes` can be stemmed to the same root word `cloth`. We use Elasticsearch to support stem search for 34 different languages.
  • Temporal Annotations Search: Annotations for videos are more relevant if they are defined along with temporal information (a time range with start and end time). A time range within a video is also mapped to frame numbers. We also support label searches for temporal annotations within a provided time range or frame number.
  • Spatial Annotation Search: Annotations for a video or image can also include spatial information, for example a bounding box that defines the location of the labeled object in the annotation.
  • Temporal and Spatial Search: An annotation for a video can have both a time range and spatial coordinates. Hence, we support queries that can search annotations within the provided time range and spatial coordinates range.
  • Semantics Search: Annotations can be searched after understanding the intent of the user-provided query. This type of search returns results based on conceptually similar matches to the text in the query, unlike traditional tag-based search, which expects exact keyword matches with the annotation labels. ML algorithms also ingest annotations with vectors instead of actual labels to support this type of search. User-provided text is converted into a vector using the same ML model, and then a search is performed with the resulting vector to find the closest stored vectors. Based on client feedback, such searches provide more relevant results and do not return empty results when no annotations exactly match the user-provided query labels. We support semantic search using Open Distro for Elasticsearch. We will cover more details on semantic search support in a future blog article.
Semantic search
  • Range Intersection: We recently started supporting range intersection queries across multiple annotation types for a specific title in real time. This allows clients to search with multiple data labels (produced by different algorithms, so they are different annotation types) within a specific time range of a video or the whole video, and get the list of time ranges or frames where the provided set of data labels is present. A typical example of this query is to find `James in the indoor shot drinking wine`. For such queries, the query processor finds the results for both the data labels (James, Indoor shot) and the vector search (drinking wine), and then finds the intersection of the resulting frames in memory.
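
The in-memory intersection step of a range intersection query can be sketched as follows. This is an illustrative two-pointer sweep, not Marken's query processor: the function names and sample ranges are hypothetical, and each input list is assumed to be sorted and to contain non-overlapping (start, end) ranges.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) ranges."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever range finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def intersect_all(range_lists):
    """Intersect the time ranges returned for each label or vector search."""
    return reduce(intersect_two, range_lists)


# Hypothetical per-query results, in seconds, for one title.
james = [(0, 10), (20, 30)]
indoor_shot = [(5, 25)]
drinking_wine = [(8, 22)]

print(intersect_all([james, indoor_shot, drinking_wine]))  # [(8, 10), (20, 22)]
```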

Search Latency

Our client applications are studio UI applications, so they expect low latency for search queries. As highlighted above, we support such queries using Elasticsearch. To keep the latency low, we have to make sure that all the annotation indices are balanced and that hotspots are not created by algorithm backfill data ingestion for older movies. Such hotspots in the cluster can cause spikes in CPU utilization and slow down query responses, so we adopted a rollover indices strategy to avoid them (as described in our blog for the asset management application). Search latency for generic text queries is in milliseconds. Semantic search queries have comparatively higher latency than generic text searches. The following graphs show the average search latency for generic search and for semantic search (including KNN and ANN search).

Average search latency
Semantic search latency


One of the key challenges in designing the annotation service is handling the scaling requirements that come with the growing Netflix movie catalog and ML algorithms. Video content analysis plays a crucial role in how content is used across studio applications in movie production and promotion. We expect the algorithm types to grow widely in the coming years. With the growing number of annotations and their usage across studio applications, prioritizing scalability becomes essential.

Data ingestion from the ML data pipelines is generally in bulk, especially when a new algorithm is designed and annotations are generated for the full catalog. We have set up a different stack (fleet of instances) to control the data ingestion flow and hence provide consistent search latency to our consumers. In this stack, we control the write throughput to our backend databases using Java threadpool configurations.
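
The idea of capping write throughput with a thread pool can be sketched as below. The article describes Java threadpool configurations; this Python analogue is only an illustration of the general technique. The `ThrottledWriter` class, its parameters, and the stand-in write function are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThrottledWriter:
    """Bound concurrent writes with a fixed pool and cap the number of pending writes."""

    def __init__(self, write_fn, max_workers=4, max_pending=16):
        self._write_fn = write_fn
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(self, record):
        # Blocks the producer when max_pending writes are already in flight,
        # so a bulk backfill cannot flood the backend databases.
        self._slots.acquire()
        try:
            future = self._pool.submit(self._write_fn, record)
        except BaseException:
            self._slots.release()
            raise
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def close(self):
        self._pool.shutdown(wait=True)


# Stand-in for a database write; a real pipeline would call its storage client here.
written = []
writer = ThrottledWriter(written.append, max_workers=2, max_pending=4)
futures = [writer.submit(i) for i in range(20)]
writer.close()
print(len(written))
```

Lowering `max_workers` or `max_pending` trades ingestion speed for steadier load on the databases that also serve search traffic.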

The Cassandra and Elasticsearch backend databases support horizontal scaling of the service with growing data size and queries. We started with a 12-node Cassandra cluster and scaled up to 24 nodes to support the current data size. This year, annotations were added for approximately the full Netflix catalog. Some titles have more than 3M annotations (most of them related to subtitles). Currently the service has around 1.9 billion annotations with a data size of 2.6TB.


Annotations can be searched in bulk across multiple annotation types to build data facts for a title or across multiple titles. For such use cases, we persist all the annotation data in Iceberg tables so that annotations can be queried in bulk across different dimensions without impacting the CRUD latency of real-time applications.

One of the common use cases is when the media algorithm teams read subtitle data in different languages (annotations containing subtitles on a per-frame basis) in bulk so that they can refine the ML models they have created.

Future work

There is a lot of interesting future work in this area.

  1. Our data footprint keeps growing with time. Several times we have data from algorithms that are revised, and the annotations related to the new version are more accurate and in use. So we need to clean up large amounts of data without affecting the service.
  2. Running intersection queries over data at large scale and returning results with low latency is an area where we want to invest more time.


Burak Bacioglu and other members of the Asset Management Platform contributed to the design and development of Marken.