Cocoon

Cocoon is a User-Count Estimating Query Engine on top of ZDB-On-Flash. This system is an integral part of Ads Infra at Meta serving 40K qps under 100ms P90 latency.

img


How does Cocoon store its data

  • Columnar Data Stored in Ordered Format
  • UserID Format

How is the Data Stored in ZDB

  • Multiple UserIds per Row: Each ZDB Row Value has multiple UserIds in it. Optional Attribute(s). Efficient Fetching.
  • Static Ranking: Static Rank of UserIds stored for even distribution. Bi-Directional.
  • Compression: Compressed using Elias Fano Format. Quasi-Succinct. ~ Theoretical Best.
  • Sorting: ZDB Keys are sorted. Static Ranked UserIds are sorted across ZDB Keys and also, within each ZDB Value.

How is the Query Done

  • Round 1: Fetch 0th Bucket for each term in Query.
  • Round 2 (Optional): Fetch more buckets to improve estimate.

Estimation & Extrapolation

Query-Time Sampling (Not Data-Load Time Sampling)

  • Sampling of Data at Load Time leads to very poor estimation. Hard to determine how much to sample.
  • We load all the data into Store. At Query Time, we determine how much we want to read till we are confident of a good estimate.

Extrapolation

  • In the int64 space, we determine what value we have reached at and what the hit count is.
  • We then extrapolate.
  • Srank-based ordering removes biases based on how userIds are created.

Horizontal Scaling

  • The data to process at query time is still a lot.
  • Horizontal Scaling to the rescue! Divide the data and have parallel processing done on it using a TW Tier.

Leaf Layer

  • 32 Leafs per tier.
  • Each Leaf reads only its own partition of data.
  • 2 Leafs can never read the same data… Ever.
  • 32 Data Partitions exist.

Aggregator Layer

  • Parses Query
  • Throttles Callers
  • Fans out query to Leafs
  • Handles Leaf Fail Over

System Architecture Diagram

+-------------------------------------------------------------+     +-------------------------+
|                  [Aggregator / Indexer Tiers]               |     |       Graph API         |
|  +------------------+ +------------------+ +--------------+  |     | +---------------------+ |
|  |                  | |                  | |              |  |====>| |Cocoon PHP Component | |
|  |   Aggregator     | |   Aggregator     | |  Aggregator  |  |     | +---------------------+ |
|  |       ||         | |       ||         | |      ||      |  |     +------------^------------+
|  |    Indexer       | |    Indexer       | |   Indexer    |  |                  |
|  +--------^---------+ +--------^---------+ +------^-------+  |     +------------v------------+
+-----------|--------------------|------------------|----------+     |    Supported Features   |
            |                    |                  |                | - Reach & Frequency     |
+-----------v--------------------v------------------v----------+     | - Reach Estimate        |
|                          ZippyDB                             |     | - Bid Suggestion        |
|    +----------------+          +----------------+            |     | - Outcome prediction    |
|    | ZDB node PRN   |<========>|  ADB node LLA  |            |     | - Pacing                |
|    +-------^--------+          +--------^-------+            |     | - Analysis              |
|            |                            |                    |     | - Audience Insights     |
|            +-----------+    +-----------+                    |     +-------------------------+
|                        |    |                                |
|                    +---v----v---+                            |
|                    |ZDB node ATN|                            |
|                    +------------+                            |
+--------------------------------------------------------------+

Cocoon Ingestion Pathways

1. Daily Delta Bulk Updates (per data source)

  • FB user profile
  • AdEnv (device, os, placement)
  • Friend connection (of page, group, application and event)
  • Location and Geo info
  • Interests
  • Instagram info

2. Real-time Updates (Dispatchers per data source)

  • Custom audience change dispatcher
  • Partner category change dispatcher
  • Look alike change dispatcher