Cocoon
Cocoon is a User-Count Estimating Query Engine on top of ZDB-On-Flash. This system is an integral part of Ads Infra at Meta serving 40K qps under 100ms P90 latency.

How does Cocoon store its data
- Columnar Data Stored in Ordered Format
- UserID Format
How is the Data Stored in ZDB
- Multiple UserIds per Row: Each ZDB Row Value has multiple UserIds in it. Optional Attribute(s). Efficient Fetching.
- Static Ranking: Static Rank of UserIds stored for even distribution. Bi-Directional.
- Compression: Compressed using Elias Fano Format. Quasi-Succinct. ~ Theoretical Best.
- Sorting: ZDB Keys are sorted. Static Ranked UserIds are sorted across ZDB Keys and also, within each ZDB Value.
How is the Query Done
- Round 1: Fetch 0th Bucket for each term in Query.
- Round 2 (Optional): Fetch more buckets to improve estimate.
Estimation & Extrapolation
Query-Time Sampling (Not Data-Load Time Sampling)
- Sampling of Data at Load Time leads to very poor estimation. Hard to determine how much to sample.
- We load all the data into Store. At Query Time, we determine how much we want to read till we are confident of a good estimate.
Extrapolation
- In the
int64space, we determine what value we have reached at and what the hit count is. - We then extrapolate.
- Srank-based ordering removes biases based on how userIds are created.
Horizontal Scaling
- The data to process at query time is still a lot.
- Horizontal Scaling to the rescue! Divide the data and have parallel processing done on it using a TW Tier.
Leaf Layer
- 32 Leafs per tier.
- Each Leaf reads only its own partition of data.
- 2 Leafs can never read the same data… Ever.
- 32 Data Partitions exist.
Aggregator Layer
- Parses Query
- Throttles Callers
- Fans out query to Leafs
- Handles Leaf Fail Over
System Architecture Diagram
+-------------------------------------------------------------+ +-------------------------+
| [Aggregator / Indexer Tiers] | | Graph API |
| +------------------+ +------------------+ +--------------+ | | +---------------------+ |
| | | | | | | |====>| |Cocoon PHP Component | |
| | Aggregator | | Aggregator | | Aggregator | | | +---------------------+ |
| | || | | || | | || | | +------------^------------+
| | Indexer | | Indexer | | Indexer | | |
| +--------^---------+ +--------^---------+ +------^-------+ | +------------v------------+
+-----------|--------------------|------------------|----------+ | Supported Features |
| | | | - Reach & Frequency |
+-----------v--------------------v------------------v----------+ | - Reach Estimate |
| ZippyDB | | - Bid Suggestion |
| +----------------+ +----------------+ | | - Outcome prediction |
| | ZDB node PRN |<========>| ADB node LLA | | | - Pacing |
| +-------^--------+ +--------^-------+ | | - Analysis |
| | | | | - Audience Insights |
| +-----------+ +-----------+ | +-------------------------+
| | | |
| +---v----v---+ |
| |ZDB node ATN| |
| +------------+ |
+--------------------------------------------------------------+
Cocoon Ingestion Pathways
1. Daily Delta Bulk Updates (per data source)
- FB user profile
- AdEnv (device, os, placement)
- Friend connection (of page, group, application and event)
- Location and Geo info
- Interests
- Instagram info
2. Real-time Updates (Dispatchers per data source)
- Custom audience change dispatcher
- Partner category change dispatcher
- Look alike change dispatcher