Building an Audience Estimation System
Cocoon Cocoon is a User-Count Estimating Query Engine on top of ZDB-On-Flash. This system is an integral part of Ads Infra at Meta serving 40K qps under 100ms P90 latency. How does Cocoon store its data Columnar Data Stored in Ordered Format UserID Format How is the Data Stored in ZDB Multiple UserIds per Row: Each ZDB Row Value has multiple UserIds in it. Optional Attribute(s). Efficient Fetching. Static Ranking: Static Rank of UserIds stored for even distribution. Bi-Directional. Compression: Compressed using Elias Fano Format. Quasi-Succinct. ~ Theoretical Best. Sorting: ZDB Keys are sorted. Static Ranked UserIds are sorted across ZDB Keys and also, within each ZDB Value. How is the Query Done Round 1: Fetch 0th Bucket for each term in Query. Round 2 (Optional): Fetch more buckets to improve estimate. Estimation & Extrapolation Query-Time Sampling (Not Data-Load Time Sampling) Sampling of Data at Load Time leads to very poor estimation. Hard to determine how much to sample. We load all the data into Store. At Query Time, we determine how much we want to read till we are confident of a good estimate. Extrapolation In the int64 space, we determine what value we have reached at and what the hit count is. We then extrapolate. Srank-based ordering removes biases based on how userIds are created. Horizontal Scaling The data to process at query time is still a lot. Horizontal Scaling to the rescue! Divide the data and have parallel processing done on it using a TW Tier. Leaf Layer 32 Leafs per tier. Each Leaf reads only its own partition of data. 2 Leafs can never read the same data… Ever. 32 Data Partitions exist. Aggregator Layer Parses Query Throttles Callers Fans out query to Leafs Handles Leaf Fail Over System Architecture Diagram +-------------------------------------------------------------+ +-------------------------+ | [Aggregator / Indexer Tiers] | | Graph API | | +------------------+ +------------------+ +--------------+ | | +---------------------+ | | | | | | | | |====>| |Cocoon PHP Component | | | | Aggregator | | Aggregator | | Aggregator | | | +---------------------+ | | | || | | || | | || | | +------------^------------+ | | Indexer | | Indexer | | Indexer | | | | +--------^---------+ +--------^---------+ +------^-------+ | +------------v------------+ +-----------|--------------------|------------------|----------+ | Supported Features | | | | | - Reach & Frequency | +-----------v--------------------v------------------v----------+ | - Reach Estimate | | ZippyDB | | - Bid Suggestion | | +----------------+ +----------------+ | | - Outcome prediction | | | ZDB node PRN |<========>| ADB node LLA | | | - Pacing | | +-------^--------+ +--------^-------+ | | - Analysis | | | | | | - Audience Insights | | +-----------+ +-----------+ | +-------------------------+ | | | | | +---v----v---+ | | |ZDB node ATN| | | +------------+ | +--------------------------------------------------------------+ Cocoon Ingestion Pathways 1. Daily Delta Bulk Updates (per data source) FB user profile AdEnv (device, os, placement) Friend connection (of page, group, application and event) Location and Geo info Interests Instagram info 2. Real-time Updates (Dispatchers per data source) Custom audience change dispatcher Partner category change dispatcher Look alike change dispatcher