Abhishek Shah

I am a software engineer driven by a passion for designing and scaling infrastructure that solves complex, real-world problems. Throughout my career, I have specialized in building high-performance, large-scale systems.

Led teams to design and build the AI Virtualization layer for compute and networking at Coupang.
Run the AI Orchestration team responsible for AI/ML workloads on Kubernetes, using the open-source KAI Scheduler, which supports quotas and limits along with gang scheduling.
Led teams for big-data processing of ads data at Meta.
Designed and built the CDN control plane at Netflix.
Delivered a platform for optimizing supply-chain allocation for Walmart, using mixed-integer programming with Gurobi.

I thrive on taking nebulous, unprecedented technical challenges and translating them into robust, high-impact business solutions. For me, the true fulfillment of engineering lies not only in delivering world-class software but also in fostering a culture of technical excellence and mentorship where engineers grow together. I am constantly seeking the next ambitious challenge that demands deep innovation and drives meaningful business impact.

Building an Audience Estimation System

Cocoon

Cocoon is a user-count estimating query engine built on top of ZDB-on-Flash. It is an integral part of Ads Infra at Meta, serving 40K QPS with P90 latency under 100 ms.

How Cocoon Stores Its Data

Columnar data, stored in an ordered UserID format.

How the Data Is Stored in ZDB

- Multiple UserIds per row: each ZDB row value holds multiple UserIds, with optional attribute(s), for efficient fetching.
- Static ranking: a static rank (srank) of each UserId is stored for even distribution. Bi-directional.
- Compression: compressed using the Elias-Fano format, which is quasi-succinct (close to the theoretical best).
- Sorting: ZDB keys are sorted. Static-ranked UserIds are sorted across ZDB keys and also within each ZDB value.

How a Query Is Run

- Round 1: fetch the 0th bucket for each term in the query.
- Round 2 (optional): fetch more buckets to improve the estimate.

Estimation & Extrapolation

Query-time sampling (not data-load-time sampling)

- Sampling the data at load time leads to very poor estimates. It is hard to determine how much to sample.
- We load all the data into the store.
- At query time, we decide how much to read until we are confident of a good estimate.

Extrapolation

- In the int64 space, we determine what value we have reached and what the hit count is. We then extrapolate.
- Srank-based ordering removes biases that come from how UserIds are created.

Horizontal Scaling

The data to process at query time is still large, so we divide it and process the parts in parallel using a TW tier.

Leaf Layer

- 32 leaves per tier.
- Each leaf reads only its own partition of the data. No two leaves ever read the same data.
- 32 data partitions exist.
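The extrapolation step described under Estimation & Extrapolation can be illustrated with a minimal sketch. It is not Cocoon's implementation: it assumes sranks are spread uniformly over the non-negative int64 range, that each query term yields a sorted prefix of sranks (the buckets read so far), and that the query is an AND across terms. The names `SRANK_SPACE`, `estimate_intersection`, and `demo` are hypothetical.

```python
import random

# Assumption: sranks are uniformly distributed over the non-negative int64 range.
SRANK_SPACE = 2**63


def estimate_intersection(prefixes, space=SRANK_SPACE):
    """Estimate how many users match every term, given sorted srank prefixes."""
    if any(not p for p in prefixes):
        return 0.0
    # Each prefix is complete only up to its last srank, so the smallest of
    # those values marks how far into the srank space all terms have been read.
    reached = min(p[-1] for p in prefixes)
    common = set(prefixes[0])
    for p in prefixes[1:]:
        common &= set(p)
    hits = sum(1 for s in common if s <= reached)
    # Extrapolate the hit density seen in [0, reached] to the whole space.
    return hits * space / (reached + 1)


def demo(seed=7, users=20_000, prefix_len=2_000):
    """Compare the estimate with the exact count on synthetic data."""
    rng = random.Random(seed)
    sranks = [rng.randrange(SRANK_SPACE) for _ in range(users)]
    term_a = sorted(s for s in sranks if rng.random() < 0.5)
    term_b = sorted(s for s in sranks if rng.random() < 0.5)
    exact = len(set(term_a) & set(term_b))
    estimate = estimate_intersection([term_a[:prefix_len], term_b[:prefix_len]])
    return exact, estimate


if __name__ == "__main__":
    exact, estimate = demo()
    print(f"exact={exact} estimate={estimate:.0f}")
```

Reading a longer prefix (more buckets) raises the hit count and tightens the estimate, which is the trade-off the optional second round is making.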
Aggregator Layer

- Parses the query.
- Throttles callers.
- Fans out the query to the leaves.
- Handles leaf failover.

System Architecture Diagram

+--------------------------------------------------------------+     +-------------------------+
|                 [Aggregator / Indexer Tiers]                 |     |        Graph API        |
|  +------------------+ +------------------+ +--------------+  |     | +---------------------+ |
|  |                  | |                  | |              |  |====>| |Cocoon PHP Component | |
|  |    Aggregator    | |    Aggregator    | |  Aggregator  |  |     | +---------------------+ |
|  |        ||        | |        ||        | |      ||      |  |     +------------^------------+
|  |     Indexer      | |     Indexer      | |   Indexer    |  |                  |
|  +--------^---------+ +--------^---------+ +------^-------+  |     +------------v------------+
+-----------|--------------------|------------------|----------+     |   Supported Features    |
            |                    |                  |                |  - Reach & Frequency    |
+-----------v--------------------v------------------v----------+     |  - Reach Estimate       |
|                           ZippyDB                            |     |  - Bid Suggestion       |
|        +----------------+          +----------------+        |     |  - Outcome prediction   |
|        |  ZDB node PRN  |<========>|  ADB node LLA  |        |     |  - Pacing               |
|        +-------^--------+          +--------^-------+        |     |  - Analysis             |
|                |                            |                |     |  - Audience Insights    |
|                +-----------+    +-----------+                |     +-------------------------+
|                            |    |                            |
|                        +---v----v---+                        |
|                        |ZDB node ATN|                        |
|                        +------------+                        |
+--------------------------------------------------------------+

Cocoon Ingestion Pathways

1. Daily delta bulk updates (per data source)
   - FB user profile
   - AdEnv (device, OS, placement)
   - Friend connection (of page, group, application, and event)
   - Location and geo info
   - Interests
   - Instagram info
2. Real-time updates (dispatchers per data source)
   - Custom audience change dispatcher
   - Partner category change dispatcher
   - Lookalike change dispatcher

Building Our Own CDN

Brief History

- Pre-2011: business contracts were maintained with external CDN companies.
- Traffic allocation: each external CDN was guaranteed a stable share of the streaming traffic.
- Simplicity of steering: all of the video CDN steering logic fit entirely within 5 Java classes, ~60 lines of code.
- Scale: handled one third of peak internet traffic in the US.
- Strategic shift: a decision was made to build an in-house CDN.

Predictable Viewing Patterns

The foundational architecture of this custom CDN relies on a predictable data distribution rule: a small fraction of content generates the bulk of internet traffic. ...

Load Distribution

How We Fixed Non-Uniform Load Distribution

Problem Statement

We encountered a load-balancing issue between a stateless API layer (~50 nodes) and a 5-node CockroachDB cluster. Each API node randomly sampled a subset of 3 database instances to establish its connection pool. This configuration assumed that uniform randomization at the client level would translate to an even distribution across the storage tier, but it introduced a significant systemic imbalance. The aggregate traffic pattern created localized hotspots, causing disproportionate resource consumption on specific database nodes while leaving available cluster capacity underutilized. ...
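One way to see how independent random choices per client can add up unevenly is a small simulation of the setup in the problem statement. This is only a sketch under simplifying assumptions: every API node sends the same amount of traffic and spreads it evenly over its 3 chosen instances, so the number of API nodes attached to each database node stands in for its load. The function name `simulate_pool_counts` and its parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_pool_counts(api_nodes=50, db_nodes=5, pool_size=3, seed=0):
    """Count how many API nodes end up connected to each database node."""
    rng = random.Random(seed)
    counts = Counter({db: 0 for db in range(db_nodes)})
    for _ in range(api_nodes):
        # Each API node independently picks pool_size distinct database nodes.
        for db in rng.sample(range(db_nodes), pool_size):
            counts[db] += 1
    return counts


if __name__ == "__main__":
    counts = simulate_pool_counts()
    # A perfectly even spread would be api_nodes * pool_size / db_nodes per node.
    ideal = 50 * 3 / 5
    for db, n in sorted(counts.items()):
        print(f"db{db}: {n} API nodes (ideal {ideal:.0f})")
    print("max/min ratio:", max(counts.values()) / max(1, min(counts.values())))
```

Running it with different seeds shows the per-node counts scattering around the ideal of 30 rather than landing on it, since nothing coordinates the choices made by different API nodes.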