[{"content":"Cocoon Cocoon is a User-Count Estimating Query Engine on top of ZDB-On-Flash. This system is an integral part of Ads Infra at Meta serving 40K qps under 100ms P90 latency.\nHow does Cocoon store its data Columnar Data Stored in Ordered Format UserID Format How is the Data Stored in ZDB Multiple UserIds per Row: Each ZDB Row Value has multiple UserIds in it. Optional Attribute(s). Efficient Fetching. Static Ranking: Static Rank of UserIds stored for even distribution. Bi-Directional. Compression: Compressed using Elias Fano Format. Quasi-Succinct. ~ Theoretical Best. Sorting: ZDB Keys are sorted. Static Ranked UserIds are sorted across ZDB Keys and also, within each ZDB Value. How is the Query Done Round 1: Fetch 0th Bucket for each term in Query. Round 2 (Optional): Fetch more buckets to improve estimate. Estimation \u0026amp; Extrapolation Query-Time Sampling (Not Data-Load Time Sampling) Sampling of Data at Load Time leads to very poor estimation. Hard to determine how much to sample. We load all the data into Store. At Query Time, we determine how much we want to read till we are confident of a good estimate. Extrapolation In the int64 space, we determine what value we have reached at and what the hit count is. We then extrapolate. Srank-based ordering removes biases based on how userIds are created. Horizontal Scaling The data to process at query time is still a lot. Horizontal Scaling to the rescue! Divide the data and have parallel processing done on it using a TW Tier. Leaf Layer 32 Leafs per tier. Each Leaf reads only its own partition of data. 2 Leafs can never read the same data\u0026hellip; Ever. 32 Data Partitions exist. 
Aggregator Layer Parses Query Throttles Callers Fans out query to Leafs Handles Leaf Fail Over System Architecture Diagram +-------------------------------------------------------------+ +-------------------------+ | [Aggregator / Indexer Tiers] | | Graph API | | +------------------+ +------------------+ +--------------+ | | +---------------------+ | | | | | | | | |====\u0026gt;| |Cocoon PHP Component | | | | Aggregator | | Aggregator | | Aggregator | | | +---------------------+ | | | || | | || | | || | | +------------^------------+ | | Indexer | | Indexer | | Indexer | | | | +--------^---------+ +--------^---------+ +------^-------+ | +------------v------------+ +-----------|--------------------|------------------|----------+ | Supported Features | | | | | - Reach \u0026amp; Frequency | +-----------v--------------------v------------------v----------+ | - Reach Estimate | | ZippyDB | | - Bid Suggestion | | +----------------+ +----------------+ | | - Outcome prediction | | | ZDB node PRN |\u0026lt;========\u0026gt;| ADB node LLA | | | - Pacing | | +-------^--------+ +--------^-------+ | | - Analysis | | | | | | - Audience Insights | | +-----------+ +-----------+ | +-------------------------+ | | | | | +---v----v---+ | | |ZDB node ATN| | | +------------+ | +--------------------------------------------------------------+ Cocoon Ingestion Pathways 1. Daily Delta Bulk Updates (per data source) FB user profile AdEnv (device, os, placement) Friend connection (of page, group, application and event) Location and Geo info Interests Instagram info 2. 
Real-time Updates (Dispatchers per data source) Custom audience change dispatcher Partner category change dispatcher Look alike change dispatcher ","permalink":"/posts/metacocoon/","summary":"\u003ch2 id=\"cocoon\"\u003eCocoon\u003c/h2\u003e\n\u003cp\u003eCocoon is a User-Count Estimating Query Engine on top of ZDB-On-Flash.\nThis system is an integral part of Ads Infra at Meta serving 40K qps under 100ms P90 latency.\u003c/p\u003e\n\u003cp\u003e\u003cimg alt=\"img\" loading=\"lazy\" src=\"../../assets/images/meta_cocoon.png\"\u003e\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"how-does-cocoon-store-its-data\"\u003eHow does Cocoon store its data\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eColumnar Data Stored in Ordered Format\u003c/li\u003e\n\u003cli\u003eUserID Format\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch2 id=\"how-is-the-data-stored-in-zdb\"\u003eHow is the Data Stored in ZDB\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eMultiple UserIds per Row:\u003c/strong\u003e Each ZDB Row Value has multiple UserIds in it. Optional Attribute(s). Efficient Fetching.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eStatic Ranking:\u003c/strong\u003e Static Rank of UserIds stored for even distribution. Bi-Directional.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eCompression:\u003c/strong\u003e Compressed using Elias Fano Format. Quasi-Succinct. ~ Theoretical Best.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eSorting:\u003c/strong\u003e ZDB Keys are sorted. 
Static Ranked UserIds are sorted across ZDB Keys and also, within each ZDB Value.\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch2 id=\"how-is-the-query-done\"\u003eHow is the Query Done\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003eRound 1:\u003c/strong\u003e Fetch 0th Bucket for each term in Query.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRound 2 (Optional):\u003c/strong\u003e Fetch more buckets to improve estimate.\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch2 id=\"estimation--extrapolation\"\u003eEstimation \u0026amp; Extrapolation\u003c/h2\u003e\n\u003ch3 id=\"query-time-sampling-not-data-load-time-sampling\"\u003eQuery-Time Sampling (Not Data-Load Time Sampling)\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eSampling of Data at Load Time leads to very poor estimation. Hard to determine how much to sample.\u003c/li\u003e\n\u003cli\u003eWe load all the data into Store. At Query Time, we determine how much we want to read till we are confident of a good estimate.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"extrapolation\"\u003eExtrapolation\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eIn the \u003ccode\u003eint64\u003c/code\u003e space, we determine what value we have reached at and what the hit count is.\u003c/li\u003e\n\u003cli\u003eWe then extrapolate.\u003c/li\u003e\n\u003cli\u003eSrank-based ordering removes biases based on how userIds are created.\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch2 id=\"horizontal-scaling\"\u003eHorizontal Scaling\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003eThe data to process at query time is still a lot.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eHorizontal Scaling to the rescue!\u003c/strong\u003e Divide the data and have parallel processing done on it using a TW Tier.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"leaf-layer\"\u003eLeaf Layer\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003e32 Leafs per tier.\u003c/li\u003e\n\u003cli\u003eEach Leaf reads only 
its own partition of data.\u003c/li\u003e\n\u003cli\u003e2 Leafs can never read the same data\u0026hellip; Ever.\u003c/li\u003e\n\u003cli\u003e32 Data Partitions exist.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3 id=\"aggregator-layer\"\u003eAggregator Layer\u003c/h3\u003e\n\u003cul\u003e\n\u003cli\u003eParses Query\u003c/li\u003e\n\u003cli\u003eThrottles Callers\u003c/li\u003e\n\u003cli\u003eFans out query to Leafs\u003c/li\u003e\n\u003cli\u003eHandles Leaf Fail Over\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2 id=\"system-architecture-diagram\"\u003eSystem Architecture Diagram\u003c/h2\u003e\n\u003cpre tabindex=\"0\"\u003e\u003ccode\u003e+-------------------------------------------------------------+     +-------------------------+\n|                  [Aggregator / Indexer Tiers]               |     |       Graph API         |\n|  +------------------+ +------------------+ +--------------+  |     | +---------------------+ |\n|  |                  | |                  | |              |  |====\u0026gt;| |Cocoon PHP Component | |\n|  |   Aggregator     | |   Aggregator     | |  Aggregator  |  |     | +---------------------+ |\n|  |       ||         | |       ||         | |      ||      |  |     +------------^------------+\n|  |    Indexer       | |    Indexer       | |   Indexer    |  |                  |\n|  +--------^---------+ +--------^---------+ +------^-------+  |     +------------v------------+\n+-----------|--------------------|------------------|----------+     |    Supported Features   |\n            |                    |                  |                | - Reach \u0026amp; Frequency     |\n+-----------v--------------------v------------------v----------+     | - Reach Estimate        |\n|                          ZippyDB                             |     | - Bid Suggestion        |\n|    +----------------+          +----------------+            |     | - Outcome prediction    |\n|    | ZDB node PRN   |\u0026lt;========\u0026gt;|  ADB node LLA  |            | 
    | - Pacing                |\n|    +-------^--------+          +--------^-------+            |     | - Analysis              |\n|            |                            |                    |     | - Audience Insights     |\n|            +-----------+    +-----------+                    |     +-------------------------+\n|                        |    |                                |\n|                    +---v----v---+                            |\n|                    |ZDB node ATN|                            |\n|                    +------------+                            |\n+--------------------------------------------------------------+\n\u003c/code\u003e\u003c/pre\u003e\u003ch3 id=\"cocoon-ingestion-pathways\"\u003eCocoon Ingestion Pathways\u003c/h3\u003e\n\u003ch4 id=\"1-daily-delta-bulk-updates-per-data-source\"\u003e1. Daily Delta Bulk Updates (per data source)\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003eFB user profile\u003c/li\u003e\n\u003cli\u003eAdEnv (device, os, placement)\u003c/li\u003e\n\u003cli\u003eFriend connection (of page, group, application and event)\u003c/li\u003e\n\u003cli\u003eLocation and Geo info\u003c/li\u003e\n\u003cli\u003eInterests\u003c/li\u003e\n\u003cli\u003eInstagram info\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch4 id=\"2-real-time-updates-dispatchers-per-data-source\"\u003e2. Real-time Updates (Dispatchers per data source)\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003eCustom audience change dispatcher\u003c/li\u003e\n\u003cli\u003ePartner category change dispatcher\u003c/li\u003e\n\u003cli\u003eLook alike change dispatcher\u003c/li\u003e\n\u003c/ul\u003e","title":"Building an Audience Estimation System"},{"content":"Building Our Own CDN Brief History Pre-2011: Business contracts were maintained with external CDN companies. Traffic Allocation: Guaranteed each external CDN a stable share of the streaming traffic. 
Simplicity of Steering: All of the Video CDN Steering logic fit entirely within: 5 Java Classes ~60 lines of code Scale: Handled one-third of peak internet traffic in the US. Strategic Shift: A decision was made to build an in-house CDN. Predictable Viewing Patterns The foundational architecture of this custom CDN system relies on a predictable data distribution rule: a small fraction of content generates the bulk of internet traffic.\nBytes Streamed ^ | * | * | * | * | * * * * * * * * * * * * * +-----------------------------------\u0026gt; Content Rank Not a Traditional CDN Unlike standard CDNs that use on-demand caching mechanisms, this system employs unique optimization policies:\nPredetermined Placing: Proactive content placement instead of reactive on-demand caching. Precise Traffic Steering: Steers customers precisely to specific machines at Internet Exchanges (IXs) \u0026amp; Internet Service Providers (ISPs). No Global Replication: There is no guarantee that every machine has every file available. Popularity Computation: Utilizes highly granular and accurate popularity computations, leading to significant efficiency gains. Serving Machine Selection Process Network Proximity: Pick machines that are routable and close based on route prefixes and network cost. Health and Metrics: Filter for machines that are stable over time, healthy, capable of taking more traffic, and dynamic enough to respond to major changes in popularity. Content Availability: Select machines that explicitly hold the requested content. Machines on the Internet The CDN categorizes infrastructure deployment locations into distinct topological layers:\nMachine Location Layer Network Characteristics \u0026amp; Routing Content Strategy Embedded in ISP Closest to the ISP\u0026rsquo;s customers from a network topology perspective. Likely not routable from other ISPs. May not house all content. 
Internet Exchange (IXP) - Direct Peering Direct peering established between the ISP Data Center and the IXP. May not be routable from other ISPs. - Internet Exchange (IX) - Transit Requires transit routing between the IX and the ISP Data Center. Costs extra money ($\\$\\$\\$) for the transit link. Widely routable. Highly useful as a final fallback and holds the full content catalog. Architecture v1 The system coordinates communication between edge content serving units, state databases, and customer playback endpoints:\nContent Serving Machines update file location snapshots hourly. The Cache API receives these locations and saves them into a distributed Cassandra database. A Snapshot Generator periodically polls file locations from Cassandra to generate a unified cache index, which it writes as a snapshot to Amazon S3. On the consumer side, client Devices interact with the Playback API. The Serving Machine Picker regularly reads the pre-compiled snapshots from S3 to optimally route playback traffic. Placing Content on Machines Popularity Curve Strategy Geographic Aggregation: Popularity is computed at a country level (or broader regional scopes). Many distinct clusters inside a given country subscribe to the exact same country-level popularity feed. Smoothing Filter: Curves are mathematically smoothed over N past days using an Exponential Weighted Moving Average (EWMA) formula. Hardware Tiering: Highly popular content is explicitly placed on fast Flash/SSD arrays, while the remainder resides on high-capacity Hard Disk Drives (HDDs) to prevent overloading HDD hardware limitations. The Impact of Prediction Misses When mapping actual storage allocation against predicted offload performance, discrepancies surface:\nPredicted Performance: 400 TiB allocated storage was expected to hit a 99.6% network traffic offload rate. Actual Performance: 400 TiB allocated storage achieved only a 96% network traffic offload rate. Does a Small 3.6% Prediction Miss Matter? 
Yes, it matters profoundly due to system leverage effects. Consider a cluster processing 5 Tbps of total edge traffic:\nPredicted HDD load: 0.4% of 5 Tbps = 20 Gbps (expected to be served from HDD storage). Actual HDD load: 4% of 5 Tbps = 200 Gbps (actual traffic demanded from HDD storage due to the misprediction). Hardware Limit: The cluster\u0026rsquo;s total HDD capability = 15 machines × 20 HDDs × 0.5 Gbps = 150 Gbps. Consequences of the Bottleneck Traffic Loss: The HDDs are asked to deliver 200 Gbps but max out at 150 Gbps. This results in losing 50 Gbps of total traffic due to being strictly throttled by HDD read speeds. Thermal \u0026amp; Hardware Strain: Edge machines run hot because hard drives are pushed to peak capacity continuously, tipping the system over safe thresholds. Under-utilization: Expensive, high-performance SSD infrastructure remains under-utilized. QoS Degradation: Quality of Service (QoS) degrades significantly for all customers streaming content from hard disks. Financial Costs: Diverting massive chunks of traffic to distant alternative clusters incurs substantial external transit monetary costs ($$$$) and threatens to saturate ISP links. Mitigation To resolve this widespread systemic issue, the architecture transitioned to Granular, Per-Cluster Prediction modeling instead of broad country-level aggregations.\nHousekeeping on Content Serving Machines Content Allocation Minimizing Churn: As day-over-day popularity queues shift, content allocation strategies must prevent heavy internal churn on machines within a cluster. Algorithmic Routing: Consistent Hashing is employed to cleanly distribute files to target cluster nodes. Content Deletion Soft Deletion: When a file is scheduled for removal, it is initially \u0026ldquo;soft-deleted\u0026rdquo;. This buffer seamlessly accommodates state propagation delays across the network. 
Content Fill Time Off-Peak Operations: Edge machines perform content hydration/cache filling exclusively during off-peak hours. Cost Efficiency: Since network bandwidth pricing models calculate charges using 95th percentile traffic metrics, scheduling cache fills during low-use windows prevents driving up operational bills. Architectural Enhancements (v2) To achieve per-cluster optimizations and sub-hour state synchronization, enhanced real-time streaming tools were introduced into the data control plane:\nCache Tracking: Caches report file locations via the Cache API. Message Streaming: Updates pass into Kafka (equipped with MirrorMaker) for durable event streaming. Stream Processing: Apache Flink ingests Kafka events, outputting state data through structured checkpoints directly to Amazon S3. High-Performance Query Layer: Flink keeps a low-latency index stored in MemCache (backed by persistent disk storage). Device Playback: Consumer Devices ping the Playback API, allowing the Cache Picker to execute ultra-fast lookups against MemCache instead of pulling heavy file snapshots over S3. Media Ingestion: A Media Pipeline feeds directly into a Liveness Generator, which actively communicates with MemCache using routine full scans to guarantee global catalog accuracy. 
","permalink":"/posts/netflixcdn/","summary":"\u003ch1 id=\"building-our-own-cdn\"\u003eBuilding Our Own CDN\u003c/h1\u003e\n\u003ch2 id=\"brief-history\"\u003eBrief History\u003c/h2\u003e\n\u003cul\u003e\n\u003cli\u003e\u003cstrong\u003ePre-2011:\u003c/strong\u003e Business contracts were maintained with external CDN companies.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eTraffic Allocation:\u003c/strong\u003e Guaranteed each external CDN a stable share of the streaming traffic.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eSimplicity of Steering:\u003c/strong\u003e All of the Video CDN Steering logic fit entirely within:\n\u003cul\u003e\n\u003cli\u003e5 Java Classes\u003c/li\u003e\n\u003cli\u003e~60 lines of code\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eScale:\u003c/strong\u003e Handled 1/3 rd of peak internet traffic in the US.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eStrategic Shift:\u003c/strong\u003e A decision was made to build an in-house CDN.\u003c/li\u003e\n\u003c/ul\u003e\n\u003chr\u003e\n\u003ch2 id=\"predictable-viewing-patterns\"\u003ePredictable Viewing Patterns\u003c/h2\u003e\n\u003cp\u003eThe foundational architecture of this custom CDN system relies on a predictable data distribution rule: \u003cstrong\u003ea small fraction of content generates the bulk of internet traffic\u003c/strong\u003e.\u003c/p\u003e","title":"Building Our Own CDN"},{"content":"Maya Platform at Coupang The Maya platform virtualizes compute and network for our tenants on shared physical infrastructuire. This is currently a major in-flight program with dynamic tenancy on the bare-metal path. Tenants self-serve VMs (with GPU passthrough) or bare-metal instances; instances in the same nodepool share a tenant VPC (private network) and base image. 
Conceptually similar to AWS VPC (on the Nitro substrate) plus AWS Managed Node Groups, but built for our internal fleet.\nCortex Platform at Coupang I\u0026rsquo;m the technical owner of Cortex, our custom Kubernetes platform for AI/ML workloads. It is a multi-tenant solution where our internal teams are offered resource guarantees within shared Kubernetes clusters.\nWe replaced the default kube-scheduler. Cortex\u0026rsquo;s capability surface covers the full set: transactional gang scheduling for distributed training, fractional GPU allocation (0.5/0.25 shares), hierarchical parent-child queues for multi-tenant fair-share, and an iterative scenario-based preemption solver.\nAPI Layer at Coupang Our AI teams had to deal with different Kubernetes primitives to get Notebooks and Inference apps working. We built a meta-CRD that allows the API team to define new app types. This allows ML engineers to define their apps with a single YAML file. Patent filed on a related CRD primitive (CompositeApplication).\n","permalink":"/about/","summary":"\u003ch2 id=\"maya-platform-at-coupang\"\u003eMaya Platform at Coupang\u003c/h2\u003e\n\u003cp\u003eThe Maya platform virtualizes compute and network for our tenants on shared physical infrastructure.\nThis is currently a major in-flight program with dynamic tenancy on the bare-metal path. Tenants self-serve VMs (with GPU passthrough) or bare-metal instances; instances in the same nodepool share a tenant VPC (private network) and base image. Conceptually similar to AWS VPC (on the Nitro substrate) plus AWS Managed Node Groups, but built for our internal fleet.\u003c/p\u003e","title":""},{"content":"ABHISHEK SHAH Sunnyvale, CA | (650) 630-9280 | shahabhishek@gmail.com | LinkedIn: /in/abhishekshah\nSUMMARY Senior engineering leader with 25+ years architecting and operating cloud compute platforms at hyperscale. 
Currently Senior Director at Coupang Intelligent Cloud (CIC), leading three engineering organizations — API Platform, Kubernetes Platform, and Virtualization — that together deliver Coupang\u0026rsquo;s internal GPU cloud and broader compute substrate. Lead Software Architect of CIC, founding member of the program, technical owner of CIC\u0026rsquo;s Kubernetes platform (Cortex) and the CompositeApplication CRD primitive (3 patents filed). Prior tenures at Netflix (designed and led the OpenConnect CDN control plane), Facebook/Meta (led the ad-audience platform processing a trillion updates per day), Google (built the original Kubernetes L4 SDN and DNS — code paths still running in every Kubernetes cluster today), and Roblox (next-gen pub/sub at 5M msgs/sec). Operate with influence across software, infrastructure, security, and partner-engineering organizations; partner-engineering relationships at the architecture level with NVIDIA, AWS, and Run:AI. Coupang Bar Raiser for senior and principal hiring.\nCORE EXPERTISE GPU \u0026amp; AI Compute: Multi-tenant GPU-as-a-Service (H200, B200), distributed training and inference substrate, fractional GPU allocation, NVIDIA partner-engineering, Run:AI integration and federation, NCCL/InfiniBand topology awareness, bare-metal GPU isolation.\nKubernetes \u0026amp; Container Platforms: CRD primitives, multi-tenant isolation, container runtime, pod lifecycle, CNI networking, node OS, image distribution. 
Original author of Kubernetes L4 SDN and DNS at Google.\nCloud at Scale: 150+ AWS EKS clusters, 5K+ compute nodes, multi-AZ HA/DR, static-stability architecture, 99.99–99.999% availability, Blue-Green deployment via Temporal, capacity planning for peak events.\nDistributed Systems Foundations: CDN control planes (Netflix OpenConnect at 1/3 of US peak internet traffic), trillion-updates/day ad-audience platforms (Meta), pub/sub at 5M msgs/sec (Roblox), petabyte-scale live data migrations with zero downtime.\nLeadership: 40-engineer organization across US, Korea, India, Shanghai; three teams (API, Kubernetes, Virtualization); written-context leadership style; cross-functional alignment through design authority; Bar Raiser for principal hires.\nLanguages: Go, Java, C/C++, Python.\nPROFESSIONAL EXPERIENCE Senior Director, Coupang Intelligent Cloud (CIC) Mountain View, CA | Oct 2022 – Present\nLead engineering for Coupang Intelligent AI Cloud — Coupang\u0026rsquo;s internal GPU and AI compute platform and the broader compute substrate underneath it. Manage three engineering teams (API Platform, Kubernetes Platform, Virtualization) totaling roughly 40 engineers distributed across the US, Korea, India, and Shanghai. Also serve as the Lead Software Architect for CIC and the platform-level technical owner across the three teams.\nFounding architecture and delivery of CIC GPU Cloud. Founding member of the CIC program; shipped multi-AZ, multi-SKU GPU provisioning in a single quarter (Q1 2025). CIC now supports H200 and B200 GPU SKUs end-to-end through a unified declarative API surface. Software-defined, API-driven cloud model replaces ticket-based provisioning; resource requests reconcile through controllers with no human handoff in the customer path.\nNVIDIA partner-engineering at architecture level. 
Drove design discussions with NVIDIA\u0026rsquo;s Run:AI team to ship a programmatic federation API for identity-system integration with CIC\u0026rsquo;s core identity infrastructure — replacing static credentials with token exchange. NVIDIA implemented and shipped the API; CIC\u0026rsquo;s security posture improved meaningfully. Resolved GPU scheduling integration issues directly with NVIDIA architects (zero GPU provisioning incidents post-resolution). Translated CIC requirements into NVIDIA Run:AI roadmap commitments with clear timelines.\nStatic-stability and Blue-Green for Coupang EKS Compute Platform. Introduced a static-stability architecture for CSP-tier services on the Coupang EKS Compute Platform, replacing manual zone-failure recovery with auto-healing backed by a single AWS Target Group model. Now standard across all CSP services; underpins Coupang\u0026rsquo;s 99.99% resiliency goal. Designed and implemented Blue-Green deployment on top of open-source Temporal workflow system, with rollback triggers, phased traffic rollouts, and health-based promotion gates.\nGalaxy → Coupang EKS Compute Platform migration. Led the migration of Galaxy — Coupang\u0026rsquo;s largest latency-sensitive application set — onto the Coupang EKS Compute Platform as part of company-wide compute centralization. Scaled the platform 2.5x its original compute footprint. Coordinated across S\u0026amp;D, CMG, Catalog, Tech Infrastructure, and Security organizations. Shipped with zero migration incidents, hitting Coupang\u0026rsquo;s 2024 compute-platform-centralization goal.\nAWS Idle-CPU root cause (doubled VM performance). Led the deep-systems investigation during the Galaxy migration that traced a serious performance regression on newer AWS VMs to a default Idle CPU configuration. Worked directly with AWS engineering leadership on the fix. Outcome: doubled performance, enabled migration to newer VM types, saved Coupang from paying ~10% more for slower compute. 
Invited to present the work at an AWS conference.\nOther cross-org programs: Drove iPhone Launch load-test resolution (critical AWS bug slowing EC2 and Kubernetes scaling). Designed/shipped multi-port multi-cluster ingress by patching the open-source AWS Load Balancer Controller. Drove the 2x Scale Test architecture for Coupang\u0026rsquo;s payment systems, including synthetic-data isolation safeguards that protected financial reporting integrity.\nOrg and culture. 40-engineer organization across four countries. Team leadership across API Platform, Kubernetes Platform, and Virtualization. Coupang Bar Raiser for senior and principal-level hiring across engineering and TPM.\nTechnical Director, Data Storage — Roblox San Mateo, CA | Nov 2021 – Oct 2022\nLed teams building next-generation data and caching infrastructure at hyperscale.\nDesigned a next-gen pub/sub system supporting 50K–100K subscribing processes and 5M messages/sec — the backbone for cache invalidation across Roblox\u0026rsquo;s fleet. Root-caused non-uniform load distribution on CockroachDB clusters (unintended binomial query distribution); significant hardware and license cost reduction. Deep systems-level debugging in production. Revamped backup architecture using Write-Once-Read-Many (WORM) integrity guarantees against ransomware. Technical Lead, API Systems \u0026amp; Streaming Infrastructure — Netflix Los Gatos, CA | Jan 2011 – Nov 2012 \u0026amp; Nov 2019 – Nov 2021\nTwo tenures across Netflix\u0026rsquo;s core infrastructure spanning OpenConnect CDN, APIs, and metadata.\nNetflix OpenConnect CDN control plane (founding architect): Designed and led the control plane interacting with thousands of edge and origin servers to optimally route Netflix streaming traffic — at peak, approximately one-third of US internet traffic. Predetermined-placement architecture against a predicted demand surface, rather than reactive caching. 
Three machine classes (ISP-embedded, IXP-peered, widely-routable fallback) with BGP-aware steering. BFF fanout optimization: Identified excessive fanout from Node.js BFF systems; reduced via batching. Savings: $1M+/year in infrastructure cost. Metacat → Apache Iceberg: Extended Netflix\u0026rsquo;s metadata platform to support Apache Iceberg; contributions merged at Netflix/metacat. Built the movie-stream URL generation library used for Netflix\u0026rsquo;s entire catalog. Designed a dynamic traffic-routing Rules Engine for partner-CDN/Netflix-CDN composition. Led SaaS provider evaluation for Netflix\u0026rsquo;s Gaming initiative. Technical Lead, Audience Infrastructure \u0026amp; Analytics — Facebook (Meta) Menlo Park, CA | Oct 2016 – Nov 2019\nLed the ad-audience platform processing a trillion updates per day across millions of audiences. Led the Audience Size Estimation System read-path: query-time sampling architecture on ZippyDB columnar substrate, symmetric bijective mapping for skew-corrected extrapolation, aggregator-leaf horizontal scaling (32 partitions), forward-index and FactTable constructs for high-cardinality and analytics queries. Operated at 40K QPS, p50 20ms / p90 100ms, 2 PB of storage across three regions. Directed a petabyte-scale live migration from HBase to RocksDB with added privacy controls and zero downtime across dependent systems. Technical Lead, Kubernetes — Google Mountain View, CA | Jul 2013 – Oct 2016\nCore contributor to Kubernetes\u0026rsquo; foundational networking layer during its formative years.\nCreated the software-defined networking layer for Kubernetes — the L4 SDN giving Kubernetes services stable virtual IPs backed by real endpoints. Code path still running in every Kubernetes cluster in production today. Built the Kubernetes DNS server for service discovery across microservices — one of the core primitives the Kubernetes data plane relies on. 
Earlier Roles Technical Lead, Smart Allocation — Walmart Labs (Nov 2012 – Jul 2013): Mixed-Integer-Programming-based supply-chain allocation models saving millions in shipping cost; concurrent simulator processing hundreds of millions of orders in minutes. Software Developer II, Visual Studio — Microsoft (Oct 2006 – Sep 2011): Custom virtualized WPF control rendering thousands of shapes in graphical designers; shipped as part of Visual Studio 2012 UML designers. Senior Software Engineer, Data Team — Visible Technologies (2010–2011): Hadoop/HBase map-reduce for author popularity ranking. Software Developer, Ordering Platform — Amazon (Dec 2004 – Oct 2006): Delivered RAPID, a Tier-1 partition-aware routing service core to Amazon.com ordering with high-availability and low-latency SLAs, serving US/Canada web traffic. Software Developer, R\u0026amp;D — Synygy (May 2002 – Dec 2004): Ported VC++ incentive-compensation code to SunOS; built SQL Server → Oracle data/schema migration tooling. PATENTS \u0026amp; PUBLICATIONS CompositeApplication CRD — Kubernetes custom resource for declarative multi-resource application composition (patent filed, Coupang). Open-source contributions: Netflix/metacat (Apache Iceberg support, merged); AWS Load Balancer Controller (multi-port, multi-cluster ingress patch, production at Coupang). EDUCATION M.S. 
Computer Engineering — University of Illinois, Chicago (Aug 2000 – May 2002) B.Tech — Veermata Jijabai Technological Institute (VJTI), Mumbai University (Jul 1996 – Jun 2000)\n","permalink":"/resume/","summary":"\u003ch1 id=\"abhishek-shah\"\u003eABHISHEK SHAH\u003c/h1\u003e\n\u003cp\u003eSunnyvale, CA  |  (650) 630-9280  |  \u003ca href=\"mailto:shahabhishek@gmail.com\"\u003eshahabhishek@gmail.com\u003c/a\u003e  |  LinkedIn: \u003ca href=\"https://www.linkedin.com/in/shahabhishek\"\u003e/in/abhishekshah\u003c/a\u003e\u003c/p\u003e\n\u003chr\u003e\n\u003ch2 id=\"summary\"\u003eSUMMARY\u003c/h2\u003e\n\u003cp\u003eSenior engineering leader with 25+ years architecting and operating cloud compute platforms at hyperscale. Currently \u003cstrong\u003eSenior Director at Coupang Intelligent Cloud (CIC)\u003c/strong\u003e, leading three engineering organizations — \u003cstrong\u003eAPI Platform, Kubernetes Platform, and Virtualization\u003c/strong\u003e — that together deliver Coupang\u0026rsquo;s internal GPU cloud and broader compute substrate. Lead Software Architect of CIC, founding member of the program, technical owner of CIC\u0026rsquo;s Kubernetes platform (Cortex) and the CompositeApplication CRD primitive (3 patents filed). Prior tenures at \u003cstrong\u003eNetflix\u003c/strong\u003e (designed and led the OpenConnect CDN control plane), \u003cstrong\u003eFacebook/Meta\u003c/strong\u003e (led the ad-audience platform processing a trillion updates per day), \u003cstrong\u003eGoogle\u003c/strong\u003e (built the original Kubernetes L4 SDN and DNS — code paths still running in every Kubernetes cluster today), and \u003cstrong\u003eRoblox\u003c/strong\u003e (next-gen pub/sub at 5M msgs/sec). Operate with influence across software, infrastructure, security, and partner-engineering organizations; partner-engineering relationships at the architecture level with NVIDIA, AWS, and Run:AI. Coupang Bar Raiser for senior and principal hiring.\u003c/p\u003e","title":""}]