<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Abhishek Shah</title>
    <link>/</link>
    <description>Recent content on Abhishek Shah</description>
    <image>
      <title>Abhishek Shah</title>
      <url>/images/papermod-cover.png</url>
      <link>/images/papermod-cover.png</link>
    </image>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Sat, 08 Jun 2024 12:00:00 -0700</lastBuildDate>
    <atom:link href="/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Building an Audience Estimation System</title>
      <link>/posts/metacocoon/</link>
      <pubDate>Sat, 08 Jun 2024 12:00:00 -0700</pubDate>
      <guid>/posts/metacocoon/</guid>
      <description>&lt;h2 id=&#34;cocoon&#34;&gt;Cocoon&lt;/h2&gt;
&lt;p&gt;Cocoon is a User-Count Estimating Query Engine built on top of ZDB-On-Flash.
It is an integral part of Ads Infra at Meta, serving 40K QPS with a P90 latency under 100 ms.&lt;/p&gt;
&lt;p&gt;&lt;img alt=&#34;img&#34; loading=&#34;lazy&#34; src=&#34;../../assets/images/meta_cocoon.png&#34;&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;how-does-cocoon-store-its-data&#34;&gt;How does Cocoon store its data&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Columnar Data Stored in Ordered Format&lt;/li&gt;
&lt;li&gt;UserID Format&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;how-is-the-data-stored-in-zdb&#34;&gt;How is the Data Stored in ZDB&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multiple UserIds per Row:&lt;/strong&gt; Each ZDB Row Value has multiple UserIds in it. Optional Attribute(s). Efficient Fetching.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Static Ranking:&lt;/strong&gt; Static Rank of UserIds stored for even distribution. Bi-Directional.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compression:&lt;/strong&gt; Compressed using the Elias-Fano format, a quasi-succinct encoding whose size is close to the theoretical minimum.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sorting:&lt;/strong&gt; ZDB Keys are sorted. Static-ranked UserIds are sorted across ZDB Keys and also within each ZDB Value.&lt;/li&gt;
&lt;/ul&gt;
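&lt;p&gt;The compression step can be illustrated with a minimal Elias-Fano sketch. This is not Cocoon&#39;s actual code: the function names &lt;code&gt;ef_encode&lt;/code&gt; and &lt;code&gt;ef_decode&lt;/code&gt;, the toy universe of 64, and the plain Python lists standing in for packed bit arrays are all assumptions made for illustration.&lt;/p&gt;

```python
def ef_encode(sorted_ids, universe):
    """Split each id into low bits (stored verbatim) and high bits (unary-coded).

    Assumes sorted_ids is non-empty, sorted, and every id is below universe.
    """
    n = len(sorted_ids)
    # Number of low bits: floor(log2(universe / n)), never negative.
    low_bits = max((universe // n).bit_length() - 1, 0)
    lows = [x % 2 ** low_bits for x in sorted_ids]
    # The i-th id sets bit (high part + i) in the upper bit vector.
    # A real implementation would pack these lists into bit arrays.
    upper = [0] * (n + universe // 2 ** low_bits + 1)
    for i, x in enumerate(sorted_ids):
        upper[x // 2 ** low_bits + i] = 1
    return low_bits, lows, upper


def ef_decode(low_bits, lows, upper):
    """Rebuild the ids: position of the i-th set bit minus i is the high part."""
    set_positions = [pos for pos, bit in enumerate(upper) if bit]
    return [(pos - i) * 2 ** low_bits + low
            for i, (pos, low) in enumerate(zip(set_positions, lows))]


# Hypothetical static-ranked ids in a toy universe of 64 values.
ids = [3, 4, 7, 13, 14, 15, 21, 43]
low_bits, lows, upper = ef_encode(ids, universe=64)
print(ef_decode(low_bits, lows, upper) == ids)
```

&lt;p&gt;In this sketch each id costs roughly 2 + log2(universe / n) bits, which is why the format is described as quasi-succinct.&lt;/p&gt;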
&lt;hr&gt;
&lt;h2 id=&#34;how-is-the-query-done&#34;&gt;How is the Query Done&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Round 1:&lt;/strong&gt; Fetch 0th Bucket for each term in Query.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Round 2 (Optional):&lt;/strong&gt; Fetch more buckets to improve estimate.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;estimation--extrapolation&#34;&gt;Estimation &amp;amp; Extrapolation&lt;/h2&gt;
&lt;h3 id=&#34;query-time-sampling-not-data-load-time-sampling&#34;&gt;Query-Time Sampling (Not Data-Load Time Sampling)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Sampling data at load time leads to very poor estimates, because it is hard to determine in advance how much to sample.&lt;/li&gt;
&lt;li&gt;We load all the data into the store. At query time, we decide how much to read, and keep reading until we are confident of a good estimate.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;extrapolation&#34;&gt;Extrapolation&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;In the &lt;code&gt;int64&lt;/code&gt; space, we determine which value we have reached and what the hit count is so far.&lt;/li&gt;
&lt;li&gt;We then extrapolate that hit count from the scanned range to the full space.&lt;/li&gt;
&lt;li&gt;Static-rank (srank) ordering removes biases that come from how userIds are created.&lt;/li&gt;
&lt;/ul&gt;
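&lt;p&gt;The extrapolation step can be sketched as a simple proportional scale-up. This is an illustration under stated assumptions, not the production formula: it assumes static ranks are non-negative &lt;code&gt;int64&lt;/code&gt; values spread uniformly over the space, and the name &lt;code&gt;extrapolate_count&lt;/code&gt; and the sample numbers are hypothetical.&lt;/p&gt;

```python
INT64_SPACE = 2 ** 63  # assumption: static ranks are non-negative int64 values


def extrapolate_count(hits, reached_rank, space=INT64_SPACE):
    """Scale the hits seen in [0, reached_rank) up to the whole rank space."""
    if reached_rank == 0:
        raise ValueError("no rank range scanned yet")
    # Ranks are assumed uniform, so the scanned prefix is a fair sample.
    return hits * space // reached_rank


# Hypothetical numbers: 1,200 matches found after scanning 1/1024 of the space.
estimate = extrapolate_count(1200, INT64_SPACE // 1024)
print(estimate)
```

&lt;p&gt;Reading further into the rank space increases &lt;code&gt;reached_rank&lt;/code&gt; and the hit count together, which tightens the estimate without changing the formula.&lt;/p&gt;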
&lt;hr&gt;
&lt;h2 id=&#34;horizontal-scaling&#34;&gt;Horizontal Scaling&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The data to process at query time is still a lot.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Horizontal Scaling to the rescue!&lt;/strong&gt; Divide the data and have parallel processing done on it using a TW Tier.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;leaf-layer&#34;&gt;Leaf Layer&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;32 Leafs per tier.&lt;/li&gt;
&lt;li&gt;Each Leaf reads only its own partition of data.&lt;/li&gt;
&lt;li&gt;2 Leafs can never read the same data&amp;hellip; Ever.&lt;/li&gt;
&lt;li&gt;32 Data Partitions exist.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;aggregator-layer&#34;&gt;Aggregator Layer&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Parses Query&lt;/li&gt;
&lt;li&gt;Throttles Callers&lt;/li&gt;
&lt;li&gt;Fans out query to Leafs&lt;/li&gt;
&lt;li&gt;Handles Leaf Fail Over&lt;/li&gt;
&lt;/ul&gt;
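&lt;p&gt;The leaf and aggregator split can be sketched as a fan-out over disjoint partitions followed by a sum of partial counts. This is a toy, single-process illustration: the modulo placement in &lt;code&gt;partition_of&lt;/code&gt;, the thread pool standing in for remote leaves, and all function names are assumptions, since the actual partitioning scheme and RPC layer are not described here. Throttling and fail-over are left out.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 32  # from the text: 32 leaves and 32 data partitions


def partition_of(user_id):
    # Assumption: modulo placement; the real scheme is not described.
    return user_id % NUM_PARTITIONS


def leaf_count(partition, predicate):
    """A leaf counts matches only inside its own partition."""
    return sum(1 for user_id in partition if predicate(user_id))


def aggregate(partitions, predicate):
    """Fan the query out to every leaf in parallel and sum the partial counts."""
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        partials = pool.map(lambda part: leaf_count(part, predicate), partitions)
        return sum(partials)


# Toy data: user ids 0..9999, each placed in exactly one partition.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for user_id in range(10_000):
    partitions[partition_of(user_id)].append(user_id)

print(aggregate(partitions, lambda uid: uid % 7 == 0))
```

&lt;p&gt;Because no two leaves share data, the partial counts can be added directly with no de-duplication step.&lt;/p&gt;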
&lt;h2 id=&#34;system-architecture-diagram&#34;&gt;System Architecture Diagram&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;+-------------------------------------------------------------+     +-------------------------+
|                  [Aggregator / Indexer Tiers]               |     |       Graph API         |
|  +------------------+ +------------------+ +--------------+  |     | +---------------------+ |
|  |                  | |                  | |              |  |====&amp;gt;| |Cocoon PHP Component | |
|  |   Aggregator     | |   Aggregator     | |  Aggregator  |  |     | +---------------------+ |
|  |       ||         | |       ||         | |      ||      |  |     +------------^------------+
|  |    Indexer       | |    Indexer       | |   Indexer    |  |                  |
|  +--------^---------+ +--------^---------+ +------^-------+  |     +------------v------------+
+-----------|--------------------|------------------|----------+     |    Supported Features   |
            |                    |                  |                | - Reach &amp;amp; Frequency     |
+-----------v--------------------v------------------v----------+     | - Reach Estimate        |
|                          ZippyDB                             |     | - Bid Suggestion        |
|    +----------------+          +----------------+            |     | - Outcome prediction    |
|    | ZDB node PRN   |&amp;lt;========&amp;gt;|  ZDB node LLA  |            |     | - Pacing                |
|    +-------^--------+          +--------^-------+            |     | - Analysis              |
|            |                            |                    |     | - Audience Insights     |
|            +-----------+    +-----------+                    |     +-------------------------+
|                        |    |                                |
|                    +---v----v---+                            |
|                    |ZDB node ATN|                            |
|                    +------------+                            |
+--------------------------------------------------------------+
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&#34;cocoon-ingestion-pathways&#34;&gt;Cocoon Ingestion Pathways&lt;/h3&gt;
&lt;h4 id=&#34;1-daily-delta-bulk-updates-per-data-source&#34;&gt;1. Daily Delta Bulk Updates (per data source)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;FB user profile&lt;/li&gt;
&lt;li&gt;AdEnv (device, os, placement)&lt;/li&gt;
&lt;li&gt;Friend connection (of page, group, application and event)&lt;/li&gt;
&lt;li&gt;Location and Geo info&lt;/li&gt;
&lt;li&gt;Interests&lt;/li&gt;
&lt;li&gt;Instagram info&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;2-real-time-updates-dispatchers-per-data-source&#34;&gt;2. Real-time Updates (Dispatchers per data source)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Custom audience change dispatcher&lt;/li&gt;
&lt;li&gt;Partner category change dispatcher&lt;/li&gt;
&lt;li&gt;Look alike change dispatcher&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    <item>
      <title>Building Our Own CDN</title>
      <link>/posts/netflixcdn/</link>
      <pubDate>Sat, 08 Jun 2024 12:00:00 -0700</pubDate>
      <guid>/posts/netflixcdn/</guid>
      <description>&lt;h1 id=&#34;building-our-own-cdn&#34;&gt;Building Our Own CDN&lt;/h1&gt;
&lt;h2 id=&#34;brief-history&#34;&gt;Brief History&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pre-2011:&lt;/strong&gt; Business contracts were maintained with external CDN companies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Traffic Allocation:&lt;/strong&gt; Guaranteed each external CDN a stable share of the streaming traffic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Simplicity of Steering:&lt;/strong&gt; All of the Video CDN Steering logic fit entirely within:
&lt;ul&gt;
&lt;li&gt;5 Java Classes&lt;/li&gt;
&lt;li&gt;~60 lines of code&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scale:&lt;/strong&gt; Handled one third of peak internet traffic in the US.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strategic Shift:&lt;/strong&gt; A decision was made to build an in-house CDN.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;predictable-viewing-patterns&#34;&gt;Predictable Viewing Patterns&lt;/h2&gt;
&lt;p&gt;The foundational architecture of this custom CDN system relies on a predictable data distribution rule: &lt;strong&gt;a small fraction of content generates the bulk of internet traffic&lt;/strong&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title></title>
      <link>/about/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/about/</guid>
      <description>&lt;h2 id=&#34;maya-platform-at-coupang&#34;&gt;Maya Platform at Coupang&lt;/h2&gt;
&lt;p&gt;The Maya platform virtualizes compute and network for our tenants on shared physical infrastructure.
This is currently a major in-flight program with dynamic tenancy on the bare-metal path. Tenants self-serve VMs (with GPU passthrough) or bare-metal instances; instances in the same nodepool share a tenant VPC (private network) and base image. Conceptually similar to AWS VPC (on the Nitro substrate) plus AWS Managed Node Groups, but built for our internal fleet.&lt;/p&gt;</description>
    </item>
    <item>
      <title></title>
      <link>/resume/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>/resume/</guid>
      <description>&lt;h1 id=&#34;abhishek-shah&#34;&gt;ABHISHEK SHAH&lt;/h1&gt;
&lt;p&gt;Sunnyvale, CA  |  (650) 630-9280  |  &lt;a href=&#34;mailto:shahabhishek@gmail.com&#34;&gt;shahabhishek@gmail.com&lt;/a&gt;  |  LinkedIn: &lt;a href=&#34;https://www.linkedin.com/in/shahabhishek&#34;&gt;/in/abhishekshah&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;summary&#34;&gt;SUMMARY&lt;/h2&gt;
&lt;p&gt;Senior engineering leader with 25+ years architecting and operating cloud compute platforms at hyperscale. Currently &lt;strong&gt;Senior Director at Coupang Intelligent Cloud (CIC)&lt;/strong&gt;, leading three engineering organizations — &lt;strong&gt;API Platform, Kubernetes Platform, and Virtualization&lt;/strong&gt; — that together deliver Coupang&amp;rsquo;s internal GPU cloud and broader compute substrate. Lead Software Architect of CIC, founding member of the program, technical owner of CIC&amp;rsquo;s Kubernetes platform (Cortex) and the CompositeApplication CRD primitive (3 patents filed). Prior tenures at &lt;strong&gt;Netflix&lt;/strong&gt; (designed and led the OpenConnect CDN control plane), &lt;strong&gt;Facebook/Meta&lt;/strong&gt; (led the ad-audience platform processing a trillion updates per day), &lt;strong&gt;Google&lt;/strong&gt; (built the original Kubernetes L4 SDN and DNS — code paths still running in every Kubernetes cluster today), and &lt;strong&gt;Roblox&lt;/strong&gt; (next-gen pub/sub at 5M msgs/sec). Operate with influence across software, infrastructure, security, and partner-engineering organizations; partner-engineering relationships at the architecture level with NVIDIA, AWS, and Run:AI. Coupang Bar Raiser for senior and principal hiring.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
