Maya Platform at Coupang

The Maya platform virtualizes compute and network for our tenants on shared physical infrastructuire. This is currently a major in-flight program with dynamic tenancy on the bare-metal path. Tenants self-serve VMs (with GPU passthrough) or bare-metal instances; instances in the same nodepool share a tenant VPC (private network) and base image. Conceptually similar to AWS VPC (on the Nitro substrate) plus AWS Managed Node Groups, but built for our internal fleet.

Cortex Platform at Coupang

I’m the technical owner of Cortex, our custom Kubernetes platform for AI/ML workloads. It is a multi-tenant solution where our internal teams are offered resource guarantees within shared kubernetes clusters.

We replaced the default kube-scheduler. Cortex’s capability surface covers the full set : transactional gang scheduling for distributed training, fractional GPU allocation (0.5/0.25 shares), hierarchical parent-child queues for multi-tenant, fair-share and an iterative scenario-based preemption solver.

API Layer at Coupang

Our AI teams had to deal with different kubernetes primitives to get Notebooks and Inference apps working. We built a meta-CRD that allows the API team to define new app types. This allow ML engineers to define their apps with a single yaml file. Patent filed on a related CRD primitive (CompositeApplication).