We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.
Summary
Known as: Serving Platform Engineer, Inference Infrastructure Engineer, GPU Platform Engineer, ML Serving Engineer
Builds and operates the platform that serves models at scale: GPU scheduling, multi-tenant serving, capacity management, cost attribution, and cloud compute procurement.
Specializations
Where the Work Lives
Owns GPU scheduling, capacity management, and cloud compute procurement for inference workloads.
Builds the multi-tenant serving platform: request routing, tenant isolation, and cost attribution.
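To make the multi-tenant serving work concrete, here is a minimal sketch of priority-aware request routing over a shared GPU pool with per-tenant cost attribution. All names (`Router`, `Request`, the tenant IDs) are illustrative assumptions, not a real platform's API.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical sketch: requests drain highest-priority first, with FIFO
# ordering inside a priority class, and every dispatched request charges
# its tokens back to the owning tenant for cost attribution.

@dataclass(order=True)
class Request:
    priority: int                            # lower value = served first
    seq: int                                 # FIFO tiebreaker within a class
    tenant: str = field(compare=False)
    prompt_tokens: int = field(compare=False)

class Router:
    def __init__(self):
        self._queue = []
        self._seq = count()
        self.usage = {}                      # tenant -> tokens routed so far

    def submit(self, tenant, prompt_tokens, priority):
        heapq.heappush(
            self._queue,
            Request(priority, next(self._seq), tenant, prompt_tokens),
        )

    def dispatch(self):
        """Pop the next request and charge its tokens to its tenant."""
        req = heapq.heappop(self._queue)
        self.usage[req.tenant] = self.usage.get(req.tenant, 0) + req.prompt_tokens
        return req

router = Router()
router.submit("tenant-a", 512, priority=1)   # interactive traffic
router.submit("tenant-b", 4096, priority=2)  # batch traffic
first = router.dispatch()                    # tenant-a is served first
```

A real platform layers preemption, fairness, and per-GPU placement on top of this, but the core semantics (priority ordering plus usage metering) are the ones described above.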
Candidate Archetypes
Turns a GPU pool into a reliable, well-isolated serving platform with routing and priority semantics.
Makes cost-per-token legible and optimizable as a business control surface.
Arbitrages reserved and spot capacity across providers to keep inference supply ahead of demand.
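The last two archetypes can be sketched together: blending reserved and spot GPU-hours into one effective rate, then expressing it as cost per token. The rates and throughput below are made-up illustrative numbers, not real provider pricing.

```python
# Hypothetical sketch: cost-per-token as a business control surface.
# Blend committed (reserved) and opportunistic (spot) GPU-hours into a
# single effective hourly rate, then convert to $/million tokens.

def blended_gpu_hour_cost(reserved_hours, reserved_rate, spot_hours, spot_rate):
    """Weighted-average $/GPU-hour across the two capacity sources."""
    total_hours = reserved_hours + spot_hours
    return (reserved_hours * reserved_rate + spot_hours * spot_rate) / total_hours

def cost_per_million_tokens(gpu_hour_cost, tokens_per_gpu_hour):
    """Convert an hourly GPU rate into $/1M generated tokens."""
    return gpu_hour_cost / tokens_per_gpu_hour * 1_000_000

hourly = blended_gpu_hour_cost(
    reserved_hours=700, reserved_rate=2.00,  # committed capacity floor
    spot_hours=300, spot_rate=0.80,          # opportunistic overflow
)
cost = cost_per_million_tokens(hourly, tokens_per_gpu_hour=2_000_000)
# hourly = 1.64 $/GPU-hour, cost = 0.82 $/1M tokens
```

Shifting the reserved/spot mix or raising tokens-per-GPU-hour (batching, quantization, better kernels) moves this number directly, which is what "legible and optimizable" means in practice.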
Company Scale
Inference companies building on serving runtimes like vLLM and SGLang (Anyscale, Modal, Together, Fireworks) and frontier lab serving teams.
Featured Roles
If you’re hiring at the AI frontier, let’s talk.