
The 29 personas behind AI

We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.

Shaped by Industry Experts
Kumar Chellapilla, VPE
Jennifer Anderson, VPE / Stanford PhD
Thuan Pham, CTO
Akash Garg, CTO
Linghao Zhang, Research Engineer
Wayne Chang, Early FB Engineer
Indrajit Khare, EM & Head of Product

Post-Training

Shapes model behavior via RL

Known as: Member of Technical Staff, Research Engineer, RL Engineer

Modifies model weights to turn base models into deployment-ready systems: instruction following, stronger reasoning on multi-step tasks, steerable behavior, and safer, more reliable outputs. Uses fine-tuning and preference-based optimization (often with reinforcement learning) to shape behavior. This is where much of what it feels like to use a model is determined.
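
As a concrete sketch of the supervised fine-tuning step mentioned above, the snippet below fine-tunes a small causal LM on toy instruction/response demonstrations. The base model ("gpt2"), the data, and the hyperparameters are placeholder assumptions for illustration, not any lab's actual recipe.

```python
# Minimal SFT sketch with PyTorch + Hugging Face transformers.
# Model choice, toy data, and hyperparameters are illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder small base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy demonstrations; real instruction-tuning sets have many thousands.
demos = [{"prompt": "Summarize: the launch slipped a week.",
          "response": "The launch was delayed by one week."}]

def collate(batch):
    texts = [d["prompt"] + "\n" + d["response"] + tokenizer.eos_token
             for d in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True)
    # Sketch shortcut: loss over all tokens. Production setups mask the
    # prompt and padding tokens so only response tokens are trained on.
    enc["labels"] = enc["input_ids"].clone()
    return enc

loader = DataLoader(demos, batch_size=1, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    loss = model(**batch).loss  # next-token cross-entropy on demonstrations
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```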

Specializations

Instruction Tuning: Supervised fine-tuning on demonstrations to teach instruction following, format compliance, and task-specific behaviors. The baseline for deployment-ready models.
RLHF / DPO / Preference Optimization: Preference-based optimization (RLHF, DPO, and variants) shapes behavior using human preference data. A parallel and increasingly dominant track, RL from verifiable rewards (RLVR), uses clean reward signals (code compilation, passing tests, mathematical proof verification) instead of human preferences. The two tracks have different data requirements and failure modes (reward hacking on verifiable tasks) but share the same optimization infrastructure and often the same team. A minimal DPO loss sketch follows this list.
Reasoning & Chain-of-Thought Training: Training models to reason step-by-step, use tools, and solve multi-step problems. The primary consumer of RLVR: verifiable reasoning tasks (math, code, formal proofs) provide the clean reward signal that scales RL. Test-time compute (scaling search and verification at inference) is now a distinct scaling axis alongside pre-training scale. Uses methods like GRPO that operate on the same infrastructure and optimization loops as pre-training; the boundary between post-training and training is dissolving here faster than anywhere else. A primary differentiator at frontier labs; a GRPO sketch appears after the paragraph below.
Safety Tuning: Modifies model behavior for safety (refusals, boundary enforcement, harmful content reduction, safety-capability tradeoff management). Translates safety eval findings into weight changes, typically RLHF runs targeted at specific failure modes, with constant tension between safety gains and capability regression.
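
Here is the DPO loss sketch referenced above, in PyTorch. The function signature and beta value are illustrative assumptions; production implementations add batching, masking, and length handling.

```python
# Minimal DPO loss sketch. Inputs are per-sequence summed log-probs of the
# chosen/rejected responses under the trainable policy and a frozen
# reference model. Names and the beta value are illustrative assumptions.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit reward of each response: how far the policy has moved
    # from the reference on that response.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Push the chosen response's implicit reward above the rejected one's.
    margin = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(margin).mean()
```

The frozen reference model anchors the policy to its SFT starting point; beta sets how aggressively the preference margin is enforced.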

The strategic split between pre-training compute and RL compute is a live frontier decision — labs are only beginning to scale RL compute and expect to increase it dramatically. This changes the hiring weight: more RL infrastructure and reward engineering, less data-mixing optimization. The surface area of RL environments is expanding fast — computer use (GUI navigation, web browsers, desktop applications) is now a distinct training domain alongside code, math, and tool use.
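
To ground the "verifiable rewards" and GRPO references above, a toy sketch: a reward function that scores a candidate program by whether its embedded asserts run cleanly, and GRPO-style advantages computed by standardizing rewards within a group of completions sampled for one prompt (no learned value network). The execution sandbox, timeout, and group handling are illustrative assumptions, not a production pipeline.

```python
# Toy verifiable-reward signal plus GRPO-style group-normalized advantages.
# The test-execution reward and group handling are illustrative assumptions.
import statistics
import subprocess
import tempfile

def verifiable_reward(candidate_code: str) -> float:
    """Reward 1.0 if the candidate program (with embedded asserts) exits cleanly."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=10)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """GRPO advantage: each completion's reward, standardized within the
    group sampled for the same prompt. No value network required."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard all-equal groups
    return [(r - mean) / std for r in group_rewards]

# Example: four completions for one prompt; two pass their tests.
rewards = [verifiable_reward(c) for c in
           ["assert 1 + 1 == 2", "assert False",
            "assert sum([1, 2]) == 3", "raise SystemExit(1)"]]
advantages = grpo_advantages(rewards)  # [1.0, -1.0, 1.0, -1.0]
```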

Supply-chain stages:

1. Substrate
2. Compute
3. Intelligence (Primary): modifies model weights via RLHF, DPO, and RL to shape behavior, reasoning, and safety.
4. Systems (Secondary): defines how models behave in deployment through safety tuning, instruction following, and behavioral guardrails.
5. Distribution

Matching profiles:

Nate Walker (Anthropic), Instruction tuning
Turns base models into compliant assistants via supervised fine-tuning, format hardening, and behavior shaping.

Alesia Trudie (OpenAI), Preference optimization
Owns preference datasets, reward signals, and the RLHF/DPO/RLVR optimization loop that shapes model behavior.

Jon Richards (DeepMind), Safety tuning
Drives refusal boundaries, policy adherence, and behavior regression gating via tight eval loops.

Hiring demand by company stage:

Early-Stage: Rare
Growth: Occasional
Enterprise: Primary

Frontier labs hire this persona primarily for RLHF; growth-stage companies occasionally hire for fine-tuning.

Let’s Find Your Next Builder

If you’re hiring at the AI frontier, let’s talk.