474 Tracked Repos | 96,160 Commits | 8,663 Contributors
THE SIGNAL
Our Q1 predictions: serving role bifurcation is tracking, with new serving projects growing around distinct hardware targets and architectural approaches. AMD kernel contributor parity with NVIDIA hasn't arrived yet, but the gap closed further (AMD's open-source kernel contributor base grew 25% this quarter). Training framework plateau: wrong. Frameworks grew 17% in commits and 14% in contributors, driven by compiler-adjacent work and RL infrastructure. Scorecard: one on track, one directionally right but not yet confirmed, one miss.
The quarter’s defining number: file churn across the four highest-growth categories spiked between 149% and 212% QoQ. Codebases are being restructured at their foundations. The engineers doing this work are building institutional knowledge that will be extraordinarily difficult to replicate once they settle in. LLM serving pulled 1,489 contributors (up 31%), while distributed training surged 50% in commits driven almost entirely by new projects that didn’t exist two quarters ago. Cost-per-token pressure is pulling engineering talent out of training-time work and into inference-time optimisation, kernel tuning, and compiler passes that directly affect serving economics.
This edition covers 8,663 contributors, 474 repositories, and 96,160 commits.
Q-over-Q Snapshot
Broad acceleration across inference-adjacent categories, with distributed training’s outsized jump reflecting new project formation rather than organic growth.
| Category | Commits | Contributors | Active Repos | Commits QoQ | Contribs QoQ |
|---|---|---|---|---|---|
| GPU Kernels & Performance | 9,823 | 980 | 56 | +23% | +25% |
| ML Compilers & Graph Optimization | 14,291 | 769 | 31 | +11% | +5% |
| Distributed Training & Parallelism | 3,630 | 493 | 28 | +50% | +44% |
| Inference Runtimes & Engines | 2,297 | 386 | 21 | +13% | +2% |
| LLM Serving & Inference | 10,849 | 1,489 | 55 | +46% | +31% |
| Training Frameworks & Model Architecture | 21,780 | 2,414 | 67 | +17% | +14% |
| ML Platform & Orchestration | 7,142 | 823 | 28 | +6% | +4% |
| Edge & On-Device ML | 9,788 | 606 | 32 | +6% | +0% |
| Model Optimization & Compression | 2,885 | 269 | 26 | -12% | +11% |
| Hardware-Software Co-Design | 9,775 | 1,075 | 24 | +14% | +8% |
| ML Debugging & Tooling | 2,125 | 310 | 18 | +11% | +15% |
| Agent Framework | 1,775 | 376 | 6 | +10% | -23% |
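For readers reproducing these figures from their own commit data, the QoQ columns are plain percent deltas against the prior quarter. A minimal sketch; the previous-quarter count below is illustrative, not a number from the underlying dataset:

```python
# Minimal sketch of how a quarter-over-quarter growth figure like those in
# the table is derived. The previous-quarter value is invented for
# illustration only.

def qoq_pct(current: int, previous: int) -> int:
    """Percent change from previous quarter to current, rounded to a whole %."""
    return round((current - previous) / previous * 100)

# Example: a category that went from an assumed 7,986 commits last quarter
# to 9,823 this quarter.
print(qoq_pct(9823, 7986))  # 23
```

The same one-liner applies to contributors and active repos; rounding to whole percent matches the table's precision.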
What's Moving
🚀 LLM Serving & Inference
Serving absorbed more engineering energy than any other category, and the nature of the work changed fundamentally. vLLM crossed 400 contributors with 251 of them new. That intake, combined with deep work on async scheduling and cache reuse, points to active architectural expansion. Coordination patterns show collaborative systems work, not drive-by patches.
SGLang and TensorRT-LLM both ran at full activity across their contributor bases, with engineering concentrated in structured generation scheduling and CUDA graph capture respectively. Alternative serving architectures are rising: over half the contributors to emerging serving projects are brand new, building runtime orchestration rather than model-level serving logic. One pattern worth noting: several mid-tier serving projects saw contributor engagement drop below four active weeks even as commit counts rose. Short engagement paired with rising output means experimenters, not builders. The contributor-level migration data here tells a more granular story, one we're making available to a small number of hiring teams directly.
⚙️ GPU Kernels & Performance
AMD's open-source kernel investment reached a scale that hiring teams can no longer treat as secondary. Multiple AMD projects onboarded over 200 contributors, with engineering concentrated on operator libraries and test infrastructure for attention and matmul variants. That's the kind of kernel-level muscle that creates long-term hiring lock-in.
A PyTorch kernel compiler project drew nearly all-new contributors: a greenfield effort attracting compiler-adjacent kernel engineers. File churn across GPU kernel repos jumped 212% QoQ; engineers are restructuring foundational code, not polishing edges. By Q4, expect more hyperscaler-backed kernel projects competing for the same thin pool of engineers who understand both compiler IRs and GPU memory hierarchies.
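The report doesn't publish its file-churn definition; one plausible proxy is counting file modifications per quarter from `git log --name-only` output and comparing quarters. A self-contained sketch over stand-in data:

```python
# Hedged sketch of a file-churn metric (the report's exact definition is
# not published): count file modifications per quarter, then compare QoQ.
from collections import Counter

# Stand-in for parsed `git log --name-only` output: (quarter, file) pairs.
# These paths and counts are invented for illustration.
touches = [("Q1", "kernels/attn.cu"), ("Q1", "kernels/gemm.cu"),
           ("Q2", "kernels/attn.cu"), ("Q2", "kernels/attn.cu"),
           ("Q2", "kernels/gemm.cu"), ("Q2", "runtime/graph.cpp")]

per_quarter = Counter(q for q, _ in touches)
growth = (per_quarter["Q2"] - per_quarter["Q1"]) / per_quarter["Q1"] * 100
print(per_quarter["Q1"], per_quarter["Q2"], round(growth))  # 2 4 100
```

A churn spike in this metric means the same files are being touched repeatedly or many new files are entering the tree, both signatures of restructuring rather than maintenance.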
🧪 Training Frameworks & Model Architecture
The largest category by volume is undergoing a quiet internal reorganisation. PyTorch's commit profile tilted further toward compiler and code-generation internals. TensorFlow showed a similar pattern, with engineering directed at compiler backend consolidation. "PyTorch engineer" and "TensorFlow engineer" are increasingly compiler engineering roles.
Hugging Face transformers absorbed over 300 contributors, but the documentation-heavy mix confirms adoption-scaling, not deep engineering. Across distributed training and RL infrastructure, over 280 contributors focused on reward model distribution and policy training loops, reinforcing our prediction that RLHF infrastructure is separating from general training work.
🔌 Hardware-Software Co-Design
One major hardware startupโs combined output across runtime and compiler projects engaged around 240 contributors, with engineering split between hardware bring-up and compiler work. The test-heavy commit profile signals a transition from prototype to production-grade tooling. That maturity shift changes the hiring profile: hardware startups now need engineers who can harden and optimise, not just explore.
The broader vendor presence in this category is substantial, with coordination patterns among the highest we track. Mean active weeks near 9 confirms deeply embedded contributors unlikely to respond to cold outreach. A userspace communication library targeting GPU-to-GPU data movement was driven by just 8 engineers across a massive file footprint: foundational infrastructure being laid by a minuscule team.
🧠 ML Compilers & Graph Optimization
Compiler engineering grew slower than inference or kernels, but engagement depth tells a different story. Mean active weeks hit 9.3, the highest of any category. Graph-level projects refined autotuning and cost-model passes for token-level efficiency. A major compiler-runtime hybrid project led the category with nearly 4,000 commits, but its engineering spans serving runtime, language tooling, and platform integration. These engineers are developing a cross-cutting skillset that doesn't map to traditional compiler job descriptions.
Quiet Corners
Inference Runtimes grew 13% in commits while contributors barely moved, suggesting consolidation around deeply engaged teams. Distributed Training surged on new project formation; an actor-mesh project and communication primitives absorbed most of the new contributor energy. ML Debugging grew 15% in contributors, with evaluation harness projects forming a micro-talent-pool.
Agent Framework grew commits 10% while shedding 23% of contributors; the experienced base contracted while new arrivals focus on integration and documentation. ML Platform held steady; churn declined 19%, the only category where it did. Edge & On-Device ML held flat; ExecuTorch dominated with backend integration work while llama.cpp's community pushed quantisation-aware inference. Model Optimization saw more contributors doing less work per person; quantisation expertise is being absorbed into serving and kernel repos.
Where Talent Is Moving
The strongest cross-pollination runs between compilers and training frameworks: 211 contributors worked across both, the largest overlap. The GPU kernel-to-hardware co-design overlap at 122 reflects AMD and Intel expanding their kernel libraries. The directionality matters: contributors who start in hardware co-design repos migrate toward kernel performance work, not the reverse. If youโre hiring kernel engineers, hardware backend teams are your upstream pipeline.
A less obvious overlap: roughly 50 contributors worked across both GPU kernels and LLM serving. These engineers understand both CUDA-level primitives and the serving-system context in which they run. Fifty people, globally. That pool is small enough to enumerate but large enough to build a team from, if you know where to look.
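Mechanically, these overlap counts are just set intersections over per-category contributor lists. A toy sketch with invented names (the real pipeline would additionally need identity resolution across emails and handles):

```python
# Toy illustration of the cross-category overlap metric: contributors who
# committed to repos in two categories are the intersection of the two
# contributor sets. All names are invented.
kernel_contributors = {"alice", "bob", "carol", "dave"}
serving_contributors = {"carol", "dave", "erin"}

overlap = kernel_contributors & serving_contributors
print(sorted(overlap))  # engineers counted in both pools -> ['carol', 'dave']
print(len(overlap))     # the headline overlap number -> 2
```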
What This Means If You're Hiring
Serving infrastructure engineers are the most contested hire in ML systems. The contributor base grew 31%, but the new entrants skew toward integration work, not runtime internals. Fewer than 300 of the 1,489 contributors maintained engagement across 10+ weeks. If your job description says "LLM serving engineer" without specifying runtime-systems depth or model-integration breadth, you're fishing in the wrong pond. Staff-level serving specialists command $700K to $1.3M total comp.
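The 10+-week engagement cut is presumably a distinct-active-weeks count per contributor; a stdlib sketch of one way to compute it, with invented commit dates:

```python
# Sketch of an "active weeks" engagement metric: the number of distinct
# ISO weeks in which a contributor landed a commit. Dates are invented.
from datetime import date

def active_weeks(commit_dates: list[date]) -> int:
    # Deduplicate on (ISO year, ISO week) so multiple commits in the same
    # week count once.
    return len({d.isocalendar()[:2] for d in commit_dates})

commits = [date(2025, 4, 1), date(2025, 4, 3),   # same ISO week
           date(2025, 4, 10), date(2025, 5, 20)]
print(active_weeks(commits))  # 3
```

Filtering a contributor table on `active_weeks >= 10` then reproduces the kind of "fewer than 300 of 1,489" cut described above.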
Kernel and compiler engineers command a 30-50% premium over generalist ML engineers. The combined pool is 1,749 people, but only 76 work across both categories and into hardware co-design. Moving them requires more than a competitive offer; it requires a technical problem they can't solve where they are.
Cross-domain profiles (compiler + training, kernel + serving, distributed + RL) are where scarcity is most acute. The talent overlap data shows these intersections involve dozens of people, not hundreds. If you're building a team that needs this combination, waiting for inbound applications is not a strategy.
If any of these patterns match what you're seeing in your own pipeline, that's a conversation worth having before Q3 reshuffles the map.
Predictions
- Watch for Q3: PyTorch's kernel compiler project will cross 50 contributors, establishing kernel-compiler hybrid engineering as a distinct talent category. If it doesn't, the greenfield energy is dissipating.
- By year-end: At least one major cloud provider will launch a dedicated AMD kernel engineering team sourced primarily from AMDโs open-source contributor base, pulling 15-20 engineers out of the pool in a single move.
- Q3 signal: Agent framework contributor counts will stabilise or decline further, while agent-adjacent infrastructure (guardrails, evaluation, orchestration) will grow 20%+ as the category matures from demo-ware to production tooling.
The file churn numbers tell the real story this quarter. Codebases are being rewritten from the inside, and the engineers doing that work are accumulating context that makes them harder to hire with every passing week.
This report is powered by D33P S1GNL: a proprietary contributor intelligence engine. For access to the full contributor-level dataset or to discuss ML Systems hiring, contact [email protected]
Hiring Machine Learning Talent?
We engage exceptional humans for companies powered by AI
