Deep Signal Quarterly – Q1 2025

17th April 2025
By Steve Kilpatrick
Founder & Director
ML Systems and Infrastructure

474 Tracked Repos  |  72,024 Commits  |  7,234 Contributors

THE SIGNAL


Our Q4 predictions: the serving/compiler convergence is on schedule for Q2, agent role fracture is visible in contributor patterns but hasn’t formalised in job titles yet, and the ExecuTorch-vs-MLX competition landed hard. ExecuTorch posted nearly 3,000 commits while MLX’s contributor base stabilised rather than growing. Meta is winning the edge talent war on volume. Two of three on target, one too early to grade.

The quarter’s headline: edge and on-device ML posted the largest acceleration of any category we track, +52% in commits on a contributor base that grew just 10%. That ratio is the number to carry into your next hiring standup. Existing engineers are shipping at dramatically higher velocity, not being joined by newcomers. Simultaneously, model optimization surged +46% in commits while pulling 36% more contributors, the only category where contributor growth outpaced commit growth by that margin. These two categories are converging. The engineers quantising models and the engineers deploying them to constrained devices are increasingly the same people. If your job descriptions still separate “optimization” from “edge deployment,” you’re splitting a talent pool that the market has already merged.
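The ratio this paragraph leans on is simple arithmetic over the QoQ percentages. A minimal sketch in Python (the function name and framing are ours; the figures are the Edge and Optimization numbers quoted above):

```python
def velocity_ratio(commit_growth_pct: float, contributor_growth_pct: float) -> float:
    """Ratio of QoQ commit growth to QoQ contributor growth.

    Well above 1: existing engineers are shipping faster.
    Near 1: growth is coming from new contributors.
    """
    return commit_growth_pct / contributor_growth_pct

# Figures quoted above:
print(velocity_ratio(52, 10))            # Edge & On-Device ML -> 5.2
print(round(velocity_ratio(46, 36), 2))  # Model Optimization -> 1.28
```

A ratio of 5.2 versus 1.28 is the quantitative form of the contrast drawn above: incumbent velocity in edge, new-contributor influx in optimization.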

This edition tracks 7,234 contributors across 474 repositories and 72,024 commits.

Q-over-Q Snapshot


Acceleration across the board, with inference runtimes the lone contraction: a stabilisation signal, not a decline.

Category                                    Commits  Contributors  Active Repos  Commits QoQ  Contribs QoQ
GPU Kernels & Performance                     4,003           488            34         +34%          +13%
ML Compilers & Graph Optimization             9,203           640            26         +25%           +3%
Distributed Training & Parallelism            2,181           337            17         +17%          +17%
Inference Runtimes & Engines                  2,573           390            22          -8%           +7%
LLM Serving & Inference                       6,246           761            40         +27%          +15%
Training Frameworks & Model Architecture     18,310         2,079            57         +20%           +5%
ML Platform & Orchestration                   6,625           799            27         +20%           +1%
Edge & On-Device ML                           7,596           585            30         +52%          +10%
Model Optimization & Compression              1,908           267            20         +46%          +36%
Hardware-Software Co-Design                   8,343           810            21         +28%          +17%
ML Debugging & Tooling                        2,030           247            16          +8%           +4%
Agent Framework                               3,006           874             5         +15%          +23%
Top Projects by Contributor Count

What’s Moving


๐ŸŒ Edge & On-Device ML

The character of on-device work shifted from adoption-phase polish to active construction. ExecuTorch’s engineering concentrated in backend bring-up and operator coverage rather than examples or documentation. llama.cpp pulled over 200 contributors (150 new), with engineering focused on quantisation-aware inference paths and memory-mapped model loading. The churn rate across the category jumped 52%, confirming rapid influx of engineers new to on-device work.
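Churn here presumably means the share of one quarter’s contributors who don’t reappear the next. A hedged sketch of that computation, with hypothetical contributor handles (the report’s exact definition may differ):

```python
def churn_rate(prev_quarter: set[str], this_quarter: set[str]) -> float:
    """Fraction of last quarter's contributors absent this quarter."""
    if not prev_quarter:
        return 0.0
    departed = prev_quarter - this_quarter
    return len(departed) / len(prev_quarter)

# Hypothetical contributor handles, not real data:
q4 = {"alice", "bob", "carol", "dave"}
q1 = {"alice", "erin", "frank", "grace", "heidi"}
print(churn_rate(q4, q1))  # 0.75: three of four Q4 contributors did not return
```

Note the asymmetry: a category can post high churn and contributor growth at the same time, exactly the pattern described above, because departures and arrivals are counted against different bases.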

The talent building on-device inference splits into two distinct profiles that rarely overlap. One group writes low-level runtime code (memory management, operator dispatch, hardware-specific backends). The other optimises model representations for constrained environments. Engineers deeply embedded in ExecuTorch rarely appear in llama.cpp’s contributor graph, and vice versa. Treating these as a single pool will produce misleading pipeline estimates.

🚀 LLM Serving & Inference

Serving infrastructure entered a new phase. vllm’s contributor base expanded sharply, with the engineering moving past model integration into scheduling internals, cache reuse strategies, and batch-shaping logic. This is systems engineering, not ML engineering. The skill profiles it demands look more like database kernel development than anything in a typical ML job description.

SGLang appeared with nearly all-new contributors. A major cloud provider project pulled 85 contributors concentrated in server-side orchestration and GPU resource management. The pattern: bases are expanding but engineering is fragmenting across competing approaches to scheduling, memory, and batching. For hiring, this fragmentation creates opportunity; engineers frustrated by architectural disagreements in one project become available to another. The contributor-level migration data here tells a more granular story, one we’re making available to a small number of hiring teams directly.

🔌 Hardware-Software Co-Design

Silicon vendors are staffing their open-source programmes faster than they can hire. A Tenstorrent project shipped nearly 2,000 commits from a team that grew 17% QoQ, with validation and test density that confirms production intent. Multiple other vendors deployed coordinated engineering across GPU kernel, compiler, and backend integration work, with contributor bases in the hundreds. The category overall grew 28% in commits and 17% in contributors.

New contributor overlap emerged this quarter between co-design and ML platform repos: hardware-aware engineers beginning to engage with orchestration tooling. If that crossover holds, it means the “full-stack accelerator engineer” profile (silicon to scheduler) is forming in the wild.

🧪 Training Frameworks & Model Architecture

The existing framework contributor base is working harder, not growing. The gap between commit growth and contributor growth is the widest of any major category. PyTorch’s energy concentrates on compiler-adjacent internals: graph compilation, autograd mechanics, distributed compilation paths. Most of TensorFlow’s work touches compiler backend consolidation rather than user-facing APIs.

This is the training plateau weโ€™ve been watching. Scale-up refactors are cooling while inference-related complexity absorbs the operational energy. The 2,079 contributors represent the largest addressable pool in ML systems, but the subset doing genuinely novel work (compiler integration, distributed primitives, architecture research) is far smaller than the headline suggests.

🔢 Model Optimization & Compression

New contributors to quantisation and compression work are sticking around for the first time. The churn rate dropped 43% even as the contributor base grew sharply. A PyTorch project launched with engineering concentrated in INT4/FP8 inference paths and quantisation-aware training loops. Across the category, the engineering profile signals active, production-oriented development rather than the research experimentation that dominated a year ago. The test-to-feature ratio climbed noticeably, another maturity marker.

Quiet Corners


ML Compilers grew 25% in commits with just 3% contributor growth: barely more people, far more output, the tightest labour market in ML systems. GPU Kernels jumped 34% in commits on 13% more contributors, driven by low-precision path expansion. Distributed Training grew symmetrically, refining checkpoint recovery and partitioning strategies rather than new approaches.

Inference Runtimes contracted in commits while adding contributors: stabilisation, not decline. ML Platform held the longest-tenured contributor base in our data. ML Debugging barely moved. Agent Framework added contributors without matching commit growth; the breadth-without-depth pattern persists.

Where Talent Is Moving


The compiler-framework overlap at 195 engineers remains the largest cross-category pair. What changed is direction: compiler-focused engineers are now appearing in framework codebases, specifically in subsystems where compilation meets execution. The serving-framework overlap at 147 engineers (up from 88 last quarter) confirms the downstream pull is accelerating.

A less obvious but critical overlap: roughly 75 engineers active in both GPU kernels and hardware co-design. They cluster around vendor-specific stacks. For any company building on non-NVIDIA hardware, this pool represents a disproportionate share of the people who can make your accelerator perform. New overlaps emerged between co-design and platform repos. Early signal, not yet a trend. But if hardware-aware engineers start caring about orchestration, the hiring playbook changes.
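Overlap counts of this kind reduce to set intersections over contributor identities. A minimal illustrative sketch (category keys and contributor handles are invented, not drawn from the dataset):

```python
from itertools import combinations

# Invented contributor sets per category, for illustration only:
contributors = {
    "gpu_kernels": {"dev1", "dev2", "dev3"},
    "hw_co_design": {"dev2", "dev3", "dev4"},
    "ml_platform": {"dev4", "dev5"},
}

# Pairwise overlap counts across every category pair:
overlap = {
    (a, b): len(contributors[a] & contributors[b])
    for a, b in combinations(contributors, 2)
}
print(overlap[("gpu_kernels", "hw_co_design")])  # 2
print(overlap[("hw_co_design", "ml_platform")])  # 1
```

In practice identity resolution (one engineer, several accounts and email addresses) is the hard part; the intersection itself is the easy part.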

Talent Migration: Contributor Overlap Between Categories

What This Means If You’re Hiring


Edge and on-device engineers present the most urgent sourcing challenge. The pool is growing (585 contributors, up 10%) but bifurcating. Runtime engineers who write operator dispatchers for mobile silicon and optimization engineers who compress models for deployment are diverging into separate tracks. Sourcing for “edge ML engineer” as a single role will miss half the candidates you need. The ExecuTorch and llama.cpp contributor bases barely overlap; pick which profile matters more for your stack and source accordingly.

LLM serving engineers are the most contested hire in ML infrastructure. vllm’s new contributors are mostly integration-layer participants; core systems engineers number in the low dozens across all serving projects. Staff-level serving specialists with kernel depth command $700K to $1.3M total comp. The 57% churn spike creates brief sourcing windows: engineers between projects are reachable for weeks, not months.

Compiler engineers remain the scarcest profile. The category added 3% more contributors despite 25% more commits. The existing cohort is simply working harder. Reaching them requires understanding which specific technical problems they own and what would motivate a move. ML compiler roles benchmark at $250K to $450K+ total comp, with the 30-50% specialist premium holding steady.

If any of these hiring pressures match what you’re experiencing, that’s worth discussing before Q2 reshuffles the deck.

Predictions


  • By Q3 2025: ExecuTorch will surpass llama.cpp in total quarterly contributors as Meta accelerates its on-device strategy, creating genuine talent competition between two well-funded edge projects.
  • By year-end 2025: Model optimization will absorb into adjacent categories (serving, edge, kernels) rather than persisting as a standalone hiring track. Quantisation expertise becomes a baseline expectation for senior inference engineers.
  • Watch for Q2 2025: At least one major cloud provider will announce a dedicated ML compiler hiring initiative (20+ headcount) as the 3% contributor growth rate collides with accelerating demand from serving and edge teams.

Every category that feeds production inference grew this quarter. Every category that feeds training held flat or contracted. The market has voted. The question is whether your hiring plan has caught up.


This report is powered by D33P S1GNL: a proprietary contributor intelligence engine. For access to the full contributor-level dataset or to discuss ML Systems hiring, contact [email protected]
