
Deep Signal Quarterly – Q4 2024

15th January 2025
By Steve Kilpatrick
Founder & Director
ML Systems and Infrastructure

474 Tracked Repos  |  74,748 Commits  |  7,268 Contributors

THE SIGNAL


Our Q3 predictions: the PyTorch optimization project crossing 100 contributors hasn't landed yet (79 this quarter), but it absorbed work from standalone quantisation libraries as predicted. vLLM's core team did tighten review processes; serving commit velocity dropped 23% while contributor counts held, exactly the quality-over-quantity shift we projected. Fine-tuning optimisation as a distinct category is forming but hasn't formalised. Scorecard: two of three directionally correct, one miss on magnitude.

GPU kernel engineering posted a 32% commit surge with contributor growth lagging at 9%. The same people are writing dramatically more code. Churn spiked 130%, meaning files are being rewritten, not extended. That signals deep architectural rework, not incremental patches. Meanwhile, serving infrastructure contracted 23% in commits with contributor counts flat. The conventional read is that serving cooled. The better read: serving absorbed what it needed from kernel and compiler layers, and the engineering gravity shifted to runtime and scheduling problems that only surface under production load.
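Churn above 100% means more code was torn out than survives net-new. As a rough sketch of how such a figure can arise from per-commit diff stats (this is an assumed definition for illustration; our exact formula may differ):

```python
# Illustrative churn metric: deleted lines relative to net new lines over a
# quarter. This definition is an assumption for demonstration purposes only.

def churn_rate(lines_added: int, lines_deleted: int) -> float:
    """Percent of the quarter's work that replaced existing code rather
    than extending it. Values over 100% indicate heavy rewriting."""
    net_new = lines_added - lines_deleted
    if net_new <= 0:
        return float("inf")  # pure rewrite or net shrinkage
    return 100.0 * lines_deleted / net_new

# Hypothetical quarter for a kernel repo undergoing architectural rework:
print(round(churn_rate(lines_added=46_000, lines_deleted=26_000)))  # 130
```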

This edition covers 7,268 contributors, 474 repositories, and 74,748 commits.

Q-over-Q Snapshot


Kernel and platform categories absorbed energy while serving, agents, and distributed training contracted; the table captures a market rotating from scale-out to cost-per-token optimisation.

Category | Commits | Contributors | Active Repos | Commits QoQ | Contribs QoQ
GPU Kernels & Performance | 5,861 | 686 | 44 | +32% | +9%
ML Compilers & Graph Optimization | 12,468 | 713 | 30 | +4% | +1%
Distributed Training & Parallelism | 1,660 | 293 | 19 | -12% | -8%
Inference Runtimes & Engines | 1,802 | 374 | 19 | -6% | +1%
LLM Serving & Inference | 5,780 | 873 | 42 | -23% | -2%
Training Frameworks & Model Architecture | 18,890 | 2,041 | 61 | +2% | +1%
ML Platform & Orchestration | 6,722 | 871 | 29 | +1% | +8%
Edge & On-Device ML | 7,571 | 555 | 29 | -6% | 0%
Model Optimization & Compression | 2,938 | 246 | 25 | +7% | -6%
Hardware-Software Co-Design | 7,296 | 913 | 19 | -7% | -3%
ML Debugging & Tooling | 2,089 | 271 | 17 | -11% | -1%
Agent Framework | 1,671 | 491 | 5 | -26% | -20%
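The QoQ columns are standard quarter-over-quarter percentage changes. A minimal sketch of the arithmetic, using hypothetical prior-quarter figures chosen to reproduce two of the table's moves (the prior-quarter counts are illustrative, not from our dataset):

```python
def qoq_pct(prev: int, cur: int) -> int:
    """Quarter-over-quarter change, rounded to the nearest whole percent."""
    return round(100 * (cur - prev) / prev)

# Hypothetical Q3 commit counts picked to illustrate the math:
print(qoq_pct(4_440, 5_861))  # 32   (GPU Kernels-style surge)
print(qoq_pct(7_506, 5_780))  # -23  (LLM Serving-style contraction)
```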
Top Projects by Contributor Count

What's Moving


โš™๏ธ GPU Kernels & Performance

Architectural rewrites, not feature sprints, drove the biggest commit surge across all categories. The 130% churn rate tells you engineers are tearing out and replacing core compute paths. A major AMD project pulled in 37 new contributors with work concentrated on tile-level abstractions and attention primitives rather than one-off operator additions. That's a project building foundational infrastructure, not polishing what exists.

One vendor's device-side library project touched over 2,500 files with just 33 contributors, a ratio pointing to deep infrastructure refactoring. Coordination patterns show kernel work is increasingly collaborative; solo-contributor repos are fading from the data. A kernel benchmarking initiative launched with all-new contributors focused on performance measurement across non-NVIDIA accelerators. Small in absolute terms, but the pattern mirrors how other successful multi-vendor kernel projects started. The contributor-level migration data here tells a more granular story, one we're making available to a small number of hiring teams directly.

🚀 LLM Serving & Inference

Serving's commit decline masks a maturation that changes what the role means in practice. vLLM still anchors the category, but the work shifted: scheduling logic, cache reuse strategies, and tail-latency reduction under bursty load patterns now dominate. The engineering is stabilisation-oriented rather than feature-expansionary.

Hardware-specific serving forks targeting non-NVIDIA accelerators pulled in 72 new contributors concentrated on serving-layer optimisations and kernel integration. That level of new-contributor influx into forks, not upstream projects, signals that production teams are customising serving stacks. For hiring: the serving engineer you need in Q1 2025 is not the same profile as Q1 2024. Twelve months ago, you wanted someone who could stand up a serving stack and tune batch sizes. Now you need engineers who understand scheduler internals, memory pressure under concurrent requests, and hardware-specific dispatch. The pool didn't shrink; it specialised.

🧪 Training Frameworks & Model Architecture

Flat toplines conceal a compositional shift. PyTorch concentrated on compiler integration rather than training APIs. TensorFlow's commits are dominated by compiler backend consolidation. Both are converging: the training framework's value lives in its compilation pipeline, not its user-facing API. That convergence reshapes the talent profile; framework roles now demand compiler internals knowledge.

A video generation project launched, blending diffusion model work with distributed training primitives. That intersection is producing a new hybrid profile. Contrast this with Hugging Face's transformers, where the majority of contributors were new and the documentation-heavy profile confirms onboarding-phase activity. The gap between these profiles defines the hiring challenge.

๐Ÿ—๏ธ ML Platform & Orchestration

Platform engineering quietly posted the strongest contributor growth: 8% more people with the highest engagement depth in the dataset. Engineers here aren't passing through; they're embedded at over 10 mean active weeks. New contributor crossover emerged between platform and both hardware co-design and debugging categories: a leading indicator that platform teams are absorbing hardware-awareness and observability concerns that used to live in separate organisations.

🧠 ML Compilers & Graph Optimization

Compiler engineering held steady while internal composition shifted. The predicted split between graph-level and hardware-targeting profiles materialised. Graph projects focused on autotuning and cost-per-token optimisation. Hardware projects concentrated on backend lowering passes and runtime integration. Churn dropped 50%, the sharpest decline across all categories. Combined with test-heavy profiles, this reads as compiler work entering stabilisation. For hiring: compiler engineers available in Q1 2025 are more likely to be specialists who've finished a stabilisation cycle than generalists exploring new projects.

Quiet Corners


Edge & On-Device ML held contributors flat while commits dipped; ExecuTorch absorbed most energy on backend integration and model optimisation for mobile targets. Model Optimization grew commits 7% from 6% fewer contributors; concentrated teams doing high-output work. Hardware Co-Design contracted modestly; hardware abstraction layer work sustained focus across vendor projects.

Inference Runtimes showed the highest coordination scores of any category in the dataset, pointing to tightly coupled cross-team development. Distributed Training cooled for the third consecutive quarter; elastic training was the only bright spot. ML Debugging contracted across the board. Agent Framework posted the steepest decline: commits down 26%, contributors down 20%. The documentation-heavy commit profile suggests adoption-phase churn.

Where Talent Is Moving


The compiler-framework overlap at 193 contributors remains the largest cross-category pair, growing for four consecutive quarters. These are engineers extending compiler pipelines in framework repos, and hiring teams posting "framework engineer" roles while screening for API experience are missing the candidates who move the needle.

The serving-framework overlap (well over a hundred engineers) confirms the downstream migration is now permanent. New overlaps emerged between distributed training and edge, and between hardware co-design and platform. The first likely reflects engineers bringing parallelism expertise to on-device model partitioning. The second suggests platform teams are starting to own hardware-specific deployment paths. The kernel-to-hardware overlap held steady, but directionality shifted: kernel engineers are spending more time in hardware abstraction layers, likely driven by multi-vendor kernel work accelerating.
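Mechanically, a cross-category overlap is just the intersection of the two categories' contributor sets over the quarter. A minimal sketch using made-up contributor handles (the names and set contents are illustrative, not real data):

```python
# Overlap = contributors who committed in both categories this quarter.
# Handles are hypothetical, for illustration only.
compiler_contributors = {"alice", "bob", "carol", "dan"}
framework_contributors = {"carol", "dan", "erin", "frank"}

overlap = compiler_contributors & framework_contributors
print(len(overlap), sorted(overlap))  # 2 ['carol', 'dan']
```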

Talent Migration: Contributor Overlap Between Categories

What This Means If You're Hiring


Kernel engineers are the scarcest and most productive profile in the Q4 data. The 130% churn rate means they're mid-rewrite. Pulling someone out of an active architectural overhaul requires more than competitive comp; it requires a technical pitch at least as compelling as the problem they're solving today. Staff-level kernel specialists command $700K to $1.3M total comp with CUDA or Triton depth at the upper end.

Serving engineers who understand scheduler internals and hardware-aware dispatch are a different profile than twelve months ago. Generalist "deploy the model" engineers are abundant; "optimise the serving runtime under production load" engineers are rare. The overlap between serving and inference runtime categories is your sourcing shortcut.

Cross-domain profiles (compiler + kernel, training + serving, hardware + runtime) represent the highest-impact hires. The 193-person compiler-framework overlap and 84-person kernel-framework overlap are not large numbers globally. Teams that identify training-infrastructure engineers whose skills transfer to inference optimisation still have a window, but it's closing.

If any of these patterns match what youโ€™re tracking internally, we should talk before the Q1 reshuffle.

Predictions


  • Kernel rewrite completion (Q1-Q2 2025): The 130% churn is unsustainable beyond two quarters. By Q2, expect churn to normalise below 50% as architectural rewrites stabilise. That transition will briefly increase the reachability of kernel engineers who've been heads-down for six months.
  • Serving role bifurcation (by mid-2025): Serving job postings will split into โ€œserving platformโ€ (scheduling, fleet, orchestration) and โ€œserving performanceโ€ (kernel selection, cache policy, quantisation dispatch) at companies running inference at scale.
  • Video generation infrastructure (Q2 2025): The distributed-training-meets-diffusion overlap signals a new role type. At least three major labs will post roles explicitly combining distributed systems and video generation pipeline expertise.

The throughline for this quarter is rotation. Training frameworks held steady. Compilers held steady. But the nature of the engineering shifted: more stabilisation, more test infrastructure, more cross-layer integration. The categories that grew in raw output are doing foundational rewrites. The ones that shrank are where the easy architectural wins already shipped.


This report is powered by D33P S1GNL: a proprietary contributor intelligence engine. For access to the full contributor-level dataset or to discuss ML Systems hiring, contact [email protected]

