Senior ML Inference Engineer – GPU Systems / Compiler Optimisation
San Francisco, California, USA
$200,000 – $350,000 DOE + equity
Our client is a fast-growing deep-tech company building next-generation software tools for a highly specialised, precision-critical engineering domain. Their platform leverages advanced machine learning and high-performance computing to dramatically accelerate complex, computationally intensive workflows, delivering order-of-magnitude runtime improvements where accuracy is non-negotiable.
The role
This is a critical role focused on reducing latency, improving throughput, and ensuring ML models and high-performance systems run efficiently at scale. You will work closely with a small, elite engineering team and own the performance layer of a production system handling some of the most demanding computational workloads in the industry.
What you will do
- Analyse model architectures and high-performance pipelines to identify and remove runtime and inference bottlenecks
- Optimise end-to-end GPU pipelines, including custom ML model execution, kernel tuning, and data I/O workflows
- Deploy and scale optimised systems across multi-GPU infrastructure
- Collaborate closely with ML engineers to improve model efficiency using PyTorch, CUDA, and low-level GPU tooling
- Support production readiness with a focus on correctness, reliability, and continuous performance improvement
What you will need
- 5 to 10 years of experience in ML engineering or high-performance systems engineering
- Strong experience optimising production inference or compute-intensive software systems
- Deep knowledge of GPU programming and performance tuning
- Hands-on experience with PyTorch and CUDA
- Experience deploying and scaling systems across cloud and on-premise GPU infrastructure
- Strong Python and systems programming skills (Rust or C++ a plus)
- Background in high-performance or large-scale ML systems
For more information please reach out to [email protected]