Machine Learning Systems Engineer – LLM Infrastructure
Palo Alto, California, USA
$200,000 – $350,000 DOE + equity
Our client is an early-stage AI company working at the frontier of large language model performance and infrastructure. They are building next-generation language models designed to be dramatically faster and more efficient than traditional architectures, pursuing breakthroughs in model performance from first principles.
The role:
You will work directly with a small, elite technical team on performance-critical ML infrastructure at the core of next-generation AI products. This is an opportunity to work close to the metal on training and inference systems that matter.
What you will do:
Build and optimise infrastructure for large-scale model training and inference
Improve performance, reliability, and scalability across production ML systems
Work with model serving and inference frameworks to support low-latency, high-throughput deployment
Collaborate closely with researchers and engineers to productionise cutting-edge model breakthroughs
Develop internal tooling and systems for efficient experimentation and deployment
What you will need:
2 to 5 years of experience in ML systems engineering
Strong experience with infrastructure for training and/or inference systems
Hands-on experience with vLLM, TensorRT, ONNX Runtime, or similar serving frameworks
Strong Python skills and solid understanding of CUDA-enabled ML systems
Experience with PyTorch or TensorFlow
Familiarity with Docker, Kubernetes, and cloud platforms (AWS or Azure)
Strong systems mindset with the ability to work close to model and infrastructure internals
Visa sponsorship available.
For more information please reach out to [email protected]
