Software Engineer – AI Inference Infrastructure (Systems / Performance)
Paris, France
Competitive + equity
Our client is a deep-tech company building the infrastructure layer that allows AI models to run efficiently in production. They focus on inference: the systems, runtimes, and tooling required to deploy, operate, and scale AI models reliably and cost-effectively across heterogeneous hardware. This is not a product- or application-layer play; it is foundational infrastructure work.
A key technical breakthrough underpins their approach: novel sparse attention mechanisms for inference that shift significant memory and compute off GPUs and onto CPUs, enabling faster execution, lower cost, reduced GPU memory pressure, and extremely long context windows. The company is now transitioning toward monetisation, licensing its inference engines and a production serving stack that covers scaling, routing, monitoring, and observability.
The team is intentionally small, highly capable, and extremely efficient. There are no sprints, standups, or rigid rituals. Engineers operate with high autonomy. Decision-making is direct. The environment is calm, non-corporate, and free of internal politics. Hiring is treated as an existential decision: they would rather not hire than hire the wrong person.
The role:
This is a broadly scoped engineering role. There are no narrowly defined job specs. You will contribute to the inference stack, performance tooling, and production infrastructure. Day-to-day work spans designing frameworks and libraries, debugging low-level performance issues, integrating with specific hardware, rewriting major components, and reasoning through large external dependencies.
This role is not for engineers who need structured environments, bounded responsibilities, or defined career ladders. It is for people who are curious, self-directed, and genuinely comfortable working close to how computers actually function.
What you will work on:
- Inference runtime design and performance optimisation across CPUs, GPUs, and other accelerators
- Sparse attention mechanisms and memory/compute offload strategies
- Production serving infrastructure: scaling, routing, monitoring, observability
- Framework and library design for model deployment
- Low-level performance debugging and hardware integration
What makes a strong fit:
- Background in systems programming, performance engineering, low-level or hardware-adjacent work
- Comfort reasoning at multiple abstraction levels and working across an entire stack
- Strong first-principles thinking, particularly in immature or poorly documented problem spaces
- High autonomy, intellectual honesty, and a bias toward doing over asking
- No requirement for process, titles, or narrow ownership boundaries
For more information, please reach out to [email protected]
