Software Engineer – AI Inference Infrastructure (Systems / Performance)
Paris, France
Competitive + equity
Our client is a deep-tech company building the infrastructure layer that allows AI models to run efficiently in production. They focus on inference: the systems, runtimes, and tooling required to deploy, operate, and scale AI models reliably and cost-effectively across heterogeneous hardware. This is not a product- or application-layer play; it is foundational infrastructure work.
A key technical breakthrough underpins their approach: novel sparse attention mechanisms for inference that shift significant memory and compute off GPUs and onto CPUs, enabling faster execution, lower cost, reduced GPU memory pressure, and extremely long context windows. The company is now transitioning toward monetisation, licensing its inference engines and a production serving stack that covers scaling, routing, monitoring, and observability.
The team is intentionally small, highly capable, and extremely efficient. There are no sprints, standups, or rigid rituals. Engineers operate with high autonomy. Decision-making is direct. The environment is calm, non-corporate, and free of internal politics. Hiring is treated as an existential decision: they would rather not hire than hire the wrong person.
The role:
This is a broadly scoped engineering role. There are no narrowly defined job specs. You will contribute to the inference stack, performance tooling, and production infrastructure. Day-to-day work spans designing frameworks and libraries, debugging low-level performance issues, integrating with specific hardware, rewriting major components, and reasoning through large external dependencies.
This role is not for engineers who need structured environments, bounded responsibilities, or defined career ladders. It is for people who are curious, self-directed, and genuinely comfortable working close to how computers actually function.
What you will work on:
- Inference runtime design and performance optimisation across CPUs, GPUs, and other accelerators
- Sparse attention mechanisms and memory/compute offload strategies
- Production serving infrastructure: scaling, routing, monitoring, observability
- Framework and library design for model deployment
- Low-level performance debugging and hardware integration
What makes a strong fit:
- Background in systems programming, performance engineering, low-level or hardware-adjacent work
- Comfort reasoning at multiple abstraction levels and working across an entire stack
- Strong first-principles thinking, particularly in immature or poorly documented problem spaces
- High autonomy, intellectual honesty, and a bias toward doing over asking
- No requirement for process, titles, or narrow ownership boundaries
For more information, please reach out to [email protected]
