Senior Site Reliability Engineer (ML Platform & GPU Infrastructure)

site reliability engineering, SRE, GPU, machine learning, ML platform, Kubernetes, Ray, Nvidia Triton, distributed computing, inference serving, C#, Python, Go, cloud-native, GKE, EKS, DevOps, automation, observability

Key details

Salary

Not specified

Employment type

Permanent Full Time

Seniority

Senior

Years of experience

5-10

Location

Grenoble, France; Paris, France

Full job description

Senior Site Reliability Engineer role focused on building and operating GPU-powered services for machine learning workloads. Responsibilities include managing Ray clusters on Kubernetes, optimizing Nvidia Triton inference servers, and collaborating with ML and infrastructure teams. The role requires 5+ years of experience in backend engineering, SRE, or DevOps, strong Kubernetes skills, experience with GPU workloads, and programming skills in C#, Python, or Go. Knowledge of distributed computing frameworks, Nvidia Triton, TensorRT, and cloud-native GPU orchestration is a bonus. Hybrid work model in Paris or Grenoble.

What you'll do

Build and operate GPU-powered services for machine learning workloads
Manage on-demand provisioning of Ray clusters on Kubernetes for scalable distributed computing
Design, maintain, and monitor Ray-as-a-service systems
Deliver robust, self-service platform offerings
Optimize and operate high-performance inference services using Nvidia Triton
Ensure low-latency and high-throughput serving of deep learning models
Collaborate with ML engineers, data scientists, and infrastructure teams to deliver production-grade services

Requirements

Master's or PhD in Computer Science or equivalent experience
5+ years in backend engineering, SRE or DevOps
Strong experience with Kubernetes, especially in dynamic provisioning and custom operators
Hands-on experience with GPU workloads, ideally in ML training or inference contexts
Solid programming skills in C#, Python, Go, or similar languages
Passion for automation, observability, and building reliable services
Bonus: Familiarity with Ray or other distributed computing frameworks
Bonus: Knowledge of Nvidia Triton, TensorRT, or similar inference serving technologies
Bonus: Familiarity with cloud-native GPU orchestration (e.g., GKE, EKS, or on-prem equivalents)

Tech stack

Kubernetes, Ray, Nvidia Triton Inference Server, C#, Python, Go, TensorRT, GKE, EKS

Benefits

Hybrid work model blending home and in-office experiences
Learning, mentorship & career development programs
Health benefits, wellness perks & mental health support
Diverse, inclusive, and globally connected team
Attractive salary with performance-based rewards and family-friendly policies
Potential for equity depending on role and level
