AdTechTalent
Engineering · 13 days ago · Hybrid

Criteo

Senior Site Reliability Engineer (ML Platform & GPU Infrastructure)

site reliability engineering, SRE, GPU, machine learning, ML platform, Kubernetes, Ray, Nvidia Triton, distributed computing, inference serving, C#, Python, Go, cloud-native, GKE, EKS, DevOps, automation, observability

Key details

Salary

Not specified

Employment type

Permanent Full Time

Seniority

Senior

Years experience

5-10

Location

Grenoble, France; Paris, France

Full job description

Senior Site Reliability Engineer role focused on building and operating GPU-powered services for machine learning workloads. Responsibilities include managing Ray clusters on Kubernetes, optimizing Nvidia Triton inference servers, and collaborating with ML and infrastructure teams. Requires 5+ years of experience in backend engineering, SRE, or DevOps; strong Kubernetes skills; experience with GPU workloads; and programming skills in C#, Python, or Go. Knowledge of distributed computing frameworks, Nvidia Triton, TensorRT, and cloud-native GPU orchestration is a bonus. Hybrid work model in Paris or Grenoble.

What you'll do

  • Build and operate GPU-powered services for machine learning workloads
  • Manage on-demand provisioning of Ray clusters on Kubernetes for scalable distributed computing
  • Design, maintain, and monitor Ray-as-a-service systems
  • Deliver robust, self-service platform offerings
  • Optimize and operate high-performance inference services using Nvidia Triton
  • Ensure low-latency and high-throughput serving of deep learning models
  • Collaborate with ML engineers, data scientists, and infrastructure teams to deliver production-grade services

Requirements

  • Master's or PhD in Computer Science or equivalent experience
  • 5+ years of experience in backend engineering, SRE, or DevOps
  • Strong experience with Kubernetes, especially dynamic provisioning and custom operators
  • Hands-on experience with GPU workloads, ideally in ML training or inference contexts
  • Solid programming skills in C#, Python, Go, or similar languages
  • Passion for automation, observability, and building reliable services
  • Bonus: Familiarity with Ray or other distributed computing frameworks
  • Bonus: Knowledge of Nvidia Triton, TensorRT, or similar inference serving technologies
  • Bonus: Familiarity with cloud-native GPU orchestration (e.g., GKE, EKS, or on-prem equivalents)

Tech stack

Kubernetes, Ray, Nvidia Triton Inference Server, C#, Python, Go, TensorRT, GKE, EKS

Benefits

  • Hybrid work model blending home and in-office experiences
  • Learning, mentorship & career development programs
  • Health benefits, wellness perks & mental health support
  • Diverse, inclusive, and globally connected team
  • Attractive salary with performance-based rewards and family-friendly policies
  • Potential for equity depending on role and level