Full job description
Senior Site Reliability Engineer role in Analytics Infrastructure at Criteo. Full-time, hybrid position based in Paris and Grenoble, France. Responsibilities include designing and operating scalable, resilient distributed analytics platforms; managing data platforms such as Vertica, Presto, Druid, and Tableau; improving automation and orchestration with Chef, Docker, Mesos, and Kubernetes; and building internal tooling, primarily in Scala. The role also involves participating in on-call rotations and incident response, and contributing to platform reliability and migrations. Requirements include 5+ years in backend engineering or SRE, strong programming skills in Scala/Java, scripting in Python, deep knowledge of distributed systems and large-scale data systems, and experience with Linux/Unix, containerization, CI/CD, and observability practices. A strong collaboration, communication, and ownership mindset is required. Benefits include a hybrid work model, career development, health and wellness support, an inclusive culture, a competitive salary, and potential equity.
What you'll do
- Design, operate, and evolve distributed analytics infrastructure focusing on scalability, resilience, and performance
- Manage critical data platforms including Vertica, Presto, Druid, Tableau, and related systems
- Ensure reliability and low-latency access for a broad range of users and services
- Develop and maintain automation, deployment, and orchestration systems using Chef, Docker, Mesos, and Kubernetes
- Build internal tooling and operational frameworks primarily in Scala, with opportunities to work in Python or Ruby
- Enhance self-service capabilities for engineering teams while maintaining strong operational standards
- Participate in on-call rotations, incident response, troubleshooting, and root cause analysis
- Improve observability, automation, capacity planning, and cost/performance optimization
- Contribute to infrastructure migrations and reliability initiatives including Kubernetes adoption
- Work closely with product, data, and platform engineering teams
- Contribute to architectural discussions and help shape platform evolution
- Share knowledge and improve documentation, onboarding, and operational practices
Requirements
- 5+ years of experience in backend engineering, SRE, or distributed systems roles
- Strong programming skills in Scala, Java, or another JVM/statically typed language
- Scripting experience in Python
- Deep understanding of distributed systems including scalability, reliability, concurrency, and performance tuning
- Experience with large-scale data or analytics systems such as query engines, distributed storage, or OLAP databases
- Familiarity with query optimization, high-concurrency workloads, and data-intensive architectures
- Strong experience in Linux/Unix environments and production system operations
- Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes, Mesos)
- Proven ability to build automation, improve reliability, and reduce operational toil
- Experience with CI/CD pipelines, infrastructure-as-code, and configuration management systems
- Strong understanding of observability practices (metrics, logging, tracing) in distributed systems
- Ability to work effectively with cross-functional teams including product engineers and data stakeholders
- Strong communication skills
- Ownership mindset for operating and improving production systems at scale
- Willingness to participate in on-call rotations and incident response
Tech stack
Scala, Java, Python, Ruby, Vertica, Presto, Druid, Tableau, Chef, Docker, Mesos, Kubernetes, CI/CD, infrastructure-as-code, configuration management
Benefits
- Hybrid working model blending home and in-office experiences
- Learning, mentorship, and career development programs
- Health benefits, wellness perks, and mental health support
- Diverse, inclusive, and globally connected team
- Attractive salary with performance-based rewards and family-friendly policies
- Potential for equity depending on role and level