AdTechTalent
Engineering | 12 days ago | On-site

The Trade Desk

Lead Staff Systems Reliability Engineer (Linux & Distributed Systems)

leadership, systems reliability, linux, aerospike, mongodb, kafka, automation, nosql, performance tuning, hardware, infrastructure, cloud, datacenter

Key details

Salary

Not specified

Employment type

Full-time

Seniority

Lead

Years experience

5-10

Location

London, England, United Kingdom

Full job description

Lead Systems Reliability Engineer role focused on building and maintaining a data-driven platform on Aerospike, MongoDB, and Kafka that operates at sub-millisecond latency. Responsibilities include leading a team that manages large-scale systems and data structures, improving infrastructure automation, operating Linux-based systems, participating in on-call rotations, and benchmarking new hardware. The role requires Linux experience, leadership skills, and troubleshooting ability; familiarity with databases and automation tools is a plus. Training toward NoSQL expertise is provided. The role is based in London.

What you'll do

  • Lead a team to influence, manage, and plan work streams, systems, and data structures at scale within a global ecosystem
  • Encourage, improve, and build infrastructure automation for stateful systems at scale
  • Own operations for Linux-based systems running Aerospike, Kafka, and MongoDB
  • Serve as a point of contact to review new use cases, answer questions, and participate in on-call rotation
  • Learn to be a NoSQL subject matter expert (training provided)
  • Benchmark and analyze next generation hardware offerings

Requirements

  • Experience with the Linux operating system
  • Leadership experience and ability to mentor
  • Troubleshooting techniques, including fault isolation and the scientific method
  • Ability to identify bottlenecks (CPU, IO)
  • Nice-to-have: experience with physical hardware internals, management, and operation
  • Nice-to-have: experience performing testing and tuning
  • Nice-to-have: experience with databases (relational or NoSQL)
  • Nice-to-have: experience with Ansible, PyInfra, or Chef
  • Nice-to-have: experience with Prometheus
  • Nice-to-have: experience with Kubernetes
  • Nice-to-have: programming or scripting skills in Python, Ruby, Rust, Bash, Golang, or C#

Tech stack

Aerospike, MongoDB, Kafka, Linux, Ansible, PyInfra, Chef, Prometheus, Kubernetes, Python, Ruby, Rust, Bash, Golang, C#
