AdTechTalent
Engineering · 8 days ago · On-site

The Trade Desk

Senior Software Engineer - Observability & IRM

incident management, Kubernetes, observability, logging, alerting, developer tooling, Backstage, Grafana, Prometheus, Sumo Logic, service catalog, automation

Key details

Salary

$125K – $229K

Employment type

Full-time

Seniority

Mid-level

Years experience

3-5

Location

Bellevue, Washington, United States

Full job description

The Trade Desk is seeking a mid-level full-time engineer to join the Service Excellence team focused on incident response tooling and infrastructure. Responsibilities include building and maintaining incident management tools, automating incident lifecycle processes, evaluating and migrating logging systems, and extending internal developer portals with Kubernetes integrations and SLO tooling. Candidates should have experience with production infrastructure or developer tooling, familiarity with observability concepts (logging, alerting, on-call workflows), strong debugging skills, and clear communication. Preferred skills include experience with Grafana, Prometheus, Sumo Logic, Backstage, OpsLevel, Kubernetes at scale, and HunnyPt. The role is located in Bellevue. Salary range is $124,900 to $228,900 USD. Benefits include comprehensive healthcare, retirement plans, disability coverage, life insurance, tuition reimbursement, parental leave, paid sick and vacation time, paid holidays, stock purchase plan, and performance-based stock grants.

What you'll do

  • Build and maintain incident management tooling
  • Automate incident lifecycle processes including alerting, escalation, incident channels, retrospectives, and SLA tracking
  • Evaluate and migrate the logging stack
  • Re-evaluate logging vendor and collection architecture
  • Extend the internal developer portal (Backstage/service catalog) with Kubernetes integrations, maturity models, and SLO adoption tooling
  • Build alert quality tooling for a better signal-to-noise ratio, smarter routing, better grouping, and tighter feedback loops

Requirements

  • Experience building and operating production infrastructure or internal developer tooling
  • Comfort working across the stack including distributed systems, Kubernetes, observability pipelines, and web-based tooling
  • Familiarity with observability concepts: logging, alerting, on-call workflows
  • Strong debugging instincts
  • Clear communication skills

Tech stack

Kubernetes, Grafana, Prometheus, Sumo Logic, Backstage, OpsLevel, HunnyPt

Benefits

  • Comprehensive healthcare (medical, dental, and vision) with premiums paid in full for employees and dependents
  • Retirement benefits such as a 401k plan and company match
  • Short- and long-term disability coverage
  • Basic life insurance
  • Well-being benefits
  • Reimbursement for certain tuition expenses
  • Parental leave
  • Sick time of 1 hour per 30 hours worked
  • Vacation time of up to 120 hours in the first year and 160 hours thereafter
  • Around 13 paid holidays per year
  • Employee Stock Purchase Plan with discounted stock purchase
  • Eligibility for stock-based compensation grants based on performance
  • Variable compensation-based incentives and commissions, depending on role
