AdTechTalent
Other · 6 days ago · On-site

DoubleVerify

Sr. Incident Manager

incident management, SRE, DevOps, technical operations, AWS, GCP, Datadog, Grafana, PagerDuty, distributed systems, cloud, AdTech, automation, AI, ITIL, SLO, SLI

Key details

Salary

$131K – $260K

Employment type

Full-time

Seniority

Senior

Years experience

5-10

Location

New York City, New York, United States

Full job description

DoubleVerify seeks a Senior Incident Manager to lead Major Incident Management and oversee the critical incident lifecycle from detection to post-incident review. The role requires 7+ years in SRE, DevOps, Technical Operations, or Incident Management, with experience leading Sev1/Sev2 incidents in high-availability environments. Candidates need strong communication skills, a technical understanding of distributed systems and cloud platforms (AWS or GCP), and experience with monitoring and incident tools such as Datadog, Grafana, and PagerDuty. Responsibilities include leading incident response, coordinating cross-functional teams, managing communications, improving incident processes, and tracking metrics. Nice-to-haves include experience in AdTech, familiarity with SLOs/SLIs, experience with automation or AI-driven tooling, and ITIL certification. The role is full-time and on-site at the NYC headquarters. The salary range is $131,000 to $260,000, plus bonus, equity, and benefits.

What you'll do

  • Lead Sev1–Sev3 incidents as the single point of accountability
  • Drive real-time decision-making, escalation, and coordination across teams
  • Run incident communications, including updates to executives and stakeholders
  • Translate technical issues into clear business impact
  • Own and improve the Major Incident Management process
  • Lead post-incident reviews and ensure follow-through on actions
  • Track key metrics (e.g., MTTR, incident trends) and drive improvements
  • Coordinate with Product, Commercial, and Legal on client communications when needed
  • Align incident response with business priorities, including customer and revenue impact
  • Improve tooling, automation, and workflows for incident response

Requirements

  • 7+ years in SRE, DevOps, Technical Operations, or Incident Management
  • Experience leading Sev1/Sev2 incidents in high-availability environments
  • Proven ability to coordinate cross-functional teams during critical outages
  • Solid understanding of distributed systems and cloud environments (AWS or GCP)
  • Experience with monitoring and incident tools (e.g., Datadog, Grafana, PagerDuty)
  • Comfortable working with logs, alerts, and system diagnostics
  • Strong communicator, including with executive stakeholders
  • Ability to translate technical issues into business impact
  • Comfortable driving decisions and alignment under pressure
  • Calm, decisive, and execution-focused mindset
  • Able to push for action and maintain momentum
  • Experience improving processes and operational maturity
  • Nice to have: Experience in AdTech, digital media, or similar environments
  • Nice to have: Familiarity with SLOs/SLIs and reliability frameworks
  • Nice to have: Experience with automation or AI-driven incident tooling
  • Nice to have: ITIL or similar certification

Tech stack

AWS, GCP, Datadog, Grafana, PagerDuty, SLOs, SLIs, ITIL

Benefits

Bonus, Equity, Benefits

