Sr. Incident Manager

incident management, SRE, DevOps, technical operations, AWS, GCP, Datadog, Grafana, PagerDuty, distributed systems, cloud, AdTech, automation, AI, ITIL, SLO, SLI

Key details

Salary

$131K – $260K

Employment type

Full-time

Seniority

Senior

Years experience

5-10

Location

New York, US

Full job description

DoubleVerify seeks a Senior Incident Manager to lead Major Incident Management, overseeing the critical incident lifecycle from detection to post-incident review. The role requires 7+ years in SRE, DevOps, Technical Operations, or Incident Management, with experience leading Sev1/Sev2 incidents in high-availability environments. Candidates must have strong communication skills, a technical understanding of distributed systems and cloud environments (AWS or GCP), and experience with monitoring and incident tools such as Datadog, Grafana, and PagerDuty. Responsibilities include leading incident response, coordinating cross-functional teams, managing communications, improving incident processes, and tracking metrics. Nice to have: experience in AdTech, familiarity with SLOs/SLIs, experience with automation or AI-driven incident tooling, and ITIL certification. The role is full-time and on-site at the NYC HQ. The salary range is $131,000 - $260,000 plus bonus, equity, and benefits.

What you'll do

Lead Sev1–Sev3 incidents as the single point of accountability
Drive real-time decision-making, escalation, and coordination across teams
Run incident communications, including updates to executives and stakeholders
Translate technical issues into clear business impact
Own and improve the Major Incident Management process
Lead post-incident reviews and ensure follow-through on actions
Track key metrics (e.g., MTTR, incident trends) and drive improvements
Coordinate with Product, Commercial, and Legal on client communications when needed
Align incident response with business priorities, including customer and revenue impact
Improve tooling, automation, and workflows for incident response

Requirements

7+ years in SRE, DevOps, Technical Operations, or Incident Management
Experience leading Sev1/Sev2 incidents in high-availability environments
Proven ability to coordinate cross-functional teams during critical outages
Solid understanding of distributed systems and cloud environments (AWS or GCP)
Experience with monitoring and incident tools (e.g., Datadog, Grafana, PagerDuty)
Comfortable working with logs, alerts, and system diagnostics
Strong communicator, including with executive stakeholders
Ability to translate technical issues into business impact
Comfortable driving decisions and alignment under pressure
Calm, decisive, and execution-focused mindset
Able to push for action and maintain momentum
Experience improving processes and operational maturity
Nice to have: Experience in AdTech, digital media, or similar environments
Nice to have: Familiarity with SLOs/SLIs and reliability frameworks
Nice to have: Experience with automation or AI-driven incident tooling
Nice to have: ITIL or similar certification

Tech stack

AWS, GCP, Datadog, Grafana, PagerDuty, SLOs, SLIs, ITIL

Benefits

Bonus, Equity, Benefits
