AdTechTalent
Engineering · 15 days ago · Hybrid

PubMatic

Infrastructure Monitoring Engineer (On Contract)

infra monitoring, NOC, AI tools, incident management, Grafana, Nagios, Linux, networking, Python, shell scripting, prompt engineering, production support, ad serving, real-time systems

Key details

Salary

Not specified

Employment type

Full-time

Seniority

Entry

Years experience

0-2

Location

Pune, Maharashtra, India

Full job description

The Infra Monitoring Engineer works in the Network Operations Center (NOC) to ensure high availability, performance, and reliability of critical infrastructure. Responsibilities include monitoring infrastructure and network systems with Grafana and Nagios; handling P1/P2 alerts and incidents; providing Tier-1 production support; collaborating with Engineering, AdOps, and DevOps teams; supporting real-time ad serving systems; using AI tools for log analysis and root cause analysis; documenting incidents and maintaining runbooks; ensuring clean shift handovers and process adherence; and identifying automation opportunities. The role requires 1–3 years of experience in NOC, infra monitoring, or production support; basic Linux and networking knowledge; familiarity with monitoring and incident management tools; exposure to AI tools and prompt engineering; strong troubleshooting and communication skills; and willingness to work 24/7 rotational shifts. A Bachelor’s degree in Computer Science, IT, or a related field is required. The work model is hybrid (3 days in office, 2 days remote). Benefits include parental leave, healthcare insurance, broadband reimbursement, snacks, and catered lunches.

What you'll do

  • Monitor infrastructure, applications, and network systems using tools such as Grafana, Nagios, and internal dashboards
  • Handle alerts and incidents (P1/P2), perform initial triage, and ensure timely escalation and resolution
  • Provide Tier-1 support for production systems and services
  • Collaborate with Engineering, AdOps, and DevOps teams for troubleshooting and issue resolution
  • Support real-time systems involved in ad serving, bidding, and traffic flow
  • Leverage AI tools (e.g., log analysis assistants, alert summarization tools) to speed up debugging and root cause analysis
  • Use structured prompts to extract insights from logs, metrics, and incident data
  • Participate in deployment monitoring and post-release validation
  • Document incidents, contribute to RCA, and maintain operational runbooks/Wiki
  • Ensure effective shift handovers and adherence to NOC processes
  • Identify recurring issues and suggest automation or AI-assisted solutions

Requirements

  • 1–3 years of experience in NOC / Infra Monitoring / Production Support roles
  • Basic understanding of Linux (CLI, processes, memory, disk, networking basics)
  • Familiarity with monitoring tools like Grafana, Nagios, or similar
  • Basic knowledge of networking concepts (TCP/IP, DNS, HTTP/HTTPS)
  • Understanding of incident management and alerting tools (Jira, Zenduty, etc.)
  • Exposure to AI tools (ChatGPT or similar) for troubleshooting, documentation, or analysis
  • Basic understanding of prompt engineering
  • Strong analytical and troubleshooting skills
  • Ability to prioritize incidents based on severity and impact
  • Good communication skills and ability to coordinate across teams
  • Willingness to work in 24/7 rotational shifts
  • Bachelor’s degree in Computer Science / IT or related field (B.E / B.Tech / MCA / BCA, etc.)

Tech stack

Grafana, Nagios, Linux, TCP/IP, DNS, HTTP/HTTPS, Jira, Zenduty, ChatGPT, Python, Shell

Benefits

  • Paternity/maternity leave
  • Healthcare insurance
  • Broadband reimbursement
  • Kitchen with healthy snacks and drinks
  • Catered lunches
