AdTechTalent
Engineering · 140 days ago · Remote

Attain

Sr/Staff Site Reliability Engineer, Consumer Apps

terraform, gitlab, helm, kubernetes, istio, gcp, bigquery, spanner, prometheus, grafana, llm, aws, docker, kafka, amazon kinesis, sns, google pubsub, aws lambda, google cloud functions, google cloud run, datadog, site reliability engineering, sre, infrastructure as code, automation, cloud-native, soc2

Key details

Salary: Not specified

Employment type: Full-time

Seniority: Senior

Years experience: 5-10

Location: Chicago, United States; Redwood City, United States

Full job description

Attain is hiring a Senior/Staff Site Reliability Engineer to build and maintain infrastructure and supporting tools for Klover's fintech platform. The role involves working with Terraform, Helm, Kubernetes, Istio, GCP BigQuery and Spanner, Prometheus, and Grafana, and using LLMs in infrastructure and tooling work. Responsibilities include deploying infrastructure, monitoring databases, automating systems, and participating in architecture and capacity planning. Candidates should have 6+ years of experience with cloud-native infrastructure (AWS and/or GCP), plus experience with containerization, SQL databases, stream and pub/sub technologies, serverless computing, infrastructure-as-code, observability tools, and SOC 2 compliance. The position is full-time with a hybrid schedule in Chicago, IL or Redwood City, CA.

What you'll do

  • Write Terraform modules for deploying infrastructure resources via GitLab pipelines
  • Develop Helm charts for deploying services and jobs in a Kubernetes cluster
  • Define metrics, network policies, and routing rules for Istio service mesh
  • Monitor and maintain GCP BigQuery and Spanner databases
  • Pipe metrics to a Google-managed Prometheus instance and build Grafana dashboards and alerts
  • Experiment with GCP offerings, third-party vendors, and open-source tools to automate and secure operations
  • Leverage the latest LLMs in developing infrastructure and tooling
  • Pair with engineering leads to instrument and monitor critical functionality
  • Add automation to existing and new systems to reduce manual processes
  • Participate in architecture design and capacity planning to ensure scalability, maintainability, reliability, and security
  • Build, maintain, and improve the CI/CD pipeline

Requirements

  • 6+ years of experience building and maintaining large-scale cloud-native infrastructure (AWS and/or GCP)
  • Experience with containerization technologies such as Docker and Kubernetes, and with Istio or a similar service mesh technology
  • Experience with SQL database technologies such as MySQL, Google BigQuery, and Google Spanner
  • Experience with stream technologies such as Kafka and Amazon Kinesis
  • Experience with pub/sub technologies such as AWS SNS and Google Pub/Sub
  • Experience with serverless computing technologies such as AWS Lambda and Google Cloud Functions/Google Cloud Run
  • Experience with infrastructure-as-code tools such as Terraform
  • Experience with observability tools such as Datadog, Prometheus, and Grafana
  • Strong computer science and software engineering fundamentals
  • Experience with SOC 2 compliance processes and requirements
  • Comfortable wearing many hats
  • Willingness to learn and teach in a fast-paced, collaborative environment
  • Strong desire to automate processes
  • Ability to provide and seek constructive feedback
  • Interest in experimenting with and stress testing new technologies

Tech stack

Terraform, GitLab, Helm, Kubernetes, Istio, GCP BigQuery, Google Spanner, Prometheus, Grafana, LLM models, AWS, Docker, Kafka, Amazon Kinesis, AWS SNS, Google Pub/Sub, AWS Lambda, Google Cloud Functions, Google Cloud Run, Datadog
