Full job description
Tatari is hiring a Data Platform Engineer focused on systems and infrastructure to ensure the reliability, stability, and operational health of their data platform. The role requires 3+ years of experience in cloud infrastructure, SRE, or platform engineering with strong operational discipline. Responsibilities include owning platform reliability, enforcing environment promotion discipline, defining SOPs, monitoring platform health, collaborating with cross-functional teams, and supporting stable customer-facing and internal systems. Required skills include high availability architecture, workflow orchestration, Linux and scripting, distributed data processing, containerization, data ingestion and streaming systems, infrastructure-as-code, OLAP/OLTP databases, monitoring and observability tools, network infrastructure, and security management. MLOps experience is a plus. The position is full-time, hybrid with 2 days per week onsite in New York, NY. Compensation ranges from $190,000 to $240,000 plus equity and benefits including health insurance, 401K, education stipend, unlimited PTO, and wellness days.
What you'll do
- Own reliability and availability of data platform infrastructure across all environments
- Enforce and improve environment promotion discipline
- Define and uphold SOPs around deployments, maintenance windows, and change management
- Instrument and monitor platform health using observability tooling and build meaningful alerting
- Participate in architecture and deployment discussions and push back when something isn't ready
- Partner with data scientists, engineers, and product managers on infrastructure needs
- Identify and remediate reliability risks before incidents occur
- Support customer-facing and internal systems, prioritizing stability over velocity
Requirements
- Operational instinct and discipline around production environments
- 3+ years in cloud infrastructure, SRE, or platform engineering
- Experience with high availability architecture, including blue/green deployments, data replication, and load balancing
- Experience with workflow orchestration tools like Airflow or similar
- Strong Linux fundamentals and scripting skills (Bash, Python, or similar)
- Experience with distributed data processing frameworks like Spark or PySpark
- Containerization and orchestration experience (Kubernetes, Docker, or similar)
- Experience with data ingestion, ETL, or streaming systems (Kafka, Flink, or similar)
- Infrastructure-as-code and provisioning experience (Terraform, Helm, or similar)
- Experience with OLAP and OLTP databases (Clickhouse, Postgres, Redshift, or similar)
- Experience with monitoring, logging, and observability tools (Datadog, Prometheus, Kibana, or similar)
- Experience administering and scaling managed data platforms (Databricks or similar)
- Network infrastructure fundamentals knowledge (load balancers, DNS, auto-scaling, multi-region topologies, proxies)
- Security and access management knowledge including least-privilege and secrets management
- MLOps concepts or tooling experience is a plus
- Humility, methodical execution, strong communication, ownership, and independence
Tech stack
AWS, GCP, Azure, Airflow, cron, Linux, Bash, Python, Spark, PySpark, Kubernetes, Docker, Kafka, Flink, Terraform, Helm, Clickhouse, Postgres, Redshift, Datadog, Prometheus, Kibana, Databricks, MLOps
Benefits
- Total compensation $190,000 - $240,000
- Equity compensation
- Health insurance coverage for employee and dependents
- 401K, FSA, and commuter benefits
- $150 monthly spending account
- $1,000 annual continued education benefit
- $500 Newbie Productivity Perk
- Unlimited PTO and sick days
- Monthly Company Wellness Day Off
- Snacks, drinks, and catered lunches at the office
- Team building events
- Hybrid return-to-office 2 days per week