Full job description
Senior Software Development Engineer 4, Data Platform Engineering, at InMobi Advertising in Lucknow. Responsibilities include:

- architecting a self-serve Data Platform-as-a-Product that integrates OSS tools (Spark, Flink, Airflow, Iceberg), internal services, and cloud offerings;
- operating distributed systems that process petabytes of data daily on multi-region Kubernetes infrastructure;
- optimizing compute utilization for batch workloads and for real-time streaming with sub-second latency;
- building telemetry and data quality frameworks that ensure 24/7 uptime with automated incident response.

Required skills: 7–10 years of experience with production data platforms, deep data engineering fundamentals, distributed compute (Spark, Flink), data lake architecture (Iceberg, Polaris), orchestration (Airflow), data transformation (DBT), data quality frameworks, query acceleration, data governance, Kubernetes platform development, cloud infrastructure (GKE, GCS), programming in Python, PySpark, and Scala, IaC (Terraform, Helm, GitOps), and CI/CD pipelines. Good-to-have: cloud data platform/control plane development, advanced observability (Prometheus, Grafana, Loki), real-time streaming (Kafka), and cost optimization strategies. Benefits include continuous learning and career progression programs. InMobi is also an equal employment opportunity employer and provides reasonable accommodations for individuals with disabilities.
What you'll do
- Design & Development: Bridge OSS tools (Spark, Flink, Airflow, Iceberg), internal services, and cloud offerings into cohesive data platform infrastructure
- Build intuitive platform integrations enabling push-button data workflows
- Scale Engineering: Operate distributed systems processing petabytes of data daily
- Own multi-region Kubernetes infrastructure with elastic scalability and fault tolerance
- Performance Optimization: Optimize compute utilization (Spark/Flink clusters, Velox/Gluten acceleration) for large-scale batch workloads and for real-time streaming with sub-second latency
- Observability & Data Quality: Build comprehensive telemetry (metrics, logs, traces) and data quality frameworks for 24/7 uptime
- Enforce SLAs/SLOs with automated incident response and data validation
Requirements
- 7–10 years building, optimizing, and operating production data platforms
- Deep data engineering fundamentals: data modeling, partitioning strategies, query optimization
- Distributed compute: Spark (PySpark/Scala), Flink streaming, performance tuning at petabyte scale
- Data lake architecture: Iceberg table format, Polaris catalog, schema evolution, time travel
- Orchestration: Airflow DAG development, dependency management, SLA monitoring
- Data transformation: DBT modeling, testing, documentation, incremental builds
- Data quality: Great Expectations, Deequ validation frameworks, drift detection
- Query acceleration: Velox, Gluten integration, columnar formats (Parquet, ORC)
- Data governance: OpenMetadata catalog, lineage tracking, access control
- Kubernetes platform development: operators (Spark/Flink), YuniKorn scheduler, multi-tenancy, autoscaling
- Cloud infrastructure: GKE multi-region clusters, GCS object storage, hybrid cloud/on-prem architecture
- Programming: Python, PySpark, Scala for data pipelines and platform tooling
- IaC: Terraform, Helm, GitOps for reproducible deployments
- CI/CD: Automated testing and deployment pipelines for data platform components
Tech stack
Spark, PySpark, Scala, Flink, Airflow, Iceberg, Polaris, DBT, Great Expectations, Deequ, Velox, Gluten, Parquet, ORC, OpenMetadata, Kubernetes, YuniKorn, GKE, GCS, Python, Terraform, Helm, GitOps, CI/CD, Prometheus, Grafana, Loki, FireHydrant, Kafka
Benefits
- Continuous learning and career progression through the InMobi Live Your Potential program
- Equal Employment Opportunity employer
- Reasonable accommodations for qualified individuals with disabilities