AdTechTalent
Data Science · 6 days ago · On-site

Kargo

Senior Machine Learning Engineer

machine learning, mlops, multimodal, LLM, VLM, python, pytorch, tensorflow, ray, kubeflow, mlflow, docker, kubernetes, distributed training, cloud, aws, gcp, azure, terraform, sql, creative scoring, ad performance

Key details

Salary

Not specified

Employment type

Full-time

Seniority

Senior

Years experience

5-10

Location

Waterford, Ireland

Full job description

Lead the design and deployment of multimodal ML models for Finetouch, Kargo's creative scoring system, delivering improved predictive accuracy and broader multimodal signal coverage. Build production-grade MLOps pipelines using MLflow, Kubeflow, and Ray Train. Scale distributed training and inference with Ray and PyTorch Distributed. Develop APIs and model endpoints for real-time scoring integration. Implement real-time monitoring, drift detection, and alerting to keep models reliable. Requires 5+ years of ML engineering or MLOps experience with LLMs, VLMs, or multimodal architectures; expertise in Python, PyTorch or TensorFlow, distributed training frameworks, and MLOps tooling; experience with cloud ML deployment on AWS, GCP, or Azure, as well as Docker, Kubernetes, and CI/CD; and strong SQL skills. Experience with vector databases, embedding pipelines, and creative scoring or ad performance prediction is preferred.

What you'll do

  • Lead the design and production deployment of multimodal ML models for Finetouch, Kargo's creative scoring system
  • Deliver creative scoring models with higher predictive accuracy and expanded multimodal signal coverage
  • Establish end-to-end MLOps pipelines for training, fine-tuning, deployment, and monitoring using MLflow, Kubeflow, and Ray Train
  • Scale distributed training and inference with Ray, PyTorch Distributed, and cloud infrastructure to reduce training time and inference cost
  • Build and operate APIs, embedding services, and model endpoints that serve real-time scoring under defined SLAs
  • Deploy real-time monitoring, drift detection, and alerting for model reliability, supported by runbooks and on-call ownership

Requirements

  • 5+ years of ML engineering or MLOps experience with production systems involving LLMs, VLMs, or multimodal architectures
  • Expertise in Python and in PyTorch or TensorFlow
  • Experience with distributed training frameworks such as Ray, PyTorch Lightning, or Horovod
  • Hands-on experience with MLOps tooling like MLflow, Weights & Biases, Kubeflow, Argo, or Airflow
  • Cloud-native ML deployment experience on AWS SageMaker, GCP Vertex AI, or Azure ML with infrastructure-as-code tools like Terraform and Helm
  • Production fluency with Docker, Kubernetes, and CI/CD patterns for ML
  • Strong SQL, data pipeline, and feature store design skills
  • Preferred: experience with vector databases, embedding pipelines, and real-time retrieval systems
  • Preferred: background in creative scoring, aesthetic modeling, or ad performance prediction

Tech stack

Python, PyTorch, TensorFlow, Ray, PyTorch Lightning, Horovod, MLflow, Weights & Biases, Kubeflow, Argo, Airflow, AWS SageMaker, GCP Vertex AI, Azure ML, Terraform, Helm, Docker, Kubernetes, SQL
