AdTechTalent
Data Science | Yesterday | Hybrid

Merkle

Senior Data Scientist

python, azure, nlp, machine learning, databricks, azure synapse, azure ml, scikit-learn, hugging face, sentence-transformers, faiss, chroma db, azure ai search, langchain, tensorflow, pytorch, statsmodels, azure openai, semantic matching, rag, lda, embedding, vector databases, forecasting, semantic search, data pipeline, cloud migration

Key details

Salary

Not specified

Employment type

Full-time

Seniority

Senior

Years experience

5-10

Location

Mumbai, India

Full job description

Senior Data Scientist role requiring expertise in Python, Azure Cloud, and NLP to build and enhance machine learning models at scale. Responsibilities include designing and executing end-to-end ML pipelines and developing models for classification, regression, clustering, semantic matching, embedding optimisation, and RAG architectures. Experience with Azure Synapse, Databricks, Snowflake, and vector databases (Chroma DB, FAISS, Azure AI Search) is required. Tasks include migrating data pipelines from Azure Synapse to Azure Databricks, optimising embedding storage, and tuning vector indexes. Skills in forecasting models, NLP techniques, and semantic query understanding are essential.

What you'll do

  • Design and execute end-to-end machine learning pipelines
  • Develop machine learning pipelines using Azure Synapse, Databricks, and Snowflake
  • Build and deploy classification, regression, and clustering models
  • Develop and deploy proof-of-concept solutions for client use cases
  • Implement semantic matching and similarity search
  • Build embedding models and optimise embedding storage
  • Train and optimise models for new data providers
  • Improve LDA model performance
  • Implement hybrid semantic search
  • Optimise RAG architectures and retrieval QA systems
  • Enable semantic query understanding
  • Develop forecasting models for marketing and demand prediction
  • Apply NLP-based forecasting techniques
  • Use semantic similarity for audience intelligence
  • Migrate data pipelines and retrain models
  • Optimise embedding storage and retrieval
  • Perform vector index tuning and benchmarking

Requirements

  • Design and execute end-to-end machine learning pipelines including data extraction, preprocessing, feature engineering, model development, tuning, and deployment
  • Develop machine learning pipelines using Azure Synapse, Databricks, and Snowflake
  • Build and deploy classification, regression, and clustering models
  • Develop and deploy proof-of-concept solutions for client use cases
  • Implement semantic matching and similarity search using cosine similarity, dot-product scoring, and bi-encoder/cross-encoder architectures (e.g., SBERT, sentence-transformers)
  • Build embedding models by fine-tuning pre-trained models and optimising embedding storage in vector databases such as Chroma DB, FAISS, and Azure AI Search
  • Train and optimise models for new data providers with dynamic input handling
  • Improve LDA model performance for large-scale topic modelling
  • Implement hybrid semantic search by combining dense and sparse retrieval methods
  • Optimise RAG architectures and retrieval QA systems for chatbot and recommendation performance
  • Enable semantic query understanding using intent classification and query expansion techniques
  • Develop forecasting models for marketing, demand prediction, and trend analysis
  • Apply NLP-based forecasting techniques using sentiment and external data
  • Use semantic similarity for audience intelligence, including zero-shot and few-shot classification techniques
  • Migrate data pipelines from Azure Synapse to Azure Databricks and retrain models accordingly
  • Optimise embedding storage and retrieval within Azure AI Search
  • Perform vector index tuning including HNSW optimisation and ANN benchmarking for production systems

Tech stack

Python, Azure Databricks, Azure ML, Azure Synapse, Azure Blob Storage, Scikit-learn, NumPy, Pandas, Hugging Face, sentence-transformers, FAISS, Chroma DB, Azure AI Search, LangChain, TensorFlow, PyTorch, Statsmodels, Azure OpenAI
