AdTechTalent
Data Science · 78 days ago · On-site

Samba TV

Data Scientist (Knowledge Graph & Identity)

python, pyspark, databricks, delta lake, sql, aws, gcp, airflow, mlops, dataops, machine learning, knowledge graph, identity, entity resolution, probabilistic record linkage, embedding, semantic similarity, llm, rag, vector databases, causal inference, a/b testing, synthetic control, uplift modeling, media, ad tech, measurement, audience modeling

Key details

Salary: Not specified

Employment type: Full-time

Seniority: Mid-level

Years experience: 3-5

Location: Warsaw, Poland

Full job description

Mid-level Data Scientist role on the Knowledge Graph & Identity team in Warsaw. The role is responsible for end-to-end delivery of data science projects with minimal guidance, focusing on knowledge graphs, the identity spine, measurement, or audience modeling. It requires expertise in modern ML and AI methodologies and the ability to build production-ready solutions. You will collaborate with product and engineering teams and mentor junior data scientists. Qualifications include a Bachelor's degree (Master's preferred) in a quantitative field; 3-5 years of data science experience; advanced Python, SQL, and PySpark skills; experience with Databricks, Delta Lake, and cloud platforms (AWS or GCP); core ML techniques; MLOps practices; and exposure to modern AI methods. Preferred skills include knowledge graph construction, entity resolution, semantic data modeling, probabilistic record linkage, causal inference, and media/ad tech experience.

What you'll do

  • Own end-to-end delivery of significant data science projects, from problem scoping to production deployment, focusing on knowledge graph and identity solutions
  • Make independently reasoned decisions on methodology, model selection, and evaluation; document technical solutions clearly
  • Lead solution design; break down complex epics into well-scoped user stories with acceptance criteria; adopt DataOps and MLOps best practices
  • Build production-quality Python and PySpark code on Databricks; implement advanced ML and AI workflows including entity resolution, probabilistic record linkage, embedding-based matching, semantic similarity, and LLM-augmented pipelines
  • Develop and maintain reusable tools, libraries, and documentation to improve team efficiency and standards; conduct code reviews
  • Mentor junior data scientists on technical execution, code quality, and career development; lead internal talks or workshops
  • Collaborate cross-functionally with product, engineering, and operations; translate business requirements into technical specifications; partner with data engineering on scalable pipeline design; participate in design reviews and working groups

Requirements

  • Bachelor's degree in Statistics, Data Science, Computer Science, Mathematics or related quantitative field; Master's preferred
  • 3–5 years of hands-on data science experience with ability to deliver complex projects independently
  • Advanced Python with production-quality code, testing, and documentation
  • Strong SQL and PySpark skills for billion-row datasets
  • Experience with Databricks workflows, Delta Lake, and job orchestration
  • Working knowledge of cloud platforms (AWS or GCP)
  • Solid command of core ML techniques: regression, classification, clustering, model evaluation, experimental design
  • Proficiency with MLOps practices: experiment tracking, pipeline orchestration (Airflow), reproducible model deployment
  • Exposure to modern AI methodologies: RAG systems, LLM-augmented models, vector databases, semantic search
  • Strong communication skills for documentation and cross-functional collaboration
  • Ability to mentor junior data scientists and contribute to team standards

Tech stack

Python, PySpark, Databricks, Delta Lake, SQL, AWS, GCP, Airflow, RDF, OWL, SPARQL, LLM, MLOps, DataOps
