Full job description
Moloco is seeking a Machine Learning Engineer focused on ML infrastructure to develop and optimize scalable machine learning systems supporting large-scale ad serving and model training. Responsibilities include building and maintaining ML infrastructure, optimizing data pipelines, collaborating with cross-functional teams, implementing CI/CD best practices, operating high-performance ML systems using JAX and Rust on GPUs and TPUs, monitoring system performance, integrating new tools, automating workflows with AI agents, designing experiments to improve ad quality, and documenting architecture and processes. Candidates should have 4+ years of experience in ML engineering or a related field, proficiency in Python, Java, C++, or Rust, and experience with ML frameworks (TensorFlow, PyTorch, Keras, JAX), cloud platforms (AWS, GCP, Azure), containerization tools (Docker, Kubernetes), and scalable data pipeline tools (Apache Beam, Spark, Airflow). Strong problem-solving and collaboration skills and the ability to work in ambiguous environments are required. The role is full-time and based in Seoul, Korea.
What you'll do
- Design, build, and maintain robust machine learning infrastructure to support large-scale ad serving and model training globally
- Develop and optimize data pipelines and workflows for efficient model deployment and monitoring
- Collaborate with cross-functional teams including data scientists, product managers, and software engineers to deliver end-to-end ML solutions
- Implement best practices for model versioning, reproducibility, and CI/CD in ML systems
- Build and operate high-performance ML systems using frameworks and languages such as JAX and Rust, optimized for GPUs and TPUs
- Monitor, troubleshoot, and continuously improve the reliability, scalability, and performance of ML systems delivering millions of predictions per second worldwide
- Evaluate and integrate new tools, frameworks, and technologies to enhance the ML platform’s capabilities
- Integrate AI-driven agents into the core engineering and modeling lifecycle to automate and amplify the team's impact
- Contribute to the design and execution of experiments to improve ad quality and system performance
- Document system architecture, processes, and best practices to ensure knowledge sharing and maintainability
Requirements
- 4+ years of experience in machine learning engineering, ML infrastructure, or a related field
- Proficiency in programming languages such as Python, Java, C++, or Rust
- Hands-on experience with ML frameworks and libraries (e.g., TensorFlow, PyTorch, Keras, JAX)
- Experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization/orchestration tools (e.g., Docker, Kubernetes)
- Experience building and maintaining scalable data pipelines using tools such as Apache Beam, Apache Spark, or Airflow
- Ability to thrive in ambiguous environments and proactively solve complex infrastructure challenges
- Strong collaboration skills with cross-functional teams
- Strong problem-solving skills and a growth mindset
Tech stack
Python, Java, C++, Rust, TensorFlow, PyTorch, Keras, JAX, AWS, GCP, Azure, Docker, Kubernetes, Apache Beam, Apache Spark, Airflow
Benefits
- Innovative benefits that empower employees to take care of themselves and their families
- Inclusive work environment
- Opportunities for growth and learning