Full job description
The Product team at Epsilon is seeking a senior full-time engineer to build AI-augmented program management tools used by 300+ engineers and leadership. Responsibilities include designing and building agentic AI workflows, building RAG pipelines over enterprise knowledge bases, integrating LLM APIs with cost and rate-limit controls, developing multi-modal agent tool libraries, developing full-stack dashboards with real-time AI insights, containerizing and deploying services on AWS/Azure, and collaborating with program managers and product owners to translate delivery challenges into technical solutions. Required skills include 2+ years building LLM-powered production applications, proficiency with agent frameworks (LangChain, LangGraph, etc.), practical RAG experience, backend development with Python or Java, REST API design, SQL, frontend development with React and TypeScript, cloud and DevOps experience with AWS or Azure, Docker, CI/CD, and monitoring tools. Nice to have: experience with JIRA, Confluence, GitHub/Bitbucket APIs, developer productivity tooling, LLM evaluation frameworks, and Agile delivery metrics. The role is based in Bengaluru, Karnataka, India.
What you'll do
- Design and build production-grade agentic SDLC workflows, from prompt design through tool integration to autonomous execution loops
- Implement RAG pipelines over enterprise knowledge bases (JIRA, Confluence, internal wikis)
- Integrate LLM APIs with proper rate-limit handling, cost governance, and fallback routing
- Build and maintain multi-modal agent tool libraries for JIRA, Confluence, GitHub/Bitbucket, and calendar-aware sprint agents
- Develop front-ends for delivery dashboards with real-time data, drill-down charts and actionable AI insights
- Build backends that aggregate data from internal planning tools and serve it to the front-end and agent layers
- Design and maintain database schemas for program metrics, sprint snapshots, and model response logs
- Containerize and deploy services on AWS or Azure using Docker and CI/CD pipelines
- Instrument LLM observability metrics such as latency, token spend, hallucination flags, and user feedback loops
- Maintain version control and A/B testing to iterate on model behavior without redeployment
- Partner with Program Managers, Product Owners, and Scrum Masters to translate delivery challenges into agent-addressable requirements
- Document agent designs, API contracts, and RAG architecture decisions in Confluence
- Contribute to Agile COE metrics strategy and ensure data pipelines feeding dashboards are audit-ready
Requirements
- 2+ years of hands-on experience building LLM-powered applications in production
- Proficiency with at least one agent framework: LangChain, LangGraph, AutoGen, CrewAI, or Semantic Kernel
- Practical RAG experience: chunking, embedding, vector stores, and retrieval evaluation
- Experience integrating OpenAI, Anthropic, or Azure OpenAI APIs with tool/function calling, structured outputs, and streaming
- Experience designing prompt templates, system prompts, and chain-of-thought scaffolding for reliability at scale
- 6-8+ years of experience with Python (preferred) or Java for API and service development
- REST API design and implementation using FastAPI, Flask, Spring Boot, or Node.js/Express
- SQL fluency: query optimisation, schema design, and data modelling across PostgreSQL, MySQL, or Snowflake
- Understanding of async patterns, message queues (Kafka, SQS, RabbitMQ), and event-driven architectures
- Frontend development in React and TypeScript with hooks, context, and component composition patterns
- Data visualisation with Recharts, Chart.js, or D3 for operational dashboards
- Experience consuming streaming REST or WebSocket APIs for real-time UI updates
- AWS or Azure experience: compute, storage, managed databases
- Docker, container orchestration basics, and CI/CD pipeline ownership
- Monitoring and alerting with CloudWatch, Datadog, or equivalent
Tech stack
Python, Java, FastAPI, Flask, Spring Boot, Node.js, Express, SQL, PostgreSQL, MySQL, Snowflake, React, TypeScript, Recharts, Chart.js, D3, AWS, Azure, Docker, GitHub Actions, Jenkins, Kafka, SQS, RabbitMQ, OpenAI API, Anthropic API, Azure OpenAI API, LangChain, LangGraph, AutoGen, CrewAI, Semantic Kernel, Cursor AI, GitHub Copilot, JIRA API, Confluence REST API, GitHub API, Bitbucket API
Benefits
- Employee well-being focus
- Collaborative work environment
- Opportunities for growth through learning, development and career advancement
- Innovation-driven culture
- Work-life balance and flexibility
- Diversity, inclusion, and equal employment opportunities