Engineering • 3 months ago • On-site

Microsoft

Principal Software Engineer

GPU inference optimization, LLM, SLM, deep learning, machine learning, CUDA, TensorRT, Triton, profiling tools, model compression, high-throughput inference, Microsoft DLIS, Azure, H100, A100, digital advertising, AI, software engineering

Key details

Salary

$163,000 - $296,400 annually

Employment type

Full-time

Seniority

Senior

Years experience

10+

Location

Redmond, United States

Full job description

Monetization Engineering at Microsoft is building a unified monetization platform for AI-native surfaces including Copilot, Search, MSN, and Shopping. The role focuses on GPU inference optimization and deep learning for large and small language models (LLMs/SLMs) that drive Microsoft's advertising and monetization platforms. Responsibilities include accelerating large-scale deep learning inference for both online and offline applications and bridging GPU and deep learning technologies with business applications.

Required qualifications are a Bachelor's degree in Computer Science or a related field with 8+ years of engineering experience coding in C, C++, C#, Java, JavaScript, or Python (or equivalent experience), and the ability to pass the Microsoft Cloud background check. Preferred qualifications include a Master's degree with 12+ years of experience or a Bachelor's degree with 15+ years, expertise in GPU inference optimization (CUDA, TensorRT, Triton), proficiency with profiling tools, a deep understanding of LLM/SLM architectures, experience with latency-critical services, model compression, and high-throughput inference serving stacks, and familiarity with Microsoft DLIS, Talon routing, the Triton/TensorRT-LLM stack, and Azure GPU environments.

The salary range is $163,000 - $296,400 annually, with higher ranges for the San Francisco Bay Area and New York City. The position is full-time and on-site in Redmond, Washington.

What you'll do

Serve as the technological core of Microsoft's rapidly expanding digital advertising business
Accelerate Microsoft’s large-scale deep learning inference for Ads, Shopping, Copilot, and other surfaces, including offline and online applications supporting OpenAI LLMs and next-generation LLMs/SLMs
Bridge state-of-the-art GPU and deep learning technologies with critical business applications

Requirements

Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR equivalent experience
Ability to meet Microsoft, customer and/or government security screening requirements including Microsoft Cloud background check upon hire/transfer and every two years thereafter
Preferred qualifications:
Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience OR Bachelor's Degree AND 15+ years experience OR equivalent experience
Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels)
Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks
Deep understanding of LLM/SLM architectures (attention, embeddings, MoE, decoders)
Experience optimizing latency-critical online services
Experience with model compression (quantization, distillation, SVD, low-rank methods)
Experience in building high-throughput inference serving stacks (continuous batching, KV-cache optimizations, routing)
Familiarity with Microsoft’s DLIS, Talon routing, Triton/TensorRT-LLM stack, and Azure/H100/A100 GPU environments
Publications, competition wins, or real-world deployments related to model efficiency

Tech stack

C, C++, C#, Java, JavaScript, Python, CUDA, TensorRT, Triton, Nsight, TensorBoard, PyTorch profiler, Microsoft DLIS, Talon routing, Triton/TensorRT-LLM stack, Azure, H100 GPU, A100 GPU

Benefits

Certain roles may be eligible for benefits and other compensation (details at https://careers.microsoft.com/us/en/us-corporate-pay)


Company

Microsoft

Every company has a mission. What's ours? To empower every person and every organization to achieve more. We believe technology can and should be a force for good and that meaningful innovation contributes to a brighter world in the future and today. Our culture doesn’t just encourage curiosity; it embraces it. Each day we make progress together by showing up as our authentic selves. We show up with a learn-it-all mentality. We show up cheering on others, knowing their success doesn't diminish our own. We show up every day open to learning our own biases, changing our behavior, and inviting in differences. Because impact matters. Microsoft operates in 190 countries and is made up of approximately 228,000 passionate employees worldwide.

Industry

Software Development

Company size

10001+

Website

https://news.microsoft.com/

Posted

3 months ago

Category: Engineering
