AdTechTalent
Engineering | 41 days ago | On-site

Microsoft

Principal Software Engineer

GPU inference optimization, LLM, SLM, deep learning, machine learning, CUDA, TensorRT, Triton, profiling tools, model compression, high-throughput inference, Microsoft DLIS, Azure, H100, A100, digital advertising, AI, software engineering

Key details

Salary

$163,000 - $296,400 per year

Employment type

Full-time

Seniority

Senior

Years experience

10+

Location

Redmond, Washington, United States

Full job description

Monetization Engineering at Microsoft is building a unified monetization platform for AI-native surfaces, including Copilot, Search, MSN, and Shopping. The role focuses on GPU inference optimization and deep learning for large and small language models (LLMs/SLMs) that drive Microsoft's advertising and monetization platforms. Responsibilities include accelerating large-scale deep learning inference for both online and offline applications and bridging GPU and deep learning technologies with business applications.

Required qualifications are a Bachelor's degree in Computer Science or a related technical field with 8+ years of engineering experience coding in C, C++, C#, Java, JavaScript, or Python (or equivalent experience), and the ability to pass the Microsoft Cloud background check. Preferred qualifications include a Master's degree with 12+ years of experience or a Bachelor's degree with 15+ years, expertise in GPU inference optimization (CUDA, TensorRT, Triton), proficiency with profiling tools, a deep understanding of LLM/SLM architectures, and experience with latency-critical services, model compression, and high-throughput inference serving stacks. The listing also asks for familiarity with Microsoft DLIS, Talon routing, the Triton/TensorRT-LLM stack, and Azure GPU environments.

The salary range is $163,000 - $296,400 annually, with higher ranges for the San Francisco Bay Area and New York City. The position is full-time and on-site in Redmond, Washington.

What you'll do

  • Work at the technological core of Microsoft's rapidly expanding digital advertising business
  • Accelerate Microsoft’s large-scale deep learning inference for Ads, Shopping, Copilot, and other surfaces, including offline and online applications that support OpenAI LLMs and next-generation LLMs/SLMs
  • Bridge state-of-the-art GPU and deep learning technologies with critical business applications

Requirements

  • Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR equivalent experience
  • Ability to meet Microsoft, customer and/or government security screening requirements including Microsoft Cloud background check upon hire/transfer and every two years thereafter
  • Preferred: Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience OR Bachelor's Degree AND 15+ years experience OR equivalent experience
  • Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels)
  • Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks
  • Deep understanding of LLM/SLM architectures (attention, embeddings, MoE, decoders)
  • Experience optimizing latency-critical online services
  • Experience with model compression (quantization, distillation, SVD, low-rank methods)
  • Experience in building high-throughput inference serving stacks (continuous batching, KV-cache optimizations, routing)
  • Familiarity with Microsoft’s DLIS, Talon routing, the Triton/TensorRT-LLM stack, and Azure GPU environments (H100/A100)
  • Publications, competition wins, or real-world deployments related to model efficiency

Tech stack

C, C++, C#, Java, JavaScript, Python, CUDA, TensorRT, Triton, Nsight, TensorBoard, PyTorch profiler, Microsoft DLIS, Talon routing, Triton/TensorRT-LLM stack, Azure, H100 GPU, A100 GPU

Benefits

Certain roles may be eligible for benefits and other compensation (details at https://careers.microsoft.com/us/en/us-corporate-pay)
