Job Description
The Role
We are seeking a senior AI engineer passionate about performance optimization to join our team and revolutionize our inference systems. You will be at the heart of technical innovation, transforming the latest research advances into ultra-high-performance production solutions.
Core Mission
Your mission will be to radically optimize the performance of our systems by leveraging and improving current leading inference technologies. You will work on cutting-edge technical challenges, reducing latency and maximizing throughput to serve millions of users.
Key Responsibilities
Analysis and Benchmarking
Analyze performance bottlenecks (CPU scheduling, GPU memory, kernel efficiency)
Establish detailed performance metrics (TTFT, tokens/s, P99 latency, GPU utilization); a minimal measurement sketch follows this list
Design and implement comprehensive benchmarks comparing the performance of SOTA inference solutions
Document trade-offs between different optimization approaches
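To give a flavor of the measurement work involved, here is a minimal sketch of a TTFT/throughput harness. `stream_tokens` is a hypothetical placeholder, not an API of any specific framework; in practice it would wrap the streaming client of the engine under test.

```python
import time
from typing import Iterator

def stream_tokens(prompt: str) -> Iterator[str]:
    # Hypothetical placeholder: swap in a real streaming call into
    # SGLang / vLLM / TensorRT-LLM (e.g. an OpenAI-compatible SSE client).
    for tok in ["Hello", ",", " world", "!"]:
        time.sleep(0.01)  # stands in for per-step decode latency
        yield tok

def measure(prompt: str) -> dict:
    start = time.perf_counter()
    ttft, n_tokens = None, 0
    for _ in stream_tokens(prompt):
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        n_tokens += 1
    total = time.perf_counter() - start
    decode = total - (ttft or 0.0)  # steady-state decode window
    tok_per_s = (n_tokens - 1) / decode if n_tokens > 1 and decode > 0 else float("nan")
    return {"ttft_s": ttft, "tokens": n_tokens, "decode_tok_per_s": tok_per_s}

if __name__ == "__main__":
    print(measure("benchmark prompt"))
```

Percentile metrics such as P99 latency then come from running `measure` over many prompts and aggregating the results.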
Inference Systems Optimization
Optimize the performance of current inference engines and frameworks for our specific use cases
Implement advanced techniques: RadixAttention, PagedAttention, continuous batching
Develop optimized CUDA kernels (Triton, FlashInfer integration)
Integrate torch.compile and CUDA graphs to maximize performance (see the sketch after this list)
Optimize KV cache management and batching strategies
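A minimal sketch of the torch.compile integration mentioned above, assuming a CUDA device; the toy model and shapes are illustrative, not our stack. The documented "reduce-overhead" mode uses CUDA graphs to cut per-step kernel-launch overhead.

```python
import torch

model = torch.nn.TransformerEncoderLayer(
    d_model=512, nhead=8, batch_first=True
).cuda().eval()

@torch.compile(mode="reduce-overhead")  # this mode employs CUDA graphs
def decode_step(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return model(x)

# Static shapes help graph capture and reuse across decode steps.
x = torch.randn(8, 1, 512, device="cuda")
for _ in range(3):  # warm-up iterations trigger compilation + capture
    decode_step(x)
out = decode_step(x)
print(out.shape)  # torch.Size([8, 1, 512])
```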
Research Paper Implementation
Transform the latest academic innovations into production code
Implement optimization techniques such as MLA (Multi-head Latent Attention)
Adapt MoE (Mixture of Experts) architectures for efficient inference (a routing sketch follows this list)
Integrate model-specific optimizations (DeepSeek, Llama, etc.)
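For illustration, a toy top-k routing loop of the kind at the heart of MoE inference. Expert count, hidden size, and k are arbitrary assumptions, and production engines replace the Python loop over experts with grouped or fused kernels.

```python
import torch
import torch.nn.functional as F

n_experts, k, d = 8, 2, 64
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))
router = torch.nn.Linear(d, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    # x: (tokens, d). Route each token to its top-k experts.
    weights, idx = torch.topk(F.softmax(router(x), dim=-1), k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-k
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to e
        if token_ids.numel():
            out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
    return out

print(moe_forward(torch.randn(5, d)).shape)  # torch.Size([5, 64])
```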
Infrastructure and Scalability
Architect distributed multi-GPU solutions with tensor parallelism (sketched after this list)
Optimize GPU fleet utilization (H100, H200, ...)
Implement advanced monitoring and profiling systems
Develop debugging tools to identify performance issues
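As a sketch of Megatron-style column parallelism (illustrative sizes; the gloo backend so it runs without GPUs, where a real deployment would use NCCL): each rank holds a column shard of the weight, computes a partial output, and the shards are all-gathered.

```python
import torch
import torch.distributed as dist

def main() -> None:
    dist.init_process_group(backend="gloo")  # NCCL on an actual GPU fleet
    rank, world = dist.get_rank(), dist.get_world_size()

    d_in, d_out = 16, 32
    assert d_out % world == 0
    torch.manual_seed(0)                   # identical tensors on every rank
    x = torch.randn(4, d_in)               # replicated activations
    w_full = torch.randn(d_in, d_out)      # conceptually the full weight
    shard = d_out // world
    w = w_full[:, rank * shard:(rank + 1) * shard]  # this rank's columns

    local = x @ w                          # (4, shard) partial output
    parts = [torch.empty_like(local) for _ in range(world)]
    dist.all_gather(parts, local)          # reassemble the full output
    if rank == 0:
        print(torch.cat(parts, dim=-1).shape)  # torch.Size([4, 32])

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=2 this_script.py
```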
Desired Profile
Essential Technical Skills
Deep expertise in PyTorch and NVIDIA ecosystem (CUDA, NCCL, cuDNN)
Mastery of inference frameworks: SGLang, vLLM, Dynamo, or equivalents
Solid experience (5+ years) in ML systems optimization in production
Practical knowledge of Transformer architectures and attention techniques
Skills in GPU programming (CUDA, Triton) and kernel optimization
Advanced Technical Skills (Strong Plus)
Experience with inference optimization techniques:
Quantization (INT8, INT4, FP8); a toy sketch follows this list
KV cache optimization (MQA, GQA, MLA)
Speculative decoding, multi-token prediction
Structured generation and constrained decoding
Knowledge of frameworks: FlashAttention, FlashInfer, xFormers
Experience with high-performance distributed systems
Contributions to open-source ML inference projects
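To make the quantization bullet concrete, a toy symmetric per-tensor INT8 round-trip; real systems use per-channel or per-group scales, calibration data, and fused INT8 matmul kernels.

```python
import torch

def quantize_int8(w: torch.Tensor) -> tuple[torch.Tensor, float]:
    # Symmetric per-tensor scheme: map [-max|w|, +max|w|] onto [-127, 127].
    scale = w.abs().max().item() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: float) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)
q, s = quantize_int8(w)
print(f"max abs error: {(dequantize(q, s) - w).abs().max():.4f}")
```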
Personal Qualities
Passion for optimization and performance
Ability to read and implement complex research papers
Excellent analytical and problem-solving skills
Autonomy and ability to work on unstructured problems
Clear communication of technical results
What We Offer
Technical Impact
Work on systems serving billions of tokens per day
Access to the latest GPUs (H100, H200) and compute resources
Direct collaboration with research teams
Open-source contributions and technical publications
Work Environment
Team of experts passionate about performance
Culture of innovation and technical experimentation
Flexibility and autonomy in technical approaches
Continuous training on latest advances
Compensation Package
Competitive salary aligned with expertise
Significant equity
Conference and training budget
Cutting-edge hardware
Key Technologies
Inference frameworks: SGLang, vLLM, TensorRT-LLM
GPU optimization: CUDA, Triton, FlashInfer, FlashAttention
Deep Learning: PyTorch, torch.compile
Architectures: Transformers, MoE, MLA, attention variants
Infrastructure: Multi-GPU, tensor parallelism, distributed systems
How to Apply
Send your resume along with examples of optimization projects you have completed. We particularly value:
Open-source contributions to inference projects
Benchmarks or performance analyses you have conducted
Implementations of innovative optimization techniques