Job offer: Senior AI Engineer - Inference Systems Optimization, Gdańsk

ID: 1066132549

Salary: negotiable

Summary

Position: Senior AI Engineer - Inference Systems Optimization
Published: 30.07.2025. Valid until: 13.08.2025
Industry: IT
Gender: any
Company: Accelerated.Finance
Job offer from a partner

Job description

The Role

We are seeking a senior AI engineer passionate about performance optimization to join our team and revolutionize our inference systems. You will be at the heart of technical innovation, transforming the latest research advances into ultra-high-performance production solutions.


Core Mission

Your mission will be to radically optimize the performance of our systems by leveraging and improving current leading inference technologies. You will work on cutting-edge technical challenges, reducing latency and maximizing throughput to serve millions of users.


Key Responsibilities


Analysis and Benchmarking

  • Analyze performance bottlenecks (CPU scheduling, GPU memory, kernel efficiency)

  • Establish detailed performance metrics (TTFT, tokens/s, P99 latency, GPU utilization)

  • Design and implement comprehensive benchmarks comparing the performance of SOTA inference solutions

  • Document trade-offs between different optimization approaches
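
For candidates who want a concrete picture of the metrics named above, the following is a minimal sketch of how TTFT, P99 latency, and tokens/s could be derived from per-request timings. The `RequestTrace` record, its field names, and the `summarize` helper are hypothetical and chosen for illustration only; they do not describe the company's tooling. GPU utilization is left out because it comes from device-level monitoring, not request timings.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request timing record (all times in seconds)."""
    sent_at: float          # when the request was issued
    first_token_at: float   # when the first output token arrived
    finished_at: float      # when the last output token arrived
    output_tokens: int      # number of generated tokens


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(traces):
    """Aggregate TTFT, end-to-end P99 latency, and overall output throughput."""
    ttfts = [t.first_token_at - t.sent_at for t in traces]
    latencies = [t.finished_at - t.sent_at for t in traces]
    total_tokens = sum(t.output_tokens for t in traces)
    # Throughput is measured over the wall-clock window covering all requests.
    wall_time = max(t.finished_at for t in traces) - min(t.sent_at for t in traces)
    return {
        "mean_ttft_s": sum(ttfts) / len(ttfts),
        "p99_latency_s": percentile(latencies, 99),
        "tokens_per_s": total_tokens / wall_time,
    }
```

With so few samples a P99 figure simply equals the slowest request; a meaningful tail-latency estimate needs at least several hundred requests.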


Inference Systems Optimization

  • Optimize the performance of current inference engines and frameworks for our specific use cases

  • Implement advanced techniques: RadixAttention, PagedAttention, continuous batching

  • Develop optimized CUDA kernels (Triton, FlashInfer integration)

  • Integrate torch.compile and CUDA graphs to maximize performance

  • Optimize KV cache management and batching strategies


Research Paper Implementation

  • Transform the latest academic innovations into production code

  • Implement optimization techniques like MLA (Multi-head Latent Attention)

  • Adapt MoE (Mixture of Experts) architectures for efficient inference

  • Integrate model-specific optimizations (DeepSeek, Llama, etc.)


Infrastructure and Scalability

  • Architect distributed multi-GPU solutions with tensor parallelism

  • Optimize GPU fleet utilization (H100, H200, ...)

  • Implement advanced monitoring and profiling systems

  • Develop debugging tools to identify performance issues


Desired Profile


Essential Technical Skills

  • Deep expertise in PyTorch and NVIDIA ecosystem (CUDA, NCCL, cuDNN)

  • Mastery of inference frameworks: SGLang, vLLM, Dynamo, or equivalents

  • Solid experience (5+ years) in ML systems optimization in production

  • Practical knowledge of Transformer architectures and attention techniques

  • Skills in GPU programming (CUDA, Triton) and kernel optimization


Advanced Technical Skills (Strong Plus)

  • Experience with inference optimization techniques:

      • Quantization (INT8, INT4, FP8)

      • KV cache optimization (MQA, GQA, MLA)

      • Speculative decoding, multi-token prediction

      • Structured generation and constrained decoding

  • Knowledge of frameworks: FlashAttention, FlashInfer, xFormers

  • Experience with high-performance distributed systems

  • Contributions to open-source ML inference projects


Personal Qualities

  • Passion for optimization and performance

  • Ability to read and implement complex research papers

  • Excellent analytical and problem-solving skills

  • Autonomy and ability to work on unstructured problems

  • Clear communication of technical results


What We Offer


Technical Impact

  • Work on systems serving billions of tokens per day

  • Access to latest GPUs (H100, H200) and compute resources

  • Direct collaboration with research teams

  • Open-source contributions and technical publications


Work Environment

  • Team of experts passionate about performance

  • Culture of innovation and technical experimentation

  • Flexibility and autonomy in technical approaches

  • Continuous training on latest advances


Compensation Package

  • Competitive salary aligned with expertise

  • Significant equity

  • Conference and training budget

  • Cutting-edge hardware


Key Technologies


Inference frameworks: SGLang, vLLM, TensorRT-LLM

GPU optimization: CUDA, Triton, FlashInfer, FlashAttention

Deep Learning: PyTorch, torch.compile

Architectures: Transformers, MoE, MLA, attention variants

Infrastructure: Multi-GPU, tensor parallelism, distributed systems


How to Apply

Send your resume along with examples of optimization projects you have completed. We particularly value:

  • Open-source contributions to inference projects

  • Benchmarks or performance analyses you have conducted

  • Implementations of innovative optimization techniques


