This training is available in Dutch and English.

Deploying and fine-tuning open source LLMs

Deploy and fine-tune open source LLMs like Llama and Mistral on your own infrastructure with vLLM, Ollama, and LoRA.

Not yet scheduled (no location set)
Duration: 2 days
Price: 1790 (excl. VAT)

Description

Not every AI project can or should run on commercial APIs. Regulation, latency requirements, cost, or data sovereignty can be reasons to self-host open source models. In this course you learn to run models like Llama and Mistral with the inference frameworks vLLM and Ollama, and how to fine-tune them with LoRA on your own data. You finish with a deployment on your own infrastructure.

This course is intended for software, ML, and platform engineers who want to run and fine-tune open source LLMs on their own infrastructure or in the cloud. If you work in a Microsoft/.NET environment and want to build AI agents in C#, have a look at our related courses.

Learning Goals

  • Operate inference frameworks (vLLM, Ollama) to deploy open source LLMs on your own infrastructure (Apply)
  • Customize a base LLM for domain-specific tasks using LoRA and QLoRA fine-tuning (Apply)
  • Compare model formats and quantization strategies (FP16, GPTQ, AWQ, GGUF) for inference (Understand)
  • Distinguish when to use fine-tuning, RAG, or prompt engineering for a given use case (Understand)
  • Evaluate which open source model and deployment strategy best fits the quality, latency, cost, and compliance requirements of your project (Evaluate)
The learning goals above are classified according to Bloom's Taxonomy.

Prior Knowledge

  • Programming experience in Python
  • Basic understanding of machine learning (what training and inference are)
  • Familiarity with Docker and Linux is a plus
  • Access to a GPU machine (local or cloud) — arranged before the course

Subjects

  • The open source model landscape: comparing Llama, Mistral, Gemma, Phi, and Qwen
  • Model formats and quantization: FP16, GPTQ, AWQ, and GGUF
  • Inference with vLLM: continuous batching, PagedAttention, and OpenAI-compatible API
  • Inference with Ollama: running locally, Modelfile, and application integration
  • Performance tuning: batch size, context length, tensor parallelism, and GPU memory management
  • RAG vs fine-tuning vs prompt engineering: when to choose what
  • LoRA and QLoRA: parameter-efficient fine-tuning
  • Preparing training data: formats, quality, and quantity
  • Fine-tuning with Hugging Face TRL and PEFT
  • Evaluating the fine-tuned model against the base model
  • Moving to production: model registry, versioning, and monitoring
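
To give a feel for why quantization and LoRA appear in the subjects above, here is a small back-of-the-envelope sketch. It is not course material: the function names, the 7B parameter count, and the hidden size of 4096 are hypothetical values chosen for illustration. It estimates the memory needed for model weights at a given number of bits per weight, and the number of trainable parameters LoRA adds to a single weight matrix. Real quantized formats such as GPTQ, AWQ, and GGUF store extra metadata, so actual sizes will differ somewhat.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB.

    Ignores activations, KV cache, and per-format overhead
    (scales, zero points), so treat the result as a lower bound.
    """
    return num_params * bits_per_weight / 8 / 2**30


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    while the original matrix stays frozen.
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gib(params, bits):.1f} GiB")

    d = 4096  # hypothetical hidden size
    full = d * d
    lora = lora_trainable_params(d, d, rank=8)
    print(f"full matrix: {full:,}  LoRA r=8: {lora:,}  ({lora / full:.2%})")
```

Under these assumptions, halving the bits per weight halves the weight memory, and a rank-8 adapter trains well under one percent of the parameters of the matrix it adapts.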

Schedule

All courses can also be delivered within your organization as customized or in-company training.

Our training advisors are happy to give you personal advice or help you arrange in-company training for your organization.

"This training was immediately applicable to the project"
Attendee
  • Highly rated
  • Practice-oriented training
  • Certified trainers
  • Our own instructors