SLM Training Tutorial

Train and Run a Small Language Model with Docker + LoRA + Ollama

Beginner friendly • Windows Terminal • Docker based

Build your own Java DTO assistant SLM

This tutorial explains how the Docker project works, how Ollama runs a custom local model, how Hugging Face + PyTorch fine-tunes an SLM with LoRA, and how to test everything from Windows Terminal.

What you are building

1. Ollama Custom Model: a local Java DTO assistant built from a Modelfile.
2. Hugging Face LoRA Training: fine-tune a small model on Java DTO examples.
3. FastAPI Model Server: run your trained adapter behind a local API.
4. Docker Compose Dev Lab: one command starts the full environment.

Plain-English Overview

A Small Language Model, or SLM, is a smaller AI text model that can run locally or on smaller hardware. Instead of building a model from zero, you start with a pretrained model and teach it your style, task, and examples. In this project, the task is: generate Java DTO classes using Lombok, Jakarta Validation, and Swagger annotations.

How the System Works

The project has two model paths. The first path is an immediate Ollama custom model using a Modelfile. The second path is actual LoRA fine-tuning using Hugging Face Transformers, PyTorch, and PEFT.

Windows Terminal: you run docker compose commands.
Docker Compose: starts the local services.
Ollama: runs the java-dto-assistant custom model.
Dataset: Java DTO examples.
Hugging Face Training: Transformers + PyTorch + PEFT.
LoRA Adapter: the small trained weights.
FastAPI: serves the trained adapter.

Step-by-step Tutorial

1. Install Docker Desktop for Windows and use Windows Terminal or PowerShell. You do not need Python installed on Windows because Python runs inside Docker.
2. Unzip the project, copy `.env.example` to `.env`, then run `docker compose up --build -d`.
3. Docker Compose starts four services: `slm-dev`, `slm-api`, `ollama`, and `ollama-setup`.
4. Build the `java-dto-assistant` custom model from a Modelfile, using `qwen2.5-coder:0.5b` as the base model.
5. Convert your examples into JSONL rows, where each row pairs a prompt with a correct Java DTO answer.
6. Run the training script (`scripts/train_lora.py`). LoRA creates a small adapter instead of changing every model weight.
7. Use `infer.py` or the FastAPI service to generate Java DTO code with your fine-tuned adapter.
8. Improve the model: add more high-quality examples, use a code-focused base model, test the outputs, and repeat.
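The last step, testing outputs and repeating, is easier with a quick automated check. Below is a minimal sketch of a smoke test for generated DTOs; the helper `check_dto` and the `REQUIRED_PATTERNS` checklist are hypothetical, chosen to match this tutorial's task (Lombok, Jakarta Validation, Swagger). Regex checks only confirm that the expected annotations appear; they do not prove the code compiles.

```python
import re

# Hypothetical checklist for this tutorial's task: Lombok, Jakarta Validation, Swagger.
REQUIRED_PATTERNS = {
    "lombok_builder": r"@Builder\b",
    "jakarta_validation_import": r"import\s+jakarta\.validation\.constraints\.",
    "swagger_schema": r"@Schema\b",
}


def check_dto(java_source: str, class_name: str) -> dict[str, bool]:
    """Return a pass/fail flag for each expected feature of a generated DTO."""
    results = {
        name: re.search(pattern, java_source) is not None
        for name, pattern in REQUIRED_PATTERNS.items()
    }
    # The class or record should carry the name that the prompt asked for.
    results["class_name"] = re.search(
        rf"\b(class|record)\s+{re.escape(class_name)}\b", java_source
    ) is not None
    return results


sample = """
import jakarta.validation.constraints.NotBlank;
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.Builder;

@Builder
public record OrderRequest(
    @Schema(example = "A-1001") @NotBlank String orderId
) {}
"""

report = check_dto(sample, "OrderRequest")
print(report)
print("all passed:", all(report.values()))
```

Running the same check over the answers to a fixed set of prompts before and after each training round gives a rough signal of whether new examples helped.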

Windows Terminal Commands

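Use these commands in PowerShell or Windows Terminal. In Windows PowerShell 5.1, `curl` is an alias for `Invoke-WebRequest`, so type `curl.exe` there to call the real curl.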

Expand-Archive slm-lora-java-dto-custom-ollama-builder.zip -DestinationPath .
cd slm-lora-java-dto-complete
copy .env.example .env
docker compose up --build -d

docker compose exec -e OLLAMA_BASE_URL=http://ollama:11434 -e OLLAMA_MODEL=java-dto-assistant slm-dev python scripts/build_custom_ollama_model.py
curl http://localhost:11434/api/tags
docker compose exec -e OLLAMA_BASE_URL=http://ollama:11434 -e OLLAMA_MODEL=java-dto-assistant slm-dev python scripts/query_ollama.py --prompt "Create a Java 21 DTO named OrderRequest with Lombok Builder, Swagger Schema examples, and Jakarta Validation."

docker compose exec slm-dev python scripts/prepare_dataset.py
docker compose exec slm-dev python scripts/train_lora.py
docker compose restart slm-api
curl -s http://localhost:8000/health
curl -s http://localhost:8000/generate -H "Content-Type: application/json" -d "{\"prompt\":\"Create a Java DTO named PatientIntakeRequest with Lombok Builder and Jakarta Validation.\",\"max_new_tokens\":220}"
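The same HTTP calls can be made from Python, for example from your own test scripts. This is a minimal standard-library sketch, assuming the host ports used above (11434 for Ollama, 8000 for `slm-api`); inside the `slm-dev` container you would use `http://ollama:11434` instead, as the `OLLAMA_BASE_URL` setting shows. The helper names are hypothetical, and the `/generate` body simply mirrors the curl example because the project's FastAPI response format is not documented here.

```python
import json
import urllib.request

# Host ports published by docker compose in the commands above.
OLLAMA_URL = "http://localhost:11434"
SLM_API_URL = "http://localhost:8000"


def _post_json(url: str, body: dict) -> urllib.request.Request:
    # Build (but do not send) a JSON POST request.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_generate_request(model: str, prompt: str) -> urllib.request.Request:
    # Ollama's /api/generate; with "stream": False the reply is one JSON
    # object whose "response" field holds the generated text.
    return _post_json(f"{OLLAMA_URL}/api/generate",
                      {"model": model, "prompt": prompt, "stream": False})


def slm_api_generate_request(prompt: str, max_new_tokens: int = 220) -> urllib.request.Request:
    # Same JSON body as the curl call to the project's FastAPI /generate route.
    return _post_json(f"{SLM_API_URL}/generate",
                      {"prompt": prompt, "max_new_tokens": max_new_tokens})


def send(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    # Performs the HTTP call; the Docker services must be running.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the services up, `send(ollama_generate_request("java-dto-assistant", "Create a Java 21 DTO named OrderRequest."))["response"]` returns the generated code. For `slm-api`, print the whole dictionary returned by `send` first to see which fields your FastAPI code returns.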

Dataset Format

The training dataset uses JSONL. JSONL means each line is a separate JSON object. For simple causal language model fine-tuning, one common beginner format is a single `text` field that includes the instruction and the answer.

{"text":"### Instruction:\nCreate a Java DTO named OrderRequest.\n\n### Response:\nimport jakarta.validation.constraints.NotBlank;\n..."}
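A dataset preparation step has to turn instruction/answer pairs into lines like the one above. The sketch below shows one way to do that with the standard library; the function names and the `train.jsonl` file name are hypothetical, and the project's `scripts/prepare_dataset.py` may read a different input format.

```python
import json
import os
import tempfile


def to_text_row(instruction: str, response: str) -> dict:
    # Single "text" field using the "### Instruction / ### Response" layout above.
    return {"text": f"### Instruction:\n{instruction}\n\n### Response:\n{response}"}


def write_jsonl(pairs, path: str) -> None:
    # One JSON object per line; json.dumps escapes the newlines inside the
    # Java code, so every record stays on a single physical line.
    with open(path, "w", encoding="utf-8") as f:
        for instruction, response in pairs:
            f.write(json.dumps(to_text_row(instruction, response), ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> list[dict]:
    # Skip blank lines and parse every other line as its own JSON object.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


pairs = [
    (
        "Create a Java DTO named OrderRequest.",
        "import jakarta.validation.constraints.NotBlank;\n\n"
        "public record OrderRequest(@NotBlank String orderId) {}",
    ),
]

path = os.path.join(tempfile.mkdtemp(), "train.jsonl")  # hypothetical file name
write_jsonl(pairs, path)
rows = read_jsonl(path)
print(rows[0]["text"].splitlines()[0])  # ### Instruction:
```

Reading the file back is a cheap way to catch a broken line before training starts, since a single malformed row can stop the loader.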

LoRA Explained Simply

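Full fine-tuning updates every weight in the model, which needs a lot of GPU memory and time. LoRA (Low-Rank Adaptation) instead freezes the pretrained weights and adds small trainable low-rank matrices to selected layers, often the attention projections. Only these small layers, called adapters, are trained, so training is faster, uses less memory, and produces an adapter file far smaller than the base model.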
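The saving is easy to see with the standard LoRA formulation from the original LoRA paper, using the α/r scaling that PEFT also applies. For one frozen weight matrix, the adapter adds a low-rank update; the sizes d = k = 1024 and r = 8 below are illustrative only, not taken from `qwen2.5-coder:0.5b` or this project's training config.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

\text{trainable parameters: full} = d\,k, \qquad \text{LoRA} = r\,(d + k)

d = k = 1024,\; r = 8:\quad
d\,k = 1{,}048{,}576, \qquad
r\,(d + k) = 8 \cdot 2048 = 16{,}384 \approx 1.56\%\ \text{of } d\,k
```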

Full Fine-tuning

Updates most or all model weights. Better control, but expensive.

LoRA Fine-tuning

Trains small adapter weights. Good for local experiments and task specialization.
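To see why a fresh LoRA adapter starts out behaving exactly like the base model, here is a toy pure-Python LoRA layer. The `LoraLinear` class, the tiny matrix, and the rank are made up for illustration; PEFT implements the real thing inside PyTorch modules. As in the LoRA paper, A starts with small random Gaussian values and B starts at zero, so the update B·A contributes nothing until training changes B.

```python
import random


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoraLinear:
    """Frozen weight W0 (d x k) plus a trainable low-rank update B @ A, scaled by alpha / r."""

    def __init__(self, w0, r=2, alpha=4, seed=0):
        rng = random.Random(seed)
        d, k = len(w0), len(w0[0])
        self.w0 = w0          # frozen: never updated during training
        self.scale = alpha / r
        # A (r x k) gets small random values, B (d x r) starts at zero.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d)]

    def forward(self, x):
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * dh for h, dh in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would receive gradient updates.
        return sum(len(row) for row in self.a) + sum(len(row) for row in self.b)


w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # d = 2, k = 3
layer = LoraLinear(w0, r=2, alpha=4)
x = [1.0, 2.0, 3.0]
print(layer.forward(x))          # [7.0, -1.0], same as W0 @ x because B is zero
print(layer.trainable_params())  # 10 = r * (d + k) = 2 * (2 + 3)
```

Because the base weights never change, the trained A and B can be saved on their own as the adapter and loaded on top of the original model later.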

Ollama Custom Model Explained

Ollama runs local models and lets you create custom models with a Modelfile. A Modelfile is like a Dockerfile for a model: it sets the base model to start from, the system prompt to apply, and the generation parameters to use.

FROM qwen2.5-coder:0.5b

PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_ctx 8192

SYSTEM """
You are JavaDtoSLM...
"""

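If you want to keep the Modelfile settings in Python, a small helper can render the file. This is a minimal sketch with a hypothetical `render_modelfile` function; the system prompt text is a placeholder because the original prompt is elided above, and the project's `scripts/build_custom_ollama_model.py` may register the model differently.

```python
def render_modelfile(base: str, system: str, params: dict[str, object]) -> str:
    # Same layout as the Modelfile above: FROM, PARAMETER lines, SYSTEM block.
    lines = [f"FROM {base}", ""]
    lines += [f"PARAMETER {name} {value}" for name, value in params.items()]
    lines += ["", f'SYSTEM """\n{system.strip()}\n"""', ""]
    return "\n".join(lines)


modelfile = render_modelfile(
    base="qwen2.5-coder:0.5b",
    # Hypothetical placeholder; the tutorial's real system prompt is elided.
    system="You are JavaDtoSLM. Reply with Java DTO code only.",
    params={"temperature": 0.2, "top_p": 0.9, "num_ctx": 8192},
)
print(modelfile)
```

Saved as `Modelfile`, the output can be registered with `ollama create java-dto-assistant -f Modelfile`, run wherever the file is visible to Ollama (for example inside the `ollama` container).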
Troubleshooting

Reference Docs

Use these references when you want to go deeper.