Beginner Tutorial: Train and Run a Small Language Model with Docker, Hugging Face, LoRA, and Ollama

Plain-English Overview

A Small Language Model, or SLM, is a compact AI text model that can run locally on modest hardware. Instead of training a model from scratch, you start with a pretrained model and fine-tune it on your own style, task, and examples. In this project, the task is to generate Java DTO classes that use Lombok, Jakarta Validation, and Swagger annotations.

How the System Works

The project has two model paths. The first path is an immediate Ollama custom model using a Modelfile. The second path is actual LoRA fine-tuning using Hugging Face Transformers, PyTorch, and PEFT.

Step-by-step Tutorial

Install Docker Desktop for Windows. Use Windows Terminal or PowerShell. You do not need Python installed on Windows because Python runs inside Docker.

Unzip the project, copy `.env.example` to `.env`, then run `docker compose up --build -d`.

Docker Compose starts four services: `slm-dev`, `slm-api`, `ollama`, and `ollama-setup`.

Build `java-dto-assistant` from a Modelfile using `qwen2.5-coder:0.5b` as the base model.

Convert examples into JSONL rows where each row has a prompt and a correct Java DTO answer.

Run the training script. LoRA creates a small adapter instead of changing every model weight.

Use `infer.py` or the FastAPI service to generate Java DTO code from your fine-tuned adapter.

To improve results, add more high-quality examples, try a code-focused base model, test the outputs, and repeat.

Windows Terminal Commands

Use these commands in PowerShell or Windows Terminal.

Expand-Archive slm-lora-java-dto-custom-ollama-builder.zip -DestinationPath .
cd slm-lora-java-dto-complete
copy .env.example .env
docker compose up --build -d

docker compose exec -e OLLAMA_BASE_URL=http://ollama:11434 -e OLLAMA_MODEL=java-dto-assistant slm-dev python scripts/build_custom_ollama_model.py

curl http://localhost:11434/api/tags

docker compose exec -e OLLAMA_BASE_URL=http://ollama:11434 -e OLLAMA_MODEL=java-dto-assistant slm-dev python scripts/query_ollama.py --prompt "Create a Java 21 DTO named OrderRequest with Lombok Builder, Swagger Schema examples, and Jakarta Validation."

docker compose exec slm-dev python scripts/prepare_dataset.py
docker compose exec slm-dev python scripts/train_lora.py

docker compose restart slm-api
curl -s http://localhost:8000/health
curl -s http://localhost:8000/generate -H "Content-Type: application/json" -d "{\"prompt\":\"Create a Java DTO named PatientIntakeRequest with Lombok Builder and Jakarta Validation.\",\"max_new_tokens\":220}"
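
Escaping JSON inside a shell command is easy to get wrong, and the rules differ between shells. As an alternative, the sketch below sends the same request to the `/generate` endpoint from Python using only the standard library. The function names `build_generate_request` and `generate` are hypothetical, and because the response format of `slm-api` is not described here, the sketch returns the raw response body instead of assuming specific fields.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
API_URL = "http://localhost:8000/generate"


def build_generate_request(prompt: str, max_new_tokens: int = 220) -> urllib.request.Request:
    """Build the same POST request as the curl command, without shell escaping."""
    body = json.dumps({"prompt": prompt, "max_new_tokens": max_new_tokens}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_new_tokens: int = 220, timeout: float = 120.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_generate_request(prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (requires the slm-api container to be running):
# print(generate("Create a Java DTO named PatientIntakeRequest with Lombok Builder and Jakarta Validation."))
```

This needs Python on the machine that runs it. If Python is only available inside Docker, run it from a container and adjust `API_URL` to a host name that container can reach.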

Dataset Format

The training dataset uses JSONL. JSONL means each line is a separate JSON object. For simple causal language model fine-tuning, one common beginner format is a single `text` field that includes the instruction and the answer.

{"text":"### Instruction:\nCreate a Java DTO named OrderRequest.\n\n### Response:\nimport jakarta.validation.constraints.NotBlank;\n..."}
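
To make the format concrete, here is a minimal sketch of how prompt and answer pairs could be turned into rows like the one above. It is not the project's `prepare_dataset.py`, whose contents are not shown here. The helper names `to_jsonl_row` and `write_jsonl` and the output file name are assumptions for illustration.

```python
import json

# Template matching the "### Instruction / ### Response" layout shown above.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def to_jsonl_row(instruction: str, response: str) -> str:
    """Return one JSONL line with a single `text` field."""
    text = PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        response=response.strip(),
    )
    # json.dumps escapes newlines and quotes, so each row stays on one line.
    return json.dumps({"text": text}, ensure_ascii=False)


def write_jsonl(examples, path: str) -> None:
    """Write (instruction, response) pairs to a JSONL file, one row per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for instruction, response in examples:
            handle.write(to_jsonl_row(instruction, response) + "\n")


# Example usage with a hypothetical output path:
# write_jsonl(
#     [("Create a Java DTO named OrderRequest.", "public class OrderRequest {}")],
#     "train.jsonl",
# )
```

Using `json.dumps` instead of building the string by hand avoids broken rows when the Java code contains quotes, backslashes, or line breaks.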

LoRA Explained Simply

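Full fine-tuning updates every weight in the model, which is expensive in memory and compute. LoRA (Low-Rank Adaptation) keeps the pretrained weights frozen and adds small trainable low-rank matrices to selected layers. Only these small matrices, called adapters, are trained, so training is faster and uses less memory.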

Full Fine-tuning

Updates most or all model weights. Offers the most control over model behavior, but needs far more memory and compute.

LoRA Fine-tuning

Trains small adapter weights. Good for local experiments and task specialization.
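
The comparison above can be illustrated with a little arithmetic. The sketch below is plain Python, not the project's training code. It counts trainable parameters for one weight matrix under each approach and shows how a LoRA update is combined with a frozen weight as W + (alpha / rank) * B * A. The layer size 1024 x 1024, the rank 8, and all function names are assumptions chosen for illustration.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for adapter matrices B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merged_weight(W, B, A, alpha: float, rank: int):
    """Return W + (alpha / rank) * (B @ A), leaving W itself unchanged."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Hypothetical layer size and rank, for illustration only.
d_out, d_in, rank = 1024, 1024, 8
print(full_finetune_params(d_out, d_in))  # 1048576
print(lora_params(d_out, d_in, rank))     # 16384
```

In this example the adapter has about 1.6% as many trainable values as the full matrix, which is why LoRA fits on smaller hardware.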

Ollama Custom Model Explained

Ollama lets you run local models and create custom models with a Modelfile. A Modelfile is like a Dockerfile for a model. It can say which base model to use, what system prompt to apply, and what generation parameters to set.

FROM qwen2.5-coder:0.5b

PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER num_ctx 8192

SYSTEM """
You are JavaDtoSLM...
"""

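The three parts of the Modelfile above (base model, parameters, system prompt) can also be produced from code. The sketch below is a hypothetical helper, `render_modelfile`, that writes the same layout. It is not the project's `build_custom_ollama_model.py`, whose internals are not shown here, and the system prompt is passed through exactly as given.

```python
def render_modelfile(base_model: str, system_prompt: str, parameters: dict) -> str:
    """Render Modelfile text with FROM, PARAMETER, and SYSTEM instructions."""
    if '"""' in system_prompt:
        # A triple quote inside the prompt would end the SYSTEM block early.
        raise ValueError('system_prompt must not contain """')

    lines = [f"FROM {base_model}", ""]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    lines += ["", 'SYSTEM """', system_prompt.strip(), '"""', ""]
    return "\n".join(lines)


# Values copied from the Modelfile above; the system prompt is elided there too.
modelfile_text = render_modelfile(
    base_model="qwen2.5-coder:0.5b",
    system_prompt="You are JavaDtoSLM...",
    parameters={"temperature": 0.2, "top_p": 0.9, "num_ctx": 8192},
)
print(modelfile_text)
```

If the text is saved to a file named `Modelfile`, the Ollama CLI can build the model with `ollama create java-dto-assistant -f Modelfile`.
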
Troubleshooting

Reference Docs

Use these references when you want to go deeper.

What you are building