Section 1

Overview

Stack: HTML, Tailwind CSS, JavaScript, FastAPI, Temporal, ChromaDB, Ollama, S3.

Local Context Query is a full-stack document Q&A application. Users upload files through a browser interface, the backend processes them into a vector database, and users then ask natural-language questions that are answered using only the content from their uploaded documents.

The system is built from four cooperating layers:

  • Frontend — A single-page HTML client using Tailwind CSS and vanilla JavaScript. Handles uploads, document management, chat, and real-time updates via WebSockets.
  • API layer — A FastAPI server that exposes REST endpoints and a WebSocket hub. It delegates long-running work to Temporal workflows rather than blocking request threads.
  • Orchestration — Temporal manages durable workflows for document processing and query execution. Each workflow is composed of retry-safe activities.
  • Infrastructure — S3 (or LocalStack) for file storage, ChromaDB for vector search, and Ollama for local LLM inference and embeddings.

This guide teaches you how every layer works, why specific design decisions were made, and where to look when things go wrong.

Section 2

System Architecture

The request flow follows two paths: upload and query. Both use Temporal for durable orchestration, which means the backend API returns immediately and the browser polls or listens on a WebSocket for results.

Browser                   FastAPI                   Temporal                  Workers
   │── POST /api/upload ────▶│                         │                         │
   │◀─ {doc_id, status} ─────│── start workflow ──────▶│                         │
   │                         │                         │── schedule activities ─▶│
   │                         │                         │                         │── extract text
   │                         │                         │                         │── chunk text
   │                         │                         │                         │── embed & store
   │                         │                         │◀── workflow complete ───│
   │                         │◀── callback ────────────│                         │
   │◀── WebSocket push ──────│                         │                         │
   │                         │                         │                         │
   │── POST /api/query ─────▶│                         │                         │
   │◀─ {query_id} ───────────│── start workflow ──────▶│                         │
   │                         │                         │── execute_query ───────▶│
   │                         │                         │                         │── embed question
   │                         │                         │                         │── vector search
   │                         │                         │                         │── LLM generate
   │                         │                         │                         │── save answer → S3
   │                         │◀─────────── HTTP callback (worker → API) ─────────│
   │◀── WebSocket push ──────│                         │                         │
Key insight: The browser never waits for a blocking response. The API starts a Temporal workflow and returns a job ID. The browser then receives the result through either a WebSocket push or by polling.
Section 3

Page Layout

The HTML is organized into three major regions:

  • Header — Fixed at the top. Contains the model selector dropdown, WebSocket status indicator (green/red dot), refresh button, theme toggle, and the sidebar toggle for mobile.
  • Chat panel — The central column. Holds the scrollable message history (#chatMessages) and the input composer bar with a growing textarea.
  • Sidebar — On desktop it sits to the right. On mobile it slides in as an overlay drawer. Contains the upload drop zone, the processing queue, and the document selection list.

This layout pattern (header, main content column, sidebar) is extremely common in web applications. Learning it here will help you build dashboards, email clients, and admin panels later.

body
├── header
│   ├── model <select>
│   ├── refresh btn
│   ├── WS status dot
│   ├── theme toggle
│   └── sidebar toggle (mobile)
└── main.flex
    ├── section.chat-column
    │   ├── #chatMessages (scrollable)
    │   └── .input-bar
    │       ├── <textarea>
    │       └── send <button>
    ├── #sidebarOverlay
    └── aside#sidebar
        ├── upload drop-zone
        │   └── hidden <input type="file">
        ├── processing queue list
        └── document context list
            ├── "toggle all" checkbox
            └── per-doc checkbox + delete
Section 4

Tailwind CSS + Custom Styles

Tailwind CSS is a utility-first CSS framework. Instead of writing custom class names like .chat-container, you compose styles directly in HTML with small utility classes.

What Tailwind handles

Flexbox layouts (flex, gap-4), spacing (p-4, m-2), typography (text-sm, font-bold), responsive breakpoints (lg:hidden, sm:px-5), border radius (rounded-xl), and visibility toggling.

What custom CSS handles

Theme color variables (--bg, --card, --accent), the upload drop-zone hover animation, custom checkbox styles, scrollbar appearance, WebSocket status dot animations, and the sidebar slide transition.

How Tailwind utility classes work

<!-- Instead of writing separate CSS: -->
<!-- .card { padding: 1rem; border-radius: 0.75rem; background: white; } -->

<!-- You write utilities directly: -->
<div class="p-4 rounded-xl bg-white shadow-md">
  Content here
</div>

<!-- Responsive: hide on mobile, show on desktop -->
<aside class="hidden lg:block w-72">
  Sidebar only visible ≥1024px
</aside>
Learning tip: Open your browser DevTools and hover over elements. You will see each Tailwind class maps to exactly one or two CSS properties. That one-to-one mapping is why Tailwind is fast to learn. The Utility-First Fundamentals page is the best starting point.
Section 5

Dark & Light Theming

Theme switching is built with CSS custom properties (also called CSS variables) and a data-theme attribute on the root <html> element. This is the recommended pattern because it works with any CSS framework and requires zero JavaScript to apply styles — JS only flips the attribute.

Step 1 — Define variables for each theme

/* Light theme (default) */
:root,
[data-theme="light"] {
  --bg: #f8f7f4;
  --text: #1c1917;
  --accent: #b45309;
  --border: #e7e5e4;
  --card: #ffffff;
}

/* Dark theme */
[data-theme="dark"] {
  --bg: #0c0a09;
  --text: #e7e5e4;
  --accent: #f59e0b;
  --border: #292524;
  --card: #1c1917;
}

Step 2 — Use variables everywhere

body {
  background: var(--bg);
  color: var(--text);
  transition: background 0.3s, color 0.3s;
}

.card {
  background: var(--card);
  border: 1px solid var(--border);
}

Step 3 — Toggle with JavaScript

const toggle = document.getElementById('themeBtn');

toggle.addEventListener('click', () => {
  const html = document.documentElement;
  const next = html.dataset.theme === 'dark' ? 'light' : 'dark';
  html.dataset.theme = next;
  localStorage.setItem('theme', next); // persist preference
});
Why this works: Every element using var(--bg) automatically updates when the attribute changes. No class toggling on individual elements, no re-renders, no framework needed. See the MDN guide on CSS custom properties.
Section 6

JavaScript State Management

The app uses plain JavaScript variables to track state. This is appropriate for a small single-page application. As the app grows, you would migrate to a state container (like Zustand, Redux, or even a simple event bus).

let documents = [];         // from GET /api/documents
let allDocsEnabled = true;  // master toggle
let processingQueue = [];   // uploads in progress
let pendingQueries = {};    // query_id → DOM element
const API = '/api';         // base path
Why track pendingQueries as a map?

When the user sends a question, the UI stores the query ID as a key and the loading bubble DOM element as the value. When the answer arrives (via WebSocket or polling), the code looks up the bubble by query ID and replaces its content. This avoids searching the DOM every time.

The safeFetch() helper

All HTTP calls go through a single helper that standardizes error handling:

async function safeFetch(url, opts = {}) {
  const resp = await fetch(url, opts);
  const text = await resp.text(); // always read as text first
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    throw new Error(`Non-JSON: ${text.slice(0, 200)}`);
  }
  if (!resp.ok) throw new Error(data.detail || `HTTP ${resp.status}`);
  return data;
}
Why read as text first? If the server returns an HTML error page (like a 502 from a reverse proxy), calling resp.json() directly throws a confusing parse error. Reading as text gives you a chance to handle the raw body and show a useful error message.
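The same text-first logic can be expressed in any language. A minimal Python sketch (`parse_response` is a hypothetical helper, not part of the codebase) showing why parsing the raw body yourself gives a better error than a bare JSON parse failure:

```python
import json

def parse_response(body_text: str, ok: bool, status: int):
    """Parse an HTTP body the way safeFetch() does: text first, then JSON.

    A non-JSON body (e.g. an HTML 502 page from a reverse proxy) produces
    a readable error that includes the start of the raw body.
    """
    try:
        data = json.loads(body_text)
    except json.JSONDecodeError:
        raise ValueError(f"Non-JSON response: {body_text[:200]}")
    if not ok:
        # Prefer the server's own error detail when present
        raise ValueError(data.get("detail") or f"HTTP {status}")
    return data
```

For example, `parse_response('{"detail": "bad id"}', ok=False, status=404)` raises an error carrying the server's `detail` message rather than a generic status code.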
Section 7

REST API Endpoints

The frontend communicates with the FastAPI backend through these endpoints. Each one follows a consistent JSON response pattern.

GET /api/health

Returns the status of all backend services: ChromaDB, Ollama, Temporal, and S3. Useful for debugging when things stop working.

GET /api/models

Lists available LLM models from Ollama. Populates the model select dropdown in the header.

POST /api/upload

Accepts a file via FormData. Saves to S3, starts a Temporal upload workflow, returns a doc_id.

GET /api/upload/:id/status

Polls the Temporal workflow status for an upload. Returns processing, completed, or failed.

POST /api/query

Sends a question + selected doc IDs. Saves the query to S3, starts a Temporal query workflow, returns a query_id.

GET /api/query/:id/answer

Polls for the answer in S3. Returns processing until the answer is ready, then returns the full answer with sources.

GET /api/documents

Lists all indexed documents from ChromaDB, de-duplicated by doc_id. Populates the sidebar document list.
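Because ChromaDB stores one entry per chunk, listing documents means collapsing per-chunk metadata down to one record per doc_id. A sketch of that de-duplication (the metadata shape with `doc_id` and `filename` keys is an assumption for illustration):

```python
def dedupe_documents(chunk_metadatas):
    """Collapse per-chunk metadata into one record per doc_id,
    keeping the first occurrence of each document."""
    seen = {}
    for meta in chunk_metadatas:
        seen.setdefault(meta["doc_id"], {
            "doc_id": meta["doc_id"],
            "filename": meta["filename"],
        })
    return list(seen.values())
```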

DELETE /api/documents/:id

Deletes a document from ChromaDB and cleans up its S3 objects. The frontend removes the card from the sidebar immediately.

POST /api/internal/query-complete

Internal callback from the worker. Loads the answer from S3 and broadcasts it to all WebSocket clients.

Section 8

WebSocket Design

WebSockets provide a persistent, bi-directional connection between the browser and the server. Unlike HTTP (where the client must ask for updates), the server can push messages at any time. This app uses WebSockets for instant notification when uploads finish or answers are ready.

Connection lifecycle

let ws;
let retries = 0;
let keepalive;

function connectWS() {
  // Match protocol: https → wss, http → ws
  const proto = location.protocol === 'https:' ? 'wss:' : 'ws:';
  ws = new WebSocket(proto + '//' + location.host + '/ws');

  ws.onopen = () => {
    // Set status dot to green, reset the retry counter
    retries = 0;
    // Keepalive ping every 30 seconds (cleared on close so
    // repeated reconnects don't stack up extra timers)
    keepalive = setInterval(() => ws.send('ping'), 30000);
  };

  ws.onclose = () => {
    // Set status dot to red, stop the keepalive
    clearInterval(keepalive);
    // Reconnect with exponential backoff
    const delay = Math.min(1000 * 2 ** retries, 30000);
    setTimeout(connectWS, delay);
    retries++;
  };

  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === 'query_answer') handleAnswer(msg);
    if (msg.type === 'upload_complete') handleUploadDone(msg);
    if (msg.type === 'pong') { /* keepalive ack */ }
  };
}
Why this is good

The dual delivery model (WebSocket push + HTTP polling) means the UI still works even if the WebSocket disconnects temporarily. Exponential backoff prevents flooding the server during outages.
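The backoff delay used on reconnect is a one-line formula. Expressed as a standalone function (a sketch of the same arithmetic as in connectWS()):

```python
def backoff_delay_ms(retries: int, base_ms: int = 1000, cap_ms: int = 30000) -> int:
    """Exponential backoff with a cap: 1s, 2s, 4s, ... up to 30s."""
    return min(base_ms * 2 ** retries, cap_ms)
```

After ten failed attempts the delay has long since hit the 30-second cap, so a prolonged outage never produces unbounded waits or a reconnect storm.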

Protocol matching matters

If the page loads over HTTPS, you must use wss:. Browsers block mixed content — an insecure WebSocket on a secure page will be rejected. Always derive the protocol from location.protocol.

Server-side: the WSHub pattern

The FastAPI backend maintains a WSHub class that tracks all connected WebSocket clients. When an event happens (answer ready, upload complete), it broadcasts a JSON message to every client. Dead connections are cleaned up automatically.

class WSHub:
    def __init__(self):
        self.clients: Set[WebSocket] = set()

    async def broadcast(self, message: dict):
        dead = set()
        for ws in self.clients:
            try:
                await ws.send_json(message)
            except Exception:
                dead.add(ws)
        self.clients -= dead
Section 9

Upload Flow

The upload area supports both click-to-browse and drag-and-drop. Under the hood, a hidden <input type="file"> is triggered when the drop zone is clicked. The drag events (dragenter, dragleave, drop) toggle a visual highlight class.

The upload sequence

1 User selects or drops one or more files.

2 Each file is uploaded sequentially via POST /api/upload with a FormData body.

3 The server saves the raw file to S3 and starts a Temporal DocumentUploadWorkflow.

4 The returned doc_id is added to the processingQueue array. A progress card appears in the sidebar.

5 The UI polls GET /api/upload/:id/status every 2 seconds as a fallback.

6 When the WebSocket broadcasts upload_complete (or polling detects completion), the progress card is replaced with a full document card.

Drag-and-drop tip for beginners: You must call e.preventDefault() on both dragover and drop events. Without this, the browser will navigate to the dropped file instead of passing it to your JavaScript. See the MDN File drag and drop guide.
Section 10

Chat Flow

The chat composer uses a <textarea> that auto-grows as the user types. It grows by setting style.height = 'auto' then style.height = scrollHeight + 'px' on every input event. A max-height cap prevents it from consuming the entire screen.

Keyboard shortcuts: Enter sends the message. Shift+Enter inserts a newline. This is the same pattern used by Slack, Discord, and most chat applications.

The query sequence

1 User presses Enter. The UI adds a user chat bubble and a loading indicator bubble.

2 The UI collects the currently enabled document IDs from the sidebar checkboxes.

3 POST /api/query sends the question, selected model, and enabled doc IDs.

4 The returned query_id is stored in pendingQueries mapped to the loading bubble element.

5 Two answer paths race: WebSocket push and polling at GET /api/query/:id/answer.

6 Whichever arrives first replaces the loading bubble with the answer text and source citations.
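Because the two delivery paths race, the resolve step must be idempotent. One way to sketch this (hypothetical `register`/`resolve` names; the real app stores DOM elements, modeled here as dicts) is to pop the pending entry so only the first arrival wins:

```python
pending = {}  # query_id -> placeholder (a loading bubble in the real app)

def register(query_id, placeholder):
    pending[query_id] = placeholder

def resolve(query_id, answer):
    """Return True if this arrival won the race; later arrivals are no-ops."""
    placeholder = pending.pop(query_id, None)
    if placeholder is None:
        return False  # already resolved by the other delivery path
    placeholder["content"] = answer
    return True
```

Whichever path calls `resolve()` second simply finds nothing to update, so duplicate delivery is harmless.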

Section 11

DOM Rendering Patterns

The app builds HTML strings with template literals and injects them via innerHTML. Helper functions keep this organized:

  • addUser(text) — Creates a user chat bubble (right-aligned).
  • addBot(html) — Creates a bot answer bubble (left-aligned) with optional source citations.
  • addSys(text) — Creates a system message (centered, muted).
  • renderDocList() — Rebuilds the entire document sidebar list from the documents array.

XSS prevention with esc()

function esc(str) {
  const el = document.createElement('div');
  el.textContent = str; // the browser escapes all HTML entities
  return el.innerHTML;  // return the safe string
}

This is a critical security measure. Without escaping, a user could type <img src=x onerror=alert(1)> in the chat and it would execute as JavaScript. The esc() function converts special characters to their HTML entity equivalents (for example, < becomes &lt;).

The renderMd() helper adds simple markdown formatting (bold with **text**, inline code with backticks). It runs after escaping, so the formatting syntax is safe.
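The escape-then-format order can be illustrated in Python with the standard library (a sketch of the idea, not the app's actual renderMd()): `html.escape` neutralizes any injected tags first, and the bold/code markers survive escaping because they contain no HTML-special characters.

```python
import html
import re

def render_md(text: str) -> str:
    """Escape first, then apply simple markdown-style formatting."""
    safe = html.escape(text)  # <img ...> becomes &lt;img ...&gt;
    safe = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", safe)
    safe = re.sub(r"`([^`]+)`", r"<code>\1</code>", safe)
    return safe
```

Reversing the order (format, then escape) would escape the generated <strong> tags too, so the order is not optional.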

Future improvement: As the UI grows, building HTML with string concatenation becomes fragile. Consider migrating to HTML <template> elements, a lightweight library like Lit, or a full framework like React or Vue.
Section 12

FastAPI Backend

FastAPI is a modern Python web framework designed for building APIs. It uses Python type hints and Pydantic models to auto-validate request and response data, and generates OpenAPI documentation at /docs automatically.

Key patterns in main.py

Global exception handler

The @app.exception_handler(Exception) decorator catches all unhandled errors and returns a structured JSON response instead of an HTML stack trace. This prevents information leakage in production.

Lazy client initialization

Clients for Temporal, ChromaDB, and S3 are created on first use (get_temporal(), get_collection()) and cached as module-level globals. This avoids blocking startup if a service is temporarily down.

CORS middleware

Currently set to allow_origins=["*"] for development. In production, restrict this to your actual domain. Open CORS is a security risk (see Section 18).

Immediate returns

Upload and query endpoints start a Temporal workflow and return a job ID immediately. They never block waiting for processing to finish. This keeps HTTP response times under 500ms even for large files.

Section 13

Temporal Workflows

Key concepts: Durable Execution, Retry Policies, Task Queues.

Temporal is a durable execution platform. A Temporal workflow is a function that orchestrates a sequence of steps. If the process crashes at any point, Temporal replays the workflow from the last completed step — without re-executing already-completed activities. This gives you automatic fault tolerance for free.

Document Upload Workflow

The DocumentUploadWorkflow orchestrates three activities in sequence:

DocumentUploadWorkflow
│
├──▶ extract_text_activity (5 min timeout, 3 retries)
│       Read raw file from S3 → extract text → save text to S3
│
├──▶ chunk_text_activity (2 min timeout, 3 retries)
│       Read text from S3 → split into overlapping chunks
│
└──▶ embed_and_store_activity (30 min timeout, 2 min heartbeat, 3 retries)
        For each chunk: embed via Ollama → store in ChromaDB

Query Workflow

The QueryWorkflow is simpler — it delegates to a single execute_query_activity that performs the entire RAG pipeline:

QueryWorkflow
│
└──▶ execute_query_activity (10 min timeout, 3 min heartbeat, 2 retries)
        Load question from S3 → embed query → vector search ChromaDB →
        build prompt with context → generate answer via Ollama →
        save to S3 → notify backend via HTTP callback

Key Temporal concepts

Task Queues

Upload and query workflows run on separate task queues (document-upload-tasks and query-tasks). This lets you scale them independently — you might run 4 upload workers but only 1 query worker (since the LLM is GPU-bound).

Retry Policies

Each activity has a RetryPolicy with maximum_attempts and initial_interval. Permanent failures (bad file, model not found) use ApplicationError(non_retryable=True) to stop retries immediately.
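The retry semantics can be sketched in plain Python, with a hypothetical `PermanentError` standing in for Temporal's `ApplicationError(non_retryable=True)` (this is the behavior a RetryPolicy provides, not Temporal's actual implementation):

```python
class PermanentError(Exception):
    """Stand-in for ApplicationError(non_retryable=True)."""

def run_with_retries(activity, max_attempts=3):
    """Retry transient failures up to max_attempts; a permanent
    failure propagates immediately without further attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return activity()
        except PermanentError:
            raise  # bad file, model not found: retrying cannot help
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts, surface the transient error
```

A flaky network call succeeds on a later attempt; a "model not found" error fails once and stops, which is exactly why permanent failures are marked non-retryable.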

Heartbeats

Long-running activities (embedding, LLM generation) call activity.heartbeat() periodically. If the heartbeat stops for longer than heartbeat_timeout, Temporal assumes the worker crashed and reschedules the activity.

Deterministic Workflows

Workflow code must be deterministic — no I/O, no random numbers, no datetime.now(). All side effects go in activities. Temporal replays the workflow function during recovery and needs identical decisions each time.

Read more: The Temporal Python SDK docs cover workflow definitions, activity patterns, and retry policies in detail. The Getting Started tutorial is excellent for hands-on learning.
Section 14

Activities & Dependency Injection

Activities are where real work happens: reading from S3, calling Ollama, writing to ChromaDB. The codebase uses class-based activities with constructor injection, which makes them trivially testable.

class DocumentActivities:
    def __init__(self, s3=None, chroma=None, ollama=None):
        self._s3 = s3 or S3Client()           # real in prod
        self._chroma = chroma or ChromaStore()
        self._ollama = ollama or OllamaClient()

    @activity.defn(name="extract_text_activity")
    async def extract_text(self, inp: UploadInput) -> ExtractResult:
        raw = self._s3.get_bytes(inp.s3_raw_key)
        # ... extract, save, return result

In production, the module creates real instances at import time. In tests, you inject lightweight mocks:

# In tests:
acts = DocumentActivities(
    s3=MockS3Client({"raw/doc.txt": b"Hello"}),
    chroma=MockChromaStore(),
    ollama=MockOllamaClient(),
)
result = await acts.extract_text(inp)

The Ollama embed fix

The OllamaClient.embed() method includes a compatibility layer worth understanding. Ollama's /api/embed endpoint changed its input format across versions. The code tries the modern API first (string input), and if the server returns a 400 error, falls back to the legacy /api/embeddings endpoint. It also truncates long inputs to stay within the model's context window and passes truncate: true as a server-side safety net.

Section 15

RAG Pipeline

Key concepts: Retrieval-Augmented Generation, Vector Search, ChromaDB.

RAG (Retrieval-Augmented Generation) is a technique where you ground an LLM's answer in specific documents rather than relying on its training data alone. This app implements a straightforward RAG pipeline:

Ingestion pipeline

1 Extract — Parse the uploaded file (PDF via pdfplumber, DOCX via python-docx, XLSX via openpyxl, or plain text decode). Save extracted text to S3.

2 Chunk — Split text into overlapping windows of 500 words with 100-word overlap. Overlap ensures that context at chunk boundaries is not lost.

3 Embed — Each chunk is sent to Ollama's nomic-embed-text model, which returns a 768-dimensional vector. Chunks are truncated to 6000 characters (~1500 tokens) to stay within the model's 2048-token context window.

4 Store — Vectors, metadata (doc_id, filename, chunk_index), and the original text are upserted into ChromaDB with cosine similarity as the distance metric.

Query pipeline

1 Embed query — The user's question is embedded using the same model.

2 Retrieve — ChromaDB returns the top 8 most similar chunks (filtered by enabled document IDs).

3 Generate — The retrieved chunks are assembled into a prompt context (capped at 12,000 chars) and sent to the LLM with a system prompt that instructs it to answer only from the provided context.

4 Persist — The answer and source citations are saved to S3, then the backend is notified to push the result via WebSocket.
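Step 3's 12,000-character cap can be sketched as greedy packing of the retrieved chunks into a character budget (`build_context` is a hypothetical helper illustrating the cap, not the codebase's function):

```python
def build_context(chunks, max_chars=12_000):
    """Greedily pack retrieved chunks (assumed ordered by similarity,
    best first) into the prompt context, stopping at the first chunk
    that would overflow the character budget."""
    parts, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # keep the most relevant chunks, drop the rest
        parts.append(chunk)
        used += len(chunk)
    return "\n\n".join(parts)
```

Because chunks arrive in similarity order, truncation always discards the least relevant context first.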

Why overlapping chunks? If a sentence falls exactly at a chunk boundary, a non-overlapping split would break the sentence across two chunks. The 100-word overlap means both adjacent chunks contain that sentence, so it can be found by vector search regardless of which chunk the embedding is closer to.
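The 500-word window with 100-word overlap described above can be sketched as a sliding window that advances by window size minus overlap:

```python
def chunk_words(text: str, size: int = 500, overlap: int = 100):
    """Split text into overlapping word windows: each chunk shares its
    last `overlap` words with the start of the next chunk."""
    words = text.split()
    step = size - overlap  # advance 400 words per chunk by default
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final window already reaches the end
    return chunks
```

For a 1,000-word document this yields three chunks (words 0–499, 400–899, 800–999), and the 100-word overlap between neighbors is what keeps boundary sentences retrievable.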
Section 16

Testing Strategy

The test suite in test_activities.py demonstrates how to test Temporal activities without running a Temporal server. Because activities are just async functions on classes with injected dependencies, you call them directly with mock infrastructure.

MockS3Client

An in-memory dictionary that mimics get_bytes() and put_bytes(). Pre-load it with test data. Raises ClientError on missing keys, just like real S3.
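A minimal version of such a mock looks like this (using KeyError in place of botocore's ClientError, to keep the sketch dependency-free):

```python
class MockS3Client:
    """In-memory stand-in for the S3 client used by activities."""

    def __init__(self, initial=None):
        # Pre-load test fixtures: {"raw/doc.txt": b"contents"}
        self._store = dict(initial or {})

    def get_bytes(self, key: str) -> bytes:
        if key not in self._store:
            # The real mock raises botocore's ClientError here
            raise KeyError(f"NoSuchKey: {key}")
        return self._store[key]

    def put_bytes(self, key: str, data: bytes) -> None:
        self._store[key] = data
```

Because it implements the same two methods the activities call, it drops straight into the constructor-injection slot with no patching.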

MockOllamaClient

Returns deterministic vectors ([0.01] * 768) for embed calls and a fixed string for generate calls. No network needed.

MockChromaStore

Wraps a MagicMock collection. You can assert on .add(), .query(), and .count() calls.

ApplicationError assertions

Tests verify that permanent failures (empty file, missing S3 key, model not found) raise ApplicationError with non_retryable=True, ensuring Temporal won't retry them.

# Run the test suite:
$ pytest test_activities.py -v

# Example test — plain text extraction:
async def test_plain_text_extraction(self):
    s3 = MockS3Client({"raw/doc1.txt": b"Hello world."})
    acts = _make_doc_activities(s3=s3)
    result = await acts.extract_text(
        UploadInput(doc_id="doc1", filename="readme.txt", s3_raw_key="raw/doc1.txt")
    )
    assert result.characters == 12
    assert s3._store[result.s3_text_key] == b"Hello world."
Section 17

Responsive Design

The app is mobile-aware using Tailwind's responsive prefix system. Classes like lg:block apply only at the lg breakpoint (1024px) and above. Below that width, the sidebar becomes an off-canvas drawer.

Mobile (below 1024px)

The sidebar is hidden by default (hidden). A hamburger button in the header toggles it open. When open, a dark overlay covers the chat area (#sidebarOverlay). Tapping the overlay closes the sidebar.

Desktop (1024px and above)

The sidebar is always visible (lg:block). The overlay is hidden. The main layout uses flex with the chat column taking remaining space and the sidebar fixed at a set width.

<!-- Tailwind responsive pattern -->
<aside class="fixed inset-y-0 right-0 w-80
              translate-x-full
              transition-transform duration-300
              lg:static lg:translate-x-0 lg:w-72">
</aside>
<!--
  fixed inset-y-0 right-0 w-80        mobile: full-height drawer
  translate-x-full                    mobile: off-screen by default
  transition-transform duration-300   smooth slide
  lg:static lg:translate-x-0          desktop: normal flow
  lg:w-72                             desktop: fixed width
-->
Reference: The Tailwind Responsive Design docs explain the mobile-first breakpoint system in detail.
Section 18

Security Checklist

This section covers security issues you must address before deploying this (or any similar) application to production. Each item is marked with its severity.

🔴 Critical: CDN script integrity

Loading Tailwind from a CDN means a compromised CDN could inject malicious code. In production, either bundle Tailwind locally or use Subresource Integrity (SRI) hashes to ensure the file hasn't been tampered with.
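An SRI value is simply a base64-encoded cryptographic digest of the file, prefixed with the hash algorithm. Computing one with the Python standard library (this produces the string you would put in a <script integrity="..."> attribute):

```python
import base64
import hashlib

def sri_hash(file_bytes: bytes) -> str:
    """Compute a Subresource Integrity value: sha384-<base64 digest>."""
    digest = hashlib.sha384(file_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")
```

The browser recomputes the digest of the fetched file and refuses to execute it if the values differ, so a tampered CDN copy is blocked.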

🔴 Critical: CSRF protection

Upload, delete, and query endpoints change server state. If the app uses cookie-based authentication, an attacker could craft a page that submits requests on behalf of a logged-in user. Add CSRF tokens or use SameSite=Strict cookies with server-side origin validation. See the OWASP CSRF Prevention Cheat Sheet.

🔴 Critical: XSS from model output

The esc() function escapes user input, but the LLM's response may contain HTML-like text. If renderMd() or any rich formatting allows unescaped content through, it becomes an XSS vector. Sanitize all model output before injection. Consider using DOMPurify.

🟡 High: File upload validation

The browser's accept attribute is cosmetic — it does not enforce file types. The server must validate MIME type, file extension, file size, and scan for malware. Limit maximum file size, reject unexpected extensions, and process files in an isolated environment.

🟡 High: WebSocket authentication

The current WebSocket endpoint accepts all connections without authentication. In production, verify the user's session token during the handshake. Also authorize document access per user — don't trust enabled_doc_ids from the client blindly.

🟡 High: CORS restriction

The backend currently uses allow_origins=["*"]. In production, restrict this to your actual frontend domain. Open CORS allows any website to make requests to your API.

🟠 Medium: Transport security (HTTPS/WSS)

All traffic — uploads, chat messages, API calls, WebSocket data — must go over TLS in production. Without it, content can be intercepted by anyone on the network. Use HTTPS for the site and WSS for WebSocket connections.

🟠 Medium: Rate limiting

Without rate limits, a malicious user can flood the upload endpoint, exhaust LLM resources, or spam the WebSocket. Add rate limiting at the API gateway or application layer for uploads, queries, deletes, and WebSocket messages.

🔵 Low: Access control on documents

The client sends enabled_doc_ids with queries. The server must verify the user actually owns those documents. A multi-tenant deployment without this check would allow users to query other users' documents.

🔵 Low: Logging hygiene

Avoid logging raw file content, user prompts, API keys, or session tokens. Use structured logging with redaction for sensitive fields. The OWASP Logging Cheat Sheet has good guidelines.

Section 19

Best Practices Applied

The codebase follows several best practices worth internalizing for your own projects:

Separate task queues for different workloads

Upload processing (CPU-bound text extraction) and query execution (GPU-bound LLM inference) run on different Temporal task queues. This lets you scale each independently: many upload workers, fewer query workers.

Concurrency limits on GPU workers

The query worker sets max_concurrent_activities=1 because LLM inference is GPU-bound. Running multiple inferences simultaneously would cause OOM errors or severe slowdowns.
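The effect of `max_concurrent_activities=1` can be sketched with an asyncio semaphore (a generic concurrency-limit pattern, not Temporal's internals; the sleep stands in for LLM inference):

```python
import asyncio

async def limited_worker(jobs, max_concurrent=1):
    """Run jobs with at most `max_concurrent` in flight; returns the
    peak number of jobs observed running simultaneously."""
    sem = asyncio.Semaphore(max_concurrent)
    peak = 0
    running = 0

    async def run(_job):
        nonlocal peak, running
        async with sem:  # blocks until a slot frees up
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0)  # stand-in for GPU-bound inference
            running -= 1

    await asyncio.gather(*(run(j) for j in jobs))
    return peak
```

With a limit of 1, five queued jobs still execute strictly one at a time, which is what keeps a single-GPU Ollama instance from being oversubscribed.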

Non-retryable errors for permanent failures

Using ApplicationError(non_retryable=True) for things like "model not found" or "empty file" prevents Temporal from wasting retries on failures that will never succeed.

Graceful worker shutdown

The worker process handles SIGINT/SIGTERM signals, allows in-flight activities to drain, and reports any worker crashes. This prevents data loss during deployments.

Dependency injection for testability

Activity classes accept S3, ChromaDB, and Ollama clients as constructor parameters. Tests inject mocks; production uses real clients. No monkey-patching needed.

Dual delivery (push + poll)

The frontend uses WebSocket push as the primary delivery path and HTTP polling as a fallback. If the WebSocket reconnects after the answer was sent, polling still picks it up.

Heartbeats on long-running activities

Embedding 50 chunks can take minutes. Regular activity.heartbeat() calls tell Temporal the worker is alive. Without heartbeats, Temporal would assume the worker crashed and retry the entire activity.

Structured error results instead of workflow failures

Both workflows catch exceptions and return a typed result with status="failed" and an error message. This gives callers a clean response instead of requiring them to handle Temporal workflow failure exceptions.

Section 20

Suggested Next Steps

If you're a new developer working on this codebase, here are concrete improvements to build, roughly ordered by difficulty:

Beginner: Move inline handlers to addEventListener()

Replace onclick="doSomething()" in the HTML with element.addEventListener('click', doSomething) in the JavaScript. This separates structure from behavior and makes the code easier to maintain and debug.

Beginner: Extract chat bubbles into template functions

The addUser(), addBot(), and addSys() functions all build HTML strings. Create a single createBubble(type, content) function that returns a consistent structure.

Intermediate: Add a real markdown parser

Replace the simple renderMd() regex with a library like marked.js combined with DOMPurify for sanitization. This enables headings, lists, code blocks, and links in answers.

Intermediate: Add cancel support

Allow users to cancel an in-progress upload or query. On the frontend, use AbortController for fetch requests. On the backend, use Temporal's cancellation API to terminate workflows.

Intermediate: Split JavaScript into modules

Break the monolithic <script> into ES modules: api.js (HTTP helpers), ws.js (WebSocket logic), ui.js (DOM manipulation), and state.js (data management). Use native import/export.

Advanced: Add authentication

Implement user sessions with JWT or session cookies. Scope documents per user. Authenticate WebSocket connections during the handshake. Add authorization checks on every API endpoint.

Advanced: Streaming LLM responses

Instead of waiting for the full answer, stream tokens from Ollama through the WebSocket to the browser. This gives users immediate feedback as the answer is generated, similar to ChatGPT's typing effect.

Section 21

Resources & Documentation

Links to official documentation for every technology used in this project: