Minimal RAG (Retrieval-Augmented Generation) Example

This repository contains a minimal RAG pipeline using:

sentence-transformers for embeddings (all-MiniLM-L6-v2)
faiss for vector search
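
To make the retrieval step concrete, here is a minimal pure-Python sketch of what an exact similarity search computes: normalize the vectors, score each stored vector against the query by inner product, and keep the best k. It is only an illustration. The repository uses sentence-transformers and faiss for this, and the names normalize and top_k below are hypothetical, not taken from ingest.py or query.py.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query, vectors, k=2):
    """Return [(index, cosine_score), ...] for the k most similar vectors.

    Hypothetical stand-in for a vector index: brute-force scoring in plain
    Python, suitable only for small toy inputs.
    """
    q = normalize(query)
    scored = []
    for i, vec in enumerate(vectors):
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), i))
    # Highest score first; ties keep the lower index first.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(i, score) for score, i in scored[:k]]


if __name__ == "__main__":
    toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d "embeddings"
    for index, score in top_k([1.0, 0.1], toy_vectors, k=2):
        print(index, round(score, 3))
```

In the real pipeline the vectors would come from the embedding model and the search would be delegated to the faiss index. The sketch only shows the ranking logic.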

Quick usage:

Create and activate a virtual environment, using a local .tmp directory for temporary files:

mkdir -p .tmp
python3 -m venv ./venv
source ./venv/bin/activate
export TMPDIR="$(pwd)/.tmp"

Install dependencies:

pip install -r requirements.txt

Ingest a folder of .txt/.md files:

git clone https://github.com/kelseyhightower/kubernetes-the-hard-way.git ./docs
python ingest.py --path ./docs --index-path index.faiss --meta-path docs.pkl
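
The internals of ingest.py are not shown here, so the following is only a sketch of one common way to prepare a folder for indexing: collect the .txt/.md files recursively, then split each file into fixed-size character windows with some overlap. The function names and the size and overlap defaults are assumptions chosen for illustration, not values used by the repository.

```python
from pathlib import Path


def find_documents(root, suffixes=(".txt", ".md")):
    """Recursively list files under root whose extension is in suffixes."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def chunk_text(text, size=500, overlap=50):
    """Split text into windows of `size` characters that overlap by `overlap`.

    Hypothetical helper: the window size and overlap are illustrative defaults.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():  # skip windows that are only whitespace
            chunks.append(chunk)
        if start + size >= len(text):  # the last window reached the end
            break
    return chunks


if __name__ == "__main__":
    for path in find_documents("./docs"):
        pieces = chunk_text(path.read_text(encoding="utf-8", errors="ignore"))
        print(path, len(pieces))
```

Overlapping windows reduce the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing some text twice.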

Query (Ollama HTTP API):

# Ensure Ollama server is reachable (example endpoint: http://localhost:11434)
python query.py --question "How to deploy kubernetes?" --index-path index.faiss --meta-path docs.pkl --ollama-model qwen3:4b --ollama-url http://localhost:11434

query.py requests streaming output from the Ollama HTTP API and prints generated text chunks as they arrive. If the Ollama server is unreachable or returns an error, the script prints diagnostics and falls back to printing the top retrieved snippets.
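
The streaming and fallback behavior described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in query.py: it assumes the Ollama /api/generate endpoint, which streams one JSON object per line with the generated text in a "response" field and a "done" flag on the last object. The helper names iter_text_chunks and answer are hypothetical.

```python
import json
import urllib.error
import urllib.request


def iter_text_chunks(lines):
    """Yield generated text from an iterable of newline-delimited JSON lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if "error" in event:  # the server reported a failure in the stream
            raise RuntimeError(event["error"])
        if event.get("response"):
            yield event["response"]
        if event.get("done"):
            break


def answer(prompt, snippets, model, base_url, timeout=60):
    """Stream a completion; on failure print diagnostics and the snippets.

    Returns True if streaming succeeded, False if the fallback was used.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": True})
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=payload.encode("utf-8"),  # a request body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            for chunk in iter_text_chunks(response):
                print(chunk, end="", flush=True)
        print()
        return True
    except (OSError, RuntimeError, ValueError) as exc:
        # OSError covers connection and HTTP errors; ValueError covers bad JSON.
        print(f"Ollama request failed: {exc}")
        print("Top retrieved snippets:")
        for rank, snippet in enumerate(snippets, start=1):
            print(f"[{rank}] {snippet}")
        return False
```

Keeping the line parser separate from the HTTP call makes the streaming logic easy to check without a running server, for example by passing it a list of sample lines.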

Notes:

This is a minimal example. For production use, consider better chunking, persistence, and safety checks.
Installing sentence-transformers and faiss-cpu may require additional system packages on slim container images.