Minimal RAG (Retrieval-Augmented Generation) Example
This repository contains a minimal RAG pipeline using:
- sentence-transformers for embeddings (all-MiniLM-L6-v2)
- faiss for vector search
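To make the retrieval step concrete, here is a toy sketch of what an exact vector search does: normalize the vectors, score every document against the query, and keep the best k. It is a pure-Python stand-in rather than FAISS itself, and the three-dimensional vectors are made up for illustration (all-MiniLM-L6-v2 produces 384-dimensional embeddings). The names `normalize` and `top_k` are hypothetical and do not come from this repository.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k document vectors most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for i, doc in enumerate(doc_vecs):
        d = normalize(doc)
        scored.append((i, sum(a * b for a, b in zip(q, d))))
    # Highest cosine similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up "embeddings" standing in for encoded text chunks
doc_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
hits = top_k([1.0, 0.05, 0.0], doc_vecs, k=2)
print(hits)
```

In the real pipeline the indices returned by the search would be used to look up the matching text chunks in the metadata file.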
Quick usage:
- Create venv:
mkdir -p .tmp
python3 -m venv ./venv
source ./venv/bin/activate
export TMPDIR="$(pwd)/.tmp"
- Install dependencies:
pip install -r requirements.txt
- Ingest a folder of .txt/.md files:
git clone https://github.com/kelseyhightower/kubernetes-the-hard-way.git ./docs
python ingest.py --path ./docs --index-path index.faiss --meta-path docs.pkl
- Query (Ollama HTTP API):
# Ensure Ollama server is reachable (example endpoint: http://localhost:11434)
python query.py --question "How to deploy kubernetes?" --index-path index.faiss --meta-path docs.pkl --ollama-model qwen3:4b --ollama-url http://localhost:11434
The query script requests streaming output from the Ollama HTTP API and prints generated text chunks as they arrive. If the Ollama server is unreachable or returns an error, the script prints diagnostics and falls back to printing the top retrieved snippets.
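The streaming and fallback behavior described above might look roughly like the sketch below. It assumes the `/api/generate` endpoint, which streams newline-delimited JSON objects carrying a `response` text field and a final `done` flag; whether query.py uses this endpoint or the chat endpoint is not stated here. The function names `iter_text_chunks` and `answer` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def iter_text_chunks(lines):
    """Yield text pieces from newline-delimited JSON lines of a streamed reply."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        # Assumption: a failure during streaming is reported as {"error": "..."}
        if "error" in event:
            raise RuntimeError(event["error"])
        if event.get("response"):
            yield event["response"]
        if event.get("done"):
            break


def answer(prompt, snippets, model="qwen3:4b",
           base_url="http://localhost:11434", timeout=60):
    """Stream an answer to stdout; on failure print diagnostics and the snippets."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": True})
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Iterating an HTTP response yields one line of bytes at a time
            for chunk in iter_text_chunks(response):
                print(chunk, end="", flush=True)
        print()
        return True
    except (urllib.error.URLError, OSError, RuntimeError, ValueError) as exc:
        print(f"Ollama request failed: {exc!r}")
        print("Top retrieved snippets:")
        for i, snippet in enumerate(snippets, start=1):
            print(f"[{i}] {snippet}")
        return False
```

Calling `answer(prompt, snippets)` with the retrieved snippets returns `True` when the model replied and `False` when the fallback was used, so a caller can tell the two cases apart.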
Notes:
- This is a minimal example. For production use, consider better chunking, persistence, and safety checks.
- Installing sentence-transformers and faiss-cpu may require system packages on slim images.
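On the chunking note above: one common improvement is to split each file into overlapping windows so that a sentence cut at a boundary still appears whole in a neighboring chunk. The sketch below is one possible approach using only the standard library. How ingest.py actually splits files is not described here, and the names `chunk_text` and `load_chunks` as well as the default sizes are assumptions for illustration.

```python
from pathlib import Path


def chunk_text(text, size=500, overlap=100):
    """Split text into windows of at most `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(piece)
        # Stop once a window has reached the end of the text
        if start + size >= len(text):
            break
    return chunks


def load_chunks(root, suffixes=(".txt", ".md")):
    """Yield (path, chunk) pairs for every matching file below root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for chunk in chunk_text(text):
                yield str(path), chunk
```

Character windows are the simplest choice. Splitting on paragraph or heading boundaries usually gives cleaner chunks for Markdown sources, at the cost of more code.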