Andranik Sarkisyan ef840c929b First commit
2025-11-19 00:07:02 +03:00

Minimal RAG (Retrieval-Augmented Generation) Example

This repository contains a minimal RAG pipeline using:

  • sentence-transformers for embeddings (all-MiniLM-L6-v2)
  • faiss for vector search

Quick usage:

  1. Create and activate a virtual environment:
       mkdir -p .tmp
       python3 -m venv ./venv
       source ./venv/bin/activate
       export TMPDIR="$(pwd)/.tmp"
  2. Install dependencies:
       pip install -r requirements.txt
  3. Ingest a folder of .txt/.md files:
       git clone https://github.com/kelseyhightower/kubernetes-the-hard-way.git ./docs
       python ingest.py --path ./docs --index-path index.faiss --meta-path docs.pkl
  4. Query (Ollama HTTP API):
       # Ensure the Ollama server is reachable (example endpoint: http://localhost:11434)
       python query.py --question "How to deploy kubernetes?" --index-path index.faiss --meta-path docs.pkl --ollama-model qwen3:4b --ollama-url http://localhost:11434

query.py requests streaming output from the Ollama HTTP API and prints generated text chunks as they arrive. If the Ollama server is unreachable or returns an error, the script prints diagnostics and falls back to printing the top retrieved snippets.
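
The streaming-with-fallback behaviour described above can be sketched roughly as follows. This is not the code in query.py; it is a minimal illustration that assumes the script talks to Ollama's /api/generate endpoint, which returns newline-delimited JSON objects carrying a "response" text fragment and a "done" flag. The function names iter_text_chunks and stream_answer are hypothetical.

```python
import json
import urllib.request


def iter_text_chunks(lines):
    """Yield text fragments from newline-delimited JSON lines (str or bytes)."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        event = json.loads(raw)
        if event.get("error"):
            # Ollama reports failures as an object with an "error" field.
            raise RuntimeError(event["error"])
        if event.get("response"):
            yield event["response"]
        if event.get("done"):
            break


def stream_answer(prompt, snippets, model="qwen3:4b",
                  url="http://localhost:11434", timeout=60):
    """Print the streamed answer; on failure print the retrieved snippets.

    Returns True if the model answered, False if the fallback was used.
    (Hypothetical helper; the endpoint choice is an assumption.)
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": True}
    ).encode("utf-8")
    request = urllib.request.Request(
        url.rstrip("/") + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # The HTTP response object iterates line by line (bytes).
            for chunk in iter_text_chunks(response):
                print(chunk, end="", flush=True)
        print()
        return True
    except (OSError, ValueError, RuntimeError) as exc:
        # URLError/HTTPError and timeouts are OSError; bad JSON is ValueError.
        print(f"Ollama request failed: {exc}")
        print("Top retrieved snippets:")
        for rank, snippet in enumerate(snippets, start=1):
            print(f"[{rank}] {snippet}")
        return False
```

Keeping the line parser separate from the HTTP call makes it possible to check the chunk handling against canned lines without a running Ollama server.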

Notes:

  • This is a minimal example. For production use, consider better chunking, persistence, and safety checks.
  • Installing sentence-transformers and faiss-cpu may require system packages on slim images.
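
As one illustration of the "better chunking" note above, a fixed-size character window with overlap is a common starting point. The sketch below is an assumption about what such a helper could look like, not the chunking that ingest.py performs; the name chunk_text and the default sizes are hypothetical.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into overlapping character windows.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # drop whitespace-only windows
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

Splitting on paragraph or heading boundaries usually gives cleaner chunks for Markdown files; the character window is only the simplest option to reason about.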