Files
Michele Dolfi ed32c5e993 feat: Introduce modular docling-slim package (#3285)
* plans folder structure

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* initial plan

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* updated plan

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* restructure repo for docling and docling-slim

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* transpose package structures

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add all-packages

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* updated  lock and deps

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* align deps

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more lock like main

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* more locked pinning

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* rename extras

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add simple README for docling-slim

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix scikit-image issue

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add readme placeholder

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* add all extras in package test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* cli in docling-slim

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* apply formatting

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix testing package

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* override grpcio in no-header test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update lock

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update package description

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* updated extras

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* fix publish scripts

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

* update package test

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>

---------

Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
2026-04-24 15:14:57 +02:00
..

Docling Slim

Lightweight SDK for parsing documents with minimal dependencies and opt-in extras

Docling Slim is a minimal-dependency version of Docling that allows you to install only the components you need. It provides the core document processing functionality with ~50MB of base dependencies, and you can add specific features through optional extras.

When to Use Docling Slim

  • Use docling (recommended): If you want the full-featured experience with all standard capabilities
  • Use docling-slim: If you need fine-grained control over dependencies or want to minimize installation size

For Most Users: Use the Main Docling Package

We recommend most users install the full-featured docling package instead:

pip install docling

The docling package includes all standard features, the CLI tools, and is the easiest way to get started. Visit the main Docling documentation for complete guides and examples.

Installation

With Specific Features

# PDF support with local models
pip install docling-slim[format-pdf,models-local]

# Office formats only
pip install docling-slim[format-office]

# PDF + CLI
pip install docling-slim[format-pdf,cli]

# Docling service client for using the Docling Serve API
pip install docling-slim[service-client]

Available Extras

Convenience Bundles

Extra Description Use Case
standard All standard features (same as docling package) Full-featured usage
all All available extras Complete installation

CLI

Extra Description Use Case
cli Command-line interface (typer, rich) CLI tools (docling, docling-tools)

Core Components

Extra Description Use Case
convert-core Core conversion components (numpy, pillow, scipy) Basic document conversion
extract-core Structured information extraction Data extraction from documents

Format Support

PDF Formats

Extra Description Use Case
format-pdf PDF parsing (pypdfium2 + docling-parse) PDF documents
format-pdf-pypdfium2 PDF rendering only Lightweight PDF support
format-pdf-docling Advanced PDF parsing Complex PDF layouts

Office Formats (office = docx + pptx + xlsx)

Extra Description Use Case
format-office All Office formats Microsoft Office documents
format-docx Microsoft Word documents .docx files
format-pptx Microsoft PowerPoint .pptx files
format-xlsx Microsoft Excel .xlsx files

Web Formats (web = html + markdown)

Extra Description Use Case
format-web HTML and Markdown Web content
format-html HTML parsing Web pages and HTML files
format-markdown Markdown parsing .md files

Other Formats

Extra Description Use Case
format-latex LaTeX documents .tex files
format-xml-xbrl XBRL financial reports Financial documents
format-html-render HTML rendering with Playwright Dynamic web content
format-audio Audio transcription (Whisper) .wav, .mp3 files

OCR Engines

Extra Description Use Case
feat-ocr-rapidocr RapidOCR (lightweight) Fast OCR
feat-ocr-rapidocr-onnx RapidOCR with ONNX runtime Optimized OCR
feat-ocr-easyocr EasyOCR Multi-language OCR
feat-ocr-tesserocr Tesseract OCR High-accuracy OCR
feat-ocr-mac macOS native OCR macOS only

Models

Extra Description Use Case
models-local Local PyTorch models GPU/CPU inference
models-remote Remote model serving (Triton) Production deployments
models-onnxruntime ONNX Runtime acceleration Optimized inference
models-vlm-inline Vision Language Models Image understanding, inline processing

Other features

Extra Description Use Case
feat-chunking Document chunking RAG applications
service-client Docling service client Remote processing

License

MIT License - See LICENSE