docs: add agent skill bundle for coding assistants (SKILL.md, pipelines, convert/evaluate) (#3174)

* docs: add agent skill bundle with convert/evaluate helpers

  - Add docs/examples/agent_skill/docling-document-intelligence/ with SKILL.md, pipelines.md, EXAMPLE.md, improvement-log template, and scripts/docling-convert.py + docling-evaluate.py (standard/vlm-local/vlm-api).
  - Document InputFormat.PDF + PdfFormatOption for explicit PdfPipelineOptions.
  - Link from examples index and mkdocs nav.

  Made-with: Cursor

* docs: align agent skill README and EXAMPLE with Cursor bundle

  - Document both ~/.cursor/skills and docs/examples paths.
  - README notes repo parity for PRs and local installs.

  Made-with: Cursor

* DCO Remediation Commit for jehlum11 <jehlum11@gmail.com>

  I, jehlum11 <jehlum11@gmail.com>, hereby add my Signed-off-by to this commit: 2d268ffb6f
  I, jehlum11 <jehlum11@gmail.com>, hereby add my Signed-off-by to this commit: 041e709c66

  Signed-off-by: jehlum11 <jehlum11@gmail.com>
  Made-with: Cursor

* docs: refactor agent skill to use docling CLI for conversion

  Address maintainer feedback: the custom docling-convert.py script was largely redundant with the existing docling CLI. This commit:

  - Removes scripts/docling-convert.py (redundant with `docling` CLI)
  - Refactors SKILL.md (v1.4 → v2.0) to use `docling` CLI for all conversion tasks, reserving the Python API only for features the CLI does not expose (chunking, VLM API endpoint config, force_backend_text hybrid mode)
  - Updates docling-evaluate.py recommended_actions to reference `docling` CLI flags instead of the removed script
  - Updates README.md, EXAMPLE.md, pipelines.md to use `docling` CLI examples throughout
  - Simplifies requirements.txt (removes packaging dependency)

  The only custom script retained is docling-evaluate.py, which provides heuristic quality evaluation — functionality the CLI does not cover.

  Signed-off-by: jehlum11 <jehlum11@gmail.com>
  Made-with: Cursor

* docs: fix ruff format on docling-evaluate.py

  Signed-off-by: jehlum11 <jehlum11@gmail.com>
  Made-with: Cursor

---------

Signed-off-by: jehlum11 <jehlum11@gmail.com>
2026-05-17 13:10:38 +00:00 · 2026-04-13 09:02:51 -04:00
parent 42157a3e10
commit c23622f6f5
9 changed files with 1109 additions and 0 deletions
@@ -0,0 +1,99 @@
+# Using the Docling agent skill
+
+[Agent Skills](https://agentskills.io/specification) are folders of instructions that AI coding agents (Cursor, Claude Code, GitHub Copilot, etc.) can load when relevant.
+
+## Where this bundle lives
+
+- **Cursor (local):** `~/.cursor/skills/docling-document-intelligence/` (or copy this folder there).
+- **Docling repository (docs + PRs):** `docs/examples/agent_skill/docling-document-intelligence/` in [github.com/docling-project/docling](https://github.com/docling-project/docling).
+
+The two trees are kept in sync; use either source.
+
+## Install (copy into your agent's skills directory)
+
+```bash
+# From a checkout of the Docling repo
+cp -r docs/examples/agent_skill/docling-document-intelligence ~/.cursor/skills/
+
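+# Or copy the folder (from a checkout or an archive) into another agent's
+# skills directory, e.g. for Claude Code:
+cp -r docs/examples/agent_skill/docling-document-intelligence ~/.claude/skills/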
+```
+
+No extra config is required beyond installing Python dependencies (below).
+
+## Usage
+
+Open your agent-enabled IDE and ask, for example:
+
+```
+Parse report.pdf and give me a structural outline
+```
+
+```
+Convert https://arxiv.org/pdf/2408.09869 to markdown
+```
+
+```
+Chunk invoice.pdf for RAG ingestion with 512 token chunks
+```
+
+```
+Process scanned.pdf using the VLM pipeline
+```
+
+The agent should read `SKILL.md`, match the task, and run the appropriate
+`docling` CLI command or Python API call.
+
+## Running the docling CLI directly
+
+```bash
+pip install docling docling-core
+
+# Basic conversion to Markdown
+docling report.pdf --output /tmp/
+
+# JSON output
+docling report.pdf --to json --output /tmp/
+
+# Custom OCR engine
+docling report.pdf --ocr-engine rapidocr --output /tmp/
+
+# VLM pipeline
+docling scanned.pdf --pipeline vlm --output /tmp/
+
+# VLM with specific model
+docling scanned.pdf --pipeline vlm --vlm-model granite_docling --output /tmp/
+
+# Allow remote VLM services (endpoint, model, and API key settings need the Python API)
+docling doc.pdf --pipeline vlm --enable-remote-services --output /tmp/
+```
+
+## Evaluate and refine
+
+```bash
+docling report.pdf --to json --output /tmp/
+docling report.pdf --to md --output /tmp/
+python3 scripts/docling-evaluate.py /tmp/report.json --markdown /tmp/report.md
+```
+
+If the report shows `warn` or `fail`, follow `recommended_actions`, re-convert
+with `docling` using the suggested flags, and optionally append a note to
+`improvement-log.md` (see `SKILL.md` section 7).
+
+## What the skill covers
+
+| Task | How to ask |
+|---|---|
+| Parse PDF / DOCX / PPTX / HTML / image | "parse this file" |
+| Convert to Markdown | "convert to markdown" |
+| Export as structured JSON | "export as JSON" |
+| Chunk for RAG | "chunk for RAG", "prepare for ingestion" |
+| Analyze structure | "show me the headings and tables" |
+| Use VLM pipeline | "use the VLM pipeline", "process scanned PDF" |
+| Use remote inference | "use vLLM", "call the API pipeline" |
+
+## Further reading
+
+- [Agent Skills specification](https://agentskills.io/specification)
+- [Docling documentation](https://docling-project.github.io/docling/)
+- [Docling CLI reference](https://docling-project.github.io/docling/reference/cli/)
+- [Docling GitHub](https://github.com/docling-project/docling)
@@ -0,0 +1,43 @@
+# Docling agent skill (Cursor & compatible assistants)
+
+This folder is an **[Agent Skill](https://agentskills.io/specification)**-style bundle for AI coding assistants: structured instructions (`SKILL.md`), a pipeline reference (`pipelines.md`), and a quality evaluator (`scripts/docling-evaluate.py`).
+
+Conversion is done via the **`docling` CLI** (included with `pip install docling`).
+The evaluator provides a **convert → evaluate → refine** feedback loop that the
+existing CLI does not cover.
+
+It complements the official [Docling documentation](https://docling-project.github.io/docling/) and the [`docling` CLI reference](https://docling-project.github.io/docling/reference/cli/).
+
+The same layout is published in the Docling repo at `docs/examples/agent_skill/docling-document-intelligence/` (for docs and PRs).
+
+## Contents
+
+| Path | Purpose |
+|------|---------|
+| [`SKILL.md`](SKILL.md) | Full skill instructions (pipelines, chunking, evaluation loop) |
+| [`pipelines.md`](pipelines.md) | Standard vs VLM pipelines, OCR engines, API notes |
+| [`EXAMPLE.md`](EXAMPLE.md) | Installing into `~/.cursor/skills/`; running the CLI and evaluator |
+| [`improvement-log.md`](improvement-log.md) | Optional template for local "what worked" notes |
+| [`scripts/docling-evaluate.py`](scripts/docling-evaluate.py) | Heuristic quality report on JSON (+ optional Markdown) |
+| [`scripts/requirements.txt`](scripts/requirements.txt) | Minimal pip deps for the evaluator |
+
+## Quick start
+
+```bash
+pip install docling docling-core
+
+# Convert to Markdown
+docling https://arxiv.org/pdf/2408.09869 --output /tmp/
+
+# Convert to JSON
+docling https://arxiv.org/pdf/2408.09869 --to json --output /tmp/
+
+# Evaluate quality
+python3 scripts/docling-evaluate.py /tmp/2408.09869.json --markdown /tmp/2408.09869.md
+```
+
+Use `--pipeline vlm` for vision-model pipelines; see `SKILL.md` and `pipelines.md`.
+
+## License
+
+MIT (aligned with [Docling](https://github.com/docling-project/docling)).
@@ -0,0 +1,393 @@
+---
+name: docling-document-intelligence
+description: >
+  Parse, convert, chunk, and analyze documents using Docling. Use this skill
+  when the user provides a document (PDF, DOCX, PPTX, HTML, image) as a file
+  path or URL and wants to: extract text or structured content, convert to
+  Markdown or JSON, chunk the document for RAG ingestion, analyze document
+  structure (headings, tables, figures, reading order), or run quality
+  evaluation with iterative pipeline tuning. Triggers: "parse this PDF",
+  "convert to markdown", "chunk for RAG", "extract tables", "analyze document
+  structure", "prepare for ingestion", "process document", "evaluate docling
+  output", "improve conversion quality".
+license: MIT
+compatibility: Requires Python 3.10+, docling>=2.81.0, docling-core>=2.67.1
+metadata:
+  author: docling-project
+  version: "2.0"
+  upstream: https://github.com/docling-project/docling
+allowed-tools: Bash(docling:*) Bash(python3:*) Bash(pip:*)
+---
+
+# Docling Document Intelligence Skill
+
+Use this skill to parse, convert, chunk, and analyze documents with Docling.
+It handles both local file paths and URLs, and outputs either Markdown or
+structured JSON (`DoclingDocument`).
+
+Conversion uses the **`docling` CLI** (installed with `pip install docling`).
+The Python API is used only for features the CLI does not expose (chunking,
+VLM remote-API endpoint configuration, hybrid `force_backend_text` mode).
+
+## Scope
+
+| Task | Covered |
+|---|---|
+| Parse PDF / DOCX / PPTX / HTML / image | ✅ |
+| Convert to Markdown | ✅ |
+| Export as DoclingDocument JSON | ✅ |
+| Chunk for RAG (hybrid: heading + token) | ✅ (Python API) |
+| Analyze structure (headings, tables, figures) | ✅ (Python API) |
+| OCR for scanned PDFs | ✅ (auto-enabled) |
+| Multi-source batch conversion | ✅ |
+
+## Step-by-Step Instructions
+
+### 1. Resolve the input
+
+Determine whether the user supplied a **local path** or a **URL**.
+The `docling` CLI accepts both directly.
+
+```bash
+docling path/to/file.pdf
+docling https://example.com/a.pdf
+```
+
+### 2. Choose a pipeline
+
+Docling has two pipeline families. Pick based on document type and hardware.
+
+| Pipeline | CLI flag | Best for | Key tradeoff |
+|---|---|---|---|
+| **Standard** (default) | `--pipeline standard` | Born-digital PDFs, speed | No GPU needed; OCR for scanned pages |
+| **VLM** | `--pipeline vlm` | Complex layouts, handwriting, formulas | GPU recommended (CPU works but is slow) |
+
+See [pipelines.md](pipelines.md) for the full decision matrix, OCR engine table
+(EasyOCR, RapidOCR, Tesseract, macOS), and VLM model presets.
+
+### 3. Convert the document
+
+#### CLI (preferred for straightforward conversions)
+
+```bash
+# Markdown (default output)
+docling report.pdf --output /tmp/
+
+# JSON (structured, lossless)
+docling report.pdf --to json --output /tmp/
+
+# VLM pipeline
+docling report.pdf --pipeline vlm --output /tmp/
+
+# VLM with specific model
+docling report.pdf --pipeline vlm --vlm-model granite_docling --output /tmp/
+
+# Custom OCR engine
+docling report.pdf --ocr-engine tesserocr --output /tmp/
+
+# Disable OCR or tables for speed
+docling report.pdf --no-ocr --output /tmp/
+docling report.pdf --no-tables --output /tmp/
+
+# Allow remote VLM services (endpoint, model, and API key settings need the Python API)
+docling report.pdf --pipeline vlm --enable-remote-services --output /tmp/
+```
+
+The CLI writes output files to the `--output` directory, named after the
+input file (e.g. `report.pdf` → `report.md` or `report.json`).
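A minimal sketch for locating those files from a script, assuming the naming rule above holds for local paths. For URL inputs the stem comes from the downloaded file name, which this sketch does not try to predict; it falls back to the newest file with the requested suffix. `expected_output_path` and `newest_output` are hypothetical helpers, not Docling APIs.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical helpers, not part of Docling. They mirror the naming rule
# described above: report.pdf -> report.md or report.json.
SUFFIX_BY_FORMAT = {"md": ".md", "json": ".json"}


def expected_output_path(source: str, output_dir: str, to: str = "md") -> Path:
    """Predict where `docling <source> --to <to> --output <output_dir>` writes."""
    return Path(output_dir) / (Path(source).stem + SUFFIX_BY_FORMAT[to])


def newest_output(output_dir: str, to: str = "md") -> Path | None:
    """Fallback for URL inputs, whose name depends on the downloaded file:
    return the most recently modified output with the requested suffix."""
    candidates = sorted(
        Path(output_dir).glob("*" + SUFFIX_BY_FORMAT[to]),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None
```

For a URL such as the arXiv example, calling `newest_output("/tmp/", "json")` right after the conversion avoids guessing the stem.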
+
+**CLI reference:** <https://docling-project.github.io/docling/reference/cli/>
+
+#### Python API (for advanced features)
+
+Use the Python API when you need features the CLI does not expose:
+chunking, VLM remote-API endpoint configuration, or hybrid
+`force_backend_text` mode.
+
+**Docling 2.81+ API note:** `DocumentConverter(format_options=...)` expects
+`dict[InputFormat, FormatOption]` (e.g. `InputFormat.PDF` → `PdfFormatOption`).
+Using string keys like `{"pdf": PdfPipelineOptions(...)}` fails at runtime with
+`AttributeError: 'PdfPipelineOptions' object has no attribute 'backend'`.
+
+**Standard pipeline (default):**
+```python
+from docling.document_converter import DocumentConverter, PdfFormatOption
+from docling.datamodel.base_models import InputFormat
+from docling.datamodel.pipeline_options import PdfPipelineOptions
+
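+# Option 1: library defaults (standard PDF pipeline)
+converter = DocumentConverter()
+result = converter.convert("report.pdf")
+
+# Option 2: explicit pipeline options, keyed by InputFormat.PDF
+converter = DocumentConverter(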
+    format_options={
+        InputFormat.PDF: PdfFormatOption(
+            pipeline_options=PdfPipelineOptions(do_ocr=True, do_table_structure=True),
+        ),
+    }
+)
+result = converter.convert("report.pdf")
+```
+
+**VLM pipeline — local (GraniteDocling via HF Transformers):**
+```python
+from docling.document_converter import DocumentConverter, PdfFormatOption
+from docling.datamodel.base_models import InputFormat
+from docling.datamodel.pipeline_options import VlmPipelineOptions
+from docling.datamodel import vlm_model_specs
+from docling.pipeline.vlm_pipeline import VlmPipeline
+
+pipeline_options = VlmPipelineOptions(
+    vlm_options=vlm_model_specs.GRANITEDOCLING_TRANSFORMERS,
+    generate_page_images=True,
+)
+converter = DocumentConverter(
+    format_options={
+        InputFormat.PDF: PdfFormatOption(
+            pipeline_cls=VlmPipeline,
+            pipeline_options=pipeline_options,
+        )
+    }
+)
+result = converter.convert("report.pdf")
+```
+
+**VLM pipeline — remote API (vLLM / LM Studio / Ollama):**
+
+This is only available via the Python API; the CLI does not expose endpoint
+URL, model name, or API key configuration.
+
+```python
+from docling.document_converter import DocumentConverter, PdfFormatOption
+from docling.datamodel.base_models import InputFormat
+from docling.datamodel.pipeline_options import VlmPipelineOptions
+from docling.datamodel.pipeline_options_vlm_model import ApiVlmOptions, ResponseFormat
+from docling.pipeline.vlm_pipeline import VlmPipeline
+
+vlm_opts = ApiVlmOptions(
+    url="http://localhost:8000/v1/chat/completions",
+    params=dict(model="ibm-granite/granite-docling-258M", max_tokens=4096),
+    prompt="Convert this page to docling.",
+    response_format=ResponseFormat.DOCTAGS,
+    timeout=120,
+)
+pipeline_options = VlmPipelineOptions(
+    vlm_options=vlm_opts,
+    generate_page_images=True,
+    enable_remote_services=True,  # required — gates all outbound HTTP
+)
+converter = DocumentConverter(
+    format_options={
+        InputFormat.PDF: PdfFormatOption(
+            pipeline_cls=VlmPipeline,
+            pipeline_options=pipeline_options,
+        )
+    }
+)
+result = converter.convert("report.pdf")
+```
+
+**Hybrid mode (force_backend_text) — Python API only:**
+
+Uses deterministic PDF text extraction for text regions while routing
+images and tables through the VLM. Reduces hallucination on text-heavy pages.
+
+```python
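+from docling.datamodel import vlm_model_specs
+from docling.datamodel.pipeline_options import VlmPipelineOptions
+
+# Use with PdfFormatOption(pipeline_cls=VlmPipeline, ...) as in the local example.
+pipeline_options = VlmPipelineOptions(
+    vlm_options=vlm_model_specs.GRANITEDOCLING_TRANSFORMERS,
+    force_backend_text=True,  # PDF text for text regions; VLM for the rest
+    generate_page_images=True,
+)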
+```
+
+`result.document` is a `DoclingDocument` object in all cases.
+
+### 4. Choose output format
+
+**Markdown** (default, human-readable):
+```bash
+docling report.pdf --to md --output /tmp/
+```
+Or via Python: `result.document.export_to_markdown()`
+
+**JSON / DoclingDocument** (structured, lossless):
+```bash
+docling report.pdf --to json --output /tmp/
+```
+Or via Python: `result.document.export_to_dict()`
+
+> If the user does not specify a format, ask: "Should I output Markdown or
+> structured JSON (DoclingDocument)?"
+
+### 5. Chunk for RAG (hybrid strategy)
+
+Chunking is only available via the Python API.
+
+Default: **hybrid chunker** — splits first by heading hierarchy, then
+subdivides oversized sections by token count. This preserves semantic
+boundaries while respecting model context limits.
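To make the two-stage idea concrete, here is a toy, standard-library-only sketch over Markdown text. It is **not** `HybridChunker`: it splits on `#` headings while keeping a heading breadcrumb, then cuts oversized sections into fixed windows, using whitespace-separated words as a stand-in for model tokens (an assumption; real chunking should use a tokenizer as shown below). `toy_hybrid_chunks` and `ToyChunk` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Toy illustration of the two-stage idea; NOT docling's HybridChunker.
# Assumption: whitespace-separated words stand in for model tokens.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class ToyChunk:
    headings: list[str]  # breadcrumb, outermost heading first
    text: str


def toy_hybrid_chunks(markdown: str, max_words: int = 512) -> list[ToyChunk]:
    # Stage 1: split at Markdown headings, tracking the heading breadcrumb.
    breadcrumb: list[str] = []
    sections: list[tuple[list[str], list[str]]] = [([], [])]  # (breadcrumb, lines)
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            breadcrumb = breadcrumb[: level - 1] + [match.group(2).strip()]
            sections.append((list(breadcrumb), []))
        else:
            sections[-1][1].append(line)

    # Stage 2: subdivide oversized sections into windows of max_words words.
    chunks: list[ToyChunk] = []
    for crumbs, lines in sections:
        words = " ".join(lines).split()
        for start in range(0, len(words), max_words):
            chunks.append(ToyChunk(crumbs, " ".join(words[start : start + max_words])))
    return chunks
```

The real `HybridChunker` walks the `DoclingDocument` tree rather than Markdown text, counts tokens with the configured tokenizer, and can merge small adjacent chunks (`merge_peers=True`); the toy version only shows the split order.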
+
+The tokenizer API changed in docling-core 2.8.0. Pass a `BaseTokenizer`
+object, not a raw string:
+
+**HuggingFace tokenizer (default):**
+```python
+from docling.chunking import HybridChunker
+from docling_core.transforms.chunker.tokenizer.huggingface import HuggingFaceTokenizer
+
+tokenizer = HuggingFaceTokenizer.from_pretrained(
+    model_name="sentence-transformers/all-MiniLM-L6-v2",
+    max_tokens=512,
+)
+chunker = HybridChunker(tokenizer=tokenizer, merge_peers=True)
+chunks = list(chunker.chunk(result.document))
+
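+for chunk in chunks:
+    embed_text = chunker.contextualize(chunk)  # headings + text, for embedding
+    print(chunk.meta.headings)  # heading breadcrumb list
+    # Page numbers come from the provenance of the chunk's source items.
+    pages = sorted({p.page_no for it in chunk.meta.doc_items for p in it.prov})
+    print(pages)  # source page numbers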
+
+**OpenAI tokenizer (for OpenAI embedding models):**
+```python
+# Requires: pip install 'docling-core[chunking-openai]'
+import tiktoken
+from docling.chunking import HybridChunker
+from docling_core.transforms.chunker.tokenizer.openai import OpenAITokenizer
+
+tokenizer = OpenAITokenizer(
+    tokenizer=tiktoken.encoding_for_model("text-embedding-3-small"),
+    max_tokens=8192,
+)
+chunker = HybridChunker(tokenizer=tokenizer)
+```
+
+For chunking strategies and tokenizer details, see the Docling documentation
+on chunking and `HybridChunker`.
+
+### 6. Analyze document structure
+
+Use the `DoclingDocument` object directly to inspect structure:
+
+```python
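+from docling_core.types.doc import DocItemLabel
+
+doc = result.document
+
+# iterate_items() yields (item, depth); depth is tree nesting, not heading
+# level, so use each section header's own `level` for the outline.
+for item, _depth in doc.iterate_items():
+    if getattr(item, "label", None) == DocItemLabel.SECTION_HEADER:
+        print(f"{'#' * item.level} {item.text}")
+
+for table in doc.tables:
+    print(table.export_to_dataframe(doc=doc))  # pandas DataFrame
+    print(table.export_to_markdown(doc=doc))
+
+for picture in doc.pictures:
+    print(picture.caption_text(doc))  # caption if present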
+```
+
+For the full API surface, see Docling's structure and table export docs.
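When only the JSON export is available (for example after `docling report.pdf --to json`), a rough heading outline plus table and figure counts can be built with the standard library alone. This sketch assumes the exported `DoclingDocument` JSON has top-level `texts`, `tables`, and `pictures` arrays and that section headers carry `"label": "section_header"` and an integer `level`; check those keys against your docling-core version. `summarize_structure` and `summarize_file` are hypothetical helpers.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def summarize_structure(data: dict[str, Any]) -> dict[str, Any]:
    """Indented heading outline plus table and figure counts (hypothetical helper)."""
    outline = [
        "  " * (int(item.get("level", 1)) - 1) + item.get("text", "")
        for item in data.get("texts", [])
        if item.get("label") == "section_header"  # assumed label value
    ]
    return {
        "headings": outline,
        "tables": len(data.get("tables", [])),
        "figures": len(data.get("pictures", [])),
    }


def summarize_file(path: str) -> dict[str, Any]:
    return summarize_structure(json.loads(Path(path).read_text(encoding="utf-8")))
```

`texts` is stored in insertion order, which usually but not always matches reading order; the `iterate_items()` walk above follows the document tree and is the safer source for the outline.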
+
+### 7. Evaluate output and iterate (required for "best effort" conversions)
+
+After **every** conversion where the user cares about fidelity (not quick
+previews), run the bundled evaluator on the JSON export, then refine the
+pipeline if needed. This is how the agent **checks its work** and **improves
+the run** without guessing.
+
+**Step A — Produce JSON and optional Markdown**
+
+```bash
+docling "<source>" --to json --output /tmp/
+docling "<source>" --to md --output /tmp/
+```
+
+**Step B — Evaluate**
+
+```bash
+python3 scripts/docling-evaluate.py /tmp/<filename>.json --markdown /tmp/<filename>.md
+```
+
+If the user expects tables (invoices, spreadsheets in PDF), add
+`--expect-tables`. Tighten gates with `--fail-on-warn` in CI-style checks.
+
+The script prints a JSON report to stdout: `status` (`pass` | `warn` | `fail`),
+`metrics`, `issues`, and `recommended_actions` (concrete `docling` CLI
+flags to try next).
+
+**Step C — Refinement loop (max 3 attempts unless the user says otherwise)**
+
+1. If `status` is `warn` or `fail`, apply **one** primary change from
+   `recommended_actions` (e.g. switch `--pipeline vlm`, change
+   `--ocr-engine`, ensure tables are enabled).
+2. Re-convert with `docling`, re-run `scripts/docling-evaluate.py`.
+3. Stop when `status` is `pass`, or after 3 iterations — then summarize what
+   worked and any remaining issues for the user.
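The loop above can be scripted. This is a sketch under stated assumptions: it shells out to the `docling` CLI and to `scripts/docling-evaluate.py` as in Steps A and B, parses the evaluator's JSON report from stdout, and assumes each `recommended_actions` entry quotes its CLI flags in backticks (for example ``"try `--pipeline vlm`"``); adjust `flags_from_report` to the evaluator's real wording. `refine` takes `convert` and `evaluate` as callables so the control flow can be exercised without Docling installed. All function names here are hypothetical.

```python
from __future__ import annotations

import json
import re
import shlex
import subprocess
from typing import Any, Callable, Dict

Report = Dict[str, Any]  # evaluator stdout: status, metrics, issues, recommended_actions
FLAG_IN_BACKTICKS = re.compile(r"`(--[^`]+)`")


def cli_convert(source: str, flags: list[str], out: str = "/tmp/") -> None:
    """Step A: JSON + Markdown export with the current extra flags."""
    for fmt in ("json", "md"):
        cmd = ["docling", source, "--to", fmt, "--output", out, *flags]
        subprocess.run(cmd, check=True)


def cli_evaluate(json_path: str, md_path: str) -> Report:
    """Step B: the evaluator exits non-zero on fail, so no check=True here."""
    cmd = ["python3", "scripts/docling-evaluate.py", json_path, "--markdown", md_path]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return json.loads(proc.stdout)


def flags_from_report(report: Report) -> list[str]:
    """Assumption: actions quote CLI flags in backticks. Only the first action
    is used, so each attempt applies one primary change."""
    actions = report.get("recommended_actions") or []
    if not actions:
        return []
    match = FLAG_IN_BACKTICKS.search(str(actions[0]))
    return shlex.split(match.group(1)) if match else []


def refine(
    convert: Callable[[list[str]], None],
    evaluate: Callable[[], Report],
    max_attempts: int = 3,
) -> list[tuple[int, list[str], str]]:
    """Step C: convert, evaluate, apply one change; stop on pass or max_attempts."""
    flags: list[str] = []
    history: list[tuple[int, list[str], str]] = []
    for attempt in range(1, max_attempts + 1):
        convert(flags)
        report = evaluate()
        history.append((attempt, list(flags), report["status"]))
        if report["status"] == "pass":
            break
        change = flags_from_report(report)
        if not change:
            break  # nothing actionable; summarize remaining issues for the user
        flags = flags + change
    return history
```

Run it from the skill directory so the relative `scripts/` path resolves, e.g. `refine(lambda f: cli_convert("report.pdf", f), lambda: cli_evaluate("/tmp/report.json", "/tmp/report.md"))`, then report the returned history (attempt, flags, status) to the user.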
+
+**Step D — Self-improvement log (skill memory)**
+
+After a successful pass **or** after the final iteration, append one entry to
+[improvement-log.md](improvement-log.md) in this skill directory:
+
+- Source type (e.g. scanned PDF, digital PDF, DOCX)
+- First-run problems (from `issues`)
+- Pipeline + flags that fixed or best mitigated them
+- Final `status` and one line of subjective quality notes
+
+Users may git-ignore this log; it is meant for **local** learning, so future
+runs on similar documents start closer to the right pipeline.
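Appending an entry can be scripted. The sketch below renders the fields listed above in the template format from `improvement-log.md` and appends the result; `format_log_entry` and `append_log_entry` are hypothetical helpers, and the log path is whatever your skill directory uses.

```python
from __future__ import annotations

import datetime as dt
from pathlib import Path

STATUSES = {"pass", "warn", "fail"}


def format_log_entry(
    label: str,
    source_type: str,
    issues: list[str],
    fix: str,
    status: str,
    notes: str,
    day: dt.date | None = None,
) -> str:
    """Render one entry in the improvement-log.md template format."""
    if status not in STATUSES:
        raise ValueError(f"unexpected evaluator status: {status!r}")
    day = day or dt.date.today()
    lines = [
        f"### {day.isoformat()} \u2014 {label}",  # em dash, as in the template heading
        f"- **Source type:** {source_type}",
        f"- **Issues (first run):** {'; '.join(issues) or 'none'}",
        f"- **Pipeline / flags that helped:** {fix}",
        f"- **Final evaluator status:** {status}",
        f"- **Notes:** {notes}",
    ]
    return "\n".join(lines) + "\n"


def append_log_entry(log_path: Path, entry: str) -> None:
    """Append after a blank line so entries stay separate Markdown blocks."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write("\n" + entry)
```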
+
+### 8. Agent quality checklist (manual, if script unavailable)
+
+If `scripts/docling-evaluate.py` cannot run, still verify:
+
+| Check | Action if bad |
+|---|---|
+| Page count matches source (roughly) | Re-run; try `--pipeline vlm` if layout is complex |
+| Markdown is not near-empty | Enable OCR / VLM |
+| Tables missing when visually obvious | Remove `--no-tables`; try `--pipeline vlm` |
+| `\ufffd` replacement characters | Different `--ocr-engine` or `--pipeline vlm` |
+| Same line repeated many times | `--pipeline vlm` or hybrid `force_backend_text` (Python API) |
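Even without the evaluator, the Markdown-only rows of this table can be approximated in a few lines of Python. The thresholds (`min_chars_per_page`, `max_repeats`) are illustrative assumptions, not values taken from `docling-evaluate.py`, and `quick_markdown_checks` is a hypothetical helper; the page count still has to come from the source document.

```python
from __future__ import annotations

from collections import Counter

TABLE_RULE_CHARS = set("|-: ")


def quick_markdown_checks(
    md: str, page_count: int, min_chars_per_page: int = 200, max_repeats: int = 5
) -> list[str]:
    """Rough, Markdown-only versions of the checklist rows above.
    Thresholds are illustrative assumptions, not the evaluator's values."""
    problems: list[str] = []
    if len(md.strip()) < min_chars_per_page * max(page_count, 1):
        problems.append("near-empty markdown: enable OCR or try --pipeline vlm")
    bad_chars = md.count("\ufffd")
    if bad_chars:
        problems.append(
            f"{bad_chars} replacement chars: try another --ocr-engine or --pipeline vlm"
        )
    # Ignore table separator rows such as |---|:--:|, which repeat legitimately.
    lines = [
        ln.strip()
        for ln in md.splitlines()
        if ln.strip() and not set(ln.strip()) <= TABLE_RULE_CHARS
    ]
    if lines:
        line, count = Counter(lines).most_common(1)[0]
        if count > max_repeats:
            problems.append(
                f"line repeated {count}x ({line[:40]!r}): try --pipeline vlm "
                "or force_backend_text"
            )
    return problems
```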
+
+## Common Edge Cases
+
+| Situation | Handling |
+|---|---|
+| Scanned / image-only PDF | Standard pipeline with OCR, or `--pipeline vlm` for best quality |
+| Password-protected PDF | `--pdf-password PASSWORD`; will raise `ConversionError` if wrong |
+| Very large document (500+ pages) | Standard pipeline with `--no-tables` for speed |
+| Complex layout / multi-column | `--pipeline vlm`; standard may misorder reading flow |
+| Handwriting or formulas | `--pipeline vlm` only — standard OCR will not handle these |
+| URL behind auth | Pre-download to temp file; pass local path |
+| Tables with merged cells | `table.export_to_markdown()` handles spans; VLM often more accurate |
+| Non-UTF-8 encoding | Docling normalises internally; no special handling needed |
+| VLM hallucinating text | `force_backend_text=True` via Python API for hybrid mode |
+| VLM API call blocked | `--enable-remote-services` (CLI) or `enable_remote_services=True` (Python) |
+| Apple Silicon | `--vlm-model granite_docling` with MLX backend, or `GRANITEDOCLING_MLX` preset (Python API) |
+
+## Pipeline reference
+
+Full decision matrix, all OCR engine options, VLM model presets, and API
+server configuration: [pipelines.md](pipelines.md)
+
+## Output conventions
+
+- Always report the number of pages and conversion status.
+- When evaluation is in scope, report evaluator `status`, top `issues`, and
+  which refinement attempt produced the final output.
+- For Markdown output: wrap in a fenced code block only if the user will copy/paste it; otherwise render directly.
+- For JSON output: pretty-print with `indent=2` unless the user specifies otherwise.
+- For chunks: report total chunk count, min/max/avg token counts.
+- For structure analysis: summarise heading tree + table count + figure count before going into detail.
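The chunk statistics in the list above can be computed from per-chunk token counts. A minimal sketch, assuming you already have those counts; with `HybridChunker`, `tokenizer.count_tokens(chunker.contextualize(chunk))` is one way to get them, assuming your docling-core tokenizer exposes `count_tokens` (check your installed version). `chunk_stats` is a hypothetical helper.

```python
from __future__ import annotations

from statistics import mean


def chunk_stats(token_counts: list[int]) -> dict[str, float]:
    """Total chunk count plus min/max/avg tokens per chunk."""
    if not token_counts:
        return {"chunks": 0, "min_tokens": 0, "max_tokens": 0, "avg_tokens": 0.0}
    return {
        "chunks": len(token_counts),
        "min_tokens": min(token_counts),
        "max_tokens": max(token_counts),
        "avg_tokens": round(mean(token_counts), 1),
    }
```

Printing the result with `json.dumps(chunk_stats(counts), indent=2)` also follows the JSON pretty-printing convention above.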
+
+## Dependencies
+
+```bash
+pip install docling docling-core
+# For OpenAI tokenizer support:
+pip install 'docling-core[chunking-openai]'
+```
+
+The `docling` CLI is included with the `docling` package — no separate install needed.
+
+Check installed versions (prefer distribution metadata — `docling` may not set `__version__`):
+
+```python
+from importlib.metadata import version
+print(version("docling"), version("docling-core"))
+```
@@ -0,0 +1,20 @@
+# Docling agent skill — improvement log
+
+Agents may append a short entry after running **evaluate → refine** on a document
+so similar files are faster to process next time. This file is optional (some
+users git-ignore it) and is meant for **local** learning.
+
+## Template (copy for each entry)
+
+```markdown
+### YYYY-MM-DD — <short source label>
+- **Source type:** (e.g. scanned PDF / digital PDF / DOCX / URL)
+- **Issues (first run):** …
+- **Pipeline / flags that helped:** …
+- **Final evaluator status:** pass | warn | fail
+- **Notes:** …
+```
+
+## Entries
+
+_(None — add your own after running conversions.)_
@@ -0,0 +1,253 @@
+# Docling Pipelines Reference
+
+Docling has two pipeline families for PDFs: **standard** (parse + OCR + layout/tables)
+and **VLM** (page images through a vision-language model). The `docling` CLI
+exposes both via `--pipeline standard` (default) and `--pipeline vlm`.
+The right choice depends on document type, hardware, and latency budget.
+
+---
+
+## Decision matrix
+
+| Document type | Recommended pipeline | Reason |
+|---|---|---|
+| Born-digital PDF (text selectable) | Standard | Fast, accurate, no GPU needed |
+| Scanned PDF / image-only | Standard + OCR or VLM | Depends on quality |
+| Complex layout (multi-column, dense tables) | VLM | Better structural understanding |
+| Handwriting, formulas, figures with embedded text | VLM | Only viable option |
+| Air-gapped / no GPU | Standard | Runs on CPU |
+| Production scale, GPU server available | VLM (vLLM) | Best throughput |
+| Apple Silicon / local dev | VLM (MLX) | MPS acceleration |
+| Speed-critical, accuracy secondary | Standard, no tables | Fastest path |
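One way to apply the matrix consistently from a script is a small lookup that returns `docling` CLI flags. This is a hypothetical helper, and the trait names and precedence are assumptions: rows that depend on Python-only presets (vLLM, MLX, remote endpoints) collapse to the nearest CLI flags, and scanned input starts on the standard pipeline with OCR, to be escalated only if evaluation fails.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DocTraits:
    text_selectable: bool = True  # born-digital PDF
    complex_layout: bool = False  # multi-column, dense tables
    handwriting_or_formulas: bool = False
    gpu_available: bool = False
    speed_critical: bool = False


def choose_flags(t: DocTraits) -> list[str]:
    """Map the decision matrix onto CLI flags (assumed precedence, top wins)."""
    if t.handwriting_or_formulas:
        return ["--pipeline", "vlm"]  # "only viable option" per the matrix
    if t.complex_layout and t.gpu_available:
        return ["--pipeline", "vlm"]  # without a GPU, the "no GPU" row applies
    if t.speed_critical:
        return ["--pipeline", "standard", "--no-tables"]
    if not t.text_selectable:
        return ["--pipeline", "standard", "--ocr"]  # caller escalates to vlm on fail
    return ["--pipeline", "standard"]
```

For example, `subprocess.run(["docling", "scan.pdf", *choose_flags(DocTraits(text_selectable=False)), "--output", "/tmp/"], check=True)` runs the scanned-PDF row.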
+
+---
+
+## Pipeline 1: Standard PDF Pipeline
+
+Uses deterministic PDF parsing (docling-parse) + optional neural OCR + neural
+table structure detection.
+
+### CLI usage
+
+```bash
+# Default (standard pipeline, OCR + tables enabled)
+docling report.pdf --output /tmp/
+
+# Custom OCR engine
+docling report.pdf --ocr-engine tesserocr --output /tmp/
+
+# Disable OCR or tables
+docling report.pdf --no-ocr --output /tmp/
+docling report.pdf --no-tables --output /tmp/
+```
+
+### Python API
+
+```python
+from docling.document_converter import DocumentConverter, PdfFormatOption
+from docling.datamodel.base_models import InputFormat
+from docling.datamodel.pipeline_options import PdfPipelineOptions
+
+# Minimal — library defaults (standard PDF pipeline)
+converter = DocumentConverter()
+
+# Explicit PdfPipelineOptions (docling 2.81+): use InputFormat.PDF + PdfFormatOption.
+# Do not use format_options={"pdf": opts}; that raises AttributeError on pipeline options.
+opts = PdfPipelineOptions(
+    do_ocr=True,                 # False = skip OCR entirely
+    do_table_structure=True,     # False = skip table detection (faster)
+)
+converter = DocumentConverter(
+    format_options={
+        InputFormat.PDF: PdfFormatOption(pipeline_options=opts),
+    }
+)
+```
+
+### OCR engine options
+
+All engines are plug-and-play via the CLI `--ocr-engine` flag or the Python
+`ocr_options` parameter. Default is EasyOCR.
+
+#### CLI flags
+
+| Engine | CLI flag | Notes |
+|--------|----------|-------|
+| EasyOCR | `--ocr-engine easyocr` (default) | No extra pip beyond docling defaults |
+| RapidOCR | `--ocr-engine rapidocr` | Lightweight; see Docling notes on read-only FS |
+| Tesseract (Python) | `--ocr-engine tesserocr` | Needs `pip install tesserocr` and system Tesseract |
+| Tesseract (CLI) | `--ocr-engine tesseract` | Shells out to `tesseract` binary |
+| macOS Vision | `--ocr-engine ocrmac` | macOS only |
+
+#### Python API
+
+```python
+# EasyOCR (default — no extra install needed)
+from docling.datamodel.pipeline_options import PdfPipelineOptions
+opts = PdfPipelineOptions(do_ocr=True)  # uses EasyOCR by default
+
+# Tesseract (requires system Tesseract + pip install tesserocr — see Docling install docs)
+from docling.datamodel.pipeline_options import TesseractOcrOptions
+opts = PdfPipelineOptions(do_ocr=True, ocr_options=TesseractOcrOptions())
+
+# RapidOCR (lightweight, no C deps)
+from docling.datamodel.pipeline_options import RapidOcrOptions
+opts = PdfPipelineOptions(do_ocr=True, ocr_options=RapidOcrOptions())
+
+# macOS native OCR
+from docling.datamodel.pipeline_options import OcrMacOptions
+opts = PdfPipelineOptions(do_ocr=True, ocr_options=OcrMacOptions())
+```
+
+---
+
+## Pipeline 2: VLM Pipeline — local inference
+
+Processes each page as an image through a vision-language model. Replaces the
+standard layout detection + OCR stack entirely.
+
+### CLI usage
+
+```bash
+# Default VLM model (granite_docling)
+docling report.pdf --pipeline vlm --output /tmp/
+
+# Specific model
+docling report.pdf --pipeline vlm --vlm-model smoldocling --output /tmp/
+```
+
+### Python API
+
+```python
+from docling.document_converter import DocumentConverter, PdfFormatOption
+from docling.datamodel.base_models import InputFormat
+from docling.datamodel.pipeline_options import VlmPipelineOptions
+from docling.datamodel import vlm_model_specs
+from docling.pipeline.vlm_pipeline import VlmPipeline
+
+pipeline_options = VlmPipelineOptions(
+    vlm_options=vlm_model_specs.GRANITEDOCLING_TRANSFORMERS,
+    generate_page_images=True,
+)
+
+converter = DocumentConverter(
+    format_options={
+        InputFormat.PDF: PdfFormatOption(
+            pipeline_cls=VlmPipeline,
+            pipeline_options=pipeline_options,
+        )
+    }
+)
+```
+
+### Available model presets
+
+| CLI `--vlm-model` | Python preset (`vlm_model_specs`) | Backend | Device | Notes |
+|---|---|---|---|---|
+| `granite_docling` | `GRANITEDOCLING_TRANSFORMERS` | HF Transformers | CPU/GPU | Default |
+| `smoldocling` | `SMOLDOCLING_TRANSFORMERS` | HF Transformers | CPU/GPU | Lighter |
+| (Python API only) | `GRANITEDOCLING_VLLM` | vLLM | GPU | Fast batch |
+| (Python API only) | `GRANITEDOCLING_MLX` | MLX | Apple MPS | M-series Macs |
+
+### Hybrid mode: PDF text + VLM for images/tables
+
+Set `force_backend_text=True` (Python API only) to use deterministic text
+extraction for normal text regions while routing images and tables through the
+VLM. Reduces hallucination risk on text-heavy pages.
+
+```python
+pipeline_options = VlmPipelineOptions(
+    vlm_options=vlm_model_specs.GRANITEDOCLING_TRANSFORMERS,
+    force_backend_text=True,   # <-- hybrid mode
+    generate_page_images=True,
+)
+```
+
+---
+
+## Pipeline 3: VLM Pipeline — remote API
+
+Sends page images to any OpenAI-compatible endpoint. Works with vLLM,
+LM Studio, Ollama, or a hosted model API.
+
+This is available via the CLI with `--pipeline vlm --enable-remote-services`,
+but endpoint URL, model name, and API key configuration require the Python API.
+
+### CLI usage (basic)
+
+```bash
+docling report.pdf --pipeline vlm --enable-remote-services --output /tmp/
+```
+
+### Python API (full configuration)
+
+```python
+from docling.document_converter import DocumentConverter, PdfFormatOption
+from docling.datamodel.base_models import InputFormat
+from docling.datamodel.pipeline_options import VlmPipelineOptions
+from docling.datamodel.pipeline_options_vlm_model import ApiVlmOptions, ResponseFormat
+from docling.pipeline.vlm_pipeline import VlmPipeline
+
+vlm_opts = ApiVlmOptions(
+    url="http://localhost:8000/v1/chat/completions",
+    params=dict(
+        model="ibm-granite/granite-docling-258M",
+        max_tokens=4096,
+    ),
+    headers={"Authorization": "Bearer YOUR_KEY"},  # omit if not needed
+    prompt="Convert this page to docling.",
+    response_format=ResponseFormat.DOCTAGS,
+    timeout=120,
+    scale=2.0,
+)
+
+pipeline_options = VlmPipelineOptions(
+    vlm_options=vlm_opts,
+    generate_page_images=True,
+    enable_remote_services=True,  # required — gates any HTTP call
+)
+
+converter = DocumentConverter(
+    format_options={
+        InputFormat.PDF: PdfFormatOption(
+            pipeline_cls=VlmPipeline,
+            pipeline_options=pipeline_options,
+        )
+    }
+)
+```
+
+**`enable_remote_services=True` is mandatory** for API pipelines. Docling
+blocks outbound HTTP by default as a safety measure.
+
+### Common API targets
+
+| Server | Default URL | Notes |
+|---|---|---|
+| vLLM | `http://localhost:8000/v1/chat/completions` | Best throughput |
+| LM Studio | `http://localhost:1234/v1/chat/completions` | Local dev |
+| Ollama | `http://localhost:11434/v1/chat/completions` | Model: `ibm/granite-docling:258m` |
+| OpenAI-compatible cloud | Provider URL | Set Authorization header |
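Before pointing `ApiVlmOptions(url=...)` at one of these servers, confirming that the server is up and serving the model named in `params` can save a failed conversion. The sketch below assumes the server exposes the OpenAI-compatible `GET /v1/models` route alongside `/v1/chat/completions` (vLLM, LM Studio, and Ollama's OpenAI-compatible API document one, but verify for your server); `models_url` and `server_models` are hypothetical standard-library helpers.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def models_url(chat_completions_url: str) -> str:
    """http://host:port/v1/chat/completions -> http://host:port/v1/models"""
    parts = urlsplit(chat_completions_url)
    suffix = "/chat/completions"
    if not parts.path.endswith(suffix):
        raise ValueError(f"not a chat-completions URL: {chat_completions_url}")
    path = parts.path[: -len(suffix)] + "/models"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def server_models(
    chat_completions_url: str, api_key: str | None = None, timeout: float = 5.0
) -> list[str]:
    """Return the model IDs the server reports; raises URLError if unreachable."""
    req = urllib.request.Request(models_url(chat_completions_url))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return [m.get("id", "") for m in payload.get("data", [])]
```

With the `vllm serve ibm-granite/granite-docling-258M` command below running, `"ibm-granite/granite-docling-258M" in server_models("http://localhost:8000/v1/chat/completions")` should be true unless the server was started with a different served model name.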
+
+---
+
+## VLM install requirements
+
+Local inference requires PyTorch + Transformers:
+
+```bash
+pip install 'docling[vlm]'
+# or manually:
+pip install torch transformers accelerate
+```
+
+MLX (Apple Silicon only):
+```bash
+pip install mlx-vlm
+```
+
+vLLM backend (server-side):
+```bash
+pip install vllm
+vllm serve ibm-granite/granite-docling-258M
+```
@@ -0,0 +1,296 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: MIT
+"""
+Evaluate a Docling JSON export and suggest pipeline / option changes.
+
+Typical flow (agent or human):
+
+  docling input.pdf --to json --output /tmp/
+  docling input.pdf --to md --output /tmp/
+  python3 scripts/docling-evaluate.py /tmp/input.json --markdown /tmp/input.md
+
+Exit codes: 0 = pass; 1 = fail or --fail-on-warn with status warn
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from collections import Counter
+from pathlib import Path
+from typing import Any
+
+
+def load_document(path: Path):
+    data = json.loads(path.read_text(encoding="utf-8"))
+    try:
+        from docling_core.types.doc.document import DoclingDocument
+
+        return DoclingDocument.model_validate(data), data
+    except Exception:
+        return None, data
+
+
+def page_numbers_from_doc(doc) -> set[int]:
+    pages: set[int] = set()
+    for item, _ in doc.iterate_items():
+        for prov in getattr(item, "prov", None) or []:
+            p = getattr(prov, "page_no", None)
+            if p is not None:
+                pages.add(int(p))
+    return pages
+
+
+def collect_text_samples(doc, limit: int = 200) -> list[str]:
+    texts: list[str] = []
+    for item, _ in doc.iterate_items():
+        t = getattr(item, "text", None)
+        if t and str(t).strip():
+            texts.append(str(t).strip())
+            if len(texts) >= limit:
+                break
+    return texts
+
+
+def metrics_from_doc(doc) -> dict[str, Any]:
+    n_tables = len(getattr(doc, "tables", []) or [])
+    n_pictures = len(getattr(doc, "pictures", []) or [])
+    n_headers = 0
+    n_text_items = 0
+    total_chars = 0
+    for item, _ in doc.iterate_items():
+        label = getattr(getattr(item, "label", None), "name", None) or ""
+        if label == "SECTION_HEADER":
+            n_headers += 1
+        t = getattr(item, "text", None)
+        if t:
+            n_text_items += 1
+            total_chars += len(str(t))
+
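+    # Prefer the document's page table so pages with no extracted items (e.g.
+    # scans without OCR output) still count; fall back to item provenance.
+    pages = page_numbers_from_doc(doc)
+    n_pages = len(getattr(doc, "pages", None) or {}) or len(pages)
+    density = (total_chars / n_pages) if n_pages else total_chars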
+
+    samples = collect_text_samples(doc)
+    rep = Counter(samples)
+    top_rep = rep.most_common(1)[0] if rep else ("", 0)
+    # Share of sampled text items whose exact text occurs more than twice.
+    dup_ratio = (
+        sum(c for c in rep.values() if c > 2) / len(samples) if samples else 0.0
+    )
+
+    md = ""
+    try:
+        md = doc.export_to_markdown()
+    except Exception:
+        pass
+
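+    # Count U+FFFD once: the Markdown export already contains the item texts,
+    # so fall back to the sampled texts only when the export failed.
+    replacement = md.count("\ufffd") if md else sum(t.count("\ufffd") for t in samples)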
+
+    return {
+        "page_count": n_pages,
+        "section_headers": n_headers,
+        "text_items": n_text_items,
+        "total_text_chars": total_chars,
+        "chars_per_page": round(density, 2),
+        "tables": n_tables,
+        "pictures": n_pictures,
+        "markdown_chars": len(md),
+        "replacement_chars": replacement,
+        "most_repeated_text_count": int(top_rep[1]) if top_rep else 0,
+        "duplicate_heavy": dup_ratio > 0.15 and len(samples) > 10,
+    }
+
+
+def heuristic_metrics(data: dict) -> dict[str, Any]:
+    """Fallback when DoclingDocument cannot be validated (older export / drift)."""
+    texts = data.get("texts") or []
+    tables = data.get("tables") or []
+    body = data.get("body") or {}
+    children = body.get("children") if isinstance(body, dict) else None
+    n_children = len(children) if isinstance(children, list) else 0
+    char_sum = 0
+    for t in texts:
+        if isinstance(t, dict):
+            char_sum += len(str(t.get("text") or ""))
+    return {
+        "page_count": 0,
+        "section_headers": 0,
+        "text_items": len(texts),
+        "total_text_chars": char_sum,
+        "chars_per_page": 0.0,
+        "tables": len(tables),
+        "pictures": len(data.get("pictures") or []),
+        "markdown_chars": 0,
+        "replacement_chars": 0,
+        "most_repeated_text_count": 0,
+        "duplicate_heavy": False,
+        "heuristic_only": True,
+        "body_children": n_children,
+    }
+
+
+def evaluate(
+    m: dict[str, Any],
+    *,
+    expect_tables: bool,
+    min_chars_per_page: float,
+    min_markdown_chars: int,
+) -> tuple[str, list[str], list[str]]:
+    issues: list[str] = []
+    actions: list[str] = []
+
+    if m.get("heuristic_only"):
+        issues.append("Could not load full DoclingDocument; metrics are partial.")
+        actions.append(
+            "Ensure docling-core matches export; re-export with: docling <source> --to json --output <dir>"
+        )
+
+    cpp = m.get("chars_per_page") or 0
+    if m.get("page_count", 0) >= 2 and cpp < min_chars_per_page:
+        issues.append(
+            f"Low text density ({cpp} chars/page); likely scan, image-heavy PDF, or extraction gap."
+        )
+        actions.append(
+            "Retry: docling <source> --ocr-engine tesserocr (or rapidocr, ocrmac)"
+        )
+        actions.append("Retry: docling <source> --pipeline vlm")
+
+    if m.get("replacement_chars", 0) > 5:
+        issues.append(
+            "Unicode replacement characters detected; OCR may be garbling text."
+        )
+        actions.append("Retry: docling <source> --ocr-engine tesserocr (or rapidocr)")
+        actions.append(
+            "Retry: docling <source> --pipeline vlm (use force_backend_text=True via Python API for hybrid)"
+        )
+
+    if m.get("duplicate_heavy") or (m.get("most_repeated_text_count", 0) > 8):
+        issues.append(
+            "Repeated text blocks; possible layout/OCR loop or bad reading order."
+        )
+        actions.append("Retry: docling <source> --pipeline vlm")
+        actions.append(
+            "If using VLM: try force_backend_text=True via Python API for text-heavy pages"
+        )
+
+    if expect_tables and m.get("tables", 0) == 0:
+        issues.append("No tables detected but tables were expected.")
+        actions.append(
+            "Retry: docling <source> (tables are enabled by default; remove --no-tables if set)"
+        )
+        actions.append(
+            "Retry: docling <source> --pipeline vlm (better for merged-cell or visual tables)"
+        )
+
+    mc = m.get("markdown_chars", 0)
+    if 0 < mc < min_markdown_chars and m.get("page_count", 0) >= 1:
+        issues.append(
+            f"Markdown export is very short ({mc} chars; minimum {min_markdown_chars})."
+        )
+        actions.append(
+            "Retry: docling <source> --pipeline vlm (or try different --ocr-engine)"
+        )
+
+    if m.get("text_items", 0) == 0 and m.get("page_count", 0) == 0:
+        issues.append(
+            "No text items and no page provenance; export may be empty or invalid."
+        )
+        actions.append(
+            "Verify source file opens correctly; retry with: docling <source> --pipeline standard"
+        )
+
+    uniq_actions = list(dict.fromkeys(actions))  # de-duplicate, keep first-seen order
+
+    if not issues:
+        return "pass", [], []
+
+    severe = m.get("text_items", 0) == 0 or (
+        m.get("page_count", 0) >= 1 and 0 < mc < 50
+    )
+    status = "fail" if severe or m.get("replacement_chars", 0) > 20 else "warn"
+    return status, issues, uniq_actions
+
+
+def parse_args():
+    p = argparse.ArgumentParser(description="Evaluate Docling JSON export quality")
+    p.add_argument(
+        "json_path", type=Path, help="Path to DoclingDocument JSON (export_to_dict)"
+    )
+    p.add_argument(
+        "--markdown",
+        type=Path,
+        default=None,
+        help="Optional markdown file to cross-check length",
+    )
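+    p.add_argument(
+        "--expect-tables",
+        action="store_true",
+        help="Report an issue if no tables were detected",
+    )
+    p.add_argument(
+        "--min-chars-per-page",
+        type=float,
+        default=120.0,
+        help="Warn below this text density (checked only for 2+ pages)",
+    )
+    p.add_argument(
+        "--min-markdown-chars",
+        type=int,
+        default=200,
+        help="Warn if the Markdown export is shorter than this",
+    )
+    p.add_argument(
+        "--fail-on-warn",
+        action="store_true",
+        help="Exit with code 1 on status warn as well as fail",
+    )
+    p.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress the stderr summary; the JSON report still goes to stdout",
+    )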
+    return p.parse_args()
+
+
+def main() -> None:
+    args = parse_args()
+    if not args.json_path.is_file():
+        print(json.dumps({"error": f"not found: {args.json_path}"}), file=sys.stderr)
+        sys.exit(1)
+
+    doc, raw = load_document(args.json_path)
+    if doc is not None:
+        m = metrics_from_doc(doc)
+    else:
+        m = heuristic_metrics(raw)
+
+    if args.markdown and args.markdown.is_file():
+        md_len = len(args.markdown.read_text(encoding="utf-8"))
+        m["markdown_file_chars"] = md_len
+        if m.get("markdown_chars", 0) == 0:
+            m["markdown_chars"] = md_len
+
+    status, issues, actions = evaluate(
+        m,
+        expect_tables=args.expect_tables,
+        min_chars_per_page=args.min_chars_per_page,
+        min_markdown_chars=args.min_markdown_chars,
+    )
+
+    report = {
+        "status": status,
+        "metrics": m,
+        "issues": issues,
+        "recommended_actions": actions,
+        "next_steps_for_agent": [
+            "Re-run docling with flags from recommended_actions.",
+            "Re-export JSON and run this script again until status is pass.",
+            "Append a row to improvement-log.md (see SKILL.md).",
+        ],
+    }
+
+    print(json.dumps(report, indent=2, ensure_ascii=False))
+    if not args.quiet:
+        print(f"\nstatus={status}", file=sys.stderr)
+        if issues:
+            print("issues:", file=sys.stderr)
+            for i in issues:
+                print(f"  - {i}", file=sys.stderr)
+        if actions:
+            print("recommended_actions:", file=sys.stderr)
+            for a in actions:
+                print(f"  - {a}", file=sys.stderr)
+
+    if status == "fail":
+        sys.exit(1)
+    if status == "warn" and args.fail_on_warn:
+        sys.exit(1)
+    sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,3 @@
+# pip install -r scripts/requirements.txt
+docling>=2.81.0
+docling-core>=2.67.1
@@ -7,6 +7,7 @@ Here some of our picks to get you started:
 - 📤 [{==\[:fontawesome-solid-flask:{ title="beta feature" } beta\]==} structured data extraction](./extraction.ipynb)
 - examples for ✍️ [serialization](./serialization.ipynb) and ✂️ [chunking](./hybrid_chunking.ipynb), including [user-defined customizations](./advanced_chunking_and_serialization.ipynb)
 - 🖼️ [picture annotations](./pictures_description.ipynb) and [enrichments](./enrich_doclingdocument.py)
+- 🤝 [**Agent skill**](./agent_skill/docling-document-intelligence/README.md) for Cursor and other assistants (`SKILL.md`, pipeline reference, and a `docling-evaluate.py` quality-check helper)

 👈 ... and there is much more: explore all the examples using the navigation menu on the side

@@ -80,6 +80,7 @@ nav:
    - Plugins: concepts/plugins.md
  - Examples:
    - Examples: examples/index.md
+    - "🤝 Agent skill (Cursor / assistants)": examples/agent_skill/docling-document-intelligence/README.md
    - 🔀 Conversion:
      - "Simple conversion": examples/minimal.py
      - "Custom conversion": examples/custom_convert.py