Using the Docling agent skill
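Agent Skills are folders of instructions that AI coding agents (Cursor, Claude Code, GitHub Copilot, etc.) load when a task matches them. This skill covers document parsing and conversion, structure analysis, RAG chunking, and output evaluation with Docling.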
Where this bundle lives
- Cursor (local): `~/.cursor/skills/docling-document-intelligence/` (or copy this folder there).
- Docling repository (docs + PRs): `docs/examples/agent_skill/docling-document-intelligence/` in github.com/docling-project/docling.
The two trees are kept in sync; use either source.
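You can check whether an installed copy still matches the repository tree with a recursive `diff`. The following is a minimal sketch. It assumes you run it from the root of a Docling checkout. `skill_trees_match` is a hypothetical helper and is not part of the bundle.

```bash
# Hypothetical helper (not shipped with the bundle): report whether two
# copies of the skill folder have identical contents.
skill_trees_match() {
  local repo_copy="$1"   # e.g. docs/examples/agent_skill/docling-document-intelligence
  local local_copy="$2"  # e.g. ~/.cursor/skills/docling-document-intelligence
  # diff -rq compares the trees recursively and exits non-zero when any file
  # differs or exists in only one of them.
  if diff -rq "$repo_copy" "$local_copy" >/dev/null; then
    echo "in sync"
  else
    echo "out of sync"
    return 1
  fi
}
```

For example: `skill_trees_match docs/examples/agent_skill/docling-document-intelligence ~/.cursor/skills/docling-document-intelligence`.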
Install (copy into your agent's skills directory)
```bash
# From a checkout of the Docling repo (create the skills directory if it does not exist yet)
mkdir -p ~/.cursor/skills
cp -r docs/examples/agent_skill/docling-document-intelligence ~/.cursor/skills/
# Or copy from another machine / archive into e.g. ~/.claude/skills/
```
No extra config is required beyond installing Python dependencies (below).
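If you use several agents, you can repeat the same copy for each skills directory. The following is a minimal sketch. `install_skill` is a hypothetical helper, and the target directories are assumptions based on the `~/.cursor/skills/` and `~/.claude/skills/` paths mentioned in this section. `mkdir -p` handles a skills directory that does not exist yet.

```bash
# Hypothetical helper: copy the skill bundle into each given skills directory.
install_skill() {
  local src="$1"
  shift
  local dest
  for dest in "$@"; do
    mkdir -p "$dest"
    # Copy the whole folder so SKILL.md, scripts/, etc. stay together.
    cp -r "$src" "$dest/"
    echo "installed into $dest/$(basename "$src")"
  done
}
```

For example: `install_skill docs/examples/agent_skill/docling-document-intelligence ~/.cursor/skills ~/.claude/skills`. Re-running it overwrites existing files but does not delete files that were removed upstream.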
Usage
Open your agent-enabled IDE and ask, for example:
- "Parse report.pdf and give me a structural outline"
- "Convert https://arxiv.org/pdf/2408.09869 to markdown"
- "Chunk invoice.pdf for RAG ingestion with 512 token chunks"
- "Process scanned.pdf using the VLM pipeline"
The agent should read `SKILL.md`, match the request to a task, and run the appropriate `docling` CLI command. It falls back to a Python API call only for features the CLI does not expose, such as chunking.
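The agent has to find `SKILL.md` before it can match a task, so it can help to confirm where the bundle is installed first. The following is a minimal sketch. `find_installed_skill` is hypothetical; pass it whichever skills directories your agents actually read.

```bash
# Hypothetical check: print each skills directory that contains the bundle's
# SKILL.md, and return non-zero if none does.
find_installed_skill() {
  local found=1
  local dir
  for dir in "$@"; do
    if [ -f "$dir/docling-document-intelligence/SKILL.md" ]; then
      echo "$dir"
      found=0
    fi
  done
  return "$found"
}
```

For example: `find_installed_skill ~/.cursor/skills ~/.claude/skills || echo "skill not installed"`.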
Running the docling CLI directly
```bash
pip install docling docling-core

# Basic conversion to Markdown (the default output format)
docling report.pdf --output /tmp/

# JSON output
docling report.pdf --to json --output /tmp/

# Custom OCR engine
docling report.pdf --ocr-engine rapidocr --output /tmp/

# VLM pipeline
docling scanned.pdf --pipeline vlm --output /tmp/

# VLM with a specific model
docling scanned.pdf --pipeline vlm --vlm-model granite_docling --output /tmp/

# VLM pipeline with remote services allowed (configuring an API endpoint requires the Python API)
docling doc.pdf --pipeline vlm --enable-remote-services --output /tmp/
```
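To convert a whole folder, you can loop the same CLI call over each file so that one failing PDF does not hide the rest. The following is a sketch under two assumptions. `convert_folder` is a hypothetical helper. It passes `--to` twice to get JSON and Markdown in one run, which assumes your `docling` version accepts repeated `--to` options (check `docling --help`).

```bash
# Hypothetical helper: convert every PDF in a folder with the docling CLI,
# writing JSON and Markdown for each file into the output folder.
convert_folder() {
  local in_dir="$1"
  local out_dir="$2"
  local pdf
  local failed=0
  mkdir -p "$out_dir"
  for pdf in "$in_dir"/*.pdf; do
    # If the glob matched nothing it stays literal; skip that case.
    [ -e "$pdf" ] || continue
    if ! docling "$pdf" --to json --to md --output "$out_dir"; then
      echo "conversion failed: $pdf" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example: `convert_folder ./pdfs /tmp/docling-out`. You can add any flag from the commands above, such as `--ocr-engine rapidocr`, to the `docling` line inside the loop.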
Evaluate and refine
```bash
docling report.pdf --to json --output /tmp/
docling report.pdf --to md --output /tmp/
python3 scripts/docling-evaluate.py /tmp/report.json --markdown /tmp/report.md
```
If the report shows `warn` or `fail`, follow `recommended_actions`, re-convert with `docling` using the suggested flags, and optionally append a note to `improvement-log.md` (see `SKILL.md` section 7).
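The evaluation step relies on `docling` naming its outputs after the input file's stem, so `report.pdf` produces `/tmp/report.json` and `/tmp/report.md`. The following minimal wrapper keeps those paths consistent across re-runs. `convert_and_evaluate` and the `OUT_DIR` variable are hypothetical. The sketch assumes:

- a local file path, not a URL
- that it runs from the skill folder, so `scripts/docling-evaluate.py` resolves
- that your `docling` version accepts repeated `--to` options

Extra arguments are forwarded to `docling`.

```bash
# Hypothetical wrapper: convert one local document to JSON and Markdown, then
# run the bundle's evaluator on the results. Extra arguments go to docling.
convert_and_evaluate() {
  local src="$1"
  shift
  local out_dir="${OUT_DIR:-/tmp}"
  # docling names outputs after the input's stem: report.pdf -> report.json
  local stem
  stem="$(basename "$src")"
  stem="${stem%.*}"
  docling "$src" --to json --to md --output "$out_dir" "$@" || return 1
  python3 scripts/docling-evaluate.py "$out_dir/$stem.json" --markdown "$out_dir/$stem.md"
}
```

For example, run `convert_and_evaluate report.pdf`. If `recommended_actions` points to a different OCR engine, re-run with `convert_and_evaluate report.pdf --ocr-engine rapidocr`.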
What the skill covers
| Task | How to ask |
|---|---|
| Parse PDF / DOCX / PPTX / HTML / image | "parse this file" |
| Convert to Markdown | "convert to markdown" |
| Export as structured JSON | "export as JSON" |
| Chunk for RAG | "chunk for RAG", "prepare for ingestion" |
| Analyze structure | "show me the headings and tables" |
| Use VLM pipeline | "use the VLM pipeline", "process scanned PDF" |
| Use remote inference | "use vLLM", "call the API pipeline" |