docling/docs/examples/ag2_multiagent_document_analysis.ipynb
Faridun Mirzoev 1fed840506 docs: add AG2 multi-agent document analysis example (#3261)
* docs: add AG2 multi-agent document analysis example

Add a Jupyter notebook demonstrating how to combine Docling document
conversion with AG2 multi-agent orchestration. A Document Processor
agent uses Docling tools to convert PDFs to markdown and extract tables,
while an Analyst agent synthesizes findings into a structured summary.

* DCO Remediation Commit for Faridun Mirzoev <faridun@ag2.ai>

I, Faridun Mirzoev <faridun@ag2.ai>, hereby add my Signed-off-by to this commit: e80e0f3375

Signed-off-by: Faridun Mirzoev <faridun@ag2.ai>

* docs: fix ruff PD901 lint — rename df to table_df

Signed-off-by: Faridun Mirzoev <faridun@ag2.ai>

---------

Signed-off-by: Faridun Mirzoev <faridun@ag2.ai>
2026-04-12 07:30:39 +02:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/ag2_multiagent_document_analysis.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multi-Agent Document Analysis with AG2 and Docling\n",
"\n",
"| Step | Tech | Execution |\n",
"| --- | --- | --- |\n",
"| Document conversion | [Docling](https://docling-project.github.io/docling/) | 💻 Local |\n",
"| Multi-agent orchestration | [AG2](https://docs.ag2.ai/) | 🌐 Remote (LLM) |\n",
"\n",
"This example demonstrates how to combine **Docling** for document conversion with **AG2** for\n",
"multi-agent analysis. Docling converts PDF, DOCX, HTML, and other formats into structured\n",
"Markdown and tables. AG2 agents then collaborate to analyze the extracted content.\n",
"\n",
"The pipeline:\n",
"1. A **Document Processor** agent uses Docling tools to convert documents and extract tables.\n",
"2. An **Analyst** agent synthesizes the extracted content into a structured summary.\n",
"3. A **UserProxy** orchestrates the conversation via a GroupChat."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"- 👉 For best conversion speed, use GPU acceleration whenever available; e.g., if running on Colab, use a GPU-enabled runtime.\n",
"- Requires an OpenAI API key set as the `OPENAI_API_KEY` environment variable.\n",
"- The first run downloads ML models (~12 GB); subsequent runs use the cached models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -q --progress-bar off --no-warn-conflicts docling \"ag2[openai]>=0.11.4,<1.0\" pandas"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"\n",
"from autogen import (\n",
"    AssistantAgent,\n",
"    GroupChat,\n",
"    GroupChatManager,\n",
"    LLMConfig,\n",
"    UserProxyAgent,\n",
")\n",
"\n",
"from docling.datamodel.base_models import ConversionStatus\n",
"from docling.document_converter import DocumentConverter\n",
"\n",
"# Set your OpenAI API key (or configure via .env / Colab secrets)\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Document Conversion with Docling\n",
"\n",
"First, let's convert a sample document and inspect the output. We use the\n",
"[Docling Technical Report](https://arxiv.org/pdf/2408.09869) as the demo document."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"DOC_SOURCE = \"https://arxiv.org/pdf/2408.09869\"\n",
"\n",
"converter = DocumentConverter()\n",
"result = converter.convert(DOC_SOURCE)\n",
"\n",
"print(f\"Status: {result.status}\")\n",
"print(f\"Pages: {len(list(result.document.pages))}\")\n",
"print()\n",
"\n",
"# Preview the first 2000 characters of extracted Markdown\n",
"markdown = result.document.export_to_markdown()\n",
"print(f\"Markdown length: {len(markdown):,} characters\")\n",
"print(\"---\")\n",
"print(markdown[:2000])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Table Extraction\n",
"\n",
"Docling automatically detects and extracts tables. Let's inspect them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "tables = list(result.document.tables)\nprint(f\"Found {len(tables)} table(s)\")\n\nfor i, table in enumerate(tables):\n    table_df = table.export_to_dataframe(doc=result.document)\n    print(f\"\\n### Table {i + 1} (shape: {table_df.shape})\")\n    print(table_df.to_markdown())"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## AG2 Multi-Agent Setup\n",
"\n",
"Now we set up AG2 agents that use Docling as their document processing backend.\n",
"\n",
"**Architecture:**\n",
"- `document_processor` — calls Docling tools to convert documents and extract tables\n",
"- `analyst` — analyzes the extracted content and produces a structured summary\n",
"- `user_proxy` — orchestrates the conversation, executes tool calls\n",
"\n",
"The agents communicate via a `GroupChat` managed by a `GroupChatManager`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_config = LLMConfig(\n",
"    {\n",
"        \"model\": \"gpt-4o-mini\",\n",
"        \"api_key\": os.environ.get(\"OPENAI_API_KEY\"),\n",
"        \"api_type\": \"openai\",\n",
"    }\n",
")\n",
"\n",
"MAX_CONTENT_CHARS = 15000  # Truncation limit to stay within LLM context\n",
"\n",
"\n",
"def is_termination_msg(msg):\n",
"    content = msg.get(\"content\", \"\") or \"\"\n",
"    return \"TERMINATE\" in content\n",
"\n",
"\n",
"proxy = UserProxyAgent(\n",
"    name=\"user_proxy\",\n",
"    human_input_mode=\"NEVER\",\n",
"    max_consecutive_auto_reply=10,\n",
"    code_execution_config=False,\n",
"    is_termination_msg=is_termination_msg,\n",
")\n",
"\n",
"processor = AssistantAgent(\n",
"    name=\"document_processor\",\n",
"    system_message=(\n",
"        \"You are a document processing agent. Use the convert_document tool to \"\n",
"        \"extract text from a document, and extract_tables to get structured table \"\n",
"        \"data. Always call convert_document first, then extract_tables if the user \"\n",
"        \"asks about tables or data.\"\n",
"    ),\n",
"    llm_config=llm_config,\n",
")\n",
"\n",
"analyst = AssistantAgent(\n",
"    name=\"analyst\",\n",
"    system_message=(\n",
"        \"You are a document analyst. Based on the content extracted by the \"\n",
"        \"document_processor, provide a clear and structured analysis including:\\n\"\n",
"        \"- A concise summary of the document\\n\"\n",
"        \"- Key findings or contributions\\n\"\n",
"        \"- Notable data from any tables\\n\\n\"\n",
"        \"When your analysis is complete, end your message with TERMINATE.\"\n",
"    ),\n",
"    llm_config=llm_config,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tool Registration\n",
"\n",
"We register Docling operations as AG2 tools. The `converter` instance created earlier\n",
"is reused, so its pipelines and models are initialized only once."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "@proxy.register_for_execution()\n@processor.register_for_llm(\n    description=\"Convert a document (PDF, DOCX, HTML, or URL) to markdown text\"\n)\ndef convert_document(source: str) -> str:\n    \"\"\"Convert a document to markdown using Docling.\"\"\"\n    conv_result = converter.convert(source)\n    if conv_result.status == ConversionStatus.FAILURE:\n        return f\"Error: Document conversion failed for {source}\"\n    md = conv_result.document.export_to_markdown()\n    if len(md) > MAX_CONTENT_CHARS:\n        return (\n            md[:MAX_CONTENT_CHARS]\n            + f\"\\n\\n[Truncated — showing first {MAX_CONTENT_CHARS:,} of {len(md):,} characters]\"\n        )\n    return md\n\n\n@proxy.register_for_execution()\n@processor.register_for_llm(\n    description=\"Extract tables from a document as JSON. Returns a list of tables, each as a list of row records.\"\n)\ndef extract_tables(source: str) -> str:\n    \"\"\"Extract tables from a document using Docling.\"\"\"\n    conv_result = converter.convert(source)\n    if conv_result.status == ConversionStatus.FAILURE:\n        return f\"Error: Document conversion failed for {source}\"\n    tables = list(conv_result.document.tables)\n    if not tables:\n        return \"No tables found in the document.\"\n    table_data = []\n    for i, table in enumerate(tables):\n        table_df = table.export_to_dataframe(doc=conv_result.document)\n        table_data.append(\n            {\n                \"table_index\": i + 1,\n                \"rows\": table_df.shape[0],\n                \"columns\": table_df.shape[1],\n                \"data\": table_df.to_dict(orient=\"records\"),\n            }\n        )\n    return json.dumps(table_data, indent=2)\n\n\n# Use the public function_map property rather than the private _function_map attribute\nprint(f\"Tools registered on proxy: {list(proxy.function_map.keys())}\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the Multi-Agent Analysis\n",
"\n",
"The `user_proxy` sends a task to the group chat. The `document_processor` will use\n",
"Docling tools to extract content, and the `analyst` will synthesize the findings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"group_chat = GroupChat(\n",
"    agents=[proxy, processor, analyst],\n",
"    messages=[],\n",
"    max_round=10,\n",
")\n",
"\n",
"manager = GroupChatManager(\n",
"    groupchat=group_chat,\n",
"    llm_config=llm_config,\n",
"    is_termination_msg=is_termination_msg,\n",
")\n",
"\n",
"# Use a separate name so the earlier Docling conversion `result` is not overwritten\n",
"response = proxy.run(\n",
"    manager,\n",
"    message=(\n",
"        f\"Analyze the document at {DOC_SOURCE} — \"\n",
"        \"summarize its key findings and extract any tables.\"\n",
"    ),\n",
")\n",
"response.process()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Further Reading\n",
"\n",
"- [Docling documentation](https://docling-project.github.io/docling/)\n",
"- [AG2 documentation](https://docs.ag2.ai/)\n",
"- [Docling examples](https://docling-project.github.io/docling/examples/)\n",
"- [AG2 tool use guide](https://docs.ag2.ai/user-guide/agentchat-user-guide/tutorial/tool-use)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}