mirror of
https://github.com/docling-project/docling.git
synced 2026-05-17 13:10:38 +00:00
1fed840506
* docs: add AG2 multi-agent document analysis example
Add a Jupyter notebook demonstrating how to combine Docling document
conversion with AG2 multi-agent orchestration. A Document Processor
agent uses Docling tools to convert PDFs to markdown and extract tables,
while an Analyst agent synthesizes findings into a structured summary.
* DCO Remediation Commit for Faridun Mirzoev <faridun@ag2.ai>
I, Faridun Mirzoev <faridun@ag2.ai>, hereby add my Signed-off-by to this commit: e80e0f3375
Signed-off-by: Faridun Mirzoev <faridun@ag2.ai>
* docs: fix ruff PD901 lint — rename df to table_df
Signed-off-by: Faridun Mirzoev <faridun@ag2.ai>
---------
Signed-off-by: Faridun Mirzoev <faridun@ag2.ai>
282 lines
11 KiB
Plaintext
Vendored
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/ag2_multiagent_document_analysis.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multi-Agent Document Analysis with AG2 and Docling\n",
    "\n",
    "| Step | Tech | Execution |\n",
    "| --- | --- | --- |\n",
    "| Document conversion | [Docling](https://docling-project.github.io/docling/) | 💻 Local |\n",
    "| Multi-agent orchestration | [AG2](https://docs.ag2.ai/) | 🌐 Remote (LLM) |\n",
    "\n",
    "This example demonstrates how to combine **Docling** for document conversion with **AG2** for\n",
    "multi-agent analysis. Docling converts PDF, DOCX, HTML, and other formats into structured\n",
    "Markdown and tables. AG2 agents then collaborate to analyze the extracted content.\n",
    "\n",
    "The pipeline:\n",
    "1. A **Document Processor** agent uses Docling tools to convert documents and extract tables.\n",
    "2. An **Analyst** agent synthesizes the extracted content into a structured summary.\n",
    "3. A **UserProxy** orchestrates the conversation via a GroupChat."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "- 👉 For best conversion speed, use GPU acceleration whenever available; e.g. if running on Colab, use a GPU-enabled runtime.\n",
    "- Requires an OpenAI API key set as the `OPENAI_API_KEY` environment variable.\n",
    "- The first run downloads ML models (~1–2 GB). Subsequent runs use the cached models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -q --progress-bar off --no-warn-conflicts docling \"ag2[openai]>=0.11.4,<1.0\" pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "\n",
    "from autogen import (\n",
    "    AssistantAgent,\n",
    "    GroupChat,\n",
    "    GroupChatManager,\n",
    "    LLMConfig,\n",
    "    UserProxyAgent,\n",
    ")\n",
    "\n",
    "from docling.datamodel.base_models import ConversionStatus\n",
    "from docling.document_converter import DocumentConverter\n",
    "\n",
    "# Set your OpenAI API key (or configure via .env / Colab secrets)\n",
    "# os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Document Conversion with Docling\n",
    "\n",
    "First, let's convert a sample document and inspect the output. We use the\n",
    "[Docling Technical Report](https://arxiv.org/pdf/2408.09869) as the demo document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DOC_SOURCE = \"https://arxiv.org/pdf/2408.09869\"\n",
    "\n",
    "converter = DocumentConverter()\n",
    "result = converter.convert(DOC_SOURCE)\n",
    "\n",
    "print(f\"Status: {result.status}\")\n",
    "print(f\"Pages: {len(list(result.document.pages))}\")\n",
    "print()\n",
    "\n",
    "# Preview the first 2000 characters of extracted Markdown\n",
    "markdown = result.document.export_to_markdown()\n",
    "print(f\"Markdown length: {len(markdown):,} characters\")\n",
    "print(\"---\")\n",
    "print(markdown[:2000])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Table Extraction\n",
    "\n",
    "Docling automatically detects and extracts tables. Let's inspect them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tables = list(result.document.tables)\n",
    "print(f\"Found {len(tables)} table(s)\")\n",
    "\n",
    "for i, table in enumerate(tables):\n",
    "    table_df = table.export_to_dataframe(doc=result.document)\n",
    "    print(f\"\\n### Table {i + 1} (shape: {table_df.shape})\")\n",
    "    print(table_df.to_markdown())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AG2 Multi-Agent Setup\n",
    "\n",
    "Now we set up AG2 agents that use Docling as their document processing backend.\n",
    "\n",
    "**Architecture:**\n",
    "- `document_processor`: calls Docling tools to convert documents and extract tables\n",
    "- `analyst`: analyzes the extracted content and produces a structured summary\n",
    "- `user_proxy`: orchestrates the conversation and executes tool calls\n",
    "\n",
    "The agents communicate via a `GroupChat` managed by a `GroupChatManager`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm_config = LLMConfig(\n",
    "    {\n",
    "        \"model\": \"gpt-4o-mini\",\n",
    "        \"api_key\": os.environ.get(\"OPENAI_API_KEY\"),\n",
    "        \"api_type\": \"openai\",\n",
    "    }\n",
    ")\n",
    "\n",
    "MAX_CONTENT_CHARS = 15000  # Truncation limit to stay within LLM context\n",
    "\n",
    "\n",
    "def is_termination_msg(msg):\n",
    "    content = msg.get(\"content\", \"\") or \"\"\n",
    "    return \"TERMINATE\" in content\n",
    "\n",
    "\n",
    "proxy = UserProxyAgent(\n",
    "    name=\"user_proxy\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    max_consecutive_auto_reply=10,\n",
    "    code_execution_config=False,\n",
    "    is_termination_msg=is_termination_msg,\n",
    ")\n",
    "\n",
    "processor = AssistantAgent(\n",
    "    name=\"document_processor\",\n",
    "    system_message=(\n",
    "        \"You are a document processing agent. Use the convert_document tool to \"\n",
    "        \"extract text from a document, and extract_tables to get structured table \"\n",
    "        \"data. Always call convert_document first, then extract_tables if the user \"\n",
    "        \"asks about tables or data.\"\n",
    "    ),\n",
    "    llm_config=llm_config,\n",
    ")\n",
    "\n",
    "analyst = AssistantAgent(\n",
    "    name=\"analyst\",\n",
    "    system_message=(\n",
    "        \"You are a document analyst. Based on the content extracted by the \"\n",
    "        \"document_processor, provide a clear and structured analysis including:\\n\"\n",
    "        \"- A concise summary of the document\\n\"\n",
    "        \"- Key findings or contributions\\n\"\n",
    "        \"- Notable data from any tables\\n\\n\"\n",
    "        \"When your analysis is complete, end your message with TERMINATE.\"\n",
    "    ),\n",
    "    llm_config=llm_config,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tool Registration\n",
    "\n",
    "We register Docling operations as AG2 tools. The `converter` instance created earlier\n",
    "is reused, so the models it has already loaded are not initialized again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@proxy.register_for_execution()\n",
    "@processor.register_for_llm(\n",
    "    description=\"Convert a document (PDF, DOCX, HTML, or URL) to markdown text\"\n",
    ")\n",
    "def convert_document(source: str) -> str:\n",
    "    \"\"\"Convert a document to markdown using Docling.\"\"\"\n",
    "    conv_result = converter.convert(source)\n",
    "    if conv_result.status == ConversionStatus.FAILURE:\n",
    "        return f\"Error: Document conversion failed for {source}\"\n",
    "    md = conv_result.document.export_to_markdown()\n",
    "    # Cap the tool output so it fits in the LLM context window\n",
    "    if len(md) > MAX_CONTENT_CHARS:\n",
    "        return (\n",
    "            md[:MAX_CONTENT_CHARS]\n",
    "            + f\"\\n\\n[Truncated: showing first {MAX_CONTENT_CHARS:,} of {len(md):,} characters]\"\n",
    "        )\n",
    "    return md\n",
    "\n",
    "\n",
    "@proxy.register_for_execution()\n",
    "@processor.register_for_llm(\n",
    "    description=\"Extract tables from a document as JSON. Returns a list of tables, each as a list of row records.\"\n",
    ")\n",
    "def extract_tables(source: str) -> str:\n",
    "    \"\"\"Extract tables from a document using Docling.\"\"\"\n",
    "    conv_result = converter.convert(source)\n",
    "    if conv_result.status == ConversionStatus.FAILURE:\n",
    "        return f\"Error: Document conversion failed for {source}\"\n",
    "    tables = list(conv_result.document.tables)\n",
    "    if not tables:\n",
    "        return \"No tables found in the document.\"\n",
    "    table_data = []\n",
    "    for i, table in enumerate(tables):\n",
    "        table_df = table.export_to_dataframe(doc=conv_result.document)\n",
    "        table_data.append(\n",
    "            {\n",
    "                \"table_index\": i + 1,\n",
    "                \"rows\": table_df.shape[0],\n",
    "                \"columns\": table_df.shape[1],\n",
    "                \"data\": table_df.to_dict(orient=\"records\"),\n",
    "            }\n",
    "        )\n",
    "    return json.dumps(table_data, indent=2)\n",
    "\n",
    "\n",
    "print(f\"Tools registered on proxy: {list(proxy.function_map.keys())}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the Multi-Agent Analysis\n",
    "\n",
    "The `user_proxy` sends a task to the group chat. The `document_processor` will use\n",
    "Docling tools to extract content, and the `analyst` will synthesize the findings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "group_chat = GroupChat(\n",
    "    agents=[proxy, processor, analyst],\n",
    "    messages=[],\n",
    "    max_round=10,\n",
    ")\n",
    "\n",
    "manager = GroupChatManager(\n",
    "    groupchat=group_chat,\n",
    "    llm_config=llm_config,\n",
    "    is_termination_msg=is_termination_msg,\n",
    ")\n",
    "\n",
    "# Use a separate name so the Docling `result` from earlier is not overwritten\n",
    "response = proxy.run(\n",
    "    manager,\n",
    "    message=(\n",
    "        f\"Analyze the document at {DOC_SOURCE}: \"\n",
    "        \"summarize its key findings and extract any tables.\"\n",
    "    ),\n",
    ")\n",
    "response.process()  # Consume the run and print the conversation events"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further Reading\n",
    "\n",
    "- [Docling documentation](https://docling-project.github.io/docling/)\n",
    "- [AG2 documentation](https://docs.ag2.ai/)\n",
    "- [Docling examples](https://docling-project.github.io/docling/examples/)\n",
    "- [AG2 tool use guide](https://docs.ag2.ai/user-guide/agentchat-user-guide/tutorial/tool-use)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbformat_minor": 2,
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
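
The truncation step inside the notebook's `convert_document` tool can be read in isolation. The following is a minimal sketch, not the notebook's own code: `truncate_for_llm` is a hypothetical helper name, and its default limit simply mirrors the `MAX_CONTENT_CHARS = 15000` value used above. It needs neither Docling nor AG2, so it can be tried without an API key.

```python
MAX_CONTENT_CHARS = 15000  # assumption: same character limit as the notebook


def truncate_for_llm(text: str, limit: int = MAX_CONTENT_CHARS) -> str:
    """Return `text` unchanged if it fits, else its first `limit` characters plus a notice.

    Hypothetical helper for illustration; the notebook inlines this logic
    in `convert_document`.
    """
    if len(text) <= limit:
        return text
    # Tell the model that it is looking at a prefix, and how much was cut.
    notice = f"\n\n[Truncated: showing first {limit:,} of {len(text):,} characters]"
    return text[:limit] + notice


if __name__ == "__main__":
    short = truncate_for_llm("# Title\n\nBody", limit=100)
    long = truncate_for_llm("x" * 50, limit=10)
    print(short == "# Title\n\nBody")  # short input is passed through
    print(long)
```

A character limit is only a rough stand-in for a token limit, so the value may need tuning for a different model or for documents in other languages.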