docling/docs/examples/ag2_multiagent_document_analysis.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/docling-project/docling/blob/main/docs/examples/ag2_multiagent_document_analysis.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multi-Agent Document Analysis with AG2 and Docling\n",
    "\n",
    "| Step | Tech | Execution |\n",
    "| --- | --- | --- |\n",
    "| Document conversion | [Docling](https://docling-project.github.io/docling/) | 💻 Local |\n",
    "| Multi-agent orchestration | [AG2](https://docs.ag2.ai/) | 🌐 Remote (LLM) |\n",
    "\n",
    "This example demonstrates how to combine **Docling** for document conversion with **AG2** for\n",
    "multi-agent analysis. Docling converts PDF, DOCX, HTML, and other formats into structured\n",
    "Markdown and tables. AG2 agents then collaborate to analyze the extracted content.\n",
    "\n",
    "The pipeline:\n",
    "1. A **Document Processor** agent uses Docling tools to convert documents and extract tables.\n",
    "2. An **Analyst** agent synthesizes the extracted content into a structured summary.\n",
    "3. A **UserProxy** agent starts the conversation and executes the tool calls, while a GroupChat manager picks the next speaker."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "- 👉 For best conversion speed, use GPU acceleration whenever available; for example, on Colab, select a GPU-enabled runtime.\n",
    "- Requires an OpenAI API key set as the `OPENAI_API_KEY` environment variable.\n",
    "- First run downloads ML models (~1–2 GB). Subsequent runs use cached models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -q --progress-bar off --no-warn-conflicts docling \"ag2[openai]>=0.11.4,<1.0\" pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "\n",
    "from autogen import (\n",
    "    AssistantAgent,\n",
    "    GroupChat,\n",
    "    GroupChatManager,\n",
    "    LLMConfig,\n",
    "    UserProxyAgent,\n",
    ")\n",
    "\n",
    "from docling.datamodel.base_models import ConversionStatus\n",
    "from docling.document_converter import DocumentConverter\n",
    "\n",
    "# Set your OpenAI API key (or configure via .env / Colab secrets)\n",
    "# os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Document Conversion with Docling\n",
    "\n",
    "First, let's convert a sample document and inspect the output. We use the\n",
    "[Docling Technical Report](https://arxiv.org/pdf/2408.09869) as the demo document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DOC_SOURCE = \"https://arxiv.org/pdf/2408.09869\"\n",
    "\n",
    "converter = DocumentConverter()\n",
    "result = converter.convert(DOC_SOURCE)\n",
    "\n",
    "print(f\"Status: {result.status}\")\n",
    "print(f\"Pages: {len(result.document.pages)}\")\n",
    "print()\n",
    "\n",
    "# Preview the first 2000 characters of extracted Markdown\n",
    "markdown = result.document.export_to_markdown()\n",
    "print(f\"Markdown length: {len(markdown):,} characters\")\n",
    "print(\"---\")\n",
    "print(markdown[:2000])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Table Extraction\n",
    "\n",
    "Docling automatically detects and extracts tables. Let's inspect them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tables = list(result.document.tables)\n",
    "print(f\"Found {len(tables)} table(s)\")\n",
    "\n",
    "for i, table in enumerate(tables):\n",
    "    table_df = table.export_to_dataframe(doc=result.document)\n",
    "    print(f\"\\n### Table {i + 1} (shape: {table_df.shape})\")\n",
    "    print(table_df.to_markdown())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## AG2 Multi-Agent Setup\n",
    "\n",
    "Now we set up AG2 agents that use Docling as their document processing backend.\n",
    "\n",
    "**Architecture:**\n",
    "- `document_processor` — calls Docling tools to convert documents and extract tables\n",
    "- `analyst` — analyzes the extracted content and produces a structured summary\n",
    "- `user_proxy` — orchestrates the conversation, executes tool calls\n",
    "\n",
    "The agents communicate via a `GroupChat` managed by a `GroupChatManager`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm_config = LLMConfig(\n",
    "    {\n",
    "        \"model\": \"gpt-4o-mini\",\n",
    "        \"api_key\": os.environ.get(\"OPENAI_API_KEY\"),\n",
    "        \"api_type\": \"openai\",\n",
    "    }\n",
    ")\n",
    "\n",
    "MAX_CONTENT_CHARS = 15000  # Truncation limit to stay within LLM context\n",
    "\n",
    "\n",
    "def is_termination_msg(msg):\n",
    "    content = msg.get(\"content\", \"\") or \"\"\n",
    "    return \"TERMINATE\" in content\n",
    "\n",
    "\n",
    "proxy = UserProxyAgent(\n",
    "    name=\"user_proxy\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    max_consecutive_auto_reply=10,\n",
    "    code_execution_config=False,\n",
    "    is_termination_msg=is_termination_msg,\n",
    ")\n",
    "\n",
    "processor = AssistantAgent(\n",
    "    name=\"document_processor\",\n",
    "    system_message=(\n",
    "        \"You are a document processing agent. Use the convert_document tool to \"\n",
    "        \"extract text from a document, and extract_tables to get structured table \"\n",
    "        \"data. Always call convert_document first, then extract_tables if the user \"\n",
    "        \"asks about tables or data.\"\n",
    "    ),\n",
    "    llm_config=llm_config,\n",
    ")\n",
    "\n",
    "analyst = AssistantAgent(\n",
    "    name=\"analyst\",\n",
    "    system_message=(\n",
    "        \"You are a document analyst. Based on the content extracted by the \"\n",
    "        \"document_processor, provide a clear and structured analysis including:\\n\"\n",
    "        \"- A concise summary of the document\\n\"\n",
    "        \"- Key findings or contributions\\n\"\n",
    "        \"- Notable data from any tables\\n\\n\"\n",
    "        \"When your analysis is complete, end your message with TERMINATE.\"\n",
    "    ),\n",
    "    llm_config=llm_config,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tool Registration\n",
    "\n",
    "We register Docling operations as AG2 tools. The `converter` instance created earlier\n",
    "is reused so that Docling's models are loaded only once rather than on every tool call."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@proxy.register_for_execution()\n",
    "@processor.register_for_llm(\n",
    "    description=\"Convert a document (PDF, DOCX, HTML, or URL) to markdown text\"\n",
    ")\n",
    "def convert_document(source: str) -> str:\n",
    "    \"\"\"Convert a document to markdown using Docling.\"\"\"\n",
    "    # raises_on_error=False reports conversion errors via the status below\n",
    "    # instead of raising, so the error branch is actually reachable.\n",
    "    conv_result = converter.convert(source, raises_on_error=False)\n",
    "    if conv_result.status not in (\n",
    "        ConversionStatus.SUCCESS,\n",
    "        ConversionStatus.PARTIAL_SUCCESS,\n",
    "    ):\n",
    "        return f\"Error: Document conversion failed for {source}\"\n",
    "    md = conv_result.document.export_to_markdown()\n",
    "    # Truncate long documents so the tool output fits in the LLM context.\n",
    "    if len(md) > MAX_CONTENT_CHARS:\n",
    "        return (\n",
    "            md[:MAX_CONTENT_CHARS]\n",
    "            + f\"\\n\\n[Truncated — showing first {MAX_CONTENT_CHARS:,} of {len(md):,} characters]\"\n",
    "        )\n",
    "    return md\n",
    "\n",
    "\n",
    "@proxy.register_for_execution()\n",
    "@processor.register_for_llm(\n",
    "    description=\"Extract tables from a document as JSON. Returns a list of tables, each as a list of row records.\"\n",
    ")\n",
    "def extract_tables(source: str) -> str:\n",
    "    \"\"\"Extract tables from a document using Docling.\"\"\"\n",
    "    conv_result = converter.convert(source, raises_on_error=False)\n",
    "    if conv_result.status not in (\n",
    "        ConversionStatus.SUCCESS,\n",
    "        ConversionStatus.PARTIAL_SUCCESS,\n",
    "    ):\n",
    "        return f\"Error: Document conversion failed for {source}\"\n",
    "    tables = list(conv_result.document.tables)\n",
    "    if not tables:\n",
    "        return \"No tables found in the document.\"\n",
    "    table_data = []\n",
    "    for i, table in enumerate(tables):\n",
    "        table_df = table.export_to_dataframe(doc=conv_result.document)\n",
    "        table_data.append(\n",
    "            {\n",
    "                \"table_index\": i + 1,\n",
    "                \"rows\": table_df.shape[0],\n",
    "                \"columns\": table_df.shape[1],\n",
    "                \"data\": table_df.to_dict(orient=\"records\"),\n",
    "            }\n",
    "        )\n",
    "    return json.dumps(table_data, indent=2)\n",
    "\n",
    "\n",
    "# Use the public function_map property rather than the private attribute.\n",
    "print(f\"Tools registered on proxy: {list(proxy.function_map.keys())}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the Multi-Agent Analysis\n",
    "\n",
    "The `user_proxy` sends a task to the group chat. The `document_processor` will use\n",
    "Docling tools to extract content, and the `analyst` will synthesize the findings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "group_chat = GroupChat(\n",
    "    agents=[proxy, processor, analyst],\n",
    "    messages=[],\n",
    "    max_round=10,\n",
    ")\n",
    "\n",
    "manager = GroupChatManager(\n",
    "    groupchat=group_chat,\n",
    "    llm_config=llm_config,\n",
    "    is_termination_msg=is_termination_msg,\n",
    ")\n",
    "\n",
    "# Use a separate name so the earlier conversion `result` is not overwritten.\n",
    "response = proxy.run(\n",
    "    manager,\n",
    "    message=(\n",
    "        f\"Analyze the document at {DOC_SOURCE} — \"\n",
    "        \"summarize its key findings and extract any tables.\"\n",
    "    ),\n",
    ")\n",
    "# process() is called for its side effect: it runs the chat and prints the events.\n",
    "response.process()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further Reading\n",
    "\n",
    "- [Docling documentation](https://docling-project.github.io/docling/)\n",
    "- [AG2 documentation](https://docs.ag2.ai/)\n",
    "- [Docling examples](https://docling-project.github.io/docling/examples/)\n",
    "- [AG2 tool use guide](https://docs.ag2.ai/user-guide/agentchat-user-guide/tutorial/tool-use)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}