Files
docling/AGENTS.md
Christoph Auer ce87c1f091 chore: Add Agents.md instructions (#3426)
* Add instructions

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Add statement about tests

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

* Shorten rules

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>

---------

Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
2026-05-11 15:14:12 +02:00

57 lines
2.8 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Docling
This file provides guidance to AI coding agents when working with code in this
repository.
## Project overview
Docling is a Python SDK and CLI for converting PDFs, Office files, HTML,
Markdown, audio, images, XML, and other formats into a unified
`DoclingDocument` representation for downstream AI workflows.
## Project structure
```text
docling/ # main Python package
packages/docling/ # full docling meta-package
packages/docling-slim/ # slim package readme
tests/ # pytest suite and test data
docs/ # MkDocs documentation and examples
scripts/ # project maintenance scripts
```
## Key commands
```bash
make setup # install CI-style dev environment
make test # run pytest
make check # run read-only local checks
make validate # run mutating hooks on the current changeset
```
## Code standards
- Keep public APIs typed and compatible with Python 3.10+.
- Use `uv add` or project-local dependency patterns when dependencies change.
- Add focused tests for behavior changes; regenerate reference data only when
conversion outputs intentionally change.
- Prefer structured models over loose dictionaries for durable schema-like data. Use Pydantic models or dataclasses when data crosses module boundaries, is serialized, or represents a stable contract. Exceptions may apply for internal datatypes and trivial data structures.
- Prefer `pathlib.Path` for path-handling code. Use `Path` operations instead of `os.path` in new or edited code unless an API explicitly requires string paths.
- Avoid `hasattr(...)`, broad `getattr(...)`, and similar attribute-probing patterns. These usually hide interface uncertainty. If such a check is genuinely required for compatibility with a documented third-party API, keep it narrowly scoped and explain in a comment.
- Do not add trivial or self-validating tests. Tests should verify meaningful application behavior, regressions, or integration boundaries, not restate assumptions about well-established library functionality or implementation details introduced only to validate the agents own code changes. Avoid mock-heavy tests unless mocking is the clearest way to exercise a real contract or failure mode.
## When making changes
1. Keep edits scoped and consistent with the surrounding module.
2. Update docs/examples when user-facing behavior changes.
3. Run targeted tests for touched behavior.
4. For reference output changes, use `DOCLING_GEN_TEST_DATA=1 uv run pytest`
and review generated data carefully.
## Before finishing
Run `make validate` before considering a task complete. If hooks modify files,
review the changes and rerun `make validate` until it passes cleanly. Also run
the affected tests for the files or behavior you changed. Use `make check` when
you need a read-only verification pass.