mirror of
https://github.com/docling-project/docling.git
synced 2026-05-17 13:10:38 +00:00
ce87c1f091
* Add instructions Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Add statement about tests Signed-off-by: Christoph Auer <cau@zurich.ibm.com> * Shorten rules Signed-off-by: Christoph Auer <cau@zurich.ibm.com> --------- Signed-off-by: Christoph Auer <cau@zurich.ibm.com>
57 lines
2.8 KiB
Markdown
57 lines
2.8 KiB
Markdown
# Docling
|
||
|
||
This file provides guidance to AI coding agents when working with code in this
|
||
repository.
|
||
|
||
## Project overview
|
||
|
||
Docling is a Python SDK and CLI for converting PDFs, Office files, HTML,
|
||
Markdown, audio, images, XML, and other formats into a unified
|
||
`DoclingDocument` representation for downstream AI workflows.
|
||
|
||
## Project structure
|
||
|
||
```text
|
||
docling/ # main Python package
|
||
packages/docling/ # full docling meta-package
|
||
packages/docling-slim/ # slim package readme
|
||
tests/ # pytest suite and test data
|
||
docs/ # MkDocs documentation and examples
|
||
scripts/ # project maintenance scripts
|
||
```
|
||
|
||
## Key commands
|
||
|
||
```bash
|
||
make setup # install CI-style dev environment
|
||
make test # run pytest
|
||
make check # run read-only local checks
|
||
make validate # run mutating hooks on the current changeset
|
||
```
|
||
|
||
## Code standards
|
||
|
||
- Keep public APIs typed and compatible with Python 3.10+.
|
||
- Use `uv add` or project-local dependency patterns when dependencies change.
|
||
- Add focused tests for behavior changes; regenerate reference data only when
|
||
conversion outputs intentionally change.
|
||
- Prefer structured models over loose dictionaries for durable schema-like data. Use Pydantic models or dataclasses when data crosses module boundaries, is serialized, or represents a stable contract. Exceptions may apply for internal datatypes and trivial data structures.
|
||
- Prefer `pathlib.Path` for path-handling code. Use `Path` operations instead of `os.path` in new or edited code unless an API explicitly requires string paths.
|
||
- Avoid `hasattr(...)`, broad `getattr(...)`, and similar attribute-probing patterns. These usually hide interface uncertainty. If such a check is genuinely required for compatibility with a documented third-party API, keep it narrowly scoped and explain in a comment.
|
||
- Do not add trivial or self-validating tests. Tests should verify meaningful application behavior, regressions, or integration boundaries, not restate assumptions about well-established library functionality or implementation details introduced only to validate the agent’s own code changes. Avoid mock-heavy tests unless mocking is the clearest way to exercise a real contract or failure mode.
|
||
|
||
## When making changes
|
||
|
||
1. Keep edits scoped and consistent with the surrounding module.
|
||
2. Update docs/examples when user-facing behavior changes.
|
||
3. Run targeted tests for touched behavior.
|
||
4. For reference output changes, use `DOCLING_GEN_TEST_DATA=1 uv run pytest`
|
||
and review generated data carefully.
|
||
|
||
## Before finishing
|
||
|
||
Run `make validate` before considering a task complete. If hooks modify files,
|
||
review the changes and rerun `make validate` until it passes cleanly. Also run
|
||
the affected tests for the files or behavior you changed. Use `make check` when
|
||
you need a read-only verification pass.
|