Installation and Setup
Configure Docker for Transcribe
The transcribe server embeds the llama.cpp binary directly in the Docker image. The AI models must be downloaded separately and mounted as a volume.
1. Create data directory and download models
mkdir -p ./data/models
chmod 755 ./data
wget -O ./data/models/Model-7.6B-Q4_K_M.gguf https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf/resolve/main/Model-7.6B-Q4_K_M.gguf
wget -O ./data/models/mmproj-model-f16.gguf https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf/resolve/main/mmproj-model-f16.gguf
2. Configure environment
- Copy .env-transcribe-sample to your Docker configuration directory.
- Rename it to .env-transcribe.
- Set API_KEY to a secure value.
3. Run the server
docker run --rm --env-file .env-transcribe -p 4567:4567 \
-v "$(pwd)/data:/data" \
joplin/transcribe:amd64-latest
The container automatically creates the following inside /data:
- images/ - uploaded images
- models/ - AI models (you provide these)
- queue.sqlite3 - job queue database
Using Docker Compose
The minimal configuration is provided in .env-sample and docker-compose.server.yml.
- Run cp .env-sample .env
- Update any options you need in .env
- Start the server:
docker compose -f docker-compose.server.yml --profile full up --detached
For advanced configuration, refer to .env-sample-transcribe.
Security
The transcribe container runs with these security measures:
- Non-root user: The application runs as the transcribe user, not root
- Read-only filesystem: The container filesystem is read-only (only /app/packages/transcribe/images and /tmp are writable)
- Resource limits: Memory and CPU limits prevent runaway processes
- No Docker socket: Unlike previous versions, no Docker socket mount is required
Development Setup
Testing
Integration tests that require the full model do not run by default, including on CI, so changes to the model or prompts are not verified automatically. Be cautious when modifying either.
The disabled test is located at: workers/JobProcessor.test.ts.
Run all tests with:
yarn test-all
Starting the Server
From packages/transcribe, run:
yarn start
Environment variables
Required:
- API_KEY: Authentication key for API requests
- DATA_DIR: Base directory for all data (images, models, database)
- HTR_CLI_BINARY_PATH: Path to the llama-mtmd-cli binary
Optional:
- QUEUE_DRIVER: sqlite (default in Docker) or pg for PostgreSQL
The following paths are automatically derived from DATA_DIR:
- $DATA_DIR/images - uploaded images
- $DATA_DIR/models - AI models
- $DATA_DIR/queue.sqlite3 - SQLite database (when using sqlite driver)
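The variables above can be read and validated in one place. The following is a minimal TypeScript sketch, not the server's actual implementation: loadConfig, requireVar, and TranscribeConfig are hypothetical names, the environment is passed in as a plain object, and falling back to sqlite when QUEUE_DRIVER is unset is an assumption (the text above only states that default for Docker).

```typescript
// Hypothetical config shape; field names are illustrative only.
interface TranscribeConfig {
  apiKey: string;
  dataDir: string;
  htrCliBinaryPath: string;
  queueDriver: "sqlite" | "pg";
  imagesDir: string;
  modelsDir: string;
  sqlitePath: string;
}

type Env = Record<string, string | undefined>;

// Returns the value of a required variable or throws if it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): TranscribeConfig {
  const apiKey = requireVar(env, "API_KEY");
  // Strip trailing slashes so each derived path has exactly one separator.
  const dataDir = requireVar(env, "DATA_DIR").replace(/\/+$/, "");
  const htrCliBinaryPath = requireVar(env, "HTR_CLI_BINARY_PATH");

  // Assumption: fall back to sqlite when QUEUE_DRIVER is not set.
  const rawDriver = env["QUEUE_DRIVER"] ?? "sqlite";
  let queueDriver: "sqlite" | "pg";
  if (rawDriver === "sqlite" || rawDriver === "pg") {
    queueDriver = rawDriver;
  } else {
    throw new Error(`Unsupported QUEUE_DRIVER: ${rawDriver}`);
  }

  return {
    apiKey,
    dataDir,
    htrCliBinaryPath,
    queueDriver,
    imagesDir: `${dataDir}/images`,
    modelsDir: `${dataDir}/models`,
    sqlitePath: `${dataDir}/queue.sqlite3`,
  };
}
```

In a Node process this could be called as loadConfig(process.env), so that a missing variable fails at startup rather than on the first request.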
API Endpoints
All requests must include the Authorization header with the value set to your API_KEY.
POST /transcribe
Creates a transcription job. The uploaded image is resized, stored on disk, and assigned to a job record in the database.
Request Body:
- Content-Type: multipart/form-data
- Field: file (required) – the image file to process
Response:
{
"jobId": "bcd2e633-eb10-44cb-a280-bf723238c12e"
}
Example (cURL):
curl --request POST \
--url http://localhost:4567/transcribe \
--header 'Authorization: api-key' \
--header 'Content-Type: multipart/form-data' \
--form file=@/home/js/Pictures/2025-07-24_17-42_1.png
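The same request can be issued from code. Below is a hedged TypeScript sketch of a client for POST /transcribe. submitImage, PostMultipart, and HttpResponseLike are hypothetical names introduced for illustration, and the transport is injected as a callback so the example does not depend on any particular HTTP library.

```typescript
// Minimal response shape this sketch relies on (a subset of a fetch-style response).
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Hypothetical transport: sends one file as multipart/form-data via POST.
type PostMultipart = (
  url: string,
  headers: Record<string, string>,
  field: { name: string; data: Uint8Array; filename: string },
) => Promise<HttpResponseLike>;

// Uploads an image and returns the jobId from the response body.
async function submitImage(
  post: PostMultipart,
  baseUrl: string,
  apiKey: string,
  image: Uint8Array,
  filename: string,
): Promise<string> {
  const url = `${baseUrl.replace(/\/+$/, "")}/transcribe`;
  // The Authorization header carries the API_KEY value; the form field is named "file".
  const response = await post(
    url,
    { Authorization: apiKey },
    { name: "file", data: image, filename },
  );
  if (!response.ok) {
    throw new Error(`POST /transcribe failed with status ${response.status}`);
  }
  const body = await response.json();
  if (
    typeof body !== "object" ||
    body === null ||
    typeof (body as { jobId?: unknown }).jobId !== "string"
  ) {
    throw new Error("Response did not contain a jobId string");
  }
  return (body as { jobId: string }).jobId;
}
```

A real transport could wrap the global fetch with a FormData body. In that case the Content-Type header is best left unset so the runtime can add the multipart boundary itself.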
GET /transcribe/{jobId}
Fetches the result of a transcription job created with POST /transcribe.
Request:
- Requires a valid jobId.
Example Responses:
{
"id": "57ebd2e2-b496-40ab-9008-5f861bcb7858",
"state": "created"
}
{
"id": "07f09553-f5e9-467e-b98d-406778e61969",
"state": "active"
}
{
"id": "57ebd2e2-b496-40ab-9008-5f861bcb7858",
"completedOn": "2025-06-11T18:20:22.000Z",
"output": {
"result": "markdown\r\n# Main title\r\n\r\nSome text here. This should take more than one line.\r\n\r\n## Sub title\r\n\r\n- One kind\r\n - of list\r\n - sub-item\r\n\r\n## Conclusion\r\n\r\nLet's finish here."
},
"state": "completed"
}
Example (cURL):
curl --request GET \
--url http://localhost:4567/transcribe/57ebd2e2-b496-40ab-9008-5f861bcb7858 \
--header 'Authorization: api-key'
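Because a job moves through several states before it completes, a client usually has to poll this endpoint. The following TypeScript sketch shows one way to do that. waitForResult, GetJob, and TranscribeJob are hypothetical names, the job shape is inferred only from the example responses above, and treating any state other than created, active, or completed as an error is an assumption.

```typescript
// Job shape inferred from the example responses; other fields or states may exist.
interface TranscribeJob {
  id: string;
  state: string; // "created", "active", and "completed" appear in the examples
  completedOn?: string;
  output?: { result: string };
}

// Hypothetical fetcher: performs GET /transcribe/{jobId} and returns the parsed body.
type GetJob = (jobId: string) => Promise<TranscribeJob>;

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Polls until the job is completed and returns the transcription text.
async function waitForResult(
  getJob: GetJob,
  jobId: string,
  options: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<string> {
  const intervalMs = options.intervalMs ?? 2000;
  const maxAttempts = options.maxAttempts ?? 30;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (job.state === "completed") {
      if (!job.output) {
        throw new Error(`Job ${jobId} completed without an output`);
      }
      return job.output.result;
    }
    if (job.state !== "created" && job.state !== "active") {
      // Assumption: a state not shown in the examples means the job will not finish.
      throw new Error(`Job ${jobId} is in unexpected state: ${job.state}`);
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Job ${jobId} did not complete after ${maxAttempts} attempts`);
}
```

The getJob callback would issue the GET request shown above with the Authorization header set. The interval and attempt limit are arbitrary defaults and should be tuned to how long transcription takes on the host.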