Production Deployment Guide
This guide covers deploying the Matrix stack to production with all critical fixes applied.
Overview
Two production architectures are supported:
- Single-Machine Deployment (Recommended for most users)
  - All services on one server
  - Caddy handles SSL termination with Let's Encrypt
  - Simpler setup and maintenance
- Multi-Machine Deployment (Advanced)
  - Caddy on separate SSL termination server
  - Matrix services on application server
  - Better for high-traffic scenarios
Critical Fixes Applied
All production deployments include fixes for:
- Issue #9: PostgreSQL data persistence across deployments
- Issue #10: DNS resolution and certificate trust (not needed in production with real DNS/certs)
- Authelia Optional: Can deploy with or without Authelia SSO
- Bridge Support: Automatic bridge directory mounting in Synapse
- MAS Healthcheck: Disabled for distroless image compatibility
Single-Machine Production Deployment
Prerequisites
- Ubuntu/Debian Linux server with public IP
- Domain name with DNS configured
- Ports 80, 443 open in firewall
- Docker and Docker Compose installed
DNS Configuration
Configure A/AAAA records for your domain:
matrix.yourdomain.com → Your server IP
element.yourdomain.com → Your server IP
auth.yourdomain.com → Your server IP
authelia.yourdomain.com → Your server IP (if using Authelia)
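Before running the deployment, it can help to confirm that each record resolves. The following is a minimal sketch, not part of the repository: the function names `stack_hosts` and `check_dns` are hypothetical, and it assumes `dig` is installed on the machine running the check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).

# Print the hostnames this guide expects for a base domain.
# Pass "yes" as the second argument to include the Authelia host.
stack_hosts() {
  local domain="$1" with_authelia="${2:-no}"
  printf '%s\n' "matrix.$domain" "element.$domain" "auth.$domain"
  if [ "$with_authelia" = "yes" ]; then
    printf '%s\n' "authelia.$domain"
  fi
}

# Report which hostnames currently have an A or AAAA record. Requires dig.
check_dns() {
  local host a aaaa
  for host in $(stack_hosts "$@"); do
    a="$(dig +short "$host" A)"
    aaaa="$(dig +short "$host" AAAA)"
    if [ -n "$a$aaaa" ]; then
      echo "OK       $host -> $a $aaaa"
    else
      echo "MISSING  $host"
    fi
  done
}
```

For example, `check_dns yourdomain.com yes` checks all four names, while `check_dns yourdomain.com` skips the Authelia host.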
Step-by-Step Deployment
1. Clone and Configure
cd /opt
git clone <your-repo> matrix-stack
cd matrix-stack
2. Run Deployment Script
sudo ./deploy.sh
Choose:
- Deployment type: Production
- Include Authelia: Yes or No (your choice)
Provide:
- Your domain (e.g., yourdomain.com)
- Email for Let's Encrypt notifications
3. Review Generated Configs
The script creates:
- .env with all secrets
- docker-compose.production.yml (already updated with fixes)
- caddy/Caddyfile.production for single-machine setup
- All service configurations
4. Start Services
Without Authelia:
sudo docker compose -f docker-compose.production.yml up -d
With Authelia:
sudo docker compose -f docker-compose.production.yml --profile authelia up -d
5. Verify Services
# Check all services running
sudo docker compose -f docker-compose.production.yml ps
# Check logs
sudo docker compose -f docker-compose.production.yml logs
# Test endpoints
curl https://matrix.yourdomain.com/_matrix/client/versions
curl https://auth.yourdomain.com/.well-known/openid-configuration
6. Configure Bridges (Optional)
sudo ./setup-bridges.sh
This configures the WhatsApp and Signal bridges, and the Telegram bridge when TELEGRAM_API_ID and TELEGRAM_API_HASH are set before running the script. Users then link their accounts by messaging the bridge bots in Element.
Certificate Management
Caddy automatically:
- Obtains Let's Encrypt certificates
- Renews certificates before expiry
- Handles HTTPS redirects
Certificates are stored in caddy/data/ and persist across restarts.
Firewall Configuration
Required ports:
# HTTP (Let's Encrypt validation)
sudo ufw allow 80/tcp
# HTTPS
sudo ufw allow 443/tcp
# Federation (if using Matrix federation)
sudo ufw allow 8448/tcp
Multi-Machine Production Deployment
Architecture
Internet
↓
Caddy Server (SSL termination)
↓
Matrix Server (Synapse, MAS, Element, Bridges)
↓
PostgreSQL
Caddy Server Setup
- Install Caddy on SSL termination server
- Copy generated caddy/Caddyfile.production to Caddy server
- Update IP addresses in Caddyfile to point to Matrix server
- Start Caddy
Matrix Server Setup
- Run deployment script, choose production mode
- Use docker-compose.production.yml but remove Caddy service
- Expose ports 8008, 8080, 8083, 8091 to Caddy server only (firewall)
- Start services
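One way to limit the backend ports to the Caddy server is a set of source-restricted ufw rules. The sketch below only prints the rules so they can be reviewed first; the function name `caddy_only_rules` is hypothetical and `203.0.113.10` in the usage note is a placeholder address, not a value from this project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) ufw rules that allow the backend
# ports listed above only from the Caddy server's IP address.
caddy_only_rules() {
  local caddy_ip="$1" port
  for port in 8008 8080 8083 8091; do
    echo "ufw allow from $caddy_ip to any port $port proto tcp"
  done
}
```

Running `caddy_only_rules 203.0.113.10` prints four rules, which can then be executed with `sudo` once checked. Keep in mind that ports published by Docker are handled by iptables rules that Docker manages itself, so ufw rules may not apply to them; binding the published ports to a private interface in the compose file is an alternative worth considering.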
Data Persistence and Backups
Important Directories
postgres/data/ # PostgreSQL database (CRITICAL)
synapse/data/ # Synapse state and media
mas/data/ # MAS sessions and state
bridges/*/config/ # Bridge configurations and sessions
caddy/data/ # SSL certificates
Backup Strategy
Daily backups:
#!/bin/bash
# backup-matrix.sh
BACKUP_DIR="/backup/matrix-$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Stop services
cd /opt/matrix-stack
docker compose -f docker-compose.production.yml stop
# Backup data directories
tar -czf "$BACKUP_DIR/postgres.tar.gz" postgres/data/
tar -czf "$BACKUP_DIR/synapse.tar.gz" synapse/data/
tar -czf "$BACKUP_DIR/mas.tar.gz" mas/data/
tar -czf "$BACKUP_DIR/bridges.tar.gz" bridges/
cp .env "$BACKUP_DIR/"
# Restart services
docker compose -f docker-compose.production.yml up -d
PostgreSQL dumps:
# Dump database (can be done while running)
docker exec matrix-postgres pg_dumpall -U synapse > backup-$(date +%Y%m%d).sql
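Daily backups accumulate, so some form of pruning is usually needed. The following is a minimal sketch under assumptions: the function name `prune_backups` and the 14-day default are hypothetical, and it relies on the `matrix-YYYYMMDD` directory naming used by backup-matrix.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete backup directories whose modification time is
# older than a number of days. Only top-level directories named matrix-* are
# touched; anything else under the backup root is left alone.
prune_backups() {
  local root="${1:-/backup}" keep_days="${2:-14}"
  find "$root" -mindepth 1 -maxdepth 1 -type d -name 'matrix-*' \
    -mtime +"$keep_days" -exec rm -rf {} +
}
```

Calling `prune_backups /backup 14` at the end of the backup script would remove directories last modified more than 14 days ago. Replacing `-exec rm -rf {} +` with `-print` gives a dry run.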
Monitoring and Maintenance
Health Checks
# Check service status
docker compose -f docker-compose.production.yml ps
# Check resource usage
docker stats
# Check logs for errors
docker compose -f docker-compose.production.yml logs --tail=100 | grep -i error
Log Rotation
Configure Docker log rotation in /etc/docker/daemon.json:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
Certificate Renewal
Caddy handles automatic renewal. Monitor logs:
docker compose -f docker-compose.production.yml logs caddy | grep -i "renew\|cert"
Troubleshooting
Issue: Services Won't Start After Update
Cause: PostgreSQL data directory password mismatch (Issue #9)
Solution:
# Deploy script now detects this automatically
# If you encounter it manually:
docker compose -f docker-compose.production.yml stop
# WARNING: this permanently deletes the database; back up first
sudo rm -rf postgres/data/
# Re-run deploy.sh or manually recreate with correct password
Issue: Element Can't Connect to Homeserver
Cause: CORS or MAS delegation not configured
Solution:
# Check Synapse delegates to MAS
curl https://matrix.yourdomain.com/.well-known/matrix/client
# Should return:
# {"m.homeserver":{"base_url":"https://matrix.yourdomain.com"},
# "m.authentication":{"issuer":"https://auth.yourdomain.com/"}}
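The check above can be scripted so it fails loudly when the delegation is missing. This is a sketch only: `has_mas_delegation` is a hypothetical name, and it does a plain string match on the two keys rather than parsing the JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a .well-known/matrix/client document on stdin and
# succeed only if it contains both the homeserver and authentication keys.
has_mas_delegation() {
  local body
  body="$(cat)"
  printf '%s' "$body" | grep -qF '"m.homeserver"' &&
    printf '%s' "$body" | grep -qF '"m.authentication"'
}
```

A possible invocation is `curl -fsS https://matrix.yourdomain.com/.well-known/matrix/client | has_mas_delegation && echo "delegation OK"`.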
Issue: Bridges Keep Restarting
Cause: Configuration not complete or registration not loaded
Solution:
# Run bridge setup script
sudo ./setup-bridges.sh
# Check bridge logs
docker compose -f docker-compose.production.yml logs mautrix-whatsapp
# Ensure Synapse has bridge registrations mounted
grep "app_service_config_files" synapse/data/homeserver.yaml
Issue: Let's Encrypt Certificate Fails
Causes:
- DNS not propagated yet
- Port 80 blocked
- Rate limit hit
Solutions:
# Check DNS
dig matrix.yourdomain.com
# Check port 80 accessible
curl http://matrix.yourdomain.com
# Check Caddy logs
docker compose -f docker-compose.production.yml logs caddy
# If rate limited, use staging:
# Edit Caddyfile, add to global options:
# acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
Security Hardening
1. Enable Authelia 2FA
If using Authelia, enable two-factor authentication in authelia/config/configuration.yml:
default_2fa_method: "totp"
access_control:
default_policy: two_factor # Require 2FA for everything
2. Restrict Admin API
Caddy admin API should not be publicly accessible:
# Use firewall to restrict to localhost only
sudo ufw deny 2019/tcp
Or bind it to localhost in the Caddyfile global options block:
admin localhost:2019
3. Regular Updates
# Update Docker images
cd /opt/matrix-stack
docker compose -f docker-compose.production.yml pull
docker compose -f docker-compose.production.yml up -d
4. Review Permissions
# In MAS config (mas/config/config.yaml)
policy:
registration:
enabled: false # Disable open registration in production
require_email: true
5. PostgreSQL Security
# Restrict PostgreSQL to internal network only
# Already configured - exposed only to Docker network
Upgrading
Minor Updates (Docker Images)
cd /opt/matrix-stack
docker compose -f docker-compose.production.yml pull
docker compose -f docker-compose.production.yml up -d
Major Updates (Configuration Changes)
- Back up everything first
- Pull latest deployment scripts
- Review BUGFIXES.md for any new issues
- Test in local deployment first
- Apply to production during maintenance window
Performance Tuning
PostgreSQL
Edit postgres/init/postgres-config.sql:
-- For production server with 8GB RAM:
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET work_mem = '32MB';
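For servers with a different amount of memory, the 8GB values can be scaled proportionally. The sketch below assumes the ratios implied by that example (1/4, 3/4, 1/16, and 1/256 of RAM); these ratios and the function name `pg_tuning_sql` are assumptions, not settings taken from the project, so treat the output as a starting point.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ALTER SYSTEM statements for a given RAM size in GB.
# Ratios are an assumption inferred from the 8GB example:
#   shared_buffers = RAM/4, effective_cache_size = 3*RAM/4,
#   maintenance_work_mem = RAM/16, work_mem = RAM/256.
pg_tuning_sql() {
  local ram_gb="$1"
  local ram_mb=$(( ram_gb * 1024 ))
  cat <<EOF
ALTER SYSTEM SET shared_buffers = '$(( ram_mb / 4 ))MB';
ALTER SYSTEM SET effective_cache_size = '$(( ram_mb * 3 / 4 ))MB';
ALTER SYSTEM SET maintenance_work_mem = '$(( ram_mb / 16 ))MB';
ALTER SYSTEM SET work_mem = '$(( ram_mb / 256 ))MB';
EOF
}
```

`pg_tuning_sql 8` reproduces the values above expressed in megabytes, and `pg_tuning_sql 16` doubles them. A change to `shared_buffers` takes effect only after PostgreSQL is restarted.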
Synapse
Edit synapse/data/homeserver.yaml:
# Increase cache sizes for production
caches:
global_factor: 2.0 # Double default cache sizes
# Enable media retention
media_retention:
local_media_lifetime: 90d
remote_media_lifetime: 14d
Federation
To enable federation with other Matrix servers:
- Ensure port 8448 is open
- Configure federation in Synapse (synapse/data/homeserver.yaml):
  federation:
    enabled: true
- Verify .well-known/matrix/server serves correct federation endpoint
Test federation:
curl https://matrix.yourdomain.com/_matrix/federation/v1/version
Support and Resources
- Matrix Synapse docs: https://element-hq.github.io/synapse/
- MAS docs: https://element-hq.github.io/matrix-authentication-service/
- Authelia docs: https://www.authelia.com/
- Caddy docs: https://caddyserver.com/docs/
- Project BUGFIXES.md: Documents all critical undocumented issues
- QUICK_REFERENCE.md: Common operations and commands