## Summary

Extends the existing Linux-only `--udp-recvmmsg` flag from the UDP listener socket to also cover **connected per-session UDP relay sockets**, so steady-state client→relay and peer→relay traffic on plain UDP is read in batches of up to 16 datagrams per `recvmmsg(2)` instead of one `recvmsg` per packet. DTLS sessions still go through the SSL read path and are unchanged.

The flag stays **opt-in**: receive-side batching works correctly, but on the current `m=1` / `m=100` benchmarks throughput is flat to slightly negative; the bottleneck has moved past receive (see results below).

## What's in the change

- **Shared receive helpers** (`src/apps/relay/ns_ioalib_engine_impl.c`, `src/apps/relay/ns_ioalib_impl.h`):
  - `ioa_parse_udp_recvmsg_cmsg()`: single TTL/TOS/`IP_RECVERR` cmsg parser used by both `udp_recvfrom()` and the new batch path. Replaces the duplicated parser previously inlined in `dtls_listener.c` and `udp_recvfrom()`.
  - `ioa_init_recvmmsg_hdr()`: single initializer for `mmsghdr`/`iovec`/cmsg/source-address fields, also used by the listener.
  - New `IOA_UDP_RECVMMSG_MAX_BATCH = 16` constant; both listener and relay paths now share it.
- **Connected relay batch read** (`socket_udp_read_batch_recvmmsg` in `ns_ioalib_engine_impl.c`): called from `socket_input_worker` for non-SSL UDP sockets when `--udp-recvmmsg` is on. Allocates per-message `stun_buffer_list_elem`s, calls `recvmmsg(MSG_DONTWAIT)`, dispatches each datagram through the existing `read_cb` path, and falls back cleanly on `ENOSYS`/`EINVAL`/`EOPNOTSUPP` (auto-disables the flag) and on `EAGAIN`/short-batch (releases unused buffers).
- **Per-engine scratch state**: the `mmsghdr[16]` / `iovec[16]` / cmsg / src-addr arrays live on `ioa_engine`, not on every socket, which keeps memory flat at thousands of allocations.
- **TTL/TOS-sized cmsg buffers** in the listener: the listener previously over-allocated `64 KiB` per slot; it now uses the same TTL+TOS sizing as the relay path.
- **Opt-in occupancy stats** behind a new `--udp-recvmmsg-log` flag: every 10 s the relay logs `udp-recvmmsg stats: calls=… packets=… avg_batch=… wouldblock=… unavailable=… no_buffer=… hist_1=… hist_2=… hist_3_4=… hist_5_8=… hist_9_16=…`. Counters are always tracked (cheap); the periodic log is gated by the new flag so default operation is silent.
- **CLI plumbing**: `--udp-recvmmsg-log` long option in `mainrelay.c`/`mainrelay.h`, `cli_print_flag` entry in `turn_admin_server.c`, doc updates in `README.turnserver`.
- **Docs**: `docs/PerformanceIterationLog.md` records the iteration steps, validation, and two rounds of DigitalOcean A/B numbers. `CLAUDE.md` load-test instructions updated to mention the new flag and the `tot_recv_msgs` / `tot_recv_bytes` workaround.
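The batch read and the occupancy histogram described above can be illustrated with a minimal, Linux-only sketch. This is not the code from the PR: every `sketch_`/`SKETCH_` name and the 2048-byte buffer size are hypothetical, the scratch struct only stands in for the per-engine state, and the bucket mapping is inferred from the `hist_*` counter names. The sketch omits cmsg parsing, source addresses, and the `ENOSYS`/`EINVAL`/`EOPNOTSUPP` fallback.

```c
#define _GNU_SOURCE /* recvmmsg(2) and struct mmsghdr are GNU/Linux extensions */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SKETCH_MAX_BATCH 16    /* mirrors IOA_UDP_RECVMMSG_MAX_BATCH = 16 */
#define SKETCH_DGRAM_SIZE 2048 /* hypothetical per-datagram buffer size */

/* Hypothetical stand-in for scratch state kept once per engine, not per socket. */
struct sketch_batch {
  struct mmsghdr msgs[SKETCH_MAX_BATCH];
  struct iovec iov[SKETCH_MAX_BATCH];
  unsigned char bufs[SKETCH_MAX_BATCH][SKETCH_DGRAM_SIZE];
};

/* Map a batch size to a histogram slot (assumed mapping):
 * 0 = hist_1, 1 = hist_2, 2 = hist_3_4, 3 = hist_5_8, 4 = hist_9_16. */
static int sketch_hist_slot(int n) {
  if (n <= 1) {
    return 0;
  }
  if (n == 2) {
    return 1;
  }
  if (n <= 4) {
    return 2;
  }
  if (n <= 8) {
    return 3;
  }
  return 4;
}

/* Read up to SKETCH_MAX_BATCH datagrams in one non-blocking recvmmsg call.
 * Returns the number of datagrams read (lengths are in b->msgs[i].msg_len),
 * 0 when nothing is queued, or -1 on any other error with errno set. */
static int sketch_read_batch(int fd, struct sketch_batch *b) {
  assert(b != NULL);
  memset(b->msgs, 0, sizeof(b->msgs));
  for (int i = 0; i < SKETCH_MAX_BATCH; ++i) {
    b->iov[i].iov_base = b->bufs[i];
    b->iov[i].iov_len = SKETCH_DGRAM_SIZE;
    b->msgs[i].msg_hdr.msg_iov = &b->iov[i];
    b->msgs[i].msg_hdr.msg_iovlen = 1;
  }
  int n = recvmmsg(fd, b->msgs, SKETCH_MAX_BATCH, MSG_DONTWAIT, NULL);
  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
    return 0; /* the would-block case: socket drained, nothing to dispatch */
  }
  return n;
}
```

A caller would loop over the first `n` entries, hand each buffer to its read callback, and bump the counter selected by `sketch_hist_slot(n)`. A short batch (fewer than 16) simply means the socket was drained.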
AGENT.md — Coturn
Coturn is a TURN/STUN server written in C11, implementing RFC 5766, RFC 5389, and related NAT traversal protocols. It supports multiple database backends (SQLite, PostgreSQL, MySQL, Redis, MongoDB), multiple auth mechanisms, and a fuzzing harness for OSS-Fuzz.
Build
# Standard build
mkdir build && cd build
cmake ..
make -j$(nproc)
# Debug build
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j$(nproc)
# Fuzzer build (requires clang or AppleClang)
CC=clang CXX=clang++ cmake -S . -B build -DFUZZER=ON
cmake --build build -j$(nproc)
Key CMake options:
- -DFUZZER=ON: build OSS-Fuzz targets (requires Clang or AppleClang)
- -DCMAKE_BUILD_TYPE=Debug|Release
- -DWITH_MYSQL=ON/OFF, -DWITH_PGSQL=ON/OFF, -DWITH_MONGO=ON/OFF, -DWITH_REDIS=ON/OFF
Required validation
Every code change must be validated before it is considered complete. Run the unit tests, the local/system tests, and the fuzzing smoke tests. When working from macOS, also validate the Linux build and Docker image tests inside Docker containers.
# Local build + unit tests
cmake -S . -B build -DBUILD_TESTING=ON
cmake --build build --parallel $(nproc)
ctest --test-dir build --output-on-failure
# Local/system tests
cd examples
./run_tests.sh
./run_tests_conf.sh
./run_tests_prom.sh # only when Prometheus support is built
cd ..
# Fuzzing smoke tests (increase runs/time for fuzzing-related changes)
fuzzing/run-local.sh ASan 0 -runs=1
fuzzing/run-local.sh ASan 1 -runs=1
Treat any FAIL line from the example scripts as a test failure even if the
script exits with status 0.
On macOS, run the Linux validation in Docker as well. The fuzzing smoke tests
build the reusable coturn-fuzz-local image; then run a clean Linux build and
system-test pass against a copied checkout so no root-owned build artifacts are
left in the working tree:
docker run --rm \
-v "$PWD:/src:ro" \
--entrypoint bash \
coturn-fuzz-local \
-lc 'apt-get update && apt-get install -y --no-install-recommends git && \
cp -a /src /tmp/coturn && \
cd /tmp/coturn && \
cmake -S . -B build-linux -DBUILD_TESTING=ON && \
cmake --build build-linux --parallel $(nproc) && \
ctest --test-dir build-linux --output-on-failure && \
rm -rf build && ln -s build-linux build && \
cd examples && ./run_tests.sh && ./run_tests_conf.sh'
Also validate the packaged Docker image:
cd docker/coturn
make docker.image dockerfile=debian tag=codex-local platform=linux/arm64
make test.docker tag=codex-local platform=linux/arm64/v8
cd ../..
Use platform=linux/amd64 on x86_64 hosts. On Apple Silicon, build with
platform=linux/arm64 and run the Bats image tests with
platform=linux/arm64/v8, which is the spelling expected by
docker/coturn/tests/main.bats.
Code style
All C source — including src/, fuzzing/, and tests/ — must be formatted
with clang-format-15 using the project's .clang-format.
The CI job .github/workflows/clang.yml runs
make lint and fails the build on any formatting drift, so any commit
containing C/H files must be formatted before it is created.
# Format the entire repo (uses the Makefile target — equivalent to
# `find . -iname "*.c" -o -iname "*.h" | xargs clang-format -i`):
make format
# Verify formatting matches CI (zero output = clean):
make lint
Mandatory pre-commit step for any session that edits C/H files:
find . -iname "*.c" -o -iname "*.h" | xargs clang-format -i
Run this before git commit whenever the staged diff touches *.c or *.h,
even when only one file was edited. The find form above does not require
./configure to have been run, so it works in worktrees and fresh clones.
Key style rules (LLVM-based):
- Indent: 2 spaces, no tabs
- Column limit: 120
- Pointer alignment: right (int *p)
- Brace style: attach (K&R)
- Zero-initialize stack buffers at declaration: uint8_t buf[N] = {0} or SomeStruct s = {0}
Tests
# Protocol conformance (RFC 5769 test vectors)
cd examples && ./scripts/rfc5769.sh
# Basic TURN relay test (run server first, then client)
cd examples && ./scripts/basic/relay.sh
cd examples && ./scripts/basic/udp_c2c_client.sh
# Full test suite
cd examples && ./run_tests.sh
Load Test on DigitalOcean
Use two same-region CPU-optimized droplets for repeatable load tests. The last
known setup used Ubuntu 24.04 c-4 droplets in nyc1:
- turnserver droplet private IP: 10.116.0.2
- loadgen droplet private IP: 10.116.0.3
- current public IPs: turnserver 157.230.3.102, loadgen 167.99.153.216
- build: current branch archived with git archive
- important baseline: turnserver is not run with --udp-recvmmsg
- --udp-recvmmsg is opt-in and covers both UDP listener receive and plain connected relay UDP receive on Linux; DTLS session sockets still use the SSL read path
- add --udp-recvmmsg-log with --udp-recvmmsg to log udp-recvmmsg stats every 10 seconds; use those lines to check batch occupancy per relay thread
Never paste DigitalOcean tokens into logs or files. Use a local environment
variable such as DIGITALOCEAN_TOKEN, and revoke temporary tokens after the
run.
Local source package and upload:
git archive --format=tar HEAD -o /tmp/coturn.tar
scp /tmp/coturn.tar root@TURN_PUBLIC_IP:/root/coturn.tar
scp /tmp/coturn.tar root@LOADGEN_PUBLIC_IP:/root/coturn.tar
Install dependencies and build on both droplets:
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y build-essential cmake pkg-config libssl-dev libevent-dev \
libsqlite3-dev libhiredis-dev git iproute2 sysstat
rm -rf /root/coturn
mkdir /root/coturn
tar -xf /root/coturn.tar -C /root/coturn
cmake -S /root/coturn -B /root/coturn/build -DCMAKE_BUILD_TYPE=Release
cmake --build /root/coturn/build --target turnserver turnutils_uclient turnutils_peer -j$(nproc)
Start turnserver on the server droplet. This is the baseline command used for
the final run; add --udp-recvmmsg only when intentionally comparing batched
Linux UDP receive:
pkill -x turnserver || true
sysctl -w net.core.rmem_max=134217728 net.core.wmem_max=134217728 \
net.core.netdev_max_backlog=250000 || true
ulimit -n 1048576
nohup /root/coturn/build/bin/turnserver \
--use-auth-secret \
--static-auth-secret=secret \
--realm=north.gov \
--allow-loopback-peers \
--listening-ip=10.116.0.2 \
--relay-ip=10.116.0.2 \
--min-port=49152 \
--max-port=65535 \
--no-cli \
--no-tls \
--no-dtls \
--log-file=stdout \
--simple-log \
> /root/turnserver.log 2>&1 &
echo $! > /root/turnserver.pid
Start the UDP peer on the loadgen droplet:
pkill -x turnutils_peer || true
sysctl -w net.core.rmem_max=134217728 net.core.wmem_max=134217728 \
net.core.netdev_max_backlog=250000 || true
ulimit -n 1048576
nohup /root/coturn/build/bin/turnutils_peer -L 10.116.0.3 -p 3480 \
> /root/peer.log 2>&1 &
echo $! > /root/peer.pid
Optional server-side monitor, run on the turnserver droplet before each test:
cat > /root/start_monitor.sh <<'EOF'
#!/bin/bash
label=${1:?usage: start_monitor.sh LABEL}
pid=$(cat /root/turnserver.pid)
rm -f /root/${label}_*.txt
nohup bash -c "pidstat -h -u -r -p $pid 1 14 > /root/${label}_pidstat.txt & \
mpstat 1 14 > /root/${label}_mpstat.txt & \
sar -n DEV 1 14 > /root/${label}_sar.txt & wait" \
> /root/${label}_monitor.out 2>&1 &
echo $! > /root/${label}_monitor.pid
EOF
chmod +x /root/start_monitor.sh
Connectivity smoke from loadgen:
/root/coturn/build/bin/turnutils_uclient \
-Y packet -m 1 -n 1000 -l 120 \
-e 10.116.0.3 -r 3480 -X -g \
-u user -W secret \
10.116.0.2
Packet relay sweep from loadgen:
for m in 1 2 4 8 16 32; do
log=/root/packet_m${m}.log
timeout -s INT 12s /root/coturn/build/bin/turnutils_uclient \
-Y packet -m "$m" -l 120 \
-e 10.116.0.3 -r 3480 -X -g \
-u user -W secret \
10.116.0.2 > "$log" 2>&1 || true
tail -20 "$log"
done
For higher -m values, the load generator can finish its default work before
the timeout. Add -n 1000 when you want a longer many-connection run, and if
the final log line omits tot_recv_msgs, derive receive count from
tot_recv_bytes / message_length.
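The fallback arithmetic is a plain integer division. A minimal sketch follows, assuming every message carries the same payload length as the -l value and that tot_recv_bytes counts only those payload bytes; the helper name is hypothetical and not part of turnutils_uclient.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: estimate the received message count from the byte total.
 * Assumes a fixed message length (the -l value) and payload-only byte counting;
 * any remainder from a partial message is discarded by integer division. */
static uint64_t sketch_estimate_recv_msgs(uint64_t tot_recv_bytes, uint32_t message_length) {
  assert(message_length > 0); /* guard against division by zero */
  return tot_recv_bytes / message_length;
}
```

For example, with -l 120 a made-up total of 1200000 received bytes corresponds to 10000 messages.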
Monitored packet run:
# on turnserver
/root/start_monitor.sh packet_m1_mon
# on loadgen
timeout -s INT 12s /root/coturn/build/bin/turnutils_uclient \
-Y packet -m 1 -l 120 \
-e 10.116.0.3 -r 3480 -X -g \
-u user -W secret \
10.116.0.2 > /root/packet_m1_mon.log 2>&1 || true
Packet-only CPU profile, useful when checking the relay bottleneck. Build with
-DCMAKE_BUILD_TYPE=RelWithDebInfo if you want readable user-space symbols.
Run once without --udp-recvmmsg, then restart turnserver with
--udp-recvmmsg and rerun the same commands with labels such as
recvmmsg_off and recvmmsg_on:
# on turnserver
sysctl -w kernel.perf_event_paranoid=-1 kernel.kptr_restrict=0 || true
pid=$(cat /root/turnserver.pid)
label=recvmmsg_off
(pidstat -h -u -r -p "$pid" 1 14 > /root/${label}_pidstat.txt & \
mpstat 1 14 > /root/${label}_mpstat.txt & \
sar -n DEV 1 14 > /root/${label}_sar.txt & wait) \
> /root/${label}_monitor.out 2>&1 &
perf record -F 99 -g -p "$pid" -o /root/${label}.perf.data -- sleep 14
perf report --stdio -i /root/${label}.perf.data --no-children \
--sort comm,dso,symbol > /root/${label}_perf.report
perf report --stdio -i /root/${label}.perf.data --children \
--sort symbol,dso > /root/${label}_perf.children
# on loadgen, started about one second after perf starts
timeout -s INT 12s /root/coturn/build/bin/turnutils_uclient \
-Y packet -m 1 -l 120 \
-e 10.116.0.3 -r 3480 -X -g \
-u user -W secret \
10.116.0.2 > /root/${label}_packet_m1.log 2>&1 || true
Invalid-packet flood:
# on turnserver
/root/start_monitor.sh invalid_m1_mon
# on loadgen
timeout -s INT 12s /root/coturn/build/bin/turnutils_uclient \
-Y invalid -m 1 -l 16 \
10.116.0.2 > /root/invalid_m1_mon.log 2>&1 || true
Restart turnserver after invalid-packet tests before allocation tests. The
last run saw rapid RSS growth during invalid flood, so avoid chaining tests on
the same server process.
Allocation flood:
# on turnserver
/root/start_monitor.sh alloc_10000_mon
# on loadgen
/root/coturn/build/bin/turnutils_uclient \
-Y alloc -m 50 -n 200 \
-L 10.116.0.3 \
-u user -W secret \
10.116.0.2 > /root/alloc_10000.log 2>&1
Useful summaries:
grep -h 'send_pps=' /root/packet_m*.log /root/*_mon.log | tail -50
grep -h 'total_allocations=' /root/alloc_*.log | tail -20
ps -o pid,rss,vsz,pcpu,pmem,comm -p $(cat /root/turnserver.pid)
tail -20 /root/*_pidstat.txt
tail -20 /root/*_sar.txt
Unit tests (Unity, opt-in via BUILD_TESTING=ON)
Unity is fetched on demand via CMake FetchContent; nothing is vendored.
Tests live under tests/ and link against the existing
turnclient static library.
# CMake direct
cmake -S . -B build -DBUILD_TESTING=ON
cmake --build build -j --target check # builds tests, runs ctest
cmake --build build -j --target test_ioaddr # build a single binary
ctest --test-dir build --output-on-failure # run already-built tests
# Legacy Makefile bridge (after ./configure; requires cmake on PATH)
make unit-tests # bootstraps build/unit-tests/, builds + runs Unity tests
Adding a new test: drop tests/test_<name>.c and append
coturn_add_test(test_<name>) in tests/CMakeLists.txt.
The check target picks it up automatically.
See docs/Testing.md for database setup and extended test scenarios.
Source layout
src/
client/ # TURN client library (C)
client++/ # TURN client library (C++)
server/ # Core TURN/STUN server logic
apps/
relay/ # turnserver main process, listeners, netengine
uclient/ # CLI test client
include/turn/ # Public headers
fuzzing/ # OSS-Fuzz targets and seed corpora
examples/ # Test scripts and sample configs
turndb/ # Database schema and setup scripts
docs/ # Protocol notes and configuration docs
Common patterns
- Port types: use uint16_t for port fields and parameters (not int); port 0 means OS-assigned ephemeral
- Buffer initialization: zero-initialize stack buffers at declaration (= {0}), not just before first use
- HMAC output buffers: declare as uint8_t buf[MAXSHASIZE] = {0}; the buffer is written into the message before HMAC runs, so uninitialized bytes would be briefly present in the packet
- Uninitialized structs: use = {0} for stack-allocated address structs (e.g., ioa_addr)
- Counter overflow in turn_ports.c: _turnports uses uint32_t low/high counters; comparisons must be overflow-safe (use subtraction, not >=)
- Port bounds checks: use <= USHRT_MAX (not < USHRT_MAX) when validating that an int holds a valid port; port 65535 is valid
- Error handling: check return values of all OpenSSL/libevent calls; use ERR_clear_error() before HMAC operations
- Logging: use TURN_LOG_FUNC macros, not fprintf/perror